Statistical software for worried students: Appearances matter

Let’s be honest. Most students of statistics are taking statistics because they have to. I asked my class of 100 business students who would choose to take the quantitative methods course if they did not have to. Two hands went up.

Face it – statistics is necessary but not often embraced.

But actually it is worse than that. For many people statistics is the most dreaded course they are required to take. It can be the barrier to achieving their career goals as a psychologist, marketer or physician. (And it should be required for many other careers, such as journalism, law and sports commentary.)

Consequently, we have worried students in our statistics courses. We want them to succeed, and to do that we need to reduce their worry. One decision that will affect their engagement and success is the choice of computer package. This decision rightly causes consternation to instructors. It is telling that one of the most frequently and consistently accessed posts on this blog is Excel, SPSS, Minitab or R. It has been viewed 55,000 times in the last five years.

The problem of which package to use is no easier to solve than it was five years ago when I wrote the post. I am helping a tertiary institution to re-develop their on-line course in statistics. This is really fun – applying all the great advice and ideas from “Guidelines for Assessment and Instruction in Statistics Education” or GAISE. They asked for advice on what statistics package to use. And I am torn.

Here is what I want from a statistical teaching package:

- Easy to use
- Attractive to look at (See “Appearances Matter” below)
- Helpful output
- Good instructional materials with videos etc (as this is an online course)
- Supports good pedagogy

If I’m honest I also want it to have the following characteristics:

- Guidance for students as to what is sensible
- Only the tests and options I want them to use in my course – not too many choices
- An interpretation of the output
- Data handling capabilities, including missing values
- A pop up saying “Are you sure you want to make a three dimensional pie-chart?”

Is this too much to ask?

Possibly.

Here is the thing. There are two objectives for introductory statistics courses that partly overlap and partly conflict. We want students to

- Learn what statistics is all about
- Learn how to do statistics.

They probably should not conflict, but they require different things from your software. If all we want the students to do is perform the statistical tests, then something like Excel is not a bad choice, as they get to learn Excel as well, which could be handy for c.v. expansion and job-getting. If we are more concerned about learning what statistics is all about, then an exploratory package like Tinkerplots or iNZight could be useful.

Ideally I would like students to learn both what statistics is all about and how to do it. But most of all, I want them to feel happy about doing statistical analysis.

Eye-appeal is important for overcoming fear. I am confident in mathematics, but a journal article with a page of Greek letters and mathematical symbols makes me anxious. The LaTeX font makes me nervous. And an ugly logo puts me off a package. I know it is shallow. But it is a thing, and I suspect I am far from alone. Marketing people know that the choice of colour, word, placement – all sorts of superficial things – affects whether a product sells. We need to sell our product, statistics, and to do that, it needs to be attractive. It may well be that the people who design software are less affected by appearance, but they are not the consumers.

This is important: Most of our students will never do another statistical analysis.

Think about it:

Most of our students will never do another statistical analysis.

Here are the implications: It is important for the students to learn what statistics is about, where it is needed, potential problems and good communication and critique of statistical results. It is not important for students to learn how to program or use a complex package.

Students need to experience statistical analysis, to understand the process. They may also discover the excitement of a new set of data to explore, and the anticipation of an interesting result. These students may decide to study more statistics, at which time they will need to learn to operate a more comprehensive package. They will also be motivated to do so because they have chosen to continue to learn statistics.

In my previous post I talked about Excel, SPSS, Minitab and R. I used to teach with Excel, and I know many of my past students have been grateful they learned it. But now I know better, and cannot, hand on heart, recommend Excel as the main software. Students need to be able to play with the data, to look at various graphs, and get a feel for variation and structure. Excel’s graphing and data-handling capabilities, particularly with regard to missing values, are not helpful. The histograms are disastrous. Excel is useful for teaching students how to do statistics, but not what statistics is all about.

SPSS was a personal favourite, but it has been a while since I used it. It is fairly expensive, and chances are the students will never use it again. I’m not sure how well it does data exploration. Minitab is another nice little package. Both of these are probably overkill for an introductory statistics course.

R is a useful and versatile statistical language for higher level statistical analysis and learning but it is not suitable for worried students. It is unattractive.

R Commander is a graphical user interface for R. It is free, and potentially friendlier than R. It comes with a book. I am told it is a helpful introduction to R. R Commander is also unattractive. The book was formatted in LaTeX. The installation guide looks daunting. That is enough to make me reluctant – and I like statistics!

I have used iNZight a lot. It was developed at the University of Auckland for use in their statistics course and in New Zealand schools. The full version is free and can be installed on PC and Mac computers, though there may be issues with running it on a Mac. The web-based version, iNZight Lite, is fine. It is free and works on any platform. I really like how easy it is to generate various plots to explore the data. You put in the data, and the graphs appear almost instantly. iNZight encourages engagement with the data, rather than doing things to data.

For a face-to-face course I would choose iNZight Lite. For an online course I would be a little concerned about the level of support material available. The newer versions of iNZight and iNZight Lite have benefitted from some graphic design input. I like the colours and the new logo.

I’ve heard about Genstat for some time, as an alternative to iNZight for New Zealand schools, particularly as it does bootstrapping. So I requested an inspection copy. It has a friendly vibe. I like the dialog box suggesting the graph you might like to try. It lacks the immediacy of iNZight Lite. It has the multiple window thing going on, which can be tricky to navigate. I was pleased at the number of sample data sets.

NZGrapher is popular in New Zealand schools. It was created by a high school teacher in his spare time, and is attractive and lean. It is free, funded by donations and advertisements. You enter a data set, and it creates a wide range of graphs. It does not have the traditional tests that you would want in an introductory statistics course, as it is aimed at the NZ school curriculum requirements.

Statcrunch is a more attractive, polished package, with a wide range of supporting materials. I think this would give confidence to the students. It is specifically designed for teaching and learning and is almost conversational in approach. I have not had the opportunity to try out Statcrunch. It looks inviting, and was created by Webster West, a respected statistics educator. It is now distributed by Pearson.

I recently had my attention drawn to a new package, Jasp. It is free, well-supported and has a clean, attractive interface. It has a vibe similar to SPSS. I like the immediate response as you begin your analysis. I was able to download it easily. It is not as graphical as iNZight, but is more traditional in its approach. For a course emphasising doing statistics, I like the look of this.

So there you have it. I have mentioned only a few packages, but I hope my musings have got you thinking about what to look for in a package. If I were teaching an introductory statistics course, I would use iNZight Lite, Jasp, and possibly Excel. I would use iNZight Lite for data exploration. I might use Jasp for hypothesis tests, confidence intervals and model fitting. And if possible I would teach Pivot Tables in Excel, and use it for any probability calculations.

This is a very important topic and I would appreciate input. Have I missed an important contender? What do you look for in a statistical package for an introductory statistics course? As a student, how important is it to you for the software to be attractive?

## 23 Comments


I teach a 1st year undergrad public health subject (epidemiology and biostatistics). We changed stats packages between 2014 (SPSS) and 2015 (Graph Pad Prism) for a couple of reasons.

1. Cost. As you said above, SPSS is expensive & the university licence only covers student lab computers and staff. Graph Pad Prism on the other hand, can be downloaded by students FREE onto their own device (laptop/computer – not tablet unfortunately).

2. Timetabling. Due to SPSS licencing, we had to book computer labs with the software installed for a few tutorials. Then students had to use the same computer labs in their own time to get the assignment done. Compare this to Graph Pad Prism – students have the software on their own device, so we could run all tutorials in the same rooms.

3. Other subjects. The 2nd & 3rd year units in the course used Graph Pad Prism for analysis of pracs etc.

Even though, as a biostatistician, I find Graph Pad Prism clunkier than a “proper” stats package, we decided to use it. This meant that I also had to learn how to use it – and I made some screen capture videos to demonstrate to students how to do different common tests.

Penny

Hi Penny,

Thanks for that input. How did the students like Graph Pad Prism?

Hi,

I was going to ask if you had come across GraphPad Prism. I’m a statistical consultant, mainly working with biomedical scientists within my university. Although I get people coming to me with experience of various different packages (eg SPSS), the main one they are using is Prism.

They find it intuitive and easy to use and it produces the kinds of plots they are used to seeing in published papers in their field. There are definitely some negatives. For example, plots are generally produced with mean and standard error. Standard error is not really a very useful descriptive statistic, and measures of spread such as standard deviation or 95% CI would be much better.
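To make the distinction in the paragraph above concrete, here is a minimal Python sketch. It is not something Prism produces, and the data values are invented purely for illustration. It computes the standard deviation, the standard error and an approximate 95% confidence interval for the same values at two sample sizes. The function name `describe` and the normal approximation (1.96) used for the interval are assumptions of this sketch; a t quantile would be more appropriate for small samples.

```python
import math
import statistics


def describe(sample):
    """Return the mean, SD, SE and an approximate 95% CI for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = 1.96 * se          # normal approximation, assumed for simplicity
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": (mean - half_width, mean + half_width),
    }


# Invented measurements, for illustration only
small = [4.1, 5.3, 6.0, 4.8, 5.6, 5.2]
# The same values repeated ten times: a hypothetical larger sample
large = small * 10

for label, data in (("small", small), ("large", large)):
    summary = describe(data)
    print(label, {k: summary[k] for k in ("n", "sd", "se")})
```

The standard deviation describes the spread of the data and stays roughly the same as the sample grows, while the standard error shrinks with the square root of the sample size. That is why error bars based on standard error can make noisy data look tidier than it is.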

There are lots of options and they allow you to do quite a range of tests, making various adjustments, etc. The thing is, most of my clients automatically just use the defaults, which aren’t necessarily the most appropriate. It’s been created by people like them, so it’s user friendly, but doesn’t necessarily have all the statistical background necessary to easily produce the best outcome. I think that now, in the latest version, most things they are likely to require are there, they just may not know that’s what they require, or how to get to it. Perhaps if they were taught by statisticians, but using this software, it would be a good combination.

Some nice features are an “interpret” button, which takes you to an online checklist relating to the assumptions for the test you’ve done, eg this one for 2-way ANOVA: https://www.graphpad.com/guides/prism/7/statistics/index.htm?Stat_CheckList_2wayANOVA.htm . Also a “narrative output” which puts the results into sentences.

Thanks for this, which is bang on the topic I am working on for a slightly different purpose (which software to recommend a client organisation use for survey analysis). I came across https://www.jamovi.org/ which is built on top of R, looks stunningly beautiful, and aims to be easier to use for non-statisticians than SPSS. Unlike RCommander, it’s a real shift from the R philosophy (of combining small modular packages and functions) into something aimed at people who are doing a small amount of statistics not very often. Basically, it wants to do what SPSS does but open source and better and easier to expand if necessary.

Interested in if you’ve come across it? – it looks to be new but growing astonishingly fast, and already considered by the developers mature enough to replace SPSS (and I understand used at some universities).

Thanks Peter,

That is exactly the sort of comment that helps people to know about what is out there. I will take a look.

I find it bizarre that you mention “R Commander”, which I’ve never heard of. As far as I know, RStudio is currently the best known (and best) R GUI.

You are correct – which is why it is great you commented.

It has a slightly better appearance than R Commander, but I suspect still too daunting for worried students.

RStudio isn’t an R GUI but an Integrated Development Environment (IDE) to make it easier to code in R. RCommander is actually probably the best known GUI on top of R, an attempt to allow non-coders to use R via drop-down menus. So these are two completely different applications. In my other comment I refer to a new product Jamovi which is also a GUI on top of R, but unlike RCommander is a clean start – rather than just giving menu equivalents for R functionality, it takes a user perspective (as its developers see it, similar to SPSS users) and the fact that R is underneath is not really the main point.

RStudio does not let you do analysis via drop-down menus, nor is that its aim or design philosophy. It is just a tool to help you write R code (more recently it is showing signs of becoming more than that, which I think is a Bad Idea, but that’s another topic).

I have used RCommander successfully as a transition to R, but have mixed views about whether it is necessary. It depends on the situation, but generally these days if the end goal is for people to use R I would hop straight in with R (and RStudio), and point them to http://r4ds.had.co.nz/ and/or Data Camp.

This doesn’t mean I think everyone needs to use R (although I think more could and should than currently do); I think Dr Nic’s point about worried students and people who are going to only rarely do any statistics is a good one.

I use R with Rstudio or Jupyter notebook. Either of them is a good IDE for R. I prefer them as they allow blending of code and text and you can output explanatory notes and figures in many different ways.

I’ve been using Excel all along, in spite of its ridiculous inability to make histograms. I use it because in all of the decades I worked in industry as a statistician and in quality management, I only once worked in a company that had widespread access to statistical software. However, Excel has been widely used at every desk by all employees. Better statistical functionality would be great, but most of my students won’t conduct another statistical analysis in their careers, or not soon enough to recall how to do it. I’m surprised that most of them never used Excel in high school or college until they took my statistics class. I’m encouraged that they will likely have access to Excel or something similar to it going forward and will then not be afraid to display the data they have for their own interpretation and for presentation to others. Maybe Excel will improve someday.

Yes it would be lovely if Excel did improve. And the Pivot Tables are great.

I agree, pivot tables are one of the best features of Excel.

You don’t mention SAS! SAS UE is free, has an attractive interface (SAS studio) and has plenty of freely available video tutorials. Also supports Jupyter notebook. Can be tricky to install though.

A very good point: almost all the students doing such courses will never do another statistical analysis. But they will read the results of many studies and need to understand. The other problem implicit in this area is that you are trying to teach two very different activities at the same time: statistical thinking and a software tool. It’s too easy – almost inevitable – that students find “the medium is the message.” That is, they learn the practical tool and allow that to constrain their view of statistics. That’s why so many students still leave the course thinking that statistics relies upon choosing “the right test” for each data set.

Another implication to be challenged is that the software used in elementary teaching will be the tool that will be used in subsequent research, indeed throughout their career! Part of the continued controversy about P-values stems from very senior scientists who appear still to rely on just what they remember from a statistics course they took 30-40 years ago. Every other part of their science has changed massively, but they still cling to, eg, using just simple linear regression.

Another issue is cost. Many of the packages you list are very expensive, even as student versions when supplied in class numbers. I have many times heard the argument, “I use R because it’s free.” Unfortunately, R is a programming language. While it is useful for an aware statistician with programming skills, I suspect in many cases it gives “wrong” results when users are not familiar with data manipulation features and pitfalls of analysis. By analogy, I used to meet Fortran users who wrote the most complex programs requiring very long run-times, and these were just the types who would switch off overflow-checking, etc, to make the code run faster.

In respect of individual software: I’ll agree with your dread of Excel. Students arrive, thinking that they know how to use Excel, and they like it because “it gives answers not lots of error messages.” Unfortunately, it’s just not safe and the answers are often wrong. The graphics are superficially pretty but easily misleading: one feature being that it (or the user) often confuses values and labels. Minitab has good features for teaching, but severe limitations as a research tool. You haven’t mentioned Stata, which is (like SPSS) expensive for bulk use but has menus, a consistent command language, and extensive online help.

When I learned computing, we covered (as I recall) five languages in depth within a year. This was explicitly to exclude any mindset that “to a person with a hammer, every fixing looks like a nail.”

We need to move from the idea that students should leave the stats course knowing “how to carry out a t-test and decide whether to use one-sided or two-sided P”, and formulate objectives that they leave with a view that data analysis is an iterative process of investigation for which they should look for appropriate tools (or collaborators) when the need arises.

I too was going to suggest GraphPad’s Prism as an option. Another one to look into is PSPP, the free open source version of SPSS. It seems to be very close to SPSS base and I would argue that its main limitation is that it can’t do higher-level stats that real data analysts need, but it might be a good solution for intro classes. The skills translate to SPSS and it doesn’t have all the options.

I actually wonder if there is a good solution to this question or what the goal really should be. I have taught intro stats with an SPSS lab and it was very challenging to teach both the statistical concepts and the software at the same time. I couldn’t imagine using R unless the students already knew some programming.

It seems perhaps a separate Statistical Computing class with multiple software would be a way to teach those who will go on to do data analysis without pulling intro stats down into the mires of computing.

Through my undergrad and grad school career I used 7 or 8 (if you count Matlab) different stat software packages in different classes. In many ways this was very helpful to me. I had something to fall back on when some of them bit the dust (RIP BMDP). And as Allan points out, it became easier to separate out the statistical issues from the software issues.

I now do a lot of statistical training for researchers and I always recommend serious data analysts know at least two stats software for a lot of reasons.

My son is currently in a high school AP stats class and they are doing everything on TI84 calculators. I was shocked when I heard this but I guess they can’t have computers on the AP exam and many high school kids don’t have computers. Fair enough. But I can’t help thinking they’re missing out on a key skill–learning to use ANY stat software. Then again, maybe they will understand the concepts better?

Hi Karen,

Thank you for your thoughts. I suspect the reason AP stats is still done on TI calculators is that TI have a vested interest. I would be happy to be proved wrong. There is no reason to ever use a calculator to do statistics. I believe a far more sensible solution is to provide computer output for interpretation. Tying AP stats to a calculator is increasingly indefensible.

I also learned several stats software packages, and each one gets a bit easier. They need to be the tool, not the endpoint for all but a few career statisticians.

Sadly, I think the real reason they still use calculators on the AP exam is not a pedagogical one but a budgetary and socioeconomic one. Schools can afford to loan out calculators but not computers. Though TI having a vested interest would also be a sad reason.

I think it’s because many high schools in the US at least have very few computers available and many kids don’t have one at home.

I’m not sure if having computer output on the exam would work or be fair if a large proportion of the kids taking the exam don’t have access to a computer while they’re taking the class.

Even at well-funded school districts around here, the computers kids are assigned by the district are either iPads or Google Chromebooks. My son has a Chromebook and it has no hard drive to install any software. So everything has to be web-based. I don’t know if any of the statistical software options are entirely web-based.

It adds an extra layer to the debate on which software would be best for a class.

Good point. It seems that there should be the will and the money to produce a good free web-based statistical software option that students could use on any device. Inputting data can be a pain, but that could be done by the teacher.

I guess the US is such a big and diverse place that it is harder to find an equitable solution. New Zealand has only the population of a median state – Colorado, last time I looked. This makes it easier for us to change things – hopefully for the best. Our high school statistics assessments are all internal using computers, or use computer printout. Universities vary. I used computer printouts in my university exams over 20 years ago.

I am a professor of statistics at California State University Fullerton and have been teaching statistics at various levels for the past 25 years. In the past five years, I have gotten involved in developing software for teaching statistics. My main goal for this project has been to implement my wishes in a statistical software for teaching and also to develop software that conforms to GAISE. The result of this effort has been the software Rguroo. Rguroo is a web-application software that runs on any web-browser, and while it is point-and-click, its computing engine is R (hence the name). We have been piloting Rguroo with about 1000 students per semester in the past three years, as we have been developing it. We have received great responses from students and instructors alike. We have recently opened Rguroo publicly. You can get your account at https://www.rguroo.com/ and try it. The software has many great features, and in about ten days from this writing, a new version with even more features will be deployed. I’d be happy to share more details, or write more about it, should you be interested. Dr. Nic, Rguroo has many of your desired features. Should anyone be interested, please drop me an email at mori@fullerton.edu, and I’d be happy to arrange a demo.

Excel can be a very good environment for some kinds of statistical analysis, but it needs help from third parties. I’m the developer of RegressIt, a free Excel add-in for linear and logistic regression and descriptive statistics, whose web site is https://regressit.com. It meets nearly all of the requirements on your wish list of features. It was developed over the last 10 years for use in teaching an advanced course on regression and time series analysis to MBAs and engineering students at Duke University, but it’s also intended for wider use in teaching and applications, and it is in use at a number of other schools. I’m giving it away for free as a public service. If I do say so, it is one of the best tools you can get for those analytical methods, and certainly the best Excel add-in for them. It was designed to address all the otherwise-well-deserved complaints that have been leveled at Excel as an environment in which to carry out data analysis, at least with respect to multivariate descriptive statistics and regression models, which are the most widely used tools in statistics. (My own rant against Excel’s native tools is here: https://regressit.com/analysis-toolpak-problems.html.) In addition to what is on the web site, you can get a description of its features in this recent blog post of mine at DataScienceCentral: https://www.datasciencecentral.com/profiles/blogs/linear-and-logistic-regression-in-excel-and-r-try-this-free-add. Besides its capabilities to carry out high-quality analysis in Excel, with very attractive and easily navigable output, it includes many tools to help students and instructors, such as teaching notes that can be embedded directly in the output worksheets and detailed audit trail information to authenticate individual work.
And… it has an interface with R that allows an analysis in R to be driven from a menu interface in Excel, thus bridging the gap between Excel and R as tools for statistical analysis and making R’s regression analysis tools available to non-programmers. The RegressIt web site also includes some extensive tutorials for linear and logistic regression and sample data sets with analysis, and more teaching materials will be added throughout the coming year. I encourage you and your readers to give it a test drive, and your feedback is welcome.

I teach introductory stats at the community college level. You are correct, most students will never take another stats class. My real goals are to give them an overview of what statistical analysis can do and to make them understand why they should be skeptical about data that comes their way. Not entirely disbelieving of all data but asking the right questions.

I use Excel because I spent 30+ years as an industrial statistician, and most of those years I only had access to Excel. Sometimes SAS or JMP or RS1, but every workplace laptop has a spreadsheet. They may as well get used to using them and learning to do basic calculations and charts. The main drawback for me is the histograms, which suck, but it isn’t so hard to get around that to a histogram that is adequate. It is the pervasiveness of Excel that makes it my weapon of choice.

Hi Judy

I totally agree. The Excel Histograms have got better, and now there is even a boxplot option in Excel. I like to use iNZight Lite which is a free online app for data exploration as well.