Sometimes statisticians just have to let go, and accept that some statistical analysis will be done in less than ideal conditions, with fairly dodgy data and more than a few violated assumptions. Sometimes the wrong graph will be used. Sometimes people will claim causation from association. Just as sometimes people put apostrophes where they should not and misuse the word “comprise”.

When we are teaching, particularly when we are teaching non-majors, we need to think hard about where we sit on the fastidiousness scale. (In my experience just about all statistics teaching is to non-majors, which may say something about people’s attitudes to statistics.)

The fastidiousness scale is best described by its two extremes. At one extreme, statistical analysis is performed only by mathematical statisticians, using tools like SAS and R, but only if they know exactly how each formula works (and preferably have proved it as well) and have done small examples by hand. All data is perfectly random, unbiased and representative. We could call this end protectionism.

At the other end of the fastidiousness scale just about anyone can do statistical analysis, using Excel. They accept that the formulas do what the instructor tells them they do. It is a black-box approach: the data goes into the black box, and the results come out. Any graph is better than no graph. Any data is better than no data. This end is probably best labelled “cavalier”.

Some instructors teach as if the mathematical extreme were the ideal, and they reluctantly allow people to do really basic summary statistics so long as the data is random, with a large sample size. They fill their teaching with warnings, and include the finite population correction in their early lectures. This protectionism could be construed as professional snobbery, and it is evident in attitudes to the use of Excel for statistical analysis. I accept that the Data Analysis Toolpak in Excel leaves a lot to be desired (see my posts about Excel and about Excel histograms). But at the same time, lots of people have access to Excel and are at home using it. When Excel is used to introduce statistical concepts, it builds on current skills and empowers people.

Protectionism has the advantage that no bad statistical analysis is ever done. Any results that are published are properly explained, and are totally sound with regard to sample size and sampling method, choice of variable, choice of analysis, interpretation and data display. One concern is that the mathematical focus may mean that the practical aspects are neglected.

I do not recommend the cavalier end of the fastidiousness scale either. But somewhere in that direction lies empowerment. The advantages of empowerment are legion! Even if people do bad statistical analysis it is better than none at all. Taking a sample and drawing conclusions from it is better than not taking a sample. As people are empowered to do and understand statistics, they may better understand statistical ideas when they are presented to them in other contexts.

Some years ago my sister asked me to be a keynote speaker at a hand-therapy conference. At the time I had mainly taught Operations Research and some regression analysis. But it included a free trip to Queenstown away from my children, so how could I resist? I was to do a one-hour plenary session on statistics and an elective workshop on quantitative research methods. It was scheduled first thing in the morning after the “dinner” the night before. Attendance at my session was compulsory if they were to get credit towards their professional accreditation. I did wonder if my sister actually liked me! My audience was over a hundred physiotherapists and occupational therapists who specialise in the treatment of hands, from Australia and New Zealand. They are all clever people, who generally had little knowledge of statistics. I assumed, correctly, that most of them were nervous of statistics. They had been taught by protectionists, and felt afraid, like over-protected children.

I decided to take an approach of empowerment – that all statistics boiled down to a few main ideas and that if they could understand those, they would be able to read academic reports on statistical analysis critically and, with help, do their own research. I taught about levels of data, the concept of sampling, and the meaning of the p-value. I used examples about hands. And I took an enabling, encouraging approach without being patronising.

It worked. The attendees felt empowered, and a large number came to my follow-up workshop. I don’t know if any of them went on to apply much of what I taught them but I do know that a lot of them changed their attitude to statistical analysis.

Sometimes we forget that we are teaching attitudes, skills and knowledge – in that order of importance. If our students finish our course feeling that statistics is interesting, possible and relevant, then we have accomplished a great thing. People will forget skills and knowledge, but attitudes stick. If students know that at one point they knew how to perform a comparison of two means, and that it wasn’t that difficult, then when the need arises they are more likely to work out how to do it again. They have been empowered!

Imagine if only people who can spell well and write with correct grammar were allowed to write, if only the best chefs could cook and the rest of us would just watch in awe, if only professional musicians were allowed to play instruments, if only professional sport people were allowed to participate. Just as amateur writers, musicians, sportspeople and chefs have a better appreciation of the true nature of the endeavour, empowered amateur statisticians are in a better position to appreciate the worth and importance of rigorous, fastidious statistical analysis.

Let us cast off the shackles of protectionism and start empowering. Or at least move a little way down the fastidiousness scale when teaching non-majors.

## 4 Comments

“Even if people do bad statistical analysis it is better than none at all.”

I think this needs some qualification. Bad statistical analysis can be much worse than none at all, if it leads people to put too much faith in unsound/erroneous conclusions.

When I was young and naive, I worked with a bunch of experimenters who were concerned that a new measurement methodology must be flawed because it wasn’t consistent with an established value: Method A had given y=60 at x=0, Method B was giving y=45 at x=0. (Details slightly altered to protect the guilty and because I can’t remember the exact values anyway.)

We spent weeks second-guessing Method B before I stumbled across some more detail on Method A. It turned out they had never actually been able to measure y at x=0. Instead, they’d taken several measurements between x=10 and x=20. Then they’d stuffed the data into Excel, run a quadratic fit, and extrapolated out to x=0, taking the resulting value of y=60 as gospel.
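A minimal sketch of why that extrapolation was so fragile (the numbers here are made up, standing in for the hypothetical Method A data, and `pred_se` is just an illustrative helper): fitting a quadratic to noisy measurements taken only between x=10 and x=20, then evaluating the fit at x=0, amplifies the noise enormously compared with evaluating it inside the measured range.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up stand-in for "Method A": every measurement lies in the
# narrow window x = 10..20, with nothing anywhere near x = 0.
x = np.linspace(10, 20, 8)
y = 45 + 0.5 * x + rng.normal(0, 2, x.size)  # assumed truth plus noise

# Fit a quadratic, as the original analysts did, and extrapolate to x = 0.
coeffs, cov = np.polyfit(x, y, 2, cov=True)
y_at_0 = np.polyval(coeffs, 0.0)

def pred_se(xp):
    """Standard error of the fitted value at xp, from the coefficient
    covariance matrix (coeffs are ordered [a, b, c] for a*x^2 + b*x + c)."""
    v = np.array([xp**2, xp, 1.0])
    return float(np.sqrt(v @ cov @ v))

print(f"extrapolated y at x=0:  {y_at_0:.1f} +/- {pred_se(0.0):.1f}")
print(f"interpolated y at x=15: {np.polyval(coeffs, 15.0):.1f} +/- {pred_se(15.0):.1f}")
```

The standard error of the extrapolated value at x=0 is many times larger than the standard error inside the measured window, which is exactly the quality information that got lost when y=60 was passed on as an established value.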

For me the lesson here was that statistical conclusions shouldn’t be divorced from information about their quality. The value of y=60 was probably good enough for the original application. But if I’d known it was based on a long extrapolation from some noisy data, I would have reacted quite differently to the “discrepancy” we encountered later.

Unfortunately, the Dunning-Kruger effect seems to mean that people who do bad statistical analysis are bad at recognising its limitations (especially when it’s telling them what they want to hear!) so dubious conclusions are given far more weight than they deserve.

For a much worse example, see: http://en.wikipedia.org/wiki/Sally_Clark

Firstly, thanks for sharing your ideas about teaching stats; I’m ambivalent about some of them. Sure, I don’t want to be pedantic to the extreme and I want to empower users; however, there are risks to account for which I’m not sure you are considering.

“Even if people do bad statistical analysis it is better than none at all”. Well, sometimes admitting “we are unclear about the treatments” is better than continually recommending the wrong one, because acknowledging uncertainty (which implies “be careful, keep an eye on what you are doing”) is preferable to increasing confidence in the wrong approach or prescription. I’ve seen this problem many times when people forgot (or were unable) to fit more covariates.

In some respects I like Excel, because it allows quick data manipulation and some degree of EDA. Nevertheless, Excel frames the subject in such a limiting way that it makes any realistic analysis (beyond a single predictor) very difficult. Perhaps using Excel together with a statistical add-in on a regular basis would make more sense to me.

Furthermore, emphasizing concepts rather than specific software is very important. People may struggle to do some analyses, but if they do not know what the results mean, then we have failed.

Finally, the results of me doing an average job at cooking or badly playing the ukulele have no effect outside my immediate family. The results of screwing up analyses—most likely done for decision making in a commercial setting—may have a much larger impact.

Reblogged this on DemograFUN makes you feel StatisFUN and commented:

empowerment, of course

[…] But the ship of statistical protectionism has sailed, and it is up to statisticians and statistical educators to do our best to teach statistics in such a way that each student can understand and apply their knowledge confidently, correctly and appropriately. […]