
Language is an issue in teaching and learning statistics. Many words have meanings in statistics that differ from their everyday meanings, and some even have multiple meanings within statistics itself. Examples of troublesome words are: error, correlation, regression, significant, model. I wrote about addressing this in Teaching Statistical Language.
But the problem starts even with the name of the subject. There are at least three meanings for the term “statistics”. The word is not even consistently singular or plural. I suggest the three meanings are data (plural), analysis (singular) and information (plural). What we teach focusses on the analysis, but involves data and information.

Statistics as Data

Sports people love statistics. Game shows and pub quizzes draw on data such as numbers of Olympic medals, wives, years of warfare, Oscars and a myriad other subjects. These statistics can be fascinating, relevant, boring or trivial. My most read blog post is entitled “Khan Academy Statistics videos are not good”. I suspect that quite a few people are searching for statistics about Khan Academy, rather than the subject of my post. This is borne out by the fact that a more recent post, “Open Letter to Khan Academy about Basic Probability”, gets considerably less traffic. I suppose there are not many people who want to know about the probability of Khan Academy. Pity – as the second post is better.
There is an entire discipline around “Official Statistics”. At a recent conference (ORSNZ/NZSA) I was fascinated by a presentation about the need for statistics in a time of disaster and recovery. John Créquer talked about a subject close to my heart, the Christchurch earthquakes. In the weeks and months of the earthquakes, authorities needed to know how many people had high needs, in order to provide adequate services. Finding these numbers was an exercise in ingenuity and co-operation, drawing on data collected for other purposes. The presenter suggested that at times like that a national register would be invaluable. New Zealand does not have such a thing. It is an interesting conflict between the need for privacy and the public good. Créquer is a statistician from Statistics New Zealand, who has been contracted to CERA (the Canterbury Earthquake Recovery Authority) for now. I had never thought that a statistician had uniquely valuable skills and insights to be used in a time of recovery from disaster.
The internet is an amazing source of the data kind of statistics. You can find out the number of an awful lot of things, simply by putting the question in a search box, or looking on Wikipedia. (I’ve made my annual monetary contribution – have you?). Thanks to Wikipedia, we don’t need to wonder about trivial things anywhere near as much as we used to.

Statistics as Analysis

Statistics, as it is taught and learned as a subject, mostly refers to statistical analysis and the inquiry process in which it is embedded. I sometimes wonder what people are thinking when I say that I produce materials to help people learn statistics. Do they imagine a classful of students memorising the populations of countries and batting averages?
“It is easy to lie with statistics. It is hard to tell the truth without it.”
This quote is from Andrejs Dunkels, a person whom I wish I had met. When I was looking for the source of this quote, I found a tribute page to a man who contributed greatly to the world of statistics. His quote uses statistics as a singular noun.
The analysis aspect of statistics involves taking raw data and turning it into information and evidence of what may be truth. Science would not progress far without the tools of statistics, which take the raw results of experiments and observations and, using the insights gained from the mathematical world of probability, discern their significance. Without the discoveries and tools of statistics we would not be able to make sensible inferences about populations from samples and experiments.
Statistical analysis uses mathematical tools, but is far more than just the mathematics. It is easy to produce wrong information by using the mechanistic calculations without thinking critically about the results. I once produced some very wrong models of performance of bank branches, using multiple regression. I even came up with some interesting rationalisations for the counter-intuitive results. Then I did a residual plot and found one outlier that changed everything! Once I removed it, the models changed to the extent that some of the coefficients changed sign. I wonder how many wrong models persist because of well-intentioned but unskilled analysts.
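A small sketch of how a single point can do that. The numbers below are invented for illustration, not the bank branch data, and it uses a simple one-predictor least-squares line rather than the multiple regression from the story. The helper names `fit_line` and `residuals` are my own. It fits the line, finds the point with the largest residual, and refits without it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted value for every point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Invented data: eight points with a gentle upward trend,
# plus one wildly low value at x = 9.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5.1, 5.5, 5.4, 5.9, 6.1, 6.0, 6.5, 6.6, -6.0]

# Fit with every point included.
slope_all, intercept_all = fit_line(xs, ys)

# A residual plot would show one point far from the rest;
# here we just pick the largest absolute residual.
res = residuals(xs, ys, slope_all, intercept_all)
worst = max(range(len(res)), key=lambda i: abs(res[i]))

# Refit without that point.
xs_clean = xs[:worst] + xs[worst + 1:]
ys_clean = ys[:worst] + ys[worst + 1:]
slope_clean, intercept_clean = fit_line(xs_clean, ys_clean)

print("slope with all points:   ", round(slope_all, 3))
print("largest residual at x =  ", xs[worst])
print("slope without that point:", round(slope_clean, 3))
```

With these numbers the slope is negative when every point is included and positive once the odd point is left out. Whether a point like that should be removed is a judgement for a person who knows the data; the code only shows how much one value can matter.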
There is a wonderful paragraph I used to quote in my second year statistical methods class, that unfortunately I can’t find – even using Wikipedia. It says, in essence: Statistical models are not sausage machines, taking in data and turning it into information without the interference of a human. If the results do not make sense and align with common understanding of the phenomenon, they are probably wrong.
If someone can direct me to the actual quote, I’d be very happy. I used to get the class to recite it in unison.
The point I am making is that the second meaning of statistics is a combination of science and art. It needs people.

Statistics as Information

This is similar to the first meaning, but I think that processed data should have a home separate from raw data. Statistical results include relationships and differences, not just “the facts.” I would put graphs and tables into this category. I think this category is scarier than statistics as data. Everyone can understand that Henry the Eighth had six wives, and New Zealand won six gold medals at the London Olympics. Those are non-scary statistics, and easily accessible. They are statistics as data or facts.
What many people find more daunting are the results of analysis. This is where we try to explain the population effect of cancer screening, the significance (statistical) of an increase or decrease in birthrate, the effect of seasonality on the sales of jewellery in the USA, the evidence that increasing numbers of cows are causing a degradation of water quality in natural water sources. These statistics need to be well presented. Part of our role as teachers is to help future producers of such information to express themselves well so these statistics are accessible. Another part of our role is to help future consumers of statistics to understand them.
Our role is important – for all three types of statistics.
