The question of whether to teach the Central Limit Theorem explicitly seems to divide instructors along philosophical lines. Let us look first at these lines.
There are at least three different areas of activity within the discipline of statistics: theory, practice and teaching.
The theory of statistics is mathematical. It is taught and practised in Mathematics and Statistics Departments of Universities. It is possible to be an expert on the theory and mathematics of statistics while having little contact with real data. The theory provides underpinnings to the practice of statistics. It is vital that some people know this – but not most of us. One would hope that people employed as statisticians would have a sound understanding of both the theoretical and applied aspects of statistics. This relates strongly to the research into statistics, which seems to be very mathematical, from my perusal of journals. This research advances the theory and use of statistical methods and philosophy.
The practice of statistics occurs in many, many areas, particularly in universities. Most postgraduate courses require some proficiency in the application of statistical methods. Researchers in areas as diverse as psychology, genetics, market research, education, geography, speech therapy, physiotherapy, mechanics, management, economics and medicine all use statistical methods. Some researchers have a deep understanding of the theory of statistics, but most aim to be safe and competent practitioners. When they get to the tricky bits they know to ask a statistician, but most of the day-to-day data generation, collection and analysis is within their capability.
Then there is the teaching of statistics. The level of applicability and theory taught will depend on the context. An instructor in statistics (in a non-service course) in a Department of Mathematics would tend towards the mathematical aspects, as that is most appropriate to the audience. However, in just about every other setting the emphasis will be on the practical aspects of data collection and inference. This treatment of statistics is explicable, accessible and interesting to just about anyone, whereas only the mathematically inclined are likely to get excited about the theory of statistics.
There is another growing area, which is the research into the teaching and learning of statistics. This informs and is informed by the other areas, as well as general educational research and cognitive psychology. Much of my thinking comes from this background. An overview of some of the material relating to college level can be found in this literature review. The general topic of How Students Learn Statistics is introduced in this early paper by Joan Garfield (1995), a leader in the field of statistics education research.
Statistics is gradually making its way into the school curriculum internationally, and in New Zealand has become a separate subject in the final year of schooling. There are philosophical issues arising as most of the teachers of statistics are mathematicians, and some tend towards the beauty and elegance of the formulas, proofs etc. The aim of the curriculum, however, is more towards statistical investigations and statistical literacy. There are fuzzy, dirty, ambiguous, context driven explorations with sometimes extensive write-ups. There is discussion and critique of statistical reports. There are experiments which may or may not produce usable results. Some of this is well into the realms of social science and well away from what mathematicians find appealing or even comfortable. In another life I can hear myself saying, “I didn’t become a maths teacher to mark essay questions!” There is a bit of a mismatch between the skill-set and attitudes of the teachers and the curriculum.
One place where this is particularly evident is in the question of teaching the Central Limit Theorem. Mathematicians like the Central Limit Theorem and it seems that they like to teach it. One teacher states “The fact that the CLT is to be de-emphasised in Yr 13 is a major disappointment to me…” This statement prompted this post. I agree that the CLT is neat. It is really handy. And it makes confidence interval calculation almost trivial. There are cool little exercises you can do to illustrate it. It is the backbone of traditional statistical theory.
However, teaching and learning do not always go hand in hand. I wonder how many students really do internalise the Central Limit Theorem. Evidence says not many. Chance, delMas and Garfield, in “The Challenge of Developing Statistical Literacy, Reasoning and Thinking” (Ben-Zvi and Garfield 2004) state: “Sampling distributions is a difficult topic for students to learn. A complete understanding of sampling distributions requires students to integrate and apply several concepts from different parts of a statistics course and to be able to reason about the hypothetical behavior of many samples – a distinct, intangible thought process for most students. The Central Limit Theorem provides a theoretical model of the behavior of sampling distributions, but students often have difficulty mapping this model to applied contexts. As a result students fail to develop a deep understanding of the concept of sampling distributions and therefore often develop only a mechanical knowledge of statistical inference. Students may learn how to compute confidence intervals and carry out tests of significance, but they are not able to understand and explain related concepts, such as interpreting a p-value.”
I have a confession to make. I didn’t teach the Central Limit Theorem. It never seemed as if it were going to help my students understand what was going on. For a few years I made them do a little simulation exercise which helped them to see why the square-root of n occurred in the denominator of the formula for the standard error. That was fun and seemed to help. But the words “Central Limit Theorem” seldom passed my lips in my twenty years of instruction.
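A minimal sketch of that kind of simulation exercise, written in Python rather than whatever tool was used in class. The population (10,000 values from a right-skewed exponential distribution), the sample sizes and the function name `simulated_standard_error` are all assumptions made for illustration. The idea is to draw many samples of size n, take the mean of each, and compare the spread of those means with the population standard deviation divided by the square root of n.

```python
import random
import statistics

def simulated_standard_error(population, n, reps=2000, seed=1):
    """Standard deviation of `reps` sample means, each from a random sample of size n."""
    rng = random.Random(seed)
    # Sample with replacement so each draw is independent of the others.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 right-skewed values (an assumption for this sketch).
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]
sigma = statistics.pstdev(population)

for n in (4, 16, 64):
    observed = simulated_standard_error(population, n)
    predicted = sigma / n ** 0.5
    print(f"n={n:3d}  simulated SE={observed:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

Each time the sample size is multiplied by four, the spread of the sample means should roughly halve, which is where the square root comes from.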
What has helped immeasurably is videos, beginning with “Understanding the p-value”, together with plenty of different examples and exercises using confidence intervals and hypothesis tests. (Another confession – I taught traditional statistical inference, not resampling. My excuse was that I didn’t know any better, and I had to stay in parallel with the course provided by the maths department.) What I have found from my own experience as a learner and as a teacher is that students learn to understand statistics by DOING statistics.
The Central Limit Theorem states that, regardless of the shape of the population distribution (provided its variance is finite), the distribution of sample means is approximately normal if the sample size is large. This was a really brilliant model for when simulation and resampling were impossible. The Central Limit Theorem makes it possible to calculate confidence intervals for population means from sample data. It is the reason why most statistical procedures either assume normality at some point, or take steps to correct for the lack thereof. (See the paper by Cobb I referred to extensively in last week’s post.)
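One way to see the theorem at work is to watch the shape of the distribution of sample means change as the sample size grows. The sketch below assumes an exponential population, which is strongly right-skewed, and uses skewness as a rough measure of how far the sample means are from a symmetric, normal-looking shape. The helper names `skewness` and `sample_means` are made up for this example.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: near 0 for a symmetric shape, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps=5000, seed=4):
    """Means of `reps` samples of size n from an exponential population (assumed example)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

for n in (1, 5, 50):
    print(f"n={n:2d}  skewness of sample means = {skewness(sample_means(n)):.2f}")
```

With n = 1 the "sample means" are just the raw skewed values. As n increases the skewness should shrink towards zero, even though the population itself has not changed.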
In a curriculum that develops from informal inference to formal inference using resampling, there is no need to call on the Central Limit Theorem. With resampling we use the distribution of the sample as the best estimate of the distribution of the population. True, it is quicker to use the old method of plugging the values into the formula. However, it isn’t much quicker than using the free iNZight software for resampling.
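To make the comparison concrete, here is a rough Python sketch of both approaches for a confidence interval for a mean. It is not how iNZight does it; the sample data, the percentile method for the bootstrap interval, and the use of 1.96 rather than a t multiplier in the formula are all assumptions chosen to keep the example short.

```python
import random
import statistics

def bootstrap_ci(sample, reps=5000, level=0.95, seed=2):
    """Percentile bootstrap interval: resample the sample itself, with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps))
    lower = means[round(reps * (1 - level) / 2)]
    upper = means[round(reps * (1 + level) / 2) - 1]
    return lower, upper

def formula_ci(sample, z=1.96):
    """Traditional interval: mean plus or minus z standard errors (z = 1.96 assumed)."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * se, mean + z * se

# Hypothetical sample of 40 measurements (made up for this sketch).
data_rng = random.Random(3)
sample = [data_rng.gauss(50, 10) for _ in range(40)]

print("bootstrap:", bootstrap_ci(sample))
print("formula:  ", formula_ci(sample))
```

For a sample like this the two intervals should come out close to each other. The difference is that the resampling version never mentions a normal distribution.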
At high school level we want students to get an understanding of what inference is. (I would suggest my Pinkie Bar lesson as a good way of introducing the rejection part of Cobb’s mantra, Randomise, Repeat, Reject.) I’m not convinced that teaching the Central Limit Theorem and formula-based confidence intervals for means and proportions leads to understanding. Research suggests that it doesn’t. I agree that statistical theorists, educators and researchers should all understand the Central Limit Theorem. I just don’t think that it has a vital place in an innovative curriculum based on resampling.
I suspect that teachers fear that if their students are not taught the Central Limit Theorem and traditional confidence intervals at high school they will be at a disadvantage at university. I’d like to reassure them that it just isn’t true. All first year university statistics courses that I know of assume no prior knowledge of statistics. (The same is true of some second year courses as well!) The greatest gift a high school statistics teacher can give their students is an attitude of excitement and success, with a healthy helping of scepticism, and an idea of what inference is – that we can draw conclusions about a population from a sample. If my first year students had started from that point, half our work would have been done.