To quote Willy Wonka, “A little magic now and then is relished by the best of men [and women].” Any frequent reader of this blog will know that I am of a pragmatic nature when it comes to using statistics. For most people the Central Limit Theorem can remain in the realms of magic. I have never taught it, though at times I have waved my hands past it.
Students who want that sort of thing can read about it in their textbooks or look it up online. The New Zealand school curriculum does not include it, as I explained in 2012.
But – there are many curricula and introductory statistics courses that include the Central Limit Theorem, so I have chosen to blog about it, in preparation for making a video. In this post I will cover what the Central Limit Theorem does. Maybe my approach will give ideas to teachers on how they might teach it.
Added 17 July 2018: Here is the video I was working on when I wrote the post:
First let me explain what a sampling distribution is. (And let me add the term to Dr Nic’s long list of statistics terms that cause unnecessary confusion.) A sampling distribution of a mean is the distribution of the means of samples of the same size taken from the same population. The distribution of the means will be different from the distribution of values in the original population. The Central Limit Theorem tells us useful things about the sampling distribution and its relationship to the distribution of the values in the population.
We have a population of 720 dragons, and each dragon has a strength value from 1 to 8. The distribution of strengths goes from 1 to 8 and has a population mean somewhere around 4.5. We take a sample of four dragons from the population. (Dragons are difficult to catch and measure, so it will be just 4.)
We find the mean. Then we think about what other values we might have got for samples of that size. In real life, that is all we can do. But to understand what is happening, we will take multiple samples using cards, and then a spreadsheet, to explore what happens.
Aspect 1: The sampling distribution will be less spread than the population from which it is drawn.
Dragon example
What do you think is the largest value the mean strength of the four dragons will take? Theoretically you could have a sample of four dragons, each with strength of 8, giving us a sample mean of 8. But it isn’t very likely. The chance that all four values are greater than the population mean is pretty small – about 6%. If there are equal numbers of dragons with each strength value, then the probability of getting all four dragons with strength 8 is about 0.0002.
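If you want to check those two figures, a couple of lines of code will do it. This is just a sketch, assuming (as above) that the eight strength values are equally common:

```python
# Rough check of the probabilities quoted above, assuming strengths 1 to 8
# are equally common (so each dragon has a 50% chance of being above the mean).
p_all_above_mean = 0.5 ** 4      # all four dragons stronger than the population mean
p_all_eights = (1 / 8) ** 4      # all four dragons with strength exactly 8

print(round(p_all_above_mean, 4))   # 0.0625 – about a 6% chance
print(round(p_all_eights, 4))       # 0.0002
```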
So already we have worked out that the distribution of the sample means is going to be less spread than the distribution of the original population.
Aspect 2: The sampling distribution will be well-modelled by a normal distribution.
Now isn’t that amazing – and really useful! And even more amazing, it doesn’t even matter what the underlying population distribution is, the sampling distribution will still (in most cases) look like a normal distribution.
If you think about it, it does make sense. I like to see practical examples – so here is one!
Dragon example
We worked out that it was really unlikely to get a sample of four dragons with a mean strength of 8. Similarly it is really unlikely to get a sample of four dragons with a mean strength of 1.
Say we assume that the strengths of the dragons are uniform – there are equal numbers of dragons with each of the strengths. Then we can find all the possible combinations of strengths in samples of 4 dragons. Bearing in mind there are eight different strengths, that gives us 8 to the power of 4, or 4096, possible combinations. We can use a spreadsheet to enumerate all these equally likely combinations. Then we find the mean strength of each and we get this distribution.
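If a spreadsheet is not handy, the same enumeration takes only a few lines of code. A minimal sketch, assuming the uniform strengths described above:

```python
from collections import Counter
from itertools import product

strengths = range(1, 9)  # strengths 1 to 8, assumed to be equally common

# Every ordered combination of four strengths: 8**4 = 4096 of them, all equally likely
means = [sum(combo) / 4 for combo in product(strengths, repeat=4)]

# Tally how often each possible sample mean occurs
distribution = Counter(means)
for mean_value in sorted(distribution):
    print(mean_value, distribution[mean_value])
```

The tally is smallest at the extremes (there is only one way to get a mean of 1, and only one way to get a mean of 8) and largest in the middle.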
Or we could take some samples of four dragons and see what happens. We can do this with our cards, or with a handy spreadsheet, and here is what we get.
The sample mean values are 4.25, 5.25, 4.75 and 6. Even with really small samples we can see that the values of the means are clustering around some central point.
Here is what the means of 1000 samples of size 4 look like:
And hey presto – it resembles a normal distribution! By that I mean that the distribution is symmetric, with a bulge in the middle and tails in either direction. A normal distribution is useful for modelling just about anything that is the result of a large number of chance effects.
The bigger the sample size and the more samples we take, the more the distribution of the means (the sampling distribution) looks like a normal distribution. The Central Limit Theorem gives a mathematical explanation for this. I put this in the “magic” category unless you are planning to become a theoretical statistician.
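If you would like to reproduce the 1000-samples picture without cards or a spreadsheet, here is a minimal sketch. The population is an assumed one – 720 dragons with the eight strengths in equal numbers – so with the real Dragonistics cards the picture would differ slightly:

```python
import random
from collections import Counter

random.seed(1)

# Assumed population: 720 dragons, strengths 1 to 8 in equal numbers (90 of each)
population = [strength for strength in range(1, 9) for _ in range(90)]

sample_size = 4
sample_means = [
    sum(random.sample(population, sample_size)) / sample_size
    for _ in range(1000)
]

# Crude text histogram: means of four whole-number strengths land on quarter values
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:5.2f} | " + "#" * (counts[value] // 2))
```

Swapping in a lopsided population – say, mostly weak dragons with a handful of very strong ones – and re-running it still gives a roughly bell-shaped pile of means, which is the surprising part of this aspect.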
Aspect 3: The spread of the sampling distribution is related to the spread of the population.
If you think about it, this also makes sense. If there is very little variation in the population, then the sample means will all be about the same. On the other hand, if the population is really spread out, then the sample means will be more spread out too.
Dragon example
Say the strengths of the dragons occur equally from 1 to 5 instead of from 1 to 8. The means of teams of four dragons are going to go from 1 to 5 also, though most of the values will be near the middle, and the spread of those means will be smaller than before.
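A quick simulation makes the comparison concrete. This is a sketch with assumed uniform populations of strengths 1 to 8 and 1 to 5:

```python
import random
import statistics

random.seed(1)

def spread_of_sample_means(max_strength, sample_size=4, n_samples=2000):
    # Assumed population: equal numbers of dragons at each strength from 1 to max_strength
    population = [s for s in range(1, max_strength + 1) for _ in range(90)]
    means = [
        sum(random.sample(population, sample_size)) / sample_size
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

print(round(spread_of_sample_means(8), 2))  # strengths 1 to 8: more spread-out means
print(round(spread_of_sample_means(5), 2))  # strengths 1 to 5: less spread-out means
```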
Aspect 4: Bigger samples lead to a smaller spread in the sampling distribution.
As we increase the size of the sample, the means become less varied. We reduce the effect of any one extreme value, and similarly the chance of getting all high values or all low values in our sample gets smaller and smaller. Consequently the spread of the sample means will decrease. However, the reduction is not linear. By that I mean that the effect of adding one more observation to the sample decreases as the sample gets bigger. Say you have a sample of size n = 4 and you increase it to n = 5; that is a 25% increase in information. If you have a sample of n = 100 and increase it to n = 101, that is only a 1% increase in information.
Now here is the coolest thing! The spread of the sampling distribution is the standard deviation of the population, divided by the square root of the sample size. As we do not know the standard deviation of the population (σ), we use the standard deviation of the sample (s) to approximate it. The spread of the sampling distribution is usually called the standard error, or s.e.
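You can see that relationship at work with a short simulation – again a sketch using the assumed uniform population of strengths:

```python
import math
import random
import statistics

random.seed(1)

# Assumed population: 720 dragons, strengths 1 to 8 in equal numbers
population = [s for s in range(1, 9) for _ in range(90)]
sigma = statistics.pstdev(population)   # population standard deviation
n = 4

# Standard error predicted by the Central Limit Theorem: sigma / sqrt(n)
predicted_se = sigma / math.sqrt(n)

# Spread of the sampling distribution, estimated from 5000 simulated samples
sample_means = [sum(random.sample(population, n)) / n for _ in range(5000)]
observed_se = statistics.stdev(sample_means)

print(round(predicted_se, 3), round(observed_se, 3))   # the two should be close
```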
The properties listed above underpin most traditional statistical inference. When we find a confidence interval of a mean, we use the standard error in the formula. If we used the sample standard deviation we would be finding the values between which most of the values in the sample lie. By using the standard error, we are finding the values between which most of the sample means lie.
The Central Limit Theorem applies best with large samples. A rule of thumb is that the sample should be 30 or more. For smaller samples we need to use the t distribution rather than the normal distribution in our testing or confidence intervals. If the sample is very small, such as less than 15, then we can still use the t distribution if the underlying population has a normal shape. If the underlying population is not normal and the sample is small, then other methods, such as resampling, should be used, as the Central Limit Theorem does not hold.
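To see how the standard error feeds into a confidence interval, here is a sketch for a single small sample. The strength values are invented for illustration:

```python
import math
import statistics
from scipy import stats   # for the t distribution

sample = [3, 5, 6, 4, 7, 2, 5, 4, 6, 5]   # an invented sample of dragon strengths
n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)              # sample standard deviation
se = s / math.sqrt(n)                     # standard error of the mean

# 95% confidence interval for the population mean, using the t distribution
# because the sample is well under 30
t_crit = stats.t.ppf(0.975, df=n - 1)
print(round(mean - t_crit * se, 2), round(mean + t_crit * se, 2))
```

Using s in place of the standard error here would instead give an interval covering most of the individual strength values, which is the distinction made above.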
We do not take multiple samples from the same population in real life. This simulation is just that – a pretend example to show how the Central Limit Theorem plays out. When we do inferential statistics we have one sample, and from that we use what we know about it to make inferences about the population from which it is drawn.
Data cards are extremely useful tools to help understand sampling and other aspects of inference. I would suggest getting the class to take multiple small samples (n = 4), using cards, and finding the means. Plot the means. Then take larger samples (n = 9) and similarly plot the means. Compare the shape and spread of the distributions of the means.
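If you want to preview what the class should see, a quick simulation along the same lines (with the assumed uniform population) shows the spread shrinking as the sample size grows:

```python
import random
import statistics

random.seed(1)

# Assumed population: 720 dragons, strengths 1 to 8 in equal numbers
population = [s for s in range(1, 9) for _ in range(90)]

for sample_size in (4, 9):
    means = [
        sum(random.sample(population, sample_size)) / sample_size
        for _ in range(1000)
    ]
    print(sample_size, round(statistics.stdev(means), 2))  # spread shrinks as n grows
```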
The Dragonistics data cards used in this post can be purchased at The StatsLC shop.
Comments
When I was a very young statistician in grad school, I couldn’t imagine a scenario in which the CLT was more than a neat way to say that sample means are normally distributed, so you can treat the mean of one sample as if it came from a normal sampling distribution. A real-life example never raised its head until I started working in factories. The application of statistical process control (SPC), which is common in very many factories of all kinds, is a real example of when we take repeated small samples from the same population over and over. Often both a mean and range are calculated from measurements made on a small sample of parts pulled by an inspector from a manufacturing line. The mean and range are plotted on “control charts” – one for the means, and one for the ranges. Limit lines on the chart, based on 1, 2, and 3 standard errors away from the overall process mean, tell the inspector whether the newly calculated statistics are reasonable for the process or are evidence of the process going out of control. If an out-of-control situation is detected, then action is taken to correct the process and sometimes some amount of parts must be quarantined and sorted. This is a real-life use of the Central Limit Theorem, used every day in factories all over the world. For me, 30+ years ago, it was quite an eye-opener!
That is cool! As I was writing this post and the script for the video, I used Excel to enumerate possible outcomes and it really made it so obvious that it was approaching a normal distribution. I’ve never been one to be convinced by formulas and mathematical proofs, so concrete examples like that are very powerful for me.
I agree, a real demonstration of sample means following a Normal distribution is priceless compared to just telling students that this is what they do. With the dice you go from a Uniform distribution of raw data to a Normal distribution of means, so very striking.
I love your dragon exercise, by the way. When I teach the Central Limit Theorem, I often do a similar exercise with 6-sided dice.
Thanks. We have been amazed at the range of things we can teach with the dragons.
Nice page Dr Nic. I am always surprised at how many people (even in statistical roles in different fields) misunderstand the power and implications of this wonderful theorem. They worry excessively about non-normal data, thinking that you can’t do inference on the mean. People often unnecessarily transform their data to force it to be normal when they don’t have to, depending on the tests of course. Nice examples with dragons.
Hi Duncan
Thanks for that. Another place where people think things have to be normally distributed is in regression where they think that the predictor variables need to have normalish data. Clearly that can’t be the case as you can have dummy variables!
Our dragons are proving very versatile.
Exactly. I was just about to post a follow-up about the same issue with understanding residuals. It’s a flaw in much teaching where the emphasis is on the distribution of the response rather than the error.
Dear Dr. Nic,
A nice illustration of the CLT indeed. Dragons rule. Incidentally you may have gotten a whole new bunch of fans thanks to the series Game of Thrones :).
I do teach the CLT in statistics classes, in a similar way with simulations. I find the following website quite useful in that respect:
http://mfviz.com/central-limit/
But I guess to really know if it has been useful I should ask the students.
On another note: have you heard about this impossible problem:
If a ship had 26 sheep and 10 goats onboard, how old is the ship’s captain?
It’s causing a bit of a stir because the solutions that many people come up with highlight what you have been saying in your blog: math teaching is too focused on solution recipes.
https://mindyourdecisions.com/blog/2018/02/08/the-real-answer-to-the-viral-chinese-math-problem-how-old-is-the-captain-stumping-the-internet/
Thanks for your great blog. Keep ‘em coming!
Marten – thanks so much, I had not seen that CLT simulator before. Very cool.
Hi, I like this approach.
Conceptual confusion around the Central Limit Theorem seems to be baked into modern statistics’ dual use of the Normal distribution, both for characterising error due to sampling and for describing fundamental variability in a population. Binomial examples like these, where any observed value must take an integer value, definitely help. Some of the historic astronomical examples from the 18th/19th centuries, involving measuring the distance between two points, could help further still, as in such cases all variation in observed values has to be due to error alone. Ensuring that students see standard deviations and standard errors as fundamentally different, rather than the latter simply a modification of the former, seems particularly important for keeping the distinction clear.
Do you have any examples, perhaps, where you’re comparing measuring the height (or age, or other continuous quantity) of a single dragon many times, with measuring the heights of many dragons once? (Perhaps the motivation could be that dragons, being wild beasts, won’t stand still to be measured, so many estimates have to be used instead?)
Hi Jon
Thank you for your input. Very interesting. I will have a think about this and see what I can come up with.
Like a repeatability and reproducibility study