One of the key ideas in statistics is that sometimes we will be wrong. When we report a 95% confidence interval, we will be wrong 5% of the time. In other words, about 1 in 20 of our 95% confidence intervals will not contain the population parameter we are attempting to estimate. That is how they are defined. The thing is, we always think we are part of the 95% rather than the 5%. Mostly we will be correct, but if we do enough statistical analysis, we will almost certainly be wrong at some point. However, human nature is such that we tend to think it will be someone else. There is also a feeling of blame associated with being wrong. The feeling is that if we have somehow missed the true value with our confidence interval, it must be because we have made a mistake. This is not true. In fact we MUST be wrong about 5% of the time, or our interval is too wide, and not really a 95% confidence interval.

The term “margin of error” appears with increasing regularity as elections approach and polling companies are keen to make money out of soothsaying. The usual meaning of the margin of error is half the width of a 95% confidence interval. So if we say the margin of error is 3%, then about one time in twenty, the true value of the proportion will be more than 3% away from the reported sample value.
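As a rough illustration of “half the width of a 95% confidence interval”, here is a minimal sketch; the poll figures (1000 respondents, 52% support) are made up for the example:

```python
from math import sqrt

# Hypothetical poll: 1000 respondents, 52% support.
n, p_hat = 1000, 0.52

# Margin of error = half the width of the 95% confidence interval
# for a proportion, using the normal approximation.
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)

print(round(100 * moe, 1))  # about 3.1 percentage points
```

So a poll of 1000 people gives a margin of error of roughly 3%, which is why that figure turns up so often in election reporting.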

What doesn’t help is that we seldom know whether we are correct or not. If we knew the real population value we wouldn’t be estimating it. We can contrive situations where we do know the population but pretend we don’t. If we do this in our teaching, we need to be very careful to point out that this doesn’t normally happen, but only in “classroom world”. (Thanks to MD for this useful term.) General elections can give us an idea of being right or wrong after the event, but even then the problem of non-sampling error is conflated with sampling error. When opinion polls miss the mark, we tend to attribute the cause to poor sampling, or people changing their minds, or any number of imaginative explanations, rather than simple, unavoidable sampling error.

So how do we teach this in such a way that it goes beyond school learning and is internalised for future use as effective citizens?

I have two suggestions. The first is a series of True/False statements that can be used in a number of ways. I have them as part of online assessment, so that students are challenged by them regularly. They also work well in the classroom as a warm-up exercise at the start of a lesson. Students can write their answers down or vote with their hands.

Here are some examples of True/False statements (some of which could lead to discussion):

- You never know if your confidence interval contains the true population value.
- If you make your confidence interval wide enough you can be sure that you contain the true population value.
- A confidence interval tells us where we are pretty sure the sample statistic lies.
- It is better to have a narrow confidence interval than a wide one, as it gives us more certain information, even though it is more likely to be wrong.
- If your study involves twenty confidence intervals, then you know that exactly one of them will be wrong.
- If a confidence interval doesn’t contain the true population value, it is because it is one of the 5% that was calculated incorrectly.

You can check your answers at the end of this post.

The other teaching suggestion is for an experiential exercise. It requires a little set up time.

Make a set of cards for students with numbers on them that correspond to the point estimate of a proportion, or a score that will lead to one. (Specifications for a set of 35 cards, representing results from a proportion of 0.54 and 25 trials, are given below.)

Introduce the exercise as follows:

“I have a computer game, and have set the ratio of wins to losses at a certain value. Each of you has played 25 times, and the number of wins you have obtained will be on your card. It is really important that you don’t look at other people’s cards.”

Hand them out to the students. (If you have fewer than 35 in your class, it might be a good idea to make sure you include the cards with 8 and 19 in the set you use – sometimes it is ok to fudge slightly to teach a point.)

“Without getting information from anyone else, write down your best estimate of the true proportion of wins to losses in the game. Do you think you are correct? How close do you think you are to the true value?”

They will need to divide the number of wins by 25, which should not lead to any computational errors! The point is that they really can’t know how close their estimate is to the true value – and what does “correct” mean?

Then work out the margin of error for a sample of size 25; using the rule of thumb of 1/√n, this comes to 1/√25 = 0.2, or 20%. Get the students to calculate their 95% confidence intervals, and decide whether theirs is an interval that contains the true population value. Get them to commit one way or the other.
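The calculation each student does can be sketched as follows; the card value of 14 wins is just a hypothetical example, and the margin of error uses the 1/√n rule of thumb (which gives 0.2 for n = 25, matching the 20% above):

```python
from math import sqrt

n = 25      # plays per student
wins = 14   # hypothetical value from one student's card

p_hat = wins / n     # point estimate of the proportion of wins
moe = 1 / sqrt(n)    # rule-of-thumb margin of error: 1/sqrt(25) = 0.2

# 95% confidence interval for this student
lower, upper = p_hat - moe, p_hat + moe
print(round(p_hat, 2))                    # 0.56
print(round(lower, 2), round(upper, 2))   # 0.36 0.76
```

This student’s interval (0.36, 0.76) happens to contain the true value of 0.54 — but, as the exercise makes clear, the student has no way of knowing that at this point.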

Now they can talk to each other about the values they have.

There are several ways you can go from here. You can tell them what the population proportion was from which the numbers were drawn (0.54). They can then see that most of them had confidence intervals that included the true value, and some didn’t. Or you can leave them wondering, which is a better lesson about real life. Or you can do one exercise where you do tell them and one where you don’t.

This is an area where probability and statistics meet. You could make a nice little binomial distribution problem out of being correct in a number of confidence intervals. There are potential problems with independence, so you need to be a bit careful with the wording. For example: fifteen students undertake separate statistical analyses on topics of their choice, and each constructs a 95% confidence interval. What is the probability that all the confidence intervals are correct, in that each contains the population parameter it estimates? The number of incorrect intervals, X, is well modelled by a binomial distribution with n = 15 and p = 0.05, so P(X = 0) = 0.46. And another interesting question: what is the probability that two or more intervals are incorrect? The answer is 0.17. So there is a 17% chance that more than one of the confidence intervals does not contain the population parameter of interest.
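Those two probabilities can be checked with a short calculation, modelling the number of incorrect intervals as binomial with n = 15 and p = 0.05:

```python
from math import comb

# Each of n = 15 independent 95% confidence intervals misses
# the true parameter with probability p = 0.05.
n, p = 15, 0.05

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_all_correct = binom_pmf(0, n, p)                     # no interval misses
p_two_or_more = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

print(round(p_all_correct, 2))   # 0.46
print(round(p_two_or_more, 2))   # 0.17
```

Note that the independence assumption does the work here: if the students shared data or methods, their intervals would not miss independently and the binomial model would break down.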

This is an area that needs careful teaching, and I suspect that some teachers have only a sketchy understanding of the idea of confidence intervals and margins of error. It is so important to know that statistical results are meant to be wrong some of the time.

Answers: T, T, F, debatable, F, F.

Data for the 35 cards:

| Number on card  | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|-----------------|---|---|----|----|----|----|----|----|----|----|----|----|
| Number of cards | 1 | 1 | 2  | 3  | 5  | 5  | 6  | 5  | 3  | 2  | 1  | 1  |
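As a sanity check on the deck, a quick calculation confirms that the 35 cards average close to the expected number of wins, 25 × 0.54 = 13.5:

```python
# Card deck from the table above: number of wins -> number of cards.
deck = {8: 1, 9: 1, 10: 2, 11: 3, 12: 5, 13: 5, 14: 6,
        15: 5, 16: 3, 17: 2, 18: 1, 19: 1}

total_cards = sum(deck.values())
mean_wins = sum(k * c for k, c in deck.items()) / total_cards

print(total_cards)          # 35
print(round(mean_wins, 2))  # 13.51, close to 25 * 0.54 = 13.5
```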

## 6 Comments

A crude way to avoid all of this is to add and subtract 3 standard errors instead of 2. You then have a CI which is close enough to 100%. This avoids consulting clients coming back to you when a CI “fails” and asking “what went wrong”?!
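[Editor’s note: the “close enough to 100%” claim can be checked against the normal model, where a ±k standard error interval covers the true value with probability P(|Z| < k) = erf(k/√2):]

```python
from math import erf, sqrt

def coverage(k):
    """Coverage of a +/- k standard error interval under a normal model."""
    return erf(k / sqrt(2))

print(round(coverage(2), 4))  # 0.9545
print(round(coverage(3), 4))  # 0.9973
```

So ±3 standard errors gives roughly 99.7% coverage — still not 100%, but failures become rare enough that clients seldom see one.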

I like your true/false statements! They could be really useful for diagnosing those subtle misunderstandings around inference that trip students up. It looks to me like you deliberately avoid any strictly frequentist definitions. I’ve recently been introducing students at an early stage in an “intro to stats” type of course to the philosophical differences and historical debates. In this way I can tell them to regard probability and confidence intervals in whatever way works for them, but just remember that there are some people out there who will tick them off for saying anything that is less than strictly a long-run interpretation. I think (??!) this dose of philosophy of science gives them something of a framework; it seems to go down well at the time, but whether it really helps them later on is too early to say.

What’s your view on introducing them to the purpose and meaning of inference?

I like the approach of the new New Zealand school curriculum where they start with informal inference with rules of thumb, then lead on to bootstrapped confidence intervals and resampling for experiments. By the time they go to uni they should have a good feel for it before they meet the more traditional approaches.

Wow! Bootstrapping at school. We have some catching up to do. I agree that seems to me to be a more intuitive introduction than theory around sampling distributions. Here in the UK there are some Open University courses now teaching MCMC at undergraduate level with the same rationale.

It helps that there is free R-based software called iNZight, developed to enable easy bootstrapping and graphical methods. The Royal Statistical Society said in a submission that NZ was on the right track and that the UK curriculum should be more like ours. (Or something like that.) The teachers have found it very challenging, which has been an opportunity for me to produce resources to help them and their students. You can see them on http://www.course.statslc.com – take a look around on the 7-day free trial!

Some years ago I had to teach introductory statistics to science students. Coming from a maths background, I found that almost all students could easily calculate confidence intervals, but very few could clearly explain what they meant. All of the points made by Dr Nic are helpful and correct, but in the end the only approach I found that worked moderately well was a form of words that students could easily adapt to explain their results.

Two examples are

Conclusion in English about the 95% confidence interval for percentages

I have tested the treatment on a random sample of 50 migraine sufferers. I believe that the real effectiveness of this treatment for severe migraine is somewhere between 64% and 88%. In this calculation I have assumed that my sample of people was a “95% typical sample”.

Conclusion in English about the 95% confidence interval for population averages

I believe that the average blood pressure for all 20–30 year old women is between 110mm and 120mm. This is based on the assumption that my random sample of 40 women is one of the 95% most typical samples.

This seemed to work reasonably well although the “95% typical sample” phrase is a bit vague.

The explanation had to refer to the actual quantity measured (not just “population average”), the population studied, the sample size, the limits of the CI, and the statement that this was an assertion or belief based on an assumption.

Unfortunately it didn’t stop people saying that 95% of the population was in the 95% CI or that there was a 95% chance that any future experiment would give a result in this CI.

I think that confidence intervals take a lot of time to get comfortable with. Unfortunately even though these students did introductory statistics in their first year, their later science courses made little use of statistics until they got to their final year project, by which time they had forgotten most of their stats.

Tony O’Connor