Confidence intervals are needed because there is variation in the world. Nearly all natural, human or technological processes result in outputs which vary to a greater or lesser extent. Examples of this are people’s heights, students’ scores in a well-written test and the weights of loaves of bread. Sometimes our inability or lack of desire to measure something down to the last microgram will leave us thinking that there is no variation, but it is there. For example, we could check the weights of chocolate bars to the nearest gram and might well find no variation. However, if we were to weigh them to the nearest milligram, there would be variation. Drug doses have a much smaller range of variation, but it is there all the same.

You can see a video about some of the main sources of variation – natural, explainable, sampling and due to bias.

When we wish to find out about a phenomenon, the ideal would be to measure all instances. For example we can find out the heights of all students in one class at a given time. However it is impossible to find out the heights of all people in the world at a given time. It is even impossible to know how many people there are in the world at a given time. Whenever it is impossible or too expensive or too destructive or dangerous to measure all instances in a population, we need to take a sample. Ideally we will take a sample that gives each object in the population an equal likelihood of being chosen.

You can see a video here about ways of taking a sample.

When we take a sample there will always be error. It is called sampling error. We may, by chance, get exactly the same value for our sample statistic as the “true” value that exists in the population. However, even if we do, we won’t know that we have.

The sample mean is the best estimate for the population mean, but we need to say how well it is estimating the population mean. For example, say we wish to know the mean (or average) weight of apples in an orchard. We take a sample and find that the mean weight of the apples in the sample is 153g. If we only took a few apples, this gives only a rough idea, and we might say we are pretty sure the mean weight of the apples in the orchard is between 143g and 163g. If someone else took a bigger sample, they might be able to say that they are pretty sure that the mean weight of apples in the orchard is between 158g and 166g. Ranges like these are confidence intervals. You can tell that the second confidence interval is giving us better information because it is narrower.

There are two things that affect the width of a confidence interval. The first is the **sample size**. If we take a really large sample we are getting a lot more information about the population, so our confidence interval will be more exact, or narrower. It is not a one-to-one relationship, but a square-root relationship. If we wish to halve the width of the confidence interval, we will need to increase our sample size by a factor of 4.

The second thing to affect the width of a confidence interval is the **amount of variation** in the population. If all the apples in the orchard are about the same weight, then we will be able to estimate that weight quite accurately. However, if the apples are all different sizes, then it will be harder to be sure that the sample represents the population, and we will have a larger confidence interval as a result.
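Both effects can be seen in a short sketch. It assumes the usual normal-approximation margin of error for a mean (a multiplier times the standard deviation divided by the square root of the sample size). The function name `ci_width` and the orchard numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, confidence=0.95):
    """Approximate full width of a confidence interval for a mean.

    Assumes the normal approximation: width = 2 * z * sd / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / math.sqrt(n)


# Hypothetical orchard: apple weights with a standard deviation of 20 g.
# Each time the sample size is multiplied by 4, the width halves.
for n in (25, 100, 400):
    print(n, round(ci_width(20, n), 2))

# Less variation in the population also gives a narrower interval:
# halving the standard deviation halves the width for the same sample size.
print(round(ci_width(10, 100), 2))
```

Going from 25 apples to 100 apples halves the width, and so does going from 100 to 400, which is the square-root relationship in action.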

The standard way of calculating confidence intervals is by using formulas developed on the assumptions of normality and the Central Limit Theorem. These formulas are used to calculate the confidence intervals of means, proportions and slopes, but not for medians or standard deviations. That is because there aren’t nice straightforward formulas for these. The formulas were developed when there were no computers, and analytical methods were needed in the absence of computational power.
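As a rough illustration of the traditional approach for a mean, here is a minimal sketch. It assumes a normal multiplier (about 1.96 for 95% confidence); for a small sample like this one, textbooks would normally use a t multiplier instead, which gives a slightly wider interval. The function name `mean_ci` and the apple weights are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(sample, confidence=0.95):
    """Traditional confidence interval for a mean (normal approximation)."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # level of confidence enters here
    return centre - z * standard_error, centre + z * standard_error


# Made-up sample of apple weights in grams (sample mean is 153 g).
apples = [148, 161, 139, 155, 170, 152, 146, 158, 163, 138]

low, high = mean_ci(apples)
print(f"We are pretty sure the mean weight is between {low:.1f} g and {high:.1f} g")
```

Notice that the formula only needs three summary values from the sample: the mean, the standard deviation and the sample size.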

In terms of teaching, these formulas are straightforward, and also include the concept of level of confidence, which is part of the paradigm. You can see a video teaching the traditional approach to confidence intervals, using Excel to calculate the confidence interval for a mean.

In the New Zealand curriculum at year 12, students are introduced to the concept of inference using an informal method for calculating a confidence interval. The formula is median +/- 1.5 times the interquartile range divided by the square-root of the sample size. There is a similar formula for proportions.
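The informal formula for the median can be sketched as follows. This is only an illustration: the function name `informal_interval` and the data are made up, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from the way quartiles are found by hand in class.

```python
import math
from statistics import median, quantiles


def informal_interval(sample):
    """Informal interval: median +/- 1.5 * IQR / sqrt(n)."""
    n = len(sample)
    q1, _, q3 = quantiles(sample, n=4)  # lower quartile, median, upper quartile
    margin = 1.5 * (q3 - q1) / math.sqrt(n)
    centre = median(sample)
    return centre - margin, centre + margin


# Made-up sample of apple weights in grams.
apples = [148, 161, 139, 155, 170, 152, 146, 158, 163, 138]

low, high = informal_interval(apples)
print(f"Informal interval for the median: {low:.1f} g to {high:.1f} g")
```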

Bootstrapping is a very versatile way to find a confidence interval. It has three **strengths**:

- It can be used to calculate the confidence interval for a large range of different parameters.
- It uses ALL the information the sample gives us, rather than just the summary values.
- It has been found to aid in understanding the concepts of inference better than the traditional methods.

There are also some **disadvantages**:

- Old fogeys don’t like it. (Just kidding) What I mean is that teachers who have always taught using the traditional approach find it difficult to trust what seems like a hit-and-miss method without the familiar theoretical underpinning.
- Universities don’t teach bootstrapping as much as the traditional methods.
- The common software packages do not include bootstrap confidence intervals.

The idea behind a **bootstrap confidence interval** is that we make use of the whole sample to represent the population. We take lots and lots of samples of the same size from the original sample. Obviously we need to sample with replacement, or the samples would all be identical. Then we use these repeated samples to get an idea of the distribution of the estimates of the population parameter. We chop the tails off at a given point, and we give the confidence interval. Voila!
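The steps above can be sketched in a few lines of code. This is a minimal illustration of the simplest "chop the tails off" (percentile) version, not necessarily what any particular package does. The function name `bootstrap_ci`, the number of resamples and the apple weights are all assumptions made for the example.

```python
import random
from statistics import mean, median


def bootstrap_ci(sample, statistic, n_resamples=10_000, confidence=0.95, seed=None):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample,
    # and record the statistic for each resample.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Chop an equal share off each tail of the sorted estimates.
    tail = (1 - confidence) / 2
    low_index = int(tail * n_resamples)
    high_index = int((1 - tail) * n_resamples) - 1
    return estimates[low_index], estimates[high_index]


# Made-up sample of apple weights in grams.
apples = [148, 161, 139, 155, 170, 152, 146, 158, 163, 138]

# The same function works for the mean, the median, or any other statistic.
print("mean:  ", bootstrap_ci(apples, mean, seed=1))
print("median:", bootstrap_ci(apples, median, seed=1))
```

Because the statistic is passed in as a function, nothing changes when we swap the mean for the median, which is where the versatility comes from.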

- There is a sound theoretical underpinning for bootstrap confidence intervals. A good place to start is a previous blog about George Cobb’s work. Either that or – “Trust me, I’m a Doctor!” (This would also include trusting far more knowledgeable people such as Chris Wild and Maxine Pfannkuch, and the team of statistical educators led by Joan Garfield.)
- We have to start somewhere. Bootstrap methods aren’t used at universities because of inertia. As an academic of twenty years I can say that there is NO PAY OFF for teaching new stuff. It takes up valuable research time and you don’t get promoted, and sometimes you even get made redundant. If students understand what confidence intervals are, and the concept of inference, then learning to use the traditional formulas is trivial. Eventually the universities will shift. I am aware that the University of Auckland now teaches the bootstrap approach.
- There are ways to deal with the software package problem. There is a free software interface called “iNZight” that you can download. I believe Fathom also uses bootstrapping. There may be other software. Please let me know of any and I will add them to this post.

Confidence intervals involve the concepts of variation, sampling and inference. They are a great way to teach these really important concepts, and to help students be critical of single value estimates. They can be taught informally, traditionally or using bootstrapping methods. Any of the approaches can lead to rote use of formula or algorithm and it is up to teachers to aim for understanding. I’m working on a set of videos around this topic. Watch this space.

## 9 Comments

Oh I agree about your comment “There is NO PAY OFF for teaching new stuff. It takes up valuable research time and you don’t get promoted”. The same applies even for new ways of teaching old stuff. (Is statistics teaching stuck in a time-warp – I am shocked at how e.g. multivariate texts remain unchanged while the world around us is transformed e.g. real-time BIG data.)

But I was disappointed to find no video on bootstrapping. (Some years ago the ‘Fathom’ package did this quite easily but I am not up-to-date with this.)

JOHN BIBBY

The video on bootstrapping is in the pipeline. In fact writing this post was part of getting my head around the ideas for the script. So watch this space!

I wrote a general procedure for bootstrapping nearly 20 years ago, which is still in GenStat’s standard library. However, despite having been an active applied statistician ever since, I have never used it in my work. Prompted by your blog, I have tried to think why this should be. One reason is clearly that it takes more effort to use my procedure on any given statistical problem than it does to use the parametric estimates of variability that are produced automatically by the software’s statistical commands. A second reason is that use of bootstrapped estimates would require extra explanation, whereas parametric ones are accepted with little thought if a technique of analysis is considered reasonable. But I find that I also question whether the bootstrap approach is better anyway. When samples are small, the empirical distribution is poorly estimated and I would prefer to rely on a distributional assumption that I consider reasonable. When samples are large, the Central Limit Theorem tells me that parametric estimates of the variability of the mean are going to be reliable whatever, and nearly all the problems I have worked with are concerned with means. My final reason is that I seem to recall that there are difficulties in applying the bootstrap to some statistics, like medians, which can trap the unwary. I no longer recollect the details, but the lurking feeling of danger is enough to dissuade me from using the method without serious reading.

I’m puzzled by your assertion that “The common software packages do not include bootstrap confidence intervals.” R, Stata and SAS all have bootstrapping and SPSS has a bolt-on module to do it…

In response to Simon, I think the situation is much the same in R, Stata, SAS and SPSS as I mentioned that it is in GenStat. There are indeed procedures to do bootstrapping, whether extra commands in R, macros in SAS, or bolt-on procedures in SPSS, but they are not integrated into the core of the system. Most people fitting linear models in SAS, for example, use the GLM or MIXED procedures, and I don’t recall seeing any options in these to produce bootstrapped confidence intervals.

Thanks Peter, for shedding light on that. There is an add-on for SPSS but it costs more. Excel (yes I know it isn’t a proper stats package, but it is what is used especially at business schools) doesn’t have anything, though doing a bit of bootstrapping isn’t that hard if you are at home with Excel. Mostly when I write I am drawing on my own experience of teaching first and second-year undergrads and some service courses for post-grads. We did not have access to bootstrapping. High Schools are even more impoverished. Some have Fathom, which I think does bootstrapping, but most use Excel. Genstat is pretty popular too, I believe. iNZight is clunky but does the job and is free.

Do any of the more common packages allow the use of sample weights to get bootstrap CIs for means, medians etc.? I don’t think STATA does. I guess this translates into having different probabilities of observations being resampled for the bootstrapping.

Nice explanation, thanks. And the mention of New Zealand is especially helpful.