# Summarising with Box and Whisker plots

31 August 2015

In the Northern Hemisphere, it is the start of the school year, and thousands of eager students are beginning their study of statistics. I know this because this is the time of year when lots of people watch my video, Types of Data. On 23rd August the hits on the video bounced up out of their holiday slumber, just as they do every year. They gradually dwindle away until the end of January, when they have a second jump in popularity, which I suspect marks the start of the second semester.

One of the first topics in many statistics courses is summary statistics. The greatest hits of summary statistics tend to be the mean and the standard deviation. I’ve written previously about what a difficult concept a mean is, and then another post about why the median is often preferable to the mean. In that one I promised a video. That was over two years ago. Oops. But we have now put these ideas into a video on summary statistics. Enjoy! In five minutes you can get a conceptual explanation of summary measures of position (also known as location or central tendency).

I was going to follow up with a video on spread and started to think about range, interquartile range, mean absolute deviation, variance and standard deviation. So I decided instead to make a video on the wonderful boxplot, again comparing the shoe-owning habits of male and female students in a university in New Zealand.

Boxplots are great. When you combine them with dotplots, as done in iNZight and various other packages, they provide a wonderful way to get an overview of the distribution of a sample. More importantly, they provide a wonderful way to compare two samples or two groups within a sample. A distribution on its own has little meaning.

John Tukey was the first to make a box and whisker plot out of the five-number summary way back in 1969. This was not long before I went to high school, so I never really heard about them until many years later. Drawing them by hand is less tedious than drawing a dotplot by hand, but still time-consuming. We are SO lucky to have computers to make it possible to create graphs at the click of a mouse.

Sample distributions and summaries are not enormously interesting on their own, so I would suggest introducing boxplots as a way to compare two samples. Their worth is then apparent.
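
For readers who like to see the numbers behind the picture, here is a minimal sketch of the five-number summary that a boxplot draws, computed for two groups side by side. The shoe counts are made up for illustration and are not the data from the video. The helper name `five_number_summary` is hypothetical, and the "inclusive" quartile rule is an assumption: packages differ in how they calculate quartiles, so other software may give slightly different values.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # quantiles(..., n=4) returns the three cut points that split the
    # sorted data into quarters. "inclusive" is one of several common
    # quartile conventions (an assumption made for this sketch).
    lower_q, median, upper_q = quantiles(values, n=4, method="inclusive")
    return (min(values), lower_q, median, upper_q, max(values))


# Made-up numbers of pairs of shoes, for illustration only.
female = [5, 8, 10, 12, 15, 18, 20, 25, 30]
male = [2, 3, 4, 5, 6, 7, 8, 10, 12]

# Printing the two summaries together is the numerical version of
# putting two boxplots on the same axis.
for label, sample in (("female", female), ("male", male)):
    print(label, five_number_summary(sample))
```

Seen one at a time, each set of five numbers says little. Seen together, the difference in both position and spread between the two groups is easy to spot, which is exactly what side-by-side boxplots show.
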
A colleague recently pointed out an interesting confusion and distinction. The interquartile range is the distance between the upper quartile and the lower quartile. The box in the boxplot contains the middle 50% of the values in the sample. It is tempting to stop at that description and miss the point that the interquartile range is a good resistant measure of spread for the WHOLE sample. (Resistant means that it is not unduly affected by extreme values.) The range, by contrast, is a poor summary statistic because a single extreme value can change it dramatically.
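
A small sketch can make "resistant" concrete. It compares the range and the interquartile range for a made-up sample, before and after the largest value is swapped for an extreme one. The data and the function names `sample_range` and `interquartile_range` are hypothetical, and the "inclusive" quartile rule is again an assumption, as conventions vary between packages.

```python
from statistics import quantiles


def sample_range(values):
    """Distance between the largest and smallest values."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the upper and lower quartiles."""
    # "inclusive" is one common quartile convention (assumed here).
    lower_q, _, upper_q = quantiles(values, n=4, method="inclusive")
    return upper_q - lower_q


# Made-up numbers of pairs of shoes, for illustration only.
shoes = [2, 3, 4, 5, 6, 7, 8, 10, 12]

# The same sample, with the largest value replaced by an extreme one.
with_extreme = shoes[:-1] + [60]

for label, sample in (("original", shoes), ("with extreme value", with_extreme)):
    print(label, "range:", sample_range(sample),
          "IQR:", interquartile_range(sample))
```

With these numbers the range jumps from 10 to 58 when one student turns out to own 60 pairs, while the interquartile range stays at 4. The interquartile range describes the spread of the whole sample without being dragged around by one unusual value.
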
And now we come to our latest video, about the boxplot. This one is four and a half minutes long, and also uses the shoe sample as an example. I hope you and your students find it helpful. We have produced over 40 statistics videos, some of which are available for free on YouTube. If you are interested in using our videos in your teaching, do let us know and we will arrange access to the remainder of them.