The wonderful advantage of teaching statistics is the real-life context within which any applicaton must exist. This can also be one of the difficulties. Statistics without context is merely the mathematics of statistics, and is sterile and theoretical. The teaching of statistics requires real data. And real data often comes with a fairly solid back-story.

One of the interesting aspects for practicing statisticians, is that they can find out about a wide range of applications, by working in partnership with specialists. In my statistical and operations research advising I have learned about a range of subjects, including the treatment of hand injuries, children’s developmental understanding of probability, the bed occupancy in public hospitals, the educational needs of blind students, growth rates of vegetables, texted comments on service at supermarkets, killing methods of chickens, rogaine route choice, co-ordinating scientific expeditions to Antarctica and the cost of care for neonatals in intensive care. I found most of these really interesting and was keen to work with the experts on these projects. Statisticians tend to work in teams with specialists in related disciplines.

When one is part of a long-term project, time spent learning the intricacies of the context is well spent. Without that, the meaning from the data can be lost. However, it is difficult to replicate this in the teaching of statistics, particularly in a general high school or service course. The amount of time required to become familiar with the context takes away from the time spent learning statistics. Too much time spent on one specific project or area of interest can mean that the students are unable to generalise. You need several different examples in order to know what is specific to the context and what is general to all or most contexts.

One approach is to try to have contexts with which students are already familiar. This can be enabled by collecting the data from the students themselves. The Census at School project provides international data for students to use in just this way. This is ideal, in that the context is familiar, and yet the data is “dirty” enough to provide challenges and judgment calls.

Some teachers find that this is too low-level and would prefer to use biological data, or dietary or sports data from other sources. I have some reservations about this. In New Zealand the new statistics curriculum is in its final year of introduction, and understandably there are some bedding-in issues. One I perceive is the relative importance of the context in the students’ reports. As these reports have high-stakes grades attached to them, this is an issue. I will use as an example the time series “standard”. The assessment specification states, among other things, “Using the statistical enquiry cycle to investigate time series data involves: using existing data sets, selecting a variable to investigate, selecting and using appropriate display(s), identifying features in the data and relating this to the context, finding an appropriate model, using the model to make a forecast, communicating findings in a conclusion.”

The full “standard” is given here: Investigate Time Series Data This would involve about five weeks of teaching and assessment, in parallel with four other subjects.(The final 3 years of schooling in NZ are assessed through the National Certificate of Educational Achievement (NCEA). Each year students usually take five subject areas, each of which consists of about six “achievement standards” worth between 3 and 6 credits. There is a mixture of internally and externally assessed standards.)

In this specification I see that there is a requirement for the model to be related to the context. This is a great opportunity for teachers to show how models are useful, and their limitations. I would be happy with a few sentences indicating that the student could identify a seasonal pattern and make some suggestions as to why this might relate to the context, followed by a similar analysis of the shape of the trend. However there are some teachers who are requiring students to do independent literature exploration into the area, and requiring references, while forbidding the referencing of Wikipedia.

Statistics is not research methods any more than statistics is mathematics. Research methods and standards of evidence vary between disciplines. Clearly the evidence required in medical research will differ from that of marketing research. I do not think it is the place of the statistics teacher to be covering this. Mathematics teachers are already being stretched to teach the unfamiliar material of statistics, and I think asking them and the students to become expert in research methods is going too far.

It is also taking out all the fun.

Statistics should be fun for the teacher and the students. The context needs to be accessible or you are just putting in another opportunity for antipathy and confusion. If you aren’t having fun, you aren’t doing it right. Or, more to the point, if your students aren’t having fun, you aren’t doing it right.

- Use real data.
- If the context is difficult to understand, you are losing the point.
- The results should not be obvious. It is not interesting that year 12 boys weigh more than year 9 boys.
- Null results are still results. (We aren’t trying for academic publications!)
- It is okay to clean up data so you don’t confuse students before they are ready for it.
- Sometimes you should use dirty data – a bit of confusion is beneficial.
- Various contexts are better than one long project.
- Avoid the plodding parts of research methods.
- Avoid boring data. Who gives a flying fish about the relative sizes of dolphin jaws?
- Wikipedia is a great place to find out the context for most high school statistics analysis. That is where I look. It’s a great starting place for anyone.

## 2 Comments

[…] And it can be so much fun. There is a real excitement in exploring data, and becoming a detective. If you aren’t having fun, you aren’t doing it right! […]

[…] Context is everything in statistical analysis. Every time we produce a graph or a numerical result we should be thinking about the meaning in context. If there is a difference between the medians showing up in the graph, and reinforced by confidence intervals that do not overlap, we need to be thinking about what that means about the heart-rate in swimmers and non-swimmers, or whatever the context is. For this reason every data set needs to be real. We cannot expect students to want to find real meaning in manufactured data. And students need to spend long enough in each context in order to be able to think about the relationship between the model and the real-life situation. This is offset by the need to provide enough examples from different contexts so that students can learn what is general to all such models, and what is specific to each. It is a question of balance. […]