This is written in the week before the 2017 New Zealand General Election and it is an exciting time. Many New Zealanders are finding political polls fascinating right now. We wait with bated breath for each new announcement – is our team winning this time? If it goes the way we want, we accept the result with gratitude and joy. If not, then we conclude that the polling system was at fault.
Many wonder how on earth asking 1000 people can possibly give a reading of the views of all New Zealanders. This is not a silly question. I have only occasionally been polled, so how can I believe the polls reflect my view? As a statistical communicator, I have given some thought to this. If you are a statistician or a teacher of statistics, how would you explain how inference works?
Here is my take on it.
Imagine you have a bowl of seeds – mustard and rocket. All the seeds are about the same size, and have been mixed up. These seeds are TINY, so several million seeds only fill up a large bowl. We will call this bowl the population. Let’s say for now that the bowl contains exactly half and half mustard and rocket, and you suspect that to be the case, but you do not know for sure.
Say you take out 10 seeds. The most likely result is that you will get 4, 5 or 6 mustard seeds. There is about a 65% chance that this is what will happen. If you got any of those results, you would think that the bowl might be about half and half. You would be surprised if they were all mustard seeds. But it is possible that all ten seeds are the same. The probability of getting all mustard seeds or all rocket seeds from a bowl of half and half is about 0.002 or one chance in five hundred.
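For readers who want to check the ten-seed figures, here is a minimal sketch in Python. It assumes each seed drawn is independently mustard with probability 0.5, which is reasonable when the bowl holds millions of seeds. The function name `binom_pmf` is just an illustrative choice.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k mustard seeds in a sample of n seeds."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10

# Chance of 4, 5 or 6 mustard seeds out of ten
p_middle = sum(binom_pmf(k, n) for k in (4, 5, 6))

# Chance that all ten seeds are the same (all rocket or all mustard)
p_all_same = binom_pmf(0, n) + binom_pmf(n, n)

print(f"P(4, 5 or 6 mustard) = {p_middle:.3f}")    # 0.656
print(f"P(all ten the same)  = {p_all_same:.4f}")  # 0.0020
```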
Now, if you draw out 1000 seeds, it is quite a different story. If all the 1000 seeds drawn out were mustard, you would justifiably conclude that the bowl is not half and half, and may in fact have no rocket seeds. But where do we draw the line? How likely is it to get 40% or less mustard from our 50/50 bowl? Well, it is less than one chance in a billion. It is possible, but extremely unlikely – even less likely than winning Lotto. We can see that the sample of 1000 seeds gives us a general idea of what is in the bowl, but we would never think it was an exact representation. If our sample was 51% mustard, we would not sensibly conclude that the seeds in the bowl were not half and half. In fact, there is only about a 47% chance that we will get a sample of seeds that is between 49% and 51% mustard.
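The same kind of check can be done for 1000 seeds. The sketch below again assumes independent draws from a 50/50 bowl, and it counts outcomes exactly using whole-number arithmetic. The function name `prob_between` is an illustrative choice, and both endpoints are included in each range. Exact results can differ a little from rounded figures based on a normal approximation, which for the 49% to 51% range gives something nearer 47%.

```python
from math import comb

def prob_between(lo, hi, n):
    """P(lo <= number of mustard seeds <= hi) for n seeds from a 50/50 bowl."""
    # Each of the 2**n equally likely mustard/rocket sequences is counted once.
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

n = 1000

# 40% or less mustard: at most 400 of the 1000 seeds
p_40_or_less = prob_between(0, 400, n)

# Between 49% and 51% mustard: 490 to 510 seeds inclusive
p_49_to_51 = prob_between(490, 510, n)

print(f"P(40% or less mustard)       = {p_40_or_less:.2e}")
print(f"P(between 49% and 51%)       = {p_49_to_51:.3f}")
```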
Of course we know we are not little seeds, but people. In fact we like to think we are all special snowflakes. (The scene from “Life of Brian” springs to mind. Brian – “You are all individuals”, crowd – “We are all individuals”, single response – “I’m not!”)
But the truth is that as a group we do act in surprisingly consistent ways. Every year as a university lecturer I tried new things to help my students to learn. And every year the pass rate was disappointingly consistent. I later devised a course that anyone could pass if they put the work in. They could keep resitting the tests until they passed. And the pass rate stayed around the same.
People do tend to act in similar ways. So if one person changes their viewpoint, there is a pretty good chance that others will have also. So long as we are aware of the limitations in precision, samples are good indicators of the populations from which they are drawn.
Here is a link to our video about Inference.
I have described why polls generally work. The media tends to dwell on the times that they fail, so let’s look at why that may be.
Sometimes the poll may just be the one that takes an unlikely sample. There is a one in a thousand chance that ten seeds from my bowl will all be mustard – and a one in a thousand chance that all will be rocket. It is not very likely, but it can happen. Similarly there is a teeny chance that we will get a result of less than 45% or more than 55% when we take out 1000 seeds. Not likely, but possible. This is called sampling error, and that is what the margin of error is about. Political polls in NZ generally take a sample of 1000 people, which leads to a margin of error of about 3 percentage points. What margin of error means is that we can make an interval of 3 percentage points either side of the estimate and be pretty sure that it encloses the real value from the population. So if a poll says 45% following for the Mustard Party, then we can be pretty sure that the actual following back in the population is between 42% and 48%. And what does “pretty sure” mean? It means that about one time in twenty we will get it wrong and the actual following back in the population is outside that range. The problem is we NEVER know whether a given poll is one of the right ones or the wrong one. (Though I personally choose to decide that the polls that I don’t like are the wrong ones. ;))
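Here is a rough sketch of where the "about 3%" comes from. It assumes a simple random sample and the usual 95% level ("one time in twenty"), for which the multiplier is approximately 1.96. Using a proportion of 0.5 gives the largest possible margin, which is the figure usually quoted. The function name `margin_of_error` is an illustrative choice.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a sample of n."""
    return z * sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)

# The Mustard Party example: a poll estimate of 45%
estimate = 0.45
low, high = estimate - moe, estimate + moe

print(f"Margin of error: {moe:.1%}")            # 3.1%
print(f"Interval: {low:.0%} to {high:.0%}")     # 42% to 48%
```

Note that quadrupling the sample to 4000 people would only halve the margin, which is one reason polling companies tend to stop at around 1000.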
Non-sampling error and bias
There are other problems – known as non-sampling error. I wrote a short post on it previously.
And this is where the difference between seeds and people becomes important. Some issues are:
When we take a handful of seeds from a well-mixed up bowl, every seed really does have an equal chance of being selected. But getting such a sample from the population of New Zealand is much more difficult. When landlines were in most homes, a phone poll could be a pretty representative sample. However, these days many people have only mobile phones, which means they are less likely to be called. This would not be a problem if there were no differences politically between landline holders and others. I think most people would see that younger people are less likely to be polled than older people if landlines are used, and younger people quite possibly have different political views. Good polling companies are aware of this and use quota sampling and other methods to try to mitigate this.
The wording of the question and the order of questions can affect what people say. You can usually find out what question has been asked in a particular poll, and it should be included in the poll report.
Unlike seeds, people do not always show their true colours. If a person is answering a poll within earshot of another family member, they may give a different answer to what they actually tick on election day. Some people are undecided, and may change their mind in the booth. Undecided voters are difficult to account for in statistics, as an undecided voter swinging between two possible coalition partners will have a different impact from a person who has no opinion or may vacillate wildly.
In a volatile political environment like the one we are experiencing in New Zealand, people can change their mind from day to day as new leaders emerge, scandals are uncovered, and even in response to reporting of political polls. The results of a poll can be affected by the day and time that the questions were asked.
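The first issue above, unequal chances of selection, can be illustrated with a small calculation. Every number in this sketch is made up purely for illustration: the group sizes, the levels of support and the chances of being reached are not real polling figures. Reweighting each group back to its population share is shown as one simple example of the kind of correction a polling company might use. It is an assumption here, not a description of any particular company's method.

```python
# All numbers below are hypothetical, chosen only to illustrate the idea.
groups = {
    "younger": {"pop_share": 0.4, "support": 0.60, "reach": 0.3},
    "older":   {"pop_share": 0.6, "support": 0.40, "reach": 0.9},
}

# True support for the Mustard Party across the whole population
true_support = sum(g["pop_share"] * g["support"] for g in groups.values())

# Share of each group among the people the poll actually reaches
reached = {name: g["pop_share"] * g["reach"] for name, g in groups.items()}
total_reached = sum(reached.values())
sample_share = {name: r / total_reached for name, r in reached.items()}

# Unadjusted estimate: older people are over-represented in the sample
raw_estimate = sum(sample_share[name] * groups[name]["support"] for name in groups)

# Adjusted estimate: weight each group back up or down to its population share
weights = {name: groups[name]["pop_share"] / sample_share[name] for name in groups}
weighted_estimate = sum(
    sample_share[name] * weights[name] * groups[name]["support"] for name in groups
)

print(f"True support:      {true_support:.1%}")
print(f"Raw estimate:      {raw_estimate:.1%}")
print(f"Weighted estimate: {weighted_estimate:.1%}")
```

In this made-up case the raw estimate understates support by several percentage points, more than the margin of error, and no increase in sample size would fix it.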
On balance, polls are a blunt instrument that can give a vague idea about who people are likely to vote for. They do work, within their limitations, but the limitations are fairly substantial. We need to be sceptical of polls, and bear in mind that the margin of error only deals with sampling error, not all the other sources of error and bias.
And as they say – the only truly correct poll is the one on Election Day.