My mentor, Hans Daellenbach told me a story about a client asking for a one-armed Operations Researcher. The client was sick of getting answers that went, “On the one hand, the best decision would be to proceed, but on the other hand…”

People like the correct answer. They like certainty. They like to know they got it right.

I tease my husband that he has to find the best picnic spot or the best parking place, which involves us driving around considerably longer than I (or the children) were happy with. To be fair, we do end up in very nice picnic spots. However, several of the other places would have been just fine too!

In a different context I too am guilty of this – the reason I loved mathematics at school was because you knew whether you were right or wrong and could get a satisfying row of little red ticks (checkmarks) down the page. English and other arts subjects, I found too mushy as you could never get it perfect. Biology was annoying as plants were so variable, except in their ability to die. Chemistry was ok, so long as we stuck to the nice definite stuff like drawing organic molecules and balancing redox equations.

I think most mathematics teachers are mathematics teachers because they like things to be right or wrong. They like to be able to look at an answer and tell whether it is correct, or if it should get half marks for correct working. They do NOT want to mark essays, which are full of mushy judgements.

Again I am sympathetic. I once did a course in basketball refereeing. I enjoyed learning all the rules, and where to stand, and the hand signals etc, but I hated being a referee. All those decisions were just too much for me. I could never tell who had put the ball out, and was unhappy with guessing. I think I did referee two games at a church league and ended up with an angry player bashing me in the face with the ball. Looking back I think it didn’t help that I wasn’t much of a player either.

I also used to find marking exam papers very challenging, as I wanted to get it right every time. I would agonise over every mark, thinking it could be the difference between passing and failing for some poor student. However as the years went by, I realised that the odd mistake or inconsistency here or there was just usual, and within the range of error. To someone who failed by one mark, my suggestion is not to be borderline. I’m pretty sure we passed more people that we shouldn’t have, than the other way around.

The point is, that life in general is not deterministic and certain and rule-based. This is where the great divide lies between the subject of mathematics and the practice of statistics. Generally in mathematics you can find an answer and even check that it is correct. Or you can show that there is no answer (as happened in one of our national exams in 2012!). But often in statistics there is no clear answer. Sometimes it even depends on the context. This does not sit well with some mathematics teachers.

In operations research there is an interesting tension between optimisers and people who use heuristics. Optimisers love to say that they have the optimal solution to the problem. The non-optimisers like to point out that the problem solved optimally, is so far removed from the actual problem, that all it provides is an upper or lower bound to a practical solution to the actual real-life problem situation.

Judgment calls occur all through the mathematical decision sciences. They include

- What method to use – Linear programming or heuristic search?
- Approximations – How do we model a stochastic input in a deterministic model?
- Assumptions – Is it reasonable to assume that the observations are independent?
- P-value cutoff – Does a p-value of exactly 0.05 constitute evidence against the null hypothesis?
- Sample size – Is it reasonable to draw any inferences at all from a sample of 6?
- Grouping – How do we group by age? by income?
- Data cleaning – Do we remove the outlier or leave it in?

A comment from a maths teacher on my post regarding the Central Limit Theorem included the following: “The questions that continue to irk me are i) how do you know when to make the call? ii) What are the errors involved in making such a call? I suppose that Hypothesis testing along with p-values took care of such issues and offered some form of security in accepting or rejecting such a hypothesis. I am just a little worried that objectivity is being lost, with personal interpretation being the prevailing arbiter which seems inadequate.”

These are very real concerns, and reflect the mathematical desire for correctness and security. But I propose that the security was an illusion in the first place. There has always been personal interpretation.Informal inference is a nice introduction to help us understand that. And in fact it would be a good opportunity for lively discussion in a statistics class.

With bootstrapping methods we don’t have any less information than we did using the Central Limit Theorem. We just haven’t assumed normality or independence. There was no security. There was the idea that with a 95% confidence interval, for example, we are 95% sure that we contain the true population value. I wonder how often we realised that 1 in 20 times we were just plain wrong, and in quite a few instances the population parameter would be far from the centre of the interval.

The hopeful thing about teaching statistics via bootstrapping, is that by demystifying it we may be able to inject some more healthy scepticism into the populace.

## 2 Comments

My biggest stumbling block as a student was learning when to put in an approximation, and what approximation to use. Even in something straightforward like Stirling’s Approximation … well, it wasn’t clear to me why this was a good enough approximation and something else was not.

I’m better at it, now, but at least in my experience there wasn’t quite enough explanation of how to pick approximations and what order to accept, and statistics offers all these points where it’s really needed.

Regarding 95% confidence limits being “wrong” 1 in 20 times, my experience is that clients interpret 95% intervals as 100% intervals. I recall one coming back to me when an interval has been shown to have “failed” and asking “what went wrong?” So, in most cases I add and subtract 3 standard errors instead of 2 – which makes it a 100% interval for all practical purposes!