
First, there is the problem of lexical ambiguity. There are colloquial meanings for “random” that don’t quite match the technical or domain-specific meanings.

Then there is the fact that people can’t actually be random.

Then there is the problem of equal chance versus displaying a long-term distribution.

And there is the problem that there are several conflicting ideas associated with the word “random”.

In this post I will look at these issues and ask some questions about how we can better teach students about randomness and random sampling. The same problem exists for many domain-specific terms whose colloquial meanings hinder comprehension of the idea in question. You can read about more of these words, and some teaching ideas, in the post Teaching Statistical Language.

First, there is lexical ambiguity, which means that a word has more than one meaning. Kaplan, Rogness and Fisher write about this in their 2014 paper “Exploiting Lexical Ambiguity to help students understand the meaning of Random.” I recently studied this paper closely in order to present its ideas and findings to a group of high school teachers, and I found the concept of leveraging lexical ambiguity very interesting. As an intervention, Kaplan et al. introduced a picture of “random zebras” to represent the colloquial meaning of random, and a picture of a hat to represent the idea of taking a random sample. I think it is a great idea to have pictures representing the different meanings, and it might be good to get students to come up with their own.

So what are the different meanings for random? I consulted some on-line dictionaries.

The first meaning of random describes something happening without pattern, method or conscious decision. An example is “random violence”.

Example: She dressed in a rather random fashion, putting on whatever she laid her hand on in the dark.

Most on-line dictionaries also give a statistical definition, which includes that each item has an equal probability of being chosen.

Example: The students’ names were taken at random from a pile, to decide who would represent the school at the meeting.

Another meaning: something random is either unknown, unidentified, or out of place.

Example: My father brought home some random strangers he found under a bridge.

Another colloquial meaning for random is odd and unpredictable in an amusing way.

Example: My social life is so random!

There has been considerable research into why people cannot produce a sequence of numbers that resembles a truly randomly generated sequence. In our minds we like things to be shared out evenly, so the sequences we produce generally have fewer and shorter streaks of the same number than a genuinely random sequence would.
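To see the difference, here is a minimal Python sketch comparing simulated fair-coin sequences with a sequence written to “look random”. The 20-flip human-looking sequence is hypothetical, made up to avoid streaks in the way people tend to. Because each pair of neighbouring flips differs with probability 1/2, a random sequence of n flips has 1 + (n - 1)/2 runs on average.

```python
import random

def count_runs(seq):
    """Number of maximal runs of identical symbols, e.g. "HHTH" has 3 runs."""
    if not seq:
        return 0
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)

def longest_run(seq):
    """Length of the longest streak of identical consecutive symbols."""
    if not seq:
        return 0
    best = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

n, trials = 20, 10_000
rng = random.Random(2014)  # fixed seed so the simulation is repeatable
flips = ["".join(rng.choice("HT") for _ in range(n)) for _ in range(trials)]

print("Expected runs in", n, "fair flips:", 1 + (n - 1) / 2)
print("Simulated mean runs:", sum(map(count_runs, flips)) / trials)
print("Simulated mean longest run:", sum(map(longest_run, flips)) / trials)

# Hypothetical sequence written to "look random", the way people tend to
humanlike = "HTHHTHTTHTHTHHTHTTHT"
print("Runs in the human-looking sequence:", count_runs(humanlike))
print("Longest run in the human-looking sequence:", longest_run(humanlike))
```

The human-looking sequence has 16 runs against the 10.5 expected for a random sequence, and no streak longer than two. It switches too often.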

Animals aren’t very random either, it seems. Yesterday I saw a whole lot of sheep in a paddock, and while they weren’t exactly lined up, there was a pretty similar distance between all the sheep.

In the paper cited earlier, Kaplan et al. used the following definition of random:

“We call a phenomenon random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.” From Moore (2007) The Basic Practice of Statistics.

Now to me, that definition does not insist that each outcome be equally likely, which matches my idea of randomness. In my mind, random implies chance, but not equal likelihood. When creating simulation models we generate random variates following all sorts of distributions. The outcomes are far from even, but in the long run they display a distribution similar to the one being modelled.
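The following small illustration draws random variates that are not equally likely. It uses a made-up three-outcome distribution (the 0.7/0.2/0.1 probabilities are hypothetical) and `random.choices` from Python's standard library. Individual draws are unpredictable, yet the relative frequencies settle near the model probabilities as the number of draws grows. This is the long-run regularity in Moore's definition.

```python
import random
from collections import Counter

def long_run_frequencies(outcomes, probs, n, seed=1):
    """Draw n variates from an unequal discrete distribution; return relative frequencies."""
    rng = random.Random(seed)
    draws = rng.choices(outcomes, weights=probs, k=n)
    counts = Counter(draws)
    return {o: counts[o] / n for o in outcomes}

outcomes = ["low", "medium", "high"]
probs = [0.7, 0.2, 0.1]  # hypothetical model probabilities, deliberately unequal

for n in (10, 100, 100_000):
    freqs = long_run_frequencies(outcomes, probs, n)
    print(n, {o: round(f, 3) for o, f in freqs.items()})
```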

Yet the dictionaries, and the later parts of the Kaplan paper, insist that randomness requires equal opportunity to be chosen. What’s a person to do?

I propose that the meaning of the adjective “random” may depend on the noun it is qualifying. There are random samples and random variables. There are also randomisation and randomness.

A random sample is a sample in which each object has an equal opportunity of being chosen, and each choice of object is by chance, and independent of the previous objects chosen. A random variable is one that can take a number of values, and will generally display a pattern of outcomes similar to a given distribution.

I wonder if the problem is that randomness is somehow equated with fairness. Our most familiar examples of true randomness come from gambling, with dice, cards, roulette wheels and lotto balls. In each case there is the requirement that each outcome be equally likely.

Bearing in mind the overwhelming evidence that the “statistical meaning” of randomness includes equal likelihood, I begin to think that it might not really matter if people equate randomness with equal opportunity.

However, if you think about medical or hazard risk, the story changes. Apart from known risk-increasing factors associated with lifestyle, whether a person succumbs to a disease appears to be random. But the likelihood of succumbing is not equal to the likelihood of not succumbing. Similarly, there is a clear random element in whether a future child has a disability known to be caused by an autosomal recessive gene. It is definitely random, in that there is an element of chance, and the effects on successive children are independent. But when both parents are carriers, the probability of a disability is one in four. I suppose if you look at the outcomes as being which children are affected, there is an equal chance for each child.
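The one-in-four figure can be checked by enumeration and by simulation. This minimal sketch assumes both parents are carriers (genotype Aa, with a as the recessive allele). It also assumes each parent passes on either allele with probability 1/2, independently for each child. The allele labels are illustrative.

```python
import random
from fractions import Fraction
from itertools import product

# Assumption: both parents are carriers, genotype "Aa"; "a" is the recessive allele.
carrier = ("A", "a")

def exact_prob_affected(parent1, parent2):
    """Enumerate the 4 equally likely allele pairings; affected means "aa"."""
    pairings = list(product(parent1, parent2))
    affected = sum(1 for x, y in pairings if x == "a" and y == "a")
    return Fraction(affected, len(pairings))

def simulate_children(parent1, parent2, n, seed=0):
    """Share of n children affected, each child inheriting alleles independently."""
    rng = random.Random(seed)
    affected = sum(
        1 for _ in range(n)
        if rng.choice(parent1) == "a" and rng.choice(parent2) == "a"
    )
    return affected / n

print(exact_prob_affected(carrier, carrier))          # 1/4
print(simulate_children(carrier, carrier, 100_000))   # close to 0.25
```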

But then think about a “lucky dip” containing many cheap prizes and a few expensive prizes. The choice of prize is random, but there is not an even chance of getting a cheap prize or an expensive prize.
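A short simulation makes the lucky dip concrete. The prize counts (45 cheap, 5 expensive) are hypothetical. The sketch also assumes every dip is made from the full bag, as if each prize were put back. Each individual prize is equally likely to be drawn, with probability 1/50, but the chance of an expensive prize is only 5/50 = 1/10.

```python
import random
from collections import Counter

# Hypothetical lucky dip contents
prizes = ["cheap"] * 45 + ["expensive"] * 5

def dip_many(prizes, n, seed=0):
    """Simulate n independent dips from the full bag; return the share of each prize type."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(prizes) for _ in range(n))  # each prize equally likely per dip
    return {kind: counts[kind] / n for kind in sorted(set(prizes))}

print("P(any one prize) =", 1 / len(prizes))
print("P(expensive) =", prizes.count("expensive") / len(prizes))
print("Simulated shares:", dip_many(prizes, 100_000))
```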

I think I have mused enough. I’m interested to know what readers think. Whatever the conclusion, we clearly need to spend some time making sure students understand what is meant by randomness and by a random sample.


## 14 Comments

“A random sample is a sample in which each object has an equal opportunity of being chosen, and each choice of object is by chance, and independent of the previous objects chosen.”

I don’t like this much, Nic. We should at least start with the standard, orthodox definition of a simple random sample of size n from a finite population of size N: it’s chosen in such a way that any of the possible N choose n samples is equally likely.

I number a class of 100 from 1 to 100, and choose a sample of size 50 by tossing a coin, choosing the even numbers if it’s heads and the odd numbers if it’s tails. That is a “random” (probability mechanism) sample, and each person has the same chance of being in it. But only two possible samples can be chosen, so it’s not a simple random sample.
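Here is a sketch of the commenter's point on a small scale, with 6 people and samples of size 3 in place of 100 and 50. The smaller numbers only keep the enumeration short, and the function names are illustrative. Both designs give every person an inclusion probability of 1/2. The simple random sample can produce any of the 20 possible samples, while the coin scheme can produce only 2.

```python
from fractions import Fraction
from itertools import combinations

def srs_design(N, n):
    """Simple random sampling: all C(N, n) subsets equally likely."""
    samples = [frozenset(c) for c in combinations(range(1, N + 1), n)]
    return {s: Fraction(1, len(samples)) for s in samples}

def coin_design(N):
    """The coin scheme: evens on heads, odds on tails (assumes N is even)."""
    evens = frozenset(range(2, N + 1, 2))
    odds = frozenset(range(1, N + 1, 2))
    return {evens: Fraction(1, 2), odds: Fraction(1, 2)}

def inclusion_probs(design, N):
    """P(person i is in the sample) for each i, summed over the possible samples."""
    return {i: sum(p for s, p in design.items() if i in s) for i in range(1, N + 1)}

N, n = 6, 3  # small stand-ins for the class of 100 and the sample of 50
for name, design in [("SRS", srs_design(N, n)), ("coin", coin_design(N))]:
    print(name, "| possible samples:", len(design),
          "| inclusion probabilities:", sorted(set(inclusion_probs(design, N).values())))
```

Equal inclusion probabilities for every person are therefore not enough to make a design a simple random sample.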

Also, your definition assumes objects get into the sample sequentially, which need not be the case.

Hi Nic, I haven’t given it much thought, but I wonder if the topic is a little too big from a statistical perspective. “Random” is often used as an adjective in statistics, as in random sample, random variable, missing at random, etc.

A random draw from a standard normal distribution is definitely not one where “each item has an equal probability of being chosen”. In fact, values near 0 are the most likely to be drawn, while values with an absolute value of 100 or more have a vanishingly small probability.
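For a continuous distribution like the standard normal, probabilities belong to intervals rather than to single numbers. This minimal sketch compares intervals of equal width in two ways. It uses the exact normal CDF, written with Python's `math.erf`, and a simulation with `random.gauss`. The choice of intervals is arbitrary.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def interval_prob(a, b):
    """P(a <= Z < b) for a standard normal Z."""
    return normal_cdf(b) - normal_cdf(a)

rng = random.Random(42)
draws = [rng.gauss(0, 1) for _ in range(100_000)]
for a, b in [(0, 1), (2, 3), (4, 5)]:
    simulated = sum(a <= z < b for z in draws) / len(draws)
    print(f"[{a}, {b}): exact {interval_prob(a, b):.5f}, simulated {simulated:.5f}")
```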

I guess it depends at what level you are teaching. “Random” is one of those concepts which improves and refines with age 🙂

The best source I ever came across (I was looking at tests for randomness when defining pseudo-random-number generators) was one of Knuth’s volumes. It’s a bit heavy going in places, but very interesting.
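One of the classic checks for randomness is the chi-square frequency test. The sketch below applies it to simulated die rolls from Python's own generator. It is a simple illustration, not a full battery of randomness tests. The seed and sample size are arbitrary. The critical value in the comment is the standard 5% point for 5 degrees of freedom.

```python
import random

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts in every category."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(123)
rolls = [rng.randrange(1, 7) for _ in range(6000)]
counts = [rolls.count(face) for face in range(1, 7)]
stat = chi_square_uniform(counts)

print("counts per face:", counts)
print("chi-square statistic:", round(stat, 2))
# 6 categories give 5 degrees of freedom; the 5% critical value is about 11.07,
# so a statistic above that would be mild evidence against uniformity.
print("suspicious at 5% level:", stat > 11.07)
```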

Cath

“In my mind, random implies chance, but not equal likelihood. When creating simulation models we would generate random variates following all sorts of distributions.”

I think the likelihood of an outcome does not determine the randomness of the event. Suppose you had an urn (yeah, it’s me, the urn guy) with 5 red marbles and 10 blue marbles. Selecting any of the marbles is a matter of chance, and each has a probability of 1 in 15 of being chosen. But the chances of choosing a red versus a blue marble are different (5 in 15 versus 10 in 15). The selection is “random”; the “likelihood” depends on how the desired outcome is defined.

Hi Dr Nic

I enjoy reading your blog, and this time you’ve tempted me into replying as well, with not one but two points.

It’s already been said, but “random” and “equal probability” are quite distinct concepts.

In the area where I used to work (survey sampling from finite populations) that is very obvious. You begin with simple random samples (which may be with or without replacement) as an introduction, but they are almost never used in practice. There is all the fun of stratified sampling, clustered sampling, multi-stage designs, multi-frame designs, multi-phase designs etc. to look forward to once you get past the very basics. With all the fancy labels they might seem intimidating to students, but almost all of them arise as solutions to very practical problems, and if approached that way can be quite easy to understand (OK, I know I am not a good test; I worked with it for so long I can’t remember what it was like when I started). For example, how do I make sure I get enough males and females in my sample? Stratify. Is there a correct way to get a better (ugh, I hate using the word without defining what I mean, but ploughing on anyway) sample when I know there are both big and small businesses out there? Sure: stratify by business size, and maybe even have a ‘take-all stratum’. What if I don’t know the size of the businesses to begin with? Well, ask them in the first phase of your collection, and then subsample in the second phase (because otherwise you’ll get “too many” small businesses and “not enough” medium and large ones). What if it costs me a lot to get to where the respondents are so I can ask them questions? Well, cluster the design to reduce costs. It usually makes the variance go up, but overall it’s more efficient, as long as you weight the sample estimator correspondingly. Because I’m old-fashioned, all these approaches involve random sampling where each unit in the population must have a ‘known non-zero chance of selection’, but it is almost unheard of for those chances to be equal for every unit.
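Here is a minimal sketch of a stratified design with a take-all stratum, to make the “known non-zero chance of selection” idea concrete. The business population, employee counts and sample sizes are all hypothetical. Small businesses are sampled with probability 50/1000 = 0.05 and large ones with probability 1. Each sampled unit is then weighted by the inverse of its inclusion probability. That weighting is the Horvitz-Thompson estimator, a standard choice used here purely for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the example is repeatable

# Hypothetical population: employee counts for 1000 small and 20 large businesses
population = {
    "small": [rng.randint(1, 19) for _ in range(1000)],
    "large": [rng.randint(200, 2000) for _ in range(20)],
}
# Sample size per stratum; "large" is a take-all stratum (every unit selected)
allocation = {"small": 50, "large": 20}

def stratified_sample(population, allocation, rng):
    """Simple random sample within each stratum; returns (stratum, value, inclusion prob)."""
    sample = []
    for stratum, units in population.items():
        n_h = allocation[stratum]
        pi = n_h / len(units)  # every unit in this stratum has this chance of selection
        for y in rng.sample(units, n_h):
            sample.append((stratum, y, pi))
    return sample

def weighted_total(sample):
    """Horvitz-Thompson estimate: weight each sampled value by 1 / inclusion probability."""
    return sum(y / pi for _, y, pi in sample)

sample = stratified_sample(population, allocation, rng)
true_total = sum(sum(units) for units in population.values())

print("inclusion probabilities:", sorted({pi for _, _, pi in sample}))
print("true total employees:", true_total)
print("weighted estimate:", round(weighted_total(sample)))
```

Every unit has a known, non-zero chance of selection, but the chances differ by stratum. Without the weights, the estimate would be badly biased.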

Second, “unknown” and “random” are distinct concepts.

Things that are random may well begin as unknown, but there are many things that are unknown to me that are certainly not random! Try asking me about the right temperature to bake a cake.

More seriously, and more challengingly, I recall a talk by Persi Diaconis (apologies if I’ve spelt his name wrong). He pointed out that flipping a coin (even a ‘fair’ one, if you are brave enough to try to define what that is) is actually a deterministic situation. Coins are large enough that Newtonian mechanics works for them, so if you know (I’ve just slipped in an important three words) the starting position of the coin, its orientation, the force applied when it’s flipped, the distance to the floor, etc., then the outcome, heads or tails, can be calculated using relatively basic mechanics. He reckoned he used to be able to win a lot of bets using a machine he had made so that he could control the initial force of the toss, etc. Of course, if you don’t know the initial conditions, the outcome is unknown to you beforehand, but it’s not actually random, and you’d have been unwise to bet against him. Unfortunately the security guards in modern airports weren’t keen to let him on planes with his device, and it was quite delicate since it had to deliver fairly precise and repeatable tosses, so I didn’t get to see it in action.

Anyway, good luck with defining random, and keep the blogs coming.

Regards

Geoff

There are many terms where the mathematical and the common definitions differ. Take “expected”, for example. Mathematically, the expected value of a single roll of a fair die is 3.5, but for those not mathematically inclined, it doesn’t make sense to “expect” to roll a 3.5.
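The “expected” value is a long-run average, not a result anyone expects on a single roll. This short sketch computes it exactly with Python's `fractions` module and then checks it against simulated rolls. The seed and number of rolls are arbitrary.

```python
import random
from fractions import Fraction

faces = range(1, 7)
# Expected value of one roll of a fair die: each face has probability 1/6
expected = sum(Fraction(face, 6) for face in faces)
print(expected, float(expected))  # 7/2 3.5

rng = random.Random(3)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
print(3.5 in rolls)               # False: no single roll is ever 3.5
print(sum(rolls) / len(rolls))    # the long-run average settles near 3.5
```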

P.S. “People cannot be random”: not entirely true. According to IMDB, there is a screenwriter called John Random. There may be others.

This MAA article may help:

What Is a Random Sequence by Sérgio B. Volchan

http://www.maa.org/programs/maa-awards/writing-awards/what-is-a-random-sequence

Richard von Mises thought random, chance and probability all involved “insensitivity to place selection” and the “impossibility of a successful gambling system.” This outcome-based description may make a stronger impression on students than the classic probability-based descriptions. For details, see Schield & Burnham (2008). Von Mises’ Frequentist Approach to Probability. Copy at http://www.statlit.org/PDF/2008SchieldBurnhamASA.pdf
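Here is a rough illustration of “insensitivity to place selection”, assuming a simulated fair coin. The selection rule, which keeps only the flips that immediately follow a head, is one hypothetical gambling system among many. On a random sequence, the selected flips show heads about half the time, just like the whole sequence. On a patterned sequence such as HTHT and so on, the same rule picks out only tails, so the gambling system would succeed.

```python
import random

def freq_heads(seq):
    """Relative frequency of heads in a sequence of "H"/"T" flips."""
    return seq.count("H") / len(seq)

def after_heads(seq):
    """Hypothetical place-selection rule: keep each flip that follows a head."""
    return [cur for prev, cur in zip(seq, seq[1:]) if prev == "H"]

rng = random.Random(0)
random_seq = [rng.choice("HT") for _ in range(100_000)]
patterned = list("HT" * 50_000)

for name, seq in [("random", random_seq), ("patterned", patterned)]:
    print(f"{name}: all flips {freq_heads(seq):.3f}, "
          f"flips after a head {freq_heads(after_heads(seq)):.3f}")
```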

[…] start with, the words “random variable” are difficult on their own. I have dedicated an entire post to the problems with “random”, and in the writing of it, discovered another inconsistency in the way that we use the word. When […]

[…] terms. I have written about this before in such posts as Teaching Statistical Language and It is so random. But the terms sampling error and non-sampling error win the Dr Nic prize for counter-intuitivity […]

I do not see how the concept of “randomness” can require “an equal chance of selection”.

Let me explain why, by reference to an example. Suppose I have 15 balls in a bag, numbered from 1 to 15. All the balls with numbers which are multiples of 3 are red, and all the others are blue. Suppose I plunge my hand into the bag and draw out a ball without looking. Was that a random act, or was it not?

If the concept of randomness depends upon there being an equal chance of selection, then my act was a random act if I am interested in the number on the ball, but it was not random if I am interested in the colour of the ball. How can that be? It is the same act. How can the randomness of the act depend upon an extraneous factor such as whether I am trying to draw a number at random or a colour at random?

But just supposing, for the sake of argument, that the quality of randomness can depend upon whether I am interested in the colour or the number. What if I don’t know? Perhaps I am going to decide whether it is the number or the colour that matters only after I have drawn the ball, by flipping a coin. Heads it’s the number, tails it’s the colour. At the point that I have drawn the ball but not flipped the coin, was my act of drawing the ball random or was it not? It seems that no answer can be given to that, and that whether an action in the past was (will have been?) random depends entirely upon the outcome of a future event … which may never happen. I might, for instance, get bored of the experiment and go and use the coin to buy an ice cream instead. If I do that, nobody will ever be able to say whether my act in drawing the ball out of the bag was random or not.

Or suppose the coin has already been flipped, by a blind person, who put it in a locked safe and destroyed the key. Now the randomness or otherwise of the drawing of the ball is definitely settled, it does not depend upon some future event … but nobody can ever find out if it was or was not a random act.

This is all madness!

I cannot see that “equal chance of selection” can possibly be a necessary condition without which “randomness” cannot exist.

Thanks for your thoughts, Jeremy.

That lies in the difference between the event and the outcome. Each event is equally likely (that is, each of the 15 balls is equally likely to be drawn). Each outcome is not equally likely (red or blue), since there is a different number of equally likely events leading to each outcome.

The set of all events is S = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}, which contains 15 equally likely elements.

The set of events resulting in red is R = {3,6,9,12,15}, which contains 5 elements, while the set resulting in blue is B = {1,2,4,5,7,8,10,11,13,14}, which contains 10 elements.

Thus P(R) = 5/15 = 1/3 and P(B) = 10/15 = 2/3.

So while drawing a red ball is not as likely as drawing a blue ball, each individual ball is equally likely to be drawn. You could look at it this way: the event of drawing a ball is random, but the outcomes may not be equally likely, depending on how we define them.
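The set calculation above can be checked with Python sets, using the same S, R and B. This sketch assumes, as in the comment, that each of the 15 balls is equally likely to be drawn.

```python
from fractions import Fraction

S = set(range(1, 16))             # the 15 equally likely balls
R = {k for k in S if k % 3 == 0}  # red: multiples of 3
B = S - R                         # blue: all the others

def prob(event):
    """Classical probability: |event| / |S| when every ball is equally likely."""
    return Fraction(len(event & S), len(S))

print(sorted(R), sorted(B))
print(prob({7}), prob(R), prob(B))  # 1/15 1/3 2/3
```

The same draw gives each ball probability 1/15, while the colours come out at 1/3 and 2/3. Whether you look at the number or the colour changes the outcome probabilities, not the randomness of the draw.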

You are correct. Not all outcomes of a random process are equally likely.