
All my time at school the “average” was always calculated as the arithmetic mean, by adding up all the scores and then dividing by the number of scores. When we were taught about the median, it seemed like an inferior version of the mean. It was the thing you worked out when you weren’t smart enough to add and divide. It was used for house prices, and that was about it. Of course the mean was the superior product! Why wouldn’t you use the mean?

I’ve been preparing resources for teaching the fabulous new New Zealand curriculum, and have been brought face-to-face with my prejudices. It strikes me that the median has had very poor representation.

I put a question on Facebook and Twitter to see what people felt about the mean and the median. I briefly explained what each was, then asked which one they thought was better. Some people had no idea what I was talking about, but most felt that the mean was the superior statistic. The following are a selection of responses:

The mean, but I don’t know why.. maybe that’s just what we were taught to use when I was back in school (a long time ago!) lol

When I think of “average” I always think of the mean. I don’t know if it’s actually better though

well the median is a real pain to work out. you have to make a list of all the numbers, in order, and then count how many they are and then go to the middle. PAIN IN THE BUM. the average… well that is somewhat quicker to do, no? and i don’t see the point in the median at all. unless well no, there is just no need for it. who cares what the 15th person in the class got on a test? the lowest and highest is much more interesting. As i remember it, the mode is the most commonly occurring number out of a set of numbers… i think of this as the “mode” or in English (not French), the “fashionable” number. oh and it stresses me how all 3 start with Ms cos that is confusing. which is why i like to use the word average.

The mean, which I’m guessing is the same as the average? When the media refer to real estate stats they always use median price, which can distort reality, we would prefer the average price. (From a real estate agent)

I don’t really think it’s a case of which is better. They’re two different things aren’t they? I think it’s usually easier to work out the average.

A number of my Facebook friends did know about statistics, and responded in favour of the median in most cases. This was an interesting comment:

“It depends. Everyone who proofread my thesis was like why on earth are you using the median – no one uses it. And most of the other similar primate studies I’ve read use the mean (except one, that was published by my associate supervisor). But my means were off their rocker, and I’m pretty sure my medians were a much better representation of reality in this case. It makes making comparisons between studies a little awkward though.”

I am hard pressed to find an instance where the mean is actually a better measure of central tendency than the median. The purpose of the mean or median (or mode) is to provide a one-number summary of a set of data. The whole idea of the mean is actually quite tricky, as you can read in one of my early posts about explaining what the mean is. Generally the summary value is used to compare with another sample or population.

In my lectures I often illustrated times when the median is a better summary measure of a sample or population than the mean. This is quite common in notes and YouTube videos. Never once did I show where the mean was preferred to the median! So why were/are we so loyal to the mean, bringing out the median for special occasions and real estate?

I think there are two answers, both of them no longer valid. It is a question of legacy.

Despite first appearances, for anything larger than a trivial sample the mean is actually easier to calculate than the median. Putting a set of 100 values in order by hand is no easy task. (Pain in the bum, as my friend so elegantly expressed it.) Adding up scores and dividing by 100 is a walk in the park in comparison. In the early 1980s when I learned programming (in Fortran, Pascal and Cobol), writing a sorting program was far from trivial, and a large set of numbers would take a long time to sort. Only in later years, as computing power has expanded, has it become quick and easy to get a computer to calculate a median.

Means behave nicely and give nice mathematical results when manipulated. Because of this we can calculate confidence intervals using a nifty little formula and statistical tables. Until bootstrapping by computer became feasible, for both large and small problems, there was no practical way to perform inference on a number of very useful statistics, including the median and the inter-quartile range.
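To make the contrast concrete, here is a rough Python sketch (not taken from any particular course or package) of the two routes: the large-sample formula for a mean, using z = 1.96 as a stand-in for a t-table value, and a percentile bootstrap for the median. The scores, the seed and the number of resamples are made-up choices for illustration.

```python
import random
import statistics

def mean_ci_normal(data, z=1.96):
    """Large-sample interval for the mean: mean +/- z * s / sqrt(n).
    z = 1.96 stands in for a t-table value, so this is an approximation."""
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / n ** 0.5
    return m - z * se, m + z * se

def median_ci_bootstrap(data, reps=5000, conf=0.95, seed=1):
    """Percentile bootstrap interval for the median: resample with
    replacement, take each resample's median, read off the percentiles."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(statistics.median(rng.choices(data, k=n)) for _ in range(reps))
    lo_idx = int(reps * (1 - conf) / 2)
    hi_idx = int(reps * (1 + conf) / 2) - 1
    return boot[lo_idx], boot[hi_idx]

# Made-up test scores with one high value
scores = [52, 61, 58, 70, 45, 66, 73, 59, 48, 95, 62, 57, 64, 50, 68]
print(mean_ci_normal(scores))
print(median_ci_bootstrap(scores))
```

The formula needs only the mean and standard deviation; the bootstrap needs thousands of medians, which is exactly the kind of work that was impractical before cheap computing.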

A median is intrinsically understandable. It is the middle number when the values are put in order. End of story. – Well not quite – you do have that slightly tricky thing where the sample size is even and you have to average the middle two terms, but apart from that it is easy!
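A minimal Python sketch of that rule, including the slightly tricky even-sized case (the standard library's `statistics.median` applies the same convention):

```python
def median(values):
    """Middle value of the ordered data; for an even count,
    the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0, the average of 3 and 5
```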

A median is hardly affected by outliers. I learned a new term for this when I was reading up in preparation for writing this post. The term is “resistant” and I learned it from one of Mr Tarrou’s videos for AP Statistics. I found these videos after my tirade against videos on confidence intervals. Tarrou’s videos are long and a bit more mathematical than I would like. (He can’t help it – he is a maths teacher and the AP Statistics syllabus seems to have been devised by mathematical statisticians trying to put students off ever taking the subject again.) But they are GOOD. Tarrou’s videos are sound, and interesting and well put together. I will be recommending them as complementary to my own offerings. (Because I sure as heck don’t want to have to do all that icky mathsy stuff.)

But I digress. The median is “resistant” because it is not at the mercy of outliers. There are lots of great examples, including in Mr Tarrou’s video. If you have a median of 5 and then add another observation of 80, the median is unlikely to stray far from the 5. However a mean is a fickle beast, and easily swayed by a flashy outlier.
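The median-of-5 example can be checked in a few lines of Python. The five starting scores are invented so that their median is 5.

```python
import statistics

scores = [2, 3, 5, 6, 8]
print(statistics.median(scores), statistics.mean(scores))  # median 5, mean 4.8

with_outlier = scores + [80]
# The median only moves to 5.5, while the mean jumps to about 17.3
print(statistics.median(with_outlier), statistics.mean(with_outlier))
```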

The main disadvantage I can see for the median is that it can be a bit jumpy in small samples made up of discrete values. I guess if you have two well-behaved populations that are very similar and you want to see precise differences then the means might just be better – but even then you would possibly be over-interpreting small differences.
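A tiny made-up example of that jumpiness with whole-number scores: changing a single score from 3 to 4 moves the median by a full point, while the mean moves by only 0.2.

```python
import statistics

before = [3, 3, 3, 4, 4]
after = [3, 3, 4, 4, 4]  # one score changed from 3 to 4

print(statistics.median(before), statistics.mean(before))  # median 3, mean 3.4
print(statistics.median(after), statistics.mean(after))    # median 4, mean 3.6
```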

I have found it very interesting observing the behaviour of confidence intervals for the difference of two medians, compared with confidence intervals for the difference of two means. While I was preparing materials for our online resource, I performed nine such tests on different sets of real data taken from university students. The scores are very jumpy, and the differences between the medians are often exactly zero. Consequently the confidence intervals for the difference of two medians quite often have zero as their lower bound. This is a challenge to interpret, as I had not met it often when looking at differences between means. However, it also illuminates our odd relationship with zero. When a confidence interval for a difference of two means is (-0.13, 3.98), and so includes zero, it is tempting to conclude that there is no significant difference. But is -0.13 really any different from zero in practical terms? The other point is that we should leave the confidence interval as it is, rather than stretching it into further inference.
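For readers who want to see those exact zeros for themselves, here is a rough percentile-bootstrap sketch for the difference of two medians. The integer scores are invented, not the university data described above. With odd-sized samples every bootstrap median is one of the observed whole numbers, so many resampled differences land exactly on zero.

```python
import random
import statistics

def diff_median_bootstrap(a, b, reps=4000, conf=0.95, seed=2):
    """Percentile bootstrap interval for median(a) - median(b), plus the
    share of resampled differences that are exactly zero."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.median(ra) - statistics.median(rb))
    diffs.sort()
    lo = diffs[int(reps * (1 - conf) / 2)]
    hi = diffs[int(reps * (1 + conf) / 2) - 1]
    share_zero = sum(d == 0 for d in diffs) / reps
    return lo, hi, share_zero

# Made-up whole-number scores out of 10 for two groups
group_a = [6, 7, 7, 8, 6, 7, 9, 7, 8, 6, 7, 7, 8, 5, 7]
group_b = [6, 6, 7, 7, 5, 6, 7, 8, 6, 7, 6, 7, 6, 5, 7]
print(diff_median_bootstrap(group_a, group_b))
```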

I did a little surfing to see what the word on the web was. To find out who said what, drop the entire phrase into Google. (Ah, ’tis a wonderful world we live in, indeed.)

- “The **mean** is the one to use with symmetrically distributed data; otherwise, use the **median**.” Hmm – but if the data is symmetric, surely the mean = the median?
- “An important property of the **mean** is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.” Ok – hard to argue with that.
- “Calculation of **medians** is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.” Totally!
- “However, when the sample size is large and does not include outliers, the **mean** score usually provides a better measure of central tendency.” (Then goes on to give an example of when the median is better.)
- “Use the median to describe the middle of a set of data that does have an outlier. Advantages of the median: Extreme values (outliers) do not affect the median as strongly as they do the mean, useful when comparing sets of data, it is unique – there is only one answer.

  Disadvantages of the median: Not as popular as **mean**.” (Not as popular??!)
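The second quote's claim about deviations is easy to check with exact fractions. In this made-up data set the deviations from the mean cancel to zero while those from the median do not; the median is instead a minimiser of the sum of absolute deviations.

```python
from fractions import Fraction
import statistics

data = [1, 2, 3, 10]
mean = Fraction(sum(data), len(data))        # 4
median = Fraction(statistics.median(data))   # 5/2

print(sum(x - mean for x in data))    # 0
print(sum(x - median for x in data))  # 6
# Sum of absolute deviations: 10 from the median, 12 from the mean
print(sum(abs(x - median) for x in data), sum(abs(x - mean) for x in data))
```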

Sorry median – you do not win X-Factor for summary statistics. You may be more robust, and less fickle, not to mention easier to understand, but you just aren’t as popular!

I can feel a video coming on – the median has been relegated to the periphery long enough!

Here is our video about different summary statistics, which also addresses the relative merits of mean and median, and why they even matter!


## 40 Comments

Nice article. As a statistician, I’m a huge fan of the median. I’ve worked on a large simulation study, and even small departures from normality result in the median being a better estimate of central tendency (even in quite large sample sizes).

Most software gives a p-value for non-parametric tests, such as the Wilcoxon Rank Sum Test (WRST). What a lot of people don’t know is a neat trick to work out a confidence interval. If you have two samples, say two treatments in a clinical trial, add a constant to all the values in one sample: if the p-value from the WRST is then non-significant, that constant is within the confidence interval for the shift between the two samples. By either playing around, or adding an increasing range of constants to one sample, you can EASILY get the confidence interval.
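Kevin's trick can be sketched in Python as a grid search: add each candidate constant to one sample, rerun the rank-sum test, and keep the constants that are not significant. This is only a sketch, assuming a normal approximation to the Wilcoxon rank-sum test with no tie or continuity correction; statistical packages use exact or corrected versions, so their intervals may differ a little. The data, the 0.1 grid step and the search range of -20 to 20 are arbitrary.

```python
from statistics import NormalDist

def ranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def rank_sum_p(x, y):
    """Two-sided rank-sum p-value via the normal approximation
    (no tie or continuity correction: a simplification)."""
    n1, n2 = len(x), len(y)
    r = ranks(list(x) + list(y))
    u = sum(r[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mu) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

def shift_ci(x, y, alpha=0.05, grid=None):
    """Keep every constant d for which x versus (y + d) is non-significant."""
    if grid is None:
        grid = [k / 10 for k in range(-200, 201)]
    kept = [d for d in grid if rank_sum_p(x, [v + d for v in y]) >= alpha]
    return min(kept), max(kept)

# Made-up data for two treatments
x = [12, 15, 14, 10, 13, 16, 11, 14, 15, 12]
y = [8, 9, 7, 10, 6, 9, 8, 7, 11, 9]
print(shift_ci(x, y))
```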

Lastly, there’s one case where I do think it makes sense to use the mean, even when distributions aren’t normal. That’s where you are analysing amounts of money (say the cost of an illness). Quite often, distributions of cash are highly skewed, but there is interest in TOTAL spend. In these cases, the mean can be more relevant.
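Kevin's point about total spend, in numbers, using invented and deliberately skewed costs: multiplying the mean by the count gives back the total, which the median cannot do.

```python
import statistics

# Made-up, skewed per-patient costs
costs = [100, 120, 150, 200, 5000]
n = len(costs)

print(statistics.mean(costs) * n)    # 5570, the total spend
print(statistics.median(costs) * n)  # 750, nowhere near the total
```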

Thanks Kevin – great to hear from a practitioner. That is a really good point about the total.

Someone on Twitter pointed out “Depends on the application. Median is good for giving a “typical” value, but median speed won’t help me predict my travel time”

This is another case where what we really want is related to the total, rather than the average. Or something like that.

Thanks, Dr Nic, I have gained a lot.

After reading Kevin’s comment and then the comment about “predict my travel time”, I am kind of confused and have some questions. First, why couldn’t you predict your travel time with the median? Second, I run a crew of about sixteen installers for work. I want to track their install rates in ft/hr and see what I can expect from them in a day’s work. I want to use the median, but after reading this last comment about predicting travel time it seems like the mean might be better? Most days they install about 300 ft/hr, but there are many outliers, especially with bad weather. Any help on this would be great, thanks.

Hi Carson

It would be good to graph your data and look at that before deciding between the mean and the median. It sounds as if the median is a better option in this case.

The mean does have a smaller sampling error than the median, and it is important to the calculation of the variance and standard deviation.

One number (mean, median etc.) is seldom enough to describe a set of numbers; a standard deviation helps. But a plot of the distribution (cumulative?) is often the best answer.

So true, but the sad fact is that often only one number is given. Box plots and dotplots are emphasised in the new NZ curriculum, over single value summaries. In the world of politics and economics, however, we are usually fed only the mean. I wish we were told the standard deviation more often. (As in ever!)

Regression is the main tool in the statistician’s tool box and that is based all around averages, e.g. if Y is height and X is an indicator 0,1 for men and women respectively, then the estimates in a regression give you the average male height and the difference between the average male height and average female height. Since regression and its generalisations are pervasive in the applied literature, it’s quite hard to change. (I quite like quantile regression but it doesn’t give unique solutions.)
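The claim about the 0/1 indicator can be checked with the one-predictor least-squares formulas. The heights below are invented; the intercept comes out at the men's average and the slope at the women's average minus the men's.

```python
import statistics

# Made-up heights in cm: X = 0 for men, 1 for women
men = [175, 180, 170, 185]
women = [160, 165, 170]
x = [0] * len(men) + [1] * len(women)
y = men + women

# Ordinary least squares with one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
xbar, ybar = statistics.fmean(x), statistics.fmean(y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
slope = sxy / sxx
intercept = ybar - slope * xbar

print(intercept)  # about 177.5, the average male height
print(slope)      # about -12.5, average female minus average male height
```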

I think it also revolves around this theorem (whose name I’ve forgotten) – “The most powerful test of size alpha is the Likelihood Ratio Test” – and most estimators coming out of maximising likelihoods are means or functions of means.

Thanks for that. Makes sense.

Hi Dr. Nic,

To the best of my knowledge, bootstrap DOES NOT WORK for the median. There are asymptotic methods that involve estimating the density, and there are non-parametric methods based on the median – basically inverting the sign test, which leads to intervals whose endpoints are quantiles (as I am sure you know).
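The sign-test inversion mentioned here gives an interval whose endpoints are order statistics of the sample. A minimal sketch, assuming the usual Binomial(n, 1/2) argument for continuous data: the interval from the k-th smallest to the k-th largest value covers the median with probability 1 - 2P(X ≤ k - 1), and we take the largest k that still reaches the target confidence. The function names and data are hypothetical.

```python
from math import comb

def binom_cdf_half(k, n):
    """P(X <= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def median_ci_sign_test(data, conf=0.95):
    """Distribution-free interval [x_(k), x_(n-k+1)] for the median, using
    the largest k whose coverage 1 - 2 P(X <= k - 1) is still >= conf."""
    x = sorted(data)
    n = len(x)
    best = None
    for k in range(1, n // 2 + 1):
        coverage = 1 - 2 * binom_cdf_half(k - 1, n)
        if coverage < conf:
            break
        best = (k, coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence")
    k, coverage = best
    return x[k - 1], x[n - k], coverage

print(median_ci_sign_test([3, 9, 1, 7, 5, 10, 2, 8, 4, 6]))  # (2, 9, 0.978515625)
```

No resampling is involved: the endpoints are read straight from the sorted data, which is why such intervals predate cheap computing.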

In business, the mean is more “mean”ingful than the median! Would any business person really care about median monthly profit? Mean monthly profit means much more, because if you multiply it by 12 you get the total profit. Ditto for sport. The Aussie cricket team members might have a higher batting median than another team, but this would say little about the probability of winning. Batting averages, on the other hand, predict long-run team score.

Paul Swank: No, the mean does not always have a smaller sampling error than the median. It depends on the underlying distribution. For normal data you are correct: the median is about 64% efficient (asymptotically 2/π). For heavy-tailed distributions, the median gets better. For Laplace errors it is more efficient than any other estimator – it is the MLE!
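A quick simulation sketch of the efficiency point, comparing the sampling variance of the mean and of the median over repeated samples. The sample size, number of repetitions and seed are arbitrary choices; the Laplace draws rely on the standard fact that the difference of two independent exponential variables is Laplace distributed.

```python
import random
import statistics

def variance_ratio(draw, n=51, reps=2000, seed=3):
    """Var(sample mean) / Var(sample median) across repeated samples.
    Below 1 the mean is more precise; above 1 the median is."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

def normal_draw(rng):
    return rng.gauss(0.0, 1.0)

def laplace_draw(rng):
    # The difference of two independent Exp(1) variables is Laplace(0, 1)
    return rng.expovariate(1.0) - rng.expovariate(1.0)

print(variance_ratio(normal_draw))   # expect roughly 2/pi, about 0.64
print(variance_ratio(laplace_draw))  # expect well above 1
```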

“To the best of my knowledge, bootstrap DOES NOT WORK for the median.” Isn’t that an overstatement? See for example Biometrika (2001) 88 (2): 519-534 (Brown, Hall and Young): “Even in one dimension the sample median exhibits very poor performance when used in conjunction with the bootstrap. For example, both the percentile‐t bootstrap and the calibrated percentile method fail to give second‐order accuracy when applied to the median.” Much depends on the distribution.


The bootstrap is certainly unsatisfactory for extreme quantiles.

“Would any business person really care about median monthly profit?” I agree the business wouldn’t, but that is because the business actually cares about the total itself. Calculating a mean provides a cosmetic benefit over the total. The mean (“average monthly”) reveals nothing new; it is merely a more memorable/familiar unit.

Similarly, measuring in a metric unit rather than a less familiar unit doesn’t change the “true” quantity.

Actually, it may be outdated now, but there was an old saw about the income of lawyers, and it really dealt with the median, an astonishingly low dollar figure. The stratospheric incomes of the few highly successful lawyers skewed the average, and led to misleading career advice regarding the profession. If one among a thousand people had won a billion-dollar lottery, a million-dollar average would look pretty good, but the median winnings would still be zero. The basic lay distinction is that the mean or average is a mathematical result, focusing on the data taken from a number of samples, while the median gives equal weight to each sample, which can provide directions for additional study regarding what might account for distributions of results in the population. I once scored 29 out of 100 on a calculus test, where two people scored 98 and 99; I received a C, thankfully, because the curve took note of the median score, not the average. The mean is essentially a single calculation, while the median requires two steps in its calculation: an ordered distribution based on the data, then a simple numerical midpoint in the distribution, independent of the data itself. Implicit, also, in any statistical analysis is the assumption of relationships of some kind, which may not always exist; everyone has a height, but not everyone has an income. We can reasonably anticipate average and median heights to be close, but incomes, especially in corrupt, bankrupt third-world dictatorships, may display peculiar statistical abnormalities. Numbers that reflect measurements of natural life processes may have a much narrower range, while those associated with abstracted measures of human invention can reach to the limits of the universe.
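The lottery example in a few lines, assuming exactly one billion-dollar winner among 1,000 people:

```python
import statistics

winnings = [0] * 999 + [1_000_000_000]
print(statistics.mean(winnings))    # 1000000
print(statistics.median(winnings))  # 0.0
```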

Thanks for those great examples to add to the discussion.

Hi Nic

The mean has the property of being the best linear unbiased predictor, as long as the distribution of the data is reasonably well behaved. A lot of stats analysis, as opposed to description, is geared towards being able to predict things, so the mean is therefore preferable. The distribution issue is not usually a problem when using the mean, as long as data is not really sparse, because of the Central Limit Theorem.

I have never personally found a use for the mode. However, I guess it must have a place in descriptive stats applied in some areas.

Thanks for that. I appreciate getting the balance of the argument.

I’ve always wondered about the point of the mode.

Hi Nic,

I thought the mode was only useful for ordinal data but otherwise was rather pointless as a measure of ‘central tendency’.

It is instructive that most of the comments relate to the purpose to which the estimate of mean/median will be used. Real statistical applications rarely have as their purpose the estimation of such a parameter, it is merely a step in the process. Unfortunately much teaching – high and low level – ignores this context.

This issue extends beyond the mean/median debate. For example, skewed data from many applications such as geochemistry are best approximated by the lognormal distribution. However, it may not make sense to consider means on a log scale, since they lose the additivity that may be fundamental to the application.

Orthogonal to this is the mathematical context. Medians are ugly mathematically and complex analyses based on them can be even more ugly (think of Tukey’s median polish, essentially iterative proportional scaling with medians – easy to describe, messy to do, impossible to really understand). While we should not over constrain any analysis to match our mathematical limitations, we should not ignore the enormous benefit we can get from applying mathematical understanding.
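For the curious, a bare-bones Python sketch of median polish as it is commonly described: sweep the row medians out of the table, then the column medians, re-centring the effects each time, for a fixed number of sweeps. Real implementations add a convergence check. The table is made up and exactly additive, so the residuals should vanish.

```python
import statistics

def median_polish(table, sweeps=10):
    """Fit table[i][j] ~ overall + row[i] + col[j] + resid[i][j] by
    repeatedly sweeping out row and column medians (a bare-bones sketch)."""
    resid = [[float(v) for v in r] for r in table]
    nr, nc = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * nr, [0.0] * nc
    for _ in range(sweeps):
        # Row sweep: remove each row's median and add it to that row's effect
        for i in range(nr):
            m = statistics.median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row[i] += m
        # Move the median column effect into the overall term
        m = statistics.median(col)
        col = [c - m for c in col]
        overall += m
        # Column sweep: remove each column's median and add it to that column's effect
        for j in range(nc):
            m = statistics.median([resid[i][j] for i in range(nr)])
            for i in range(nr):
                resid[i][j] -= m
            col[j] += m
        # Move the median row effect into the overall term
        m = statistics.median(row)
        row = [r - m for r in row]
        overall += m
    return overall, row, col, resid

# Made-up table built as 10 + row effect (0, 1, 2) + column effect (0, 3, 5)
table = [[10, 13, 15],
         [11, 14, 16],
         [12, 15, 17]]
overall, row, col, resid = median_polish(table)
print(overall, row, col)
```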

Thanks for that really helpful comment. As my interest is very much at the beginner/consumer level of statistical education it is great to have people provide a more advanced perspective.

Technically, and as a statistician, I prefer the median to the mean. The median is robust and resistant to outliers in the dataset, unlike the mean, which is highly affected/influenced by extreme observations in the dataset.

Thanks

Perhaps it’s worth noting that in survival analysis, although things like “mean survival” are defined, they are very rarely used – the median, and other quantiles, reign supreme. Censoring of survival times means that the calculation of a mean involves extrapolation; the highly skew nature of survival times removes much of the interpretable meaning of the mean.

Please see MDST242 “Statistics in Society” – an Open University course. That uses median + quartiles + deciles + extremes. You can select from these and get measures of dispersion, skewness, kurtosis etc. Confidence interval on the median is also a cinch.

Hi Nic,

In slight defense of the mean (ie, when you’re forced to present only 1 number – such as on a balance sheet)… generally it is better to present the mean in respect of liabilities rather than the median. Typically liabilities are skewed (if things go bad, they go very bad), so an estimate that responds to outliers is actually handy here. 🙂

For instance, when estimating the ‘long-tail liability losses’ for insurance companies the total losses are very skewed. Although the median is a better estimate of the centre of the distribution, the mean is a better choice for presentation in a balance sheet as it is much more conservative (and as a slight bonus, everyone ‘understands’ a mean so they tend to request it). In practice, you need to ensure that much more capital is available than the mean – assuming you want to stay in business.

It comes back, as several respondents have mentioned, to the purpose of the statistic. Generally I want at least three numbers: mean, standard deviation, and skewness.

Incidentally, another reason the mean may be preferred computationally over the median is that you can calculate it while only keeping track of three numbers: SUM(X) [ie, the sum of x to n-1], N, x. The median is harder to restrict in this way.
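The bookkeeping described here fits in a few lines: a running mean needs only a sum and a count, whereas an exact median has to keep every value until the end. A minimal sketch with a hypothetical class name:

```python
class RunningMean:
    """Update the mean one value at a time, storing only a sum and a count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x
        self.count += 1

    @property
    def mean(self):
        return self.total / self.count

rm = RunningMean()
for value in [4, 8, 15, 16, 23, 42]:
    rm.add(value)
print(rm.mean)  # 18.0
```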

I think no-one has yet mentioned the question of multidimensional data, and defining a “location” for these. Suppose data consist of locations of lightning strikes, expressed as 2-dimensional coordinates. Then “nice” properties for a location might include that the selected point should not depend on the coordinate system used (invariance to rotation). In more general cases this might be extended to invariance to affine transformations. A co-ordinatewise median point does not satisfy these. Of course there is a multivariate version of the median that can cope with the rotation invariance, and there are others, some based on counting shells. But still, these seem to fail the requirement, which might be either an extension or a reversal of the above, in that one might not want the per-coordinate location of the multidimensional location to depend on what other dimensions have been measured or are being considered.

Thus, in a multivariate setting, the mean is invariant to linear transformations, while the co-ordinatewise median is not. Neither the mean nor the co-ordinatewise median is affected by dropping variables that are not of interest. Still, stating the co-ordinatewise medians may be preferable to the mean, or not, depending on the purpose to which you think the summary might be put. For example, suppose data consist of daily values of (weight of) sediment transported past a point in a river … then the mean daily value is immediately informative about the total weight transported in, say, a month, while the median daily value is not.

Other comments have emphasized describing more of the distribution than just the location. This applies even more to multivariate data.
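The rotation point can be checked numerically. With three made-up points rotated by 45 degrees, averaging commutes with the rotation, but the co-ordinatewise median does not.

```python
import math
import statistics

def rotate(p, theta):
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def coord_median(points):
    return (statistics.median(p[0] for p in points),
            statistics.median(p[1] for p in points))

def coord_mean(points):
    return (statistics.fmean(p[0] for p in points),
            statistics.fmean(p[1] for p in points))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
theta = math.pi / 4
rotated = [rotate(p, theta) for p in pts]

# Mean: rotate-then-average agrees with average-then-rotate
print(coord_mean(rotated), rotate(coord_mean(pts), theta))
# Co-ordinatewise median: the two orders disagree
print(coord_median(rotated), rotate(coord_median(pts), theta))
```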

In your post you said:

“The mean is the one to use with symmetrically distributed data; otherwise, use the median.” Hmm – but if the data is symmetric, surely the mean = the median?

The point here should be that while “the mean = the median”, the properties of the sample mean and the sample median are different (and of course their values are usually different). So your question really splits into two parts: what thing should you be trying to estimate as a summary of the data, and then how should you estimate that quantity.

I don’t see why you want to introduce “resistance” in place of “robustness”. The latter term is standard statistical terminology.

As pointed out previously, calculating any statistic is either a step in a wider inference or a quick summary. You are right that given any collection of numbers most people’s instinct is to add them up! The mean may not be a “meaningful” summary. Apart from skewed data, multimodal data may deliver a mean that has no relevance to the measured variable. The mean salary in a company is bloated by the CEO’s plundering. Regarding modes, tailors seem to plan for the modal number of legs, not the mean.

Someone mentioned lognormal distributions, and I work with these a lot. When any distribution can be transformed to symmetry, the median and the mean of the transformed values become similar, so the log-mean or geomean of a skewed distribution is reasonably estimated by the median.
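A rough simulated check of this, assuming lognormal data whose logs are standard normal (so the true median and geometric mean are both 1): the sample geometric mean and median come out close, while the arithmetic mean is pulled well above them by the long right tail. The seed and sample size are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(4)
# Simulated lognormal sample: log(value) ~ Normal(0, 1)
values = [rng.lognormvariate(0.0, 1.0) for _ in range(10_001)]

geo = statistics.geometric_mean(values)  # exp(mean of the logs)
med = statistics.median(values)
avg = statistics.fmean(values)           # true mean is exp(0.5), about 1.65
print(round(geo, 3), round(med, 3), round(avg, 3))
print(abs(math.log(geo) - math.log(med)))  # small gap between geomean and median
```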

I too am a fan of the median, but as one of Tukey’s five-number summary; hence the median rather than the mean is usually shown on boxplots. Means are sometimes added as symbols.

Someone else mentioned SD. Mean and SD are sufficient statistics to describe a normal distribution. If you *assume* your sample comes from a normal distribution, Mean and SD are sensible summary values.

Sorting is of course a well-studied computing problem (cf. Knuth). It’s very odd to have people writing as if we were still in the 1950s when computer power was scarce and expensive. For any non-trivial stats calculations, get a proper stats package. Then you type in the values (or scan, or download), and get the whole slew of summary statistics (mean, median, mode, quartiles, min, max, SD, trimmed mean, …) to examine for data cleaning before deciding which to report as *information*.

Allan

Several respondents have mentioned the difficulty of working with the median for purposes of statistical theory. Classically, expectations have been central; they are theoretical means. The theory is a good approximation, in modest-sized samples, only if the data are on a scale on which they are not badly asymmetric. Where a monotonic transformation (often log()) is used that leads to a roughly symmetric distribution, does one need to correct for the bias induced on the original scale? Not if the medians (which are unaffected by monotone transformation) are the appropriate measures.

Modern computational abilities free us somewhat from the constraints of an expectation-tied theory. ‘Somewhat’ is the key word here. As an aside, the limitations of that theory, and the common role of recourse to empirical approaches that may involve heavy computation, become even more important when dealing with the dependence that is often present in the observational data with which most statisticians work most of the time.

“The median is ‘resistant’ because it is not at the mercy of outliers.”

I love it! I’ll try it out on my students tomorrow.

I was surprised not to see a mention of the trimmed mean given that the median is the 50% trimmed mean and the mean is the 0% trimmed mean. Most statistical packages have a trimmed mean function.
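A small sketch of that family, assuming the common convention of trimming floor(proportion × n) values from each end; when nothing would be left, the function falls back to the median as the limiting case. Packages differ in their exact trimming rules, and the function name is hypothetical.

```python
import statistics

def trimmed_mean(data, proportion):
    """Drop int(proportion * n) values from each end, then average the rest.
    proportion = 0 gives the ordinary mean; if nothing would remain,
    return the median as the limiting case."""
    ordered = sorted(data)
    n = len(ordered)
    k = int(proportion * n)
    if 2 * k >= n:
        return statistics.median(ordered)
    return statistics.fmean(ordered[k:n - k])

data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, 0.0))  # 22.0, the ordinary mean
print(trimmed_mean(data, 0.2))  # 3.0
print(trimmed_mean(data, 0.5))  # 3.0, only the middle value is left
```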

The median and the trimmed mean lead on to the general area of Robust Statistics, which has been studied extensively since the 70s. Robust Statistics have not made much of an impact in applied statistics, at least beyond the area of descriptive statistics. Why is this? Well much of applied statistics is built around the linear model and the ANOVA decomposition, which is built on the properties of the L2 norm which underlies the mean and variance.

Another chunk of applied statistics is based on statistical models and the likelihood function. Many of the MLEs lead to statistics that are not robust. One way to keep both the likelihood function and robustness is to mix the statistical model with a parameter-free heavy-tailed component intended to diminish the influence of outliers on model parameters. Rohan Maheswaran studied this approach in his thesis.

This is a great post, Dr. Nic, and I appreciate everyone’s discussions about it.

1) In summarizing or describing a data set, it’s always a good idea to use multiple summary statistics and visualizations. I like both the mean and the median, so I like to use both. In fact, I like the 5-number summary, the mean, the variance, and a plot of the data (histogram, bar chart, scatter plot).

2) The mean often shows up as a sufficient statistic for the parameters of many distributions. Thus, it is often used to find an estimator with a lower variance using the Rao-Blackwell Theorem.

3) The mean is often the maximum likelihood estimator for the parameters of many distributions. Beyond its use as a point estimator, it also has many nice large-sample properties that can be used for inference.

4) Echoing Peter Lane’s comment, a nice thing about the mean is the ease of using it for inference; its sampling distribution can be easily found based on the Central Limit Theorem!

Eric Cai – The Chemical Statistician

http://chemicalstatistician.wordpress.com

https://twitter.com/chemstateric

[…] I’ve written previously about what a difficult concept a mean is, and then another post about why the median is often preferable to the mean. In that one I promised a video. Over two years ago – oops. But we have now put these ideas into a […]

Each statistic has its place. I most often use the mean in situations where the values I’m getting are precise and when all data need to be taken into account, such as physical measurements from sensors. These measurements can still have outliers, but the scientist must then justify their omission with a relevant criterion. The median is used in situations where the data can be sporadic and heavily prone to bias or outliers. I would use the median, for instance, if I was asking people to rate a movie, because many may say 0 or 10, which is not an accurate reflection of how they actually felt. An example of this being done is in judged sports, where the high and low scores are most often omitted. Conversely, there are some situations where only the mean makes sense. For instance, if lottery winnings were described by the median they may show a 0% return, which is not a good representation of the return. If the same was done with the mean it may show a 67% return, which is what players should actually expect.

Hi Patrick

Thanks for your excellent examples of when to use the median and the mean. I like that they are contextual rather than just rules.

Nic

Sorry median – you do not win X-Factor for summary statistics. You may be more robust, and less fickle, not to mention easier to understand, but you just aren’t as popular!

in all the debate, that sense of humor though!!!!

the info is quite useful !!

[…] the measuring of income it uses mean income of the specialty instead of median of the specialty. Median income is better than mean income as a way of measuring income because it is more robust to outliers and more robust […]

If you transform the data with an affine transformation, then both the mean and the median (and mode) transform the same as the data. However, if you transform the data with a non-affine monotonic transformation, then only the median transforms with the data.
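A quick numerical check of this, with made-up, highly skewed data: an affine change a + bx moves the mean and the median in step with the data, but after taking log10 only the median still matches.

```python
import math
import statistics

data = [1.0, 10.0, 100.0, 1000.0, 10000.0]

# Affine transform a + b*x: mean and median both follow the data
a, b = 3.0, 2.0
affine = [a + b * x for x in data]
print(statistics.fmean(affine), a + b * statistics.fmean(data))
print(statistics.median(affine), a + b * statistics.median(data))

# Non-affine monotone transform (log10): only the median follows
logged = [math.log10(x) for x in data]
print(statistics.median(logged), math.log10(statistics.median(data)))  # both about 2
print(statistics.fmean(logged), math.log10(statistics.fmean(data)))    # about 2 vs about 3.35
```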

Thank you so much. God bless you.

Perfectly written and clear to understand. Thank you.

Very nice video!

I think the problem that your social media commentariat demonstrated is not so much the confusion about mean and median as a confusion about what we are trying to achieve by calculating them in the first place. For most people of non-technological vocation, that would be “representing the typical number”. Which is, for typical everyday distributions, more likely to be the median than the mean. I especially like that your video also includes the mode and strongly emphasizes the importance of actually looking at the distribution before deciding how to compress it into a single number.

I can think of only two things that I would add if I used this in teaching.

The first thing I might have added is to point out the reason for the skew of your example distribution, because it is so commonly occurring: the distribution is defined only on [0,+infinity[ since you can’t own a negative number of shoes. I might also have taken the opportunity to point out that this not infrequently leads to distributions which have their maximum at 0 and then just tail off to infinity. And that in such cases the central tendency, while perfectly definable mathematically, probably no longer tells you the most relevant thing about the distribution. But I guess you get to that in the next lesson 🙂

Secondly, I think that your graphics with the students being sorted for the median and then shoes and students just piled up and divided are useful for getting an intuitive idea about why the median is often a better single-number-measure: it contains more information about the distribution. You can compute the mean from less information than you need for the median. In your video this can be seen by comparing the simplistic concepts of the two piles for the mean with the more sophisticated treatment of first sorting and then divvying up the student group into two equal-sized groups.

…which brings us to another case in which the mean might have to be used.

Suppose we decided to make an independent measure of the distribution by sneak(er)ing into the empty student dorms on Barefoot Athletics Day in honor of Abebe Bikila. Due to the general student approach to tidiness, we come away with a great count of the total number of shoes for men and women, but a rather foggier view of precisely how these shoes couple to individual students. In this case we might conclude that the great uncertainties in our assignment of the individual shoes and the great accuracy of our total shoe count, 1623.5 pairs (that student tidiness again), might lead us to prefer the mean, since it can be computed from more certain data.