What can you do with ordinal data? Or more to the point, what shouldn’t you do with ordinal data?

First of all, let’s look at what ordinal data is.

It is usual in statistics and other sciences to classify types of data in a number of ways. In 1946, Stanley Smith Stevens suggested a theory of levels of measurement, in which all measurements are classified into four categories: Nominal, Ordinal, Interval and Ratio. This categorisation is used extensively, and I have a popular video explaining them. (I do group Interval and Ratio together, though, as there is not much difference in their behaviour for most statistical analyses.)

Nominal is pretty straightforward. This category includes any data that is put into groups with no inherent order. Examples of nominal data are country of origin, sex, type of cake, or sport. Interval/ratio data is similarly easy to explain: it is something that is measured, such as length, weight, time (duration), cost and similar. These two categorisations can also be described as qualitative and quantitative, or non-parametric and parametric.

But then we come to the ordinal level of measurement. This is used to describe data that has a sense of order, but for which we cannot be sure that the distances between consecutive values are equal. For example, level of qualification has a sense of order:

- A postgraduate degree is higher than
- a Bachelor’s degree, which is higher than
- a high-school qualification, which is higher than
- no qualification.

There are four steps on the scale, and it is clear that there is a logical sense of order. However, we cannot sensibly say that the difference between no qualification and a high-school qualification is equivalent to the difference between the high-school qualification and a bachelor’s degree, even though both of those are represented by one step up the scale.

Another example of the ordinal level of measurement, used extensively in psychological, educational and marketing research, is the Likert scale. (Though I believe the correct term is actually Likert item – and according to Wikipedia, the pronunciation should be Lick it, not Like it, as I have used for some decades!) A statement is given, and the response is given as a value, often from 1 to 5, showing agreement with the statement. Often the words “Strongly agree, agree, neutral, disagree, strongly disagree” are used. There is clearly an order in the five possible responses. Sometimes a seven-point scale is used, and sometimes the “neutral” response is eliminated in an attempt to force the respondent to commit one way or the other.

The question at the start of this post has an ordinal response, which could be perceived as indicating how quantitative the respondent believes ordinal data to be.

What prompted this post was a question from Nancy under the YouTube video above, asking:

“Dr Nic could you please clarify which kinds of statistical techniques can be applied to ordinal data (e.g. Likert-scale). Is it true that only non-parametric statistics are possible to apply?”

As shown in the video, there are the purists, who are adamant that ordinal data is qualitative. There is no way that a mean should ever be calculated for ordinal data, and the most mathematical thing you can do with it is find the median. At the other pole are the practical types, who happily calculate means for any ordinal data, without any concern for the meaning (no pun intended).

So the answer to Nancy would depend on what school of thought you belong to.

Not all ordinal data is the same. There is a continuum of “ordinality”, if you like.

There are some instances of ordinal data which are pretty much nominal, with a little bit of order thrown in. These differ from nominal data only in that they should always be graphed as a bar chart (rather than a pie chart)*, because there is inherent order. The mode is probably the only sensible summary value other than frequencies. Of the examples above, I would say that “level of qualification” is only barely ordinal, and I would not support calculating a mean for it. It is clear that the gaps are not equal, and additionally any non-integer result would have a doubtful interpretation.

Then there are other instances of ordinal data for which it is reasonable to treat it as interval data and calculate the mean and median. It might even be supportable to use it in a correlation or regression. This should always be done with caution, and an awareness that the intervals are not equal.

Here is an example for which I believe it is acceptable to use the mean of an ordinal scale. At the beginning and the end of a university statistics course, the class of 200 students is asked the following question: How useful do you think a knowledge of statistics will be to you in your future career? Very useful, useful, not useful.

Now this is not even a very good Likert question, as the positive and negative elements are not balanced. There are only three choices, and there is no evidence that the gaps between the elements are equal. However, if we score the elements as 3, 2 and 1, respectively, and find that the mean for the 200 students is 1.5 before the course and 2.5 after the course, I would say that there is meaning in what we are reporting. There are specific tests to use for this – and we could also look at how many students changed their minds positively or negatively. But even without the specific test, we are treating this ordinal data as something more than qualitative. What also strengthens the case for doing this is that the question is put to the same students, who will probably perceive the scale in the same way each time, making the comparison more valid.
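As an illustration of the two summaries mentioned above, here is a minimal Python sketch that computes the mean of the 3/2/1 scores and counts how many students moved up, stayed put or moved down. The 200 paired responses are hypothetical, invented only so that the means come out at the 1.5 and 2.5 quoted in the example; the post does not give the raw data.

```python
from statistics import mean

# Scores as in the example: not useful = 1, useful = 2, very useful = 3.
SCORES = {"not useful": 1, "useful": 2, "very useful": 3}

# Hypothetical (before, after) answers for 200 students, chosen only so the
# means match the 1.5 and 2.5 quoted in the text.
pairs = (
    [("not useful", "useful")] * 80
    + [("not useful", "very useful")] * 20
    + [("useful", "very useful")] * 80
    + [("useful", "useful")] * 20
)

before = [SCORES[b] for b, _ in pairs]
after = [SCORES[a] for _, a in pairs]
print(mean(before), mean(after))  # 1.5 2.5

# How many students changed their minds positively or negatively?
up = sum(a > b for b, a in zip(before, after))
same = sum(a == b for b, a in zip(before, after))
down = sum(a < b for b, a in zip(before, after))
print(up, same, down)  # 180 20 0
```

The counts of upward and downward moves depend only on the order of the categories, so they do not rely on the 3/2/1 scores being equally spaced; the means do.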

So what I’m saying is that it is wrong to make a blanket statement that ordinal data can or can’t be treated like interval data. It depends on the meaning and the number of elements in the scale.

And again the answer is that it depends! In my business statistics classes I told students that it depends. If you are teaching a mathematical statistics class, then a more hard-line approach is justified. However, at the same time as saying “you should never calculate the mean of ordinal data”, it would be worthwhile to point out that it is done all the time! Similarly, if you teach that it is okay to find the mean of some ordinal data, I would also point out that there are issues with regard to interpretation and mathematical correctness.

*Yes, I too eschew pie charts, but for two or three categories of nominal data, where there are marked differences in frequency, if you really insist, I guess you could possibly use them, so long as they are not 3D and definitely not exploding. But even then, a bar chart is better. Perhaps that is a post for another day, though so many have already written about it.

## 47 Comments

I would like to mention a couple of downloadable articles relevant to this post.

Warren Sarle has a very comprehensive FAQ on “Measurement Theory”.

ftp://ftp.sas.com/pub/neural/measurement.html

A more sceptical view of the area is given by Paul Velleman and Leland Wilkinson, each a statistical package developer, in the article “Nominal, Ordinal, Interval, and Ratio Typologies are Misleading”, an expansion of a 1993 American Statistician article available from

http://www.cs.uic.edu/~wilkinson/Publications/stevens.pdf

Sarle confronts some of Velleman and Wilkinson’s 1993 objections in his article.

Dr Nic suggests that there are not many statistical consequences of a difference between the Interval and Ratio levels. One might be that a log transformation is often appropriate with Ratio level data, whereas with merely Interval data y = log(x + alpha), with alpha estimated either formally or informally, might be required.

Thanks Murray. I will read the articles.

At the level I generally aim at, which is introductory statistics, the students are unlikely to need to know the difference, but it is worth bearing in mind.

Sure Nic – yes, the level of these articles is well ahead of what a student could cope with directly, but it helps for a teacher to be a step or two ahead of the class. I read both articles in earlier versions some time ago, and it looks like both have been updated in response to feedback over the years. The kind of reader they will help most is the statistician who consults with social scientists, especially psychologists. Despite (or because of) the fact that the articles take opposing positions, I remember coming away feeling that I understood the main issues fairly well as a result of reading them.

Murray

On 8/07/2013 2:01 p.m., Learn and Teach Statistics and Operations Research wrote:

It’s a shame the dot-plot doesn’t get a look-in in the video or in your comments. The authoritative tone of the video in particular gives the impression that the bar chart is the acknowledged king for ordinal data, but all the statistical graphics gurus I have come across prefer the dot-plot.

Hi Peter

That is a very good point. I really like dot-plots too and used them all the time when I taught using Minitab.

I suspect the problem was that I was writing the video for a course that used Excel for producing graphs, and of course Excel doesn’t do dot-plots. I will rectify that if I redo the video!


Another common example of ordinal data at the “high” end of your scale is grouped interval data, i.e. interval data that has been grouped into a frequency table. This data can be displayed as a histogram, and numerical summaries such as the grouped mean and grouped standard deviation can be calculated. In the pre-computer age this was a common exercise for students, and there are still introductory Statistics textbooks around that contain these formulae (the early editions of Black’s Business Statistics spring to mind). It can be argued that with today’s computing power the need for discussing grouped data has gone, but nowadays a lot of “real” data is only published in frequency tables.
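The grouped mean and standard deviation mentioned above treat every observation as if it sat at the midpoint of its class. A minimal sketch, assuming a hypothetical four-class frequency table and the sample (n − 1) divisor; some textbooks divide by n instead.

```python
import math

# Hypothetical frequency table: (lower bound, upper bound, frequency).
table = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]


def grouped_mean_sd(classes):
    """Grouped mean and sample SD, placing every value at its class midpoint."""
    n = sum(f for _, _, f in classes)
    mids = [((lower + upper) / 2, f) for lower, upper, f in classes]
    m = sum(x * f for x, f in mids) / n
    # n - 1 divisor (sample SD); some textbooks use n instead.
    var = sum(f * (x - m) ** 2 for x, f in mids) / (n - 1)
    return m, math.sqrt(var)


g_mean, g_sd = grouped_mean_sd(table)
print(f"grouped mean = {g_mean:.3f}, grouped SD = {g_sd:.3f}")
```

Because the within-class spread is ignored, these are approximations to the mean and SD of the raw data, which is exactly the situation when only the frequency table is published.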


“qualitative and quantitative, or non-parametric and parametric” – PLEASE don’t repeat an old canard. There are no such things as parametric and non-parametric data. Those terms apply to the model applied to the data.

On ordinal data, it depends on whether you impute an underlying scale or have a set of nominal categories that are ordered but not on any measured dimension – defining the person with higher qualifications as cleverer seems a circular definition. There is skill and experience in this data analysis game – it’s not just do the sums and out pops THE answer.

Pie charts are popular and have a place as presentation devices. Like any rhetorical device, they can be used to give emphasis so that the message is clarified, or they can be abused to distort the plain message from the data. Hence I agree in loathing 3-D pies, because they introduce an uncontrolled distortion, but I have no issue with exploded segments: “among the categories shown we highlight … .” Graphs are misused because people are not taught to draw or read them – it’s assumed to be innate.

Dear Dr Nic

You ask what we teach. I’d like to quote from Statistics at Square One, 11th ed. (Campbell and Swinscow, 2009, p. 22):

“… the mean from ordered categorical variables can be more useful than the median, if the ordered categories can be given meaningful scores. For example, a lecture might be rated as 1 (poor) to 5 (excellent). The usual statistic for summarising the result would be the mean.”

Our research (into quality of life measures) shows that in many cases the mean of a set of ordinal data gives a ‘good enough’ result, although obviously care is needed!

Mike

Thanks Dr. Nic for the wonderful lesson. I am just curious: if the gaps between ordinal categories are equal, should we still avoid the mean?

It depends on the context. If you can guarantee that the gaps between the ordinal data are equal, you in fact have interval data, and it is fine to calculate the mean. The trick is that the gaps can look equal, but if you think hard about what they are actually measuring, we cannot guarantee that they are.

Likert scales are considered interval rather than ordinal. Prof. Naresh Malhotra, in marketing research, is also of the same opinion.

So far, as a beginner in this stats world, I have found your explanation the most helpful! What I am struggling with is how to make customer satisfaction data (on a scale of 1–5) more meaningful than “you have a 96% satisfaction rate.”


[…] one of the most consistently viewed posts in this blog is one I wrote over a year ago, entitled, “Oh Ordinal Data, what do we do with you?”. Understanding about the different levels of data, and what we do with them, is obviously an […]

Let’s not dumb it down and perpetuate bad practice. Here is what I say:

If the data is ordinal, one is usually interested in the proportion of respondents giving a response ‘at least as high as some value’. E.g. if the responses are like a lot, like, neutral, dislike, dislike a lot, then one would like to know how many chose like or like a lot. These choices may be labelled 1–5, but never take the mean.

So, descriptively, show a CUMULATIVE graph of the proportion choosing: 1; 1 or 2; 1, 2 or 3; 1, 2, 3 or 4.
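The cumulative proportions described above are just running totals divided by the sample size. A minimal sketch with hypothetical counts for the five categories (1 = like a lot, …, 5 = dislike a lot); these are the values one would plot in the cumulative graph.

```python
from itertools import accumulate

# Categories as in the comment, labelled 1-5.
LABELS = ["like a lot", "like", "neutral", "dislike", "dislike a lot"]
counts = [12, 30, 25, 20, 13]  # hypothetical responses, n = 100

n = sum(counts)
# Proportion choosing 1; 1 or 2; 1, 2 or 3; 1, 2, 3 or 4.
# The final total (1-5) is always 1, so it is dropped.
cumulative = [c / n for c in accumulate(counts)][:-1]

for k, p in enumerate(cumulative, start=1):
    print(f"categories 1-{k} ({', '.join(LABELS[:k])}): {p:.2f}")
```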

There is a procedure for comparing groups on ordinal data called ordinal regression. It is available in all commonly used statistical packages. What is compared is a transformation of cumulative proportion for each group or treatment. The transformation is either logistic or normal. The reason for the transformation is that raw proportions may have floor or ceiling effects.
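To make the transformation concrete, here is a descriptive sketch (not a fit of the ordinal regression model itself) that applies the logistic transformation to each group’s cumulative proportions and compares the groups category by category. The counts are hypothetical. Under a proportional odds (cumulative logit) model the differences would all be estimating one common shift, so roughly similar values across k are consistent with that model; statistical packages estimate the shift by maximum likelihood, and their sign conventions differ.

```python
import math
from itertools import accumulate


def cumulative_logits(counts):
    """Empirical logit of P(Y <= k) for k = 1..K-1.

    The top category is dropped because P(Y <= K) is always 1. This fails
    (log of zero) if any cumulative proportion is exactly 0 or 1.
    """
    n = sum(counts)
    logits = []
    for c in list(accumulate(counts))[:-1]:
        p = c / n
        logits.append(math.log(p / (1 - p)))
    return logits


# Hypothetical counts over five ordered categories (1 = like a lot ... 5 = dislike a lot).
group_a = [10, 25, 30, 20, 15]
group_b = [20, 35, 25, 12, 8]

diffs = [lb - la for la, lb in zip(cumulative_logits(group_a), cumulative_logits(group_b))]
for k, d in enumerate(diffs, start=1):
    print(f"k = {k}: logit difference (B - A) = {d:.2f}")
```

Here every difference is positive, meaning group B is more likely than group A to be at or below each cut-point, i.e. to give the more favourable responses.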

This is within the comprehension of business and psychology students, in my experience.

If non-parametric comparison means rank methods, e.g. Mann-Whitney, Wilcoxon etc., then these are as inappropriate as normal-based methods. They assume that all groups have the SAME shape of distribution, which is impossible for scales with few options, e.g. typical Likert ITEMS, and unlikely for scales with floor or ceiling effects.

However, normal-based methods and means are routinely used for Likert SCALES constructed by summing scores on many Likert items. This will not be too misleading if the sum scores are reasonably normally distributed. It will be misleading if there are strong floor or ceiling effects, as occurs for many diagnostic scales. For example, the Beck depression score has more than 20 items, but the majority of ‘non-depressed’ people have scores below 5.

Cumulative proportion is the best summary statistic for any ordinal scale. Ordinal regression is the appropriate analysis, and it is now available in most statistical packages for within-group as well as between-group comparisons.

These methods are rarely described in introductory texts, of course.

E.g. Dr. Nic does not mention cumulative proportions or ordinal logistic regression. Why?

This needs to CHANGE. The concepts are the same as for normal-based methods, so it is a small extension to ordinal methods, as the packages are available.

Nice article. People tend to dislike dealing with this sticky issue. And even among those who say they are “purists”, you commonly see them treat types of data which are ordinal (like IQ, which is really more of a ranking and certainly not quantitative) as if they were quantitative.

I’m running into this issue right now, dealing with a statistical problem involving the average of numerical rankings for a group of institutions. It’s not obvious which way to correctly consider the data, and I guess in the end, statistics being the empirical science that it is, the answer is “whatever works best”.


Comment

You said “… if we score the elements as 3, 2 and 1, respectively and find that the mean for the 200 students is 1.5 before the course, and 2.5 after the course, I would say that there is meaning in what we are reporting.”

Some questions about your statement:

• How can we rate “very useful = 3”, “useful = 2” and “not useful = 1”?

• Why not “very useful = 1000”, “useful = 100” and “not useful = 10”? Or “very useful = A”, “useful = B” and “not useful = C” (assuming that A is better than B, and B is better than C)?

How can you calculate an average for 200 students when 1, 2 and 3 are not numbers but just codes or labels with an increasing order, like c, b and a? In my study of mathematics and statistics I have never come across such a thing as an algebra of labels. If calculating the average is justified mathematically, and the scores 1 to 3 form a continuum from “not useful” to “very useful”, can you explain the unit of this measure? How is “useful” twice “not useful”? Is “useful” plus “not useful” the same as “very useful”? If your goal is just to see the development of student achievement (performance), a sign test of the change from “before” to “after” the intervention is enough, and there is no need to bend the rules for ordinal data. Your reasoning may not only mislead people who do not understand measurement theory but also invite people to justify the wrong rules in the life sciences.
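The sign test suggested here can be sketched in a few lines. This is a minimal exact version under common but assumed conventions (ties discarded, binomial with probability 1/2, two-sided p-value formed by doubling the smaller tail); the ten paired responses are hypothetical. Because only the direction of each change is used, recoding the categories as 10, 100 and 1000, as in the question above, gives exactly the same result.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test on paired ordinal responses.

    Only the direction of each change is used, so any order-preserving
    coding of the categories gives the same result. Ties are dropped.
    """
    n_plus = sum(a > b for b, a in zip(before, after))
    n_minus = sum(a < b for b, a in zip(before, after))
    n = n_plus + n_minus
    if n == 0:
        return n_plus, n_minus, 1.0
    k = min(n_plus, n_minus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_plus, n_minus, min(1.0, 2 * tail)


# Hypothetical paired answers: 1 = not useful, 2 = useful, 3 = very useful.
before = [1, 1, 2, 2, 1, 2, 1, 1, 2, 3]
after = [2, 2, 2, 3, 1, 3, 2, 1, 3, 3]
print(sign_test(before, after))  # (6, 0, 0.03125)

# The alternative coding from the question: not useful = 10, useful = 100, very useful = 1000.
recode = {1: 10, 2: 100, 3: 1000}
print(sign_test([recode[x] for x in before], [recode[x] for x in after]))  # same result
```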

Thank you.

Budi Hari Priyanto

Statistician

Hi Budi

Thank you for your contribution.

I just voted for ‘never’ quoting a mean for ordinal data. Then I read the article and yes, we do occasionally average things that look ordinal. But … what have we (and in your example, you) averaged? Usually, it is not the raw values, except by coincidence; ordinal raw values are not really numbers at all. ‘not useful’, ‘useful’, ‘very useful’ clearly do not have an arithmetic mean. But what the example did was, consciously or otherwise, rank them and then average the ranks. Rank is an interval scale; and we can do a reasonable collection of traditional statistics on ranks. It’s not hard to defend tests on mean rank as an indication of improved ranking. We can’t easily put the mean rank back on the original scale – is 2.4 really closer to ‘useful’ than ‘very useful’? – but we can test pretty reliably for an improvement between two treatments. Quite a lot of ‘score’ averaging used on ordinal data is just proxy rank averaging.
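The idea of averaging ranks rather than raw codes can be sketched as follows: pool the responses, give tied responses the average of the sorted positions they occupy (midranks), then compare mean ranks between groups. The two small groups are hypothetical. Rank tests such as Mann-Whitney are built on these same rank sums, and any order-preserving recoding of the categories leaves the mean ranks unchanged.

```python
from statistics import mean


def midranks(values):
    """Map each distinct value to the average of the 1-based sorted positions it occupies."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i+1 .. j (1-based) all hold this value.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of


# Hypothetical responses: 1 = not useful, 2 = useful, 3 = very useful.
group_1 = [1, 2, 2, 3]
group_2 = [2, 3, 3, 3]

ranks = midranks(group_1 + group_2)
print(mean(ranks[v] for v in group_1), mean(ranks[v] for v in group_2))  # 3.375 5.625

# Recoding the categories (keeping their order) gives the same mean ranks.
recode = {1: 10, 2: 100, 3: 1000}
ranks_re = midranks([recode[v] for v in group_1 + group_2])
print(mean(ranks_re[recode[v]] for v in group_1))  # 3.375
```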

One other comment: be wary of lumping ratio and interval scale data together. Not long ago I caught a lab staff member, based on a moderately defensible habit developed from concentration data, calculating the relative standard deviation for a cold room temperature thermometer near 4 degrees Celsius, then using that to infer the dispersion at room temperature. Lab thermometer readings do _not_ change in precision by a factor of five over a 16 °C range….
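The thermometer anecdote can be reproduced with a few lines of arithmetic. This sketch assumes, hypothetically, that the thermometer has the same 0.2 °C standard deviation at every temperature. On the Celsius (interval) scale the relative standard deviation depends on where 0 °C happens to sit, whereas on the kelvin (ratio) scale it barely moves; carrying the 4 °C relative SD over to 20 °C inflates the predicted SD fivefold.

```python
# Hypothetical: the thermometer's standard deviation is 0.2 °C at any temperature.
SD_C = 0.2

for t_c in (4.0, 20.0):
    rsd_celsius = SD_C / t_c            # depends on the arbitrary zero of the Celsius scale
    rsd_kelvin = SD_C / (t_c + 273.15)  # kelvin has a true zero, i.e. a ratio scale
    print(f"{t_c:4.1f} °C: RSD on Celsius = {rsd_celsius:.2%}, on kelvin = {rsd_kelvin:.3%}")

# Carrying the 4 °C relative SD over to 20 °C, as in the anecdote:
predicted_sd_20 = (SD_C / 4.0) * 20.0
print(f"predicted SD at 20 °C = {predicted_sd_20:.2f} °C, versus the assumed true {SD_C} °C")
```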

Thanks Steve. As is usual in Statistics, the answer is ‘it depends’. I see your point about Interval/Ratio. Something to be aware of.

I strongly agree with using non-parametric methods to analyze ordinal data, because rating or rank scales such as 1, 5, 10, 20, 40 or 1, 10, 100, 1000, 10000 have the same meaning as the scale 1, 2, 3, 4, 5. Rank transformation will change ordinal data into “relative” interval data. I say “relative” because it depends on the sample size. As an alternative, ordinal data can be analyzed using frequency comparisons. Whatever the reason, ordinal data cannot be treated as interval/ratio scales (see Fraenkel et al. (2012): How to Design and Evaluate Research in Education, 8th edition; Glass & Hopkins (1996): Statistical Methods in Education and Psychology, 3rd edition; Sheskin, D. J. (2004): Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edition; Pagano, R. R. (2013): Understanding Statistics in the Behavioral Sciences, 10th edition; etc.).
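A small hypothetical example shows why an arbitrary but order-preserving choice of scores matters for means. With two made-up groups, the 1 to 5 coding says group A has the higher mean, while the 1, 10, 100, 1000, 10000 coding reverses that. A rank-based summary such as `median_low` (which always returns an actual data value) keeps the same ordering under both codings.

```python
from statistics import mean, median_low

# Hypothetical responses for two groups on a 1-5 scale.
group_a = [3, 3, 3, 3]
group_b = [1, 1, 1, 5]

# An order-preserving recoding, using the 1, 10, 100, 1000, 10000 scale.
recode = {1: 1, 2: 10, 3: 100, 4: 1000, 5: 10000}
a_re = [recode[x] for x in group_a]
b_re = [recode[x] for x in group_b]

print(mean(group_a) > mean(group_b))  # True: 3 vs 2
print(mean(a_re) > mean(b_re))        # False: 100 vs 2500.75

# median_low returns a data value, so it simply follows the recoding.
print(median_low(group_a) > median_low(group_b))  # True
print(median_low(a_re) > median_low(b_re))        # True
```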

Thanks for this very informative article. I think those practising statistics should read this one before doing any analysis.

Thanks! They could watch the video too.

I am in a dilemma when it comes to voting on ordinal data (scores):

Scoring options for an event:

- Score 1 = no error was made
- Score 2 = possible error but need to see trend (track and trend)
- Score 3 = an error was made

10 voters with the following scores:

- Score of 1: 3 votes
- Score of 2: 3 votes
- Score of 3: 4 votes

There is no majority. To me, you would continue to discuss and vote until you get a majority, or, to save time, you could use the median. Using the mode (plurality) does not make sense to me in this situation. I can’t find any literature on this.

Thank you

Hi Doug

That is a really interesting problem. I agree that the mode makes no sense. If you had 11 voters, of whom 5 voted 1 and 6 voted 3, the mode would be 3, which does not summarise what is happening. In that case the median would also be 3, so that doesn’t make much sense either. I agree discussing is important. I hate to admit it, but there is a case for a mean. (Waits for people to throw fruit!)
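The three summaries being discussed can be computed directly with Python’s `statistics` module. This sketch uses Doug’s 10 votes and the hypothetical 11-voter case above; it only shows what each summary returns, not which one is appropriate.

```python
from statistics import mean, median, mode

doug = [1] * 3 + [2] * 3 + [3] * 4  # Doug's 10 votes
nic = [1] * 5 + [3] * 6             # the hypothetical 11-voter example

for name, votes in (("Doug", doug), ("11 voters", nic)):
    print(name, "mode:", mode(votes), "median:", median(votes), "mean:", round(mean(votes), 2))
```

For Doug’s votes this gives a mode of 3, a median of 2 and a mean of 2.1; for the 11 voters, a mode of 3, a median of 3 and a mean of about 2.09.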

I’m not going to throw fruit. But I don’t think a mean would be sensible. That’s because I am not clear that these are ordinal data at all. They correspond roughly to ‘Yes’, ‘No’ and ‘I/we need more information before I/we choose Yes or No’. Can we really call that ordinal?

In terms of interpretation, often the answer depends on the intent. If the aim was to identify consensus (or at least a majority) on which outcome to choose, the mode is the obvious candidate. A median or mean would not do that reliably even if the data were ordinal; with a discrete ordinal scale it is perfectly possible to have no ‘voters’ at all at a median value and for any decent sized data set it’s almost certain there will be none at the mean. If, on the other hand, we wanted to compare two groups reporting on ordinal scale a median might be more useful, as would a mean if we believed we could average ranks meaningfully. But if – as I think – this is better considered categorical, a contingency table would probably tell us more and apply to either data type.

I have ordinal data. Can I do descriptive statistics? If not, what is the procedure?

If I have ordinal data I need to get the mode. Do I need to do a hypothesis test for the mode?

Ordinal data is a challenge. There is no hypothesis test for the mode. However, you might like to look at proportions instead. For example you might like to see if there is evidence that more than 70% of people agreed or strongly agreed. There are tests for proportions.
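One way to test a proportion like this is an exact one-sided binomial test. A minimal sketch, assuming hypothetical survey results (80 of 100 respondents chose agree or strongly agree) and a null value of 70%; statistical packages offer the same test, often alongside a normal approximation.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical survey: 80 of 100 respondents chose 'agree' or 'strongly agree'.
n_respondents = 100
n_agree = 80
p0 = 0.70  # null hypothesis: the true proportion is 70%

p_value = binom_upper_tail(n_respondents, n_agree, p0)
print(f"observed proportion = {n_agree / n_respondents:.2f}, one-sided p = {p_value:.4f}")
```

A small p-value would be evidence that more than 70% of the population agree or strongly agree. This only uses the split of the scale into two groups, so it sidesteps the question of spacing between categories.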

Hello

I have a question

I need to find relationships in my data. I am studying landscape health. On one side I have three classes of landscape health (Healthy, At risk and Unhealthy, coded as levels 3, 2 and 1), and on the other side I have some environmental variables, like soil organic matter, precipitation and so on, which are quantitative variables.

Now, to find a relationship between health class (a qualitative variable) and the environmental variables (quantitative variables), which model do you suggest?

I would really appreciate your help!

Hi

I’m afraid this is not a question that can be answered without careful consideration of the data, the levels of measurement and the background context. First idea is to graph every combination you can think of that makes sense, and look for relationships. It is preferable to start with an underlying theory of a model. Do you have any literature with such models?

Sorry I cannot be much help.

Dear Dr Nic

many thanks for your response

I produced a graph for each variable, and there was an obvious relationship between health levels and some variables, especially the soil variables. But there was no relationship between the topographical variables and health levels.

So far I have not found a related paper; I should search more about this.

I need to know whether ordinal logistic regression is a good approach or not. I mean, in ordinal logistic regression, are both sides of the equation ordinal (dependent and independent), or is just one side ordinal and the other side interval?


[…] The uses for ordered data are a matter of some debate among statisticians. Everyone agrees it’s appropriate for creating bar charts, but beyond that the answer to the question “What should I do with my ordinal data?” is “It depends.” Here’s a post from another blog that offers an excellent summary of the considerations involved. […]

Hi Nic,

Thanks for the intro! What surprises me is that the only examples of ordinal data are the Likert scale or small variations on it. I have an example where I would like your opinion on whether it is indeed ordinal.

Let’s say we questioned people with hypertension. They all know they have it, and we are interested in their actions to counter it. People may:

– Take no action.

– Monitor blood pressure.

– Set reduction targets.

– Be on a structured diet/programme to reduce their blood pressure.

In my opinion, there is some order. I would argue that taking no action is worse than monitoring, but just how much worse is ‘undefined’! Would you argue this is ordinal?

Now let’s say we also know gender, age, etc. and we wish to find relationships. Would ordinal rather than nominal treatment even be beneficial for the analysis?

Now let’s say people may also be doing more than one action, which makes it a multiple-response question and even more complex 😉

Like many things in statistics, there are different ways to classify this. I have seen work where people endeavoured to come up with a more detailed classification. The argument about whether this is ordinal is most relevant when it comes to analysis or graphs. As there is some order, I would tend to present the results in a graph in this order.

Ah, multiple responses!

An interesting piece.

But how would one use qualitative data?

For instance, if I were to ask nonsense questions of children, how would I classify the answers?

Hi Rose

I’m not sure what the point of asking nonsense questions is. If there are open-ended answers we would need to use qualitative analysis of some sort.

This conversation has been extremely helpful, however I have a somewhat unique scenario for which I would like feedback if anyone is able.

There are three groups:

Best strategy (assigned a value of 0)

Medium strategy (value of 1)

Worst strategy (value of 2)

These groups have clear definitions (meaning they are not subjective assignments). For each participant, there are four tests performed each day over 8 days. Currently I am averaging the four tests for each day and plotting that over time. It has been suggested that I consider summing the data instead; however, for some participants on some days there are fewer than four data points, so I haven’t gone that route.

There are two groups of participants and I am trying to determine if there is a difference between the two groups. For my initial analysis I dropped the consideration that this data is ordinal and performed a repeated measures two-way ANOVA where my factors are day and group. Group is highly significant both times I’ve done this (I also perform this test at two ages, which I analyze separately).

I’ve obviously made a bold assumption that I can consider this averaged data to not be ordinal when it was derived from ordinal data, so I’m not confident that my stats approach is valid. Can anyone help me out with this? I have a decent stats background but am a scientist, not a statistician. TIA!!

Hi Mackenzie

Sorry I can’t really help you from here. It takes quite a bit of time and explanation to understand a complex test such as you describe. There are consulting groups who might be able to help you.

I like that there is a lot of useful interaction on this post. But for a person with no background in the subject, reading the post and watching the video by Dr Nic is good enough. There is no need to push greater depth on people who are reading about it for the first time, so I believe this is a good post and not as amateurish as the other professors above have deigned to remark.

Thanks for that – I’m happy you found it useful. It is good to have a range of depths for different readers, I think.

I am interested in a disease that kills my crop. I go to the field and score the plants based on the severity of disease symptoms, where 0 is healthy and 10 is dead. I could as easily score 0 to 100, or 0 to 5. I am binning a continuous variable. A mean outcome of 4.386 on the original 0–10 scale seems reasonable, even though my ability to distinguish a score of 0 from 1 may be different from my ability to distinguish 5 from 6.