Welcome to the world of optimal scaling, i.e. the conversion of a qualitative variable into a quantitative one


Optimal scaling is an alternative to traditional multidimensional data analysis techniques. It is especially recommended for those who mostly analyze data measured at the nominal or ordinal level. The main idea is to assign numerical values to the categories of a qualitative variable according to criteria that optimize the solution. These criteria depend on the analytical procedure used.

All right, but is it really so difficult to assign numerical values to the categories of a qualitative variable? After all, already when designing a survey questionnaire, we plan how the respondents' answers will be coded. For example, we decide whether the answer 'strongly agree' is coded as 1, as 5, or as some other number. So we assign numeric values to variable categories ourselves. Well, yes, but in this case the values of particular categories depend on the arbitrary decision of the analyst. For some applications, these codes are enough. However, as soon as we want to calculate an arithmetic mean, controversies arise.

If the choice of codes was arbitrary, what grounds do we have to take seriously an average calculated from them? The numerical codes of qualitative variables do not have metric properties, which means that we should not perform arithmetic on them. Codes serve only for convenient data storage, to facilitate variable transformations, etc. Optimal scaling, by contrast, aims to give the variables numerical values that do have metric properties. In other words, we want to transform qualitative variables into quantitative ones in an optimal way. An example will make this clearer.
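To see how fragile a mean of arbitrary codes can be, here is a minimal sketch in Python. The answers of the two groups and both coding schemes are invented purely for illustration; they do not come from any survey discussed in this article.

```python
from statistics import mean

# Hypothetical answers from two groups of respondents (invented for illustration).
group_a = ["agree", "agree", "agree", "agree"]
group_b = ["strongly agree", "strongly agree", "disagree", "disagree"]

# Two coding schemes that preserve the same category order.
# Both are arbitrary choices an analyst could make.
equal_steps = {"disagree": 1, "neutral": 2, "agree": 3, "strongly agree": 4}
stretched_top = {"disagree": 1, "neutral": 2, "agree": 3, "strongly agree": 10}

def mean_code(answers, coding):
    """Arithmetic mean of the numeric codes assigned to the answers."""
    return mean(coding[a] for a in answers)

for name, coding in [("equal_steps", equal_steps), ("stretched_top", stretched_top)]:
    print(name, mean_code(group_a, coding), mean_code(group_b, coding))
```

With equal steps, group A has the higher mean (3 versus 2.5). With the stretched top category, group B does (5.5 versus 3). Both schemes respect the order of the categories, yet they lead to opposite conclusions about which group agrees more.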

Respondents were asked questions about the importance of different attitudes in life. Among other things, they were asked about the importance of fun as well as the importance of appropriate behavior. Both of these variables were measured on a scale from 1 to 6, where 1 meant 'very important' and 6 meant 'completely unimportant'. As a result of the optimal scaling, the quantifications of individual categories have changed. Old and new quantifications are shown in the chart below.


Figure 1. Quantifications of categories of two ordinal variables regarding the importance of proper behaviour and fun in life



What can we see? Have a look at the line representing the quantification of the fun variable first. It turns out that there is a real gap between ‘very important’ and ‘important’! This means there is a large difference between merely agreeing with the statement and ‘devoted’ support for it. On the other hand, the difference between ‘unimportant’ and ‘completely unimportant’ is smaller than the initial coding suggests. It is only 0.7.

Now, to the proper behaviour variable. If someone found it important in life, they were close to finding it very important. Likewise, there was only a short distance from finding proper behaviour ‘unimportant’ to considering it ‘completely unimportant’. For categories in the middle of the scale, the differences are large.

When using optimal scaling, you will come across both the level of measurement and the optimal scaling level. The similar names may be misleading, so let us discuss the difference. There are three basic levels of measurement: nominal, ordinal, and quantitative (numerical)[1]. A nominal variable is one whose values represent unordered categories. Examples of nominal variables are region, favourite broadcasting station, car make, or religious affiliation. An ordinal variable is one whose values represent ordered categories. Examples include education, a customer satisfaction rating, or a product score. A quantitative variable is one whose values have a meaningful metric, such as age in years, income in pounds, height in centimetres, or the number of purchased products.

Levels of measurement are fixed characteristics of variables, but the choice of scaling level leaves some room for the analyst's decision. The selected scaling level does not have to be the same as the measurement level. The scaling level determines whether the algorithm that calculates the quantifications is restricted, for example, by the order of the variable's categories.
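What such an order restriction can look like is sketched below. One common way to force quantifications to respect the category order is monotone regression with the pool-adjacent-violators method. This is only an illustration: whether a particular optimal scaling procedure works exactly this way is an assumption here, and the input values and counts are hypothetical.

```python
def pool_adjacent_violators(values, weights):
    """Weighted non-decreasing fit to `values` (pool-adjacent-violators).

    `values` are unrestricted quantifications listed in category order,
    `weights` are the numbers of respondents in each category.
    """
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge neighbouring blocks for as long as the order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)  # every category in a block shares one value
    return fitted

# Hypothetical unrestricted quantifications of four ordered categories.
unrestricted = [1.0, 3.0, 2.0, 4.0]
counts = [1, 1, 1, 1]
ordered_fit = pool_adjacent_violators(unrestricted, counts)
print(ordered_fit)
```

In this toy input, the second and third categories are out of order, so they are pooled and receive a common value, while the first and last categories keep theirs. Without the restriction, the values would simply stay as they were.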

The most restrictive scaling level is the quantitative level, as it imposes the most limitations. In practice, it is rarely used. Other scaling levels include ordinal, nominal, and multiple nominal. The ordinal level is appropriate when you know the order of the categories and want to retain it. An example could be a variable whose values represent four categories of the respondent's education. The initial coding scheme was:

1 primary
2 vocational
3 secondary
4 higher


During the analysis, individual categories of education are requantified, for example:

1 primary
1.2 vocational
3.9 secondary
4 higher


This result would mean that, in terms of the investigated feature, the difference between vocational and secondary education is much greater than, for example, the difference between secondary and higher education. Most importantly, when you use the ordinal scaling level, you can be sure that the order of the categories remains the same: secondary education will never be quantified higher than higher education.
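The distances between neighbouring categories can be checked with a few lines of Python. The quantifications are the ones from the education example; the helper names `gaps` and `keeps_order` are made up for this sketch.

```python
# Quantifications from the education example, before and after rescaling.
initial = {"primary": 1.0, "vocational": 2.0, "secondary": 3.0, "higher": 4.0}
rescaled = {"primary": 1.0, "vocational": 1.2, "secondary": 3.9, "higher": 4.0}

def gaps(quantifications):
    """Distance between each pair of neighbouring categories."""
    labels = list(quantifications)
    values = list(quantifications.values())
    return {
        f"{a} -> {b}": round(y - x, 2)
        for a, b, x, y in zip(labels, labels[1:], values, values[1:])
    }

def keeps_order(quantifications):
    """True if the quantifications never decrease along the category order."""
    values = list(quantifications.values())
    return all(x <= y for x, y in zip(values, values[1:]))

print(gaps(initial))
print(gaps(rescaled))
print(keeps_order(rescaled))
```

After rescaling, the step from vocational to secondary education is 2.7, while the steps on either side shrink to 0.2 and 0.1. The order check still passes, which is exactly what the ordinal scaling level guarantees.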

It is not the case for the nominal scaling level. Here, the initial order of categories may change. Use this level when you do not know the order of the categories but want to order them using the analysis. It may also come in handy when you theoretically know the order of the categories but want to give the algorithm more freedom to change it.

A good example could be the attitude towards voluntary service in various age groups. It turns out that the youngest and the oldest people readily participate in voluntary service, while middle-aged people are unable to fit it into their schedules. In this case, the categories of the youngest and the oldest could be placed next to each other.
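A rough way to picture this reordering is to quantify each age group by the average of some outcome within that group and then sort the groups by that value. The sketch below does only that; it is a simplification for illustration, not the algorithm of any particular optimal scaling procedure, and the age groups and hours are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (age group, hours of voluntary service per month).
records = [
    ("18-29", 8), ("18-29", 10),
    ("30-44", 2), ("30-44", 1),
    ("45-59", 3), ("45-59", 2),
    ("60+", 9), ("60+", 7),
]

def quantify_by_outcome(rows):
    """Assign each category the mean outcome observed in it."""
    by_category = defaultdict(list)
    for category, value in rows:
        by_category[category].append(value)
    return {category: mean(values) for category, values in by_category.items()}

quantifications = quantify_by_outcome(records)
# Order the categories by their new values instead of by age.
new_order = sorted(quantifications, key=quantifications.get)
print(quantifications)
print(new_order)
```

In this made-up data, the two middle age groups receive low values and the youngest and oldest groups receive high ones, so the youngest and oldest end up as neighbours at the top of the new ordering.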

The last scaling level I would like to mention is the multiple nominal level. Use it if you do not plan to put the categories in any order, for example when the goal of the analysis is to identify groups of similar broadcasting stations.

Analytical techniques that employ optimal scaling mechanisms include correspondence analysis and principal component analysis for qualitative data. But this is a topic for another post.


[1] Further division of scales is omitted intentionally. More on this topic can be found in Frankfort-Nachmias Ch., Nachmias D., Research Methods in the Social Sciences.
