A reliable tool for measuring temperature is a thermometer; alcohol concentration in exhaled air is measured with a breathalyser, and body weight with a bathroom scale. In survey research, the measurement tool is the questionnaire and the questions it contains. If we want to determine the respondent's age, it is enough to ask a suitable question, for example about the year of birth. Whatever phenomenon we are measuring, we want the measurement tool to be well chosen.
In survey research, reliability and accuracy (referred to below as relevance, and known in the methodological literature as validity) are among the key concepts that allow researchers to assess the quality of research tools and the trustworthiness of the results obtained. This in turn determines how sound the conclusions are and how well they can be applied in practice.
Difference between reliability and relevance
Before addressing reliability, it is worth taking a closer look at the concept of relevance. A good research tool is not only reliable, but also relevant.
Will the number of football matches watched on TV in the last month be a good indicator of footballing ability? Not likely. Watching matches on TV may be relevant if we are investigating interest in football or engagement with sporting culture. However, trying to use this variable as an indicator of playing skills is misguided because it does not measure what we actually want to study.
One of the basic conditions that a measurement tool should meet is relevance. A relevant tool is one that measures exactly what it should measure. The use of a tool that is not relevant will result in potentially misleading conclusions being drawn from our analyses. The relevance of indicators is therefore crucial in the context of research design, so that the results correspond to the real phenomena we want to understand or measure.
However, relevance is not everything. Suppose we want to measure the level of vulgar language on a certain internet forum. The research team divides the individual posts among themselves and counts the number of vulgarisms in them. Everything seems fine: after all, the measurement tool has been chosen correctly. Unfortunately, after a while it turns out that an expression one person considers highly vulgar is, for another, not even worth counting among the inappropriate expressions. If the coders have not agreed on sufficiently detailed coding rules before starting the work, such a measurement will be unreliable.
Reliability in surveys
The reliability of tools for measuring a phenomenon can be understood in three different ways. We met the first in the example described above. Reliability here is understood as the degree of agreement between the persons assessing a phenomenon. If a measurement tool is reliable in this sense, different people who measure the same thing with that tool should get the same result.
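Agreement between coders can be expressed as a number. The sketch below is only an illustration: the two lists of codes are invented data for ten forum posts, and the function names are our own. It computes the plain share of matching codes and Cohen's kappa, a commonly used measure that corrects this share for agreement expected by chance; the article itself does not prescribe any particular agreement measure.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of items to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability that both coders pick the same category
    # if each assigned codes independently at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one category is used
    return (observed - expected) / (1 - expected)

# Invented example: two coders classify the same ten posts.
coder_1 = ["vulgar", "vulgar", "ok", "ok", "vulgar", "ok", "ok", "vulgar", "ok", "ok"]
coder_2 = ["vulgar", "ok", "ok", "ok", "vulgar", "ok", "vulgar", "vulgar", "ok", "ok"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohen_kappa(coder_1, coder_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this invented example the coders agree on eight of ten posts, yet kappa is noticeably lower than 0.8, because part of that agreement would be expected by chance alone.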
The second understanding of reliability concerns the repeatability of a measurement over time. By measuring the length of a selected wall in a flat repeatedly with the same tape measure, we should get the same result each time. In this sense, reliability refers to the consistency and repeatability of the results produced by a survey tool. A tool is reliable if it gives stable and similar results when used repeatedly under the same conditions.
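One simple way to look at repeatability is to measure the same respondents twice and correlate the two sets of results. The sketch below assumes this test-retest approach and uses the Pearson correlation coefficient; the scores are invented and the function name is our own. Note that a correlation only shows whether respondents keep their relative positions, so it would not reveal a shift that affects everyone equally.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented example: the same six respondents answer the same question
# on a 1-5 scale in two survey waves.
first_wave = [3, 4, 2, 5, 4, 1]
second_wave = [3, 5, 2, 4, 4, 1]

retest_r = pearson_r(first_wave, second_wave)
print(f"test-retest correlation = {retest_r:.2f}")
```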
The third understanding of reliability concerns the internal consistency of the scales used to measure phenomena. In social and marketing research, we often deal with constructs that are not directly observable. This follows from the nature of the phenomena under study: people's opinions, attitudes and motives for behaviour are not easy to observe. To study them, it is necessary to select appropriate indicators. Although indicators are often respondents' answers to single questions in a questionnaire, sometimes it is necessary to construct a scale composed of several questions. This is most often the case when there is a concern that the respondent, when asked something directly, will not be able to answer (‘Are you neurotic?’), or that their answer will not be reliable (‘Are you an intelligent person?’).
A common approach to studying phenomena that are not directly observable is to ask respondents a set of questions whose answers are combined, at the data analysis stage, into indicators called indices or scales. Imagine that we want to measure customer satisfaction with the level of service at a service point. The level of service is made up of many components: the speed of order fulfilment, the courtesy and competence of the service staff, and the degree to which all customer requirements are met. If we ask about each of these components individually and then add up the respondents' answers, we obtain a service level satisfaction scale. Regardless of whether we build the scale ourselves or use a ready-made scale developed by someone else, before we start using it for analysis we should check whether it is reliable, which in this sense means checking whether its individual components are consistent with each other. One of the basic approaches to testing the internal consistency of scales is the alpha coefficient proposed by L. J. Cronbach. It is one of the basic measures of scale reliability, and it is available and simple to calculate in PS IMAGO PRO.
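To show what the alpha coefficient does, here is a minimal sketch using the standard formula: alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed scale), where k is the number of items. The answers are invented data for three service-satisfaction questions on a 1-5 scale, and the function and variable names are our own; this is not how PS IMAGO PRO implements the calculation, only an illustration of the arithmetic.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists; each inner list holds one question's answers,
    with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("a scale needs at least two items")
    # Summed scale score for each respondent.
    totals = [sum(answers) for answers in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Invented example: five respondents rate three aspects of service (1-5).
speed = [4, 5, 3, 2, 4]
courtesy = [4, 4, 3, 2, 5]
competence = [5, 5, 2, 2, 4]

alpha = cronbach_alpha([speed, courtesy, competence])
print(f"Cronbach's alpha = {alpha:.2f}")
```

The more the items rise and fall together across respondents, the larger the variance of the summed scale relative to the item variances, and the closer alpha gets to 1.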
Summary
To conclude, here is a short test of your understanding of the concepts of reliability and relevance. Read the question and try to answer it before continuing.
If you adjust your bathroom scale so that it always ‘subtracts’ 5 kg from your measured weight, the measurement made with this tool will be:
- accurate and reliable
- accurate but unreliable
- reliable but not accurate
- not accurate and not reliable
The correct answer is the third option: the measurement will be reliable but not accurate. It is reliable because no matter how many times you step on that scale, you will read the same value, and when you ask another person to read the result, you will also get the same answer. The problem is the accuracy of the measurement: the results would consistently underestimate the true value by 5 kg.
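The same reasoning can be checked with a few lines of code. This is a toy simulation under assumed values: the true weight of 70 kg and the idea of an otherwise perfect scale are our own simplifications. The spread of repeated readings stands in for reliability, and the gap between the average reading and the true weight stands in for accuracy.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 70.0   # assumed true weight, for illustration only
OFFSET_KG = -5.0        # the scale always "subtracts" 5 kg

def read_scale(true_weight, offset):
    """Reading from an idealised scale that adds a fixed offset."""
    return true_weight + offset

# Step on the scale ten times.
readings = [read_scale(TRUE_WEIGHT_KG, OFFSET_KG) for _ in range(10)]

spread = pstdev(readings)               # 0 means perfectly repeatable readings
bias = mean(readings) - TRUE_WEIGHT_KG  # systematic error of the readings

print(f"spread = {spread} kg, bias = {bias} kg")
```

The spread is zero, so the tool is reliable, while the bias of -5 kg shows that it is not accurate.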