Regardless of the measured phenomenon, you want the right tool for the right job. Is the number of football matches a respondent watched in the last month the right measure of their football skills? Not really. Watching football matches on TV indicates an interest in the sport but says nothing about the footballing prowess of the subject. The primary condition for a measuring tool is its validity. A valid tool is one that measures exactly what it is supposed to measure. The use of invalid tools will result in invalid analytical conclusions.
But validity is not all. Assume you want to measure the level of vulgarity on an online forum. The members of the research team divide the individual posts among themselves and count the occurrences of vulgar language. Everything looks fine at first; the measuring tool seems to be valid. Unfortunately, over time, it may transpire that an expression considered vulgar by one person is not given a second thought by someone else. If the coders fail to agree on sufficiently specific coding principles before they undertake such work, the measurement will be unreliable.
The reliability of a measurement can be understood in three ways. The first is evident in the above-mentioned example, namely, the degree of unanimity among people who assess the phenomenon. If there is complete unanimity such that each person who performs the measurement obtains the same result, then the measuring tool is reliable.
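One simple way to put a number on this kind of unanimity is pairwise percent agreement. The sketch below is only an illustration: it assumes that, unlike in the forum example, every coder labels the same set of posts, and the function name and data are hypothetical.

```python
from itertools import combinations


def percent_agreement(ratings_by_coder):
    """Average, over all pairs of coders, of the share of items labelled identically."""
    pair_scores = []
    for a, b in combinations(ratings_by_coder, 2):
        same = sum(1 for x, y in zip(a, b) if x == y)
        pair_scores.append(same / len(a))
    return sum(pair_scores) / len(pair_scores)


# Hypothetical data: three coders label the same five posts
# (1 = contains vulgar language, 0 = does not).
coder_a = [1, 0, 1, 1, 0]
coder_b = [1, 0, 0, 1, 0]
coder_c = [1, 1, 0, 1, 0]

agreement = percent_agreement([coder_a, coder_b, coder_c])
print(f"Average pairwise agreement: {agreement:.2f}")
```

A value of 1.0 would mean complete unanimity. Percent agreement does not correct for agreement that arises by chance, so it is best read as a rough first check rather than a full reliability statistic.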
The second way to understand reliability is to focus on measurement reproducibility over time. When measuring the length of the same wall in an apartment with the same ruler, the result should be the same each time (provided that nothing strange is happening to the apartment, i.e. environmental conditions are constant). Taken together, these two views mean that a measuring tool is reliable when its results depend neither on the person performing the measurement nor on the time at which it is performed.
The third way to understand reliability is related to the internal consistency of the scales used to measure phenomena. In social and marketing research, you often have to handle constructs that cannot be observed directly. This is due to the nature of the investigated phenomena; opinions, attitudes, and behavioural drivers are not easily observable. To study them, appropriate indicators have to be selected. Such indicators are often respondents' answers to individual questions in a questionnaire. Sometimes, however, it is necessary to build a scale made up of several questions. This is particularly the case when a respondent would not be able to answer if asked directly (“Are you neurotic?”) or when the answer may be unreliable (“Are you intelligent?”). A common approach to researching phenomena that cannot be observed directly is to ask respondents questions that are later transformed into indicators, also called indexes or scales. Imagine you want to measure customer service satisfaction in a service outlet. Service level comprises multiple elements: speed of service, politeness, competence, the extent to which customer requirements are met, etc. If you ask about each component separately and then sum up the responses, you can derive a service satisfaction scale.
Regardless of whether you construct the scale yourself or use a ready-made one, you should verify the internal consistency of all its elements before putting it to work. One of the basic approaches to verifying the internal consistency of scales is Cronbach's alpha. This topic is, however, so broad that it warrants a separate discussion.
To summarise, here is a short test of how well you understand reliability and validity. Read the question below and try to answer it before proceeding further.
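As a brief preview, Cronbach's alpha for a sum scale can be computed from the variances of the individual items and the variance of the summed score. The following is a minimal sketch using the standard formula; the three questions, the 1 to 5 response format, and the answers of the five respondents are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a sum scale.

    items: one list of responses per question, with respondents in the
    same order in every list.
    """
    k = len(items)
    # Sum scale: total score of each respondent across all questions.
    totals = [sum(answers) for answers in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents on a 1-5 scale.
speed = [4, 5, 3, 2, 4]
politeness = [4, 4, 3, 2, 5]
competence = [5, 5, 2, 3, 4]

alpha = cronbach_alpha([speed, politeness, competence])
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the questions move together and plausibly reflect the same underlying construct. Which threshold counts as acceptable depends on the field and the purpose of the scale.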
If you adjust your bathroom scales to always subtract 10 pounds, the measurement will be:
- valid and reliable
- valid but unreliable
- reliable but invalid
- invalid and unreliable
*** The right answer is, of course, the third option: the measurement will be reliable but invalid. It will be reliable because, regardless of how many times you step on the scales, they display the same weight. If someone else reads the value, it will also be the same. The issue is measurement validity; the results will always be 10 pounds lower than the actual value. A small price for feeling better about yourself, isn't it?
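The same reasoning can be checked with a few lines of code. This is a toy sketch under stated assumptions: the true weight of 170 pounds is a hypothetical value, and the scales are modelled as perfectly deterministic apart from the 10-pound adjustment.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 170.0  # hypothetical true weight in pounds
OFFSET = -10.0       # the scales are adjusted to subtract 10 pounds


def read_scales(true_weight):
    """A deterministic but miscalibrated scale: same input, same wrong output."""
    return true_weight + OFFSET


# Step on the scales five times.
readings = [read_scales(TRUE_WEIGHT) for _ in range(5)]

spread = pstdev(readings)            # no spread between readings: reliable
bias = mean(readings) - TRUE_WEIGHT  # systematic error of -10 pounds: invalid

print(f"Spread between readings: {spread}")
print(f"Systematic error: {bias}")
```

Reliability shows up as the absence of spread between repeated readings, while validity concerns the systematic distance between the readings and the true value.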