In this post we will address which statistics are worth noting when preparing for scale reliability analysis, and how to prepare data for this analysis.
Let's assume that we are studying bank customers' attitudes towards saving money. Their attitude to saving is measured by a summated scale built from the following six statements:
- I prefer spending to saving money;
- Having savings gives me a feeling of security;
- Buying new things makes me feel better;
- I feel uncomfortable when spending money;
- I am proud of my ability to save money;
- Saving proves prudence.
For each of these statements, respondents marked an answer on a scale from 1 to 6, where 1 meant "definitely disagree" and 6 meant "definitely agree". Labelling the statements a to f in the order listed, note that for statements b, d, e and f, the higher the respondent's answer, the more positive the attitude towards saving. For statements a and c it is the exact opposite: higher values indicate a negative attitude towards saving. In such a scenario, before conducting the reliability analysis, we have to recode the variables so that high values always indicate the same thing (in our case, a positive attitude towards saving). On a 1 to 6 scale, reversal turns each answer x into 7 - x, so 1 becomes 6, 2 becomes 5, and so on. It is preferable to create new, properly transformed variables (a2, c2) and use them in further analysis rather than overwrite the originals. After the transformation, the variable "I prefer spending to saving money" becomes "I prefer saving to spending money". Similarly, the variable "Buying new things makes me feel better" becomes "Buying new things spoils my mood". These transformations, known as scale reversal, are made at the data preparation stage of the analysis.
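The reversal can be sketched in a few lines of Python. This is only an illustration: the helper name `reverse_item` and the sample answers are hypothetical, and the original analysis was not necessarily done this way. On a scale from 1 to 6, a reversed answer is simply 1 + 6 minus the original answer.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # answer range used in the questionnaire


def reverse_item(value, low=SCALE_MIN, high=SCALE_MAX):
    """Reverse a single answer; a missing answer (None) stays missing."""
    if value is None:
        return None
    return low + high - value


# Hypothetical answers of one respondent to statements a..f
respondent = {"a": 2, "b": 5, "c": 1, "d": 4, "e": 6, "f": 5}

# Keep the original variables and add the transformed ones as a2 and c2
prepared = dict(respondent)
prepared["a2"] = reverse_item(respondent["a"])
prepared["c2"] = reverse_item(respondent["c"])
```

Keeping `a` and `c` alongside `a2` and `c2` makes it easy to check later that the recoding was applied exactly once.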
Now let's move on to the statistics that are worth focussing on before the complete scale reliability analysis is performed. Firstly, it is worth checking what proportion of the observations has been excluded from the analysis because of missing data.
Table 1. Information about data being analyzed
One of the most frequently used methods of handling missing data is listwise deletion of observations, which is the method used in this case. This means that if a respondent has failed to answer one or more of the six questions, none of that respondent's answers are taken into account.
It is worth thinking about how missing data are handled for several reasons. First, if the percentage of excluded observations is too large, there is a risk that the results of the analysis will not be credible. In such cases, it is worth considering an alternative method of handling missing data. Second, if we calculate the value on the scale of attitude towards saving for each respondent, we will have to decide what score the respondents with missing answers are to receive, and then consider the effect this will have on the final analysis. In this example, the excluded observations account for 4.7% of the total. This is not a negligible amount, but we can afford such a loss.
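Listwise deletion and the share of excluded observations can be sketched as follows. The function name `listwise_delete` and the four sample rows are made up for illustration, and missing answers are assumed to be stored as `None`.

```python
ITEMS = ["a2", "b", "c2", "d", "e", "f"]  # scale items after reversal


def listwise_delete(rows, items=ITEMS):
    """Keep only respondents who answered every item.

    Returns the complete rows, the number of excluded rows,
    and the percentage of excluded rows.
    """
    complete = [
        row for row in rows
        if all(row.get(item) is not None for item in items)
    ]
    excluded = len(rows) - len(complete)
    pct_excluded = 100.0 * excluded / len(rows) if rows else 0.0
    return complete, excluded, pct_excluded


# Hypothetical data: the second respondent skipped statement d
rows = [
    {"a2": 5, "b": 6, "c2": 1, "d": 4, "e": 5, "f": 5},
    {"a2": 4, "b": 6, "c2": 6, "d": None, "e": 4, "f": 4},
    {"a2": 5, "b": 5, "c2": 2, "d": 5, "e": 6, "f": 6},
    {"a2": 4, "b": 6, "c2": 5, "d": 2, "e": 3, "f": 4},
]

complete, excluded, pct_excluded = listwise_delete(rows)
```

Checking `pct_excluded` before going any further is a cheap way to decide whether listwise deletion is acceptable for a given data set.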
As a next step, it is worth familiarizing ourselves with the basic statistics of the variables being analyzed.
Table 2. Statistics of items
The questions included in the scale are referred to as scale items. The average and standard deviation have been calculated for each item. The respondents most strongly agreed with the statement "Having savings gives me a feeling of security". A high average is also recorded for the (transformed) statement "I prefer saving to spending money". The lowest average has been observed for the (also transformed) statement "Buying new things spoils my mood". At the same time, this statement has the largest standard deviation, which means that the respondents were less consistent in their assessments of it than they were for the other statements. This large standard deviation may lead us to treat the item as "suspicious", that is, as one that could undermine the integrity of our scale.
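Item statistics of this kind are easy to reproduce. The sketch below uses Python's standard `statistics` module. The helper `item_statistics` and the four sample respondents are hypothetical and do not come from the study, and the sample standard deviation (n - 1 in the denominator) is an assumption about how the table was computed.

```python
from statistics import mean, stdev

ITEMS = ["a2", "b", "c2", "d", "e", "f"]


def item_statistics(rows, items=ITEMS):
    """Mean, sample standard deviation and count for every scale item."""
    stats = {}
    for item in items:
        values = [row[item] for row in rows]
        stats[item] = {
            "mean": mean(values),
            "sd": stdev(values),  # sample SD, divides by n - 1
            "n": len(values),
        }
    return stats


# Hypothetical complete cases (after listwise deletion)
rows = [
    {"a2": 5, "b": 6, "c2": 1, "d": 4, "e": 5, "f": 5},
    {"a2": 4, "b": 6, "c2": 6, "d": 3, "e": 4, "f": 4},
    {"a2": 5, "b": 5, "c2": 2, "d": 5, "e": 6, "f": 6},
    {"a2": 4, "b": 6, "c2": 5, "d": 2, "e": 3, "f": 4},
]

stats = item_statistics(rows)

# The item with the largest spread is the first candidate for a closer look
most_variable = max(stats, key=lambda item: stats[item]["sd"])
```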
Another step is to become familiar with the correlation matrix between the items.
Table 3. Correlation matrix between the items with gradient coloring* of table cells
In this table we do not have to look at all the values, but only at the lower triangle (values located under the diagonal). The upper triangle is a mirror of the same information.
The statements with the highest correlation are "I am proud of my ability to save money" and "I feel uncomfortable when spending money". This means that the more pride respondents take in their ability to save, the more discomfort they feel when spending money. Other strongly correlated pairs of statements are:
- "Saving proves prudence" and "I feel uncomfortable when spending money";
- "I am proud of my ability to save money" and "Having savings gives me a feeling of security".
On the other hand, the weakest correlation is between the statements "Buying new things spoils my mood" and "I prefer saving to spending money".
Similarly, we can analyze the covariance matrix between the items (not included in this post). Its diagonal holds the variances of the individual items, and the other cells hold the covariances between pairs of items.
When studying both matrices, it is worth paying attention to how our "suspicious" item correlates with the others. In our case the values in the matrix cells are fairly uniform, and the item "Buying new things spoils my mood" does not seem to stand out strongly from the others.
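Both matrices can be computed by hand for a small example. The sketch below assumes Pearson correlation and sample covariance. The function names and the three sample items are hypothetical. Only the lower triangle is built, since the upper triangle repeats the same values.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance of two equally long lists of answers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def lower_triangle(columns, measure):
    """Apply `measure` to every pair of items below the diagonal."""
    names = list(columns)
    return {
        (names[i], names[j]): measure(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


# Hypothetical answers to three of the items, one list per item
columns = {
    "d": [4, 3, 5, 2],
    "e": [5, 4, 6, 3],
    "c2": [1, 6, 2, 5],
}

corr = lower_triangle(columns, correlation)
cov = lower_triangle(columns, covariance)

# Variances sit on the diagonal of the covariance matrix
variances = {name: covariance(v, v) for name, v in columns.items()}

strongest_pair = max(corr, key=corr.get)
```

Looking up every pair that contains the "suspicious" item, here `c2`, shows at a glance whether it behaves differently from the rest.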
So far, we have focused on the individual items of the scale. However, we cannot forget that our goal is to build a scale whose value is the sum of all the items. The scale statistics table below gives the average, variance and standard deviation of a scale consisting of all six items being analyzed.
Table 4. Scale statistics
Note that our scale can take values from 6 (when a respondent chooses the lowest possible value, namely 1, for all items) to 36 (when a respondent chooses 6 for all items). The average of 26.6 on this scale seems quite high and indicates a generally positive attitude of the respondents towards saving.
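The scale score and its statistics can be sketched like this. The data are again hypothetical, so the resulting numbers will not match the table above, and the sample variance (n - 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev, variance

ITEMS = ["a2", "b", "c2", "d", "e", "f"]
SCALE_MIN, SCALE_MAX = 1, 6


def scale_scores(rows, items=ITEMS):
    """Total score per respondent: the sum of all item answers."""
    return [sum(row[item] for item in items) for row in rows]


# Hypothetical complete cases
rows = [
    {"a2": 5, "b": 6, "c2": 1, "d": 4, "e": 5, "f": 5},
    {"a2": 4, "b": 6, "c2": 6, "d": 3, "e": 4, "f": 4},
    {"a2": 5, "b": 5, "c2": 2, "d": 5, "e": 6, "f": 6},
    {"a2": 4, "b": 6, "c2": 5, "d": 2, "e": 3, "f": 4},
]

scores = scale_scores(rows)

# Theoretical range of the scale: six items, each answered from 1 to 6
lowest_possible = len(ITEMS) * SCALE_MIN
highest_possible = len(ITEMS) * SCALE_MAX

scale_mean = mean(scores)
scale_variance = variance(scores)  # sample variance, divides by n - 1
scale_sd = stdev(scores)
```

Comparing `scale_mean` with the midpoint of the theoretical range (21 here) gives a quick sense of whether attitudes lean positive or negative.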
A review of these statistics before completing the reliability analysis allows for a fuller understanding of the data being analyzed. It does not take much time, and may help you to avoid mistakes and incorrect conclusions. The scale reliability analysis itself is the subject of a separate post.
*Gradient coloring of table cells is one of 50+ custom built procedures included with PS IMAGO PRO.