# When letters count in a table

A common task for analysts is to establish whether variables are related and, if they are, what the nature of the relationship is.

In the case of qualitative variables, you want to establish whether belonging to a category of one variable changes the distribution of categories of another variable. This is why we build crosstabs, which help identify possible relationships between variables in a sample. If the sample is representative, the next step may be statistical inference, the analytical step that determines whether the identified relationships also hold in the general population. For qualitative variables, we commonly use the chi-square test for this purpose. Still, for an inquisitive analyst this is just the beginning of the road. The chi-square test determines whether there is a relationship, but it provides no insight into the strength of that relationship (strength is assessed with a separate group of measures) or into which specific categories differ significantly from each other. The latter question is handled by column proportions tests.
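To make the chi-square step concrete, here is a minimal Python sketch that computes the Pearson chi-square statistic for a crosstab from its observed and expected counts. The counts are hypothetical and are not taken from the survey discussed in this article; the function name `chi_square_statistic` is also just an illustration.

```python
def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical 2x2 crosstab (made-up counts, for illustration only):
# rows = two answer categories, columns = two respondent groups
hypothetical_counts = [
    [10, 20],
    [20, 10],
]

chi2, dof = chi_square_statistic(hypothetical_counts)
print(f"chi-square = {chi2:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom. A large value says only that some relationship exists; it does not say which categories differ.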

Table 1 shows the distribution of responses to the question regarding the frequency of Internet use depending on the occupational status of the respondents.

Table 1. A crosstab

Pupils, students, and employees most often replied ‘every day or almost every day’. The other groups (retired and disabled people, the unemployed, homemakers) most often chose ‘not at all’. Even among these groups, there are noticeable differences in the distribution of responses. Compare, for example, homemakers to retired or disabled people. You will see that the first group is more heterogeneous: among homemakers, almost as many people use the Internet every day as do not use it at all. Similarly, if you compare pupils and students to employees, almost everyone (over 96%) in the former group uses the Internet at least a few times a week, whereas among employees a relatively large share (almost 26%) do not use the Internet at all.

The question now is which of these differences are statistically significant. To find out, we can use column proportions tests, also called z tests. Within each row of the table, every pair of columns is compared separately, and the z statistic corresponds to the chi-square statistic calculated for that pair of cells. Because many comparisons are made at once, the significance level should be adjusted for multiple comparisons. In this example, we used the Bonferroni correction; the results, expressed as letters, can be found in Table 2.
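The pairwise logic described above can be sketched in Python as follows. This is an illustrative sketch, not the exact procedure used by any particular statistics package: it assumes a pooled two-proportion z test and applies the Bonferroni correction by dividing the significance level by the number of pairs. The group names and counts are hypothetical.

```python
import math
from itertools import combinations


def two_proportion_z_test(count_a, n_a, count_b, n_b):
    """Pooled two-sided z test for the difference between two column proportions."""
    pooled = (count_a + count_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both proportions are 0 or both are 1: there is no difference to test
        return 0.0, 1.0
    z = (count_a / n_a - count_b / n_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


def compare_columns(row_counts, column_totals, alpha=0.05):
    """Test every pair of columns within one table row, Bonferroni-adjusted."""
    pairs = list(combinations(row_counts, 2))
    adjusted_alpha = alpha / len(pairs)  # Bonferroni: split alpha across all pairs
    results = {}
    for a, b in pairs:
        z, p = two_proportion_z_test(
            row_counts[a], column_totals[a], row_counts[b], column_totals[b]
        )
        results[(a, b)] = (z, p, p < adjusted_alpha)
    return results


# Hypothetical data for one row of a crosstab (made-up counts, not the survey's)
row_counts = {"group_1": 90, "group_2": 50, "group_3": 48}
column_totals = {"group_1": 100, "group_2": 100, "group_3": 100}

results = compare_columns(row_counts, column_totals)
for (a, b), (z, p, significant) in results.items():
    verdict = "significantly different" if significant else "not significantly different"
    print(f"{a} vs {b}: z = {z:.2f}, p = {p:.4f} -> {verdict}")
```

For a single pair of columns and a two-category row variable, the square of this pooled z equals the Pearson chi-square statistic of the corresponding 2×2 table (without continuity correction), which is the correspondence mentioned above.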

Table 2. Column proportions test results, Bonferroni correction

Let's first take a look at those who use the Internet every day or almost every day (the last row of the table, above the Total row). Employees are marked ‘a’. Is there any other category with the same mark in this row? Yes, the unemployed and homemakers. This means these categories are not significantly different from employees in terms of daily Internet use. Pupils and students are significantly different from all the other groups: they are the only ones marked ‘b’. The same applies to pensioners and the disabled, who differ from the other categories because ‘c’ is not found anywhere else in the row. We can analyse every row of the table the same way. Take a look at the ‘several times a month’ row, for example. The cell for pensioners and the disabled has both ‘b’ and ‘c’. This means that the proportion of pensioners and the disabled who use the Internet several times a month is not significantly different from that of pupils and students (‘b’), the unemployed (‘b’ and ‘c’), or homemakers (‘c’).
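The reading rule used above (two columns in a row are not significantly different when they share at least one letter) can be written down as a short Python sketch. The letters below are the ones quoted in the text for the two rows discussed; the letter for employees in the ‘several times a month’ row is not given in the text, so that group is left out. The function names are illustrative.

```python
def not_significantly_different(letters_a, letters_b):
    """Two cells in the same row share a letter -> no significant difference."""
    return bool(set(letters_a) & set(letters_b))


def similar_groups(row_letters, group):
    """List the other groups in the row that share at least one letter with `group`."""
    return sorted(
        other
        for other, letters in row_letters.items()
        if other != group and not_significantly_different(row_letters[group], letters)
    )


# Letters as described in the text for the 'every day or almost every day' row
every_day_row = {
    "employees": "a",
    "unemployed": "a",
    "homemakers": "a",
    "pupils and students": "b",
    "pensioners and the disabled": "c",
}

# Letters as described for the 'several times a month' row
# (the letter for employees is not stated in the text, so it is omitted here)
several_times_a_month_row = {
    "pupils and students": "b",
    "unemployed": "bc",
    "homemakers": "c",
    "pensioners and the disabled": "bc",
}

print(similar_groups(every_day_row, "employees"))
print(similar_groups(every_day_row, "pupils and students"))
print(similar_groups(several_times_a_month_row, "pensioners and the disabled"))
```

Note that sharing is checked pair by pair: in the second row, pupils and students (‘b’) and homemakers (‘c’) each share a letter with pensioners and the disabled, yet they share none with each other.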

As you can see, crosstab cells may contain not only numbers but also letters, and being able to read them gives you a more in-depth insight into the results of the analysis.