# Factor analysis

The aim of factor analysis is to explain as much of the variation as possible with as few variables as possible.

This statistical method reduces the number of variables by grouping them into larger sets (factors) whose variables share a common variance (covariance) and therefore describe the same phenomenon. The variables within a set are thus consistent (correlated) with one another, while the resulting factors (sets) differ from each other. Factor analysis also allows variables to be classified, i.e., it reveals the structure of the variables.

We can use factor analysis to detect a customer’s main purchase motives, or to find out the most important personal characteristics for a job position, e.g., a managerial one. In a set of many variables, we can reduce their number by grouping them into factors, for example emotional, social or economic ones. This also makes the results easier to interpret.

## Exploratory versus confirmatory factor analysis

Factor analysis can be used as both an exploratory and a confirmatory technique. In the case of exploratory factor analysis, we are typically dealing with a situation where we need to identify the number of factors on an ad hoc basis from the available data. This approach can be particularly useful when we are operating on a dataset containing many variables whose structure is not clearly defined. An example of this would be a study of customer satisfaction or evaluation of a particular product where variables may group into factors such as satisfaction with the product itself, with customer service, with social or prestige aspects, or with the practicality of the product under study. Some factors may be self-evident, while in some situations, the substantive interpretation of the resulting factors may require more expert knowledge.

Confirmatory factor analysis, used in structural equation modelling, assumes on the basis of a priori theoretical knowledge both a certain number of factors and which variables are strongly correlated with each factor. This form of factor analysis is often used to verify new solutions in the context of well-studied structures of phenomena. This is the case, for example, when constructing new psychological questionnaires designed to diagnose phenomena that are strongly grounded in theory. Confirmatory factor analysis thus makes it possible to test hypotheses about the structure of the data, for example, in the analysis of personality traits according to the Big Five model. Here, five main personality dimensions are distinguished, each measured using several subscales. One factor is conscientiousness, which includes variables that should correlate with each other, such as competence, orderliness or dutifulness.

## Factor analysis methods

There are many techniques for reducing the number of variables. One of the more commonly used is principal component analysis, which replaces the observed variables with a smaller set of uncorrelated linear combinations of them, called components (analogous to factors). Factor extraction methods such as unweighted least squares, generalised least squares, maximum likelihood, principal axis factoring, alpha factoring and image factoring are also available in PS IMAGO PRO.
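To make the idea of components as uncorrelated linear combinations more concrete, here is a minimal NumPy sketch. It is not how PS IMAGO PRO implements the method; the function name `principal_components` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues, weights and component scores for X (rows = observations)."""
    # Standardise each variable so the analysis works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reorder them largest first.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Each component score is a linear combination of the standardised variables.
    scores = Z @ eigenvectors
    return eigenvalues, eigenvectors, scores

rng = np.random.default_rng(0)
# Hypothetical data: 200 observations of 4 variables driven by one shared source plus noise.
shared = rng.normal(size=(200, 1))
X = shared + 0.5 * rng.normal(size=(200, 4))
eigenvalues, weights, scores = principal_components(X)
```

In this sketch the covariance matrix of `scores` is diagonal, with the eigenvalues on the diagonal. That is what "uncorrelated components" means in practice, and each eigenvalue tells how much of the total variance its component carries.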

One method for determining the number of factors to extract is the scree plot and the Cattell criterion based on it. To select the number of components, one looks for the point at which the curve stops falling steeply and levels off, and counts the components (points) that lie above that point. The number of these points is the number of factors that will be extracted in the analysis. In the example below (Figure 1), we would choose to extract three factors.

A slightly less rigorous method is the Kaiser criterion, in which the number of factors equals the number of components with an eigenvalue greater than one. With a large number of input variables, however, there is a risk that this criterion will retain many weak components, so the data are reduced less than they could be.

These methods are of course used in exploratory factor analysis, where we do not know in advance how many factors to expect.

Figure 1. Scree plot of components
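Both criteria rest on the eigenvalues of the correlation matrix. The sketch below computes them with NumPy and applies the Kaiser rule. The helper name `kaiser_count` and the simulated two-factor data are illustrative assumptions, not part of any particular package.

```python
import numpy as np

def kaiser_count(X):
    """Eigenvalues of the correlation matrix (descending) and how many exceed 1."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigenvalues, int(np.sum(eigenvalues > 1.0))

rng = np.random.default_rng(1)
n = 500
# Hypothetical data: two independent latent factors, three observed variables each.
f1 = rng.normal(size=(n, 1))
f2 = rng.normal(size=(n, 1))
noise = 0.6 * rng.normal(size=(n, 6))
X_demo = np.hstack([f1, f1, f1, f2, f2, f2]) + noise

eigs, n_factors = kaiser_count(X_demo)
```

Plotting `eigs` against the component number (1, 2, 3, ...) gives the scree plot. For data simulated this way, two eigenvalues stand clearly above one and the rest fall well below it, so both criteria point to the same number of factors. Real data are rarely this clear-cut.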

In addition to extracting the factors, factor analysis also allows them to be rotated. Rotation turns the axes representing the factors so that the factors become easier to interpret (Fig. 2). Five rotation methods are available in PS IMAGO PRO: Varimax, Direct Oblimin, Quartimax, Equamax and Promax. They differ from each other in their assumptions and in the algorithms used to carry them out.

Figure 2. Correlation plot of the two variables after rotation of the coordinate axis.
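As an illustration of what a rotation does, the following sketch implements a basic Varimax rotation with the commonly used SVD-based iteration. It assumes an already extracted loading matrix and omits Kaiser normalisation of the rows, so its results may differ slightly from those of statistical packages. The function name and the example loadings are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix (varimax criterion)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Proportional to the gradient of the varimax criterion at the current solution.
        gradient = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / n_vars)
        # The orthogonal matrix closest to loadings.T @ gradient becomes the new rotation.
        u, s, vt = np.linalg.svd(loadings.T @ gradient)
        rotation = u @ vt
        total = s.sum()
        # Stop once the sum of singular values no longer increases noticeably.
        if previous > 0 and total / previous < 1 + tol:
            break
        previous = total
    return loadings @ rotation, rotation

# Hypothetical loadings: a clean two-factor structure turned by 30 degrees,
# so that every variable appears to load on both factors.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)
```

The rotation matrix is orthogonal, so the communality of each variable (the row sum of squared loadings) stays the same. Only the way the variance is distributed between the factors changes, which is why the rotated solution is easier to read without fitting the data any better or worse.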

## Assumptions of factor analysis

The basic assumption of factor analysis is that the variables in question are measured on a quantitative scale. Each variable should also have a sufficiently large variance: if the group is too homogeneous, it will be significantly more difficult to distinguish the resulting factors. In addition, the variables should be normally distributed, and any outliers should be removed.

As indicated in the introduction to this article, the variables used in the analysis must correlate significantly with each other in some way, so that the number of variables can then be reduced on this basis. The observations, on the other hand, should be independent.
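A quick way to get a first impression of whether the variables are related at all is to summarise the off-diagonal entries of the correlation matrix. The sketch below does this with NumPy. The helper name and the threshold of 0.3 are illustrative assumptions, not values taken from this article, and the summary does not replace a formal test.

```python
import numpy as np

def correlation_summary(X, threshold=0.3):
    """Summarise the absolute off-diagonal correlations between the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    # Take each pair of variables once (upper triangle without the diagonal).
    off_diagonal = np.abs(R[np.triu_indices_from(R, k=1)])
    return {
        "mean_abs_r": float(off_diagonal.mean()),
        "share_above_threshold": float(np.mean(off_diagonal >= threshold)),
    }

rng = np.random.default_rng(2)
# Hypothetical data sets: one with a shared source of variance, one of pure noise.
related = rng.normal(size=(300, 1)) + 0.5 * rng.normal(size=(300, 4))
unrelated = rng.normal(size=(1000, 5))

summary_related = correlation_summary(related)
summary_unrelated = correlation_summary(unrelated)
```

If almost none of the correlations reach the chosen threshold, as in the second data set, there is little common variance for factors to capture.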

Another important issue is the size and complexity of the dataset to be analysed. To obtain a meaningful solution, we must have a sufficient number of variables and observations. What exactly does this mean? The literature recommends at least 3-4 variables (scale items) for each potential factor (Fabrigar, Wegener, MacCallum, Strahan, 1999) or 5-10 observations for each variable (Gorsuch, 1983). The type of factor analysis (exploratory, confirmatory), as well as the nature of the data, affects how many observations are actually needed to produce reliable results (MacCallum, Widaman, Zhang, Hong, 1999). No ideal universal value or proportion can be identified here, but in general, the larger the data set, the better.
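The rules of thumb quoted above come down to simple arithmetic, sketched below. The function name and the default limits (the lower ends of the ranges cited) are assumptions for illustration; as the text notes, they are rough guides rather than fixed requirements.

```python
def check_sample_size(n_observations, n_variables, n_factors,
                      min_items_per_factor=3, min_obs_per_variable=5):
    """Compare a planned analysis with the rule-of-thumb ratios from the literature."""
    items_per_factor = n_variables / n_factors
    obs_per_variable = n_observations / n_variables
    return {
        "items_per_factor": items_per_factor,
        "obs_per_variable": obs_per_variable,
        "enough_items": items_per_factor >= min_items_per_factor,
        "enough_observations": obs_per_variable >= min_obs_per_variable,
    }

# Hypothetical study: 150 respondents, 20 questionnaire items, 4 expected factors.
report = check_sample_size(n_observations=150, n_variables=20, n_factors=4)
```

For this hypothetical study there are 5 items per factor and 7.5 observations per variable, so both rules of thumb are met at their lower bounds.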

## Summary

Factor analysis is a dimension reduction technique that can be used to construct measurement tools, composite indicators and theoretical concepts about the structure and relationships of variables. However, conducting a factor analysis requires a number of arbitrary decisions regarding, for example, the selection of variables to be analysed, the number of factors, and the method of extraction or rotation. Introducing variables that are weakly related to the others can considerably weaken the solution obtained and change the arrangement of variables in the individual factors. If factor analysis is used in an exploratory way, it is worth testing several solutions with different numbers of factors and selecting the best one in terms of both the results obtained and their interpretability.