Discriminant analysis belongs to the broader family of classification algorithms, which also includes decision trees, k-nearest neighbours (k-NN), support vector machines (SVMs) and neural networks.
A key aspect of discriminant analysis is the use of a discriminant function to separate groups on the basis of the predictor variables and to assign new observations to the appropriate categories.
Discriminant analysis was first developed by Ronald Fisher in 1936 as a method of discriminating between groups based on multivariate data. In machine learning, it is closely related to classification models and can support the development of more complex models.
Fisher proposed a linear combination of the independent variables, chosen so that the ratio of between-group to within-group variance is as large as possible. Its coefficients express how each independent variable influences the classification result. This linear function is still the foundation of the method today.
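Fisher's idea can be illustrated numerically. The following is a minimal sketch on synthetic data, assuming NumPy. The helper names `within_scatter`, `fisher_direction` and `fisher_criterion` are hypothetical and not taken from any tool mentioned here.

For two groups with mean vectors m0 and m1 and within-group scatter matrix S_W, the direction that maximises the ratio of between-group to within-group scatter is w = S_W^(-1) (m1 - m0). The sketch also evaluates this ratio, J(w), for the Fisher direction and for an arbitrary direction.

```python
import numpy as np


def within_scatter(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Within-group scatter matrix S_W: sum of the two groups' scatter matrices."""
    d0 = x0 - x0.mean(axis=0)
    d1 = x1 - x1.mean(axis=0)
    return d0.T @ d0 + d1.T @ d1


def fisher_direction(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Fisher's direction w = S_W^{-1} (m1 - m0) for two groups."""
    diff = x1.mean(axis=0) - x0.mean(axis=0)
    # Solve S_W w = (m1 - m0) rather than inverting S_W explicitly.
    return np.linalg.solve(within_scatter(x0, x1), diff)


def fisher_criterion(w: np.ndarray, x0: np.ndarray, x1: np.ndarray) -> float:
    """J(w): between-group over within-group scatter of the projections x @ w."""
    diff = x1.mean(axis=0) - x0.mean(axis=0)
    between = float(w @ diff) ** 2
    within = float(w @ within_scatter(x0, x1) @ w)
    return between / within


# Synthetic data: two groups sharing one covariance matrix.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
x0 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
x1 = rng.multivariate_normal([2.0, 1.0], cov, size=100)

w = fisher_direction(x0, x1)
print("Fisher direction w:", w)
print("J(w)      =", fisher_criterion(w, x0, x1))
print("J([1, 0]) =", fisher_criterion(np.array([1.0, 0.0]), x0, x1))
```

J(w) does not depend on the length of w, only on its direction. This is why the Fisher coefficients are usually reported up to a scaling factor.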
Discriminant analysis requirements
Although the discriminant function resembles a linear regression equation, the assumptions that must be met for the results to be considered valid are more extensive than in linear regression.
Among the most important assumptions that the data should meet for discriminant analysis are:
- a qualitative (categorical) dependent variable with at least two categories,
- quantitative independent variables whose values within each group follow a multivariate normal distribution,
- homogeneity of the variance-covariance matrices across groups. This is an important assumption, because violating it can lead to errors in interpreting the results. Box's M test can be used to check it, but its result should be treated with caution, as the test is sensitive to departures from normality,
- no strong correlation between predictors (multicollinearity). Highly correlated predictors make the variance-covariance matrix close to singular, so its inversion becomes unstable,
- groups of roughly similar size. Large differences in group sizes can lead to classification errors,
- no outliers that could significantly distort the results of the analysis.
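Two of these assumptions can be checked with a few lines of code. The following is a minimal sketch assuming NumPy and SciPy. The helpers `box_m_test` and `max_abs_correlation` are hypothetical names.

Box's M compares the pooled covariance matrix with the per-group ones: M = (N - g) ln|S_p| - sum of (n_i - 1) ln|S_i|. M is then converted to a chi-square statistic with the usual correction factor and p(p + 1)(g - 1)/2 degrees of freedom. The correlation helper reports the largest absolute correlation between any two predictors.

```python
import numpy as np
from scipy.stats import chi2


def box_m_test(groups: list[np.ndarray]) -> tuple[float, float, int]:
    """Box's M test for equal covariance matrices (chi-square approximation).

    Each array has shape (n_i, p) with n_i > p, so its covariance is non-singular.
    Returns (chi-square statistic, p-value, degrees of freedom).
    """
    g = len(groups)
    p = groups[0].shape[1]
    n = np.array([x.shape[0] for x in groups])
    big_n = int(n.sum())
    # Unbiased per-group covariance matrices (divisor n_i - 1).
    covs = [np.atleast_2d(np.cov(x, rowvar=False)) for x in groups]
    pooled = sum((ni - 1) * s for ni, s in zip(n, covs)) / (big_n - g)

    # slogdet returns (sign, log|det|); log-determinants avoid overflow.
    m = (big_n - g) * np.linalg.slogdet(pooled)[1] - sum(
        (ni - 1) * np.linalg.slogdet(s)[1] for ni, s in zip(n, covs)
    )
    c = (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        np.sum(1.0 / (n - 1)) - 1.0 / (big_n - g)
    )
    df = p * (p + 1) * (g - 1) // 2
    stat = float(m * (1 - c))
    return stat, float(chi2.sf(stat, df)), df


def max_abs_correlation(x: np.ndarray) -> float:
    """Largest absolute off-diagonal Pearson correlation between predictors."""
    r = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(r, 0.0)
    return float(np.abs(r).max())


rng = np.random.default_rng(42)
a = rng.normal(size=(50, 2))
same = box_m_test([a, a.copy()])     # identical groups: M is zero
different = box_m_test([a, 10 * a])  # second group has 100 times the covariance
print("identical groups:", same)
print("different covariances:", different)
```

A small p-value suggests that the covariance matrices differ. Given the test's sensitivity to non-normality, it is best read together with the normality checks.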
Both the assumption checks and the discriminant analysis itself can be carried out in PS CLEMENTINE PRO. If the data cause problems, it is worth considering transformations (e.g. logarithmic or power transformations) or alternative classification methods such as decision trees, SVMs or neural networks, which are also available in PS CLEMENTINE PRO.
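A logarithmic transformation is a common remedy for a right-skewed predictor. The following is a minimal sketch assuming NumPy and SciPy. The `income` variable is hypothetical and generated synthetically. It compares sample skewness before and after the transformation.

Note that `np.log` requires strictly positive values. For data containing zeros, `np.log1p` is one option.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical, synthetic right-skewed predictor (e.g. customer income).
rng = np.random.default_rng(7)
income = rng.lognormal(mean=10.0, sigma=0.8, size=1_000)

# Log transformation; valid only for strictly positive values.
log_income = np.log(income)

print(f"skewness before: {skew(income):.2f}")
print(f"skewness after:  {skew(log_income):.2f}")
```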
Discriminant function
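The discriminant function is a key element of discriminant analysis. It provides a mathematical description of the decision boundary between groups based on the input variables. Its general (linear) form is:

$$D = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$$

where:
- $D$ is the discriminant score of an observation,
- $b_0$ is the constant,
- $b_1, \dots, b_p$ are the discriminant coefficients,
- $X_1, \dots, X_p$ are the values of the independent variables.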
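This function is used both to define the decision boundaries and to assign new observations to groups. The value of the function (the discriminant score) is computed for a new observation and compared with a cut-off value. In the two-group case with equal prior probabilities, the cut-off lies midway between the mean scores of the two groups. The observation is then assigned to the group on the corresponding side of the cut-off.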
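This assignment rule can be sketched for two groups under stated assumptions: a common covariance matrix, equal prior probabilities, NumPy, and synthetic data. It is a plain illustration, not the procedure used by PS CLEMENTINE PRO.

The hypothetical helper `fit_two_group_lda` estimates the coefficients of D = b0 + b1 X1 + ... + bp Xp. It uses Fisher's coefficients computed from the pooled within-group covariance matrix. It then sets b0 so that the cut-off D = 0 lies midway between the two group centroids. The helper `assign_group` (also hypothetical) labels each observation by the sign of its score.

```python
import numpy as np


def fit_two_group_lda(x0: np.ndarray, x1: np.ndarray) -> tuple[float, np.ndarray]:
    """Return (b0, b) such that D(x) = b0 + b @ x is positive on group 1's side.

    Assumes a common covariance matrix and equal prior probabilities.
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    d0, d1 = x0 - m0, x1 - m1
    # Pooled within-group covariance matrix.
    s_pooled = (d0.T @ d0 + d1.T @ d1) / (len(x0) + len(x1) - 2)
    b = np.linalg.solve(s_pooled, m1 - m0)
    # Place the cut-off D = 0 midway between the two group centroids.
    b0 = -float(b @ (m0 + m1)) / 2.0
    return b0, b


def discriminant_score(x: np.ndarray, b0: float, b: np.ndarray) -> np.ndarray:
    """D = b0 + b1*X1 + ... + bp*Xp for each row of x."""
    return b0 + x @ b


def assign_group(x: np.ndarray, b0: float, b: np.ndarray) -> np.ndarray:
    """Assign group 1 when D > 0, otherwise group 0."""
    return (discriminant_score(x, b0, b) > 0).astype(int)


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
x0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
x1 = rng.multivariate_normal([3.0, 3.0], cov, size=200)

b0, b = fit_two_group_lda(x0, x1)
new_obs = np.array([[0.2, -0.1], [2.9, 3.2]])
print("b0 =", b0, "b =", b)
print("assigned groups:", assign_group(new_obs, b0, b))
```

With unequal prior probabilities, the cut-off would shift by the log of the prior ratio. The sketch deliberately leaves that out.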
Discriminant analysis in PS CLEMENTINE PRO
In PS CLEMENTINE PRO, discriminant analysis is one of the available analytical tools. Its intuitive interface allows the analysis to be carried out quickly and easily, and it offers many options for visualising and interpreting the results.

Fig. 1. Example of an analytical flow using discriminant analysis prepared in PS CLEMENTINE PRO
Discriminant analysis, like other classification techniques, plays a key role in assigning objects to specific groups based on their characteristics. Its versatility makes it applicable in many fields, such as finance, the social sciences, marketing and public health. In finance, for example, it helps assess customers' credit risk, while in marketing it supports customer segmentation.
Summary
Discriminant analysis is a sophisticated statistical technique for classifying objects on the basis of their characteristics. Its central element is the discriminant function, which describes the decision boundaries between groups and allows new observations to be assigned to the appropriate classes. The method requires a number of assumptions to be met, such as multivariate normality of the independent variables and the absence of strong correlation between predictors. Violating these assumptions can lead to erroneous results, so the assumptions should be checked before the results are interpreted. Tools such as PS CLEMENTINE PRO support this analysis as well as the visualisation and interpretation of the results.