Data analysis

Articles

Car value: variable selection for linear modelling

The Automatic linear modeling procedure is intended to streamline the work of analysts who use regr…

Automatic linear modeling: ensemble models

Today, we will look into ensemble model methods.

Multidimensional scaling in brand positioning

Positioning a product in the context of the competition, or the identification of a product’s featu…

Consumer shopping value analysis using regression trees. Part 2

Regression trees are a very interesting data analysis technique commonly used in tasks related to p…

When two dimensions are not enough: the multidimensional scatterplot in ps imago pro

A scatterplot, or scatter graph, is a popular diagnostic tool for associations between quantitative…

Scatter plot

A scatterplot (also known as a dot plot) is a graph with two perpendicular…

Factor analysis

The aim of factor analysis is to explain as much of the variability as possible with as few variabl…

The power of a test

The power of a test is the probability of detecting a statistically significant effect when one act…
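To make the definition concrete, here is a minimal Python sketch for the simplest case: a two-sided one-sample z-test where the population standard deviation is assumed to be known. The function name `z_test_power` and its parameters are hypothetical, chosen for this example only. t-tests and other designs need different power formulas.

```python
from statistics import NormalDist


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test, assuming sigma is known.

    effect: true difference between the population mean and the hypothesised mean.
    """
    std_normal = NormalDist()  # standard normal distribution N(0, 1)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect / (sigma / n ** 0.5)  # true effect in standard-error units
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power to detect a half-standard-deviation effect with 30 observations
print(round(z_test_power(effect=0.5, sigma=1.0, n=30), 3))
```

With `effect=0` the function returns alpha itself, because any rejection is then a false positive. Power grows as the sample size or the true effect increases.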

From variance to the method of least squares

The mean is one of the most popular and widely used statistical measures. By itself, it is not an e…

Student's t-tests

Student's t-tests are used to compare two groups of results, measured by the arithmetic mea…
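A minimal sketch of the classic independent two-sample Student's t statistic. It assumes equal variances in both groups and therefore uses a pooled variance estimate. The function name `student_t` is hypothetical. Converting the statistic into a p-value requires the t distribution (available, for example, as `scipy.stats.t`), which the Python standard library does not provide.

```python
import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent two-sample Student's t statistic (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2  # statistic and degrees of freedom


# Hypothetical results for two groups
t, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```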

Pearson's chi-square independence test

The chi-square test of independence is one of the most common statistical tests. It is used to test…
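As a sketch of how the test statistic is built: each cell's expected count under independence is the row total times the column total divided by the grand total, and the statistic sums the squared differences between observed and expected counts, each divided by the expected count. The function name and the example table are hypothetical.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that rows and columns are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of counts
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

With one degree of freedom the 5% critical value of the chi-square distribution is about 3.841, so this statistic would lead to rejecting independence at that level. Library functions such as `scipy.stats.chi2_contingency` also return a p-value; note that by default it applies Yates' continuity correction to 2 x 2 tables, so its statistic for this table differs from the uncorrected one computed here.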

Quantiles, quartiles, percentiles (measures of location)

We use quantiles to determine the position of a given value compared to others in a group or popula…
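In Python, the standard library's `statistics.quantiles` (Python 3.8+) computes cut points such as quartiles; passing `n=100` gives percentiles instead. The helper `percentile_rank` below is hypothetical and uses only one of several common definitions, so other software may report slightly different values, just as different interpolation methods give slightly different quartiles.

```python
from statistics import quantiles


def percentile_rank(data: list[float], value: float) -> float:
    """Percentage of observations less than or equal to value (one of several conventions)."""
    return 100 * sum(x <= value for x in data) / len(data)


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Quartiles Q1, Q2 (median) and Q3 using linear interpolation ("inclusive" method)
print(quantiles(scores, n=4, method="inclusive"))  # [3.0, 5.0, 7.0]
print(round(percentile_rank(scores, 7), 1))  # 77.8
```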

Skewness and kurtosis

Kurtosis and skewness are descriptive statistics that characterise a distribution, describing properties such as the shape and asym…
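A minimal sketch of the simplest moment-based estimators: skewness as the third central moment divided by the second raised to the power 1.5, and excess kurtosis as the fourth central moment divided by the squared second moment, minus 3. The function name is hypothetical. Statistical packages often apply small-sample bias corrections, so their values can differ slightly from these.

```python
from statistics import fmean


def moment_shape(data: list[float]) -> tuple[float, float]:
    """Moment-based skewness (g1) and excess kurtosis (g2), without small-sample correction."""
    m = fmean(data)
    n = len(data)
    # Central moments of order 2, 3 and 4
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5  # 0 for a symmetric sample
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


print(moment_shape([1, 2, 3, 4, 5]))  # symmetric, flatter than normal
print(moment_shape([1, 1, 1, 10]))  # long right tail, so positive skewness
```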

Logistic regression

Regression is used to predict the value of the dependent (outcome) variable on the basis of the v…

Neural networks

Neural networks are a family of algorithms that are becoming increasingly popular for tasks in the …

Gini index

The Gini index is a measure of the concentration of the distribution of a variable.
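The text does not state a formula, so this sketch assumes one common definition for non-negative values: the sum of absolute differences over all pairs of observations, divided by twice the squared count times the mean. It equals 0 when all units hold the same amount and reaches (n - 1) / n when a single unit holds everything; some sources rescale by n / (n - 1) so that the maximum is 1. The function name is hypothetical.

```python
def gini_index(values: list[float]) -> float:
    """Gini concentration index for non-negative values with a positive mean."""
    n = len(values)
    mean = sum(values) / n
    # Sum of absolute differences over all ordered pairs (O(n^2), fine for small data)
    total_diff = sum(abs(x - y) for x in values for y in values)
    return total_diff / (2 * n * n * mean)


print(gini_index([5, 5, 5, 5]))  # perfectly equal distribution
print(gini_index([0, 0, 0, 10]))  # everything held by one of four units
```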

Entropy

Entropy is a measure of disorder or uncertainty in a probability distribution.
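For a discrete distribution, Shannon entropy is the sum of p times log(1/p) over all outcomes. This sketch uses the base-2 logarithm, so the result is in bits (the natural logarithm would give nats). The function name is hypothetical.

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """Shannon entropy in bits of a discrete probability distribution."""
    if abs(sum(probs) - 1) > 1e-9 or any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Outcomes with p = 0 contribute nothing (the limit of p * log(1/p) is 0)
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]))  # 1.0
print(shannon_entropy([0.25] * 4))  # 2.0
print(shannon_entropy([1.0, 0.0]))  # 0.0
```

Entropy is highest when all outcomes are equally likely and zero when one outcome is certain, which matches the idea of uncertainty in the definition.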

Pearson's chi-square correlation test

Popular statistical tests include Pearson's chi-square tests. It is worth noting at the outset that…

Levels of measurement

The level of measurement is one of the most important properties of variables. It determines which …

Outlier cases. Identification and significance in data analysis

In data analysis, it is important to identify unusual observations that are significantly different…

Statistical inference

Statistical inference is the branch of statistics through which it becomes possible to describe, an…

Outlier or anomaly? Detection of abnormal observations

Can one abnormal occurrence cause concern? Based on one deviation from the norm, should a red light…

Recoding quantitative variables into qualitative ones – techniques and their practical application

When analysing the data, we take into account both quantitative information (such as salary, age, n…

Segmentation: from grouping to classification

Segmentation is a key process in data analysis, dividing a data set into relatively homogeneous gro…

The three sigma rule

The three sigma rule is an important tool in statistics and quality management. In the context of d…
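A minimal sketch of the rule used for screening data, assuming the values are roughly normally distributed: observations more than three standard deviations from the mean are flagged. The function name and the measurements are hypothetical. Because an extreme value inflates both the mean and the standard deviation, the rule can miss outliers in small samples.

```python
from statistics import NormalDist, mean, stdev


def three_sigma_outliers(data: list[float]) -> list[float]:
    """Values lying more than three sample standard deviations from the sample mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > 3 * s]


# Share of a normal distribution within mean +/- 3 sigma
coverage = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(f"{coverage:.4f}")  # 0.9973

measurements = [9, 10, 11] * 6 + [10, 10, 50]  # hypothetical data with one extreme value
print(three_sigma_outliers(measurements))  # [50]
```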

Population pyramid

When looking for the best way to visualise the data you have, you will come across an impressively …

Data gaps in quantitative data analysis - what are they and how to deal with them?

Missing data in the context of data analysis refers to situations where there are no values for cer…

Bayesian inference

Bayesian inference is a method of statistical inference. It is named after Thomas Bayes, the Britis…

General linear models and generalised linear models - differences and similarities

In data analysis, the use of general linear models is common due to their simplicity and ease of in…

Meta-analysis as an analytical tool

In today's scientific and research world, analysts are often confronted with the problem of analysi…

Parametric versus non-parametric tests. Which test to choose for analysis?

Statistical analysis is an integral part of scientific research and working with data. In order to …

Predictive AI vs. Generative AI – characteristics and differences

Artificial intelligence (AI) is one of the most exciting and rapidly developing areas of technology…

Median

The median is a statistic that we classify as a measure of central tendency. It is one of the most…
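The computation is short enough to sketch directly: sort the data, take the middle value, and average the two middle values when the count is even. Python's `statistics.median` follows the same rule; the hand-written version below is only for illustration, and the data are hypothetical.

```python
def median(values: list[float]) -> float:
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


incomes = [1, 2, 3, 4, 100]  # hypothetical data with one extreme value
print(median(incomes), sum(incomes) / len(incomes))  # 3 22.0
```

The single extreme value pulls the mean far from the bulk of the data, while the median stays at 3.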

Coefficient of determination R²: what is it and how to interpret it?

The coefficient of determination, denoted R² (R-square), is one of the most commonly used statistic…
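A minimal sketch: fit a straight line by ordinary least squares, then compute R² as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The function names and data are hypothetical. For least-squares fits that include an intercept, this value lies between 0 and 1; for arbitrary predictions it can be negative.

```python
def r_squared(y: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_pred))  # unexplained variation
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)  # total variation around the mean
    return 1 - ss_res / ss_tot


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares intercept and slope for one predictor."""
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = fit_line(x, y)
predictions = [intercept + slope * a for a in x]
print(round(r_squared(y, predictions), 3))  # 0.6
```

Here the line explains 60% of the variation in y around its mean.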

Automatic preparation of data for analysis

Data preparation plays a key role in data analysis and machine learning processes. Its importance s…

Rule induction algorithms – discovering patterns in data

Rule induction is one of the key methods in the field of artificial intelligence and machine learni…
