### Automatic preparation of data for analysis

Data preparation plays a key role in data analysis and machine learning. Its importance stems from several factors that affect the quality and reliability of the results. High-quality data leads to more accurate and reliable statistical models. Raw, unprocessed data often cont…

### Coefficient of determination R²: what is it and how to interpret it?

The coefficient of determination, denoted R² (R-square), is one of the most commonly used statistical tools for model evaluation. It offers a measure of how well a model under test fits the data. In this article, we'll look at what exactly the R² coefficient is and what role it plays in data analys…
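
As a rough illustration of the idea of "fit", R² is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition; the function name `r_squared` and its arguments are hypothetical and not taken from the article.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model leaves unexplained.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation around the mean of the observed values.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot
```

Under this definition, a model that reproduces the observations exactly scores 1, while a model that always predicts the mean of the observations scores 0.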

### Median

The median is a measure of central tendency. Next to the arithmetic mean, it is one of the most popular descriptive statistics, and it is among the first that students of analytics learn. In addition to its simple interpretation …

### Predictive AI vs. Generative AI – characteristics and differences

Artificial intelligence (AI) is one of the most exciting and rapidly developing areas of technology in the modern world. From self-learning algorithms to advanced image recognition systems to autonomous vehicles, AI is revolutionizing various areas of our lives. What exactly is artificial intellige…

### Story of a pie

You may not know this but this year is the 217th birthday of the humble pie chart. Its first known, and purposeful, application was the visualisation of the geographical distribution of the Turkish Empire across three continents: Asia, Europe, and Africa. It was first presented in Statistical Brevi…

### Parametric versus non-parametric tests. Which test to choose for analysis?

Statistical analysis is an integral part of scientific research and working with data. In order to draw valid conclusions, the use of appropriate statistical tests is essential. The analyst is often faced with the choice of which test to choose in a given situation. This is important because the wr…

### Meta-analysis as an analytical tool

In today's scientific and research world, analysts are often confronted with the problem of analysing large amounts of data coming from different studies. In such situations, meta-analysis becomes an indispensable tool. It allows the results of many studies to be assessed collectively and more prec…

### General linear models and generalised linear models - differences and similarities

In data analysis, the use of general linear models is common due to their simplicity and ease of interpretation of the results obtained. However, there are times when the analyst encounters situations where the assumptions of classical linear models are difficult or impossible to meet. This may be …

### Bayesian inference

Bayesian inference is a method of statistical inference. It is named after Thomas Bayes, the British mathematician and pastor who first formulated Bayesian probability theory in the 18th century. It is a method of data analysis that allows the probability of certain events to be determined not only…

### Data gaps in quantitative data analysis - what are they and how to deal with them?

Missing data in the context of data analysis refers to situations where there are no values for certain variables or observations in a dataset. In other words, they are places where a number, text, or some other form of data was expected, but for various reasons was not there. Missing data can take…

### Population pyramid

When looking for the best way to visualise the data you have, you will come across an impressively wide range of different types of charts - from simple, basic ones such as a scatter plot to very advanced ones such as a Sankey diagram. Some, however, are designed with a specific type of data in min…

### The three sigma rule

The three sigma rule is an important tool in statistics and quality management. In the context of data analysis, it allows the identification of outlier points that are significantly different from the rest of the data. The use of the three-sigma rule in quality control also allows anomalies to be …
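
A minimal sketch of how such outlier identification might look in practice, assuming the usual reading of the rule: a point is flagged when it lies more than three standard deviations from the mean. The function name `three_sigma_outliers` and the parameter `k` are hypothetical, and the sample standard deviation is used here as one possible choice.

```python
from statistics import mean, stdev


def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing can deviate
    return [v for v in values if abs(v - mu) > k * sigma]
```

Note that the mean and standard deviation are themselves pulled towards extreme values, so in small samples a single unusual point may not cross the three-sigma threshold at all.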

### Segmentation: from grouping to classification

Segmentation is a key process in data analysis, dividing a data set into relatively homogeneous groups based on specific criteria. The purpose of segmentation is to identify hidden patterns, differences and similarities between objects in a dataset, enabling more precise and relevant analyses. Two …

### Recoding quantitative variables into qualitative ones – techniques and their practical application

When analysing the data, we take into account both quantitative information (such as salary, age, number of products ordered) and qualitative information (e.g. gender, education, level of satisfaction with service). In order to make it easier to work with the data or to adapt it to a specific stati…

### Outlier or anomaly? Detection of abnormal observations

Can one abnormal occurrence cause concern? Based on one deviation from the norm, should a red light start flashing? Of course! In many industries and businesses, an anomaly is a sign that must be reacted to quickly and efficiently in order to prevent consequences. So how do you recognise an anomaly…

### Statistical inference

Statistical inference is the branch of statistics that makes it possible to describe, analyse and draw conclusions about a whole population on the basis of a sample.

### Outlier cases. Identification and significance in data analysis

In data analysis, it is important to identify unusual observations that are significantly different from the others. Such values, called outliers or outlier cases, can affect the results of statistical analysis and lead to erroneous conclusions. In this material we will look at what outliers are, t…

### Levels of measurement

The level of measurement is one of the most important properties of variables. It determines which statistical tests will be available to the researcher during the course of the analysis. But what information does it convey to us specifically? A level of measurement is a pattern of measurement that…

### Pearson's chi-square correlation test

Popular statistical tests include Pearson's chi-square tests. It is worth noting at the outset that this test has more than one application. In this material, I will discuss the main differences between the tests and introduce the most important issues related to the chi-square test.

### Gini index

The Gini index is a measure of the concentration of the distribution of a variable.

### Neural networks

Neural networks are a family of algorithms that are becoming increasingly popular for tasks in the areas of prediction, classification or clustering.

### Logistic regression

Regression is used to predict the value of the dependent (predicted) variable on the basis of the value of the independent variable or variables (predictors).

### Skewness and kurtosis

Skewness and kurtosis are measures that describe the shape of the distribution under analysis: skewness captures its asymmetry, while kurtosis captures how heavy its tails are. They provide us with information on how the values of a variable deviate from the mean.

### Quantiles, quartiles, percentiles (measures of location)

We use quantiles to determine the position of a given value compared to others in a group or population. Let's say you have received your matriculation exam results in mathematics. Would you like to find out if your score is high compared to the results of the other people writing the baccalaureate…

### Pearson's chi-square independence test

The chi-square test of independence is one of the most common statistical tests. It is used to test whether there is a statistically significant relationship between two qualitative variables.
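
To make the idea concrete, the test statistic can be sketched as the sum, over all cells of a contingency table, of the squared difference between observed and expected counts divided by the expected count. The helper below is an illustrative assumption (the name `chi_square_statistic` is hypothetical); it returns only the statistic and the degrees of freedom, and leaves the p-value to a statistics package.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts; every expected count
    must be positive for the statistic to be defined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof
```

A statistic of zero means the observed counts match the counts expected under independence exactly; larger values indicate a stronger departure from independence.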

### Student's t-tests

The family of Student's t-tests is used to compare two groups of results with each other on the basis of their arithmetic means.

### From variance to the method of least squares

The mean is one of the most popular and widely used statistical measures. By itself, it is not an exhaustive indicator and only allows for the determination of the central tendency of the variable under analysis.

### The power of a test

The power of a test is the probability of detecting a statistically significant effect when one actually occurs in the population under study.
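
As a hedged illustration, power can be computed in closed form for one simple case: a one-sided, one-sample z-test with a known standard deviation. That choice of test is an assumption made for this sketch, not something the article specifies, and the function name `z_test_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known sigma.

    effect: true difference between the population mean and the null value
    sigma:  known population standard deviation
    n:      sample size
    alpha:  significance level
    """
    std_normal = NormalDist()
    # Critical value the test statistic must exceed to reject the null hypothesis.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true effect shifts the distribution of the test statistic.
    shift = effect * n ** 0.5 / sigma
    return 1 - std_normal.cdf(z_crit - shift)
```

In this setting power grows with the sample size and with the size of the true effect; when the effect is zero, the rejection probability equals the significance level.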

### Customer Satisfaction Index (CSI)

The Customer Satisfaction Index (CSI), or Consumer Satisfaction Index, is a method used in marketing to assess customer satisfaction with the products or services provided by a company.

### Customer Effort Score (CES)

The Customer Effort Score (CES) is, along with the Net Promoter Score (NPS) and Customer Satisfaction (CSAT), one of the main indicators related to customer satisfaction.

### Factor analysis

The aim of factor analysis is to explain as much of the variability as possible with as few variables as possible.

### Net Promoter Score (NPS)

Customer satisfaction and loyalty surveys are now an integral part of a business focused on growth and building competitive advantage. The NPS index, which is an acronym for Net Promoter Score, is now the standard in this area.
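
For orientation, NPS is conventionally calculated from answers on a 0 to 10 scale as the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). The sketch below assumes that convention; the function name `net_promoter_score` is hypothetical.

```python
def net_promoter_score(scores):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count towards the total but towards neither group.
    return 100 * (promoters - detractors) / len(scores)
```

The result ranges from -100 (all respondents are detractors) to 100 (all respondents are promoters).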

### Column chart, bar chart and histogram

Column and bar charts have long been some of the most popular ways of visualising data. Before deciding to use any of them, it is worth taking a closer look at them.

### Scatter plot

A scatter plot (also known as a scatterplot or dot plot) is a graph with two perpendicular axes on which two variables are presented.

### Integrating business intelligence with data science: PS IMAGO PRO and PS CLEMENTINE PRO

A complex knowledge management system requires integration, namely the integration of multiple and diverse processes, and the integration of the tools responsible for those processes. In practice, how can we create a complete analytical system that not only takes advantage of modern algorithms, but…
