Automatic preparation of data for analysis
Data preparation plays a key role in data analysis and machine learning. Its importance stems from several aspects that affect the quality and reliability of the results. High-quality data leads to more accurate and reliable statistical models. Raw, unprocessed data often cont…
Coefficient of determination R²: what is it and how to interpret it?
The coefficient of determination, denoted R² (R-square), is one of the most commonly used statistical tools for model evaluation. It offers a measure of how well a model under test fits the data. In this article, we'll look at what exactly the R² coefficient is and what role it plays in data analys…
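The idea behind R² can be sketched in a few lines: fit a least-squares line, then compare the residual variation with the total variation around the mean. The data points below are invented for illustration only.

```python
# Minimal sketch of R² for a simple linear fit; the x/y values are made up.
import statistics

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept
mx, my = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx
a = my - b * mx

y_hat = [a + b * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
ss_tot = sum((yi - my) ** 2 for yi in y)                   # total variation
r2 = 1 - ss_res / ss_tot
print(r2)   # close to 1: the line explains almost all of the variation
```

An R² near 1 means the model reproduces the data well; an R² near 0 means it does little better than predicting the mean.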
Median
The median is a statistic that we classify as a measure of central tendency. It is one of the most popular descriptive statistics next to the arithmetic mean. For students of analytics, it is a statistic with which they become familiar as one of the first. In addition to its simple interpretation …
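The definition is simple enough to sketch directly: sort the values and take the middle one, or the mean of the two middle values when the sample size is even.

```python
# A minimal sketch of the median: sort, then take the middle value
# (or the mean of the two middle values for an even-sized sample).
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 3]))        # → 3
print(median([7, 1, 3, 10]))    # → 5.0
```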
Predictive AI vs. Generative AI – characteristics and differences
Artificial intelligence (AI) is one of the most exciting and rapidly developing areas of technology in the modern world. From self-learning algorithms to advanced image recognition systems to autonomous vehicles, AI is revolutionizing various areas of our lives. What exactly is artificial intellige…
Story of a pie
You may not know this, but this year marks the 217th birthday of the humble pie chart. Its first known purposeful application was visualising the geographical distribution of the Turkish Empire across three continents: Asia, Europe and Africa. It was first presented in Statistical Brevi…
Parametric versus non-parametric tests. Which test to choose for analysis?
Statistical analysis is an integral part of scientific research and working with data. In order to draw valid conclusions, the use of appropriate statistical tests is essential. The analyst is often faced with the choice of which test to choose in a given situation. This is important because the wr…
Meta-analysis as an analytical tool
In today's scientific and research world, analysts are often confronted with the problem of analysing large amounts of data coming from different studies. In such situations, meta-analysis becomes an indispensable tool. It allows the results of many studies to be assessed collectively and more prec…
General linear models and generalised linear models - differences and similarities
In data analysis, the use of general linear models is common due to their simplicity and ease of interpretation of the results obtained. However, there are times when the analyst encounters situations where the assumptions of classical linear models are difficult or impossible to meet. This may be …
Bayesian inference
Bayesian inference is a method of statistical inference. It is named after Thomas Bayes, the British mathematician and pastor who first formulated Bayesian probability theory in the 18th century. It is a method of data analysis that allows the probability of certain events to be determined not only…
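The mechanics can be shown with the classic diagnostic-testing example of Bayes' theorem: a prior probability is updated by new evidence into a posterior probability. The rates below are illustrative, not taken from the article.

```python
# Worked example of Bayes' theorem; all rates are invented for illustration.
prior = 0.01            # P(disease): prior probability before the test
sensitivity = 0.95      # P(positive | disease)
false_positive = 0.05   # P(positive | no disease)

# Total probability of a positive result
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Posterior probability: P(disease | positive)
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))   # → 0.161
```

Even with a sensitive test, a rare condition yields a modest posterior: the evidence raises the probability from 1% to about 16%, not to 95%.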
Data gaps in quantitative data analysis - what are they and how to deal with them?
Missing data in the context of data analysis refers to situations where there are no values for certain variables or observations in a dataset. In other words, they are places where a number, text, or some other form of data was expected, but for various reasons was not there. Missing data can take…
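Two of the simplest ways of dealing with such gaps can be sketched in a few lines: dropping incomplete observations (listwise deletion) and replacing gaps with the mean of the observed values (mean imputation). The values below are invented.

```python
# Minimal sketch of two common missing-data strategies; gaps are marked as None.
ages = [25, None, 31, 40, None, 28]

# Listwise deletion: keep only complete observations
observed = [a for a in ages if a is not None]

# Mean imputation: fill each gap with the mean of the observed values
mean_age = sum(observed) / len(observed)
imputed = [a if a is not None else mean_age for a in ages]

print(observed)   # → [25, 31, 40, 28]
print(imputed)    # → [25, 31.0, 31, 40, 31.0, 28]
```

Both strategies have costs: deletion shrinks the sample, while mean imputation artificially reduces the variance of the variable.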
Population pyramid
When looking for the best way to visualise the data you have, you will come across an impressively wide range of different types of charts - from simple, basic ones such as a scatter plot to very advanced ones such as a Sankey diagram. Some, however, are designed with a specific type of data in min…
The three sigma rule
The three sigma rule is an important tool in statistics and quality management. In the context of data analysis, it allows the identification of outlier points that are significantly different from the rest of the data. The use of the three-sigma rule in quality control also allows anomalies to be …
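The rule itself is a one-liner: flag any point lying more than three standard deviations from the mean. The readings below are invented to illustrate it.

```python
# Sketch of the three sigma rule; the sensor-style readings are made up.
import statistics

def three_sigma_outliers(data):
    # Flag points lying more than three standard deviations from the mean.
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 3 * sd]

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10] * 3 + [100]
print(three_sigma_outliers(readings))   # → [100]
```

One caveat worth remembering: a very extreme point inflates the standard deviation itself, so in small samples an outlier can mask its own detection.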
Segmentation: from grouping to classification
Segmentation is a key process in data analysis, dividing a data set into relatively homogeneous groups based on specific criteria. The purpose of segmentation is to identify hidden patterns, differences and similarities between objects in a dataset, enabling more precise and relevant analyses. Two …
Recoding quantitative variables into qualitative ones – techniques and their practical application
When analysing the data, we take into account both quantitative information (such as salary, age, number of products ordered) and qualitative information (e.g. gender, education, level of satisfaction with service). In order to make it easier to work with the data or to adapt it to a specific stati…
Outlier or anomaly? Detection of abnormal observations
Can one abnormal occurrence cause concern? Based on one deviation from the norm, should a red light start flashing? Of course! In many industries and businesses, an anomaly is a sign that must be reacted to quickly and efficiently in order to prevent consequences. So how do you recognise an anomaly…
Statistical inference
Statistical inference is the branch of statistics that makes it possible to describe, analyse and draw conclusions about an entire population on the basis of a sample.
Outlier cases. Identification and significance in data analysis
In data analysis, it is important to identify unusual observations that are significantly different from the others. Such values, called outliers or outlier cases, can affect the results of statistical analysis and lead to erroneous conclusions. In this material we will look at what outliers are, t…
Levels of measurement
The level of measurement is one of the most important properties of variables. It determines which statistical tests will be available to the researcher during the course of the analysis. But what information does it convey to us specifically? A level of measurement is a pattern of measurement that…
Pearson's chi-square correlation test
Pearson's chi-square tests are among the most popular statistical tests. It is worth noting at the outset that the chi-square test has more than one application. In this material, I will discuss the main differences between its variants and introduce the most important issues related to the chi-square test.
Gini index
The Gini index is a measure of the concentration of the distribution of a variable.
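One common way to compute it is from the mean absolute difference between all pairs of values, normalised by the mean: 0 means perfect equality, values near 1 mean strong concentration. The data below are invented.

```python
# Sketch of the Gini index via pairwise absolute differences; values are made up.
def gini(values):
    n = len(values)
    mean = sum(values) / n
    # Sum of |a - b| over all ordered pairs of values
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)

print(gini([1, 1, 1, 1]))            # → 0.0   (perfect equality)
print(gini([0, 0, 0, 10]))           # → 0.75  (one value holds everything)
```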
Neural networks
Neural networks are a family of algorithms that are becoming increasingly popular for tasks in the areas of prediction, classification or clustering.
Logistic regression
Regression is used to predict the value of the dependent (outcome) variable on the basis of the value of one or more independent variables (predictors).
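What makes the logistic variant distinctive is the logistic function, which maps the linear combination of predictors to a probability in (0, 1). The coefficients below are invented, not fitted to any data.

```python
# Sketch of a logistic regression prediction; intercept and slope are assumed
# values for illustration, not estimated coefficients.
import math

def predict_probability(x, intercept=-1.0, slope=0.8):
    linear = intercept + slope * x          # linear predictor
    return 1 / (1 + math.exp(-linear))      # logistic (sigmoid) function

print(round(predict_probability(0), 3))   # low predictor value → low probability
print(round(predict_probability(5), 3))   # high predictor value → high probability
```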
Skewness and kurtosis
Skewness and kurtosis are measures that describe the shape of the distribution under analysis: skewness captures its asymmetry, while kurtosis describes the heaviness of its tails. They provide us with information on how the values of a variable are spread around the mean.
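Both measures follow directly from the moment definitions, sketched below using the population formulas (no small-sample correction); the data are invented.

```python
# Sketch of sample skewness and excess kurtosis from their moment definitions
# (population formulas, no small-sample correction); the data are made up.
import statistics

def skewness(xs):
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
print(skewness(data) > 0)   # the long right tail makes the skewness positive
```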
Quantiles, quartiles, percentiles (measures of location)
We use quantiles to determine the position of a given value compared to others in a group or population. Let's say you have received your matriculation exam results in mathematics. Would you like to find out if your score is high compared to the results of the other people writing the baccalaureate…
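The exam question above can be answered with a percentile rank: the share of results at or below yours. The scores below are invented for illustration.

```python
# Sketch of a percentile rank: share of results at or below a given score.
# The exam scores are invented.
def percentile_rank(scores, your_score):
    at_or_below = sum(1 for s in scores if s <= your_score)
    return 100 * at_or_below / len(scores)

scores = [45, 52, 60, 61, 67, 70, 74, 80, 85, 92]
print(percentile_rank(scores, 80))   # → 80.0: your score beats or ties 80% of results
```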
Pearson's chi-square independence test
The chi-square test of independence is one of the most common statistical tests. It is used to test whether there is a statistically significant relationship between two qualitative variables.
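The statistic itself compares observed counts in a contingency table with the counts expected under independence; a rough sketch for a 2×2 table, with invented counts:

```python
# Sketch of the chi-square statistic for a 2x2 contingency table;
# the counts are invented for illustration.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / grand total
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

print(round(chi2, 2))   # → 16.67
```

The statistic is then compared with the chi-square distribution (here with 1 degree of freedom) to decide whether the relationship is statistically significant.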
Student's t-tests
The family of Student's t-tests is used to compare the arithmetic means of two groups of results against each other.
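The core of the independent two-sample version (equal-variance form) can be sketched directly from its formula; the group values below are invented.

```python
# Sketch of the independent two-sample t statistic (equal-variance form);
# the group values are invented.
import statistics

def t_statistic(a, b):
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    # Pooled variance across both groups
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5                  # standard error
    return (statistics.mean(a) - statistics.mean(b)) / se

group_a = [5.1, 5.5, 4.9, 5.3, 5.2]
group_b = [4.4, 4.8, 4.6, 4.5, 4.7]
print(t_statistic(group_a, group_b))   # large |t| suggests the means differ
```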
From variance to the method of least squares
The mean is one of the most popular and widely used statistical measures. By itself, however, it is not an exhaustive indicator: it only allows the central tendency of the variable under analysis to be determined.
The power of a test
The power of a test is the probability of detecting a statistically significant effect when one actually occurs in the population under study.
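This definition lends itself to a simulation sketch: repeatedly draw samples from a population with a true effect and count how often the test rejects the null hypothesis. The setup below (one-sample z-test, known standard deviation) is an assumption for illustration.

```python
# Sketch: estimating test power by simulation. A one-sample z-test with known
# sd is assumed; the effect size and sample size are invented.
import random

random.seed(0)

def estimate_power(true_mean, n=30, sd=1.0, alpha_z=1.96, trials=2000):
    detected = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, sd) for _ in range(n)]
        # z statistic for H0: population mean = 0
        z = (sum(sample) / n) / (sd / n ** 0.5)
        if abs(z) > alpha_z:
            detected += 1
    return detected / trials   # share of trials in which the effect was detected

print(estimate_power(0.5))   # a medium effect is detected most of the time
```

When the true mean really is 0, the same simulation returns roughly the significance level (about 5%), which is a useful sanity check.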
Customer Satisfaction Index (CSI)
The Customer Satisfaction Index (CSI), or Consumer Satisfaction Index, is a method used in marketing to assess customer satisfaction with the products or services provided by a company.
Customer Effort Score (CES)
The Customer Effort Score (CES) is, along with the Net Promoter Score (NPS) and Customer Satisfaction (CSAT), one of the main indicators related to customer satisfaction.
Factor analysis
The aim of factor analysis is to explain as much of the variability as possible with as few factors as possible.
Net Promoter Score (NPS)
Customer satisfaction and loyalty surveys are now an integral part of a business focused on growth and building competitive advantage. The NPS index, which is an acronym for Net Promoter Score, is now the standard in this area.
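The standard calculation behind the index is simple: on a 0–10 recommendation scale, subtract the percentage of detractors (0–6) from the percentage of promoters (9–10). The survey responses below are invented.

```python
# Sketch of the standard NPS calculation; the survey responses are made up.
def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)    # ratings 9-10
    detractors = sum(1 for s in scores if s <= 6)   # ratings 0-6
    return 100 * (promoters - detractors) / len(scores)

responses = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
print(nps(responses))   # → 10.0
```

Note that passives (7–8) count in the denominator but in neither group, so the score ranges from -100 to +100.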
Column chart, bar chart and histogram
Column and bar charts have long been some of the most popular ways of visualising data. Before deciding to use any of them, it is worth taking a closer look at them.
Scatter plot
A scatterplot (also known as a scatter diagram or dot plot) is a graph with two perpendicular axes on which two variables are presented.
Integrating business intelligence with data science: PS IMAGO PRO and PS CLEMENTINE PRO
A complex knowledge management system requires integration, namely the integration of multiple and diverse processes, and the integration of the tools responsible for those processes. In practice, how can we create a complete analytical system that not only takes advantage of modern algorithms, but…