The Oktoberfest in Munich, Germany, attracts millions of people from around the world every year. Beyond the boisterous festivities, a fascinating world of statistical correlations reveals itself. In this blog post, we explore the topic of “correlation and causality” and look at the legendary beer festival from a slightly different angle. So grab your beer mug, slip into your lederhosen, and get ready for a short statistical trip to Oktoberfest!
Correlation and causality: a brief overview
Before we delve into the examples of Oktoberfest, let’s clarify what correlation and causality mean. Correlation refers to the statistical relationship between two or more variables. When two variables (x, y) are correlated, changes in one are associated with changes in the other, but this does not necessarily imply a cause-and-effect relationship and may be influenced by an outside factor (z) (see Figure 1).
Causality, on the other hand, refers to a direct cause-and-effect relationship between two correlated variables, where changes in one variable lead directly to changes in the other. Establishing causality therefore usually requires careful investigation, experimentation, and in-depth knowledge of the relationships between the variables (see Figure 2). Understanding the difference between these two concepts is critical when analyzing data and drawing conclusions, because it helps avoid misinterpretation and wrong decisions.
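To make the role of the outside factor (z) tangible, here is a minimal sketch with synthetic data. Everything in it is an assumption for illustration: the sample size, the noise levels, and the `pearson` helper are made up and have nothing to do with real Oktoberfest figures. The variables x and y never influence each other; both only depend on z.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# z is the outside factor (synthetic, standard normal)
z = [rng.gauss(0, 1) for _ in range(5000)]
# x and y each follow z plus their own independent noise;
# neither one appears in the formula of the other
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)
print(f"corr(x, y) = {r_xy:.2f}")
```

Under these assumptions the theoretical correlation between x and y is 0.8, even though there is no causal link between them at all. The whole relationship is inherited from z.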
Backhendl and beer: an unexpected correlation discovery
In the data from past Oktoberfests, for example, there is a negative correlation between consumption of Backhendl, a traditional Bavarian dish of crispy breaded chicken, and beer consumption (Figure 3). This means that at Oktoberfests with higher beer consumption, fewer Backhendl tend to be devoured.
However, no cause-and-effect relationship can be inferred from this correlation: higher beer consumption does not directly cause reduced demand for Backhendl. Instead, both variables react to shared influences such as the number of visitors at the Oktoberfest or visitors’ willingness to pay, and these influences, among other things, produce the correlation (see Figure 4).
With more visitors, larger quantities of both beer and Backhendl will obviously be consumed, which on its own would move the two variables in the same direction. Willingness to pay, on the other hand, could drive the negative correlation, as visitors weigh whether to spend their money on a Mass of beer or on a Backhendl. Backhendl also competes with other food offerings, and Oktoberfest visitors may switch to cheaper alternatives such as sausages or Brezen. The willingness to pay for a Backhendl is therefore decreasing. Together with a rising price trend, this can carry over to Backhendl consumption, as can also be seen in Figure 5. Intuitively, this makes sense: the more expensive a product is, the lower the demand.
Beer, on the other hand, is one of the main reasons for visiting the Oktoberfest, which is why visitors are less price-sensitive here. Moreover, beer has no real competition at the Oktoberfest. The data supports these assumptions: although the price of a Mass (as of 2022: €13.45 / Mass) has risen sharply in recent years, consumption has also grown (see Figure 6). The willingness to pay for a Mass of beer is therefore high, while it is rather low for a Backhendl, since alternative food options are available.
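One way to check whether a shared influence accounts for a correlation like the one between beer and Backhendl is the partial correlation, which measures how two variables relate once a third one has been taken into account. The sketch below uses purely synthetic numbers, not the Oktoberfest data. The variable `shared_factor` is a hypothetical stand-in for an influence such as the price trend over the years, and the coefficients are invented.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical shared influence (synthetic, NOT real festival data)
shared_factor = [rng.uniform(0, 1) for _ in range(2000)]
# In this made-up world, beer rises and Backhendl falls with the shared
# factor, but neither variable affects the other directly
beer = [6.0 + 1.0 * s + rng.gauss(0, 0.2) for s in shared_factor]
hendl = [5.0 - 2.0 * s + rng.gauss(0, 0.3) for s in shared_factor]

raw = pearson(beer, hendl)
adjusted = partial_corr(beer, hendl, shared_factor)
print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {adjusted:.2f}")
```

In this constructed example the raw correlation is clearly negative, while the partial correlation is close to zero. With real data the picture is rarely that clean, because several influences act at once and not all of them are measured.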
Another interesting correlation can be seen between beer consumption at Oktoberfest and worldwide UFO sightings (see Figure 7): the more beer is consumed at an Oktoberfest, the more UFO sightings are reported worldwide. However, since beer consumption at Oktoberfest is very local while UFO sightings occur worldwide, no causality between the two variables can be established here either. Unlike in the Backhendl example above, it is also possible that no external factor at all drives the correlation between these two variables. In that case, the correlation between beer consumption and UFO sightings is pure coincidence.
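Coincidental correlations of this kind show up easily when both series simply grow over time. The following sketch illustrates that with two synthetic series generated independently of each other. The trends, noise levels, and number of periods are assumptions chosen for illustration and say nothing about the actual beer or UFO figures.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(series):
    """Period-to-period differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(3)  # fixed seed so the sketch is reproducible
periods = range(200)

# Two unrelated synthetic series that both happen to trend upwards
series_a = [1.0 * t + rng.gauss(0, 2) for t in periods]
series_b = [0.5 * t + rng.gauss(0, 2) for t in periods]

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))
print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels correlate almost perfectly because both series rise, while the period-to-period changes show hardly any relationship. Comparing changes instead of levels is one simple sanity check before reading meaning into two trending curves.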
By the way: if you are interested in more amusing correlations, you will find them at https://www.tylervigen.com/spurious-correlations. There you can see, for example, that the number of deaths by drowning in a pool correlates strongly with the number of movies starring Nicolas Cage.
Beer consumption and alcohol levels: the causality link
Let us now consider a scenario in which correlation and causality coexist: the influence of beer consumption on blood alcohol levels. Let’s assume that a data analysis reveals a strong positive correlation between beer consumption and blood alcohol. In other words, the more beer an Oktoberfest visitor consumes, the higher their blood alcohol level. Suppose we also conduct careful analyses and experiments confirming that beer consumption leads to increased blood alcohol levels. In that case, we would have established causality (Figure 8).
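What such an experiment could look like can be sketched as a small simulation. The key idea is random assignment: if chance decides how much each participant drinks, personal characteristics cannot be the hidden reason behind the result. All numbers here are invented, including `EFFECT_PER_MASS`, and the blood alcohol values are in arbitrary units rather than real per-mille figures.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is reproducible
EFFECT_PER_MASS = 0.3    # made-up causal effect per Mass, arbitrary units

mass_drunk = []
blood_alcohol = []
for _ in range(1000):
    # Personal characteristic that also affects the measurement (synthetic)
    body_trait = rng.gauss(0, 1)
    # Random assignment: chance alone decides how many Mass (0 to 4)
    assigned = rng.randint(0, 4)
    level = EFFECT_PER_MASS * assigned + 0.1 * body_trait + rng.gauss(0, 0.1)
    mass_drunk.append(assigned)
    blood_alcohol.append(level)

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

estimated_effect = least_squares_slope(mass_drunk, blood_alcohol)
print(f"estimated effect per Mass: {estimated_effect:.2f}")
```

Because the assignment is random, the estimated slope lands close to the effect that was built into the simulation. In observational data without random assignment, the same calculation could be distorted by whatever makes some visitors drink more than others.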
The meaning of causality in AI
In the world of machine learning and artificial intelligence, causality plays an increasingly important role. While correlations are easy to find in large data sets, it is often critical to understand if and how one variable influences another.
Causal models help to better understand patterns and relationships in data. They make it possible to identify influencing factors in a targeted way and to plan interventions that achieve desired outcomes. Here are some examples of how causality is used in different application areas of machine learning:
Medical diagnosis: In medical research, it is crucial to understand which factors influence a patient’s health. Causal models help to accurately analyze and evaluate the effects of treatments on patients. For example, studying causal relationships between certain medications and improvements in disease symptoms can help develop more effective therapies.
Marketing optimization: Companies use causality analysis to understand the impact of marketing campaigns and promotional efforts on sales of their products or services. If they can find out, for example, which specific advertising measures lead to an increase in sales, they can use their marketing budget more efficiently.
In these and many other application areas of machine learning, the consideration of causality is an important step towards more robust and interpretable models. It makes it possible not only to identify correlations, but also to understand the underlying causes and effects, which is crucial for making accurate predictions and informed decisions.
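The marketing example can be illustrated with a small sketch of one basic adjustment technique, stratification: compare customers who saw an ad with those who did not separately within each group of a known confounder, then average the results. The scenario is entirely hypothetical. The `holiday` confounder, the exposure probabilities, and `TRUE_EFFECT` are assumptions made up for this illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0       # made-up sales uplift caused by the ad

rows = []  # each row: (holiday, saw_ad, sales)
for _ in range(20000):
    holiday = rng.random() < 0.5
    # Confounding: ads are shown far more often in the holiday season...
    saw_ad = rng.random() < (0.8 if holiday else 0.2)
    # ...and the holiday season lifts sales on its own as well
    sales = 10.0 + (5.0 if holiday else 0.0)
    sales += (TRUE_EFFECT if saw_ad else 0.0) + rng.gauss(0, 1)
    rows.append((holiday, saw_ad, sales))

def mean(values):
    return sum(values) / len(values)

def diff_in_means(subset):
    """Average sales with ad minus average sales without ad."""
    treated = [sales for _, saw_ad, sales in subset if saw_ad]
    control = [sales for _, saw_ad, sales in subset if not saw_ad]
    return mean(treated) - mean(control)

# Naive comparison that ignores the season
naive = diff_in_means(rows)

# Stratified comparison: estimate within each season, then weight
# each estimate by the share of rows in that season
adjusted = 0.0
for season in (True, False):
    stratum = [row for row in rows if row[0] == season]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this constructed setting the naive estimate comes out at roughly 5, more than double the built-in effect of 2, while the stratified estimate lands near 2. The adjustment only works here because the confounder is known and recorded, which is exactly the kind of domain knowledge that causal analysis depends on.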
Now it’s your turn
Finally, we look at the average monthly number of births from 1990 to 2022 in the state of Bavaria (Figure 9).
Could it be that the Oktoberfest is responsible for the “above-average” number of births in July, roughly nine months after the festival? We invite you to dive into this exciting world with us, to identify correlations, and to draw the right conclusions. Contact us, because your data is too valuable to be misinterpreted.