A sub-2-hour marathon? In 10 years, says our model

20 Apr 23

The 2023 marathon season is in full swing. Last weekend, the second of the six annual «Marathon Majors» was held in Boston, and next Sunday the biggest Swiss marathon of the season takes place in Zurich. At these events, thousands of runners pursue personal goals, official records or even historic marks such as the two-hour barrier.

Marathon: The two-hour barrier

One such mark is the two-hour threshold in the men’s race: the official world record, set last year by the Kenyan Eliud Kipchoge, stands at 2:01:09. In 2019, he even ran 1:59:40 in an unofficial race (with pacemakers and aid stations that did not comply with the regulations), hinting that the historic two-hour mark could soon fall officially as well.
From a modeling perspective, the interesting challenge is to predict the year in which a runner will first break the two-hour barrier in an official marathon. We tackle this problem in three steps.

Step 1: Data basis

As with any analysis, the underlying data is crucial. What historical data is available? Which data points are included? Which are not relevant and can be ignored? For the modeling of marathon world records, we made the following decisions:

  1. We consider not only historical world records, which are set at irregular intervals, but the fastest marathon time of each year. This yields a larger and more robust data set and also allows a better estimate of how much performance varies from year to year.
  2. We limit the analysis to the last 50 years. This makes sense because the «laws» that govern the development of world best times plausibly change over time (due to factors such as the globalization of the sport, technological progress, changes in competition regulations, etc.).
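The two decisions above can be sketched in a few lines of Python. This is only an illustration: the input layout (pairs of year and `h:mm:ss` string), the function names and the sample rows are assumptions made for the sketch, and the rows are placeholders rather than real race results.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' finishing time to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def best_time_per_year(results, last_year, window=50):
    """Return {year: fastest time in seconds} for the last `window` years.

    `results` is an iterable of (year, 'h:mm:ss') pairs; this layout is an
    assumption made for the sketch.
    """
    first_year = last_year - window + 1
    best = {}
    for year, hms in results:
        if not first_year <= year <= last_year:
            continue  # decision 2: ignore results outside the window
        seconds = to_seconds(hms)
        # decision 1: keep the fastest time of each year, record or not
        if year not in best or seconds < best[year]:
            best[year] = seconds
    return dict(sorted(best.items()))


# Placeholder rows, NOT real race results
sample = [
    (1960, "2:15:00"),  # outside the 50-year window, dropped
    (2021, "2:04:30"),
    (2021, "2:03:10"),  # faster of the two 2021 rows, kept
    (2022, "2:02:20"),
]
yearly_best = best_time_per_year(sample, last_year=2022)
print(yearly_best)  # {2021: 7390, 2022: 7340}
```

Working in seconds keeps the later trend estimate (seconds per year) easy to read.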

The fastest men’s marathon time of each of the last 50 years therefore serves as our data basis:

Figure 1: Venn diagram of artificial intelligence

Step 2: Time series model

Based on the selected data, we build a time series model. Time series analysis is a special type of regression analysis that examines patterns or trends in data collected over time and makes predictions about how they will continue. Two aspects stand out in our marathon time series since 1970: the series is approximately linear, and there is a clear downward trend:

Figure 2: Linear model of height and weight.

The regression model shows that the yearly best time improves by about 10 seconds per year and that there is a «random» (i.e. not explained by the model) dispersion of about ± 45 seconds around the trend line. Based on these findings from the historical modeling, we can now make statements about how the time series will develop.
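A minimal sketch of such a trend fit, using ordinary least squares in plain Python. The original yearly data is not reproduced here, so the example fits a synthetic series that is generated from the two figures quoted above. Reading the ± 45 seconds as a standard deviation of normally distributed deviations, and placing the trend line at 2:02:00 in 2023, are assumptions of this sketch and are not taken from the analysis itself.

```python
import random
from statistics import fmean


def fit_trend(years, times):
    """Ordinary least squares for times ~ intercept + slope * year.

    Returns (slope, intercept, residual_sd), with times in seconds.
    """
    mean_x, mean_y = fmean(years), fmean(times)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(years, times)]
    # two fitted parameters, hence n - 2 degrees of freedom
    residual_sd = (sum(r * r for r in residuals) / (len(years) - 2)) ** 0.5
    return slope, intercept, residual_sd


# Synthetic stand-in for the real series (NOT real race results):
# a decline of 10 s/year plus Gaussian noise with a 45 s standard deviation.
# The trend level of 7320 s (2:02:00) in 2023 is an assumption.
rng = random.Random(42)
years = list(range(1973, 2023))
times = [7320 - 10 * (y - 2023) + rng.gauss(0, 45) for y in years]

slope, intercept, residual_sd = fit_trend(years, times)
print(f"trend: {slope:.1f} s/year, dispersion: {residual_sd:.0f} s")
```

On real data, the fitted slope and residual standard deviation would play the roles of the trend and the dispersion described in the text.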

Step 3: Prediction

We extrapolate the modeled trend and simulate possible continuations of the time series until 2040, also taking into account the observed dispersion:

Figure 3: From the input «height» to the output «weight».
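The simulation idea can be sketched as a small Monte Carlo experiment. This is not the implementation behind the figures in this post: it assumes that each year's best time deviates from the trend line independently and normally with a standard deviation of 45 seconds, and it assumes a trend level of 2:02:00 in 2023, which the text does not state. The resulting percentages are therefore illustrative and need not match the numbers reported here.

```python
import random
from collections import Counter

TWO_HOURS = 2 * 3600  # the barrier in seconds


def simulate_first_sub2(n_runs=20_000, start_year=2023, end_year=2040,
                        trend_at_start=7320.0, slope=-10.0, sd=45.0, seed=1):
    """Count, per year, the runs in which the barrier first falls that year.

    The key None collects the runs that stay above two hours until end_year.
    trend_at_start (2:02:00) is an assumed value, slope and sd follow the text.
    """
    rng = random.Random(seed)
    first_year = Counter()
    for _ in range(n_runs):
        hit = None
        for year in range(start_year, end_year + 1):
            trend = trend_at_start + slope * (year - start_year)
            # one simulated yearly best time: trend plus random deviation
            if rng.gauss(trend, sd) < TWO_HOURS:
                hit = year
                break
        first_year[hit] += 1
    return first_year


counts = simulate_first_sub2()
n = sum(counts.values())
share_2032_34 = sum(counts[y] for y in (2032, 2033, 2034)) / n
share_by_2040 = 1 - counts[None] / n
print(f"first sub-2 in 2032-2034: {share_2032_34:.1%}")
print(f"first sub-2 by 2040:      {share_by_2040:.1%}")
```

Each run stops at the first year under two hours, so the counter directly gives the distribution of the «first year» across all runs.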

With this simulation we can make probabilistic statements:

  • The first sub-two-hour marathon is most likely to occur between 2032 and 2034. In over 40 % of the simulation runs, the two-hour barrier is first broken in one of these three years.
  • The probability that the two-hour barrier will be broken before the end of 2023 is below 1 %.
  • By 2040, according to the model, we will almost certainly (>99 %) have seen a first official marathon under two hours.
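Statements of this kind can be cross-checked without simulation if the yearly deviations are assumed to be independent and normally distributed. In that case the probability that the barrier first falls in a given year is the probability of a sub-two-hour time in that year multiplied by the probability that every earlier year stayed above two hours. The sketch below uses that assumption, together with an assumed trend level of 2:02:00 in 2023 that is not given in the text, so its numbers are illustrative only.

```python
from statistics import NormalDist

TWO_HOURS = 2 * 3600  # the barrier in seconds


def first_crossing_probabilities(start_year=2023, end_year=2040,
                                 trend_at_start=7320.0, slope=-10.0, sd=45.0):
    """Return {year: P(barrier is first broken in that year)}.

    trend_at_start (2:02:00) is an assumed value, slope and sd follow the text.
    """
    noise = NormalDist(0.0, sd)
    still_standing = 1.0  # P(no earlier year went under two hours)
    result = {}
    for year in range(start_year, end_year + 1):
        trend = trend_at_start + slope * (year - start_year)
        p_year = noise.cdf(TWO_HOURS - trend)  # P(this year's best < 2:00:00)
        result[year] = still_standing * p_year
        still_standing *= 1.0 - p_year
    return result


probs = first_crossing_probabilities()
most_likely_year = max(probs, key=probs.get)
p_2032_34 = sum(probs[y] for y in (2032, 2033, 2034))
print(most_likely_year, f"{p_2032_34:.1%}", f"{sum(probs.values()):.2%}")
```

Comparing such closed-form values with simulated shares is a cheap way to catch mistakes in the simulation code.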

It should be noted that these results only hold under the assumptions made, i.e. that the time series continues to follow the same laws as it has since 1970. Unforeseen and non-modeled events, such as a sudden rule change by World Athletics, can strongly influence the further course of the marathon time series and thus invalidate our estimates. Nevertheless: in the «normal case» we can expect the first official marathon under two hours in about ten years.

Want to know more about time series models and sports? In our blog you will find more time series analyses and sports models.