Forecast of the daily number of guests for the ski resort Flumserberg

24 Feb 22

Depending on the weather, snow quality or day of the week, the utilization of a ski resort can vary greatly. For the Flumserberg ski resort, we have developed a tool that forecasts the daily number of visitors for the next week.


On a sunny Saturday afternoon, thousands of people are on the slopes – on a foggy Tuesday morning, almost none. In other words, the daily number of guests at a ski resort is not constant, but strongly dependent on factors such as weather, snow quality, day of the week or cantonal vacation status. This poses a challenge for the resource planning of the resort. How do you ensure ahead of time that all ticket office employees are on site on a peak day? And how do you prevent staff from working in vain on a rainy day?

To answer such questions, we have developed a forecasting tool for the Flumserberg winter sports region in St. Gallen. The tool uses patterns from historical data and weather variables to automatically forecast the daily number of guests for the next week, and it visualizes these forecasts in an intuitive way. Flumserberg can thus estimate at an early stage how busy the resort will be on a particular day.

Forecasts with Machine Learning

Making predictions on the basis of previous values and other influencing variables is a classic machine learning problem: Computer-based mathematical models can use a historical data set to “learn” how a target variable (in this case, the number of visitors to the ski resort) is associated with other variables (weather, day of the week, etc.). For example, the model learns how many more guests can be expected if the weather forecast for tomorrow is very good. Once such associations are learned, the model can forecast the target for future cases and also make quantitative statements about the uncertainty of the forecast.

Forecasts for Flumserberg

To create the forecasts for Flumserberg, the following steps were followed:

  1. Selection and preparation of data sources: We used the ski area’s entry data since 2010 as a baseline. These data represent the target variable, i.e. the quantity that we ultimately wanted to predict. The historical data set was supplemented with relevant weather variables (precipitation, temperature, cloud cover, etc.) as well as information on special events in the ski area, holidays and vacations in the surrounding cantons.
  2. Construction of additional variables: Based on the raw data, additional variables were calculated (so-called “feature engineering”), which could potentially be useful for fitting a model. For example, for each date, the number of entries for each of the preceding days was included. In addition to the target variable (entries per day), the final data set contained a total of 50 so-called explanatory variables – variables that are available to the model for the prediction of the target variable.
  3. Fitting the model: We used a “random forest model” for the prediction. This type of model is based on the combination of hundreds of decision trees (hence “forest”) and has proven to be very robust in pattern recognition of various kinds. In addition, a random forest can learn complex relationships such as variable interactions. Thus, statements such as “a holiday has a stronger effect on the number of visitors when the sun is shining” are possible in principle.
  4. Generating the forecasts: With the fitted model, new forecasts can now be calculated. For this purpose, a value is entered for each of the 50 explanatory variables (weather, snow, vacation, etc.) and the model forecasts the target variable (entries) on the basis of the rules learned. Of course, there is a certain amount of uncertainty in any prediction – after all, you don’t know the weather in three days with perfect certainty. To communicate this uncertainty, the model was also used to determine the probability that the number of entries on a given day falls in the low (green), medium (yellow), or critically high (red) range.
Figure 1: Venn diagram of artificial intelligence
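
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for Flumserberg: the data are synthetic, the column names (`sunshine_hours`, `is_holiday`, `is_weekend`, `entries_lag1`, ...) and the range limits `LOW_MAX` and `HIGH_MIN` are invented, and the range probabilities are approximated by the share of trees whose prediction falls into each range, which is one simple heuristic among several. The article does not say how the probabilities were actually derived.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Step 1: synthetic stand-in for the historical data (all values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range("2019-11-01", periods=400, freq="D")
df = pd.DataFrame({
    "date": dates,
    "sunshine_hours": rng.uniform(0, 9, len(dates)),
    "is_holiday": rng.integers(0, 2, len(dates)),
})
df["is_weekend"] = (df["date"].dt.dayofweek >= 5).astype(int)
df["entries"] = (
    500
    + 300 * df["sunshine_hours"]
    + 1500 * df["is_weekend"]
    # Interaction: a holiday adds more guests when the sun is shining.
    + 200 * df["is_holiday"] * df["sunshine_hours"]
    + rng.normal(0, 300, len(dates))
).clip(lower=0)

# Step 2: feature engineering, here the entries of the three preceding days.
for lag in (1, 2, 3):
    df[f"entries_lag{lag}"] = df["entries"].shift(lag)
df = df.dropna().reset_index(drop=True)

features = ["sunshine_hours", "is_holiday", "is_weekend",
            "entries_lag1", "entries_lag2", "entries_lag3"]

# Hold out the last 30 days to play the role of "future" days.
train, test = df.iloc[:-30], df.iloc[-30:]

# Step 3: fit a random forest (an ensemble of decision trees).
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(train[features], train["entries"])

# Step 4: point forecast plus a rough probability for each range.
point_forecast = model.predict(test[features])

# Predictions of the individual trees, shape (n_trees, n_days).
X_test = test[features].to_numpy()
tree_preds = np.stack([tree.predict(X_test) for tree in model.estimators_])

LOW_MAX, HIGH_MIN = 2000, 4000  # hypothetical limits, not Flumserberg's
p_low = (tree_preds < LOW_MAX).mean(axis=0)                                # green
p_medium = ((tree_preds >= LOW_MAX) & (tree_preds < HIGH_MIN)).mean(axis=0)  # yellow
p_high = (tree_preds >= HIGH_MIN).mean(axis=0)                             # red

summary = pd.DataFrame({
    "date": test["date"].to_numpy(),
    "forecast": point_forecast.round(),
    "p_low": p_low,
    "p_medium": p_medium,
    "p_high": p_high,
})
print(summary.head())
```

In a real forecast, the weather columns for future days would come from a weather forecast, and lagged entries that are not yet known would have to be replaced by earlier forecasts or dropped. The sketch ignores both points.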

Evaluation of the model

The forecast tool for Flumserberg has been in use since last ski season (2020/21). We therefore already have one season of forecasts that we can compare with the actual values to assess the performance of the tool. In this assessment, it is important to keep in mind that the winter of 2020/21 was strongly influenced by the COVID-19 pandemic. Ski resorts often had unusually few visitors and were sometimes closed altogether. As a result, the model, which was trained exclusively on data from before the pandemic, tended to over-predict.

So what did the model performance look like? Despite the pandemic, the predictions were surprisingly good: on average, they deviated from the true value by about 1,000 guests. As the following figure shows, the deviations followed the expected pattern: predictions were generally too high because of the pandemic effect.

Figure 2: Linear model of height and weight.

More important for the applicability of the model, however, is the fact that in more than 80% of cases the model correctly predicted whether a critical limit defined by Flumserberg would be exceeded or not. More concretely: of the 98 cases in which this limit was actually exceeded, our model predicted 97.
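
The quantities used in this evaluation can be computed as in the following sketch. The function names, the sample numbers and the limit of 4000 are illustrative assumptions, not Flumserberg's data. The sketch shows the mean absolute deviation, the mean signed deviation (positive values indicate over-prediction), the share of days on which forecast and reality agree about exceeding the limit, and the share of actual exceedances that were forecast.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the deviation, ignoring its direction."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mean_error(actual, predicted):
    """Average signed deviation; positive means the model over-predicts."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def exceedance_scores(actual, predicted, limit):
    """Return (accuracy, recall) for the question 'is the limit exceeded?'."""
    actual_over = [a > limit for a in actual]
    predicted_over = [p > limit for p in predicted]
    pairs = list(zip(actual_over, predicted_over))
    # Accuracy: share of days where forecast and reality agree.
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    # Recall: share of actual exceedances that were also forecast.
    n_over = sum(actual_over)
    hits = sum(a and p for a, p in pairs)
    recall = hits / n_over if n_over else float("nan")
    return accuracy, recall


# Made-up example values (entries per day).
actual = [1200, 5300, 800, 6100, 3900, 4500]
predicted = [2100, 5900, 1900, 6800, 4300, 3800]
limit = 4000  # hypothetical critical limit

mae = mean_absolute_error(actual, predicted)
bias = mean_error(actual, predicted)
accuracy, recall = exceedance_scores(actual, predicted, limit)
print(f"MAE: {mae:.0f}, bias: {bias:.0f}")
print(f"accuracy: {accuracy:.2f}, recall: {recall:.2f}")
```

Applied to the figures quoted above, 97 correctly forecast exceedances out of 98 correspond to a recall of 97 / 98, or about 99%.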

«Datahouse’s forecasts – in addition to our experience – helped us estimate what kind of workload we could expect over the next few days, allowing us to properly plan our staffing resources.»

says Michael Ackermann, member of the management of Bergbahnen Flumserberg.

Transfer to other cases

Machine-learning-based pattern recognition in historical data, and the predictions built on it, are of course not limited to ski resorts. On the contrary: in many industries, data sets exist that contain useful information for calculating business-relevant predictions. The specific machine learning methodology depends on the exact use case and has to be developed carefully.

Are you interested in pattern recognition and predictions or would you like to find out what potential your data has in this regard? We will be happy to advise you! So do not hesitate to get in touch with us.