Written by Scott Gray

Fun hacks: Predicting the winner of the 2023 Masters Tournament using hx Renew

Data engineering

3 minutes

Across 86 tournaments, the Masters has had 54 different winners, making it one of the most difficult sporting events to predict. In this article, we'll use hx Renew and its ability to integrate with historical data to predict the winner of this year's tournament.

In 1932, Bobby Jones and Clifford Roberts opened Augusta National Golf Club, which has hosted the Masters Tournament, renowned as the pinnacle of golfing excellence, since 1934. In the 86 tournaments held there, there have been 54 different winners – making it one of the most challenging sporting events to predict.

New model developers at hyperexponential are challenged to develop a personal project within hx Renew, our next-generation pricing tool, to increase their familiarity with the platform. Scott Gray joined the team in November 2022 and decided to leverage hx Renew to predict this year's coveted Masters Tournament winner by integrating with data from datagolf, the API for all things golf. Keep reading to see how his experiment worked out! 

Creating a model and connecting to third-party data in minutes

To predict the winner, our golf model calculates predicted strokes per round and total strokes for multiple user-selected players in a user-selected PGA money event. We achieve this by fetching players' real-time ratings and historical performances from datagolf's API into the hx Renew model. With hx Renew's ability to integrate dynamically with all kinds of datasets, our team set this up in minutes. We have applied the same methodology to help insurers connect to both their internal systems and external third-party databases.

No matter how complex your datasets, hx Renew makes sense of them

Golf has become one of the most data-heavy sports, and even with numerous data categorisations, hx Renew was able to help us create a predictive model successfully. 

Players' real-time ratings are split into two categories: their SG (strokes gained) categories and their driving attributes.

SG categories indicate a player's skill level in four facets of the game: putting, around the green, approach, and off the tee. SG measures how many strokes a player saves in a category compared to the average player. For example, a player with a putting metric of 3 and an approach metric of -2 gains 3 strokes on the average player in putting and loses 2 strokes on approach shots. Summing the four categories gives the SG total: the total strokes a player is expected to gain or lose against the average PGA player.
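To make the arithmetic concrete, here is a minimal Python sketch of the SG total. The class and field names are ours, chosen for illustration, and the ratings reuse the putting and approach figures from the example above with the other two categories assumed to be zero. It is not taken from the hx Renew model itself.

```python
from dataclasses import dataclass


@dataclass
class StrokesGained:
    """Strokes gained versus the average player, by category (illustrative names)."""
    putting: float
    around_the_green: float
    approach: float
    off_the_tee: float

    def total(self) -> float:
        # SG total is simply the sum of the four category ratings.
        return (
            self.putting
            + self.around_the_green
            + self.approach
            + self.off_the_tee
        )


# Hypothetical player: +3 putting, -2 approach, average everywhere else.
player = StrokesGained(putting=3.0, around_the_green=0.0, approach=-2.0, off_the_tee=0.0)
sg_total = player.total()
print(sg_total)
```

A positive total means the player is expected to beat the average player by that many strokes; a negative total means they are expected to lose strokes.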

Driving attributes are calculated by measuring whether a player's accuracy (percent of fairways hit) and distance (yards off the tee) sit below or above those of the average player. If a player has a distance metric of 20 and an accuracy metric of -5%, they hit the ball 20 yards further than the average player but are also 5% less accurate.

A player's historical data is a collection of all their scores from every round they have played in the selected tournament over the past five years. The model takes the average of these rounds and weights this value in the final prediction in proportion to how many rounds they have played in the tournament.

For example, a player who has played 24 rounds in the tournament will have a predicted score that closely reflects their historical performance. In contrast, the prediction for a player who has played only 4 rounds will deviate more from their historical average and more closely reflect their current ratings.
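One way to picture this weighting is as a simple blend of the two predictions. The sketch below is an assumption on our part rather than the exact formula in the model: the weight on history grows linearly with rounds played and is capped at 1, and the `full_weight_rounds` parameter (set to 32 here) is a hypothetical value picked purely for illustration, as are the scores.

```python
def history_weight(rounds_played: int, full_weight_rounds: int = 32) -> float:
    """Weight given to historical average, proportional to rounds played.

    full_weight_rounds is a hypothetical parameter: the number of rounds at
    which history would receive all of the weight.
    """
    if rounds_played <= 0:
        return 0.0
    return min(rounds_played / full_weight_rounds, 1.0)


def blended_score(historical_avg: float, ratings_score: float, rounds_played: int) -> float:
    """Blend the historical average with the ratings-based score."""
    w = history_weight(rounds_played)
    return w * historical_avg + (1.0 - w) * ratings_score


# Hypothetical player: averages 72.0 historically, ratings suggest 70.0.
veteran = blended_score(72.0, 70.0, rounds_played=24)  # leans on history
newcomer = blended_score(72.0, 70.0, rounds_played=4)  # leans on ratings
print(veteran, newcomer)
```

With these made-up inputs, the 24-round player's prediction sits much closer to their historical average than the 4-round player's does, which is the behaviour described above.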

We also leverage historical data to determine the average player score in the selected tournament. Every round played in the tournament in the past five years is averaged for this metric. This number also serves as the baseline for SG: subtracting a player's SG total from it gives their predicted score in the tournament based purely on SG ratings.
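As a rough sketch of this step, the Python below averages a handful of round scores and subtracts an SG total from the result. The function names and the scores are invented for illustration; the real model works on every round from the past five years of the selected event.

```python
from statistics import mean


def event_average(round_scores: list[float]) -> float:
    """Average score across all rounds played in the event."""
    return mean(round_scores)


def sg_based_score(event_avg: float, sg_total: float) -> float:
    """Predicted score per round from SG alone: baseline minus strokes gained."""
    return event_avg - sg_total


# Hypothetical round scores standing in for five years of event history.
scores = [70.0, 72.0, 74.0, 72.0]
baseline = event_average(scores)

# A player with an SG total of +1 is expected to beat the baseline by one stroke.
prediction = sg_based_score(baseline, sg_total=1.0)
print(baseline, prediction)
```

Because lower scores are better in golf, a positive SG total pulls the prediction below the event average, and a negative one pushes it above.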

Utilising the predictive model developed in hx Renew 

Now that we've defined the parameters for the model, let's use it to predict five golfers' scores in the upcoming Masters Tournament. The following steps are split into three main categories: event and player selection, historical stats and player ratings, and event predictions.   

Step 1: Selecting the players 

We start with event and player selection. Figure 1 shows our player selections for our model – a collection of past Masters winners and some of the current top players in the world.

Fig 1 – event and player selection page

Step 2: Integrating with historical data 

Next, we fetch and view the data for historical stats and player ratings. The historical stats section (Figure 2) shows us the average score in the event, and the average scores and rounds played per player. The table can be sorted to see who has played the least/most rounds or has the best/worst average score.

Fig 2 – historical event stats section, sorted by best average score

The second section on this page shows players' current ratings (Figure 3) and provides a visual comparison of players' SG ratings (Figure 4). The table (Figure 3) can be sorted by each column, making it easy to see which players are rated the best in which categories.

Fig 3 – players' current ratings table sorted by SG Total

Fig 4 – strokes gained comparison visualization

Step 3: Getting to our predictions 

Lastly, we can see our model rating factors and predictions on our events prediction page. The players' score prediction factors table (Figure 5) shows the final rating factors determining a player's score prediction. The driving adjustment factor is derived from a look-up table based on a player's driving attributes and treated like an SG metric. 
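To illustrate how a look-up table can turn driving attributes into an SG-style adjustment, here is a small Python sketch. The bands and adjustment values are invented for illustration and are not the table used in the model; we also assume, for simplicity, that distance and accuracy are looked up separately and then added together.

```python
from bisect import bisect_right

# Hypothetical bands. Each list of breakpoints splits the metric into
# len(breakpoints) + 1 bands, each with its own adjustment in strokes.
DISTANCE_BREAKPOINTS = [-10.0, 0.0, 10.0, 20.0]      # yards vs average
DISTANCE_ADJUSTMENTS = [-0.2, -0.1, 0.0, 0.1, 0.2]

ACCURACY_BREAKPOINTS = [-4.0, 0.0, 4.0]              # percentage points vs average
ACCURACY_ADJUSTMENTS = [-0.1, -0.05, 0.05, 0.1]


def lookup(value: float, breakpoints: list[float], adjustments: list[float]) -> float:
    """Return the adjustment for the band that value falls into."""
    return adjustments[bisect_right(breakpoints, value)]


def driving_adjustment(distance: float, accuracy: float) -> float:
    """Combine the two look-ups into one SG-style adjustment (assumed additive)."""
    return (
        lookup(distance, DISTANCE_BREAKPOINTS, DISTANCE_ADJUSTMENTS)
        + lookup(accuracy, ACCURACY_BREAKPOINTS, ACCURACY_ADJUSTMENTS)
    )


# The long but less accurate driver from earlier: +20 yards, -5% accuracy.
adjustment = driving_adjustment(distance=20.0, accuracy=-5.0)
print(adjustment)
```

Because the result is expressed in strokes, it can be treated like any other SG metric when building the score prediction.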

Fig 5 – players score prediction factors

Users can select which factors they want to include in the predictions for this model and any others in hx Renew. We can see in Figure 6 that Rory McIlroy is predicted to have the best (lowest) score out of the five golfers selected when including all prediction factors. When we filter the factors to only include historical performance, we see that Scottie Scheffler now has the best predicted score (Figure 7).

Fig 6 – score predictions with all rating factors included (Rory is the winner)

Fig 7 – score predictions based only on historical performance (Scottie is the winner)

In conclusion 

There you have it! Based on score predictions with all rating factors included, we have predicted that Rory McIlroy will win the 2023 Masters Tournament. If we look purely at historical data, however, Scottie Scheffler comes out on top. Who do you think will win this year's tournament? Tell us on LinkedIn or Twitter.

With hx Renew, we have made it seamless for our customers to integrate with all sorts of datasets – even complex historical golfing data. If you want to learn more about hx Renew and how it can help your company with its pricing transformation journey, contact our team.