This is the continued saga of “how to build a game projection”. I’d recommend reading the three previous articles of the series before you continue reading this post:
The goal in this article is to lay the groundwork for the final game projection model. In the previous article I built the in-season model. Today I will focus on the pre-season model and combining the two.
Just to summarize: The in-season model is supposed to adjust player values based on performance in the present season. The pre-season model is supposed to give us the player values at the start of the season based on the performance in previous seasons.
Isolating pre-season models
To begin with, I’m only interested in the isolated performance of the potential pre-season models. So, I’m basically calculating a player value at the start of the season and keeping this value throughout. Then, based on these player values, I can calculate win probabilities for every game and use log loss to compare each model’s predictions to the actual results.
For now, I will look at Evolving-Hockey’s GAA and xGAA models, sGAA and GSAx. The sGAA model is a combination of Evolving-Hockey’s GAR and xGAR models. You can read about the sGAA model here:
I’m just using a 1-year sample size, so the player values are simply the sum of GAA, xGAA, sGAA or GSAx in the previous season. I then calculate the team strengths for every game the same way I did in the previous articles:
xWin% (team strength) = 0.5 + Player Value * x
The win probability is then:
Win Probability = xWin% / (xWin% + Opponent xWin%)
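As a minimal sketch, the two formulas above could be written in Python like this. The function names, the example player values and the x-value of 0.002 are all made up for illustration, and I'm assuming that "Player Value" means the summed value of a team's players:

```python
def team_strength(player_value: float, x: float) -> float:
    """xWin% (team strength) = 0.5 + Player Value * x."""
    return 0.5 + player_value * x


def win_probability(strength: float, opp_strength: float) -> float:
    """Win Probability = xWin% / (xWin% + Opponent xWin%)."""
    return strength / (strength + opp_strength)


# Hypothetical inputs: summed player values and an illustrative x-value.
x = 0.002
home = team_strength(10.0, x)   # 0.5 + 10 * 0.002
away = team_strength(-5.0, x)   # 0.5 - 5 * 0.002

p_home = win_probability(home, away)
p_away = win_probability(away, home)
print(round(p_home, 4), round(p_away, 4))
```

The two win probabilities always sum to 1, because both teams share the same denominator.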
Then the log loss for every game is calculated:
Log loss = -ln(1 – ABS(Result – Win Probability))
If a team with a win probability of 60% loses, then the log loss is:
Log loss = -ln(1 – ABS(0 – 0.6)) = -ln(0.4) = 0.9163
The x-value in the first formula can now be found by minimizing the log loss – the lower the log loss the better the model!
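To illustrate the fitting step, here is a rough sketch that finds x by trying a grid of candidate values and keeping the one with the lowest average log loss. The grid search is just a simple stand-in for whatever solver you prefer, and the games list is invented data, not my actual sample:

```python
import math


def game_log_loss(result: int, win_prob: float) -> float:
    """Log loss = -ln(1 - ABS(Result - Win Probability)); result is 1 (win) or 0 (loss)."""
    return -math.log(1 - abs(result - win_prob))


def mean_log_loss(x: float, games) -> float:
    """Average log loss over all games for a given x-value."""
    total = 0.0
    for value, opp_value, result in games:
        strength = 0.5 + value * x
        opp_strength = 0.5 + opp_value * x
        win_prob = strength / (strength + opp_strength)
        total += game_log_loss(result, win_prob)
    return total / len(games)


# Made-up games: (team player value, opponent player value, result).
games = [
    (12.0, -4.0, 1),
    (3.0, 8.0, 0),
    (-6.0, 5.0, 0),
    (9.0, 1.0, 1),
    (-2.0, -7.0, 0),  # an upset, so the fit cannot be perfect
]

# Candidate x-values from 0 to 0.02 in steps of 0.0005 (an arbitrary grid).
candidates = [i * 0.0005 for i in range(41)]
best_x = min(candidates, key=lambda x: mean_log_loss(x, games))

print(best_x, mean_log_loss(best_x, games))
```

With x = 0 every game is a coin flip and the log loss is ln(2) ≈ 0.693, so that is the baseline any useful model has to beat.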
With this procedure I found the x-values and corresponding log loss values for each model:
Here’s the visualization of the log loss as a function of game number. The lines are just the trendlines to make the graph easier to interpret.
We see that xGAA and sGAA are better pre-season models than GAA. We also see that GSAx is a really bad model. There’s nothing surprising in this. Other research has shown that goaltender performance in one year correlates poorly with goaltender performance the next year.
Based on these results I want to continue working on an sGAA-based pre-season model. The first step is to combine sGAA (Skaters) and GSAx (Goaltenders). Here are the results:
|Model||x1 (sGAA)||x2 (GSAx)||Log loss|
It’s worth noting how little weight is put on goaltending. The predictability of goaltending from season to season is very small.
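One way to read the combined model is as a team strength with a separate weight for each component. The sketch below assumes the simplest extension of the single-value formula, xWin% = 0.5 + sGAA * x1 + GSAx * x2. The function name and all the numbers are hypothetical, chosen only to show a small x2 relative to x1:

```python
def combined_team_strength(skater_sgaa: float, goalie_gsax: float,
                           x1: float, x2: float) -> float:
    """Assumed form: xWin% = 0.5 + sGAA * x1 + GSAx * x2."""
    return 0.5 + skater_sgaa * x1 + goalie_gsax * x2


# Illustrative weights only: goaltending gets a tenth of the skater weight.
strength = combined_team_strength(skater_sgaa=10.0, goalie_gsax=5.0,
                                  x1=0.002, x2=0.0002)
print(round(strength, 4))
```

Both weights would then be found the same way as before, by minimizing the log loss over x1 and x2 together.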
Combining pre-season model and in-season model
The next step is to combine the in-season model found in the previous post and the pre-season model. Here are the results:
|Model||x1 (In-season)||x2 (Pre-season)||Log loss|
Now we have a model that can compete with the closing betting lines. In fact, the log loss of the model is slightly below the log loss of the market.
Currently, the pre-season model is weighted the same throughout the season, but it might be preferable to decrease the weight put on the pre-season model as the season progresses. That way, you put more weight on new data and less weight on old data.
I will decrease the weight by 1% and 2% respectively for every game a player has played:
Value (weighted 1%) = Value * (1 – Game No * 0.01)
Value (weighted 2%) = Value * (1 – Game No * 0.02); from Game No 51 onward the value is 0
By doing this the weight put on pre-season data decreases as the season progresses. With a 2% decrease players playing in their game number 51 or above will have all their value come from the in-season model.
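Here is a small sketch of the decay, reading it as a weight of 1 minus the game number times the decay rate, floored at zero so the pre-season value can never flip sign. That reading and the function names are my assumptions for the example:

```python
def preseason_weight(game_no: int, decay: float) -> float:
    """Linear decay per game played, floored at zero (assumed form)."""
    return max(0.0, 1.0 - game_no * decay)


def weighted_preseason_value(value: float, game_no: int, decay: float) -> float:
    """Pre-season player value after applying the decay weight."""
    return value * preseason_weight(game_no, decay)


# Compare the 1% and 2% decay for a hypothetical player value of 8.0.
for game_no in (1, 25, 51, 82):
    one_pct = weighted_preseason_value(8.0, game_no, 0.01)
    two_pct = weighted_preseason_value(8.0, game_no, 0.02)
    print(game_no, round(one_pct, 2), round(two_pct, 2))
```

With the 1% decay a player still carries some pre-season value in game 82, while the 2% decay has removed it entirely by game 51.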
Here are the results with these weightings:
|Model||x1 (In-season)||x2 (Pre-season)||Log loss|
The 1% decrease appears to give the best results.
Theoretical betting results
What would happen if we used the calculated win probabilities to bet on the closing betting lines? The approach here is to bet using the Kelly Criterion with a 0.3 multiplier:
Bet size(Risk) = ((Win probability * (Odds – 1) – 1 + Win probability) / (Odds – 1)) * 0.3
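The bet size formula translates directly into code. In this sketch I assume the odds are decimal odds (so Odds – 1 is the net payout per unit staked), and I floor the bet at zero so that no bet is placed when the model sees no edge. The flooring and the function name are my additions for the example:

```python
def kelly_bet_size(win_prob: float, odds: float, multiplier: float = 0.3) -> float:
    """Fractional Kelly stake as a share of bankroll, given decimal odds."""
    net_odds = odds - 1
    full_kelly = (win_prob * net_odds - 1 + win_prob) / net_odds
    # Assumption: skip the bet (stake 0) when the Kelly fraction is negative.
    return max(0.0, full_kelly * multiplier)


# Hypothetical examples at even money (decimal odds of 2.0).
print(kelly_bet_size(0.55, 2.0))  # model sees an edge: small positive stake
print(kelly_bet_size(0.40, 2.0))  # model sees no edge: stake of 0
```

At even money with a 55% win probability, full Kelly would risk 10% of the bankroll, so the 0.3 multiplier brings that down to 3%.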
This is of course somewhat cheating, because the model is built on the very games we’re “betting” on… but since the sample size is 6 seasons it should still be a good indicator of model performance. Here are the results:
We could also look at betting results on specific teams. Here are the results of bets placed on each team:
So, the model was significantly higher on N.J and CBJ than the market. Here are the results of bets placed against each team:
Here we see that the model has been much lower on BUF than the market.
Finally, we can look at the ROI as a function of game number, to see if the model performs better at the beginning or the end of a season:
We don’t really see any trends here, meaning the model performs equally well throughout the season.
The model is not yet finished, but the initial results look very promising. There are still improvements to be made. I want to use a 3-year sample size for the pre-season model instead of a 1-year sample size. I could also add an age curve and define a value for rookies and players with small sample sizes. Right now, rookies are considered average by default.
…But before I make any improvements, I want to test a different pre-season model that’s closer to the in-season model in structure. That’s the goal of the next article. If this model isn’t better, then I will move forward with an sGAA-based pre-season model. Either way, I think the results are looking promising.