Game projection model – Pre-season Model (Part IIIA)

This is the continued saga of “how to build a game projection”. I’d recommend reading the three previous articles of the series before you continue reading this post:

Game projection model – The Variables (Part I)

Game projection model – The Variables (Part IB)

Game projection model – In-season Model (Part II)

The goal in this article is to lay the groundwork for the final game projection model. In the previous article I built the in-season model. Today I will focus on the pre-season model and combining the two.

Just to summarize: The in-season model is supposed to adjust player values based on performance in the present season. The pre-season model is supposed to give us the player values at the start of the season based on the performance in previous seasons.

Isolating pre-season models

To begin with, I’m only interested in the isolated performance of the potential pre-season models. So, I calculate a player value at the start of the season and keep it fixed throughout. Based on these player values I can calculate win probabilities for every game and use log loss to compare each model’s predictions to the actual results.

For now, I will look at Evolving-Hockey’s GAA and xGAA models, sGAA and GSAx. The sGAA model is a combination of Evolving-Hockey’s GAR and xGAR models. You can read about the sGAA model here:

Interpretation and redefining of Evolving-Hockey’s GAR and xGAR models.

I’m just using a 1-year sample size, so the player values are simply the sum of GAA, xGAA, sGAA or GSAx in the previous season. I then calculate the team strengths for every game the same way I did in the previous articles:

xWin% (team strength) = 0.5 + Player Value * x

The win probability is then:

Win Probability = xWin% / (xWin% + Opponent xWin%)

Then the log loss for every game is calculated:

Log loss = -ln(1 – ABS(Result – Win Probability))

If a team with a win probability of 60% loses, then the log loss is:

Log loss = -ln(1 – ABS(0 – 0.6)) = -ln(0.4) = 0.9163
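
The three formulas above can be sketched in a few lines of Python. The function names are my own for illustration, and the result is assumed to be coded as 1 for a win and 0 for a loss:

```python
import math

def team_strength(player_value: float, x: float) -> float:
    """xWin% (team strength) = 0.5 + Player Value * x."""
    return 0.5 + player_value * x

def win_probability(strength: float, opp_strength: float) -> float:
    """Win Probability = xWin% / (xWin% + Opponent xWin%)."""
    return strength / (strength + opp_strength)

def log_loss(result: int, win_prob: float) -> float:
    """Log loss = -ln(1 - ABS(Result - Win Probability)); result is 1 (win) or 0 (loss)."""
    return -math.log(1 - abs(result - win_prob))

# The worked example from the text: a 60% favourite loses.
print(round(log_loss(0, 0.6), 4))  # 0.9163

# Two equally strong teams always come out at 50%.
print(win_probability(team_strength(0.0, 0.0051), team_strength(0.0, 0.0051)))  # 0.5
```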

The x-value in the first formula can now be found by minimizing the log loss – the lower the log loss the better the model!
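
A minimal sketch of that minimization, assuming a simple grid search over candidate x-values (the text doesn't say which optimizer was used). The six games below are made-up toy numbers purely to make the example runnable; they are not real data, and `mean_log_loss` is a hypothetical helper name:

```python
import math

# Hypothetical toy sample: (team player value, opponent player value, result for the team).
# These numbers are invented for illustration only.
games = [
    (12.0, -3.0, 1),
    (-8.0, 5.0, 0),
    (4.0, 9.0, 1),
    (15.0, -10.0, 1),
    (-6.0, 2.0, 0),
    (3.0, -1.0, 0),
]

def mean_log_loss(x: float, games) -> float:
    """Average log loss over all games for a given x-value."""
    total = 0.0
    for value, opp_value, result in games:
        strength = 0.5 + value * x
        opp_strength = 0.5 + opp_value * x
        win_prob = strength / (strength + opp_strength)
        total += -math.log(1 - abs(result - win_prob))
    return total / len(games)

# Grid search: try x = 0.0000, 0.0001, ..., 0.0200 and keep the best one.
candidates = [i / 10000 for i in range(201)]
best_x = min(candidates, key=lambda x: mean_log_loss(x, games))

print(best_x, mean_log_loss(best_x, games))
```

With x = 0 every game is a 50/50 coin flip and the log loss is ln(2) ≈ 0.6931, so any useful x-value has to beat that on the sample.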

With this procedure I found the x-values and corresponding log loss values for each model:

Model	x-value	Log loss
Base	–	0.6881
Market	–	0.6714
GAA	0.0043	0.6777
xGAA	0.0047	0.6759
sGAA	0.0051	0.6761
GSAx	0.0019	0.6877

Here’s the visualization of the log loss as a function of game number. The lines are just the trendlines to make the graph easier to interpret.

We see that xGAA and sGAA are better pre-season models than GAA. We also see that GSAx on its own is a really bad model: its log loss (0.6877) is barely below the base (0.6881). There’s nothing surprising in this. Other research has shown that goaltender performance in one year correlates poorly with goaltender performance the next year.

Based on these results I want to continue working on an sGAA-based pre-season model. The first step is to combine sGAA (Skaters) and GSAx (Goaltenders). Here are the results:

Model	x1 (sGAA)	x2 (GSAx)	Log loss
Base	–	–	0.6881
Market	–	–	0.6714
sGAA+GSAx	0.00505	0.00065	0.6760

It’s worth noting how little weight is put on goaltending: the season-to-season predictability of goaltending is very low.
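
To put the two weights in perspective, here is a small sketch. It assumes the combined team strength simply extends the earlier formula to xWin% = 0.5 + sGAA * x1 + GSAx * x2; the exact combination isn't spelled out in the text, so treat this form and the function name as assumptions:

```python
# Fitted weights from the table above.
X1_SGAA = 0.00505  # skaters
X2_GSAX = 0.00065  # goaltenders

def combined_strength(skater_sgaa: float, goalie_gsax: float,
                      x1: float = X1_SGAA, x2: float = X2_GSAX) -> float:
    """Assumed form: xWin% = 0.5 + sGAA * x1 + GSAx * x2."""
    return 0.5 + skater_sgaa * x1 + goalie_gsax * x2

# Effect of 10 goals of value coming from skaters vs. from the goaltender.
skater_shift = combined_strength(10.0, 0.0) - 0.5
goalie_shift = combined_strength(0.0, 10.0) - 0.5
print(round(skater_shift, 4), round(goalie_shift, 4))  # 0.0505 0.0065
print(round(skater_shift / goalie_shift, 1))           # 7.8
```

Under that assumption, a goal of last season's skater value moves the team strength roughly 7.8 times as much as a goal of last season's goaltender value.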

Combining pre-season model and in-season model

The next step is to combine the in-season model found in the previous post and the pre-season model. Here are the results:

Model	x1 (In-season)	x2 (pre-season)	Log loss
Base	–	–	0.6881
Market	–	–	0.6714
Combination	0.689	0.598	0.6710

Now we have a model that can compete with the closing betting lines. In fact, the log loss of the model (0.6710) is slightly below the log loss of the market (0.6714).

Currently, the pre-season model is weighted the same throughout the season, but it might be preferable to decrease the weight put on the pre-season model as the season progresses. That way, more weight is put on new data and less weight on old data.

I will decrease the weight by 1% and 2% respectively for every game a player has played:

Value (weighted 1%) = Value * (1 – Game No * 0.01)

Value (weighted 2%) = Value * (1 – Game No * 0.02), set to 0 where this would be negative (Game No 51 and above)

By doing this the weight put on pre-season data decreases as the season progresses. With a 2% decrease, players in their game number 51 or above will have all of their value come from the in-season model.

Here are the results with these weightings:
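
As a sketch, reading the weighting as Value * (1 – Game No * rate) with a floor at zero, the decay could look like this. The function names are hypothetical, and the floor is my assumption to keep the 2% version from going negative late in the season:

```python
def preseason_weight(game_no: int, decrease: float) -> float:
    """Share of the pre-season value still counted after game_no games.

    decrease is the per-game reduction, e.g. 0.01 for 1% or 0.02 for 2%.
    The weight is floored at 0 so it never turns negative (assumption).
    """
    return max(0.0, 1.0 - game_no * decrease)

def weighted_value(value: float, game_no: int, decrease: float) -> float:
    """Pre-season player value after applying the per-game decrease."""
    return value * preseason_weight(game_no, decrease)

# A hypothetical player with a pre-season value of 10 goals.
for game_no in (0, 10, 25, 51, 82):
    print(game_no,
          round(weighted_value(10.0, game_no, 0.01), 2),
          round(weighted_value(10.0, game_no, 0.02), 2))
```

With a 1% decrease a player still carries 18% of the pre-season value in game 82, while the 2% version has dropped to zero by then.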

Model	x1 (In-season)	x2 (Pre-season)	Log loss
Base	–	–	0.6881
Market	–	–	0.6714
1% decrease	0.747	0.862	0.6707
2% decrease	0.910	1.028	0.6711

The 1% decrease appears to give the best results (log loss 0.6707, compared to 0.6711 for the 2% decrease and 0.6710 with no decrease).

Theoretical betting results

What would happen if we used the calculated win probabilities to bet on the closing betting lines? The approach here is to bet using the Kelly Criterion with a multiplier of 0.3:

Bet size(Risk) = ((Win probability * (Odds – 1) – 1 + Win probability) / (Odds – 1)) * 0.3
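
Here is the bet size formula as a small Python sketch. I'm assuming "Odds" means decimal odds (so Odds – 1 is the net payout per unit staked) and that no bet is placed when the formula turns negative; neither is stated explicitly above, and the function name is hypothetical:

```python
def kelly_bet_size(win_prob: float, decimal_odds: float, multiplier: float = 0.3) -> float:
    """Fraction of the bankroll to risk, using the scaled Kelly formula from the text.

    Bet size = ((p * (Odds - 1) - 1 + p) / (Odds - 1)) * multiplier
    A negative result means there is no edge, so the bet size is set to 0 (assumption).
    """
    net_odds = decimal_odds - 1
    full_kelly = (win_prob * net_odds - 1 + win_prob) / net_odds
    return max(0.0, full_kelly * multiplier)

# Model says 55%, the line pays even money (decimal odds 2.0):
# full Kelly is 10% of the bankroll, scaled down to 3%.
print(round(kelly_bet_size(0.55, 2.0), 4))  # 0.03

# Model says 40% at the same odds: no edge, so no bet.
print(kelly_bet_size(0.40, 2.0))  # 0.0
```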

This is of course somewhat cheating, because the model is built on the very games we’re “betting” on… but since the sample size is 6 seasons it should still be a good indicator of model performance. Here are the results:

Season	Risk	Bet result	ROI
20142015	6646%	429%	6.5%
20152016	5144%	-135%	-2.6%
20162017	4118%	110%	2.7%
20172018	4771%	359%	7.5%
20182019	6436%	372%	5.8%
20192020	5493%	722%	13.2%
Total	32608%	1858%	5.7%
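
The ROI column appears to be the bet result divided by the risk. A quick check on the numbers from the table above, under that assumption (`roi_percent` is a hypothetical helper name):

```python
# (risk, bet result, published ROI), all in percent, copied from the table above.
seasons = {
    "20142015": (6646, 429, 6.5),
    "20152016": (5144, -135, -2.6),
    "20162017": (4118, 110, 2.7),
    "20172018": (4771, 359, 7.5),
    "20182019": (6436, 372, 5.8),
    "20192020": (5493, 722, 13.2),
}

def roi_percent(risk: float, bet_result: float) -> float:
    """ROI in percent, assumed to be bet result divided by risk."""
    return 100.0 * bet_result / risk

for season, (risk, bet_result, published) in seasons.items():
    print(season, round(roi_percent(risk, bet_result), 1), published)

total_risk = sum(risk for risk, _, _ in seasons.values())
total_result = sum(result for _, result, _ in seasons.values())
print("Total", total_risk, total_result, round(roi_percent(total_risk, total_result), 1))
```

Because the table values are already rounded, the recomputed figures can differ from the published ones in the last digit (for example the 20192020 ROI and the total bet result).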

We could also look at betting results on specific teams. Here are the results of bets placed on each team:

Team	Risk	Bet result	ROI
N.J	2282%	20%	0.9%
CBJ	1827%	188%	10.3%
OTT	1742%	-59%	-3.4%
NYI	1537%	190%	12.4%
CGY	1527%	3%	0.2%
NYR	1503%	189%	12.6%
WPG	1383%	94%	6.8%
MIN	1365%	111%	8.1%
PHI	1279%	8%	0.6%
EDM	1163%	151%	13.0%
DET	1099%	45%	4.1%
DAL	1099%	145%	13.2%
MTL	1072%	-252%	-23.5%
FLA	1024%	41%	4.0%
VAN	994%	113%	11.4%
STL	956%	195%	20.4%
CAR	948%	48%	5.1%
ARI	933%	-116%	-12.5%
ANA	925%	105%	11.4%
NSH	881%	150%	17.0%
S.J	863%	-37%	-4.2%
TOR	828%	-89%	-10.7%
BOS	818%	76%	9.3%
T.B	756%	109%	14.4%
PIT	735%	66%	9.0%
CHI	674%	63%	9.4%
COL	653%	-13%	-2.0%
WSH	536%	144%	26.8%
L.A	482%	83%	17.3%
VGK	403%	162%	40.2%
BUF	324%	-78%	-23.9%

Judging by the risk column, the model was significantly higher on N.J and CBJ than the market. Here are the results of bets placed against each team:

Team	Risk	Bet result	ROI
BUF	3723%	100%	2.7%
CHI	1888%	-10%	-0.5%
ARI	1852%	129%	7.0%
L.A	1753%	287%	16.4%
PIT	1259%	350%	27.8%
BOS	1201%	89%	7.4%
WSH	1186%	-118%	-10.0%
ANA	1184%	-25%	-2.1%
DET	1158%	227%	19.6%
VAN	1138%	135%	11.9%
COL	1128%	33%	2.9%
NSH	1005%	42%	4.2%
STL	996%	125%	12.5%
T.B	993%	-176%	-17.7%
PHI	950%	-22%	-2.3%
EDM	934%	189%	20.2%
S.J	927%	289%	31.2%
MTL	908%	-207%	-22.8%
FLA	838%	-32%	-3.8%
TOR	791%	95%	12.0%
CAR	775%	153%	19.8%
NYR	739%	-26%	-3.5%
MIN	728%	183%	25.1%
DAL	708%	173%	24.5%
VGK	699%	72%	10.4%
NYI	663%	116%	17.5%
CGY	648%	-94%	-14.6%
CBJ	635%	-211%	-33.2%
OTT	544%	-8%	-1.5%
WPG	478%	-2%	-0.5%
N.J	181%	3%	1.7%

Here we see that the model has been much lower on BUF than the market.

Finally, we can look at the ROI as a function of game number to see if the model performs better at the beginning or the end of a season:

We don’t really see any trends here, which suggests the model performs about equally well throughout the season.

Perspective

The model is not yet finished, but the initial results look very promising. There are still improvements to be made. I want to use a 3-year sample size for the pre-season model instead of a 1-year sample size. I could also add an age curve and define a value for rookies and players with small sample sizes. Right now, rookies are considered average by default.

…But before I make any improvements, I want to test a different pre-season model that’s closer to the in-season model in structure. That’s the goal of the next article. If this model isn’t better, then I will move forward with an sGAA-based pre-season model. Either way, I think the results are looking promising.

Data from www.Evolving-Hockey.com and www.sportsbookreviewsonline.com
