Blog 12: Predicting the past

As the headline says, I will try to predict past results in this blog. That’s obviously pretty easy, but I will try to do it without using hindsight. The model I’m using is a fairly simple one based on LS-GAA. Before we get to predicting team results, I will take one last look at the individual player level.

My model has a descriptive component (LS-GAA) and a predictive component (pLS-GAA). The descriptive LS-GAA should theoretically equal the player’s goal contribution above or below league average. If a player has a positive LS-GAA, he contributes to the team making the playoffs.

The projected LS-GAA (pLS-GAA) tries to predict a player’s future performance. It’s based on weighted data from the previous 3 seasons.
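To make the weighting idea concrete, here is a minimal Python sketch of a three-season projection. The weights (0.5, 0.3, 0.2), the weighted-average form and the function name are assumptions chosen for illustration; they are not the values used in the model.

```python
def project_ls_gaa(last_three, weights=(0.5, 0.3, 0.2)):
    """Weighted average of a player's LS-GAA, most recent season first.

    The default weights are hypothetical placeholders, not the model's.
    """
    if len(last_three) != len(weights):
        raise ValueError("expected one LS-GAA value per weight")
    weighted_sum = sum(v * w for v, w in zip(last_three, weights))
    return weighted_sum / sum(weights)


# Hypothetical player: LS-GAA of 10.0, 5.0 and 0.0 over the last three seasons.
example_projection = project_ls_gaa([10.0, 5.0, 0.0])
```

Putting the largest weight on the most recent season is the usual design choice for this kind of projection, since recent results tend to say more about next season than older ones.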

So I have calculated the pLS-GAA for 2017/2018, 2018/2019 and 2019/2020 for every player and compared it with the actual LS-GAA for those seasons. It’s the same process as in blog 10. I have just added data from two more seasons. Here’s the result:

And if we remove the goaltenders, the correlation unsurprisingly gets better:

I can refine the projection by weighting the components differently:

pLS-GAA adj. = 0.46*pGK + pEVO + 0.83*pEVD + 0.95*pPP + 0.9*pSH + 1.7*pPEN

This doesn’t mean that goaltending is less important than the other components. Goaltending is just harder to predict, so the model works better if you expect some regression towards average.
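The adjusted formula can be written as a small Python function. The coefficients come straight from the formula above; the dictionary keys, the function name and the example numbers are made up for illustration.

```python
# Coefficients from the adjusted formula (pEVO has an implicit weight of 1).
ADJ_WEIGHTS = {"GK": 0.46, "EVO": 1.0, "EVD": 0.83, "PP": 0.95, "SH": 0.9, "PEN": 1.7}


def adjusted_pls_gaa(components):
    """Combine projected components into pLS-GAA adj.; missing ones count as 0."""
    return sum(w * components.get(name, 0.0) for name, w in ADJ_WEIGHTS.items())


# Hypothetical goalie projected at +10 goals: the adjustment shrinks the
# projection towards average because goaltending is hard to predict.
goalie_adj = adjusted_pls_gaa({"GK": 10.0})

# Hypothetical skater with contributions in several components.
skater_adj = adjusted_pls_gaa({"EVO": 2.0, "EVD": 1.0, "PP": 1.0, "PEN": 0.5})
```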

Here’s the graph with the adjustments:

Here it is without the goalies:

With that introduction, let’s now look at the team projections. I will do team projections for all 3 seasons based on both the adjusted and unadjusted model.

The first step is to assign players to teams. It’s a prediction, so I can’t use hindsight: every player is assigned to the team where he started the season. I can’t predict which players get traded.

When every player has been assigned to a team, you find the team pLS-GAA by adding up the individual numbers. The correlation between goal differential and standings points is then used to convert team pLS-GAA into projected points. The projected points can now be compared with the actual points (the 2019/2020 season is prorated to 82 games). I’m also comparing with Dom Luszczyszyn’s model (The Athletic).
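The two steps described here (summing players by team, then converting to points) might look roughly like this in Python. The straight-line fit, the three-team history and the player values are all hypothetical; the real conversion depends on the league-wide relationship between goal differential and points.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def team_totals(players):
    """Sum player pLS-GAA by the team each player started the season with."""
    totals = {}
    for team, pls_gaa in players:
        totals[team] = totals.get(team, 0.0) + pls_gaa
    return totals


def projected_points(team_pls_gaa, intercept, slope):
    """Convert a team's pLS-GAA (a projected goal differential) into points."""
    return intercept + slope * team_pls_gaa


# Made-up history: goal differential and standings points for three teams.
intercept, slope = fit_line([-30, 0, 30], [80, 90, 100])

# Made-up opening-night rosters as (team, player pLS-GAA) pairs.
totals = team_totals([("AAA", 6.0), ("AAA", 3.0), ("BBB", -3.0)])
points = {team: projected_points(value, intercept, slope)
          for team, value in totals.items()}
```

Fitting a line is only one way to express the goal differential to points relationship; it is used here because it keeps the conversion to a single slope and intercept.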

2017/2018

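Team | Points | pPoints (LS-GAA) | pPoints (LS-GAA adj.) | pPoints (Dom’s model)
NSH | 117 | 95.8 | 96.0 | 95.8
WPG | 114 | 90.1 | 91.3 | 95.0
T.B | 113 | 95.7 | 97.9 | 95.9
BOS | 112 | 97.9 | 97.1 | 96.8
VGK | 109 | 83.5 | 82.2 | 83.3
TOR | 105 | 89.5 | 90.2 | 95.0
WSH | 105 | 109.6 | 103.0 | 100.5
ANA | 101 | 91.1 | 87.3 | 93.7
MIN | 101 | 100.2 | 101.7 | 95.7
PIT | 100 | 105.2 | 106.0 | 103.0
S.J | 100 | 103.4 | 103.3 | 93.7
L.A | 98 | 92.4 | 91.1 | 92.8
PHI | 98 | 88.7 | 88.3 | 90.5
CBJ | 97 | 99.2 | 96.6 | 95.3
N.J | 97 | 85.6 | 84.8 | 78.2
FLA | 96 | 85.1 | 87.3 | 93.1
COL | 95 | 77.7 | 81.6 | 84.3
STL | 94 | 92.1 | 91.8 | 93.3
DAL | 92 | 92.4 | 94.0 | 95.4
CGY | 84 | 90.4 | 89.6 | 93.0
CAR | 83 | 92.0 | 93.5 | 94.9
NYI | 80 | 92.9 | 94.1 | 93.4
EDM | 78 | 95.5 | 92.0 | 93.0
NYR | 77 | 97.1 | 96.4 | 90.8
CHI | 76 | 91.1 | 90.0 | 95.6
DET | 73 | 81.1 | 84.2 | 81.0
VAN | 73 | 76.4 | 77.3 | 80.8
MTL | 71 | 96.1 | 92.7 | 98.4
ARI | 70 | 85.5 | 85.8 | 84.9
OTT | 67 | 88.5 | 89.8 | 89.1
BUF | 62 | 76.5 | 80.9 | 85.5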

2018/2019

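Team | Points | pPoints (LS-GAA) | pPoints (LS-GAA adj.) | pPoints (Dom’s model)
T.B | 128 | 96.8 | 100.5 | 104.6
BOS | 107 | 96.5 | 96.2 | 101.0
CGY | 107 | 88.5 | 88.0 | 93.2
WSH | 104 | 97.8 | 95.4 | 94.4
NYI | 103 | 78.8 | 82.7 | 84.9
S.J | 101 | 100.2 | 98.6 | 98.6
NSH | 100 | 103.2 | 101.2 | 104.8
PIT | 100 | 98.6 | 98.1 | 100.2
TOR | 100 | 97.9 | 98.9 | 102.9
CAR | 99 | 92.6 | 94.6 | 91.8
STL | 99 | 88.9 | 93.1 | 91.5
WPG | 99 | 97.1 | 96.9 | 100.6
CBJ | 98 | 98.7 | 96.2 | 101.1
MTL | 96 | 82.5 | 84.5 | 84.2
DAL | 93 | 89.7 | 88.0 | 90.8
VGK | 93 | 97.1 | 95.3 | 93.5
COL | 90 | 89.0 | 87.8 | 86.7
ARI | 86 | 89.9 | 88.5 | 89.8
FLA | 86 | 85.7 | 86.8 | 95.1
CHI | 84 | 82.9 | 82.9 | 90.2
MIN | 83 | 98.1 | 98.6 | 92.9
PHI | 82 | 93.5 | 95.3 | 96.5
VAN | 81 | 79.5 | 80.9 | 77.6
ANA | 80 | 101.6 | 94.4 | 97.0
EDM | 79 | 88.7 | 87.3 | 86.4
NYR | 78 | 85.1 | 83.7 | 81.8
BUF | 76 | 81.2 | 83.0 | 86.8
DET | 74 | 78.0 | 78.1 | 74.6
N.J | 72 | 89.1 | 89.6 | 85.3
L.A | 71 | 92.4 | 89.2 | 92.0
OTT | 64 | 73.4 | 78.7 | 77.4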

2019/2020

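Team | Points | pPoints (LS-GAA) | pPoints (LS-GAA adj.) | pPoints (Dom’s model)
BOS | 117.1 | 98.3 | 96.5 | 103.7
STL | 108.6 | 97.6 | 98.9 | 101.4
COL | 107.8 | 92.5 | 92.3 | 93.9
T.B | 107.8 | 111.1 | 108.9 | 108.3
WSH | 107.0 | 104.3 | 102.0 | 96.7
PHI | 105.8 | 90.6 | 92.3 | 91.5
PIT | 102.2 | 93.8 | 94.9 | 99.7
VGK | 99.3 | 100.5 | 99.1 | 100.1
CAR | 97.7 | 96.2 | 97.3 | 100.4
DAL | 97.4 | 97.2 | 93.3 | 95.9
NYI | 96.5 | 84.8 | 85.0 | 89.2
EDM | 95.9 | 88.1 | 88.9 | 83.3
CBJ | 94.9 | 93.1 | 94.7 | 87.3
TOR | 94.9 | 103.3 | 103.6 | 105.2
FLA | 92.7 | 99.9 | 98.4 | 93.3
NSH | 92.7 | 103.6 | 100.2 | 100.6
VAN | 92.7 | 85.8 | 86.0 | 84.9
CGY | 92.5 | 89.4 | 90.3 | 96.0
NYR | 92.5 | 85.3 | 84.2 | 84.9
WPG | 92.4 | 93.1 | 93.4 | 91.8
MIN | 91.5 | 89.6 | 91.8 | 93.0
ARI | 86.7 | 89.1 | 87.2 | 85.1
CHI | 84.3 | 89.5 | 88.7 | 86.6
MTL | 82.0 | 84.3 | 86.5 | 90.1
BUF | 80.8 | 74.9 | 78.0 | 80.2
N.J | 80.8 | 85.8 | 88.7 | 93.0
ANA | 77.4 | 95.8 | 89.2 | 88.5
L.A | 75.0 | 81.8 | 80.5 | 78.8
S.J | 73.8 | 93.4 | 94.0 | 94.1
OTT | 71.6 | 71.8 | 77.9 | 71.2
DET | 45.0 | 71.3 | 73.4 | 73.7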

The 2017/2018 season was difficult to predict. We can compare the quality of the models by looking at the average difference between actual points and projected points:

Year | LS-GAA | LS-GAA adj. | Dom’s model
2017/2018 | 11.9 | 11.9 | 12.0
2018/2019 | 8.6 | 8.1 | 8.1
2019/2020 | 7.7 | 7.4 | 7.2
Total | 18.0 | 18.1 | 16.2
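For reference, this is how an average difference of this kind can be computed, assuming it means the mean absolute difference between actual and projected points. The sketch uses only the first three rows of the 2018/2019 table (unadjusted LS-GAA column), so its result is not comparable to the full-season figures.

```python
def mean_abs_diff(actual, projected):
    """Mean absolute difference between actual and projected points."""
    if len(actual) != len(projected):
        raise ValueError("actual and projected must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, projected)) / len(actual)


# T.B, BOS and CGY in 2018/2019: actual points and pPoints (LS-GAA).
actual = [128, 107, 107]
projected = [96.8, 96.5, 88.5]
sample_error = mean_abs_diff(actual, projected)
```

Taking the absolute value matters: without it, over- and under-projections would cancel out and a model could look accurate while missing badly on individual teams.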

These simple projection models based on LS-GAA are comparable to Dom’s model. I still think the model can be made a lot better, but you can never make a perfect NHL model. There’s great parity in the NHL and the nature of the game is unpredictable.

When we look at the total difference in points over the 3 years, Dom’s model is better than the LS-GAA based models. This indicates that the mistakes in my models accumulate more over time.

In the next blog I will try to improve the goaltender projections, so I can put more weight on that component.

Conclusion:

  • This was the first draft of my projection model. There are still a lot of tweaks and refinements to be made, but the first draft is pretty good.
  • My model is similar to Dom Luszczyszyn’s model. They are both based on player stats and they both have a descriptive component (LS-GAA and GSVA) and a predictive component (pLS-GAA and GS). My model uses totals where Dom’s model uses rates, so he has to estimate the time on ice for each player.
  • I will try and have a projection model ready for the playoffs (knock on wood), but I won’t post a model I don’t trust.

Stay safe and remember to be kind

All data from www.evolving-hockey.com and www.theathletic.com
