The goal in this article is to determine whether xG data is impacted by where the games are being played. Do certain arenas impact the shot location data one way or the other?
All data for this article is 5v5 data from www.evolving-hockey.com
Henrik Lundqvist vs. Tuukka Rask
A big part of the inspiration for this article came from looking at Henrik Lundqvist’s GSAx numbers. Either Lundqvist is superhuman, or there’s something off about the expected goals numbers for the New York Rangers. Here’s a list of the top 10 goalies in terms of 5v5 GSAx from Evolving-Hockey:
If we instead look at the top 10 GSAA goalies, we see another picture. Lundqvist is still really good, but nowhere near superhuman.
I’ve also looked at the shot quality faced, and I have defined Shot Quality (SQ) as expected goals per fenwick (unblocked shot attempt). This is a metric I will use numerous times during this article. If we look at the shot quality faced amongst goalies with at least 200 games played, we find Lundqvist at the very top. So, according to Evolving-Hockey’s xG-model, Lundqvist has faced the hardest shots: 6.1 percent of all unblocked shots against are expected to go in.
Here’s the top 10 in SQ (GP>200):
And here’s the other end of the spectrum – the bottom 10 in SQ (GP>200):
Some teams obviously play a higher-risk game and therefore allow more high-danger chances, so we could just accept the data. However, if we look at the numbers Home vs. Away, we see some interesting trends.
The table below shows the shot quality faced being much higher at home than it is on the road for Henrik Lundqvist. This would suggest that NYR plays a much more high-risk game at home, but then we would expect a much lower save percentage as well. We don’t see that, and so the GSAx at home is much higher than the GSAx on the road. The numbers show a great Lundqvist on the road, and a completely unreal Lundqvist at home. Maybe the home ice data is inflated.
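To make the home/road comparison concrete, here is a minimal Python sketch. It assumes GSAx is computed as expected goals against minus actual goals against, and SQ faced as xGA per unblocked shot against. The class name, field names and numbers are all made up for illustration; they are not Lundqvist’s real figures.

```python
from dataclasses import dataclass


@dataclass
class GoalieSplit:
    """5v5 totals for one goalie in one venue split (hypothetical fields)."""
    fenwick_against: int   # unblocked shots against (FA)
    xg_against: float      # expected goals against (xGA)
    goals_against: int     # actual goals against (GA)

    def shot_quality(self) -> float:
        # SQ faced = expected goals per unblocked shot against
        return self.xg_against / self.fenwick_against

    def gsax(self) -> float:
        # Positive means fewer goals allowed than the xG model expected
        return self.xg_against - self.goals_against


# Illustrative numbers only, not real data
home = GoalieSplit(fenwick_against=1000, xg_against=65.0, goals_against=50)
away = GoalieSplit(fenwick_against=1000, xg_against=55.0, goals_against=50)

print(f"SQ home {home.shot_quality():.3f}, SQ away {away.shot_quality():.3f}")
print(f"GSAx home {home.gsax():.1f}, GSAx away {away.gsax():.1f}")
```

In this toy example the goalie allows the same number of goals on the same number of shots in both splits, so the entire GSAx gap comes from the higher xG credited to the home shots.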
If we do the same analysis for Tuukka Rask, we see a totally different picture. His numbers on the road are very comparable to Lundqvist’s road numbers, but at home he’s way, way worse. In fact, he’s below average in terms of GSAx at home.
These examples could indicate that there’s something wrong with the xG data, namely that shot location tracking differs depending on the arena. However, this is just anecdotal evidence, so I will take a more general approach now.
You would expect some correlation between Shot quality and goalscoring, so if the Shot quality is higher at certain arenas, you would expect the goalscoring to be higher as well. To test this, I’ve defined Shot result (SR) as goals per fenwick.
So, Shot quality is simply expected goals per fenwick and Shot result is actual goals per fenwick. How well do SQ and SR correlate? I’ve looked at 5v5 team data since the 2007/2008 season. The graph below shows how Shot quality correlates with Shot result for each team in every season:
I honestly expected a greater correlation, but obviously goaltending and shooting ability play an integral part in goalscoring as well.
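For readers who want to reproduce this kind of comparison, here is a minimal sketch of correlating SQ with SR across team-seasons. The rows are hypothetical, and the choice of Pearson’s r (with R² printed alongside) is my assumption, since the exact statistic behind the graph isn’t stated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical team-season rows: (xG for, goals for, fenwick for)
rows = [
    (150.0, 160, 2600),
    (140.0, 135, 2500),
    (165.0, 158, 2700),
    (130.0, 140, 2450),
]

sq = [xg / ff for xg, _, ff in rows]   # Shot quality: xG per fenwick
sr = [g / ff for _, g, ff in rows]     # Shot result: goals per fenwick
r = pearson(sq, sr)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```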
Let’s now look at the correlation at home versus on the road. I’m using overall Shot quality and overall Shot result, so it’s:
SQ = (xGF+xGA) / (FF+FA)
SR = (GF+GA) / (FF+FA)
This means that both teams’ shooting is accounted for. If there’s no Arena effect, then we should see a similar correlation Home and Away.
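The two formulas above translate directly into code. This is only a sketch: the function name and the input numbers are hypothetical, and the totals are assumed to be 5v5 totals for a single venue split.

```python
def overall_sq_sr(xgf, xga, gf, ga, ff, fa):
    """Return (SQ, SR) pooled over both teams' unblocked shots."""
    fenwick_total = ff + fa
    sq = (xgf + xga) / fenwick_total   # expected goals per fenwick
    sr = (gf + ga) / fenwick_total     # actual goals per fenwick
    return sq, sr


# Illustrative home totals for one team-season (not real data)
sq, sr = overall_sq_sr(xgf=80.0, xga=75.0, gf=82, ga=70, ff=1300, fa=1250)
print(f"SQ = {sq:.4f}, SR = {sr:.4f}")
```

Because both the numerator and the denominator pool the home team and the visitors, a team’s own shooting or goaltending talent affects SQ and SR less than it would in a one-sided metric.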
The correlation Away is much greater than it is at home. This indicates that there is an Arena effect on xG and therefore on the shot location data.
I’ve also looked at the correlation between Shot quality and Shot result, when I’m adding the data from all seasons. In other words, it’s how each team has performed from 2007 to 2020 – Atlanta, Winnipeg and Vegas are included even though their dataset is smaller.
Now, we see the Arena effect much more clearly. When we use a sample size this large, the noise from goaltending and shooting ability becomes much smaller.
How can we then define this Arena effect? I’ve simply defined Arena effect as the Shot quality at home minus the Shot quality away:
Arena effect = SQ(Home) – SQ(Away)
Arena effect = (xGF(H)+xGA(H)) / (FF(H)+FA(H)) – (xGF(A)+xGA(A)) / (FF(A)+FA(A))
The thought process is that the tracking differences accumulate at home but even out on the road, where a team plays in many different arenas. The Away data can therefore be seen as a baseline.
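The Arena effect definition above can be sketched in a few lines of Python. The class and function names are mine, and the totals are invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class VenueTotals:
    """5v5 team totals for one venue split (home or away)."""
    xgf: float  # expected goals for
    xga: float  # expected goals against
    ff: int     # fenwick for
    fa: int     # fenwick against

    def shot_quality(self) -> float:
        # Combined xG of both teams per combined unblocked shot
        return (self.xgf + self.xga) / (self.ff + self.fa)


def arena_effect(home: VenueTotals, away: VenueTotals) -> float:
    """SQ(Home) - SQ(Away), with the away split acting as the baseline."""
    return home.shot_quality() - away.shot_quality()


# Illustrative numbers only
home = VenueTotals(xgf=82.0, xga=78.0, ff=1260, fa=1240)
away = VenueTotals(xgf=70.0, xga=70.0, ff=1250, fa=1250)
effect = arena_effect(home, away)
print(f"Arena effect: {effect:+.4f} xG per fenwick")
```

A positive value means shots in the team’s home arena are credited with more xG per fenwick than shots in that team’s road games; a negative value means the opposite.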
With this definition of Arena effect, we can compare teams. Here are the top 10 teams in terms of Arena effect.
Top 10 – Arena Effect:
| Team | Season | SR Home | SR Away | SQ Home | SQ Away | Arena Effect |
This list is dominated by early NYR teams, giving a plausible explanation for Henrik Lundqvist’s superhuman GSAx stats.
Bottom 10 – Arena Effect:
| Team | Season | SR Home | SR Away | SQ Home | SQ Away | Arena Effect |
This list is not quite as dominated by one particular team, but we see T.B appear numerous times. Here is the overall result, if we look at all seasons combined:
Arena Effect – All seasons:
| Team | SR Home | SR Away | SQ Home | SQ Away | Arena Effect |
The New York teams top the list, whereas T.B, MIN and BUF have the lowest Arena effect.
If we look at the top and bottom teams from the first two tables, we won’t find any current teams. This could indicate that the shot tracking has become more streamlined, and that it’s all a problem of the past.
I’ve therefore looked at the average Arena effect (positive or negative) over time to see if the effect is trending downwards:
The effect is definitely smaller now compared to earlier, but it’s still pretty significant. And something crazy happened in the shortened 2012/2013 season. Obviously, the sample was smaller, but I don’t think that alone can explain such a spike in Arena effect.
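Here is a minimal sketch of how such a league-wide trend could be computed. I’m assuming that “average Arena effect (positive or negative)” means the mean of the absolute values per season, so that positive and negative team effects don’t cancel out. The team codes and values are placeholders.

```python
from collections import defaultdict

# Hypothetical (season, team, arena_effect) rows, not real data
effects = [
    ("2007/2008", "AAA", 0.010),
    ("2007/2008", "BBB", -0.006),
    ("2019/2020", "AAA", 0.004),
    ("2019/2020", "BBB", -0.002),
]


def mean_abs_effect_by_season(rows):
    """Average magnitude of the Arena effect per season, ignoring sign."""
    by_season = defaultdict(list)
    for season, _team, effect in rows:
        by_season[season].append(abs(effect))
    return {season: sum(vals) / len(vals) for season, vals in by_season.items()}


trend = mean_abs_effect_by_season(effects)
for season, value in sorted(trend.items()):
    print(f"{season}: {value:.4f}")
```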
Arena Effect and GSAx
Previously in the article I looked at specific goalies, and how Shot quality affected their numbers. Now I will take a more general approach, and look at how Arena effect correlates with a team’s home ice GSAx:
There’s a pretty good correlation, and if we look at data from all seasons combined instead of single-season data, the correlation is even greater:
Overall, there seems to be a really good correlation between home ice GSAx and Arena effect, so people should be very cautious when using GSAx as their preferred goalie metric.
Arena Effect and GAR
Clearly these Arena effects don’t just impact goaltending, so now I will turn my attention towards GAR, more specifically towards the even strength components of GAR (offense and defense).
Unfortunately, I can’t isolate GAR numbers in terms of home ice, so I will have to look at both home and away data. Here’s the correlation between GAR_EVO (offense) and Arena effect on the team level:
So, there’s no correlation between the two at the team level. Theoretically, Arena effects could still impact the GAR_EVO of certain types of players, but at the team level there appears to be no impact.
And, here’s the correlation between Arena effect and GAR_EVD (defense):
Now we see some inverse correlation, meaning a high Arena effect decreases the team GAR_EVD, whereas a low Arena effect increases the GAR_EVD. This isn’t particularly surprising, since GAR_EVD relies heavily on xGA.
I’ve done the same for the xGAR model with similar results, although not as clear:
Finally, I did the same analysis of my own model (sGAA), which you can read about here. The results were pretty much the same:
The findings in this article raise quite a few questions. First of all, why do we see these differences in Arena Effect? The rink dimensions are exactly the same in every NHL Arena, so you really shouldn’t see such big differences in the shot location data. The only viable explanation I can come up with lies in the tracking process. Every game is tracked manually with a specific team of trackers associated with each Arena. From a scientific standpoint, this isn’t a great way to track data, since even small tracking differences accumulate onto specific teams. Instead, the tracking teams should be randomized, so that the differences would even out.
Another question raised by these findings is how to use and interpret this newfound information. I’ve already shown that there’s a correlation between Arena effect and GSAx, and that there’s an inverse correlation between Arena effect and the defensive GAR components.
So, how does the Arena effect affect xGF%? A high Arena effect increases the xG totals, whereas a low Arena effect decreases the xG totals. The Arena effect should therefore primarily impact the extremes. A high Arena effect pushes players further away from the average (xGF% = 50), and a low Arena effect pulls players towards league average.
On a team like NYR with a high Arena effect, this means that a player like Artemi Panarin has an xGF% that seems better than it really is, whereas a player like Kaapo Kakko seems worse than he really is.
One analytical approach to all of this is to only look at road data. This way the tracking differences should even out. It’s easy to do on www.naturalstattrick.com, but unfortunately home/away is not a sorting criterion on www.Evolving-hockey.com.
The positive in all of this is that the Arena effects seem to be fairly consistent from year to year, so it’s possible to adjust for them. I’ve already tried factoring in Arena effects when looking at GSAx, and it does increase the repeatability of GSAx. Goaltending is still unpredictable, but at least this helps ever so slightly.
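One simple way such an adjustment could look is sketched below. This is an illustrative assumption, not necessarily the adjustment I used: it subtracts the team’s Arena effect (in xG per fenwick) from every unblocked home shot against before computing GSAx. The function name and the numbers are hypothetical.

```python
def adjusted_home_gsax(xga_home, ga_home, fa_home, team_arena_effect):
    """Home GSAx after removing an estimated Arena effect from xGA.

    Assumption: the Arena effect inflates (or deflates) every unblocked
    home shot by the same amount of xG.
    """
    adjusted_xga = xga_home - team_arena_effect * fa_home
    return adjusted_xga - ga_home


# Illustrative numbers only
raw_gsax = 65.0 - 50
adj_gsax = adjusted_home_gsax(xga_home=65.0, ga_home=50, fa_home=1000,
                              team_arena_effect=0.008)
print(f"Raw home GSAx {raw_gsax:.1f}, adjusted home GSAx {adj_gsax:.1f}")
```

A multi-season Arena effect is probably the safer input here, since single-season values are noisier.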
In the near future shot tracking will become automated and all of this will be obsolete, but until then adjustments to the current xG models are needed.
Besides using data from www.evolving-hockey.com, I’ve also calculated the Arena effect using xG-models from www.naturalstattrick.com and www.moneypuck.com.
The findings from NaturalStatTrick were very similar, but MoneyPuck already accounts for the arena differences. However, the approach used on MoneyPuck is very different, so I still think some adjustment is needed. For goalie evaluations, I would definitely recommend using MoneyPuck or only looking at road data.
All data in this article is from www.Evolving-hockey.com.
Also thanks to www.naturalstattrick.com and www.Moneypuck.com.