Building An Expected Goals Model From Scratch


Dan Morse

Github
Twitter
Introduction

● Expected goals are an increasingly common metric used to describe individual and team performance in a hockey game
● Shots are not created equal; expected goals (xG) tries to account for that
● There isn’t one single model that everyone uses
○ Evolving Hockey, Hockeyviz, Natural Stat Trick, MoneyPuck

Why are we making yet another expected goals model?

● To provide xG data at the individual shot level
● Because I wanted to
Expected by whom exactly?

● More than a decade’s worth of NHL games featuring over 1 million unblocked shots and 80k goals
● Mathematical models can take into account any number of variables
● Variables, in this case, represent context
○ Where is the shot coming from?
○ Was the shot a rebound? A rush chance?
○ Was there a power play?
○ Did it come right after a takeaway?
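
As a rough illustration of how context like this becomes model input, the sketch below turns a shot's coordinates and the preceding event into numeric features. It is not taken from the hockeyR code: the net position of (89, 0) feet, the event label "SHOT", and the 3-second rebound window are all assumptions chosen for the example.

```python
import math

# Assumption: coordinates are in feet, with the attacking net centred at (89, 0).
NET_X, NET_Y = 89.0, 0.0

def shot_distance(x, y):
    """Straight-line distance from the shot location to the centre of the net."""
    return math.hypot(NET_X - x, NET_Y - y)

def shot_angle(x, y):
    """Absolute angle in degrees between the shot and the line straight out from the net."""
    return abs(math.degrees(math.atan2(y - NET_Y, NET_X - x)))

def is_rebound(prev_event, seconds_since_prev, window=3):
    """Hypothetical rule: a shot within `window` seconds of another shot is a rebound."""
    return prev_event == "SHOT" and seconds_since_prev <= window

def shot_features(x, y, prev_event, seconds_since_prev):
    """Bundle the context for one shot into a feature dictionary."""
    return {
        "distance": shot_distance(x, y),
        "angle": shot_angle(x, y),
        "rebound": int(is_rebound(prev_event, seconds_since_prev)),
    }

# A shot from the slot, two seconds after a previous shot.
features = shot_features(79, 10, "SHOT", 2)
```

Other context from the list above (rush chances, power plays, takeaways) would be encoded the same way: one column per question, answered for every shot.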
The context of a model

● Context is added through feature selection
● The model is trained with varying features; only the best make it into the final product
● Some context does have to be estimated
Different models include different context

● Does the model account for who is shooting the puck?
○ The Hockey Graphs model (Sprigings & Toumi) does
● The hockeyR model excludes shooter talent, so talent can instead be inferred by comparing a shooter's results against the model
● Since 2010-11, Alex Ovechkin's shooting percentage has been more than 2 percentage points above his expected rate
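
The comparison behind a number like that can be sketched in a few lines: take a shooter's unblocked shots, then subtract the average xG from the actual goal rate. The function name and the data below are made up for illustration and are not Ovechkin's real figures.

```python
def shooting_above_expected(shots):
    """Actual minus expected shooting percentage, in percentage points.

    shots: list of (xg, is_goal) pairs for one shooter's unblocked shots.
    """
    n = len(shots)
    if n == 0:
        raise ValueError("need at least one shot")
    actual_pct = 100 * sum(is_goal for _, is_goal in shots) / n
    expected_pct = 100 * sum(xg for xg, _ in shots) / n
    return actual_pct - expected_pct

# Made-up sample: ten shots worth 0.08 xG each, one of which went in.
sample = [(0.08, 1)] + [(0.08, 0)] * 9
difference = shooting_above_expected(sample)  # roughly 2 percentage points
```

A positive value means the shooter scores more often than a model that knows nothing about him would expect, which is one way to read shooter talent out of a talent-blind model.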
Different models include different context
● How do models account for game strength? (Even, PP, PK, etc.)
● Matthew Barlowe’s public model includes strength as a variable
● Josh & Luke Younggren’s model for Evolving Hockey is four separate models: even strength, power play, shorthanded, and empty net
● hockeyR is split into two models: 5-on-5 and special teams
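
A two-model split like hockeyR's amounts to sorting shots into buckets before training. The sketch below assumes that anything other than 5 skaters against 5 goes to the special teams bucket; the slides do not spell out the exact boundaries, and the field names are hypothetical.

```python
def strength_bucket(shooting_skaters, defending_skaters):
    """Assumed rule: 5-on-5 gets its own bucket, everything else is special teams."""
    if shooting_skaters == 5 and defending_skaters == 5:
        return "5v5"
    return "special_teams"

def split_by_strength(shots):
    """Group shots so that a separate model can be trained on each bucket.

    shots: list of dicts with hypothetical keys 'shooting_skaters' and
    'defending_skaters'.
    """
    buckets = {"5v5": [], "special_teams": []}
    for shot in shots:
        key = strength_bucket(shot["shooting_skaters"], shot["defending_skaters"])
        buckets[key].append(shot)
    return buckets

# Made-up shots: one at 5-on-5, one on a power play, one shorthanded.
example = split_by_strength([
    {"shooting_skaters": 5, "defending_skaters": 5},
    {"shooting_skaters": 5, "defending_skaters": 4},
    {"shooting_skaters": 4, "defending_skaters": 5},
])
```

A four-model split like the one described for Evolving Hockey would use the same pattern with more bucket names; a single model with strength as a variable would skip the split and keep the skater counts as features.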
Making the model

● Many earlier models used logistic regression
○ Hockey Graphs, Corsica (Perry), and Barlowe all used logistic regression
● Later models have tended towards gradient boosting
○ Evolving Hockey, MoneyPuck, and hockeyR

● Extreme Gradient Boosting (XGBoost) is a form of supervised machine learning
● Rachael Tatman provides a great introduction here
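
To show the idea behind gradient boosting without any dependencies, here is a minimal sketch that boosts one-split decision stumps on log loss. It is plain gradient boosting, not XGBoost itself (which adds regularisation and second-order updates), and it is not the hockeyR training code. The toy shots and the function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one feature/threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]

def fit_boosted_stumps(X, y, rounds=50, learning_rate=0.3):
    """Each round fits a stump to the errors left by the previous rounds.

    Assumes y contains both goals (1) and non-goals (0).
    """
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1 - base_rate))  # start from the overall goal rate
    scores = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        # The negative gradient of log loss is simply (actual - predicted).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for s, row in zip(scores, X)]
    return f0, stumps, learning_rate

def predict_xg(model, row):
    """Sum the stump outputs on the log-odds scale, then convert to a probability."""
    f0, stumps, lr = model
    score = f0 + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)

# Made-up shots: [distance in feet, rebound flag] and whether each one scored.
X = [[10, 0], [12, 1], [15, 0], [20, 1], [30, 0],
     [35, 0], [40, 1], [50, 0], [55, 0], [60, 0]]
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
model = fit_boosted_stumps(X, y)
close_rebound = predict_xg(model, [10, 1])
long_shot = predict_xg(model, [55, 0])
```

On this toy data the close-range rebound comes out with a much higher xG than the long shot. A real model would use a library implementation, far more shots, and held-out data to choose the number of rounds and the learning rate.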
Evaluating the model

● xG models are commonly evaluated using log loss and area under the ROC curve (AUC)
● How similar are expected goal probabilities to observed goal probabilities?
● This model performs well on lower probabilities
● It underestimates higher probabilities
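
The three checks mentioned above (log loss, AUC, and predicted versus observed probabilities) can each be written in a few lines. This is a dependency-free sketch with made-up function names; the pairwise AUC here is fine for small samples but far too slow for a million shots, where a library routine would be the practical choice.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def auc(y_true, y_prob):
    """Chance that a random goal has a higher xG than a random non-goal (ties count half)."""
    goals = [p for y, p in zip(y_true, y_prob) if y == 1]
    misses = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((g > m) + 0.5 * (g == m) for g in goals for m in misses)
    return wins / (len(goals) * len(misses))

def calibration_table(y_true, y_prob, n_bins=10):
    """Per probability bin: (mean predicted xG, observed goal rate, number of shots)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins if b
    ]
```

In a calibration table, a well-calibrated model has the first two columns roughly equal in every row. A model that underestimates higher probabilities shows an observed goal rate above the mean predicted xG in the top bins.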
Evaluating the model

● At the player level, this model does quite well at predicting season-long goal totals
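
One way to run this kind of check is to add up shot-level xG per player and season, then compare the totals with actual goals. The slides do not say which statistic was used, so the squared correlation below is an assumption, as are the function names and the tuple layout.

```python
from collections import defaultdict

def season_totals(shots):
    """Sum xG and goals per (player, season).

    shots: iterable of (player, season, xg, is_goal) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for player, season, xg, is_goal in shots:
        totals[(player, season)][0] += xg
        totals[(player, season)][1] += is_goal
    return {key: tuple(value) for key, value in totals.items()}

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def xg_vs_goals_r2(shots):
    """How closely do season xG totals track season goal totals?"""
    totals = season_totals(shots)
    xg_totals = [xg for xg, _ in totals.values()]
    goal_totals = [goals for _, goals in totals.values()]
    return r_squared(xg_totals, goal_totals)
```

Swapping the player for the team in each tuple gives the team-level version of the same comparison.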
Evaluating the model

● Team level expected goals also predict actual goals quite well, though not quite as well as at the player level
Final Thoughts

● Expected goals can help account for the difference in danger between
different types of shots
● Building a model involves many human choices
● Expected goals models can watch every single minute of every game

Thanks for your time!
