Super Bowl Prediction
WINNER PREDICTION
Team Members
Matthew Littman, Vedant Kshirsagar,
Liz Li, Yu Zhang, Addy Dam
01 INTRODUCTION
02 PROJECT OVERVIEW
05 METHODOLOGY
06 CONCLUSION
01
INTRODUCTION
NFL PLAYOFF STRUCTURE
SUPER BOWL 2019
Broadcast in more than 170 countries
Stats for regular seasons
Records: wins, losses, ties
Past Super Bowl winners
Consolidated Data
242 columns & 1,578 rows
DATA PREPROCESSING
Understand Attributes: Choose among 242 columns of stats from 1966 to 2019
Values Cleaning: Split variables that carry more than one piece of information
Raw Scraped Data: Combine scraped data from both the NFL and Pro Football Reference websites
Rename: Rename some abbreviated attribute names
Handling Missing Values: Use -99999 to replace missing values that were N/A before a certain year
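A placeholder such as -99999 has to be masked before any averages or regressions are computed, otherwise it is treated as a real statistic. The sketch below shows one way to do that in plain Python. The column name `sacks` and its values are hypothetical and not taken from the project data.

```python
import math

MISSING_SENTINEL = -99999  # placeholder value named on the slide


def sentinel_to_nan(values, sentinel=MISSING_SENTINEL):
    """Replace the sentinel with NaN so it cannot be mistaken for a real stat."""
    return [math.nan if v == sentinel else float(v) for v in values]


def mean_ignoring_missing(values):
    """Average only the observed (non-NaN) entries."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else math.nan


# Hypothetical column with one value that was not recorded.
sacks = [30, MISSING_SENTINEL, 42]
clean = sentinel_to_nan(sacks)
average = mean_ignoring_missing(clean)
```

Averaging the raw list instead would pull the mean far below zero, which is why the masking step comes first.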
MERGED DATASET
[Figure: merged dataset, with 242 NFL statistics for each year from 1966 through 2019]
MERGED DATASET
Play around with bin size
Calculate VIFs and p-values
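As a rough illustration of the VIF step: the variance inflation factor of a column is 1 / (1 - R²), where R² comes from regressing that column on all the other columns. The sketch below uses NumPy least squares and a tiny made-up design matrix. It is not the project's code, and it assumes no column is constant.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus every other column.
        A = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return vifs


# Hypothetical data: two unrelated columns, then a third that nearly copies the first.
independent = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
collinear = [[-1, -1, -1.0], [-1, 1, -1.1], [1, -1, 1.0], [1, 1, 1.1]]

vifs_independent = variance_inflation_factors(independent)
vifs_collinear = variance_inflation_factors(collinear)
```

A VIF near 1 means a column is not explained by the others. A large value, as for the near-copy column here, flags redundancy worth removing before fitting the model.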
04
EXPLORATORY ANALYSIS
WINNERS OVER 53 YEARS
Run more
Throw further
Steal the ball more than you give it away
And sack more!
05
METHODOLOGY
METHODS USED
MICE (Multiple Imputation by Chained Equations)
Runs multiple regressions over random samples and takes the average of those regressions to fill in missing data.
SMOTE (Synthetic Minority Over-sampling Technique)
Increases the number of under-represented cases in our imbalanced NFL dataset.
RFE (Recursive Feature Elimination)
Recursively removes features, builds a model using the remaining attributes, and calculates model accuracy using the sklearn package.
LOGISTIC
Transforms output data using the logistic sigmoid function to return a probability value of winner and loser in the Super Bowl.
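The logistic sigmoid mentioned above maps any real-valued score to a value between 0 and 1, which is what lets the model output be read as a win probability. A minimal sketch follows. The `win_probability` helper, its weights, and its inputs are hypothetical and only illustrate the shape of the calculation.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def win_probability(features, weights, bias):
    """Hypothetical helper: linear score passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Made-up weights and team features, chosen so the linear score is exactly 0.
p = win_probability([1.0, 2.0], [0.5, -0.25], bias=0.0)
```

A score of 0 maps to a probability of 0.5. Positive scores push the probability toward 1 and negative scores toward 0.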
DATA IMPUTATION METHODS
[Figure: data frame with missing values → mice() → with() → pool() → final results]
1129 1129
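The chained-equations idea can be sketched as follows: start from column means, then repeatedly regress each incomplete column on the others and overwrite its missing cells with the predictions. This is a deliberately simplified, deterministic single chain written with NumPy. Full MICE draws several randomized imputations and pools them, which this sketch does not attempt. The function name and the toy table are assumptions for illustration.

```python
import numpy as np


def chained_imputation(X, n_iter=10):
    """Simplified single-chain imputation; NaN marks a missing cell."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: fill each gap with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            observed = ~missing[:, j]
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was observed, predict where it was not.
            coef, *_ = np.linalg.lstsq(A[observed], filled[observed, j], rcond=None)
            filled[missing[:, j], j] = A[missing[:, j]] @ coef
    return filled


# Toy table where the second column is exactly twice the first.
nan = float("nan")
table = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, nan], [5.0, 10.0]]
imputed = chained_imputation(table)
```

On this toy table the missing cell is recovered from the linear relationship between the two columns, while every observed value is left unchanged.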
Oversampling
Imbalanced class
Class 1: 53
Class 0: 1,525
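With 53 cases in one class and 1,525 in the other, the minority class needs synthetic rows before a classifier can learn from it. SMOTE creates each new row by picking a minority point, choosing one of its nearest minority neighbours, and interpolating between the two. The sketch below is a plain-Python illustration of that idea, not the library implementation the team used. The points and parameters are made up.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class points in a two-feature space.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=6, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, the new data stays inside the region the minority class already occupies.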
FEATURE SELECTION
RFE
LOGISTIC REGRESSION
Top 20 Features
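The elimination loop behind RFE can be sketched as: fit a model, drop the feature with the weakest coefficient, and repeat until the target number of features remains. The version below is a hand-rolled illustration that swaps in ordinary least squares on standardized features for the logistic regression named on the slide. It is not sklearn's RFE, and the design matrix and `score` target are made up.

```python
from itertools import product

import numpy as np


def recursive_feature_elimination(X, y, n_keep):
    """Return indices of the n_keep features that survive elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so coefficient magnitudes are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        A = np.column_stack([np.ones(len(y)), Xs[:, remaining]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        # Drop the feature whose coefficient (ignoring the intercept) is smallest.
        weakest = int(np.argmin(np.abs(coef[1:])))
        del remaining[weakest]
    return remaining


# Hypothetical data: three features, but the target ignores the middle one.
design = [list(row) for row in product([-1.0, 1.0], repeat=3)]
score = [3.0 * a + 2.0 * c for a, _, c in design]
selected = recursive_feature_elimination(design, score, n_keep=2)
```

Here the uninformative middle feature is eliminated first. In the project the same kind of loop would stop at 20 features.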
LOGISTIC REGRESSION
AUC = 0.95
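AUC can be read as the chance that a randomly chosen positive case (here, a winner) receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition by comparing every positive and negative pair. The labels and scores are made up and unrelated to the 0.95 reported on the slide.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = winner) and model scores.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this toy case three of the four winner/non-winner pairs are ordered correctly, so the AUC is 0.75. A value of 1.0 would mean every winner outranks every non-winner.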
PREDICTED WINNERS
[Figure: predicted winners, labeled 96% and 76%, alongside Liz's and Matthew's picks and betting odds of 300/1 and 9/2]
06
CONCLUSION
Best Methodology?
MICE, SMOTE, RFE, LR
Significant attributes:
❏ Percentage of offense receiving touchdowns
❏ No. of offense rushing attempts
❏ Turnover differentials
Recommendations
❏ Create a recommendation system (similar to Netflix’s) for buying players
❏ Predict player injuries based on past data
❏ Predict game winners and individual matchups at the game level, based on:
    ❏ Player data
    ❏ Weather data
    ❏ Home/away game
    ❏ Team history
What I knew about football
before this project
What I know about football
after this project
THANK YOU