
SUPER BOWL 2019 WINNER PREDICTION

Team Members: Matthew Littman, Vedant Kshirsagar, Liz Li, Yu Zhang, Addy Dam
TABLE OF CONTENTS
01 INTRODUCTION
02 PROJECT OVERVIEW
03 DATA SUMMARY
04 EXPLORATORY DATA ANALYSIS
05 METHODOLOGY
06 CONCLUSION
01
INTRODUCTION
NFL PLAYOFF STRUCTURE
SUPER BOWL 2019
Broadcast in more than 170 countries

2nd most-watched sporting event in the world and the most-watched sporting event in America

Draws an average TV audience of about 98.2 million viewers worldwide

A 30-second commercial reportedly cost between $5 million and $5.5 million

$6 billion gambled worldwide on the game


02
PROJECT OVERVIEW
OUR STEP-BY-STEP APPROACH

WEB SCRAPING: Extract data from 1966 to 2019 from the National Football League's official website, nfl.com
DATA PREPROCESSING: Transform raw data into an understandable format
EXPLORATORY DATA ANALYSIS: Run exploratory data analysis to gather insights on trends & patterns
METHODOLOGY: Apply advanced methods to build a prediction model
03
DATA SUMMARY
DATA SOURCES

Offense Data (11 tables): stats for regular seasons
Defense Data (8 tables): stats for regular seasons
Win/Loss/Tie (1 table): records of wins, losses, ties
Super Bowl Winners (1 table): past Super Bowl winners

Consolidated Data: 242 columns & 1,578 rows
DATA PREPROCESSING
Raw Scraped Data: Combine scraped data from both the NFL & Pro Football Reference websites
Understand Attributes: Choose between 242 columns of stats from 1966 to 2019
Rename: Rename some abbreviated attribute names
Values Cleaning: Split variables that carry more than one piece of information
Handling Missing Values: Use -99999 to replace missing values that were N/A before a certain year
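The two cleaning steps above (splitting combined variables and filling N/A cells with -99999) could look roughly like the sketch below. It is an illustration only: the column names, the sample rows, and the combined "W-L-T" record format are assumptions, not taken from the project's data.

```python
MISSING = -99999  # sentinel value named on the slide

def fill_missing(rows, sentinel=MISSING):
    """Return a copy of rows with None or 'N/A' cells replaced by the sentinel."""
    return [
        {col: (sentinel if val is None or val == "N/A" else val)
         for col, val in row.items()}
        for row in rows
    ]

def split_record(record):
    """Split a combined 'W-L-T' string (hypothetical format) into three integers."""
    wins, losses, ties = (int(part) for part in record.split("-"))
    return {"wins": wins, "losses": losses, "ties": ties}

# Hypothetical rows: the column names and values are illustrative only.
rows = [
    {"year": 1966, "team": "A", "sacks": "N/A", "record": "12-2-0"},
    {"year": 2018, "team": "B", "sacks": 41, "record": "10-6-0"},
]

cleaned = []
for row in fill_missing(rows):
    record = row.pop("record")          # remove the combined field
    cleaned.append({**row, **split_record(record)})
```

A sentinel such as -99999 keeps every column numeric, but it has to be treated as "missing" by any later step, otherwise a model would read it as a real magnitude.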
MERGED DATASET
[Figure: merged dataset with rows stacked by year (Year 1966 through Year 2019) across 242 NFL statistics]
MERGED DATASET
Play around with bin size
Calculate VIFs and p-values
04
EXPLORATORY ANALYSIS
WINNERS OVER 53 YEARS

Shows the total number of Super Bowls each team won from 1966 to 2018
WINS & LOSSES OVER 53 YEARS

Shows wins & losses of each team from 1966 to 2018
TYPES OF TOUCHDOWNS

Shows the average percentage breakdown of the different types of offense touchdowns
SUPER BOWL WINNERS VS. EVERYONE ELSE

Run more
Throw further
Steal the ball more than you give it away
And sack more!
05
METHODOLOGY
METHODS USED

MICE (Multiple Imputation by Chained Equations): Runs multiple regressions over random samples and takes the average of those regressions to fill in missing data.

SMOTE (Synthetic Minority Over-sampling Technique): Increases the number of under-represented cases in our NFL dataset, which is imbalanced.

RFE (Recursive Feature Elimination): Recursively removes features, builds a model using the remaining attributes and calculates model accuracy using the sklearn package.

LOGISTIC: Transforms output data using the logistic sigmoid function to return a probability value of winner and loser in the Super Bowl.
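As a minimal sketch of the logistic step described above, the sigmoid maps a linear score to a probability between 0 and 1. The weights, bias, and feature values below are hypothetical and are not the fitted model from this project.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real score to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this branch avoids overflow for large negative z
    return e / (1.0 + e)

def win_probability(features, weights, bias):
    """Probability of the 'winner' class for one team-season."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical weights and standardized features, for illustration only.
p_win = win_probability([1.2, -0.4], weights=[0.8, 0.5], bias=-1.0)
p_lose = 1.0 - p_win
```

The winner and loser probabilities always sum to 1, so a single sigmoid output is enough to describe both classes.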
DATA IMPUTATION METHODS

Linear Regression: Uses one variable with missing values as the dependent variable and the remaining variables as independent variables to predict the missing values.

KNN: Predicts missing values in our dataset row-wise using nearby data points.

MICE: Runs multiple regressions over random samples and takes the average of those regressions to fill in missing data.
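To make the KNN option concrete, here is a small dependency-free sketch, not the implementation used in the project. It assumes missing cells are stored as `None`, measures Euclidean distance over the columns both rows have observed, and fills each gap with the mean of the `k` nearest rows; the toy data is made up.

```python
import math

def knn_impute(rows, k=2):
    """Fill None cells with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns observed in both rows, so
    columns should be on comparable scales before calling this.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy data: the last row is missing its second column.
data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],
]
imputed = knn_impute(data, k=2)
```

Here the two rows closest to the incomplete one are the first two, so the gap is filled with the mean of their second-column values.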
DATA IMPUTATIONS (MICE)

[Figure: MICE workflow. Data frame with missing values -> mice() -> imputed datasets -> with() -> results of statistical analysis -> pool() -> final results]


MINORITY OVERSAMPLING
Imbalanced class: 1 = 53, 0 = 1525
After oversampling: 1 = 1129, 0 = 1129
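The core idea of SMOTE can be sketched in a few lines: each synthetic minority row is a random point on the segment between a real minority row and one of its nearest minority neighbors. This is a simplified illustration with made-up data. The slides do not say which implementation was used, and the `smote` helper, its parameters, and the toy rows are assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical toy data: 3 minority rows against 6 majority rows.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0]]
majority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0],
            [6.5, 6.5], [7.0, 5.0], [6.0, 7.0]]

new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

Because every synthetic row lies between two real minority rows, the new points stay inside the region the minority class already occupies rather than duplicating rows exactly.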
FEATURE SELECTION

RFE + LOGISTIC REGRESSION -> Top 20 Features
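The elimination loop behind RFE can be sketched without any dependencies: fit a model, drop the feature with the smallest absolute coefficient, and repeat until the desired number remains. scikit-learn provides this as `sklearn.feature_selection.RFE`. The version below is only an illustration with a hand-rolled logistic regression and a made-up toy dataset in which feature 0 is the only one that separates the classes; the function names and hyperparameters are assumptions.

```python
import math

def fit_logistic(X, y, lr=0.5, steps=300):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))            # keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted prob minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rfe(X, y, n_keep):
    """Repeatedly refit and drop the feature with the smallest |coefficient|.

    Assumes features are on comparable scales, since raw coefficients
    are used as the importance ranking.
    """
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda idx: abs(w[idx]))
        kept.pop(weakest)
    return kept

# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [-2.0, 1.0, 1.0], [-1.5, -1.0, 1.0], [-1.0, 1.0, -1.0], [-0.5, -1.0, -1.0],
    [0.5, 1.0, 1.0], [1.0, -1.0, 1.0], [1.5, 1.0, -1.0], [2.0, -1.0, -1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = rfe(X, y, n_keep=1)  # indices of the surviving features
```

The slide keeps the top 20 features; the same loop applies with `n_keep=20` on the full feature set.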
LOGISTIC REGRESSION

AUC = 0.95
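For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen winner receives a higher predicted score than a randomly chosen non-winner. The sketch below computes it directly from that definition. The labels and scores are hypothetical and do not reproduce the 0.95 reported on the slide.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical labels (1 = Super Bowl winner) and predicted probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

In this toy case five of the six winner/non-winner pairs are ordered correctly, which is what the returned value reflects. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.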
PREDICTED WINNERS
96%

76%

[Figure: WHAT OUR MODEL PREDICTS vs. CURRENT NFL RESULTS, with ELIMINATED TEAMS marked]


OUR PICKS
Sarah & Matthew: 8/1
Addy & Vedant: 100/1
Liz: 300/1
Who Matthew wants to win!: 9/2
[Figure: teams grouped as "Teams in the playoffs" and "Teams in the hunt"]
BET
06
CONCLUSION
Best Methodology?
MICE, SMOTE, RFE, LR

Significant attributes:
❏ Percentage of offense receiving touchdowns
❏ No. of offense rushing attempts
❏ Turnover differentials

Recommendations
❏ Create a recommendation system (similar to Netflix’s) for buying players
❏ Predict player injuries based on past data
❏ Predict at the game level who will win & individual matchups based on:
    ❏ Player data
    ❏ Weather data
    ❏ Home/away game
    ❏ Team history
What I knew about football before this project
What I know about football after this project
THANK YOU
