Bnad Final Project


College Football

Profitability Analysis
Scenario 3

Braden Miller, Carson Davis, Jackson Samuelson, Patrick Bessey, & Ryann Thomas
GROUP 18

Table of Contents
Introduction
Data Exploration
Statistical Analysis
  Discussion of Model Development:
  Regression Equation:
    R2
    Se
  Test of Joint Significance:
  Significance Testing and Coefficient Interpretations:
    Rain (Inches)
    Snow (Inches)
    Tailgating (d)
    Power 5 (d)
    Score Differential
    TV (d)
  Checks for Common Violations:
Relevance of Findings
Recommendation
Conclusion
Appendix

Introduction
We were given data on individual college football games, stadiums, and weather conditions.
The games data set included the teams, rankings, conferences, whether the game was televised,
and the final score. From the teams' conferences we derived a variable showing whether the
teams playing were in the Power 5 Conferences. We also computed the score differential of each
game and recorded whether it was on TV. We then went through the stadium data, matched
stadiums to games, and split each location into city and state to get a better geographic
picture of attendance. Finally, the last sheet included temperatures and the amount of rain and
snow at each game; we chose to use the rain and snow data. ClubCorp asked us to identify
factors that could contribute to higher profitability for a stadium club on game day. We
approached this by developing a regression model to predict game attendance, on the assumption
that games with more attendees generate higher profits. We found that the three factors with
the largest impact on attendance, and thus on stadium profitability, are the amount of rain,
whether tailgating was available, and whether the game was televised. Games with little rain,
tailgating, and television coverage draw the most attendance, so ClubCorp should look at these
factors to predict profitability. Denny Stadium in Tuscaloosa, AL, Beaver Stadium in
University Park, PA, and Kyle Field in College Station, TX seem to be stadiums that are likely
to be profitable based on their attendance.

Data Exploration
Our Tableau dashboard (linked below) includes a bar graph showing the percentage of games
televised by conference, a scatter plot showing average attendance against total games
televised, and two dot maps. The first dot map (red gradient) shows average score differential
colored by average attendance, and the second dot map (red and blue) shows average attendance
by Power 5 versus non-Power 5 status. These visualizations help reveal common trends across the
data. The two biggest patterns are that Power 5 teams have higher average attendance and that
televised games also have higher average attendance.

https://public.tableau.com/app/profile/patrick.bessey1097/viz/Bnad277Final_16518670370810/Dashboard1?publish=yes

Statistical Analysis
Discussion of Model Development:
To develop the final regression model to predict Stadium Attendance, we filtered the data,
recoded variables, and tried alternative models. We first combined the three sheets of data by
matching them on the unique game ID, so that each game's weather conditions, game variables,
and stadium variables sat in one row, and then chose variables to test. Many of the variables
were categorical. Because a regression needs numeric inputs, categorical variables cannot be
entered directly and must be coded as dummy variables. For tailgating, the value is 1 if there
was tailgating at the stadium and 0 if there was not. Likewise, the TV variable is 1 if the
game was on TV and 0 if it was not. To find the best model for predicting attendance, which in
turn predicts profitability for the stadium, we tried both Stadium Fill Rate and raw Stadium
Attendance as the dependent variable. For fill rate we tested the following variables: Rain in
inches, Snow in inches, whether the game was on TV, and whether there was tailgating for the
game. Our model for stadium attendance was more accurate than the one for fill rate. After
running the regressions, we found that Rain in inches, Tailgating, Power 5 status, and whether
the game was on TV are all significant predictors of stadium attendance, while Snow in inches
and Score Differential were not.
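
The matching and dummy-coding steps described above can be sketched as follows. This is a minimal illustration, not the workflow we actually ran: every field name and sample row is hypothetical, the score differential is assumed to be the absolute point gap, and the Power 5 dummy is assumed to depend on a single listed conference.

```python
# Minimal sketch: join three sheets on game ID and code categorical fields as 0/1 dummies.
# All field names and sample rows are hypothetical, not taken from the project data.

POWER_5 = {"ACC", "Big Ten", "Big 12", "Pac-12", "SEC"}

games = [
    {"game_id": 1, "conference": "SEC", "tv": "Yes", "home_score": 35, "away_score": 14},
    {"game_id": 2, "conference": "MAC", "tv": "No", "home_score": 17, "away_score": 20},
]
stadiums = {1: {"tailgating": "Yes"}, 2: {"tailgating": "No"}}
weather = {1: {"rain_in": 0.0, "snow_in": 0.0}, 2: {"rain_in": 0.4, "snow_in": 0.0}}


def yes_no_dummy(value):
    """Return 1 for "Yes" and 0 for any other value."""
    return 1 if value == "Yes" else 0


def build_row(game):
    """Merge one game with its stadium and weather records into a model-ready row."""
    game_id = game["game_id"]
    return {
        "game_id": game_id,
        "rain_in": weather[game_id]["rain_in"],
        "snow_in": weather[game_id]["snow_in"],
        "tailgating_d": yes_no_dummy(stadiums[game_id]["tailgating"]),
        # Assumption: the dummy is 1 when the listed conference is a Power 5 conference.
        "power5_d": 1 if game["conference"] in POWER_5 else 0,
        # Assumption: score differential is the absolute point gap.
        "score_diff": abs(game["home_score"] - game["away_score"]),
        "tv_d": yes_no_dummy(game["tv"]),
    }


rows = [build_row(game) for game in games]
```

Each resulting row holds only numeric inputs, which is what a regression tool requires.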

Regression Equation:
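\[
\begin{aligned}
\widehat{\text{Stadium Attendance}} = {} & 20534.45 - 1086.82\,\text{Rain (Inches)} - 549.92\,\text{Snow (Inches)} \\
& + 34120.86\,\text{Tailgating (d)} + 21581.23\,\text{Power 5 (d)} \\
& - 0.11\,\text{Score Differential} + 7372.71\,\text{TV (d)}
\end{aligned}
\]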
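
To show how the regression equation turns game conditions into a prediction, here is a small sketch that plugs values into the coefficients reported in Figure 1. The function and argument names are our own labels for illustration.

```python
# Coefficients copied from the regression output in Figure 1.
COEFFICIENTS = {
    "intercept": 20534.45,
    "rain_in": -1086.82,
    "snow_in": -549.918,
    "tailgating_d": 34120.86,
    "power5_d": 21581.23,
    "score_diff": -0.1061,
    "tv_d": 7372.707,
}


def predict_attendance(rain_in, snow_in, tailgating_d, power5_d, score_diff, tv_d):
    """Return predicted stadium attendance for one game.

    tailgating_d, power5_d, and tv_d are dummies (1 = yes, 0 = no).
    """
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["rain_in"] * rain_in
        + COEFFICIENTS["snow_in"] * snow_in
        + COEFFICIENTS["tailgating_d"] * tailgating_d
        + COEFFICIENTS["power5_d"] * power5_d
        + COEFFICIENTS["score_diff"] * score_diff
        + COEFFICIENTS["tv_d"] * tv_d
    )


# A dry, televised Power 5 game with tailgating and a tied score.
dry_power5_game = predict_attendance(0, 0, 1, 1, 0, 1)   # about 83,609 attendees
# A game with none of the dummies set and 1 inch of rain.
rainy_baseline_game = predict_attendance(1, 0, 0, 0, 0, 0)  # about 19,448 attendees
```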

R2 = 0.73
This means that about 73% of the variation in stadium attendance across games is explained by
the explanatory variables in this model.
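
As a check, R2 and Adjusted R2 can be recomputed from the sums of squares in the ANOVA table in Figure 1. The sketch below uses the rounded values printed there, so it matches the reported figures only to about four decimal places; the variable names are ours.

```python
# Values copied from the ANOVA table in Figure 1.
ss_regression = 2.94923e12   # SS explained by the model
ss_total = 4.03952e12        # total SS in attendance
n_observations = 6315
n_predictors = 6

# Share of the variation in attendance explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R2 penalizes the fit for the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)
```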

Se = 13,146.96
The standard error of the estimate means that the model's predictions of stadium attendance
typically miss the actual attendance by about 13,147 attendees.
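
The standard error of the estimate follows from the residual sum of squares and its degrees of freedom in Figure 1. A short sketch of that arithmetic, with variable names of our choosing and the rounded inputs printed in the table:

```python
from math import sqrt

# Values copied from the ANOVA table in Figure 1.
ss_residual = 1.09029e12
df_residual = 6308           # observations (6315) - predictors (6) - 1

# Mean squared error, then its square root.
mean_squared_error = ss_residual / df_residual
standard_error = sqrt(mean_squared_error)
```

The result agrees with the reported 13,146.96 up to rounding of the inputs.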

Test of Joint Significance:


H0: None of the explanatory variables is a significant predictor of Attendance (all slope coefficients equal zero).
Ha: At least one of the explanatory variables is a significant predictor of Attendance.

We ran the regression to test this hypothesis. The F statistic is 2,843.85, and its p-value
(Significance F in Figure 1) is reported as 0, which is less than alpha (.1). We can therefore
reject the null hypothesis and conclude that at least one of the explanatory variables tested
is significant. Since at least one variable significantly influences stadium attendance, we
conduct individual hypothesis tests to determine which of Rain (Inches), Snow (Inches),
Tailgating(d), Power 5(d), Score Differential, and TV(d) are significant.
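
The F statistic is the ratio of the regression mean square to the residual mean square. The sketch below recomputes it from the ANOVA values in Figure 1; it does not compute the p-value, which needs an F distribution that the Python standard library does not provide. Variable names are ours.

```python
# Values copied from the ANOVA table in Figure 1.
ss_regression = 2.94923e12
ss_residual = 1.09029e12
df_regression = 6            # number of explanatory variables
df_residual = 6308           # 6315 observations - 6 predictors - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# A large ratio means the model explains far more variation than is left unexplained.
f_statistic = ms_regression / ms_residual
```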

Significance Testing and Coefficient Interpretations:


Rain (Inches)
H0: Rain (Inches) is not a significant predictor of Stadium Attendance.
HA: Rain (Inches) is a significant predictor of Stadium Attendance.

After performing a hypothesis test, we found that the P-Value (0.025) was less than alpha (.1).
Thus, we can reject the null and conclude that the amount of Rain in inches is a significant
predictor of Stadium Attendance.

As the amount of Rain increases by 1 inch, Stadium Attendance decreases by 1,087 attendees on
average, all else constant.
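
The test behind this result can be sketched from the Rain (Inches) row of Figure 1. Note the assumption: with 6,308 residual degrees of freedom we approximate the t distribution with the standard normal, so the p-value and interval only match the reported output approximately.

```python
from statistics import NormalDist

# Rain (Inches) row from Figure 1.
coefficient = -1086.82
standard_error = 485.572033

# Test statistic: how many standard errors the estimate is from zero.
t_stat = coefficient / standard_error

# Assumption: normal approximation to the t distribution (large sample).
std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))

# Approximate 95% confidence interval for the rain coefficient.
critical_value = std_normal.inv_cdf(0.975)
ci_lower = coefficient - critical_value * standard_error
ci_upper = coefficient + critical_value * standard_error
```

The interval lies entirely below zero, which is consistent with rejecting the null at alpha (.1).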

Snow (Inches)
H0: Snow (Inches) is not a significant predictor of Stadium Attendance.
HA: Snow (Inches) is a significant predictor of Stadium Attendance.

After performing a hypothesis test, we found that the P-Value (0.423) was greater than alpha
(.1). Thus, we fail to reject the null hypothesis and cannot conclude that Snow in inches is a
significant predictor of Stadium Attendance.

Tailgating (d)
H0: Whether there was Tailgating at the game is not a significant predictor of Stadium
Attendance.
HA: Whether there was Tailgating at the game is a significant predictor of Stadium
Attendance.

After performing a hypothesis test, we found that the P-Value (reported as 0) was less than
alpha (.1). Thus, we can reject the null and conclude that Tailgating is a significant
predictor of Stadium Attendance.

Games with tailgating have a stadium attendance that is 34,120.86, or about 34,121, attendees
higher than games without tailgating, on average and all else constant.

Power 5 (d)
H0: Whether the teams were in the Power 5 Conferences is not a significant predictor of
Stadium Attendance.
HA: Whether the teams were in the Power 5 Conferences is a significant predictor of Stadium
Attendance.

After performing a hypothesis test, we found that the P-Value (reported as 0) was less than
alpha (.1). Thus, we can reject the null and conclude that being in the Power 5 Conferences is
a significant predictor of Stadium Attendance.

Games played with teams in the Power 5 Conferences have a stadium attendance that is
21,581.23, or about 21,581, attendees higher than games with teams outside the Power 5
Conferences, on average and all else constant.

Score Differential
H0: Score Differential is not a significant predictor of Stadium Attendance.
HA: Score Differential is a significant predictor of Stadium Attendance.

After performing a hypothesis test, we found that the P-Value (0.408) was greater than alpha
(.1). Thus, we fail to reject the null hypothesis and cannot conclude that Score Differential
is a significant predictor of Stadium Attendance.

TV (d)
H0: Whether the game was televised is not a significant predictor of Stadium Attendance.
HA: Whether the game was televised is a significant predictor of Stadium Attendance.

After performing a hypothesis test, we found that the P-Value (1.27 E-62, Figure 1) was less
than alpha (.1). Thus, we can reject the null and conclude that the game being on TV is a
significant predictor of Stadium Attendance.

Televised games have a stadium attendance that is 7,372.71, or about 7,373, attendees higher
than games not on TV, on average and all else constant.
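
The decision rule applied to each variable above is the same comparison of a p-value against alpha. A compact sketch using the p-values from Figure 1 (the helper name `decide` is ours):

```python
ALPHA = 0.1

# P-values copied from the regression output in Figure 1.
p_values = {
    "Rain (Inches)": 0.025241363,
    "Snow (Inches)": 0.423389177,
    "Tailgating (d)": 0.0,
    "Power 5 (d)": 0.0,
    "Score Differential": 0.408261345,
    "TV (d)": 1.27481e-62,
}


def decide(p_value, alpha=ALPHA):
    """Reject the null only when the p-value falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {name: decide(p) for name, p in p_values.items()}
significant = [name for name, decision in decisions.items() if decision == "reject H0"]
```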

Checks for Common Violations:


To check for the four common violations associated with linear regression (non-linear
patterns, heteroskedasticity, multicollinearity, and endogeneity), we looked at residual plots
and ran a correlation matrix between the variables and residuals. Residual plots are not
informative for the dummy variables, so we looked at the residual plots for the three numeric
variables, Snow (Inches), Rain (Inches), and Score Differential, to see if there was any
evidence of non-linear patterns or heteroskedasticity. Rain showed evidence of
heteroskedasticity: the residuals began with higher variability and showed lower variability
as the rain in inches increased (Figure 3). Under heteroskedasticity the coefficient estimates
are still unbiased, so we can still use the regression equation to make predictions. However,
the standard errors may not be accurate, so the t and F tests, and therefore our conclusions
about which variables are significant, may not be reliable. There was no evidence of either
violation for Snow (Inches) or Score Differential. We then ran the correlation matrix for all
explanatory variables used in the model and the residuals (Figure 5). We found no evidence of
multicollinearity: the correlation between every pair of explanatory variables was below 0.8,
with the largest being 0.40 between Tailgating and Power 5. The correlation between the
residuals and each explanatory variable was nearly 0, which we took as no evidence of
endogeneity (excluded variable bias). Note, however, that residuals from a least-squares
regression are uncorrelated with the explanatory variables by construction, so this check
alone cannot rule out an excluded variable.
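
The multicollinearity screen amounts to comparing each pairwise correlation between explanatory variables against the 0.8 cutoff. The sketch below applies that rule to the values in Figure 5; the dictionary layout and names are ours.

```python
THRESHOLD = 0.8

# Pairwise correlations between explanatory variables, copied from Figure 5.
correlations = {
    ("Rain (Inches)", "Snow (Inches)"): 0.013791416,
    ("Rain (Inches)", "Tailgating Dummy"): -0.007381734,
    ("Snow (Inches)", "Tailgating Dummy"): -0.003157162,
    ("Rain (Inches)", "Power 5(d)"): -0.003313301,
    ("Snow (Inches)", "Power 5(d)"): -0.015424441,
    ("Tailgating Dummy", "Power 5(d)"): 0.400264022,
    ("Rain (Inches)", "Score Differential"): -0.008817121,
    ("Snow (Inches)", "Score Differential"): -0.001743721,
    ("Tailgating Dummy", "Score Differential"): -0.012141064,
    ("Power 5(d)", "Score Differential"): -0.019551203,
    ("Rain (Inches)", "TV Dummy"): 0.009402738,
    ("Snow (Inches)", "TV Dummy"): 0.014478656,
    ("Tailgating Dummy", "TV Dummy"): 0.161267144,
    ("Power 5(d)", "TV Dummy"): 0.196927052,
    ("Score Differential", "TV Dummy"): -0.002947916,
}

# Pairs whose absolute correlation reaches the cutoff would signal multicollinearity.
flagged_pairs = [pair for pair, r in correlations.items() if abs(r) >= THRESHOLD]

# The most strongly related pair, for reference.
strongest_pair = max(correlations, key=lambda pair: abs(correlations[pair]))
```

No pair is flagged, and the strongest relationship is between the Tailgating and Power 5 dummies.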

Relevance of Findings
By running regressions on different variables, we found which variables are significant and
which are insignificant for predicting game attendance. An important finding for ClubCorp is
that one of the significant variables is within its control: a stadium can offer tailgating,
which is a significant predictor of game attendance. Something our team found interesting was
that snow in inches was not a significant predictor of game attendance, even though as a group
we expected snow to matter. Also, although Rain in inches is significant, it showed evidence
of heteroskedasticity. This means that the standard errors, and therefore the significance
tests, could be untrustworthy; re-estimating the model with White's
(heteroskedasticity-robust) standard errors would show whether the result holds.
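
To illustrate what White's standard error does, the sketch below compares the conventional and White (HC0) standard errors of the slope in a one-variable regression. This is a simplification under stated assumptions: the data are made up (chosen so the spread shrinks as rain increases), only rain is used as a predictor, and the formulas are the simple-regression versions, not the full six-variable model.

```python
from math import sqrt

# Hypothetical data: attendance varies more at low rain than at high rain.
rain = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
attendance = [62000, 38000, 55000, 41000, 50000, 44000, 47000, 46000]

n = len(rain)
x_bar = sum(rain) / n
y_bar = sum(attendance) / n
sxx = sum((x - x_bar) ** 2 for x in rain)

# Ordinary least-squares fit of attendance on rain.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(rain, attendance)) / sxx
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(rain, attendance)]

# Conventional standard error: assumes one common residual variance.
conventional_se = sqrt(sum(e ** 2 for e in residuals) / (n - 2) / sxx)

# White (HC0) standard error: lets each observation keep its own squared residual.
white_se = sqrt(sum((x - x_bar) ** 2 * e ** 2 for x, e in zip(rain, residuals))) / sxx
```

If the slope's t statistic is recomputed as `slope / white_se` and the variable remains significant, the original conclusion is more credible.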

Recommendation
ClubCorp can use the data and insights we have found when trying to predict attendance at a
college football game. Rain in inches, tailgating, whether the school is in a Power 5
conference, and whether the game is on TV were all significant in our regression model. The
Tableau dashboard also presents the data in maps and charts that help in understanding and
predicting stadium attendance. For future testing, we would like to incorporate the ranking of
the home team into the data. Another avenue that would be interesting to test is how the score
differential relates to the fill rate of the stadium. To conclude, Denny Stadium in
Tuscaloosa, AL, Beaver Stadium in University Park, PA, and Kyle Field in College Station, TX
seem to be stadiums that are likely to be profitable based on their attendance.

Conclusion
In conclusion, we found that the main factors that contribute to the profitability of college
football events are attendance, whether the team is a Power 5 team, and whether the game is
televised. Our team analyzed attendance in greater detail and found that the main components
behind higher attendance are the amount of rain and whether tailgating is available at the
event. We found this by running regression models and tests of significance on data from
various college football games to narrow down the key findings. We recommend that ClubCorp
focus on these factors when trying to achieve the highest profitability for a stadium club on
game day. Along with that, some top performers that would be profitable investments for
ClubCorp are Denny Stadium, Beaver Stadium, and Kyle Field. If you have any further questions
or concerns, please feel free to contact us at group18@gmail.com.

Appendix
Figure 1- Regression Output for Prediction of Attendance

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.854456
R Square             0.730094
Adjusted R Square    0.729838
Standard Error       13146.96
Observations         6315

ANOVA
             df     SS            MS            F             Significance F
Regression   6      2.94923E+12   4.91539E+11   2843.854928   0
Residual     6308   1.09029E+12   172842453.5
Total        6314   4.03952E+12

                     Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept            20534.45       403.3524789      50.90943766   0             19743.73983    21325.16    19743.74      21325.16
Rain (Inches)        -1086.82       485.572033       -2.23822157   0.025241363   -2038.704138   -134.931    -2038.7       -134.931
Snow (Inches)        -549.918       686.8764893      -0.8006073    0.423389177   -1896.429875   796.5932    -1896.43      796.5932
Tailgating Dummy     34120.86       441.0523157      77.36238313   0             33256.24567    34985.47    33256.25      34985.47
Power 5(d)           21581.23       366.1291413      58.94432039   0             20863.49576    22298.97    20863.5       22298.97
Score Differential   -0.1061        0.128295009      -0.82701235   0.408261345   -0.357603411   0.1454      -0.3576       0.1454
TV Dummy             7372.707       436.5470102      16.88868865   1.27481E-62   6516.925913    8228.487    6516.926      8228.487

Figure 2-Residual Plots for Common Violations: Snow (Inches)

[Snow (Inches) Residual Plot: Residuals (vertical axis) against Snow (Inches) (horizontal axis, 0 to 8)]

Figure 3-Residual Plots for Common Violations: Rain (Inches)

[Rain (Inches) Residual Plot: Residuals (vertical axis) against Rain (Inches) (horizontal axis, 0 to 4)]

Figure 4-Residual Plots for Common Violations: Score Differential

[Score Differential Residual Plot: Residuals (vertical axis) against Score Differential (horizontal axis, 0 to 50)]

Figure 5-Correlation Matrix for Common Violations

                     Residuals           Rain (Inches)       Snow (Inches)       Tailgating Dummy    Power 5(d)          Score Differential  TV Dummy
Residuals            1
Rain (Inches)        -2.14342E-16        1
Snow (Inches)        1.9549E-16          0.013791416         1
Tailgating Dummy     -5.99987E-15        -0.007381734        -0.003157162        1
Power 5(d)           -1.04608E-14        -0.003313301        -0.015424441        0.400264022         1
Score Differential   3.36367E-16         -0.008817121        -0.001743721        -0.012141064        -0.019551203        1
TV Dummy             -4.41347E-15        0.009402738         0.014478656         0.161267144         0.196927052         -0.002947916        1
