

College Football Analysis


ClubCorp

Group 83 - Brooklyn Rule, Kylie Merritt, Edan Levy, Paola Mendivil, and Cayla Rosenberg

Table of Contents
Introduction
Data Exploration
    Visualizations
    Insights and Recommendations
Data Overview and Statistical Analysis
    Data Variables
    Attempted Models
    Equations
        Final Population Regression Equation
        Estimated Sample Regression Equation
    Interpretations
    Test of Joint Significance
    Tests of Individual Significance
    Coefficient Interpretations
    Common Violations
Relevance of Findings
Recommendation
Conclusion
Appendix

Introduction
In the following report, we examined over 6,300 college football games to identify factors that strongly affect game-day attendance, in order to make a recommendation regarding ClubCorp's next location. We used a regression model to predict attendance, which we treat as a measure of a stadium's potential profitability. The variables we analyzed include win/loss ratio, conference, and whether fans tailgated at the game. We chose attendance as the outcome because it has a direct influence on profitability. We then visualized these findings to build a recommendation. We recommend that ClubCorp consider Oklahoma, Nebraska, and Pennsylvania as the top locations for its next club, because these stadiums have the factors we found to be associated with the largest increases in attendance.

Data Exploration
Visualizations
We created a map showing each state's average attendance and average capacity, along with a list of the 5 states with the highest average attendance. We also created a line graph comparing the win/loss ratio to the average attendance of each game. Lastly, we created a map of the top 3 states and conferences with the highest average attendance. All of these visualizations are available on Tableau.

Insights and Recommendations


Together with the statistical analysis, these visualizations led us to the following conclusion: ClubCorp should focus on high and consistent average attendance when choosing a stadium in which to place a club. Therefore, we suggest Pennsylvania, Nebraska, and Oklahoma as the top 3 stadiums to consider for future clubs, as we believe they offer the highest profitability.

Data Overview and Statistical Analysis


Data Variables
The variables used in the final model are defined in Table 1 (Appendix).

Attempted Models
We ran 5 separate regressions to determine which model best predicted attendance. For each regression, we compared the Adjusted R², the standard error, and the p-values of the independent variables. Our final model was the one with the lowest standard error and the highest Adjusted R² (Figure 1). We found that early game time slots, rain, maximum temperature, tailgating, Power 5 membership, a winning record, TV broadcasting, and a ranked opponent were the best predictors of attendance. Snow was the only variable we explored that was not a significant predictor; it was tested in an earlier regression model and is not part of our final model. Attendance is a strong indicator of profitability, since more attendees at a game mean more ticket, concession, and merchandise sales, all of which increase profitability.
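As a rough sketch of how candidate regressions can be compared on these criteria, the Python snippet below computes Adjusted R² from R², the number of games n, and the number of explanatory variables k, then picks the model with the highest Adjusted R², breaking ties on lower standard error. The model names and statistics are hypothetical placeholders, not the report's five regressions.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    """Summary statistics for one candidate regression (hypothetical values)."""
    name: str
    r2: float         # plain R-squared
    n: int            # number of games (observations)
    k: int            # number of explanatory variables
    std_error: float  # standard error of the regression


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); penalizes extra variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def best_model(fits: list[ModelFit]) -> ModelFit:
    """Highest adjusted R^2 wins; a lower standard error breaks ties."""
    return max(fits, key=lambda f: (adjusted_r2(f.r2, f.n, f.k), -f.std_error))


# Hypothetical candidates: Model C adds variables without improving R^2.
candidates = [
    ModelFit("Model A", r2=0.70, n=6300, k=5, std_error=13500.0),
    ModelFit("Model B", r2=0.74, n=6300, k=8, std_error=12900.0),
    ModelFit("Model C", r2=0.74, n=6300, k=12, std_error=12950.0),
]
print(best_model(candidates).name)  # Model B
```

Ranking on Adjusted R² first is one reasonable choice; since the report's final model led on both Adjusted R² and standard error, either ordering would have selected it.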

Equations
Final Population Regression Equation:
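$$\text{Attendance} = \beta_0 + \beta_1\,\text{Early}(d) + \beta_2\,\text{Rain}(d) + \beta_3\,\text{Maximum Temperature} + \beta_4\,\text{Tailgating}(d) + \beta_5\,\text{Power 5}(d) + \beta_6\,\text{Winning Record}(d) + \beta_7\,\text{Not on TV}(d) + \beta_8\,\text{Ranked Opponent}(d) + \varepsilon$$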

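Estimated Sample Regression Equation:

$$\widehat{\text{Attendance}} = 19892.68 - 3308.85\,\text{Early}(d) - 930.93\,\text{Rain}(d) + 93.91\,\text{Maximum Temperature} + 33935.94\,\text{Tailgating}(d) + 21264\,\text{Power 5}(d) + 2473\,\text{Winning Record}(d) - 7130\,\text{Not on TV}(d) + 3850\,\text{Ranked Opponent}(d)$$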
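To show how the estimated equation turns a game's characteristics into a predicted attendance, here is a minimal Python sketch. The dictionary keys are hypothetical names for the Table 1 variables, and the last four coefficients are the whole-number values given in the Coefficient Interpretations, since only the intercept and the first four slopes are reported with two decimals.

```python
# Estimated coefficients from the sample regression equation. The intercept and
# the first four slopes carry the report's two decimals; the last four are the
# whole-number values stated in the Coefficient Interpretations.
INTERCEPT = 19892.68
COEFFICIENTS = {
    "early": -3308.85,
    "rain": -930.93,
    "max_temp_f": 93.91,
    "tailgating": 33935.94,
    "power5": 21264.0,
    "winning_record": 2473.0,
    "not_on_tv": -7130.0,
    "ranked_opponent": 3850.0,
}


def predict_attendance(game: dict[str, float]) -> float:
    """Return predicted attendance for one game.

    `game` maps each (hypothetical) variable name to its value: dummies are 1 or 0
    as defined in Table 1, and max_temp_f is in degrees Fahrenheit.
    """
    missing = COEFFICIENTS.keys() - game.keys()
    if missing:
        raise ValueError(f"missing variables: {sorted(missing)}")
    return INTERCEPT + sum(coef * game[name] for name, coef in COEFFICIENTS.items())


# Hypothetical evening game: no rain, 75 F, tailgating, Power 5 home team with a
# winning record, televised, against a ranked opponent.
evening_game = {
    "early": 0, "rain": 0, "max_temp_f": 75, "tailgating": 1, "power5": 1,
    "winning_record": 1, "not_on_tv": 0, "ranked_opponent": 1,
}
noon_game = {**evening_game, "early": 1}  # same game moved to the early slot

print(round(predict_attendance(evening_game), 2))  # 88458.87
print(round(predict_attendance(evening_game) - predict_attendance(noon_game), 2))  # 3308.85
```

Switching a single dummy from 0 to 1 while holding everything else fixed changes the prediction by exactly that dummy's coefficient, which is what the "all else constant" wording in the Coefficient Interpretations refers to.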

Interpretations
The R² of 0.7412 means that the model's explanatory variables explain 74.12% of the variation in attendance across college football games.

The standard error of 12,875.38 tells us that the model's attendance predictions typically miss actual attendance by about 12,875 attendees.
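Both statistics can be computed from a model's actual and predicted attendance. The sketch below assumes ordinary least squares with an intercept, where R² = 1 - SSE/SST and the standard error of the regression is the square root of SSE/(n - k - 1) for k explanatory variables; the four games are made-up numbers chosen so the arithmetic is easy to check.

```python
import math


def fit_summary(actual: list[float], predicted: list[float], k: int) -> tuple[float, float]:
    """Return (R^2, standard error of the regression).

    SSE is the sum of squared residuals and SST the total sum of squares around
    the mean; k is the number of explanatory variables in the model.
    """
    n = len(actual)
    mean = sum(actual) / n
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    sst = sum((y - mean) ** 2 for y in actual)
    return 1 - sse / sst, math.sqrt(sse / (n - k - 1))


# Hypothetical attendance for four games and a one-variable model's predictions.
actual = [50000, 30000, 80000, 20000]
predicted = [48000, 33000, 78000, 21000]
r2, se = fit_summary(actual, predicted, k=1)
print(round(r2, 4), se)  # 0.9914 3000.0
```

Because the standard error divides by n - k - 1 rather than n, it is slightly larger than the plain root-mean-square residual; with thousands of games the difference is negligible.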

Test of Joint Significance


H0: None of the explanatory variables significantly affects the attendance of a college football game (β1 = β2 = ... = β8 = 0).
HA: At least one of the explanatory variables significantly affects the attendance of a college football game (at least one βj ≠ 0).
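The joint test is usually carried out with the overall F-statistic, which can be computed from R² alone. A minimal sketch follows; it assumes n = 6,300 as a stand-in for "over 6,300 games" (the exact count is not stated) and k = 8 explanatory variables from the final model.

```python
def overall_f_statistic(r2: float, n: int, k: int) -> float:
    """F-statistic for H0: beta_1 = ... = beta_k = 0.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), compared against an F distribution
    with k and n - k - 1 degrees of freedom.
    """
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must be in [0, 1)")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Assumption: n = 6300 approximates "over 6,300 games"; k = 8 slopes in the final model.
f_stat = overall_f_statistic(0.7412, n=6300, k=8)
print(round(f_stat))  # 2252
```

Under these assumptions the statistic is far above the F critical values for 8 and roughly 6,300 degrees of freedom at the 10%, 5%, and 1% levels (all below 3), so H0 would be rejected.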

Tests of Individual Significance


Since each explanatory variable had a p-value below 0.1, each was determined to be a significant predictor of attendance at the 10% significance level.
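Each individual test checks whether a coefficient differs from zero using its t-statistic (the coefficient divided by its standard error). With several thousand games the t distribution is practically normal, so this sketch approximates the two-sided p-value with the standard normal via `math.erfc`. The report does not list coefficient standard errors, so the value 400 below is a hypothetical placeholder.

```python
import math


def two_sided_p_value(coef: float, std_err: float) -> float:
    """Approximate two-sided p-value for H0: beta_j = 0.

    Uses t = coef / std_err and the normal approximation
    P(|Z| > |t|) = erfc(|t| / sqrt(2)), which is close to the exact
    t-based p-value when n is in the thousands.
    """
    t = coef / std_err
    return math.erfc(abs(t) / math.sqrt(2))


def is_significant(coef: float, std_err: float, alpha: float = 0.1) -> bool:
    """Apply the report's decision rule: significant when p < alpha (here 0.1)."""
    return two_sided_p_value(coef, std_err) < alpha


# Rain coefficient from the estimated equation with a hypothetical standard error of 400.
print(is_significant(-930.93, 400.0))  # True
```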

Coefficient Interpretations
• If a game is in the early time slot, attendance is 3,309 attendees lower on average than for a game in another time slot, all else constant.
• If it rains during a game, attendance is 931 attendees lower on average than if it does not rain, all else constant.
• For each 1℉ increase in maximum temperature, attendance increases by 94 attendees on average, all else constant.
• If there is tailgating at a game, attendance is 33,936 attendees higher on average than if there is no tailgating, all else constant.
• If the home team is in a Power 5 conference, attendance is 21,264 attendees higher on average than for a non-Power 5 team, all else constant.
• If the home team has a winning record, attendance is 2,473 attendees higher on average than for a team without a winning record, all else constant.
• If the game is not on TV, attendance is 7,130 attendees lower on average than for a televised game, all else constant.
• If the home team is playing a ranked opponent, attendance is 3,850 attendees higher on average than against an unranked opponent, all else constant.

Common Violations
There were concerns about multicollinearity early in the regression process. However, after creating a correlation matrix of all the independent variables, we found that no pair of variables had a correlation above 0.8 or below -0.8, so there was no indication of multicollinearity. None of the variables had a correlation with the residuals above 0.8 or below -0.8 (Figure 2).
The residual plots showed random scatter, so we found no evidence of heteroskedasticity or non-linear patterns (Figure 3).
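The multicollinearity screen described above, which flags any pair of independent variables whose correlation is above 0.8 or below -0.8, can be sketched in plain Python as follows. The column names and six-game sample are hypothetical; `max_temp_c` is deliberately a Celsius copy of `max_temp_f` so that exactly one pair gets flagged.

```python
import itertools
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_collinear_pairs(columns: dict[str, list[float]], threshold: float = 0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in itertools.combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical six-game sample.
temps_f = [70, 60, 80, 75, 65, 90]
columns = {
    "early": [1, 0, 0, 1, 0, 0],
    "max_temp_f": temps_f,
    "tailgating": [1, 1, 0, 0, 1, 1],
    "max_temp_c": [(t - 32) * 5 / 9 for t in temps_f],  # redundant on purpose
}
flagged = flag_collinear_pairs(columns)
print(flagged)  # [('max_temp_f', 'max_temp_c', 1.0)]
```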

Relevance of Findings
As shown in the Tableau dot map, the stadiums with the highest attendance have tailgating and are in one of the Power 5 conferences. Our regression model revealed that an early time slot has a negative impact on attendance, so stadiums have more profit potential from later games. Rain also had a negative impact on attendance, though, surprisingly, a smaller one than an early time slot, so rain alone is not a critical factor when picking a stadium. As temperature increases, attendance increases as well, so warmer-weather stadiums tend to draw larger crowds. Televised games, games against ranked opponents, and games involving teams with winning records are also key factors in determining attendance for a given game or stadium.

Recommendation
We recommend that ClubCorp investigate placing clubs in the Oklahoma, Nebraska, or Pennsylvania stadiums. These three stadiums have the potential to be very profitable because of their high and consistent attendance. All of these stadiums have tailgating and belong to Power 5 conferences, which is consistent with our statistical analysis showing that conference and tailgating are the two factors associated with the biggest increases in attendance. Given our premise that attendance is the largest contributor to profitability, these stadiums are the strongest candidates.

Conclusion
After analyzing data from over 6,300 college football games, we concluded that tailgating and whether a school is in a Power 5 conference have the largest impact on attendance, and therefore on profitability. To guide ClubCorp's decision, we built a regression model that predicts the expected attendance at a given stadium based on various factors. Our data visualizations provide an overview of the top stadiums for attendance by location, conference, and wins/losses. Taking our findings into consideration, we recommend that ClubCorp open its next location in Oklahoma, Nebraska, or Pennsylvania.
Appendix
Table 1: Data Variables
Figure 1: Regression Analysis
Figure 2: Residual and Independent Variable Correlations
Figure 3: Residual Plots

Table 1: Data Variables

Early(d): This is a dummy variable describing whether the game was in the early time slot (before 12:00 PM). A 1 indicates that the game was in the early time slot and a 0 indicates that it was in another time slot.

Rain(d): This is a dummy variable indicating whether it rained during the game. A 1 indicates that it rained during the game and a 0 indicates that it did not.

Maximum Temperature: The maximum temperature recorded during the game, measured in ℉.

Tailgating(d): This is a dummy variable indicating whether there was tailgating at the game. A 1 indicates there was tailgating and a 0 indicates there was not.

Power 5(d): This is a dummy variable indicating whether the home team is in a Power 5 conference (SEC, Big-10, Big-12, ACC, or Pac-12). A 1 indicates that the home team is in a Power 5 conference and a 0 indicates that it is in a non-Power 5 conference.

Winning Record(d): This is a dummy variable indicating whether the home team had a winning record (more wins than losses). A 1 indicates that the home team has a winning record and a 0 indicates that it does not.

Not on TV(d): This is a dummy variable indicating whether the game was broadcast on TV. A 1 indicates that it was not broadcast on any network and a 0 indicates that it was broadcast on a TV network.

Ranked Opponent(d): This is a dummy variable indicating whether the home team was playing a team ranked in the top 25. A 1 indicates that the opponent was ranked in the top 25 and a 0 indicates that it was not.
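Because most of these variables are dummies derived from raw game facts, an encoding step is needed before running the regression. The function below is a sketch under assumed raw inputs (kickoff time, a rain flag, conference name, win and loss counts, TV network, and opponent rank); those field names and the sample values are hypothetical, while the 1/0 rules follow the definitions above.

```python
from datetime import time
from typing import Optional

# Power 5 conferences as listed in Table 1.
POWER_5 = {"SEC", "Big-10", "Big-12", "ACC", "Pac-12"}


def encode_game(kickoff: time, rained: bool, max_temp_f: float, tailgating: bool,
                home_conference: str, home_wins: int, home_losses: int,
                tv_network: Optional[str], opponent_rank: Optional[int]) -> dict:
    """Convert hypothetical raw game fields into the Table 1 model variables."""
    return {
        "early": int(kickoff < time(12, 0)),               # before 12:00 PM
        "rain": int(rained),
        "max_temp_f": max_temp_f,                          # degrees Fahrenheit
        "tailgating": int(tailgating),
        "power5": int(home_conference in POWER_5),
        "winning_record": int(home_wins > home_losses),    # more wins than losses
        "not_on_tv": int(tv_network is None),              # 1 = not broadcast
        "ranked_opponent": int(opponent_rank is not None and 1 <= opponent_rank <= 25),
    }


row = encode_game(time(11, 0), rained=False, max_temp_f=68.0, tailgating=True,
                  home_conference="SEC", home_wins=7, home_losses=3,
                  tv_network="Network A", opponent_rank=12)
print(row)
```

Conference names must match the spellings in `POWER_5` exactly, and a team with equal wins and losses is coded 0 for Winning Record(d), matching the "more wins than losses" definition.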

Figure 1: Regression Analysis


Figure 2: Residual and Independent Variable Correlations


Figure 3: Residual Plots

Maximum Temperature Residual Plot (x-axis: Maximum Temperature; y-axis: Residuals)
