Info4602 Final Group Project
Introduction
In the world of professional sports, the concept of “home court advantage” stands out as a
subject of fascination and debate. It’s more than just the roar of the crowd or the familiarity of
the court; it’s a phenomenon that, while hard to prove, is widely believed to influence game outcomes.
Our project aims to explore the mysteries surrounding home court advantage. By examining
large datasets spanning many seasons, teams, and game statistics, we seek to shed light on
the true extent and nature of this phenomenon. From win-loss ratios to player performance
metrics to fan attendance, our exploration seeks to uncover the underlying factors that
contribute to the perceived advantage of playing on one’s home court in the NBA.
The Data
Snapshot of the NBA Team Statistics dataset
Using www.nba.com and an API client, we collected over 50,000 rows of game-by-game data, with 65
different metrics across 23 years. The data consisted of two-point field goals, three-point field
goals, and free throws made and attempted, as well as various aggregate metrics that measure overall
performance. We relied heavily on the aggregate statistics, mainly PIE (Player Impact Estimate),
which represents a team’s overall effect on game events. PIE is just one metric that measures a
team’s performance; others include FG% (field goal percentage) and TS% (true shooting percentage),
which measure a team’s overall shooting performance and efficiency.
To determine which of the 60+ metrics to use, we examined how strongly each one correlated with
winning. We found that PIE is the most highly correlated with a team winning, with eFG%, TS%, and
Offensive and Defensive Rating not far behind. These metrics were the foundation of our analysis
and gave us an idea of each team’s overall performance in a given game, allowing us to start
measuring home court advantage.
Another snapshot of the dataset
One common belief among NBA analysts and fans is that home court advantage is fueled by
fan attendance and court familiarity. To measure how the environment affects home court
advantage, we used average stadium attendance over the 23-year period to compare and contrast
the effects of fans on home court advantage. This data was gathered from ESPN, where we used
the Python requests package to collect each team’s average attendance into a dataframe.
Mapping
The first step in our design process was mapping. Inspired by both Harvard’s
Visualization Design Sprint and the Five Design-Sheet Method developed by Think Up Themes
Ltd., as modeled by Kieran Tan Kah Wang, we looked to expand on our ideas and
explore varying possibilities. Although we did not follow these frameworks explicitly, we used
them as guiding principles for structuring our ideation, sketching, prototyping, and testing
phases. By doing so, we were able to foster an environment where creative ideas could flow freely.
In essence, mapping involves evaluating our data and asking important questions, such as
“What data types are we dealing with?”, “Who is our target audience?”, and “How might we use
the data in question to communicate a captivating story?” By asking these questions and
critically examining the data, we can select the most appropriate visualizations that clearly
communicate our findings.
avg attendance)
● Guiding Question: How has home court advantage changed over time, and what factors affect it?
● How might we use the data: Determine a team’s better or worse years and eras.
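To find a team’s better or worse years, one option is to compute its home win percentage for each season, which also yields the series a line graph needs. The sketch below assumes records of the form (team, season, won) for home games only; the records and the helpers `home_win_pct_by_season` and `best_and_worst_seasons` are hypothetical, and the numbers are invented.

```python
from collections import defaultdict

# Made-up home-game records for one team: (team, season, won).
home_games = [
    ("DEN", "2018-19", True), ("DEN", "2018-19", True), ("DEN", "2018-19", False),
    ("DEN", "2019-20", True), ("DEN", "2019-20", False),
    ("DEN", "2020-21", True), ("DEN", "2020-21", True),
    ("DEN", "2020-21", True), ("DEN", "2020-21", False),
]


def home_win_pct_by_season(records, team):
    """Home win % per season for one team, ordered by season label."""
    totals = defaultdict(lambda: [0, 0])  # season -> [wins, games played]
    for t, season, won in records:
        if t == team:
            totals[season][0] += int(won)
            totals[season][1] += 1
    # "YYYY-YY" labels sort chronologically as plain strings.
    return {season: wins / games for season, (wins, games) in sorted(totals.items())}


def best_and_worst_seasons(series):
    """Return (best season, worst season) by home win %."""
    ordered = sorted(series, key=series.get)
    return ordered[-1], ordered[0]


series = home_win_pct_by_season(home_games, "DEN")
print(series)  # season -> home win %, i.e. the x/y pairs of a line graph
print(best_and_worst_seasons(series))
```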
Ideation/Sketching
As part of the sketching phase, each one of us took time to engage with the data individually to
allow for full creativity. Loosely inspired by Kieran Tan Kah Wang’s rendition of the Five
Design-Sheet Method, we each sketched out any ideas for data visualizations that were worth a story,
based on the datasets we had become familiar with.
Callan’s ideation/sketching
Jason’s ideation/sketching
Deciding
A slide from Tamara Munzner’s lecture that defines marks
Another slide from Munzner’s lecture, which defines channels
As mentioned in the data section, choosing metrics to measure home court advantage was
difficult. To determine the best metrics for overall performance, we relied on each metric’s
correlation with winning or losing. Using seaborn, we mapped the correlations to a heatmap and made
our selection from the metrics most strongly correlated with winning.
We all focused on measuring home court advantage through different game metrics, chart
marks, and encodings. We used a variety of line graphs to compare teams’ performance and
home court advantage over time, allowing us to tell a compelling story of past home court
advantage by team and for the league overall. Presenting the data as a time series also allowed
us to show the variance over time in win/loss rates, team performance, and home court advantage.
Using a scatter plot, we visualized the relationship between the margin of a win or loss and
the overall performance of the team, then encoded home/away as color to reveal the
patterns. Scatter plots are powerful tools for showing underlying trends with color encodings;
according to our reading on color, “Scatter plots are one of the few chart types where coloring by
underlying values without encoding them by, e.g., position, length, etc., can work surprisingly
well.” In this case, the color encoding did show the underlying values quite well.
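The scatter-plot encoding described above can be written as a Vega-Lite specification, built here as a plain Python dictionary. The project does not say which plotting library produced this chart, so Vega-Lite is an assumption, as are the field names `PIE`, `PLUS_MINUS`, and `LOCATION` and the three sample rows. Position carries the two quantitative variables and color carries the categorical home/away attribute.

```python
import json

# Hypothetical rows: team PIE, final point margin, and where the game was played.
rows = [
    {"PIE": 0.56, "PLUS_MINUS": 14, "LOCATION": "Home"},
    {"PIE": 0.44, "PLUS_MINUS": -9, "LOCATION": "Away"},
    {"PIE": 0.51, "PLUS_MINUS": 3, "LOCATION": "Away"},
]

scatter_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        # Position carries the two quantitative variables...
        "x": {"field": "PIE", "type": "quantitative", "title": "PIE"},
        "y": {"field": "PLUS_MINUS", "type": "quantitative", "title": "Point margin"},
        # ...and color carries the categorical home/away attribute.
        "color": {"field": "LOCATION", "type": "nominal", "title": "Home/Away"},
    },
}

print(json.dumps(scatter_spec, indent=2))
```

The printed JSON can be pasted into any Vega-Lite renderer, such as the online Vega Editor.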
Interactivity was an important part of this project and a great tool for telling a story with
visualizations. Using dropdown menus and brush selection, we were able to add a level of
granularity that helped move and evolve the story behind home court advantage.
Unfortunately, not all graphs worked out as intended; nevertheless, the interactivity helped drive
the story.
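The two interactions mentioned, a dropdown menu and a brush selection, can be expressed declaratively in Vega-Lite. This is an assumption, since the project does not name the library it used, and the sketch is not the project’s actual chart. A variable parameter named `team` is bound to a select input and used in a filter transform, and an interval selection named `brush` keeps the home/away color only for brushed points. All parameter names, fields, and rows are hypothetical.

```python
import json

# Hypothetical rows for two teams.
rows = [
    {"TEAM": "DEN", "PIE": 0.56, "PLUS_MINUS": 14, "LOCATION": "Home"},
    {"TEAM": "DEN", "PIE": 0.47, "PLUS_MINUS": -4, "LOCATION": "Away"},
    {"TEAM": "BOS", "PIE": 0.52, "PLUS_MINUS": 6, "LOCATION": "Away"},
    {"TEAM": "BOS", "PIE": 0.58, "PLUS_MINUS": 11, "LOCATION": "Home"},
]
teams = sorted({row["TEAM"] for row in rows})

interactive_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "params": [
        # Dropdown: a variable parameter bound to an HTML select element.
        {"name": "team", "value": teams[0],
         "bind": {"input": "select", "options": teams, "name": "Team "}},
        # Brush: click and drag to select a rectangular region of points.
        {"name": "brush", "select": "interval"},
    ],
    # Show only the games of the team chosen in the dropdown.
    "transform": [{"filter": "datum.TEAM == team"}],
    "mark": "point",
    "encoding": {
        "x": {"field": "PIE", "type": "quantitative"},
        "y": {"field": "PLUS_MINUS", "type": "quantitative"},
        # Brushed points keep their home/away color; the rest turn gray.
        "color": {
            "condition": {"param": "brush", "field": "LOCATION", "type": "nominal"},
            "value": "lightgray",
        },
    },
}

print(json.dumps(interactive_spec, indent=2))
```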
User Testing
From our in-class demo day, we received feedback that highlighted both strengths and areas for
improvement. Our peers appreciated the creative use of visualizations and the consistent color
schemes, which made the data easy to understand and engaging. In particular, the inclusion of
markers for significant events like COVID-19 and the use of scrolling features were
well received. However, there were suggestions to improve clarity in certain visualizations, such
as adding legends or clearer labels, especially in charts like the victory correlation table, where
abbreviations were unclear. Peers also recommended splitting up certain charts and providing more
context, such as the full names of statistics, to improve overall comprehension. In response to
this feedback, we made it a priority in later iterations of our charts to add labels and
annotations for variables that might be misconstrued.
Despite these suggestions, most people praised our charts for their cohesive storytelling and
insightful exploration of the relationship between fan attendance, performance statistics, and
home court advantage.
In the user testing performed outside the classroom, we saw a common theme in the responses
regarding labels and annotations for specific abbreviations and statistics. In Kyle’s test, the
participant felt that without adequate background information, it was hard to tell what was
being portrayed. It seemed that without someone to explain the data, our visualizations failed to
properly communicate our findings. However, feedback from the remaining tests, conducted by Alex,
Callan, and Jason, was generally positive, with most participants commending our interactive
graphs. After collecting all the responses, our group came together to edit
and refine our charts, making it a priority to add annotations for better clarity.
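Because unclear abbreviations came up repeatedly in testing, a small glossary can drive chart labels and tooltips. The sketch below pairs plain-language definitions with the standard formulas for eFG% and TS%; the `GLOSSARY` dictionary, the helper names, and the tooltip format are hypothetical, and the Offensive and Defensive Rating entries follow the usual points-per-100-possessions convention.

```python
# Plain-language definitions for abbreviations that testers found unclear.
GLOSSARY = {
    "PIE": "Player Impact Estimate: a team's overall effect on game events",
    "FG%": "Field Goal %: FGM / FGA",
    "eFG%": "Effective Field Goal %: (FGM + 0.5 * 3PM) / FGA",
    "TS%": "True Shooting %: PTS / (2 * (FGA + 0.44 * FTA))",
    "OFF_RATING": "Offensive Rating: points scored per 100 possessions",
    "DEF_RATING": "Defensive Rating: points allowed per 100 possessions",
}


def effective_fg_pct(fgm, fg3m, fga):
    """eFG%: credits each made three-pointer as 1.5 made field goals."""
    return (fgm + 0.5 * fg3m) / fga


def true_shooting_pct(pts, fga, fta):
    """TS%: scoring efficiency including free throws (0.44 is the conventional FTA weight)."""
    return pts / (2 * (fga + 0.44 * fta))


def tooltip_text(abbrev):
    """Text shown on hover; falls back gracefully for unknown abbreviations."""
    return f"{abbrev}: {GLOSSARY.get(abbrev, 'definition not available')}"


print(tooltip_text("TS%"))
print(round(effective_fg_pct(fgm=40, fg3m=12, fga=88), 3))
```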
Final Design
Based on the feedback from user testing, we were able to identify areas that
needed improvement. The suggestions we received focused mainly on clarity regarding the
65 metrics provided in the dataset. This had been a topic of discussion within our team, as even
we had difficulty deciphering what some of the metrics were trying to measure during
the early stages of this project. To address these concerns, we adapted the existing victory
correlation table to show only the relevant metrics from the dataset. This lets us focus the
viewer on the important aspects of the data, reducing their overall cognitive load. Furthermore,
it provides the definition of each metric, which we hope gives the user more insight into the data
and addresses the concerns noted in user testing. A possible shortcoming of this addition is that
viewers would have to continuously refer back to the correlation map when viewing other
visualizations in order to identify the metrics of focus. While this addition may not seem
substantial, we believe it will greatly assist future viewers. Aside from the tooltip, our team
struggled to identify
Takeaways
At the end of the day, our team was extremely satisfied with the final results. We were excited to
have built a project from scratch that yielded genuinely fascinating results. Investigating a
seemingly superstitious phenomenon like home court advantage and deciphering its causes through
real data has been exciting for us all. In addition, as a team we reflected on how useful the
COVID-19 pandemic turned out to be for this data: because it produced games played with limited or
no fans, it gave us a rare point of comparison. Without it, it would have been harder to demonstrate
the relationship between fan attendance, performance statistics, and home court advantage in the
NBA. Overall, we hope that others will enjoy these findings as much as we did.