Decoding NBA Home Court Advantage: Insights from Data Analysis

By Kyle Gragnola, Alex Pensotti, Callan Riggs, and Jason Lee

ArcGIS Storymap: https://storymaps.arcgis.com/stories/73c1e73c5745453fbd18d57ad91b6bf1

Introduction

In the world of professional sports, the concept of “home court advantage” stands out as a

subject of fascination and debate. It’s more than just the roar of the crowd or the familiarity of

the court; it’s a phenomenon that, while hard to prove, undeniably influences game outcomes.

Our project aims to explore the mysteries surrounding home-court advantage. By examining

vast datasets spanning many seasons, teams, and game statistics, we look to shed light on

the true extent and nature of this phenomenon. From win-loss ratios to player performance

metrics to fan attendance, our exploration seeks to uncover the underlying factors that

contribute to the perceived advantage of playing on one’s home court in the NBA.

The Data
Snapshot of the NBA Team Statistics dataset from our

Using www.nba.com and an API client, we collected over 50,000 rows of game-level data, covering 65

different metrics across 23 years. The data consisted of two-point, three-point, and free-throw shots

made and attempted, as well as various aggregate metrics that measure overall performance. We

relied heavily on the aggregate statistics, mainly PIE, which represents a team’s overall

effect on game events. This is just one metric that measures a team’s performance; others include

FG% and TS%, which are used to measure a team’s overall shooting performance and efficiency.

PIE: % of game events affected by a given team.

FG%: % of field goal attempts (2-point and 3-point) made.

TS%: True shooting percentage, a shooting efficiency measure that accounts for 2-point shots, 3-point shots, and free throws.
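The shooting metrics above can be written out as short formulas. The sketch below uses the conventional basketball definitions of FG%, eFG%, and TS% (including the customary 0.44 free-throw weight); the function names and the sample box score are hypothetical and are not taken from our dataset.

```python
def field_goal_pct(fgm: float, fga: float) -> float:
    """FG%: made field goals (2- and 3-point) divided by field goal attempts."""
    return fgm / fga if fga else 0.0


def effective_fg_pct(fgm: float, fg3m: float, fga: float) -> float:
    """eFG%: like FG%, but each made three-pointer counts 1.5 times."""
    return (fgm + 0.5 * fg3m) / fga if fga else 0.0


def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """TS%: points per scoring attempt, with 0.44 as the usual free-throw weight."""
    denom = 2 * (fga + 0.44 * fta)
    return pts / denom if denom else 0.0


# Hypothetical box score: 40 made field goals (10 of them threes) on 85
# attempts, plus 20 made free throws on 25 attempts.
fgm, fg3m, fga, ftm, fta = 40, 10, 85, 20, 25
pts = 2 * (fgm - fg3m) + 3 * fg3m + ftm  # 110 points

fg = field_goal_pct(fgm, fga)
efg = effective_fg_pct(fgm, fg3m, fga)
ts = true_shooting_pct(pts, fga, fta)
print(f"FG%={fg:.3f}  eFG%={efg:.3f}  TS%={ts:.3f}")
```

Because eFG% and TS% reward threes and free throws, they are never lower than FG% for the same game, which is one reason they can track winning more closely.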


Correlation table for NBA metrics
Deciding which metrics to use was difficult, so we relied on the pandas correlation method to

determine which of the 60+ metrics to use. We found that PIE is the most highly correlated with

a team winning, and eFG%, TS%, Offensive and Defensive rating were not far behind. These

metrics were the foundation of our analysis and gave us an idea of each team’s overall

performance in a given game, allowing us to start measuring home court advantage.
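As a rough illustration of this metric-selection step, the sketch below ranks numeric columns by their correlation with a win indicator using `DataFrame.corr`. The column names (`WIN`, `PIE`, `TS_PCT`, `PACE`) and the six sample rows are assumptions made for the example; they are not our actual data or results.

```python
import pandas as pd


def rank_metrics_by_win_correlation(games: pd.DataFrame, win_col: str = "WIN") -> pd.Series:
    """Return each numeric metric's Pearson correlation with winning,
    ordered from strongest to weakest absolute correlation."""
    numeric = games.select_dtypes(include="number")
    corr = numeric.corr()[win_col].drop(win_col)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Hypothetical game-level rows: one row per team per game, WIN is 1 or 0.
games = pd.DataFrame({
    "WIN": [1, 0, 1, 0, 1, 0],
    "PIE": [0.58, 0.44, 0.55, 0.47, 0.61, 0.42],
    "TS_PCT": [0.60, 0.52, 0.55, 0.56, 0.62, 0.50],
    "PACE": [98.0, 101.0, 99.5, 97.0, 100.0, 100.5],
})

ranking = rank_metrics_by_win_correlation(games)
print(ranking)
```

Sorting by absolute value keeps strongly negative metrics (for example, a defensive rating where lower is better) near the top instead of burying them at the bottom.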
Another snapshot of the dataset
One common thought among NBA analysts and fans is that home court advantage is fueled by

fan attendance and court familiarity. To measure how the environment affects home court

advantage, we used average stadium attendance over the 23-year period to compare and contrast

the effects of fans on home court advantage. This data was gathered from ESPN, where we used

the requests package to collect a dataframe for each team’s average attendance.

The Design Process

Harvard’s Visual Design Sprint process

Mapping

The first step in our design process was mapping. Inspired by both Harvard’s

Visualization Design Sprint and the Five Design-Sheet Method developed by Think Up Themes

Ltd., which was modeled by Kieran Tan Kah Wang, we looked to expand on our ideas and

explore varying possibilities. Although we did not follow these frameworks explicitly, we used

them as guiding principles for structuring our ideation, sketching, prototyping, and testing
phases. By doing so, we were able to facilitate an environment where creative ideas flow and

innovative solutions emerge.

In essence, mapping involves evaluating our data and asking important questions, such as

“What data types are we dealing with?” “Who is our target audience?” and “How might we use

the data in question to communicate a captivating story?” By asking these questions and

critically examining the data, we can select the most appropriate visualizations that clearly

convey the story we are trying to tell.

● Data Types: Nominal (team_name, home/away), Quantitative (eFG%, FG%, PIE,

avg attendance)

● Guiding Question: How has home court advantage changed, and what factors affect it?

● Audience: NBA Fans and Analysts

● How might we use the data: Determine a team’s better or worse years and eras.

Determine the change of home court advantage over time.

Ideation/Sketching

As part of the sketching phase, each one of us took time to engage with the data individually and

allow for full creativity. Loosely inspired by Kieran Tan Kah Wang’s rendition of the Five

Design-Sheet design methodology, these individual moments to ourselves enabled us to sketch

out any ideas for data visualizations that were worth a story, based on the datasets we became

well-acquainted with through mapping.


Alex’s ideation/sketching
Kyle’s ideation/sketching

Callan’s ideation/sketching
Jason’s ideation/sketching

Deciding
A slide from Tamara Munzner’s lecture that defines marks
Another slide from Munzner’s lecture, which defines channels

As mentioned in the data section, choosing metrics to measure home court advantage was

difficult. To determine the best metrics to use for overall performance we relied upon correlation

with winning or losing. Using seaborn we mapped the correlations to a heatmap and made our

decisions from there.
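A correlation heatmap of this kind can be sketched with `seaborn.heatmap` on top of a pandas correlation matrix. The column names, sample values, color map, and title below are illustrative assumptions; this is not the exact chart we produced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(games: pd.DataFrame, columns: list):
    """Draw an annotated heatmap of pairwise correlations for the given columns."""
    corr = games[columns].corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # A diverging palette centered at 0 separates positive from negative correlations.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, center=0, ax=ax)
    ax.set_title("Correlation between game metrics and winning")
    return corr, ax


# Hypothetical game-level rows (same shape as a team box score table).
games = pd.DataFrame({
    "WIN": [1, 0, 1, 0, 1, 0],
    "PIE": [0.58, 0.44, 0.55, 0.47, 0.61, 0.42],
    "TS_PCT": [0.60, 0.52, 0.55, 0.56, 0.62, 0.50],
})

corr, ax = plot_correlation_heatmap(games, ["WIN", "PIE", "TS_PCT"])
ax.figure.savefig("correlation_heatmap.png", bbox_inches="tight")
```

Reading along the `WIN` row or column then shows at a glance which metrics move most closely with winning.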

We all focused on measuring home court advantage through different game metrics, chart

marks, and encodings. We used a variety of line graphs to compare the teams’ performance and

home game advantage over time, allowing us to tell a compelling story of past home court

advantage by team and league overall. Presenting the data in a time series fashion also allowed

us to show the variance over time in win/loss rates, team performance, home court advantage,

and fan attendance.
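One simple way to turn game results into a time series is to take the home win rate minus the away win rate for each season. The sketch below assumes that definition of home court advantage, along with hypothetical column names (`SEASON`, `LOCATION`, `WIN`) and made-up rows; other definitions, such as a difference in PIE, would follow the same pattern.

```python
import pandas as pd


def home_court_advantage_by_season(games: pd.DataFrame) -> pd.Series:
    """Home win rate minus away win rate, computed per season."""
    win_rate = (
        games.groupby(["SEASON", "LOCATION"])["WIN"]
        .mean()
        .unstack("LOCATION")  # one column per location: "away", "home"
    )
    return (win_rate["home"] - win_rate["away"]).rename("HOME_COURT_ADVANTAGE")


# Hypothetical rows: one row per team per game.
games = pd.DataFrame({
    "SEASON": [2019] * 8 + [2020] * 4,
    "LOCATION": ["home"] * 4 + ["away"] * 4 + ["home"] * 2 + ["away"] * 2,
    "WIN": [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0],
})

advantage = home_court_advantage_by_season(games)
print(advantage)
```

The resulting series is indexed by season, so it can be passed directly to a line chart to show how the advantage rises or falls over time.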

Using a scatter plot, we were able to visualize the relationship between the margin of a loss or win

and the overall performance of the team, then encoded home/away as color to visualize the

patterns. Scatter plots are powerful tools for showing underlying trends with color encodings.

According to our reading on color, “Scatter plots are one of the few chart types where coloring by

underlying values without encoding them by, e.g., position, length, etc., can work surprisingly

well.” In this case, the color encoding turned out to show the underlying values quite well.

Interactivity was an important part of this project and was a great tool for telling a story with

visualizations. Using dropdown menus and brush selection, we were able to add a level of

granularity that could help move and evolve the story behind home court advantage.
Unfortunately, not all graphs worked out as intended; nevertheless, the interactivity helped drive

the story.

User Testing

From our in-class demo day, we received feedback that highlights both strengths and areas for

improvement. Our peers appreciated the creative use of visualizations and consistent color

schemes, which made the data easy to understand and engaging. Specifically, the inclusion of

markers for significant events like COVID-19 and the utilization of scrolling features were

received well. However, there were suggestions to improve clarity in certain visualizations, such

as adding legends or clearer labels to enhance understanding, especially in charts like the victory

correlation table where abbreviations were unclear. Additionally, splitting up certain charts and

providing more context, such as including full names of statistics, was recommended to enhance

overall comprehension. In response to this feedback, we made it a priority in the next iterations

of our charts to add labels and annotations for variables that might be misconstrued.

Despite these suggestions, most people praised our charts for their cohesive storytelling and

insightful exploration of the relationship between fan attendance, performance statistics, and

home-court advantage in the NBA.

In the user testing performed outside the classroom, we saw a common theme in the responses

regarding labels and annotations for specific abbreviations and statistics. In Kyle’s test, his

participant felt that without adequate background information, it was hard to tell what was

being portrayed. It seemed that without someone to explain the data, our visualizations failed to

properly communicate our findings. However, feedback was generally positive from the rest of
the tests conducted by Alex, Callan, and Jason, with most people commending our ability to make

interactive graphs. After collecting all the responses, our group came together to collectively edit

and refine our charts, making it a priority to add annotations for better clarity.

Final Design
Based upon the feedback generated from user testing, we were able to ascertain areas which

needed improvement. The only suggestions we received focused on clarity regarding the

65 metrics provided in the dataset. This had been a topic of discussion among our team as even

we encountered difficulties deciphering what some of the metrics were trying to measure during

the early stages of this project. To address these concerns, we adapted the existing victory

correlation map to include a tooltip for each metric.


The addition of the tooltip to the correlation map is advantageous as this visualization considers

only the relevant metrics from the dataset. This allows us to focus the viewer on the important

aspects of the data, reducing their overall cognitive load. Furthermore, it provides the definition

of each metric, hopefully lending the user some more insight into the data and addressing the

concerns noted in the user testing. A possible shortcoming of this addition is that viewers would

have to continuously refer back to the correlation map when viewing other visualizations in

order to identify the metrics of focus. While this addition doesn’t seem substantial, we believe it

will greatly assist future viewers. Aside from the tooltip, our team struggled to identify

any other aspects of the visualizations that needed attention.

Takeaways

At the end of the day, our team was extremely satisfied with the final results. We were excited to

have synthesized a project from scratch that yielded genuinely fascinating results. To have

investigated a seemingly superstitious phenomenon like home-court advantage and deciphered its

causes through real data has been exciting for us all. In addition to this, as a team we reflected

on the usefulness of COVID-19 in this data. Without the pandemic, it would have been harder to

demonstrate the relationship between fan attendance, performance statistics, and home-court

advantage in the NBA. Overall, we hope that others will enjoy these findings as much as we did.
