
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/332406802

A systematic review of sports analytics

Conference Paper · March 2019


2 authors, including:

Rakshit Bhatnagar
Lal Bahadur Shastri Institute of Management

All content following this page was uploaded by Rakshit Bhatnagar on 14 April 2019.



“A SYSTEMATIC REVIEW OF SPORTS ANALYTICS”

A TERM PAPER SUBMITTED IN PARTIAL FULFILLMENT


FOR THE REQUIREMENT OF THE
TWO YEAR (FULL-TIME)
POST-GRADUATE DIPLOMA IN MANAGEMENT (Research & Business
Analytics)
(2018 – 20)

BY
Mridul Babbar (416/ 2018)
Rakshit (419/ 2018)

UNDER THE GUIDANCE OF


Dr. Shivani Bali

LAL BAHADUR SHASTRI INSTITUTE OF MANAGEMENT, DELHI


March, 2019
A systematic review of sports analytics

ABSTRACT

Interest from both academics and practitioners in the application of big data analytics (BDA)
in sports has been growing rapidly, which has increased the need to research, develop, and explore
new techniques. This review proposes a framework that provides a full picture of the current
literature on where and how analytics has been applied within the sports industry. We present an
overview of the major analytical tools and technologies creating value in the sports industry,
with the main focus on soccer, basketball, and cricket. Unlike the other sports, cricket has yet
to see much research from a sports analytics perspective.
The studies reviewed in this paper employ various techniques to measure and improve players'
performance, such as developing player ranking models, visual analysis to identify movement
patterns, and the study of numerical metrics during important events of the game.

The present paper discusses how big data and modern Business Intelligence tools and technologies
may help to address big data issues and aid in developing a theoretical model for tactical
decision making in team sports. Because the use of Business Intelligence and Analytics (BI&A) in
competitive sports is still emerging, we identify a set of avenues for future research that will
stimulate further development in sports analytics.

Keywords: Sports, Analytics, Business Intelligence, Big data, Predictive Analytics

INTRODUCTION
Analytics in sports cannot be discussed without a reference to “Moneyball”. Although
“Moneyball” got the ball rolling, sports analytics has come a long way since. The velocity and
volume of data generated, along with advanced techniques, have made sports analytics one of the
most dynamic fields to work in. Sport forms a part of many people's lives, especially today,
when people are becoming more aware of and concerned about their health and lifestyle.
Watching and keeping track of professional sports is a major activity shared by young people and
adults alike, and people all around the globe watch sports on television, many on a
daily basis. Global events such as the Olympic Games and the soccer World Cup are among the
most popular sports events worldwide.

Huge amounts of money are also involved in the sports industry, from ticket pricing to broadcasting,
advertising, and players' salaries, and analytics can help ensure that this money is utilized effectively.
Nate Silver [1], in his book on the promises and pitfalls of big data, elaborated on the possibilities
of performance assessment and sports scouting that have been unleashed in the big data era. He
succeeded in determining causality and aggregating extremely large datasets on major league
baseball players' performance. Silver speculated that baseball may offer the world's richest
dataset, since just about everything that has happened on a major league field in the past 140 years
has been accurately recorded and is now available for analysis. Sport provides a unique area for
exploring research ideas. Because of the global reach of sports, analysis can be done either to
compare different nations or to compare the different techniques they employ. How players
behave during games can also provide insights into certain aspects of human behavior.

SPORTS ANALYTICS

Sport has evolved from being merely a game into a field in which science is deeply involved. Sports
analytics combines the collection of data, prediction of the game, and visualization of game strategy,
using tools and techniques to enhance the performance of individual players and of the team.
Sports analysis is expected to foster many new applications for end users, sports coaches, and
sports managers alike [1]. Analytical goals in these applications include comparison of players'
and teams' performance, prediction, and correlation of behavior with different attributes.
Structured data may be either quantitative or qualitative and is typically collected from
biographical data, performance and medical reports of the athletes, and scouting reports. The
collected data then needs to be standardized, centralized, integrated, and analyzed. This process
provides reliable and systematic data, enabling athletes, coaches, and policymakers to improve
their decisions.
Real-time systems are used to find key analysis points. Capturing the position of the ball and the
movement of players throughout the game, and combining this with advanced statistical algorithms
and software, enables coaches and managers to alter their tactics to gain an upper hand over the
competitor.
Baseball is often cited as the sport in which notable visionaries first developed concepts to enhance
on-field tactics and player selection. Percentage Baseball [3], Earnshaw Cook's book published in
1964, discusses sabermetrics. Another example is the acclaimed movie Moneyball, in which the
use of sabermetrics and advanced statistics by Oakland Athletics' general manager Billy Beane
and his assistant drove strategic decisions about player selection and game tactics. The movie
provided some factual evidence about the overall effectiveness of analytics and advanced statistics
in professional baseball. Multiple sports organizations and athletes now look to Business
Intelligence and Analytics tools to create value.

SPORTS ANALYTICS MARKET

The Sports Analytics Market was valued at USD 0.38 billion in 2017 and is expected to reach USD
2.09 billion by 2022, registering a CAGR of 30.13% over the forecast period of 2018-2023.
America has the highest adoption of sports analytics, and being an early adopter has resulted in
higher revenues associated with the sports industry in the region. Teams with higher valuations
and larger investments are more inclined to adopt sports analytics. Football is expected to be one
of the prominent users of sports analytics solutions.
Global Sports Analytics Market Segmentation
By Type
• Services
• Solutions
By Deployment
• Cloud
• On-premise
By Application
• Player Analysis
• Team Performance Analysis
• Health Assessment
• Video Analysis
• Others
By Region
• North America
• Europe
• Asia-Pacific
• Rest of the World

FOOTBALL

Soccer is one of the most popular sports today and is also very interesting from a scientific point of
view. Analytics has been successfully applied in sports like baseball and basketball, and in recent
times football has also seen considerable research. Statistics such as goals, shots,
and assists are still the most common way to compare player performances. Pundits often
give players ratings, but these ratings may suffer from perception and judgment errors. To
address this, researchers have been developing player ranking models and visually analyzing
soccer data. More and more work is emerging [15] that leverages the rich datasets available to
make discoveries about soccer.
The data arising from soccer is attracting interest from researchers, who analyze it to provide
better decision-making capabilities to players and coaches. So far, researchers have adopted
varying methodologies. Some studies look at players' performance attributes, while others focus
more on the spatial and temporal aspects of the game. Some have focused on building probabilistic
models to simulate game actions and predict the outcomes of matches or the goals scored. We examine
these studies in more depth and review how their analysis can support better decisions both on and
off the pitch.

One such study was conducted by Sacha et al. (2014) [4], who perform feature-driven visual
analytics of soccer data. Data collected from a soccer game is visually analyzed at
three levels: single-player, multi-player, and event. Events and phases were detected
semi-automatically by integrating statistical features. A given game situation can be analyzed
at different levels, for example by taking only a single player into account or by considering several
players. Recording a player's trajectory during a game can give many insights into how he or she
performed with respect to the team and the strategy. Examples of soccer analysis include [5], where
player formations are analyzed. Specifically, the spatial constellation between all defenders of one
team is analyzed over time, which can reveal tactical maneuvers. In [6], distances between player,
puck, and goal within hockey games were used as features of analysis. Time series analysis
approaches have also been widely adopted in past research; for example, [10] evaluated
which statistical measures correlate with the outcome of a game. The temporal development of
geometric statistics, such as the convex hull, circumference, or center of a team, was analyzed in [13].
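
The geometric statistics mentioned above can be illustrated with a short sketch. It is not taken from any of the cited papers: the function names, the coordinate convention (metres on the pitch), and the choice of Andrew's monotone chain algorithm for the hull are assumptions made for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of one player, assumed to be in metres


def centroid(points: List[Point]) -> Point:
    """Center of a team: the mean of the player positions."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull via Andrew's monotone chain, vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area enclosed by the hull, computed with the shoelace formula."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Evaluating these functions frame by frame on tracking data would give the kind of time series (team center, covered area) whose temporal development such studies examine.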

In contrast to these approaches, Sacha et al. (2014) [4] present an interactive system for
explorative analysis of soccer data. They built a layer-based soccer-pitch visualization with
several visualization techniques available (e.g., player position renderer and heat map). The
system is not limited to detecting a fixed number of pre-configured situations, because it
incorporates a user-configurable classifier that detects events based on a number of example
events and input features. Techniques such as interactive and automatic data filtering, visual
representation of trajectories on a soccer field, and compact time series visualization using
horizon graphs are combined to answer different analytical questions in the context of soccer data.
Single-player analysis is performed by classifying features into three categories:
individual characteristics (e.g., coordinates and speed), game context (e.g., distance to the ball),
and events (e.g., shots, receptions, and fouls). The feature values are then clustered into
phases of similar behavior, detected by applying the k-means and DBSCAN algorithms in WEKA [11].
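
As a rough illustration of phase detection by clustering, the sketch below runs a plain k-means over per-frame feature vectors and then merges consecutive frames with the same cluster label into phases. It is a stand-in written in standard-library Python, not the WEKA-based pipeline of the original system; the feature choice (speed, distance to the ball), the deterministic initialization, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(samples: Sequence[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster label per sample.

    Centroids are initialized with the first k samples (an assumption made
    so that the sketch is deterministic).
    """
    centroids = [list(s) for s in samples[:k]]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def phases(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Merge consecutive equal labels into (label, first_frame, last_frame) runs."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i - 1))
            start = i
    return runs
```

In practice the features would normally be scaled before clustering, since speed and distance are measured in different units; that step is omitted here for brevity.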

Figure 1: Workflow analysis of a single player; image from Sacha et al. (2014)
The combination of phases together with the possibility to inspect selected features visually can
reveal interesting patterns. These patterns become clearer when comparing several players.
Comparing a team's players may reveal the formation and strategy being followed.
Formations reveal more about tactics than single-player analysis does. In [4], the defensive
lines are analyzed with a focus on the back-four formation. Depending on where an attack happens
(side or middle), a computational assessment scores the defensive triangle by
computing the angles and distances between the involved players.
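
The geometric inputs to such an assessment can be sketched as follows. The scoring rule itself is not described in this review, so the sketch only computes the side lengths and interior angles of the triangle formed by three defenders; the function name and the returned dictionary layout are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def triangle_geometry(a: Point, b: Point, c: Point) -> Dict[str, Tuple[float, float, float]]:
    """Side lengths and interior angles (degrees) of the triangle a-b-c.

    Assumes the three players occupy distinct positions; coincident players
    would give a zero-length side and a division by zero.
    """
    ab = math.dist(a, b)
    bc = math.dist(b, c)
    ca = math.dist(c, a)

    def angle(opposite: float, s1: float, s2: float) -> float:
        # Law of cosines; the clamp guards against rounding just outside [-1, 1].
        cos_value = (s1 ** 2 + s2 ** 2 - opposite ** 2) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

    return {
        "sides": (ab, bc, ca),
        # Angles at a, b, and c respectively.
        "angles": (angle(bc, ab, ca), angle(ca, ab, bc), angle(ab, bc, ca)),
    }
```

Any scoring function built on top of these values (for example, penalizing very flat or very stretched triangles) would be a design choice of the analyst rather than something specified here.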
Multiple events take place during these attacks, such as fouls, crosses, and shot attempts. These
events were manually annotated by the authors. Features, which may involve a single player or the
whole team, develop right before these events and are analyzed using decision trees as a classifier.
This lets the researchers train a classifier that can differentiate between events and study
which features are important for this differentiation. Table 1 lists the features implemented and
available in the system.

Table 1: Features implemented in the system

We now turn to another study in soccer analytics, which focuses entirely on the value of completed
passes. Brooks, Kerr, and Guttag (2016) [13] adopted a novel method to understand the relationship
between pass location and shot opportunities. Opta [16] extends the
idea of assists to include all passes that lead to shots (whether or not they lead to a goal) in its
“key passes” metric, but both this metric and assists apply only to passes immediately
preceding a shot. Related work was done earlier by Reep and Benjamin [8], who
developed models for the success rates of passing sequences of differing lengths. Probabilistic
models involving possession rates and historical statistics have been developed to predict the
outcome of a match [7][10].
Gyarmati et al. (2014) leveraged ball-event data and passing sequences to cluster the playing styles
of different teams [15]. Lucey et al. (2012) used ball data to determine the location of the ball
throughout a game. They then constructed “entropy-maps” to visualize how different teams move
the ball during a game [17]. In another work, the authors combine match statistics, event data, and
player tracking data to identify the teams playing a given game with at least 70% accuracy [18].

Brooks et al. (2016) [13] gathered their data from the 2012-13 season of La Liga (the top league
in Spain). The soccer field is discretized into 18 zones and the game is segmented at the level of
“possessions”. A possession contains a sequence of passes between players of the same team. Each
pass was converted into a feature vector labeled with either 1 or -1. Passes in possessions that
ended in a shot taken by the offensive team were assigned a label of 1, and all others were assigned
a label of -1.
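
The labeling scheme can be sketched in a few lines. The exact feature encoding is not given in this review, so the sketch assumes a simple one-hot encoding of the origin and destination zones of each pass; `pass_features` and `label_possession` are hypothetical names.

```python
from typing import List, Tuple

NUM_ZONES = 18  # the pitch is discretized into 18 zones, numbered 1..18


def pass_features(origin_zone: int, dest_zone: int) -> List[int]:
    """Assumed encoding: one-hot origin zone followed by one-hot destination zone."""
    vector = [0] * (2 * NUM_ZONES)
    vector[origin_zone - 1] = 1
    vector[NUM_ZONES + dest_zone - 1] = 1
    return vector


def label_possession(
    passes: List[Tuple[int, int]], ended_in_shot: bool
) -> List[Tuple[List[int], int]]:
    """Every pass in a possession inherits the possession's label (+1 or -1)."""
    label = 1 if ended_in_shot else -1
    return [(pass_features(origin, dest), label) for origin, dest in passes]
```

A linear classifier trained on rows of this form would learn one weight per zone feature, which is what makes a per-zone importance analysis possible.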

Figure 2: Playing area split into 18 zones. 1-3 being defensive side, and 16-18 offensive side; image taken from
Brooks et al. (2016)
The model trained on these labeled feature vectors is then used to study the relationship between
the features and the shots taken by looking at the importance of each zone. Six of the top 10
features were found to involve zone 14, which is centered in front of the penalty box. To rank
the players according to their tendency to complete passes that lead to a shot, the Average Pass
Shot Value (APSV) was computed. Players were segmented by their roles, i.e., offense, midfield, and
defense. The model ranked some of the elite attacking players at the top, and the results also
correlate with standard offensive metrics such as goals and assists.
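
A minimal sketch of the averaging and ranking step is given below. It assumes that each completed pass has already been assigned a numeric value by the model (how that value is derived is not detailed in this review) and simply averages those values per player; the function names and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_pass_shot_value(pass_values: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Mean pass value per player, given (player, value) pairs.

    The per-pass values are assumed to come from a previously trained model.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for player, value in pass_values:
        totals[player] += value
        counts[player] += 1
    return {player: totals[player] / counts[player] for player in totals}


def rank_players(apsv: Dict[str, float]) -> List[str]:
    """Players ordered from highest to lowest average value."""
    return sorted(apsv, key=apsv.get, reverse=True)
```

Ranking would typically be done within each role group (offense, midfield, defense) so that players are compared with peers in similar positions.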

We have looked at two approaches to analyzing the game and the players. Next, we
discuss the research done by Ruiz, Power, Wei, and Lucey (2017) [19]. They
analyzed the 2015-16 season of the English Premier League (EPL), with emphasis on the champions,
Leicester City, and further showed how the analyzed features can be used to predict future
performances and outcomes.
The EPL season of 2015-16 was unique and astonishing to watch. Leicester City,
which had narrowly avoided relegation in the previous season (2014-15), achieved an even greater
feat by winning the Premier League. To widespread disbelief, Leicester finished ahead of some of
the top sides in Europe. Ruiz et al. (2017) [19] analyze and compare Leicester's performance with
that of other teams using a set of tools such as expected goal value, expected save value,
strategy plots, and passing quality. Analyzing the data, the researchers identified defensive
effectiveness as a major contributor to winning the title. This was supported by the fact that
Leicester City faced more shots but conceded significantly fewer goals, which suggests that
Leicester's players went above and beyond to negate the scoring chances of their opponents.
Kasper Schmeichel, the goalkeeper, was the second most effective keeper in the league.

The researchers also studied the effectiveness of each strategy through strategy plots. In Figure
3, the size of each square corresponds to the number of shots, and the intensity of the color
reflects the effectiveness with respect to goals for (offense) or goals against (defense).
Figure 3: Strategy plots; image taken from Ruiz et al. (2017)

Such strategy plots were produced for each team for both offense and defense. This enabled the
researchers to analyze the playing styles of different teams.
Using machine learning, the descriptive power of strategy plots was harnessed to create a model
capable of predicting future outcomes. A recommender system, shown in Figure 4, was built that
predicted the expected shot production for each shot type.

Figure 4: Output from the recommender system. Blue bars show the pre-match model expectations and green values
represent the actual values; image taken from Ruiz et al. (2017)

The tools and techniques covered in [19] provide a novel way of analyzing the soccer data.
BASKETBALL

Court Vision

The score has been the conventional way to compare different players and their games. A richer
way to compare two teams and their players is needed, and this is where visual analytics has
entered sports analytics. Basketball is a
spatial sport, and CourtVision is a new ensemble of spatial and visual analytics designed to provide
on-court precision and clarity. Points show which team has scored well and
which player has contributed the most in the game, but CourtVision can answer
questions such as: Where are players' most common shot locations, and how successful
are they at these locations? Which point guard has the highest points per attempt at the top of the
key? Which court locations are most or least effectively defended by the Orlando Magic?

Kirk Goldsberry [22] observed that the most commonly used measure for evaluating shooting in
the NBA, field goal percentage (FG%), lacks a spatial component. In his research, he
discusses spatial analytics and the complexity of spatial performance. He analyzes
how teams play and how they use the space on the court to shoot. Defending the
areas from which an opposing team shoots most often gives a team an advantage. The question is how
the court space was identified and analyzed. He divided the court space into three regions:
“the paint”, “the wing”, and “perimeter”.
Figure 5: Left: density of all field goal attempts. Right: League-wide tendencies in shot-attempts and points per
attempt

Goldsberry set out to find the best shooter using visual analytics together with spatial
analytics. This approach was chosen because conventional measures fail to identify things like who
has the best shooting range compared to other NBA players. By analyzing the different shooting
ranges, the researcher was able to determine which players have the highest range values and
to identify which players shoot better from which areas of the court. Special
metrics were derived to gauge shooting performance. The first metric, “spread”, describes the
overall size of a player's shooting territory. Spread reveals a player's shooting areas but says
little about accuracy or potency. The second metric, “range”, counts the number of unique
shooting cells in which a player averages at least 1 point per attempt.
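
The two metrics can be sketched as follows. The sketch assumes that the court is divided into square cells, that spread is the number of distinct cells containing at least one attempt, and that range is the number of those cells in which the player averages at least 1 point per attempt; the cell size, input layout, and function name are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple

Shot = Tuple[float, float, int]  # (x, y, points scored on the attempt: 0, 2, or 3)


def spread_and_range(shots: Iterable[Shot], cell_size: float = 1.0) -> Tuple[int, int]:
    """Return (spread, range) for one player's shot attempts."""
    attempts: Dict[Tuple[int, int], int] = {}
    points: Dict[Tuple[int, int], int] = {}
    for x, y, pts in shots:
        # Map the shot location to the grid cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        attempts[cell] = attempts.get(cell, 0) + 1
        points[cell] = points.get(cell, 0) + pts
    spread = len(attempts)
    shooting_range = sum(
        1 for cell in attempts if points[cell] / attempts[cell] >= 1.0
    )
    return spread, shooting_range
```

Because range depends on points per attempt rather than on the raw number of makes, a cell with few but valuable three-point shots can count toward range while a cell with many missed shots does not.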

Thus far, we can conclude that basketball is indeed a spatial sport. The structure and dynamics of
court space directly affect every second of a basketball game. Although the case study
focuses on shooting performance, there are several other potential applications of spatial and visual
analytics that can enhance the understanding of basketball. CourtVision also aims to quantify and
visualize other aspects of basketball. For example, the ability of a team to effectively defend court
space influences the outcome of every NBA possession.
BKViz: A Basketball Visual Analysis Tool

Losada et al. (2016) [21] agree that most analysis methods rely on spatial
and numerical metrics that depict trends. Earlier methods used players' game-by-game data
to analyze gameplay and visualize it in order to understand the use of space. However, this
approach requires storing a large amount of data, and the effort required to understand the data
is beyond human capabilities. The researchers therefore argue that a new and better method is
needed to analyze large amounts of data at a deeper level.
They developed BKViz, which analyzes individual games to reveal how players perform
together and individually. This differs from the older approach of collecting past data for
analysis, although Losada et al. (2016) never suggested that the older approach was wrong.
The main purpose of the software is to give valuable insights to players and
coaches from an analyst's point of view. Simple interactions with different views
enable fast comparisons among players and at-a-glance explorations of the entire game.

Any given game can generate up to 1,500 individual values or data strings, which, when presented
as text, demand a massive cognitive effort on the part of the analyst. The fact that this large
amount of data was not being used to its full potential motivated the development of the BKViz
prototype.
Figure 6: Game’s progress based on the point difference between the two teams; image source Losada et al. (2016)
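
A game-progress series of the kind shown in such a view can be derived from scoring events, as in the sketch below. This is not BKViz code; the event layout (time in seconds, team name, points) and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

ScoringEvent = Tuple[int, str, int]  # (seconds elapsed, scoring team, points)


def point_difference_timeline(
    events: List[ScoringEvent], home: str = "HOME"
) -> List[Tuple[int, int]]:
    """Running point difference (home minus away) after each scoring event."""
    difference = 0
    timeline = []
    for seconds, team, points in sorted(events):
        # Home baskets raise the difference, away baskets lower it.
        difference += points if team == home else -points
        timeline.append((seconds, difference))
    return timeline
```

Plotting the second element of each pair against the first gives a curve that crosses zero whenever the lead changes hands.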

The purpose of the prototype was to extract useful insights from a large amount of data in a
simpler form that is easily understood by players, coaches, and other people associated with the
game.
CRICKET

Cricket, unlike basketball, baseball, and football, has seen considerably less research from a
sports analytics perspective. Sankaranarayanan, Sattar, and Lakshmanan (2014) [20] built a model
for ODI matches to predict the number of runs scored, taking
into account a number of features that affect the prediction. They start by segmenting the
50-over innings into 10 segments of 5 overs each. Sankaranarayanan et al. then studied different
features and classified them into two categories: historical features and instantaneous features. A
total of six historical features were considered: (1) Average runs scored (by the team) in an
innings; (2) Average number of wickets lost in an innings; (3) Frequency of being all-out; (4)
Average runs conceded in an innings; (5) Average number of opponent wickets taken in an innings;
(6) Frequency of getting opposition all-out.
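
The six historical features can be computed from a team's past innings, as in the sketch below. The input layout (one `(runs, wickets)` pair per innings), the dictionary keys, and the rule that an innings counts as all-out when 10 wickets fall are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Innings = Tuple[int, int]  # (runs scored, wickets lost) in one completed innings


def _summarize(innings: List[Innings]) -> Tuple[float, float, float]:
    """Average runs, average wickets, and fraction of innings ending all-out."""
    n = len(innings)
    average_runs = sum(runs for runs, _ in innings) / n
    average_wickets = sum(wickets for _, wickets in innings) / n
    all_out_frequency = sum(1 for _, wickets in innings if wickets == 10) / n
    return average_runs, average_wickets, all_out_frequency


def historical_features(
    batting: List[Innings], bowling: List[Innings]
) -> Dict[str, float]:
    """Six historical features for a team.

    `batting` holds the team's own past innings; `bowling` holds the
    innings its opponents played against it.
    """
    runs_for, wickets_lost, all_out = _summarize(batting)
    runs_against, wickets_taken, bowled_out = _summarize(bowling)
    return {
        "avg_runs_scored": runs_for,
        "avg_wickets_lost": wickets_lost,
        "freq_all_out": all_out,
        "avg_runs_conceded": runs_against,
        "avg_opponent_wickets_taken": wickets_taken,
        "freq_opposition_all_out": bowled_out,
    }
```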
During the game, parameters change continuously; therefore, instantaneous features are also
considered. The following features were placed in this category: (1) Home or Away factor; (2)
Powerplay; (3) Target Score; (4) Batsmen performance features; (5) Game snapshot.
The prediction model separately predicts the number of boundaries and the number of runs
taken by running between the wickets. Different machine learning techniques were used to train
the model. Ridge regression and attribute bagging algorithms are applied to the features to
incrementally predict the runs scored in the innings. The authors found that the developed model
achieved an accuracy of 68-70%. They also claimed the highest winner prediction accuracy in
the ODI cricket mining literature.
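
To make the ridge regression step concrete, the sketch below fits ridge weights by solving the regularized normal equations (X^T X + lambda I) w = X^T y with Gauss-Jordan elimination. It is a generic textbook formulation in standard-library Python, not the authors' implementation; it omits the intercept and the attribute bagging step, and the function names are hypothetical.

```python
from typing import List


def ridge_fit(X: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Ridge weights from (X^T X + lam * I) w = X^T y, without an intercept."""
    d = len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(d):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def predict(weights: List[float], features: List[float]) -> float:
    """Linear prediction: the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))
```

With lam greater than zero the system is always solvable, and larger values shrink the weights toward zero, which helps when features such as historical averages are strongly correlated with each other.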

CONCLUSION

In this paper, we have reviewed multiple types of research done in three of the most popular
sports: football, cricket, and basketball. We saw how the conventional attributes of different
sports have evolved and fed into analytics, and how this data can be transformed to
predict future outcomes as well. We reviewed multiple applications of sports analytics and
the different machine learning techniques that the authors used
in their research. Visualizations help teams, coaches, and other people associated
with the game to study sports in greater depth.
Looking to the future, exciting times await sports performance analysis. The era of big data
has just started, and as we move towards automation, more and more data will be generated.
Future collaborations between sports researchers and data scientists should therefore produce
novel ways of applying machine learning techniques to sports data, enhancing the decision-making
process in minimal time.

REFERENCES

[1] Silver, N. (2012). The signal and the noise: the art and science of prediction. Penguin UK.
[2] R. Basole, E. Clarkson, A. Cox, C. Healey, J. Stasko, and C. S. (Organizers), First IEEE
vis workshop on sports data visualization, Oct. 14, 2013.
[3] Cook, E. (1964). Percentage baseball. Waverly Press.
[4] Sacha, D., Stein, M., Schreck, T., Keim, D. A., & Deussen, O. (2014, October). Feature-
driven visual analytics of soccer data. In 2014 IEEE conference on visual analytics science
and technology (VAST) (pp. 13-22). IEEE.
[5] Kim, H. C., Kwon, O., & Li, K. J. (2011, November). Spatial and spatiotemporal analysis
of soccer. In Proceedings of the 19th ACM SIGSPATIAL international conference on
advances in geographic information systems (pp. 385-388). ACM.
[6] Fujimura, A., & Sugihara, K. (2005). Geometric analysis and quantitative evaluation of
sport teamwork. Systems and Computers in Japan, 36(6), 49-58.
[7] Collet, C. (2013). The possession game: A comparative analysis of ball retention and team
success in European and international football, 2007–2010. Journal of sports
sciences, 31(2), 123-136.
[8] Reep, C., & Benjamin, B. (1968). Skill and chance in association football. Journal of the
Royal Statistical Society. Series A (General), 131(4), 581-585.
[9] Lago-Peñas, C., Lago-Ballesteros, J., Dellal, A., & Gómez, M. (2010). Game-related
statistics that discriminated winning, drawing and losing teams from the Spanish soccer
league. Journal of sports science & medicine, 9(2), 288.
[10] Constantinou, A. C., Fenton, N. E., & Neil, M. (2012). pi-football: A Bayesian network
model for forecasting Association Football match outcomes. Knowledge-Based
Systems, 36, 322-339.
[11] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009).
The WEKA data mining software: an update. ACM SIGKDD explorations
newsletter, 11(1), 10-18.
[12] Duarte, R., Araújo, D., Folgado, H., Esteves, P., Marques, P., & Davids, K. (2013).
Capturing complex, non-linear team behaviors during competitive football
performance. Journal of Systems Science and Complexity, 26(1), 62-72.
[13] Brooks, J., Kerr, M., & Guttag, J. (2016, August). Developing a data-driven player
ranking in soccer using predictive model weights. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 49-
55). ACM.
[14] Soccer Analytics | Presented by Prozone | MIT Sloan Sports Analytics Conference.
http://www.sloansportsconference.com/?p=9740/
[15] Gyarmati, L., Kwak, H., & Rodriguez, P. (2014). Searching for a unique style in
soccer. arXiv preprint arXiv:1409.0308.
[16] OptaPro. http://www.optasportspro.com/.
[17] Lucey, P., Bialkowski, A., Carr, P., Foote, E., & Matthews, I. A. (2012, July).
Characterizing Multi-Agent Team Behavior from Partial Team Tracings: Evidence from
the English Premier League. In AAAI.
[18] Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., & Matthews, I. (2014,
December). Identifying team style in soccer using formations learned from spatiotemporal
tracking data. In 2014 IEEE International Conference on Data Mining Workshop (pp. 9-
14). IEEE.
[19] Ruiz, H., Power, P., Wei, X., & Lucey, P. (2017, August). The Leicester city fairytale?:
Utilizing new soccer analytics tools to compare performance in the 15/16 & 16/17 EPL
seasons. In Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (pp. 1991-2000). ACM.
[20] Sankaranarayanan, V. V., Sattar, J., & Lakshmanan, L. V. (2014, April). Auto-Play: A
data mining approach to ODI Cricket simulation and prediction. In Proceedings of the
2014 SIAM International Conference on Data Mining (pp. 1064-1072). Society for
Industrial and Applied Mathematics.
[21] Losada, A. G., Theron, R., & Benito, A. (2016). Bkviz: A basketball visual analysis
tool. IEEE computer graphics and applications, 36(6), 58-68.
[22] Goldsberry, K. (2012, March). Courtvision: New visual and spatial analytics for the NBA.
In 2012 MIT Sloan sports analytics conference (Vol. 9, pp. 12-15).
