27 - Deep Learning-Based Automatic Player Identification and Logging in American
Football Videos
Abstract
In comparison with most sports analysis tasks, players participating in football games are usually crowded together and are prone to occlusion. The size of players also varies significantly depending on their positions relative to the camera. The small size of objects increases the difficulty of identifying players. In addition, the instantaneous and rapid movement of the players can blur the image and distort their shape. Different poses during falling and stumbling can make detection even more challenging. A motion-based background subtraction method performs poorly on blurred and occluded objects. Classical object detection algorithms give unsatisfactory results when dealing with crowded settings, because overlapping objects can lead to false negative predictions. Since useful information is ignored, sports analysis systems produce incomplete results that may mislead viewers.

In this work, we developed an innovative football analysis system that automatically identifies players on a per-play basis and generates logs for play indexing. The proposed system operates directly on the video footage and simultaneously extracts player information and game clock information on a frame-by-frame basis. Fig. 1 shows the typical output of the proposed system. We utilize a detection transformer-based network to detect players in a crowded setting and employ a focal loss-based network as a digit recognition stage to index each player given an imbalanced dataset. We then capture game clock data per play. The football analysis system outputs a database containing an index of each play in the game, as well as a list of quarters, game clock start/end times, and participating players. We summarize our contributions as follows:

1. We propose, to the best of our knowledge, the first deep learning-based football analysis system for jersey number identification and logging.

2. We devise a two-stage neural network to address player detection and jersey number recognition, respectively. The two-stage design highlights the player-related region for precise jersey number recognition. The first stage, a transformer design, addresses the issue of identifying a player in a crowded setting. The second stage further addresses the issue of data imbalance in jersey number identification.

3. We build a database for play indexing. The system automatically extracts game clock information along with the identified jersey numbers, making it possible to generate a game log for player indexing.

4. We conduct experiments on American football videos and demonstrate that our proposed system outperforms Faster R-CNN by 37.7% in player detection and achieves 91.7% mean average precision in jersey number recognition.

2. Related Work

Recent advances in image processing techniques have opened the door for many interesting and effective solutions to this problem. Player identification can be classified into two main categories: Optical Character Recognition (OCR)-based and Convolutional Neural Network (CNN)-based approaches [20].
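To make the focal loss-based digit recognition stage mentioned in the introduction more concrete, the following is a minimal sketch of a multi-class focal loss for a single sample, following the general form in [15]. The function names, the example scores, and the values of `gamma` and `alpha` are illustrative assumptions; the excerpt does not state the hyperparameters or the exact loss variant the authors used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults, not values from the paper.
    With gamma=0 and alpha=1 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical scores over the ten digit classes 0-9.
easy = [0.0] * 10
easy[2] = 6.0  # confident, correct prediction for digit 2
hard = [0.0] * 10
hard[6] = 0.5  # weak evidence for the true digit 6

for name, logits, target in [("easy", easy, 2), ("hard", hard, 6)]:
    ce = focal_loss(logits, target, gamma=0.0)
    fl = focal_loss(logits, target, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

In this sketch the modulating factor `(1 - p_t)**gamma` shrinks the loss of the confidently classified sample far more than that of the poorly classified one, which is the mechanism by which focal loss shifts training emphasis toward hard examples, which often come from under-represented classes.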
Figure 8. (a) Digit counts grouped by jersey color. (b) Confusion Matrix of digit recognition result, with IoU =0.55 and Confidence=0.97.
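The caption above reports results at IoU = 0.55 and confidence = 0.97, and the detection comparison uses IoU values of 0.5 and 0.75. As a minimal sketch of what such thresholds mean, the code below computes intersection over union for axis-aligned boxes. The `(x1, y1, x2, y2)` box format, the helper names, and the reading of both values as acceptance thresholds are assumptions, since the excerpt does not spell out the matching rule.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, score, iou_thr=0.55, conf_thr=0.97):
    """Accept a prediction only if it is confident and overlaps well enough.

    The defaults mirror the values in the Figure 8 caption; treating them
    as acceptance thresholds is an assumption made for this sketch.
    """
    return score >= conf_thr and iou(pred_box, gt_box) >= iou_thr


# Hypothetical example: the prediction covers only the upper half of a player.
ground_truth = (0, 0, 10, 20)
upper_half = (0, 0, 10, 10)
print(iou(ground_truth, upper_half))
print(is_match(upper_half, ground_truth, score=0.99, iou_thr=0.5))
print(is_match(upper_half, ground_truth, score=0.99, iou_thr=0.75))
```

A box that covers only half of a player has an IoU of 0.5 with the ground truth, so it is accepted at a threshold of 0.5 but rejected at 0.75. This is the kind of partial detection that can pass a loose threshold yet still hand an incomplete crop to the digit recognition stage.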
Fig. 7 (a-c) are original frames from a sparse scenario, a moderate scenario, and a crowded scenario, respectively. In the second row, Faster R-CNN fails to detect most players with an IoU of 0.75. In sparse and moderate scenarios, Faster R-CNN with an IoU of 0.5 (the third row) is able to detect some of the players; however, a detection may cover only part of a player and therefore hurt the subsequent digit recognition subsystem. DETR, in the last row, outputs the desired result, with the majority of players detected with high confidence scores. In the crowded setting, Faster R-CNN fails to detect relatively small objects and, in particular, misses many players in the crowded region. As crowded scenes are very common and are the focus of our work, we favor the DETR result, in which almost every player is fully detected.

4.3.2 Jersey Number Recognition Subsystem

The final breakdown and distribution of the jersey number dataset is shown in Fig. 8a. The horizontal axis shows the individual digits, and the vertical axis shows the number of instances. All columns are colored to show the split between home and away jerseys. Digit 2, with the most occurrences, is twice as frequent as the least frequent digit, 6, indicating a data imbalance issue that generally exists in jersey number datasets, even for a single-digit dataset. However, this imbalance is not dominant, and we achieve a high mAP of 91.7% in evaluation.

In Fig. 8b, we show a confusion matrix that is normalized to the most popular class, digit 2. Digits 2, 4, 7, and 9 achieve higher accuracy than digits 6 and 8, which corresponds to the distribution of classes shown in Fig. 8a. Even though the use of focal loss alleviates the data imbalance issue to some extent, we notice that most of the incorrect classifications come from digits that were partially obscured or located on a player who was angled substantially with respect to the camera.

4.3.3 Game Clock Subsystem

Evaluation results are shown in Table 2. OCR performs well at recognizing the clock information, with an accuracy of 95.6%. Play window detection achieves 100% accuracy on quarter transitions and an accuracy of 62% for start/end time designation.

In full game footage, the reset of the play clock can be used to track the end and subsequent start of each play. Although there are no standards for the end and start of a play in accelerated footage, it is possible to start a new play based on the jump in the play clock.

4.3.4 Database Subsystem

We present a screenshot of the data format in the integrated database in Fig. 9. Each play is associated with game information, such as quarter number, start and end time, home and away team, and participating players of the home team. It is worth noting that the duplicated numbers are for offense and defense. However, this does not happen in the NFL because duplicated numbers are not allowed. A viewer can retrieve information and perform analysis either by play index or by the jersey number of players. Integration of player information and clock information can alleviate the impact of players who may be partially occluded and invisible in some frames.

Figure 9. Part of the output game log, including the player tracking information, play number, quarter, start/end time, home/away team.

5. Conclusion

In this work, we present a football game analysis system based on deep learning. The proposed system enables player identification and time tracking. We propose a two-stage network design to first highlight the player region and then identify the jersey number in zoomed-in sub-regions. The experimental results demonstrate robustness in handling crowded scenarios and the capability to alleviate data imbalance issues. The proposed end-to-end system delivers complete tracking information using a single pass of football game footage. Our player detection and jersey number recognition subsystems can be directly generalized to other football game footage. Qualitative and quantitative analyses have been thoroughly performed over the three subsystems separately and over the entire system. The system achieves accurate and efficient performance on separate tasks. The experimental results illustrate its reliability in handling the challenges of football game analysis.

For future work, we plan to enhance the player tracking by integrating information through consecutive frames and replenishing the players whose jersey numbers are not visible in some frames due to movement. Very small objects could be detectable by using an ultra-high resolution camera or
in combination with a super resolution method. Ball tracking could be fused into the current design and serve as complementary information for game clock tracking, thereby helping with play division. The user interface that outputs the database will be further improved by developing a more robust graphical user interface.

Acknowledgements

The authors would like to thank Nguyen Hung Nguyen, Jeff Reidy, and Alex Ramey for their preliminary study, data annotation, and supportive experiments.

References

[1] Super bowl ratings history (1967-present), Feb 2022.
[2] Zubaer Ahammed. Basketball player identification by jersey and number recognition. PhD thesis, Brac University, 2018.
[3] Adrià Arbués-Sangüesa, Coloma Ballester, and Gloria Haro. Single-camera basketball tracker through pose and semantic feature fusion. arXiv preprint arXiv:1906.02042, 2019.
[4] Matija Burić, Miran Pobar, and Marina Ivašić-Kos. Object detection in sports videos. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1034–1039. IEEE, 2018.
[5] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
[6] Anthony Cioppa, Adrien Deliege, Maxime Istasse, Christophe De Vleeschouwer, and Marc Van Droogenbroeck. Arthus: Adaptive real-time human segmentation in sports through online distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019.
[7] PC Alabama Enthusiast. Alabama @ South Carolina, 2019 (in under 38 minutes), 2019.
[8] Sebastian Gerke, Karsten Muller, and Ralf Schafer. Soccer jersey number recognition using convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 17–24, 2015.
[9] ROLL TIDE Graham. 2015-16 cotton bowl - Michigan State vs. Alabama (hd), 2015.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[11] Benjamin Langmann, Seyed E Ghobadi, Klaus Hartmann, and Otmar Loffeld. Multi-modal background subtraction using gaussian mixture models. In ISPRS Symposium on Photogrammetry Computer Vision and Image Analysis, pages 61–66, 2010.
[12] Joffrey L Leevy, Taghi M Khoshgoftaar, Richard A Bauder, and Naeem Seliya. A survey on addressing high-class imbalance in big data. Journal of Big Data, 5(1):1–30, 2018.
[13] Gen Li, Shikun Xu, Xiang Liu, Lei Li, and Changhu Wang. Jersey number recognition with semi-supervised spatial transformer network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1783–1790, 2018.
[14] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2117–2125, 2017.
[15] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. CoRR, abs/1708.02002, 2017.
[16] Hengyue Liu and Bir Bhanu. Pose-guided r-cnn for jersey number recognition in sports. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019.
[17] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
[18] Wei Liu, Lin Chen, and Yajun Chen. Age classification using convolutional neural networks with the multi-class focal loss. In IOP conference series: materials science and engineering, volume 428, page 012043. IOP Publishing, 2018.
[19] Chun-Wei Lu, Chih-Yang Lin, Chao-Yung Hsu, Ming-Fang Weng, Li-Wei Kang, and Hong-Yuan Mark Liao. Identification and tracking of players in sport videos. In Proceedings of the fifth international conference on internet multimedia computing and service, pages 113–116, 2013.
[20] Ahmed Nady and Elsayed E Hemayed. Player identification in different sports. In VISIGRAPP (5: VISAPP), pages 653–660, 2021.
[21] Sauradip Nag, Raghavendra Ramachandra, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, and Mohan Kankanhalli. Crnn based jersey-bib number/text recognition in sports and marathon images. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1149–1156. IEEE, 2019.
[22] Chirag Patel, Atul Patel, and Dharmendra Patel. Optical character recognition by open source ocr tool tesseract: A case study. International Journal of Computer Applications, 55(10):50–56, 2012.
[23] Joseph Redmon and Ali Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017.
[24] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[25] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28:91–99, 2015.
[26] Bama Rewind. 2015 cotton bowl, Michigan State vs. Alabama highlights, 2015.
[27] rtsportsuploads. 2019 - LSU Tigers at Alabama Crimson Tide in 40 minutes, 2019.
[28] Matko Šarić, Hrvoje Dujmić, Vladan Papić, and Nikola Rožić. Player number localization and recognition in soccer video using hsv color space and internal contours. In The International Conference on Signal and Image Processing (ICSIP 2008). Citeseer, 2008.
[29] Arda Senocak, Tae-Hyun Oh, Junsik Kim, and In So Kweon. Part-based player identification using deep convolutional representation and multi-scale pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1732–1739, 2018.
[30] Himanshu Singh. Advanced image processing using opencv. In Practical Machine Learning and Image Processing, pages 63–88. Springer, 2019.
[31] Shuya Suda, Yasutoshi Makino, and Hiroyuki Shinoda. Prediction of volleyball trajectory using skeletal motions of setter player. In Proceedings of the 10th Augmented Human International Conference 2019, pages 1–8, 2019.
[32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[33] Kanav Vats, Mehrnaz Fani, David A Clausi, and John Zelek. Multi-task learning for jersey number recognition in ice hockey. In Proceedings of the 4th International Workshop on Multimedia Content Analysis in Sports, pages 11–15, 2021.
[34] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019.
[35] Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems, 30(11):3212–3232, 2019.