
Deep Learning-based Automatic Player Identification and Logging in American Football Videos

arXiv:2204.13809v1 [cs.CV] 26 Apr 2022

Hongshan Liu, Stevens Institute of Technology, Hoboken, NJ 07030, hliu94@stevens.edu
Colin Aderon, The University of Alabama, Tuscaloosa, AL 35401, cadreon@crimson.ua.edu
Noah Wagnon, The University of Alabama, Tuscaloosa, AL 35401, ndwagnon@crimson.ua.edu
Huapu Liu, The University of Alabama, Tuscaloosa, AL 35401, hliu68@crimson.ua.edu
Steven MacCall, The University of Alabama, Tuscaloosa, AL 35401, smaccall@ua.edu
Yu Gan, Stevens Institute of Technology, Hoboken, NJ 07030, ygan5@stevens.edu

Abstract

American football games attract significant worldwide attention every year. Game analysis systems generate crucial information that can help analyze the games by providing fans and coaches with a convenient means to track and evaluate player performance. Identifying the participating players in each play is also important for indexing video by player participation per play. Processing football game video presents challenges such as crowded settings, distorted objects, and imbalanced data for identifying players, especially jersey numbers. In this work, we propose a deep learning-based football video analysis system to automatically track players and index their participation per play. It is a multi-stage network designed to highlight areas of interest and identify jersey number information with high accuracy. First, we utilize an object detection network, a detection transformer, to tackle the player detection problem in a crowded context. Second, we identify players using jersey number recognition with a secondary convolutional neural network, then synchronize it with a game clock subsystem. Finally, the system outputs a complete log in a database for play indexing. We demonstrate the effectiveness and reliability of player identification and the logging system by analyzing qualitative and quantitative results on football videos. The proposed system shows great potential for implementation in and analysis of football broadcast video.

Figure 1. An example of the output of the proposed system: player detection, jersey number recognition, and game log.

1. Introduction

American football is one of the most popular sports worldwide. The 2022 Super Bowl had 99.18 million television viewers in the US and 11.2 million streaming viewers around the world [1]. With more sophisticated developments in data analysis and storage, there has been significant growth in the demand for data-intensive analytic applications in sports. Player identification is a standard way of interpreting sports videos and of helping commentators explain games on television. It also facilitates the comprehensive indexing of game video based on player participation per play. However, in order to meet this demand, data analysis methods need to be equally improved over traditional manual methods, which are time-consuming and impractical for handling the large number of football games recorded today.

Although different player tracking strategies have been explored in various sports, fast and accurate player tracking in American football remains elusive. A key element of a successful analysis system is detection accuracy. However, camera blur, player motion, and small targets pose many challenges to achieving satisfactory accuracy for data analysis.
Figure 2. Flowchart of the proposed work.

In comparison with most sports analysis tasks, players participating in football games are usually crowded together and are prone to occlusion. The size of players also varies significantly depending on their positions relative to the camera. The small size of objects increases the difficulty of identifying players. In addition, the instantaneous and rapid movement of the players can blur the image and distort their shapes. Different poses during falling and stumbling can make detection even more challenging. Motion-based background subtraction methods perform weakly on blurred and occluded objects. Classical object detection algorithms give unsatisfactory results when dealing with crowded settings, because overlapping objects can lead to false negative predictions. Since useful information is ignored, sports analysis systems produce incomplete results that may mislead viewers.

In this work, we developed an innovative football analysis system that automatically identifies players on a per-play basis and generates logs for play indexing. The proposed system operates directly on the video footage and simultaneously extracts player information and game clock information on a frame-by-frame basis. Fig. 1 shows the typical output of the proposed system. We utilize a detection transformer-based network to detect players in a crowded setting and employ a focal loss-based network as a digit recognition stage to index each player given an imbalanced dataset. We then capture game clock data per play. The football analysis system outputs a database containing an index of each play in the game, as well as a list of quarters, game clock start/end times, and participating players. We summarize our contributions as:

1. We propose, to the best of our knowledge, the first deep learning-based football analysis system for jersey number identification and logging.

2. We devise a two-stage neural network to address player detection and jersey number recognition, respectively. The two-stage design highlights the player-related region for precise jersey number recognition. The first stage, a transformer design, addresses the issue of identifying a player in a crowded setting. The second stage further addresses the issue of data imbalance in jersey number identification.

3. We build a database for play indexing. The system automatically extracts game clock information along with the identified jersey numbers, making it possible to generate a game log for player indexing.

4. We conduct experiments on American football videos and demonstrate that our proposed system outperforms Faster R-CNN by 37.7% in player detection and achieves 91.7% mean average precision in jersey number recognition.

2. Related Work

Recent advances in image processing techniques have opened the door to many interesting and effective solutions to this problem. Player identification approaches can be classified into two main categories: Optical Character Recognition (OCR) based and Convolutional Neural Network (CNN) based approaches [20].

Before deep learning became popular, OCR-based approaches segmented images in HSV color space and classified numbers based on the segmentation results [28]. In another example, [19] extracted bounding boxes of players with a deformable part model detector, then performed OCR and classification. However, deep learning models have been shown to provide greater capacity in object detection tasks and to produce more accurate results [35]. The CNN approach in [8] detected players with a histogram of oriented gradients and a linear support vector machine (SVM) and classified the player numbers within the bounding boxes by training a deep convolutional neural network. [2, 8, 13, 21, 33] explored jersey number recognition by adopting various deep learning models. [11] utilized a Gaussian Mixture Model (GMM) for background subtraction to locate moving foreground objects against the static background and to detect moving players on the field. In [4], Mask R-CNN and YOLOv2 were compared for player detection using pre-trained models because of a lack of annotated data. [3, 16, 29, 31] utilized pose information to guide the player detection deep learning network. [16] explored human body part cues and proposed a faster region-based convolutional neural network (R-CNN) approach for player detection and number recognition, which works well when the player and number are shown clearly to the camera, but this is often not the case for broadcast video. To address the player identification problem in broadcast video, [29] utilized body part features to design a system that recognizes body representations at multiple scales with enhanced accuracy. [6] proposed an adaptive real-time player segmentation approach that trains a student network with data annotated by a teacher network, which saves manual annotation. However, segmentation requires more computational resources, and the use of Mask R-CNN causes degraded performance in complex scenarios.

Most existing sports analysis studies have focused on player detection in soccer, hockey, and basketball. These studies lack in-depth analysis of football player tracking. Football player identification is more challenging due to the clustering pattern of crowded players. Beyond the exploration of player identification, dedicated player detection systems for football games have not been extensively studied. To date, no database systems for automatic jersey number detection and player information indexing have been reported.

3. Methodology

3.1. Overall Framework

We propose an automated football analysis system to index each play in a game by quarter, game clock start/end time, and a list of participating players, as shown in Fig. 2.

Figure 3. Image processed with one-stage YOLOv2, showing limited performance when a one-stage model is directly applied.

Two-stage design. We choose a two-stage object detection network to enhance the capability of small-object detection in high definition videos. In sports broadcasting and player identification, the input image is usually in high definition. Conventional object detection networks usually downsize an input image to a specific size, e.g., YOLO [24] resizes the input to 416 × 416 and SSD [17] resizes the input to 512 × 512, which makes the player region unrecognizable. As shown in Fig. 3, a one-stage YOLOv2 [23] leads to unsatisfactory performance in two respects. First, multiple players are identified within a single bounding box, lacking the capability to differentiate crowded players. Second, jersey number information is eliminated in the downsized image, as only a limited number of pixels is associated with each jersey number, so the model is prone to detection error. To address this issue, we propose a two-stage design that first highlights a player region and then zooms into that region for jersey number recognition.

Three subsystems. The overall flowchart consists of three pattern recognition subsystems: player detection, jersey number recognition, and clock identification. For the first subsystem, we employ the Detection Transformer [5], which can detect players within an extremely crowded scenario in order to address the technical challenge caused by highly overlapped targets. The second subsystem processes the detected player proposals and identifies jersey numbers accordingly; this digit recognition subsystem is implemented using RetinaNet50 [34]. The third subsystem extracts game clock information for play indexing. It operates on a frame-by-frame basis to extract the game clock time at the start of a play using Tesseract for optical character recognition. The clock information is then filtered into plays, which provides additional information on the estimated time of each play within a game. Upon completing the three subsystems, the information discovered in the jersey numbers is associated into a readable and comprehensive database indexed by game clock timestamp.
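For illustration, the two-stage flow can be sketched as below. This is a minimal sketch, not the authors' released code: the functions detect_players and recognize_digits are assumed wrappers around the trained detection and digit-recognition models.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates


def identify_players(
    frame: np.ndarray,
    detect_players: Callable[[np.ndarray], List[Box]],   # stage 1: DETR-style player detector (assumed wrapper)
    recognize_digits: Callable[[np.ndarray], List[str]],  # stage 2: single-digit recognizer (assumed wrapper)
) -> List[Tuple[Box, str]]:
    """Two-stage flow: detect player regions, then zoom into each
    region so the digit recognizer sees the jersey at full resolution."""
    results = []
    for (x1, y1, x2, y2) in detect_players(frame):
        player_crop = frame[y1:y2, x1:x2]         # zoom into the player region
        digits = recognize_digits(player_crop)    # e.g. ["5", "1"] for jersey 51
        results.append(((x1, y1, x2, y2), "".join(digits)))
    return results
```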
Figure 4. Example of data augmentation: (a) original frame; (b) original frame with blurry effect; (c) scaled frame; (d) scaled frame with blurry effect.

Figure 5. Example of player proposals and of a digit on a jersey.

3.2. Player Detection Subsystem

Player detection presents two main challenges, which we address as follows. First, we use data augmentation to alleviate the data variation caused by motion effects, such as blurry and distorted objects. Second, we employ the DEtection TRansformer (DETR) [5] to address the player detection problem against crowded background settings.

To ensure our method is robust to motion-corrupted frames, we augment our training dataset with additional instances exhibiting motion effects. Specifically, we apply a Gaussian filter [30] to a portion of the training images to create realistic motion-corrupted images. The training images are also scaled for data augmentation, as shown in Fig. 4.
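As a concrete illustration of this augmentation step, the sketch below applies a Gaussian blur and a zoom-style rescale to a frame with OpenCV. The kernel size and scale factor are illustrative choices, not values reported in the paper.

```python
import cv2
import numpy as np


def augment_frame(frame: np.ndarray, scale: float = 1.5, kernel: int = 7) -> list:
    """Return blurred and scaled copies of a training frame.

    The blur imitates motion-corrupted frames; the enlarged copy
    imitates a zoom-in on the play. Parameter values are illustrative.
    """
    blurred = cv2.GaussianBlur(frame, (kernel, kernel), 0)
    h, w = frame.shape[:2]
    scaled = cv2.resize(frame, (int(w * scale), int(h * scale)),
                        interpolation=cv2.INTER_LINEAR)
    scaled_blurred = cv2.GaussianBlur(scaled, (kernel, kernel), 0)
    return [blurred, scaled, scaled_blurred]
```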
Inspired by the transformer model in natural language processing, the player detection model is capable of identifying overlapped bounding boxes by reasoning about the spatial relationships among those boxes. Combined with a set-based Hungarian loss that enables unique matching between predictions and ground truth, DETR solves the set prediction problem with a transformer. Compared to conventional object detection methods such as Faster R-CNN [25], DETR benefits from the bipartite matching and attention mechanism of the transformer encoder-decoder architecture [32], and therefore excels at solving the detection task within the crowded setting of a football game.

The DETR framework in our implementation stems from ResNet50 [10], a conventional CNN backbone. The feature map extracted from the input image is augmented with a positional encoding and then fed into the transformer encoder. Each encoder layer has a multi-head self-attention module to explore the correlation within the input of that layer. For the transformer decoder, the learnable object queries pass through each decoder layer with a self-attention module to explore the relations among themselves. In the decoder, another cross-attention module takes the key elements of the feature map from the encoder output as side input to let the object queries cross-learn the features from the feature map. At the last step, a feed-forward network (FFN) is connected to each prediction from the decoder to predict the final class label, center coordinates, and dimensions of the bounding box. As in [5], the loss function contains a Hungarian loss and a bounding box loss.
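The bipartite matching underlying the Hungarian loss can be illustrated with SciPy's linear_sum_assignment, which solves the same one-to-one assignment problem DETR uses to pair predictions with ground-truth objects. The cost below is a simplified L1 box distance plus a class term, not the exact matching cost of [5].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_predictions(pred_boxes, pred_probs, gt_boxes, gt_labels):
    """One-to-one matching of predictions to ground truth.

    pred_boxes: (N, 4), pred_probs: (N, num_classes),
    gt_boxes: (M, 4), gt_labels: (M,) integer array.
    Returns index pairs (i, j): prediction i matched to ground truth j.
    Cost = L1 box distance - probability of the true class
    (a simplified stand-in for DETR's matching cost).
    """
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    class_cost = -pred_probs[:, gt_labels]                                    # (N, M)
    cost = box_cost + class_cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows, cols))
```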
3.3. Jersey Number Recognition Subsystem

Jersey number recognition severely suffers from a data imbalance issue. Obtaining a balanced jersey number dataset covering numbers from 1 to 99 would require a massive dataset for training. We propose to address this problem in two directions. First, rather than recognizing two-digit numbers, we strategically target single-digit recognition, which dramatically reduces the need for training data. Second, we employ RetinaNet [15, 34] with a focal loss design as the digit recognition step to alleviate the imbalance in the jersey number dataset.

With our single-digit problem formulation, the number of classes is reduced from 100 to 10. With this significant reduction in the number of classes, it is then possible to achieve accurate results with a training dataset at the thousand level. Although the distribution of the 10 classes may still be uneven, the data imbalance issue is further addressed by the focal loss.

The jersey number recognition subsystem is applied to the players detected by the previous subsystem. Fig. 5 shows examples of detected player proposals and the jersey numbers that are expected to be recognized. RetinaNet is a one-stage detector with fast performance due to a one-time pass over each input image. Two essential components of the network are the Feature Pyramid Network (FPN) and the focal loss. The FPN is built on top of ResNet-50 for feature extraction; it can therefore take images of arbitrary size as input and output proportionally sized feature maps at multiple levels of the feature pyramid [14]. RetinaNet handles imbalanced training classes better through the use of focal loss. Traditionally, class imbalance limits the performance of a classification model because most regions of an image can be classified as background and result in unhelpful contributions towards learning [12]. As a consequence, the gradient is overwhelmed and leads to degenerate models. Focal loss introduces a focusing parameter to the cross entropy loss, defined by [18]:

FocalLoss = -\sum_{i=1}^{N} y_i \log(p_i) (1 - p_i)^{\gamma}    (1)

where N denotes the number of classes; y_i is binary and equals 1 when i is the true label and 0 otherwise; p_i is the predicted probability of the i-th class; and γ is a focusing parameter that down-weights the loss of easily classified regions, thereby differentiating the majority/minority regions.
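A minimal NumPy rendering of Eq. (1) is given below for reference. A training implementation would use the focal loss built into the RetinaNet framework; this sketch only restates the formula for a single one-hot label.

```python
import numpy as np


def focal_loss(probs: np.ndarray, true_class: int, gamma: float = 2.0) -> float:
    """Focal loss of Eq. (1) for one prediction.

    probs: predicted probability distribution over the N classes;
    true_class: index i with y_i = 1; gamma: focusing parameter.
    Only the true-class term is non-zero because y is one-hot.
    """
    p = float(probs[true_class])
    eps = 1e-12                      # numerical safety for log(0)
    return -np.log(p + eps) * (1.0 - p) ** gamma
```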
3.4. Game Clock Subsystem
Game clock identification first extracts the game clock from the game video. The subsystem then filters through the game clock data to derive individual play windows, providing the estimated start and end time of each play. The play windows record the absolute range of frames for each play and are associated with the player detection subsystem to index the players present in each play.

Detection of Play Clock Information. We choose a simple and fast solution, Tesseract Optical Character Recognition (OCR) [22], to build the main software of the game clock subsystem, because the clock region in broadcast video has a distinctive background and a uniform font, as shown in the lower left of Fig. 2. The program begins by breaking the video down into individual frames. Individual frames are then cropped to isolate the region with the game clock information. The cropped image is then altered in three ways to reach the desired OCR detection format. First, the image is grayscaled to enhance the contrast between the text and the background. Second, binary thresholding is applied to make the image black and white. Finally, the image is color-inverted to reach the desired OCR format, which is black text on a white background. After these preprocessing steps, the image is processed with the Tesseract OCR software for character detection, and the result is output together with the absolute frame number. Zero is designated as the output for scenes in which clock information is absent. As shown in Fig. 2, the information in the output text file is, from left to right, the absolute frame number, the game clock information, and the play clock information. Based on this setup, the absolute frame index allows simple synchronization with the player detection subsystem.
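The preprocessing chain described above (crop, grayscale, binary threshold, inversion, OCR) can be sketched with OpenCV and pytesseract as follows. The crop coordinates are placeholders for a broadcast-specific clock region, not values from the paper.

```python
import cv2
import pytesseract


def read_game_clock(frame, clock_region=(20, 650, 300, 700)):
    """Return the OCR'd clock text for one frame, or "0" if nothing is read.

    clock_region is an illustrative (x1, y1, x2, y2) crop for the
    broadcast's clock banner; it must be tuned per video layout.
    """
    x1, y1, x2, y2 = clock_region
    crop = frame[y1:y2, x1:x2]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)                    # grayscale for contrast
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # black and white
    inverted = cv2.bitwise_not(binary)                               # black text on white
    text = pytesseract.image_to_string(
        inverted, config="--psm 7 -c tessedit_char_whitelist=0123456789:").strip()
    return text if text else "0"
```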
Play Window Detection. A play designation filter program is designed to sift through the text file and output play windows that contain the range of frames for each play. As shown in Fig. 2, the output of the play designation filter is, from left to right, play number, quarter number, frame at play start, frame at play end, play start time, and play end time.

Figure 6. Color filtering demonstration: (a) player wearing a red jersey; (b) histogram of the center region of the red jersey; (c) player wearing a white jersey; (d) histogram of the center region of the white jersey.

3.5. Database Subsystem

The database subsystem synchronizes the results from the game clock and player identification subsystems and writes them into a readable format. In addition to the input from the two subsystems, there are two additional inputs that we consider prior knowledge: team jersey color and team roster.

Home and away team players are categorized by color filtering. Different features of a color histogram are used to differentiate the teams. As the examples in Fig. 6 show, in practice the most representative region, a small strip in the middle of each player proposal, is extracted to isolate the jersey. The x axis of the color histograms is the intensity from 0 to 255, and the y axis is the number of occurrences of that intensity. For the example of the red team player, the red channel intensity is much higher than blue and green on average. Validation on a large number of red team jersey images analyzed in a similar manner agrees with the observation that the red intensity value is consistently much higher. On the other hand, a white jersey from the other team is hypothesized to show no dominant color channel, and the corresponding color histogram confirms this. For other jersey colors, the color filtering method can be easily generalized. Finally, jersey numbers are converted to player names using the team roster.
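The team assignment by color can be sketched as below: a thin strip from the middle of each player proposal is examined, and the proposal is labeled by whether one channel clearly dominates. The strip size and dominance margin are illustrative thresholds, not values reported in the paper.

```python
import numpy as np


def classify_team(player_crop: np.ndarray, margin: float = 20.0) -> str:
    """Label a player proposal as the red (home) or white (away) team.

    player_crop: BGR image of one detected player. A small strip in
    the middle of the crop approximates the jersey area. If the red
    channel mean exceeds the blue and green means by `margin`, the
    jersey is treated as red; otherwise it is treated as white
    (no dominant channel). Thresholds are illustrative.
    """
    h, w = player_crop.shape[:2]
    strip = player_crop[int(0.4 * h):int(0.5 * h), int(0.25 * w):int(0.75 * w)]
    b, g, r = (strip[..., c].mean() for c in range(3))
    if r > b + margin and r > g + margin:
        return "red"
    return "white"
```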
Table 1. Player detection performance. AP0.05:0.95 denotes the average AP over IoU from 0.05 to 0.95 with a step size of 0.05. AP0.75 denotes the AP at IoU = 0.75. APs and APl denote AP0.05:0.95 for small and large objects, respectively.

               AP0.05:0.95   AP0.75   APs    APl
Faster R-CNN       24.5        5.7    21.2   26.7
DETR               38.6       43.4    31.8   38.9

Table 2. Evaluation of the game clock subsystem.

Item                          Evaluation criterion                                                         Result
Clock Detection               Times are read correctly                                                     95.6%
Quarter Number                Quarter transitions are correct                                              100%
Start/End Time of each play   Play windows begin before the ball is snapped and end before the next snap   62%

4. Experiments

4.1. Implementation Details

In our experiments, we compare the performance of our player detection subsystem with a baseline model: Faster R-CNN with a ResNet50 backbone [25]. For each model, we empirically set the IoU threshold to 0.75 to extract the detected bounding boxes. We compute the Average Precision (AP) at different Intersection over Union (IoU) thresholds and for objects of different sizes. A small object ranges from 32 × 32 to 96 × 96 pixels; a large object is above 96 × 96. We exclude objects smaller than 32 × 32 because the jersey number is generally invisible when the whole player fits in such a small region. We transfer the weights learned by [5] and further train the DETR with a ResNet50 backbone for 500 epochs until it converges on our football player dataset. For comparison, the pretrained Faster R-CNN with a ResNet-50 backbone was trained on our dataset for 10,000 epochs until it converged. For training, we used a batch size of 2 and a learning rate of 10^-4.
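For reference, the IoU used for these thresholds can be computed with the short helper below (the standard definition, not code from the paper).

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```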
To evaluate the jersey number recognition subsystem, we trained the RetinaNet convolutional neural network with pretrained weights from the Model Zoo [34]. All images are padded with zeros to avoid distortion. The model was trained for 20,000 epochs with a learning rate of 10^-4 and a batch size of 81. We adopt the focal loss with focusing parameter γ = 2. The performance of this model is measured by assessing its mAP of detection and classification over all single digits. The optimal IoU threshold is found to be 0.55, and the optimal confidence threshold is 0.97. Note that this approach considers the two digits of a two-digit number independently. For example, the number "51" is labeled as a "5" and a "1" separately. Combining the digit recognition results in a left-to-right manner is done as a post-processing step.
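The left-to-right post-processing step can be illustrated as follows: single-digit detections inside one player box are sorted by horizontal position and concatenated into the jersey number. The detection tuple format used here is an assumption made for the illustration.

```python
def combine_digits(digit_detections):
    """Combine single-digit detections into a jersey number string.

    digit_detections: list of (digit_str, x_center, confidence) tuples
    for one player crop (an assumed format for illustration). Digits
    are ordered left to right; e.g. [("1", 60, 0.98), ("5", 20, 0.99)]
    becomes "51".
    """
    ordered = sorted(digit_detections, key=lambda d: d[1])
    return "".join(d[0] for d in ordered)
```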
For the game clock subsystem, after the clock information is extracted by the OCR, we set the condition for play switching as the 40-second period of the game clock or the 5-second period of the snap clock. We test the subsystem with two game videos [7, 27] and three evaluation criteria: the accuracy of the OCR, quarter transitions, and the start/end time of each play. All experiments were carried out in parallel on four Quadro RTX 6000 GPUs.
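A sketch of how OCR output can be grouped into play windows is shown below. It treats an upward jump of the play clock (a reset) as the start of a new play, which is one plausible reading of the switching rule described above; the tuple formats are assumptions for illustration.

```python
def split_into_plays(clock_log):
    """Group OCR output into play windows.

    clock_log: list of (frame_idx, game_clock, play_clock) tuples, with
    play_clock in seconds (frames with no readable clock are skipped
    upstream). A new window opens whenever the play clock jumps upward,
    i.e. is reset. Returns (start_frame, end_frame, start_clock,
    end_clock) tuples.
    """
    windows, current = [], []
    prev_play_clock = None
    for frame_idx, game_clock, play_clock in clock_log:
        if prev_play_clock is not None and play_clock > prev_play_clock and current:
            windows.append((current[0][0], current[-1][0],
                            current[0][1], current[-1][1]))
            current = []
        current.append((frame_idx, game_clock))
        prev_play_clock = play_clock
    if current:
        windows.append((current[0][0], current[-1][0],
                        current[0][1], current[-1][1]))
    return windows
```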
4.2. Dataset

We validated our player detection subsystem using a 22-minute highlight video of football broadcast footage from the 2015 Cotton Bowl [26]. As a proof of concept, we chose highlight footage, which is free of the distractions caused by graphics and other media found in full-length footage. It is in standard high-definition (HD) resolution of 1280x720 pixels. Full-length footage of the same game is available at [9]. Frames captured from the video at 30 frames per second totaled 1196, of which 221 were further filtered for training and testing. To make sure the training and testing data are independent of each other, 162 frames from the first quarter through the third quarter are used for training; 59 frames from the fourth quarter are used for testing. For data augmentation, 50 randomly selected frames from the training set are enlarged for a zoom-in effect, and one copy of each image is given a blurry effect. Therefore, the training set contains 424 frames after augmentation, with examples shown in Fig. 4.

In the digit recognition subsystem, 2401 player proposals with visible jersey numbers extracted from [7] are manually labeled. In addition, we identify and label 2464 more images from the 2015 Cotton Bowl [26] to add to this dataset. Lastly, two copies of each image are created: one is a scaled version and the other a motion-blurred version, which brings the final image count to 14595, with 9730 of the images being augmented copies.

4.3. Results and Discussion

4.3.1 Player Detection Subsystem

We quantitatively analyze the performance of the player detection subsystem, as shown in Table 1. We compare our player detection with Faster R-CNN in terms of AP, computing the AP for small and large objects, respectively. The results show that DETR delivers better performance in an extremely crowded setting. With an IoU of 0.75, the average precision is 43.4%, which outperforms Faster R-CNN by 37.7%. For small and large objects, the AP of the proposed model surpasses Faster R-CNN by 10.6% and 12.2%, respectively.

To analyze the performance qualitatively, we show three representative images and their test results using different models.
Figure 7. Representative testing results. (a-c) original frames; (d-f) Faster R-CNN with IoU = 0.75; (g-i) Faster R-CNN with IoU = 0.5;
(j-l) DETR with IoU = 0.75.

Figure 8. (a) Digit counts grouped by jersey color. (b) Confusion Matrix of digit recognition result, with IoU =0.55 and Confidence=0.97.

Fig. 7 (a-c) shows original frames from a sparse scenario, a moderate scenario, and a crowded scenario, respectively. In the second row, Faster R-CNN fails to detect most players with an IoU of 0.75. In the sparse and moderate scenarios, Faster R-CNN with an IoU of 0.5 (the third row) is able to detect some of the players; however, the detections may cover only part of a player and therefore hurt the subsequent digit recognition subsystem. The DETR results in the last row show the desired output, with the majority of players detected with high confidence scores. In the crowded setting, Faster R-CNN fails to detect relatively small objects and especially misses many players in the crowded region. As the crowded scene is very common and is the focus of our work, we favor the result delivered by DETR, with almost every player fully detected.

4.3.2 Jersey Number Recognition Subsystem

The final breakdown and distribution of the jersey number dataset is shown in Fig. 8a. The horizontal axis shows the individual digits, and the vertical axis shows the number of instances. All columns are colored to show the split between home and away jerseys. The digit 2, with the most occurrences, is twice as frequent as the least common digit, 6, indicating a data imbalance issue that generally exists in jersey number datasets, even for a single-digit dataset. However, this imbalance is not dominant, and we achieve a high mAP of 91.7% in evaluation.

In Fig. 8b, we show a confusion matrix normalized to the most popular class, digit 2. Digits 2, 4, 7, and 9 achieve higher accuracy than digits 6 and 8, which corresponds to the distribution of classes shown in Fig. 8a. Even though the use of focal loss alleviates the data imbalance issue to some extent, we notice that most of the incorrect classifications come from digits that were partially obscured or located on a player who was angled substantially with respect to the camera.

4.3.3 Game Clock Subsystem

Evaluation results are shown in Table 2. The OCR performs well in recognizing the clock information, with an accuracy of 95.6%. Play window detection achieves 100% accuracy on quarter transitions and an accuracy of 62% for start/end time designation.

In full game footage, the reset of the play clock can be used to track the end and subsequent start of each play. Although there are no standards for the end and start of a play in accelerated footage, it is possible to start a new play based on the jump in the play clock.

Figure 9. Part of the output game log, including the player tracking information, play number, quarter, start/end time, and home/away team.

4.3.4 Database Subsystem

We present a screenshot of the data format in the integrated database in Fig. 9. Each play is associated with game information, such as quarter number, start and end time, home and away team, and participating players of the home team. It is worth noting that the duplicated numbers are for offense and defense; this does not happen in the NFL because duplicated numbers are not allowed. A viewer can retrieve information and perform analysis either by play index or by the jersey numbers of players. The integration of player information and clock information can alleviate the impact of players who may be partially occluded and invisible in some frames.

5. Conclusion

In this work, we present a football game analysis system based on deep learning. The proposed system enables player identification and time tracking. We propose a two-stage network design that first highlights the player region and then identifies the jersey number in zoomed-in sub-regions. The experimental results demonstrate robustness in handling crowded scenarios and the capability to alleviate data imbalance issues. The proposed end-to-end system delivers complete tracking information using a single pass of football game footage. Our player detection and jersey number recognition subsystems can be directly generalized to other football game footage. Qualitative and quantitative analyses have been thoroughly performed over the three subsystems separately and over the entire system. The system achieves accurate and efficient performance on the separate tasks, and the experimental results illustrate its reliability in handling the challenges of football game analysis.

For future work, we plan to enhance the player tracking by integrating information across consecutive frames and replenishing players whose jersey numbers are not visible in some frames due to movement. Very small objects could be made detectable by using an ultra-high resolution camera or in combination with a super resolution method. Ball tracking could be fused into the current design to serve as complementary information for game clock tracking, thereby helping with play division. The user interface that outputs the database will be further improved by developing a more robust graphical user interface.

Acknowledgements

The authors would like to thank Nguyen Hung Nguyen, Jeff Reidy, and Alex Ramey for their preliminary study, data annotation, and supportive experiments.
References

[1] Super Bowl ratings history (1967-present), Feb 2022.
[2] Zubaer Ahammed. Basketball player identification by jersey and number recognition. PhD thesis, Brac University, 2018.
[3] Adrià Arbués-Sangüesa, Coloma Ballester, and Gloria Haro. Single-camera basketball tracker through pose and semantic feature fusion. arXiv preprint arXiv:1906.02042, 2019.
[4] Matija Burić, Miran Pobar, and Marina Ivašić-Kos. Object detection in sports videos. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1034–1039. IEEE, 2018.
[5] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
[6] Anthony Cioppa, Adrien Deliege, Maxime Istasse, Christophe De Vleeschouwer, and Marc Van Droogenbroeck. ARTHuS: Adaptive real-time human segmentation in sports through online distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[7] PC Alabama Enthusiast. Alabama @ South Carolina, 2019 (in under 38 minutes), 2019.
[8] Sebastian Gerke, Karsten Muller, and Ralf Schafer. Soccer jersey number recognition using convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 17–24, 2015.
[9] ROLL TIDE Graham. 2015-16 Cotton Bowl - Michigan State vs. Alabama (HD), 2015.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[11] Benjamin Langmann, Seyed E Ghobadi, Klaus Hartmann, and Otmar Loffeld. Multi-modal background subtraction using Gaussian mixture models. In ISPRS Symposium on Photogrammetry Computer Vision and Image Analysis, pages 61–66, 2010.
[12] Joffrey L Leevy, Taghi M Khoshgoftaar, Richard A Bauder, and Naeem Seliya. A survey on addressing high-class imbalance in big data. Journal of Big Data, 5(1):1–30, 2018.
[13] Gen Li, Shikun Xu, Xiang Liu, Lei Li, and Changhu Wang. Jersey number recognition with semi-supervised spatial transformer network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1783–1790, 2018.
[14] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[15] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. CoRR, abs/1708.02002, 2017.
[16] Hengyue Liu and Bir Bhanu. Pose-guided R-CNN for jersey number recognition in sports. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[17] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[18] Wei Liu, Lin Chen, and Yajun Chen. Age classification using convolutional neural networks with the multi-class focal loss. In IOP Conference Series: Materials Science and Engineering, volume 428, page 012043. IOP Publishing, 2018.
[19] Chun-Wei Lu, Chih-Yang Lin, Chao-Yung Hsu, Ming-Fang Weng, Li-Wei Kang, and Hong-Yuan Mark Liao. Identification and tracking of players in sport videos. In Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, pages 113–116, 2013.
[20] Ahmed Nady and Elsayed E Hemayed. Player identification in different sports. In VISIGRAPP (5: VISAPP), pages 653–660, 2021.
[21] Sauradip Nag, Raghavendra Ramachandra, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, and Mohan Kankanhalli. CRNN based jersey-bib number/text recognition in sports and marathon images. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1149–1156. IEEE, 2019.
[22] Chirag Patel, Atul Patel, and Dharmendra Patel. Optical character recognition by open source OCR tool Tesseract: A case study. International Journal of Computer Applications, 55(10):50–56, 2012.
[23] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
[24] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[25] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28:91–99, 2015.
[26] Bama Rewind. 2015 Cotton Bowl, Michigan State vs. Alabama highlights, 2015.
[27] rtsportsuploads. 2019 - LSU Tigers at Alabama Crimson Tide in 40 minutes, 2019.
[28] Matko Šarić, Hrvoje Dujmić, Vladan Papić, and Nikola Rožić. Player number localization and recognition in soccer video using HSV color space and internal contours. In The International Conference on Signal and Image Processing (ICSIP 2008). Citeseer, 2008.
[29] Arda Senocak, Tae-Hyun Oh, Junsik Kim, and In So Kweon. Part-based player identification using deep convolutional representation and multi-scale pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1732–1739, 2018.
[30] Himanshu Singh. Advanced image processing using OpenCV. In Practical Machine Learning and Image Processing, pages 63–88. Springer, 2019.
[31] Shuya Suda, Yasutoshi Makino, and Hiroyuki Shinoda. Prediction of volleyball trajectory using skeletal motions of setter player. In Proceedings of the 10th Augmented Human International Conference, pages 1–8, 2019.
[32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[33] Kanav Vats, Mehrnaz Fani, David A Clausi, and John Zelek. Multi-task learning for jersey number recognition in ice hockey. In Proceedings of the 4th International Workshop on Multimedia Content Analysis in Sports, pages 11–15, 2021.
[34] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019.
[35] Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11):3212–3232, 2019.
