Recognizing Tactic Patterns in Broadcast Basketball Video - 2012 - Journal of Visual Communication and Image Representation
23 (2012) 932–947
Article history: Received 2 February 2012; Accepted 6 June 2012; Available online 15 June 2012.

Keywords: Sports video analysis; Object tracking; Camera calibration; Multimedia system; Tactic analysis; Kalman filter; Pattern recognition; Broadcast basketball video.

Abstract

The explosive growth of sports fandom inspires much research on manifold sports video analyses and applications. The audience, sports fans, and even professionals require more than traditional highlight extraction or semantic summarization; computer-assisted sports tactic analysis is in urgent demand. Recognizing tactic patterns in broadcast basketball video is a challenging task due to its complicated scenes, varied camera motion, frequent occlusions between players, etc. In basketball games, the action "screen" means that an offensive player performs a blocking move by standing beside or behind a defender to free a teammate to shoot, to receive a pass, or to drive in for scoring. In this paper, we propose a screen-strategy recognition system capable of detecting and classifying screen patterns in basketball video. The proposed system automatically detects the court lines for camera calibration, tracks players, and discriminates the offensive/defensive team. Player trajectories are calibrated to the real-world court model for screen pattern recognition. Our experiments on broadcast basketball videos show promising results. Furthermore, the extracted player trajectories and the recognized screen patterns visualized on a court model indeed assist the coach/players or the fans in comprehending the tactics executed in basketball games informatively and efficiently.

© 2012 Elsevier Inc. All rights reserved.
1. Introduction

With the rapid advance in video production technology and consumer demand, the proliferation of digital content necessitates the development of automatic systems and tools for semantic multimedia information analysis, understanding, and retrieval. As important multimedia content, sports video has been attracting considerable research effort due to its commercial benefits, entertainment value, and audience requirements. Generally, the breaks and commercials make broadcast sports videos somewhat tedious. Most audiences would rather retrieve the events, scenes, and players of interest than watch a whole game sequentially. Therefore, a great amount of research focuses on shot classification, highlight extraction, event detection, and semantic annotation. Traditional video content analysis for quick browsing, indexing, and summarization has become a basic requirement. More keenly than ever, sports fans and professionals desire to watch a sports game not only with efficiency but also with variety, profundity, and professionalism. In particular, coaches and players prefer a better understanding of the tactic patterns executed in a game. Hence, research on computer-assisted sports tactic analysis is rising and flourishing.

Research on sports video analysis has poured out over the past decade, mainly focusing on feature extraction, structure analysis, and bridging the semantic gap between low-level features and high-level events. Duan et al. [1] employ a supervised learning scheme to perform top-down shot classification based on mid-level representations, including a motion vector field model, a color tracking model, and a shot pace model. Tien et al. [2] present an automatic system capable of segmenting a basketball video into shots based on scene change detection. Then, shots are classified into three types: close-up view, medium view, and full court view, based on the ratio of dominant color pixels in the frames. The shots of full court view are more informative than the other two and are chosen for content analysis. Hung and Hsieh [3] categorize shots into pitcher-catcher, infield, outfield, and non-field shots. Then, they combine the detected scoreboard information with the obtained shot types as mid-level cues to classify highlights using a Bayesian Belief Network (BBN) structure. Fleischman et al. [4] use complex temporal features, such as object, field type, speech, and camera motion start and end times; temporal data mining techniques are exploited to discover a codebook of frequent temporal patterns for baseball highlight classification. Cheng and Hsu [5] fuse visual motion information with audio features, including zero crossing rate, pitch period, and Mel-frequency cepstral coefficients (MFCC), to extract baseball highlights based on a hidden Markov model (HMM). Assfalg et al. [6] present a system for automatic annotation of highlights in soccer videos. Domain knowledge is encoded into a set of finite state machines, each of which models a specific highlight. The visual cues used for highlight detection are ball motion, playfield zone, players' positions, and colors of uniforms. Chu and Wu [7]

* Corresponding author. Fax: +886 3 573 6491.
E-mail address: huatsung@cs.nctu.edu.tw (H.-T. Chen).

1047-3203/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvcir.2012.06.003
H.-T. Chen et al. / J. Vis. Commun. Image R. 23 (2012) 932–947 933
…illustrates the system flowchart. Broadcast basketball video contains several prototypical shots, including close-up view shots, medium view shots, and court view shots, as exemplified in the top part of Fig. 2. Among the different kinds of shots, court view shots best present the spatial relationships between players and the court. Thus, our analysis is focused on the court view shots. For court view shot retrieval, please refer to [18].

The proposed system starts with camera calibration. To extract court lines, we detect white pixels and then eliminate the white pixels in white or textured regions based on the width test and the line-structure constraint. Court lines are extracted using the Hough transform. The court line intersections are used as corresponding points to compute the camera calibration matrix, which maps real-world coordinates to image points or vice versa. With the court line information obtained, we compute the dominant colors and extract players within the court region. For tactic analysis, we have to classify the players and discriminate the offensive/defensive team. Player trajectories are extracted using a Kalman filter and are mapped to the court model for screen pattern recognition. The visualized presentation of the extracted player trajectories and recognized screen patterns on the court model gives the coach/players or the fans further insight into the game and guides them efficiently toward tactic understanding.

3. Camera calibration

Camera calibration is an essential task that provides the geometric transformation mapping the positions of the players in the video frames to real-world coordinates or vice versa. Since the basketball court can be assumed to be planar, the mapping from a position p = (x, y, 1)^T in the court model coordinate system to the image coordinates p' = (u, v, 1)^T can be described by a plane-to-plane mapping (a homography) p' = Hp, where H is a 3x3 homography transformation matrix [25]. Homogeneous coordinates are scaling invariant, so we can reduce the degrees of freedom of the matrix H to only eight. The transformation p' = Hp can be written in the following form:

\[
\begin{pmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix} u' \\ v' \\ w' \end{pmatrix},
\qquad u = u'/w', \quad v = v'/w'
\qquad (1)
\]

To compute the eight independent parameters, we need at least four pairs of corresponding points, i.e., points whose real-world coordinates and image coordinates are both known. Since line detection is more robust than locating the accurate positions of specific points, we utilize line intersections to establish point correspondence. In the process, we make use of ideas from general camera calibration, such as white line pixel detection and Hough transform-based line extraction [25]. The correspondence between the video frame and the basketball court model is illustrated in Fig. 3, which also shows the court lines to be extracted: L1–L5.

3.1. Court line extraction

In the literature, Farin et al. [25] have proposed an algorithm for extracting court lines and finding line correspondences, which performs well in several kinds of sports videos such as tennis, volleyball, and soccer. The Hough transform is applied to the detected white line pixels. The parameter space (θ, d) is used to represent a line: θ is the angle between the line normal and the horizontal axis, and d is the distance of the line to the origin. An accumulator matrix for all (θ, d) is constructed, sampling at a resolution of one degree for θ and one pixel for d. Since a line in (x, y) space corresponds to a point in (θ, d) space, line candidates can be determined by extracting the local maxima above a threshold r in the accumulator matrix. The line candidates are then classified into horizontal lines and vertical lines. Next, the line candidates are sorted according to their distances to the image boundary for finding line correspondences.

Nevertheless, when applying the method of [25] to basketball videos, we find that the performance is not as good as expected. One major problem is the determination of the threshold value r. Fig. 4 shows sample results of court line extraction with different r values. The original frame and the detected white line pixels are shown in Fig. 4(a) and (b), respectively. Fig. 4(c)–(e) show the extracted court lines with different r values. A small r value leads to many false lines, whereas a large r value yields insufficient court lines to compute the camera calibration parameters. Moreover, the free-throw line (see Fig. 3) is very easily discarded because it is short in frames. To overcome these obstacles, we propose a step-by-step method to find line correspondences in basketball video. We search for each specific line within a specific range in order to
Fig. 2. Flowchart of the proposed system for screen pattern recognition in broadcast basketball video.
Fig. 3. Correspondence between (a) video frame and (b) basketball court model.
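The calibration pipeline around Fig. 3 boils down to: collect court-model/image point pairs at line intersections, fit the eight parameters of H by least squares (Eqs. (2)–(5)), and map points through Eq. (1). A self-contained NumPy sketch of a generic DLT fit follows; this is our illustration, not the authors' code, and the correspondences below are synthetic rather than real court data:

```python
import numpy as np

def fit_homography(model_pts, image_pts):
    """Least-squares fit of H from n >= 4 correspondences (the DLT of Eqs. (4)-(5))."""
    A, C = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])  # row from Eq. (2)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])  # row from Eq. (3)
        C.extend([u, v])
    B, *_ = np.linalg.lstsq(np.array(A, float), np.array(C, float), rcond=None)
    return np.append(B, 1.0).reshape(3, 3)            # h22 is fixed to 1

def project(H, x, y):
    """Eq. (1): map a court-model point to image coordinates, dividing out w'."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Six synthetic correspondences, mimicking the six court-line intersections (n = 6).
H_true = np.array([[1.2, 0.1, 5.0], [0.05, 0.9, -3.0], [0.001, 0.002, 1.0]])
model = [(0, 0), (14, 0), (0, 15), (14, 15), (5.8, 0), (5.8, 15)]
image = [project(H_true, x, y) for (x, y) in model]
H_est = fit_homography(model, image)                  # recovers H_true
court_pt = np.linalg.inv(H_est) @ np.array([*image[0], 1.0])  # Eq. (6): back to the court model
```

With exact correspondences the over-determined system is solved exactly, so the fitted matrix reproduces the ground-truth homography up to numerical precision.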
Fig. 4. Sample results of court line extraction. (a) Original frame. (b) Detected white line pixels. (c)-(e) Extracted court lines with different threshold r.
gather all the necessary lines L1–L5 (see Fig. 3). The detailed process is described in the following, and Fig. 5 presents the illustration.

(1) Sideline L1 and baseline L2: As shown in Fig. 5(a), the court region is mainly determined by the sideline L1 and baseline L2, which are the longest (near-)horizontal and (near-)vertical lines in the frame, respectively. (We say "near" because the baseline and sideline are not exactly vertical and horizontal in the frame.) Thus, we extract L1 by searching the local maximum in the range |θ − π/2| ≤ ρ of the accumulator matrix produced by the Hough transform (θ = π/2 represents a horizontal line) and extract L2 by searching the local maximum in the range |θ − π/2| > ρ, where ρ is a statistically determined threshold. To prevent extracting a sideline close to the frame bottom, we add the constraint that the parameter d of the sideline L1 should be greater than a quarter of the frame height, as shown in Fig. 6. Furthermore, the slope of the detected baseline also helps to identify whether the frame presents the left court or the right.

(2) Lines L3 and L4: With the baseline and sideline extracted, we can determine the court region and filter out the white line pixels outside the court region, as shown in Fig. 5(b) and (c). We apply the Hough transform to the remaining white line pixels and extract the longest two (near-)horizontal lines as L3 and L4, as shown in Fig. 5(d). The top edge L3 and bottom edge L4 can be distinguished by their slopes.

(3) Free-throw line L5: Again, we filter out the white line pixels outside the region bounded by L3 and L4, as shown in Fig. 5(e), and apply the Hough transform to the remaining white line pixels. Because the main camera is located at the center of the court, the angle between the free-throw line L5 and the horizontal axis in the frame is greater than the angle between the baseline L2 and the horizontal axis no matter which side of the court is on screen, as shown in Fig. 3(a). Hence, we extract the free-throw line L5 by searching the local maximum in the ranges 0 < θ < θb and θb < θ < π/2 for left-court frames and right-court frames, respectively, where θb is the angle between the baseline normal and the horizontal axis. Fig. 5(f) shows the detected free-throw line L5.

3.2. Computation of camera calibration parameters

With the court lines L1–L5 extracted, we are now ready to compute the camera calibration parameters. Multiplying out the linear system in Eq. (1), we obtain two equations, Eqs. (2) and (3), for each corresponding point, i.e., each point whose real-world coordinate (x, y) and image coordinate (u, v) are both known.

h00·x + h01·y + h02 = u(h20·x + h21·y + 1)    (2)

h10·x + h11·y + h12 = v(h20·x + h21·y + 1)    (3)
Fig. 5. Processing steps of extracting the court lines L1–L5. (a) Detected sideline L1 and baseline L2. (b) Court region. (c) White line pixels within the court region. (d) Detected L3 and L4. (e) White line pixels within the region bounded by L3 and L4. (f) Detected free-throw line L5.
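Each of steps (1)–(3) above searches for a local maximum of the Hough accumulator within a restricted θ range. A sketch of that range-restricted peak search (the accumulator layout and names here are our assumptions, not the paper's implementation):

```python
import numpy as np

def peak_in_theta_range(acc, thetas, theta_lo, theta_hi):
    """Return (theta, d_index) of the strongest accumulator cell with
    theta_lo <= theta < theta_hi.

    acc: 2-D Hough accumulator, rows indexed by angle theta, columns by distance d.
    thetas: angle (radians) of each accumulator row.
    """
    mask = (thetas >= theta_lo) & (thetas < theta_hi)
    sub = np.where(mask[:, None], acc, -1.0)      # hide rows outside the range
    ti, di = np.unravel_index(np.argmax(sub), sub.shape)
    return thetas[ti], di

# Toy accumulator: 180 angle bins (one degree each), 100 distance bins.
thetas = np.deg2rad(np.arange(180))
acc = np.zeros((180, 100))
acc[92, 40] = 50.0    # a near-horizontal line (theta close to pi/2)
acc[5, 70] = 80.0     # a near-vertical line

rho = np.deg2rad(10)
# Sideline L1: local maximum with |theta - pi/2| <= rho.
t1, d1 = peak_in_theta_range(acc, thetas, np.pi / 2 - rho, np.pi / 2 + rho)
# Baseline L2: strongest peak outside that band (here, the low-theta side).
t2, d2 = peak_in_theta_range(acc, thetas, 0.0, np.pi / 2 - rho)
```

Restricting the search band is what lets the short free-throw line survive: it only has to win within its own θ range rather than against every long line in the frame.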
From Eqs. (2) and (3), we set up a linear system AB = C as Eq. (4), where n is the number of corresponding points.

\[
\begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -v_1 x_1 & -v_1 y_1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -u_2 x_2 & -u_2 y_2 \\
0 & 0 & 0 & x_2 & y_2 & 1 & -v_2 x_2 & -v_2 y_2 \\
\vdots & & & & & & & \\
x_n & y_n & 1 & 0 & 0 & 0 & -u_n x_n & -u_n y_n \\
0 & 0 & 0 & x_n & y_n & 1 & -v_n x_n & -v_n y_n
\end{pmatrix}
\begin{pmatrix} h_{00} \\ h_{01} \\ h_{02} \\ h_{10} \\ h_{11} \\ h_{12} \\ h_{20} \\ h_{21} \end{pmatrix}
=
\begin{pmatrix} u_1 \\ v_1 \\ u_2 \\ v_2 \\ \vdots \\ u_n \\ v_n \end{pmatrix}
\qquad (4)
\]

In our system, the extracted court lines L1–L5 form six intersection points, namely n = 6. To solve B, we over-determine A and find a least-squares fit for B with a pseudo-inverse solution:

AB = C,  A^T·A·B = A^T·C,  B = (A^T·A)^(-1)·A^T·C    (5)

Thus, we derive the camera calibration parameters to form the matrix transforming real-world coordinates to image coordinates. Since the homography matrix H performs a plane-to-plane mapping, the image position p' can also be transformed to a position p on the real-world court model by

p = H^(-1)·p'    (6)

4. Player extraction and tracking

Player tracking in broadcast basketball video is an intrinsically challenging task due to frequent occlusions of the players, especially players of the same team wearing uniforms of the same color. In addition to the playing field, a court view frame also contains some areas of the audience or stadium, as shown in Fig. 3(a). Objects and noise in the audience region will cause false alarms and degrade the performance of player segmentation and tracking. To improve tracking accuracy and computational efficiency, we utilize the extracted baseline and sideline to determine the court region, as shown in Fig. 5(b), and direct the following processes at the court region, as illustrated in Fig. 7.

4.1. Foreground object segmentation

As shown in Fig. 3, the court region contains two parts: (1) the restricted area and (2) the playing field excluding the restricted area (the unrestricted area, say). The color of the restricted area is almost uniform, as is also the case for the unrestricted area. We can construct the background model by detecting the dominant colors of the restricted area and the unrestricted area individually on the basis of a Gaussian mixture model (GMM) [26,27]. Then, region growing is applied to construct the dominant color map, as shown in Fig. 7(b), where the non-dominant color pixels in black are deemed foreground pixels. The background subtraction method is used to segment the foreground objects, consisting of the difference between the video frames and the background
model, and morphological operations are performed on the foreground pixels to remove small objects, noise, and gaps, as shown in Fig. 7(c).

4.2. Player clustering and offensive/defensive team discrimination

The foreground objects contain the players of the two teams, the referees, and some noise. Yet only the player regions are contributive to tactic analysis. Based on color information, we attempt to use k-means clustering to separate the foreground regions into two clusters, each of which represents the uniform color of a team. However, we cannot have just two clusters (k = 2) due to the non-player regions, such as referees or other noise. To determine the cluster number, we experiment with different numbers of clusters, as shown in Fig. 8, where the x-axis indicates the cluster number k and the y-axis gives the clustering error. Generally, the more clusters there are, the smaller the total distance between all data points and their corresponding cluster centroids, which is regarded as the clustering error. Based on the experiment, we choose k = 6, since the clustering error almost converges when there are more than six clusters, and we select the YCbCr color space in our system since YCbCr leads to smaller clustering error. Thus, we separate the foreground regions into six clusters and take the largest two clusters as the uniform colors of the two teams. Fig. 7(d) shows the clustered players of the two teams.

Fig. 8. Experimental data with different color spaces and numbers of clusters.

We then discriminate the offensive/defensive team to realize which team possesses the ball. In a basketball game, the players on defense try their best to prevent the offensive players from putting the ball into the basket, so each defender is expected to stand closer to the basket than the offender he is guarding, which helps to identify the team on offense. As shown in Fig. 9, we map the in-frame positions of the players to the real-world court model by Eq. (6). For each team, we compute the average distance from the basket to the players. The team whose players are on average farther away from the basket is recognized as the offensive team.

4.3. Kalman filter-based player tracking

Being widely used in moving object tracking [14,15], the Kalman filter provides optimal estimates of the system parameters, such as position and velocity, given measurements and knowledge of a system's behavior [28]. In this work, we apply a Kalman filter-based approach to track players. In general, the Kalman filter describes a system as:

x_n = A_n·x_{n-1} + w_n    (7)

z_n = H_n·x_n + v_n    (8)

where x_n is the state vector (representing the estimated player position at the nth frame), A_n is the system evolution matrix, and w_n is the system noise vector. z_n is the vector of measurements (positions of player candidates), H_n is the unit matrix, and v_n is the measurement noise vector.

Initially, we create a new tracker for each player candidate. Then, the Kalman filter algorithm is used to predict each player's position in the next frame. The player candidate closest to the predicted position is regarded as the measurement, and the parameters of the Kalman filter are updated. If there is no candidate close to the predicted position, we regard the predicted state as the measurement directly. Besides, since all players are expected to stay in the court, once a tracker state is out of the court, we mark it as missing. Every time a tracker misses, we double the search range, since the object may be occluded and the tracker misses
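A minimal tracker in the spirit of Eqs. (7) and (8) can be sketched as follows. Note the hedge: the paper's H_n is the identity on a position-only state, whereas this sketch uses the common constant-velocity variant with state [px, py, vx, vy]; the noise settings are illustrative values of ours, not the paper's.

```python
import numpy as np

class KalmanTracker2D:
    """Constant-velocity Kalman filter for one player: state = [px, py, vx, vy]."""

    def __init__(self, px, py, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([px, py, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                              # state covariance
        self.A = np.eye(4)
        self.A[0, 2] = self.A[1, 3] = dt                       # Eq. (7): x_n = A x_{n-1} + w_n
        self.H = np.eye(2, 4)                                  # Eq. (8): z_n = H x_n + v_n
        self.Q = np.eye(4) * q                                 # system noise covariance
        self.R = np.eye(2) * r                                 # measurement noise covariance

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]                                      # predicted position

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)               # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Track a player moving one court unit per frame along x.
trk = KalmanTracker2D(0.0, 0.0)
for n in range(1, 30):
    trk.predict()          # predicted position would drive the candidate search
    trk.update((float(n), 0.0))
```

After a few frames of consistent measurements, the estimated position follows the player and the velocity component converges toward the true motion, which is what makes the predicted position a usable search center when a candidate is briefly missing.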
Fig. 9. Offensive/defensive team discrimination. (a) Original frame. (b) Player positions mapped to the court model and the discrimination result.
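The team-discrimination rule behind Fig. 9 (compare each team's average distance to the basket on the court model) can be sketched as below; function and variable names, and the court positions, are ours for illustration:

```python
import numpy as np

def offensive_team(team1_pts, team2_pts, basket):
    """Return 1 or 2: the team whose players are on average farther from the
    basket is taken as the team on offense."""
    basket = np.asarray(basket, float)
    d1 = np.mean([np.linalg.norm(np.asarray(p, float) - basket) for p in team1_pts])
    d2 = np.mean([np.linalg.norm(np.asarray(p, float) - basket) for p in team2_pts])
    return 1 if d1 > d2 else 2

# Court-model positions (illustrative): defenders packed near the basket.
basket = (1.6, 7.5)
attackers = [(8.0, 3.0), (7.5, 12.0), (6.0, 7.5), (9.0, 9.0), (5.0, 2.0)]
defenders = [(3.0, 5.0), (3.5, 10.0), (2.5, 7.5), (4.0, 8.5), (2.0, 4.0)]
print(offensive_team(attackers, defenders, basket))  # → 1
```

Averaging over the whole team makes the decision robust to a single defender who momentarily steps out toward a ball handler.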
Fig. 10. Front-screen. (a) Player trajectories in the screen strategy. (b) Before screen. (c) On screen. (d) After screen.
Fig. 11. Back-screen. (a) Player trajectories in the screen strategy. (b) Before screen. (c) On screen. (d) After screen.
Fig. 12. Down-screen. (a) Player trajectories in the screen strategy. (b) Before screen. (c) On screen. (d) After screen.
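Figs. 10–12 illustrate the three screen types that the system later distinguishes from trajectories: a screener moving toward the baseline signals a down-screen, and otherwise the angle between the screenee's motion and the basket direction separates back-screens from front-screens (Eq. (9)). A hedged sketch of that decision rule, with our own function names and an illustrative angle threshold:

```python
import numpy as np

def classify_screen(p_init, p_screen, p_last_e, p_screen_e, p_basket, theta_s=np.pi / 4):
    """Classify a screen as 'down', 'back', or 'front' in the spirit of Eq. (9).

    p_init, p_screen: screener positions (start, at screen time) on the court model.
    p_last_e, p_screen_e: screenee positions (end, at screen time).
    p_basket: basket position; theta_s: angle threshold (assumed value).
    """
    p_init, p_screen = np.asarray(p_init, float), np.asarray(p_screen, float)
    p_last_e, p_screen_e = np.asarray(p_last_e, float), np.asarray(p_screen_e, float)
    p_basket = np.asarray(p_basket, float)

    # Screener moved toward the baseline/basket: down-screen.
    if np.linalg.norm(p_basket - p_init) > np.linalg.norm(p_basket - p_screen):
        return "down"

    d_moving = p_last_e - p_screen_e            # screenee's moving direction
    d_basket = p_basket - p_screen_e            # screenee-to-basket direction
    cosang = d_moving @ d_basket / (np.linalg.norm(d_moving) * np.linalg.norm(d_basket))
    angle = np.arccos(np.clip(cosang, -1.0, 1.0))
    return "back" if angle < theta_s else "front"

# Screener drifts slightly away from the basket; screenee cuts straight toward it.
print(classify_screen((4, 6), (5, 7), (2, 8), (8, 8), (1.5, 7.5)))  # → back
```

The `np.clip` guards the arccos against floating-point values marginally outside [-1, 1], which would otherwise raise a domain warning on nearly collinear directions.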
tances between offensive players. Besides, since the screen is set to block the defender(s), there must be at least one defensive player between the screener and his teammate. Thus, the screener can also be recognized. Generally, the defender stays close to his target but not side by side, since the defender has to prevent his target from driving to the basket. On the other hand, the screener must be in contact with the defender in order to block the defender effectively. Accordingly, if we find two offensive players close to each other with at least one defensive player between them, we can infer that there is a screen and that the screener is the offensive player closer to the defensive player. Algorithm 2 depicts the proposed screen detection method. In the following, we use "screenee" to mean the offensive player whom the screener sets the screen for.

Algorithm 2: Screen Detection

Input: players_off and players_def   // offensive and defensive players
Output: screener and screenee   // the screener sets the screen for the screenee

screener := nil
screenee := nil
if ∃ player_i ∈ players_off, player_j ∈ players_off:
      d_s < distance(player_i, player_j) < D_s and i ≠ j then
    if ∃ defender ∈ players_def: distance(defender, player_i) < d_s or
          distance(defender, player_j) < d_s then
        if distance(defender, player_i) < distance(defender, player_j) then
            screener := player_i
            screenee := player_j
        else
            screener := player_j
            screenee := player_i
        end if
    end if
end if
return screener, screenee

5.2. Screen classification

The foregoing introduction describes that each screen type follows a specific pattern. The front-screen is usually set around the three-point line, and the screenee moves to an open space instead of driving to the basket. A screener sets a back-screen by moving from the low post to the high post, and the screenee drives to the basket after the screen. The down-screen is set by a screener moving from the high post to the low post. Therefore, we have to utilize the trajectories of the screener and the screenee to recognize the screen pattern. Here we denote the initial position of the screener as p_init, the last position of the screener as p_last, and the position of the screener when the screen is set as p_screen; the corresponding positions of the screenee are p'_init, p'_last, and p'_screen. Also, the position of the basket is represented by p_basket. Take a look again at Figs. 10 and 11. The down-screen is the only type in which the screener moves down to the low post. Hence, we first check the screener trajectory and confirm whether the screener moves to the baseline by comparing |p_basket − p_init| and |p_basket − p_screen|. If the screener moves to the baseline, our proposed system recognizes it as a down-screen, as formulated in Eq. (9). The other two screen patterns (front-screen and back-screen) are hard to distinguish by the screener trajectory due to the similar movement of the screeners. Once we find that the screener does not move to the baseline (not a down-screen), we analyze the screenee trajectory. In a back-screen the screenee tends to move to the basket, while in a front-screen the screenee is likely to move around outside the restricted area. Accordingly, we discriminate back-screen and front-screen by the angle between the screenee's moving direction (d_moving = p'_last − p'_screen) and the direction from the screenee to the basket (d_basket = p_basket − p'_screen). If the angle between the two directions is small, that is, the screenee tends to move to the basket, our proposed system recognizes it as a back-screen; otherwise, it is recognized as a front-screen.

\[
\text{screenType} =
\begin{cases}
\text{Down}, & \text{if } |p_{basket} - p_{init}| > |p_{basket} - p_{screen}| \\
\text{Back}, & \text{if } |p_{basket} - p_{init}| \le |p_{basket} - p_{screen}| \text{ and } \cos^{-1}\!\left(\dfrac{d_{moving} \cdot d_{basket}}{|d_{moving}|\,|d_{basket}|}\right) < \theta_s \\
\text{Front}, & \text{if } |p_{basket} - p_{init}| \le |p_{basket} - p_{screen}| \text{ and } \cos^{-1}\!\left(\dfrac{d_{moving} \cdot d_{basket}}{|d_{moving}|\,|d_{basket}|}\right) \ge \theta_s \\
\text{undefined}, & \text{else}
\end{cases}
\qquad (9)
\]

6. Experimental results and discussion

To evaluate the effectiveness of our proposed framework, we conduct experiments on video data from the Beijing 2008 Olympic Games. The testing videos include four men's basketball matches: USA vs. AUS, ARG vs. USA, USA vs. CHN, and ESP vs. USA, recorded from live broadcast television programs with a frame resolution of 640x352 (29.97 fps). We manually select 20 clips with obvious tactic execution from each of the four matches. In total, we have 80 video clips (40 for training and the other 40 for testing).

6.1. Performance of court line detection and camera calibration

In Section 3, we first detect white pixels using color information. To improve the accuracy and efficiency of the subsequent line detection and camera calibration processes, we eliminate the white pixels in white regions or in textured regions based on the width test and the line-structure constraint. Fig. 13 demonstrates sample results of white line pixel detection. The original frames are presented in Fig. 13(a). In Fig. 13(b), although most of the white pixels in white regions such as white uniforms are discarded, falsely detected white line pixels still occur in textured areas. With the line-structure constraint, Fig. 13(c) shows that the number of false detections is reduced; white line pixel candidates are retained only if the pixel neighborhood shows a linear structure. The discarded pixels mostly come from spectators, the scoreboard overlay, the channel mark, and advertisement logos. Table 1 presents the numeric statistics of white pixel numbers with/without the line-structure constraint. It is obvious that the line-structure constraint effectively filters out a high percentage of the non-line white pixels.

In Section 3.1, we mentioned that Farin's court line extraction/camera calibration method [25] cannot achieve good performance on basketball videos, although it works well on tennis, volleyball, and soccer videos. The major reason is that the free-throw line is much shorter than the other court lines, and Farin's method cannot effectively extract the free-throw line as well as the other court lines for finding line correspondences. Hence, we propose a step-by-step approach to find the court lines for camera calibration. The camera calibration results are inspected manually: a frame is regarded as "well calibrated" when all five court lines (L1–L5 in Fig. 3) are extracted correctly. The accuracy of camera calibration is then defined by

Accuracy = (#well-calibrated frames) / (#total frames)    (10)
Fig. 13. Results of white pixel detection. (a) Original frame. (b) Without line-structure constraint. (c) With line-structure constraint.
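The exact form of the line-structure constraint is not spelled out here; one plausible reading (ours, not necessarily the authors' test) keeps a white pixel only when its local white neighborhood is elongated, as it would be on a thin line. A sketch, with the window size and eigenvalue-ratio threshold as assumed parameters:

```python
import numpy as np

def line_structure_mask(white, win=2, ratio=5.0):
    """Keep white pixels whose local white neighborhood is elongated (line-like).

    white: 2-D boolean mask of detected white pixels.
    win: half-window size; ratio: required eigenvalue ratio (assumed values).
    """
    out = np.zeros_like(white)
    ys, xs = np.nonzero(white)
    for y, x in zip(ys, xs):
        ny, nx = np.nonzero(white[max(0, y - win):y + win + 1,
                                  max(0, x - win):x + win + 1])
        if len(ny) < 3:
            continue
        pts = np.stack([ny, nx], axis=1).astype(float)
        ev = np.sort(np.linalg.eigvalsh(np.cov(pts.T)))
        # Elongated (line-like) if the large eigenvalue dominates the small one.
        if ev[1] > ratio * max(ev[0], 1e-9):
            out[y, x] = True
    return out

# A thin horizontal line passes the test; a filled 5x5 blob (texture) does not.
img = np.zeros((20, 20), bool)
img[10, 2:18] = True          # line pixels
img[2:7, 2:7] = True          # textured/blob region
kept = line_structure_mask(img)
```

On the toy mask, line pixels have near-zero variance across the line and large variance along it, so the eigenvalue ratio is high; blob pixels see roughly isotropic neighborhoods and are discarded, matching the behavior described for Fig. 13(c).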
Table 1
Statistics of white line pixel detection with/without the line-structure constraint. (N1: average number of white line pixels without the line-structure constraint per frame; N2: average number of white line pixels with the line-structure constraint; Pd: percentage of discarded non-line white pixels.)

Matches       N1         N2        Pd (%)
USA vs. AUS   10932.9    7200.1    34.14
ARG vs. USA   8516.0     4651.0    45.39
USA vs. CHN   9061.2     5379.0    40.64
ESP vs. USA   11509.7    6676.4    41.99

Table 2
Performance of the proposed court line extraction/camera calibration method.

Matches       # Frames   # Well calibrated frames   Accuracy (%)
USA vs. AUS   1445       1358                       93.98
ARG vs. USA   1242       1087                       87.52
USA vs. CHN   1359       1201                       88.37
ESP vs. USA   1368       1306                       95.47
Total         5414       4952                       91.47

The results in Table 2 show that up to 91.47% of frames can be well calibrated by our proposed approach, which facilitates the subsequent analysis. Example calibration results are demonstrated in Fig. 14, where black lines are court lines extracted by our proposed method, and red lines give the real court model projected onto image coordinates. Since the camera motion in a shot is continuous, the corresponding points (the court line intersections) should not move dramatically in successive frames. Hence, though court lines may be occluded by players in some frames, the incorrect coordinates of the corresponding points can be recovered by interpolation.

Farin's court line extraction/camera calibration method [25] is implemented for comparison. The threshold of the Hough transform is set to r = 25 experimentally. A larger threshold will lead to misdetection of the free-throw line in most frames, resulting in lower accuracy, while a smaller threshold will cause an explosive increase of false lines, requiring much higher computational cost. The results are presented in Table 3, where NL gives the average number of extracted lines per frame. Since the number of court lines is fixed, a larger NL value means more false lines are extracted. On average, the accuracy of Farin's method is 77.32% with about 22 lines extracted per frame, compared to the accuracy of 91.47% for our proposed approach. The main reason causing errors is the misdetection of the free-throw line.
Fig. 15. Results of player extraction and clustering. (a) Original frame. (b) Dominant color map. (c) Player mask of team 1. (d) Player mask of team 2.
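The team masks in Fig. 15 come from the clustering procedure of Section 4.2: k-means with k = 6 on foreground pixel colors in YCbCr, keeping the two largest clusters as the team uniform colors. A compact sketch under those settings, using a plain Lloyd iteration on synthetic pixel data (not the authors' implementation):

```python
import numpy as np

def team_colors(fg_pixels, k=6, iters=50, seed=0):
    """Cluster foreground pixel colors (e.g. YCbCr triples) with k-means and
    return the centroids of the two largest clusters as team uniform colors."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(fg_pixels, float)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((pts[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                 # skip empty clusters
                centers[j] = pts[labels == j].mean(axis=0)
    sizes = np.bincount(labels, minlength=k)
    top2 = np.argsort(sizes)[-2:]                   # two largest clusters
    return centers[top2[0]], centers[top2[1]]

# Synthetic foreground: two large "uniform" color blobs plus diffuse noise
# (referees, artifacts). Channel values are illustrative YCbCr-like triples.
rng = np.random.default_rng(1)
team_a = rng.normal((200, 120, 120), 3, (300, 3))   # bright uniform
team_b = rng.normal((60, 140, 110), 3, (250, 3))    # dark uniform
noise = rng.normal((128, 128, 128), 30, (60, 3))
c1, c2 = team_colors(np.vstack([team_a, team_b, noise]))
```

Taking the two largest of six clusters is what absorbs referees and residual noise into the smaller clusters, so the returned centroids sit near the true uniform colors even when a tight blob is split across two centers.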
In spite of some errors in player tracking, the proposed system can still correctly recognize the screen patterns in 35 testing clips; only five clips are misrecognized. Example results of correctly recognized screen patterns are demonstrated in Fig. 17. Take Fig. 17(a) for explanation. The screener, labeled with a green ellipse under his feet, moves up from near the free-throw line to the top of the three-point line and sets the screen, and his offensive teammate (labeled with a yellow ellipse) drives past the defender to the upper region. Thus, our algorithm regards it as a "front-screen." Fig. 18 shows three failed cases. In Fig. 18(a), the screener moves from near the top sideline to the top of the three-point line and sets the screen, and the screenee moves from near the bottom sideline to the top sideline. It should be a "front-screen," but the screener is closer to the basket when he sets the screen than when he starts to move. As a result, our algorithm regards it as a "down-screen." In Fig. 18(b), the screener moves from the free-throw line to the low post, and the screenee tries to move to the three-point line. It is a special case, since the screener is setting the screen on the
Fig. 17. Results of player tracking and screen pattern recognition. (a and b) Front screen. (c and d) Back screen. (e and f) Down screen.
Table 4
Precision and recall (%) of player tracking results for each testing clip.

USA vs. AUS                ARG vs. USA                USA vs. CHN                ESP vs. USA
Clip  Precision  Recall    Clip  Precision  Recall    Clip  Precision  Recall    Clip  Precision  Recall
1–1   89.27      88.78     2–1   89.68      89.30     3–1   94.26      94.46     4–1   95.65      75.86
1–2   76.39      79.94     2–2   87.26      86.94     3–2   93.28      94.43     4–2   88.33      72.60
1–3   80.46      82.97     2–3   92.83      93.60     3–3   91.40      90.56     4–3   89.83      81.54
1–4   81.53      82.08     2–4   86.80      88.80     3–4   92.01      92.84     4–4   91.94      91.94
1–5   88.16      89.51     2–5   88.64      88.18     3–5   99.19      97.27     4–5   90.91      76.92
1–6   84.83      85.36     2–6   88.15      86.65     3–6   92.56      94.02     4–6   87.67      79.01
1–7   82.91      84.29     2–7   92.70      94.03     3–7   92.50      93.98     4–7   93.94      83.78
1–8   86.02      85.18     2–8   90.36      91.49     3–8   93.24      94.17     4–8   93.62      83.02
1–9   84.74      85.21     2–9   86.88      87.53     3–9   92.34      92.79     4–9   93.75      93.75
1–10  89.18      90.34     2–10  93.48      92.27     3–10  97.02      96.32     4–10  94.74      78.26
All   84.74      85.77     All   89.63      89.72     All   94.07      94.32     All   91.67      81.77
moving path of the screenee instead of standing next to the defen- proposed system with Hu’s work [23], wherein player trajectories
der, so that the screener is farther away from the defender than the are extracted by a CamShift-based tracking method and mapped to
screenee when the screen is detected, and our algorithm fails to the real world court model for professional-oriented applications,
recognize the screener correctly. In Fig. 18(c), the screen is set near including wide-open event detection, trajectory-based target clips
the top edge of the restricted area, and the screenee moves from retrieval, and tactic inference.
outside the three-point line the free-throw line. Although the We apply Hu’s system [23] to our testing data, and Table 5 pre-
screenee does not drive to the basket and it should be a ‘‘front- sents the player tracking results.1 Fig. 19 illustrates the comparison
screen,’’ the angle between his moving direction and the basket on player tracking in terms of precision and recall. Compared with
direction is small so that our algorithm mistakes the screen type the average precision and recall rates (89.71% and 89.20%, respec-
as a ‘‘back-screen.’’ Overall, the proposed system achieves a high tively) of our player tracking method, Hu’s system achieve a similar
accuracy of 87.5% (35/40) in screen pattern recognition. precision rate (85.71% on average) but a lower recall rate (68.35%
averagely). One major factor causing the player tracking perfor-
6.4. Comparison and discussion mance not to be as good as in [23] is that our testing data contain
frequent occlusion of players of the same team since we focus on
In the literature, many approaches of content analysis in basket- screen pattern recognition. Thus, players are easily missed due to
ball games have been developed. The player tracking scheme with
applications to tactic analysis in [23] better meets the basketball 1
Many thanks to Dr. Hu Min-Chun, who is the corresponding author of [23], for
professionals’ requirements. Thus, we compare and discuss our providing us with their system for performance comparison.
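The failure cases above all trace back to the geometric rules that label a detected screen from the calibrated trajectories. The following is only a rough sketch of such rules, not the exact decision logic of the proposed system: the basket position, the angle threshold, and all function names are assumptions made for illustration. It types a screen by whether the screener ends up closer to the basket and by the angle between the screenee's cut and the basket direction.

```python
import math

BASKET = (5.8, 7.5)  # hypothetical basket position in court-model coordinates (m)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle_to_basket(start, end):
    """Angle (degrees) between a player's moving direction and the direction
    from his starting point toward the basket. Assumes the player moves."""
    move = (end[0] - start[0], end[1] - start[1])
    to_basket = (BASKET[0] - start[0], BASKET[1] - start[1])
    dot = move[0] * to_basket[0] + move[1] * to_basket[1]
    norm = math.hypot(*move) * math.hypot(*to_basket)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def classify_screen(screener_start, screener_end,
                    screenee_start, screenee_end, angle_thresh=30.0):
    """Toy decision rule, NOT the authors' exact algorithm:
    - screener moves closer to the basket          -> down screen
    - screenee cuts roughly toward the basket      -> back screen
    - otherwise                                    -> front screen
    """
    if dist(screener_end, BASKET) < dist(screener_start, BASKET):
        return "down screen"
    if angle_to_basket(screenee_start, screenee_end) < angle_thresh:
        return "back screen"
    return "front screen"
```

Under rules of this shape, the case of Fig. 18(a) is mislabeled precisely because the screener happens to end closer to the basket than where he started, which the first test reads as a down screen.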
Fig. 18. Failed cases of screen pattern recognition. (a) Front screen is misrecognized as down screen. (b) Down screen is misrecognized as back screen. (c) Front screen is
misrecognized as back screen.
Table 5
Player tracking results of [23].
USA vs. AUS ARG vs. USA USA vs. CHN ESP vs. USA
Clip Precision Recall Clip Precision Recall Clip Precision Recall Clip Precision Recall
1–1 90.41 66.73 2–1 85.26 51.92 3–1 83.48 52.46 4–1 89.47 72.34
1–2 92.03 74.26 2–2 76.92 52.33 3–2 82.43 64.44 4–2 94.74 73.97
1–3 91.46 75.76 2–3 91.45 51.94 3–3 85.20 70.46 4–3 96.12 80.00
1–4 88.15 75.50 2–4 74.01 76.71 3–4 95.59 63.93 4–4 91.37 80.38
1–5 91.82 77.68 2–5 82.64 59.52 3–5 78.50 70.34 4–5 87.10 58.70
1–6 80.99 69.28 2–6 83.70 53.47 3–6 88.03 55.38 4–6 84.87 80.63
1–7 79.56 80.00 2–7 78.45 59.87 3–7 95.63 63.22 4–7 88.03 67.76
1–8 85.71 73.33 2–8 83.56 65.56 3–8 98.26 64.20 4–8 92.74 72.78
1–9 86.16 72.49 2–9 72.47 68.32 3–9 94.83 54.19 4–9 83.15 74.50
1–10 79.63 76.44 2–10 83.33 59.52 3–10 76.67 79.58 4–10 74.06 76.29
All 88.11 72.55 All 80.17 60.41 All 85.96 64.97 All 86.56 74.11
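For reference, the per-clip and overall ("All") rates in Tables 4 and 5 follow the standard definitions of precision and recall over matched player detections. A minimal sketch, using made-up (TP, FP, FN) counts rather than the paper's data, shows how an overall row can pool raw counts across clips instead of averaging the per-clip percentages:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN), as percentages."""
    return 100.0 * tp / (tp + fp), 100.0 * tp / (tp + fn)

def aggregate(clips):
    """Pool raw counts over all clips before dividing; averaging the
    per-clip percentages instead would weight short clips equally."""
    tp = sum(c[0] for c in clips)
    fp = sum(c[1] for c in clips)
    fn = sum(c[2] for c in clips)
    return precision_recall(tp, fp, fn)

# Clips as (TP, FP, FN) triples -- hypothetical numbers for illustration.
clips = [(178, 22, 24), (250, 20, 26)]
overall_precision, overall_recall = aggregate(clips)
```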
Fig. 19. Comparison of player tracking performance between Hu's method [23] and ours.
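The "error merge" problem discussed in this comparison, where trackers coalesce onto a single player during same-team occlusions, can at least be flagged by a proximity test between tracker states on the court model. The following is a toy sketch under assumed names and a hypothetical separation threshold, not part of either compared system:

```python
def find_merged_trackers(positions, min_sep=0.5):
    """Return index pairs of trackers closer than min_sep (court-model
    metres) -- a crude signal that two trackers may have coalesced onto
    the same player during an occlusion.

    positions: dict mapping tracker id -> (x, y) court coordinates.
    """
    merged = []
    ids = sorted(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            ax, ay = positions[a]
            bx, by = positions[b]
            if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < min_sep:
                merged.append((a, b))
    return merged
```

A flagged pair could then trigger re-detection or re-initialization of one of the trackers once the occlusion ends.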
the well-known "error merge" problem, in which trackers lose their associated objects and falsely coalesce with other objects when two or more players of the same team occlude each other. However, this comparison cannot assertively conclude which method outperforms the other; rather, it shows that each method has its advantages in different cases.
As for tactic analysis, three applications are introduced in [23]: (1) Wide-open detection: the relative/absolute player positions in the court model coordinates are used to detect wide-open events, where "wide-open" means that some offensive player is not well defended by his/her opponents. (2) Trajectory-based video retrieval: the distance between a query trajectory (manually drawn) and each trajectory in the database is computed to retrieve similar video clips. (3) Defensive strategy analysis: based on the players' moving trajectories, the "stability" of the defensive players is evaluated to classify the defensive strategy into one-on-one defense or zone defense. To analyze the offensive strategy, we propose the screen pattern recognition method in this paper. Playing a fundamental and essential role in offensive tactics, a screen is a blocking move performed by an offensive player, who stands beside or behind a defender in order to free a teammate to shoot, to receive a pass, or to drive in for scoring. Hence, the visualized presentation of the player trajectories and recognized screen patterns on the court model assists the professionals or the audience in comprehending the offensive tactics executed in basketball games informatively and efficiently. It is hard to judge which tactic analysis application is more useful to the professionals or the general audience. Besides, more and more researchers have devoted themselves to this area, and it can be anticipated that a wider variety of tactic analysis applications will be designed for basketball and even for more kinds of sports.
7. Conclusion
The more you know about the opponents, the better your chance of winning. Thus, game study in advance of the play is an essential task for the coach and players. Tactic analysis provides informative and detailed insight into sports games, but until now little work has been devoted to this topic. It is a growing trend to assist game study and intelligence collection in sports with computer technology. To meet this need, in this paper we propose a screen-strategy recognition system capable of detecting and classifying screen patterns in broadcast basketball video. The proposed system automatically detects the court lines for camera calibration, determines the court region, and extracts the players using color information. The extracted players are further classified and discriminated into the offensive and defensive teams. Then, the proposed system tracks the players using a Kalman filter and calibrates the players' positions to the real-world court coordinates by the homography matrix obtained from camera calibration. By analyzing the extracted player trajectories on the real-world court model, the system can detect and recognize the screen pattern being executed. Our experiments on broadcast basketball videos show promising results. Furthermore, the extracted player trajectories and the recognized screen patterns are visualized on a court model, which indeed helps the coach/players or the fans understand the tactics executed in basketball games.
In addition to the player trajectory, the ball trajectory is also vital information for tactic analysis. Hence, we are currently working on integrating the framework of this paper with our previous work [18], which tracks the ball and reconstructs the 3D ball trajectory for shooting location estimation in basketball video, to build a powerful basketball video analysis system. It is our belief that the preliminary work presented in this paper will inspire extensive research and manifold applications of automatic tactic analysis and intelligence collection in various kinds of sports games.
Acknowledgments
This research is supported in part by projects ICTL-100-Q707, ATU-100-W958, NSC 98-2221-E-009-091-MY3, and NSC 101-2218-E-009-004. Many thanks to Dr. Hu Min-Chun for providing us with their system for performance comparison.
References
[1] L.-Y. Duan, M. Xu, Q. Tian, C.-S. Xu, J.-S. Jin, A unified framework for semantic shot classification in sports video, IEEE Transactions on Multimedia 7 (6) (2005) 1066–1083.
[2] M.-C. Tien, H.-T. Chen, Y.-W. Chen, M.-H. Hsiao, S.-Y. Lee, Shot classification of basketball videos and its applications in shooting position extraction, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, pp. I-1085–1088.
[3] M.-H. Hung, C.-H. Hsieh, Event detection of broadcast baseball videos, IEEE Transactions on Circuits and Systems for Video Technology 18 (12) (2008) 3829–3832.
[4] M. Fleischman, B. Roy, D. Roy, Temporal feature induction for baseball highlight classification, in: Proceedings ACM Multimedia Conference, 2007, pp. 333–336.
[5] C.-C. Cheng, C.-T. Hsu, Fusion of audio and motion information on HMM-based highlight extraction for baseball games, IEEE Transactions on Multimedia 8 (3) (2006) 585–599.
[6] J. Assfalg, M. Bertini, C. Colombo, A.D. Bimbo, W. Nunziati, Semantic annotation of soccer videos: automatic highlights identification, Computer Vision and Image Understanding 92 (2–3) (2003) 285–305.
[7] W.-T. Chu, J.-L. Wu, Explicit semantic events detection and development of realistic applications for broadcast baseball videos, Multimedia Tools and Applications 38 (1) (2007) 27–50.
[8] G. Zhu, Q. Huang, C. Xu, L. Xing, W. Gao, H. Yao, Human behavior analysis for highlight ranking in broadcast racket sports video, IEEE Transactions on Multimedia 9 (6) (2007) 1167–1182.
[9] G. Zhu, C. Xu, Q. Huang, Sports Video Analysis: From Semantics to Tactics, in: Multimedia Content Analysis, Springer, 2009, pp. 1–44.
[10] G. Zhu, C. Xu, Q. Huang, Y. Rui, S. Jiang, W. Gao, H. Yao, Event tactic analysis based on broadcast sports video, IEEE Transactions on Multimedia 11 (1) (2009) 49–67.
[11] G. Zhu, Q. Huang, C. Xu, Y. Rui, S. Jiang, W. Gao, H. Yao, Trajectory based event tactics analysis in broadcast sports video, in: Proceedings 15th ACM International Conference on Multimedia, 2007, pp. 58–67.
[12] G. Zhu, C. Xu, Y. Zhang, Q. Huang, H. Lu, Event tactic analysis based on player and ball trajectory in broadcast video, in: Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval, 2008, pp. 515–524.
[13] J.-R. Wang, N. Parameswaran, Detecting tactics patterns for archiving tennis video clips, in: Proceedings IEEE 6th International Symposium on Multimedia Software Engineering, 2004, pp. 186–192.
[14] X. Yu, C. Xu, H.-W. Leong, Q. Tian, Q. Tang, K.-W. Wan, Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video, in: Proceedings 11th ACM International Conference on Multimedia, 2003, pp. 11–20.
[15] X. Yu, H.-W. Leong, C. Xu, Q. Tian, Trajectory-based ball detection and tracking in broadcast soccer video, IEEE Transactions on Multimedia 8 (6) (2006) 1164–1178.
[16] X. Yu, X. Tu, E.-L. Ang, Trajectory-based ball detection and tracking in broadcast soccer video with the aid of camera motion recovery, in: Proceedings IEEE International Conference on Multimedia and Expo, 2007, pp. 1543–1546.
[17] H.-T. Chen, H.-S. Chen, M.-H. Hsiao, W.-J. Tsai, S.-Y. Lee, A trajectory-based ball tracking framework with enrichment for broadcast baseball videos, Journal of Information Science and Engineering 24 (1) (2008) 143–157.
[18] H.-T. Chen, M.-C. Tien, Y.-W. Chen, W.-J. Tsai, S.-Y. Lee, Physics-based ball tracking and 3D trajectory reconstruction with applications to shooting location estimation in basketball video, Journal of Visual Communication and Image Representation 20 (3) (2009) 204–216.
[19] H.-T. Chen, W.-J. Tsai, S.-Y. Lee, J.-Y. Yu, Ball tracking and 3D trajectory approximation with applications to tactics analysis from single-camera volleyball sequences, Multimedia Tools and Applications (2011), http://dx.doi.org/10.1007/s11042-011-0833-y.
[20] S. Liu, M. Xu, H. Yi, L.-T. Chia, D. Rajan, Multimodal semantic analysis and annotation for basketball video, EURASIP Journal on Applied Signal Processing 2006 (2006) 1–13.
[21] Y. Zhang, C. Xu, Y. Rui, J. Wang, H. Lu, Semantic event extraction from basketball games using multi-modal analysis, in: Proceedings IEEE International Conference on Multimedia and Expo, 2007, pp. 2190–2193.
[22] M.-H. Chang, M.-C. Tien, J.-L. Wu, WOW: wild-open warning for broadcast basketball video based on player trajectory, in: Proceedings ACM International Conference on Multimedia, 2009, pp. 821–824.
[23] M.-C. Hu, M.-H. Chang, J.-L. Wu, L. Chi, Robust camera calibration and player tracking in broadcast basketball video, IEEE Transactions on Multimedia 13 (2) (2011) 266–279.
[24] T.-S. Fu, H.-T. Chen, C.-L. Chou, W.-J. Tsai, S.-Y. Lee, Screen-strategy analysis in broadcast basketball video using player tracking, in: Proceedings IEEE International Conference on Visual Communications and Image Processing, 2011, pp. 1–4.
[25] D. Farin, S. Krabbe, P.H.N. de With, W. Effelsberg, Robust camera calibration for sport videos using court models, in: Proceedings Storage and Retrieval Methods and Applications for Multimedia, 2004, pp. 80–91.
[26] J. Han, D. Farin, P.H.N. de With, Broadcast court-net sports video analysis using fast 3-D camera modeling, IEEE Transactions on Circuits and Systems for Video Technology 18 (11) (2008).
[27] Y. Liu, S. Jiang, Q. Ye, W. Gao, Q. Huang, Playfield detection using adaptive GMM and its application, in: Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 2005, pp. 421–424.
[28] G. Welch, G. Bishop, An Introduction to the Kalman Filter, Technical Report TR95-041, University of North Carolina at Chapel Hill, 1995.
[29] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Computing Surveys 38 (4) (2006).