
Multiple interacting targets tracking with application to team sports∗

Matej Kristan, Janez Perš, Matej Perše, Stanislav Kovačič
Faculty of Electrical Engineering, University of Ljubljana, Slovenia
matej.kristan@fe.uni-lj.si

Marta Bon
Faculty of Sports, University of Ljubljana, Slovenia

∗ The paper appeared in the proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA), September 2005, pp. 322-327.

Abstract

Interest in computer-aided analysis of sport events is ever growing, and the ability to track objects during a sport event has become an elementary task for nearly every sport analysis system. We present in this paper a color-based probabilistic tracker that is suitable for tracking players on the playground during a sport game. Since the players are tracked in their natural environment, and this environment is subject to certain rules of the game, we use the concept of closed worlds to model the scene context and thus improve the reliability of tracking.

1 Introduction

Computer-aided analysis of human motion is becoming an appealing tool for enhancing the training effect in the domain of multiple-player sports. The potential of integrating the computer with coaching is vast, since problems that demand a mass of computation can be dealt with relatively easily using the computer. Tracking all players on the playground during a game and obtaining their trajectories is thus the cornerstone for nearly every type of analysis that is interesting to sport experts.

In this paper we address the problem of tracking multiple interacting targets in indoor sport games such as basketball and handball, and present a system for tracking players using a static top-view plan. Since the sports domain can be considered a semi-controlled environment, we use the closed world terminology presented in [4], and apply it to the case of color-based particle filtering. We show how the concept of closed world assumptions can be used to combine multiple separate trackers to improve the tracking performance.

The remainder of the paper is organized as follows: Section 2 defines the assumptions of the closed world, and Section 3 the engine of the tracker. In Sections 4 to 7, a tracker using closed world assumptions for a single player is presented. Section 8 augments the set of closed world assumptions by a new assumption, and extends the tracker to the case of multiple players. Section 9 describes the experiments for evaluation of the tracker, and in Section 10 we draw some conclusions.

2 The closed world

The closed world concept was introduced to the vision community by Intille and Bobick [4] in 1995. The main premise is that for a given region in space and time, a specific context is adequate to explain that region (i.e. determine all objects within the region). This region is called the closed world, and the context is a boundary in the space of knowledge, outside of which knowledge is not helpful in solving the tracking problem. Thus, by considering the sport match as a closed world, we define the context as a set of closed world assumptions: (i) The camera overlooking the playground is static, and positioned such that its optical axis is approximately perpendicular to the floor. (ii) The playground is bounded, and its model can be calculated. (iii) The illumination is nonuniform in space and time. (iv) The players' textures are known in the beginning, and vary during the game. An additional closed world assumption will be defined later in the paper.

3 Particle filtering

The use of particle filters in computer vision is a relatively new and popular method for tackling the problem of tracking. In the domain of visual tracking, this approach was pioneered by the work of Isard and Blake [5] in 1998, and numerous variations and extensions of the method have been presented since. We present here only the basic concept and notation, and refer the interested reader to [1, 3, 5] for more details.

Let $\mathbf{x}_{t-1}$ denote the state of a tracked object at time-step $(t-1)$, let $\mathbf{y}_{t-1}$ be an observation at that state, and let $\mathbf{y}_{1:t-1}$ denote a set of all observations up to a time-step $(t-1)$. From a Bayesian point of view, all of the interesting information about the target's state $\mathbf{x}_{t-1}$ is embodied by its posterior $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$. The aim of tracking is then to recursively estimate this posterior as the new observations
$\mathbf{y}_t$ become available. This process is characterized by two steps: prediction (Eq. 1) and update (Eq. 2).

$$p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) = \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\, d\mathbf{x}_{t-1} \tag{1}$$

$$p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \propto p(\mathbf{y}_t \mid \mathbf{x}_t)\, p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) \tag{2}$$

The recursion for the posterior thus requires a specification of a dynamical model describing the state evolution $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$, and a model that evaluates the likelihood of any state given the observation, $p(\mathbf{y}_t \mid \mathbf{x}_t)$.

In our implementation, we use the well known CONDENSATION algorithm, where the posterior at time-step $t-1$ is represented by a finite particle set with normalized weights $w_{t-1}^{(i)}$

$$p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}) \approx \left\{ \mathbf{x}_{t-1}^{(i)}, w_{t-1}^{(i)} \right\}_{i=1}^{N}, \tag{3}$$

such that all the weights sum to one. At time-step $t$, the particles are resampled according to their weights in order to obtain an unweighted representation of the posterior $p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}) \approx \{ \tilde{\mathbf{x}}_{t-1}^{(i)}, \frac{1}{N} \}_{i=1}^{N}$ and are then simulated according to the dynamical model $p(\mathbf{x}_t \mid \tilde{\mathbf{x}}_{t-1}^{(i)})$, to obtain a representation of the prediction $p(\mathbf{x}_t \mid \mathbf{y}_{1:t-1}) \approx \{ \mathbf{x}_t^{(i)}, \frac{1}{N} \}_{i=1}^{N}$. Finally, a weight is assigned to each particle according to the likelihood function $w_t^{(i)} = p(\mathbf{y}_t \mid \mathbf{x}_t^{(i)})$, all weights are normalized, and the posterior at time-step $t$ is approximated by a new particle set $p(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \approx \{ \mathbf{x}_t^{(i)}, w_t^{(i)} \}_{i=1}^{N}$.

4 The Player model

Our aim is to track players on the playground during a sport event, which requires modelling the players themselves, or the regions that contain them. We chose an ellipse to model these regions, since it provides a low-dimensional representation and is still robust enough to capture different appearances of a player. A state of the tracked player $\mathbf{x}_t \in \mathcal{X}$, with $\mathcal{X}$ denoting the state shape-space, is thus parameterized by an ellipse $\mathbf{x}_t = (x_t, y_t, a_t, b_t)$ with a center at $(x_t, y_t)$, and with parameters $a_t$ and $b_t$ denoting the width and height, respectively.

It is usually argued that the player's aim is to act in an unpredictable fashion to confuse the opponent, which implies that the dynamics should be modelled by a Brownian motion. The player's motion, however, is restricted by his task and the laws of physics; i.e. a player's task is to travel from region A to region B, and during that traversal the position cannot be changed instantly and arbitrarily due to the effects of inertia. We therefore define the state evolution model $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ as

$$p(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}(\mathbf{x}_t; \mathbf{x}_{t-1} + \mathbf{d}_t, \Lambda_t), \tag{4}$$

where $\mathcal{N}(\cdot; \mu, \Sigma)$ is a normal distribution with mean $\mu$ and covariance matrix $\Sigma$, and $\mathbf{d}_t$ is a drift constant at time $t$ for all particles. The covariance matrix in Eq. (4) is a diagonal matrix

$$\Lambda_t = \mathrm{diag}(\sigma_{xy}^2, \sigma_{xy}^2, \sigma_{ab}^2, \sigma_{ab}^2) \cdot H_t^2, \tag{5}$$

where $H_t = \sqrt{a_t^2 + b_t^2}$ is the size of the ellipse belonging to the state $\mathbf{x}_t$.

The time-step constant drift $\mathbf{d}_t$ is estimated on a set of smoothed past estimates of the average states, which are calculated as follows: Let $\mathbf{o}_{t-T:t-1} = \{\mathbf{o}_k\}_{k=t-T}^{t-1}$ denote a set of $T$ past smoothed states $\mathbf{o}_k = (o_{xk}, o_{yk}, o_{ak}, o_{bk})$ of a tracked target and let $\pi_{t-T:t-1} = \{\pi_{o_k}\}_{k=t-T}^{t-1}$ denote the set of their weights. Let $\tilde{\mathbf{o}}_t = \mathbf{o}_{t-1} + \mathbf{d}_t$ denote the current prediction on these sets. At time-step $t$, when the approximation to $p(\mathbf{x}_t \mid \mathbf{y}_{1:t})$ becomes available, an average state $\hat{\mathbf{x}}_t = (\hat{x}_t, \hat{y}_t, \hat{a}_t, \hat{b}_t)$ is calculated as

$$\hat{\mathbf{x}}_t = \sum_{i=1}^{N} w_t^{(i)} \mathbf{x}_t^{(i)}, \tag{6}$$

and the weights $w_{\hat{x}_t} = p(\mathbf{y}_t \mid \hat{\mathbf{x}}_t)$ and $w_{\tilde{o}_t} = p(\mathbf{y}_t \mid \tilde{\mathbf{o}}_t)$ are obtained via the likelihood function. The new smoothed state $\mathbf{o}_t$ is then calculated as a weighted sum

$$\mathbf{o}_t = \frac{\tilde{\mathbf{o}}_t \cdot w_{\tilde{o}_t} + \hat{\mathbf{x}}_t \cdot w_{\hat{x}_t}}{w_{\tilde{o}_t} + w_{\hat{x}_t}}, \tag{7}$$

and the corresponding weight is calculated as $\pi_{o_t} = p(\mathbf{y}_t \mid \mathbf{o}_t)$.

The new time-constant drift $\mathbf{d}_{t+1}$ is then estimated as a weighted sum of successive past differences between the smoothed average states

$$\mathbf{d}_{t+1} = c_o^{-1} \sum_{k=t-T+1}^{t} (\mathbf{o}_k - \mathbf{o}_{k-1})\, G_k(t), \tag{8}$$

where the weights are defined as

$$G_k(t) = \pi_k\, \pi_{k-1}\, e^{-\frac{1}{2} \frac{(k-t)^2}{\sigma_o^2}}, \tag{9}$$

and where $c_o = \sum_{k=t-T+1}^{t} G_k(t)$ is the normalization factor. The first two terms in Eq. (9) are the weights of the corresponding smoothed states and the last term is a Gaussian, which is used to assign higher a priori weights to the more recent states. The number of smoothed states in Eq. (8) is for practical applications set to $T = 3\sigma_o$, since the a priori weights of all older smoothed states are negligible.

Assuming that a player cannot radically change his or her direction and velocity within one half of a second, a value for the parameter $\sigma_o$ was chosen to comply with this time frame. Since all our test sequences are recorded at a frame rate of 25 frames per second, we have chosen this parameter to be $\sigma_o = 4.3$, which means that in our application only $T = 13$ past smoothed states are considered.

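As an illustration of the player model, the drift estimate of Eqs. (8) and (9) and one draw from the dynamic model of Eqs. (4) and (5) can be sketched as below. This is only a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names (`history_length`, `estimate_drift`, `propagate`), the list-based state history, and the zero-drift fallback for an empty history are all hypothetical choices made for the example.

```python
import math
import random


def history_length(sigma_o):
    """Number of past smoothed states kept, T = 3 * sigma_o (rounded)."""
    return round(3 * sigma_o)


def estimate_drift(states, weights, sigma_o=4.3):
    """Eqs. (8)-(9): weighted mean of successive differences o_k - o_{k-1}.

    states[-1] is the newest smoothed state o_t and weights[k] is the
    likelihood weight pi of states[k]. At least one state is required.
    """
    dim = len(states[0])
    t = len(states) - 1
    numerator = [0.0] * dim
    c_o = 0.0  # normalization factor, the sum of all G_k(t)
    for k in range(1, len(states)):
        # G_k(t): product of the two state weights and a Gaussian in (k - t).
        g = weights[k] * weights[k - 1] * math.exp(-0.5 * (k - t) ** 2 / sigma_o ** 2)
        c_o += g
        for d in range(dim):
            numerator[d] += (states[k][d] - states[k - 1][d]) * g
    if c_o == 0.0:
        # Assumption (not from the paper): zero drift without usable history.
        return tuple(0.0 for _ in range(dim))
    return tuple(v / c_o for v in numerator)


def propagate(state, drift, rng, sigma_xy=0.3, sigma_ab=0.05):
    """Eqs. (4)-(5): draw x_t ~ N(x_{t-1} + d_t, Lambda_t) for one particle."""
    _, _, a, b = state
    h = math.hypot(a, b)  # ellipse size H_t = sqrt(a^2 + b^2)
    # Lambda_t is diagonal, so each component is sampled independently with
    # standard deviation sigma * H_t.
    stds = (sigma_xy * h, sigma_xy * h, sigma_ab * h, sigma_ab * h)
    return tuple(rng.gauss(s + d, sd) for s, d, sd in zip(state, drift, stds))
```

With $\sigma_o = 4.3$, `history_length` gives the $T = 13$ quoted above, and for a target moving at constant velocity with equal weights `estimate_drift` returns exactly that per-frame displacement. The caller is assumed to pass only the most recent $T + 1$ smoothed states.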
5 The Color cue

Usually, a player is depicted on an image only by a small number of pixels, typically about 10x10, and its shape changes rapidly. We therefore model the player by a color histogram within an elliptical region. Such an approach has proven on many occasions [2, 6] to be a powerful and robust characterization of an object's appearance. To increase robustness to clutter, a weighting kernel as in [2] is employed, which assigns higher weights to the pixels that are closer to the center of an ellipse, and lower weights to those farther away. Furthermore, if some a priori knowledge of which pixels are not likely to belong to the player is available, it should be used in the construction of the histogram; i.e. these pixels should be ignored. An elegant way of discarding the pixels that do not belong to the player is the use of a mask function, which assigns an a priori zero weight to those pixels that are not likely to have been generated by the players and a value one to all the other pixels. A construction of such a mask function will be explained in the following sections.

Let $E = (x, y, a, b)$ be an elliptical region at some state $\mathbf{x}_t = (x, y, a, b)$. The RGB color histogram with $B$ bins $\mathbf{h}_x = \{h_x^i\}_{i=1}^{B}$, sampled within the ellipse $E$, is then defined as

$$h_x^i = f_h \sum_{\mathbf{u} \in E} K(\mathbf{u})\, M(\mathbf{u})\, \delta_i(b(\mathbf{u})), \tag{10}$$

where $\mathbf{u} = (x, y)$ denotes a pixel within the elliptical region $E$, $\delta_i(\cdot)$ is a Kronecker delta function positioned at histogram bin $i$, $b(\mathbf{u}) \in \{1 \ldots B\}$ denotes a histogram bin index associated with the color of a pixel at location $\mathbf{u}$, $K(\cdot)$ is a weighting kernel as in [2] positioned at the center of an ellipse, $M(\mathbf{u})$ is some a priori binary mask function, and $f_h$ is a normalizing constant such that $\sum_{i=1}^{B} h_x^i = 1$.

The presence of a player in a given state is evaluated by comparing some candidate histograms to a reference histogram of the tracked player. As a measure of distance between two histograms, say $\mathbf{h}_1$ and $\mathbf{h}_2$, a Bhattacharyya distance was chosen due to its optimality when considering frequency-coded data, and is defined as

$$\varrho(\mathbf{h}_1, \mathbf{h}_2) = 2\left(1 - \sum_i \sqrt{h_{1i}\, h_{2i}}\right). \tag{11}$$

Since the model of the background can be obtained, we define a measure that also considers some information from the background. Let $\hat{\mathbf{h}}_t$ be a reference histogram of the target, let $\mathbf{x}_t$ be some hypothesized state of the target and let $\mathbf{h}_A$ and $\mathbf{h}_B$ be the histograms at that state, sampled on the current and the background image, respectively. Furthermore, let $\beta$ denote the portion of all pixels within the elliptical region that are not assigned to the background by the mask function $M(\cdot)$. The proposed measure is then defined as

$$D(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B) = \beta^{-1} D_n(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B), \tag{12}$$

where $D_n(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B)$ is the normalized distance between $\mathbf{h}_A$ and $\hat{\mathbf{h}}_t$ given the background histogram $\mathbf{h}_B$ and is defined as

$$D_n(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B) = \frac{\varrho(\mathbf{h}_A, \hat{\mathbf{h}}_t)}{\sqrt{\varrho(\mathbf{h}_B, \hat{\mathbf{h}}_t)^2 + \varrho(\mathbf{h}_A, \hat{\mathbf{h}}_t)^2}}. \tag{13}$$

We have observed by empirical evaluation that the distance in Eq. (12) follows a Gamma-like function and we define the likelihood function of state $\mathbf{x}_t$ as

$$p(\mathbf{y}_t \mid \mathbf{x}_t) \propto D(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B)^{\gamma_1 - 1}\, e^{-\frac{D(\mathbf{h}_A, \hat{\mathbf{h}}_t; \mathbf{h}_B)}{\gamma_2}}, \tag{14}$$

where the parameters $\gamma_1$ and $\gamma_2$ were estimated on a large number of successfully tracked objects and are given by

$$\hat{\gamma}_1 = 2.242, \quad \hat{\gamma}_2 = 0.055. \tag{15}$$

6 Generating mask

While histograms are powerful color cues for tracking textured objects, they can fail severely when the object is moving on a similarly textured background. This is usually due to their inability to capture the spatial relations in texture, and the fact that they are always sub-sampled in order to increase their robustness. There is, however, still some useful information left in the current and the background image: the difference between the two. By thresholding this difference image with some appropriate threshold, we can construct a mask image, which filters out the pixels that are likely to belong to the background. Since the lighting conditions and the background color vary heavily across the playground, the threshold has to be estimated dynamically.

We propose a simple solution to estimating this threshold. At time $t$, a smoothed state $\mathbf{o}_t$ is calculated via Eq. (7) and histograms $\mathbf{h}_A$ and $\mathbf{h}_B$ are sampled at that state from the current image $A(\cdot)$ and the background image $B(\cdot)$, respectively. If the distance $\varrho(\mathbf{h}_A, \mathbf{h}_B)$ exceeds some prescribed threshold value $\varrho_{thresh}$, then a mask function

$$M_D(\mathbf{u}) = \begin{cases} 1 ; & \|A(\mathbf{u}) - B(\mathbf{u})\| \geq T_{mask} \\ 0 ; & \text{otherwise} \end{cases} \tag{16}$$

is generated with a threshold $T_{mask}$ calculated such that at least 25 percent of pixels in the mask function $M_D(\cdot)$ within the ellipse of the state $\mathbf{o}_t$ are assigned to the background. If the threshold $\varrho_{thresh}$ is not exceeded, the mask is set to unity, i.e. $M_D(\mathbf{u}) = 1$ for all pixels $\mathbf{u}$.

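The color cue of Eqs. (11) to (14) and the mask rule of Eq. (16) can be sketched as follows. This is a minimal sketch that assumes histograms are plain lists of bin values summing to one and that the per-pixel difference magnitudes inside the ellipse are already available as a list. The function names and the handling of the degenerate case in `normalized_distance` are assumptions made for this example, and the computation of $T_{mask}$ itself is left to the caller.

```python
import math


def bhattacharyya(h1, h2):
    """Eq. (11): distance between two normalized histograms, in [0, 2]."""
    return 2.0 * (1.0 - sum(math.sqrt(p * q) for p, q in zip(h1, h2)))


def normalized_distance(h_a, h_ref, h_b):
    """Eq. (13): distance to the reference, normalized by the background."""
    d_a = bhattacharyya(h_a, h_ref)
    d_b = bhattacharyya(h_b, h_ref)
    denom = math.hypot(d_b, d_a)
    # Assumption (not from the paper): return 0 when both distances vanish.
    return d_a / denom if denom > 0.0 else 0.0


def gamma_likelihood(d, gamma1=2.242, gamma2=0.055):
    """Eq. (14): unnormalized Gamma-like likelihood of a distance D."""
    return d ** (gamma1 - 1.0) * math.exp(-d / gamma2)


def state_likelihood(h_a, h_ref, h_b, beta):
    """Eqs. (12) and (14); beta is the unmasked fraction of ellipse pixels."""
    if beta <= 0.0:
        return 0.0  # assumption: a fully masked ellipse carries no evidence
    return gamma_likelihood(normalized_distance(h_a, h_ref, h_b) / beta)


def difference_mask(diffs, t_mask, h_a, h_b, rho_thresh=1.56):
    """Eq. (16): binary mask from difference magnitudes |A(u) - B(u)|."""
    if bhattacharyya(h_a, h_b) <= rho_thresh:
        return [1] * len(diffs)  # threshold not exceeded: mask set to unity
    return [1 if d >= t_mask else 0 for d in diffs]
```

Note that the Gamma-like form assigns zero likelihood to a distance of exactly zero and peaks at a small positive distance, so a candidate is rated by how typical its distance is among successfully tracked targets rather than by how small it is.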
7 Adaptation

As the player moves across the playground, its texture varies due to nonuniform lighting conditions, influences of the background and variations in the player's pose. To improve tracking under these conditions, we adapt the reference histogram $\hat{\mathbf{h}}_t$ at time-step $t$ with the histogram $\mathbf{h}_{o_t}$, sampled at the current smoothed state $\mathbf{o}_t$ (Eq. 7) of the target. The extent by which the model adapts to the target is determined by the normalized distance function defined in Eq. (13); that is, if the smoothed state is likely to have been falsely estimated, then the model histogram should be updated by a very small amount, or not at all, and it should be adapted by some larger amount otherwise. The adaptation equation follows the form of

$$\hat{\mathbf{h}}_t = \alpha\, \mathbf{h}_{o_t} + (1 - \alpha)\, \hat{\mathbf{h}}_t^{-}, \tag{17}$$

where the superscript in $\hat{\mathbf{h}}_t^{-}$ denotes the reference histogram prior to adaptation. Let $\mathbf{h}_A$ be a histogram sampled at a state $\mathbf{o}_t$ on the current image and let $\mathbf{h}_B$ be a histogram sampled at the same state but on the background image. The adaptation parameter $\alpha$ is then defined as

$$\alpha = \Omega_{max} \cdot (1.0 - D_n(\mathbf{h}_A, \hat{\mathbf{h}}_t^{-}; \mathbf{h}_B)), \tag{18}$$

where $\Omega_{max}$ denotes the maximal adaptation.

8 Tracking multiple players

The closed world assumption about the camera position imposes a restriction that players are viewed from above. Since it is usually not a valid situation when one player ends up on top of another, we can assume that it is highly unlikely to observe a complete occlusion during the match.

This forms a new closed world assumption: (v) At a given time-step, two players cannot occupy the same position.

With this in mind, we devise a framework for combining separate particle filters by incorporating spatial information about the neighboring players. For the sake of the argument, let us presume that at a given time-step, the states of all players are known. Let ${}^{(j)}\mathbf{x}$ denote the state of the $j$-th player and let there be $N_p$ players present on the playground. The player with an index $j$ is represented in the current image by some set of pixels ${}^{(j)}K$. Assume that for all pixels from the set ${}^{(j)}K$, the closest center among all players is the one of the $j$-th player. Hence, if the true states $\{{}^{(j)}\mathbf{x}\}_{j=1}^{N_p}$ at a given time-step were in fact known, it would be possible to partition the image into pairwise-disjoint regions, such that each region would contain exactly one of the sets ${}^{(j)}K$.

A partitioning of the space which achieves this kind of topology is the Voronoi partitioning [7], mainly used by surface triangulation and obstacle avoidance algorithms. A Voronoi partitioning among $N_p$ points, called seeds in the triangulation terminology, is completely defined by a set of points $S = \{s_j\}_{j=1}^{N_p}$ and generates a set of $N_p$ pairwise-disjoint convex partitions $P = \{P_j\}_{j=1}^{N_p}$, where each partition contains exactly one seed and for every point in a particular partition, the closest seed is the one included by that partition. For illustrative purposes an example of a Voronoi two-dimensional space partitioning by four seeds is shown in Fig. (1).

Figure 1. Four seeds ($s_1$ to $s_4$) divide the space into four pairwise-disjoint convex Voronoi partitions.

In reality, at time-step $t+1$, prior to the tracking iteration, the true states of the players are not known; however, the predictions of type

$${}^{(j)}\tilde{\mathbf{x}}_{t+1} = {}^{(j)}\mathbf{o}_t + {}^{(j)}\mathbf{d}_{t+1} \tag{19}$$

are available, where ${}^{(j)}\mathbf{o}_t$ is the smoothed estimate of the $j$-th player's state from the previous time-step (Eq. 7) and ${}^{(j)}\mathbf{d}_{t+1}$ is the $j$-th player's drift at the current time-step (Eq. 8).

Prior to a tracking iteration, therefore, an image can be partitioned into $N_p$ Voronoi regions based on the predictions of all $N_p$ players. The idea is then to track each player only within a region that contains its prediction. Restriction of a tracker for the $j$-th player to its partition $P_j$ can easily be achieved by use of an additional mask function ${}^{(j)}M_V(\mathbf{u})$ defined as

$${}^{(j)}M_V(\mathbf{u}) = \begin{cases} 1 ; & \mathbf{u} \in P_j \\ 0 ; & \text{otherwise} \end{cases}, \tag{20}$$

where $\mathbf{u}$ is a pixel and $P_j$ is the region of the $j$-th player. The mask function ${}^{(j)}M(\mathbf{u})$ in Eq. (10) is then defined as an intersection of masks ${}^{(j)}M_D(\cdot)$ (Eq. 16) and ${}^{(j)}M_V(\cdot)$

$${}^{(j)}M(\mathbf{u}) = {}^{(j)}M_D(\mathbf{u}) \cap {}^{(j)}M_V(\mathbf{u}), \tag{21}$$

where by superscript ${}^{(j)}(\cdot)$ we emphasize that all masks are player-dependent.

The closed world tracking scheme for multiple players with separate particle filters is described in the following section.

8.1 The Algorithm

Let there be a set of $N_p$ separate particle filter trackers, each for one player, defined by

$$\left\{ {}^{(j)}\hat{\mathbf{h}}_t,\ {}^{(j)}p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1}),\ {}^{(j)}p(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \right\}_{j=1}^{N_p}, \tag{22}$$

where the $j$-th tracker is defined by its posterior from the previous time-step ${}^{(j)}p(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})$, its transition distribution ${}^{(j)}p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ and the model histogram ${}^{(j)}\hat{\mathbf{h}}_t$. This is the initial input data for the tracking iterations at the current time-step $t$.

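The Voronoi restriction of Eqs. (19) to (21) amounts to a nearest-seed labeling of the image, which the following sketch illustrates. It is only a brute-force illustration under stated assumptions: masks are nested lists of 0/1 values, seeds are the $(x, y)$ centers of the predicted states, and ties between equally distant seeds go to the lower player index. The function names and the tie rule are hypothetical and not specified in the paper.

```python
def predict_seed(o, d):
    """Eq. (19): center (x, y) of the predicted state o_t + d_{t+1}."""
    return (o[0] + d[0], o[1] + d[1])


def voronoi_labels(width, height, seeds):
    """Label every pixel with the index of its nearest seed.

    Assumption: ties are resolved in favor of the lower seed index.
    """
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(seeds)),
                key=lambda j: (x - seeds[j][0]) ** 2 + (y - seeds[j][1]) ** 2,
            )
            row.append(nearest)
        labels.append(row)
    return labels


def voronoi_mask(labels, j):
    """Eq. (20): 1 inside the j-th Voronoi region, 0 elsewhere."""
    return [[1 if label == j else 0 for label in row] for row in labels]


def combine_masks(m_d, m_v):
    """Eq. (21): intersection of the difference mask and the Voronoi mask."""
    return [[a & b for a, b in zip(row_d, row_v)] for row_d, row_v in zip(m_d, m_v)]
```

For example, two seeds at $(0, 0)$ and $(3, 0)$ on a 4x1 image split it into the pixel sets $\{0, 1\}$ and $\{2, 3\}$. Because every pixel receives exactly one label, the per-player Voronoi masks are pairwise disjoint by construction.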
A one time-step iteration of the multiple-player tracking algorithm can be summarized as follows: Firstly, a set of seeds $S = \{s_j\}$ is initialized at the predictions of all players $s_j = {}^{(j)}\tilde{\mathbf{x}}_t$ (Eq. 19) and the corresponding Voronoi partitions $P = \{P_j\}$ are calculated. A tracker with index $j = 1$ is selected, its mask function ${}^{(1)}M(\mathbf{u})$ (Eq. 21) is constructed and its CONDENSATION iteration is processed. After completing this iteration, a smoothed estimate of the first player's average state ${}^{(1)}\mathbf{o}_t$ becomes available (Eq. 7) and its corresponding weight is calculated as $\pi_{o_t} = p(\mathbf{y}_t \mid \mathbf{o}_t)$. A histogram is sampled at that state on the current image and the model ${}^{(1)}\hat{\mathbf{h}}_t$ is adapted according to Eq. (17). As described in Sect. (6), a threshold for the mask function ${}^{(1)}M_D(\cdot)$ is calculated within the ellipse of the state ${}^{(1)}\mathbf{o}_t$. Finally, the seed $s_1$ in the set $S$ is replaced by the smoothed average state ${}^{(1)}\mathbf{o}_t$ and the Voronoi partitioning is recalculated. This procedure is repeated for the tracker with index $j = 2$ and continued in the same manner, until all of the trackers are processed. The procedure is summarized in Alg. (1).

Algorithm 1. The multiple player tracking algorithm.

• For all $j$ set: $S = \{s_j\}_{j=1}^{N_p}$; $s_j = {}^{(j)}\tilde{\mathbf{x}}_t$
• For $j = 1 : N_p$
  1. Construct a set of Voronoi regions $P = \{P_j\}_{j=1}^{N_p}$ on a set of seeds $S$.
  2. Construct the Voronoi mask ${}^{(j)}M_V(\mathbf{u})$ via Eq. (20).
  3. Construct the difference image mask ${}^{(j)}M_D(\mathbf{u})$ as described in Sect. (6).
  4. Run the classical CONDENSATION iteration on the $j$-th tracker (Sect. 3).
  5. Calculate the smoothed average state ${}^{(j)}\mathbf{o}_t$ (Eq. 7).
  6. Sample a histogram at ${}^{(j)}\mathbf{o}_t$ and adapt the model to that histogram as in Sect. (7).
  7. If needed, calculate the threshold for the mask $M_D(\mathbf{u})$ (Sect. 6).
  8. Update the $j$-th Voronoi seed by the smoothed average state: $s_j = {}^{(j)}\mathbf{o}_t$.
• End For $j$

9 Experiments

Tests were conducted on a sequence of 948 images with size of 384x288 pixels, from a recorded handball match where the players were approximately 9x9 pixels large. The frame rate of the sequence was 25 frames per second, which was equivalent to approximately 38 seconds of the match. All images comprised twelve players, where six of them wore white uniforms, and the other six wore dark uniforms. The playground was mainly yellow and blue, with a few advertisements, and it influenced the colors of the players severely: white players looked yellow on the yellow part of the playground and became blue when they moved on the blue part. Since handball is a sport where contacts and near-occlusions happen frequently and some players of the same team often move close to one another, we feel it is a good testing ground for multiple-player trackers. In Fig. (2), an image from the test sequence is presented, along with some of the contacts that happened during that sequence and two examples of players moving over the advertisement.

Figure 2. A typical image with 12 players from the test sequence for evaluation of the tracker (a), an example of colliding players (b) and corresponding Voronoi partitions (c), and two occasions of a player moving over the advertisement (d), (e).

Two tests were conducted to compare Alg. (1) to a tracker that considers each player separately and independently of one another. The parameters for both trackers are listed in Tab. (1). In the first experiment, the tracker was manually initialized on a single player and the tracking proceeded over the entire sequence of the match; this experiment was conducted five times for each of the twelve players, and the number of instances when the tracker failed and had to be reinitialized was recorded. A tracking failure was regarded as an occasion when the tracker obviously lost the player, and did not seem to be able to properly continue with tracking. As a result, the expected failure rate was computed for the given sequence and is reported in Tab. (2).

Table 1. Tracker parameters used in experiments.

Parameter                                        Value
ellipse size parameter space                     $H_x, H_y \in [6, 10]$
standard deviations of dynamics                  $\sigma_{xy} = 0.3$, $\sigma_{ab} = 0.05$
standard deviation for the state smoothing       $\sigma_v = 4.3$ frames or $\sigma_v = 0.172$ s
color likelihood parameters                      $\hat{\gamma}_1 = 2.242$, $\hat{\gamma}_2 = 0.055$
number of cells in a color RGB histogram         $B = 8 \times 8 \times 8$
maximal histogram adaptation                     $\Omega_{max} = 0.04$
number of samples in a single particle filter    $N = 20$
threshold for masking                            $\varrho_{thresh} = 1.56$
the state smoothing parameter                    $\sigma_o = 4.3$

In the second experiment, the tracker was initialized manually on all of the twelve players, and all players were tracked simultaneously over the entire sequence. When a particular player was lost, it was reinitialized manually and the tracking proceeded. Again, this experiment was conducted five times, and the expected failure rate was calculated (Tab. 2).

Table 2. The average failure rates for tracking twelve players with a single-player and a multiple-player tracker. The symbols $F_r/(38\,\mathrm{s})$ and $F_r/(1\,\mathrm{min})$ denote the expected failure rates for 38 seconds of the game and recalculated to 1 min of the game, respectively.

tracker type       $F_r/(38\,\mathrm{s})$    $F_r/(1\,\mathrm{min})$
single-player      21.6                      34.1
multiple-player    3.8                       6.0

From Tab. (2) it is evident that the failure rate decreased dramatically when the information about the neighboring players was taken into account, thus rendering the multiple-player tracker superior to the single-player tracker.

10 Conclusion

An algorithm for tracking multiple interacting players is presented in this article. The problem of tracking is formulated as a set of closed world assumptions, and the tracker based on particle filtering is built so as to satisfy each assumption. A multiple-player tracker is proposed as a set of separate trackers, one for each player, which are combined by inferring a Voronoi partitioning among all players at each time-step, and tracking each player within its Voronoi cell. The multiple-player tracker was compared on a very demanding data-set with a single-player tracker and came out as being superior. The sequence on which the trackers were tested is indeed an arduous one, and one must not be misled to think that the single-player tracker performed poorly. The only difference between the single- and multiple-player tracker is that the latter considered the spatial relations between all players, while still tracking each player separately. Thus, the decrease in failure rate of the multiple-player tracker is mainly due to the ability to take into account the clashes and near-occlusions among the players. Conversely, if a tracker performed poorly on this sequence it would not necessarily mean that it has poor tracking capabilities, whereas good performance certainly implies good tracking capabilities.

In future research, we plan to conduct more tests and compare our trackers to some reference trackers presented in the literature. We are also considering a possibility for a tracker to automatically reinitialize when the tracking fails.

11 Acknowledgement

The research described in this paper has been sponsored by InspireWorks.

References

[1] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, February 2002.
[2] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. IEEE Conference on Computer Vision and Pattern Recognition, pages 142–151, 2000.
[3] A. Doucet, S. Godsill, and C. Andrieu. On sequential monte carlo sampling methods for bayesian filtering. Statistics and Computing, (10):197–208, 2000.
[4] S. S. Intille and A. Bobick. Closed-world tracking. In Int. Conf. of Computer Vision, pages 672–678, 1995.
[5] M. Isard and A. Blake. Condensation – conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5–28, 1998.
[6] P. Perez, J. Vermaak, and A. Blake. Data fusion for visual tracking with particles. Proceedings of IEEE, 292(3):495–513, January 2004.
[7] J. R. Sack and J. Urrutia. Handbook of Computational Geometry. Elsevier Science Pub Co, 2000.
