Professional Documents
Culture Documents
Multiple Interacting Targets Tracking With Application To Team Sports
Multiple Interacting Targets Tracking With Application To Team Sports
1
yt become available. This process is characterized by two The covariance matrix in Eq. (4) is a diagonal matrix
steps: prediction (Eq. 1) and update (Eq. 2). 2 2 2 2
Z Λt = diag(σxy , σxy , σab , σab ) · Ht2 , (5)
p(xt |y1:t−1 ) = p(xt |xt−1 )p(xt−1 |y1:t−1 )dxt−1 (1) p
where Ht = a2t + b2t is the size of the ellipse belonging
p(xt |y1:t ) ∝ p(yt |xt )p(xt |y1:t−1 ) (2) to the state xt .
The time-step constant drift dt is estimated on a set of
The recursion for the posterior thus requires a specifica- smoothed past estimates of the average states, which are
tion of a dynamical model describing the state evolution calculated as follows: Let ot−T :t−1 = {ok }t−1 k=t−T denote
p(xt |xt−1 ), and a model that evaluates the likelihood of any a set of T past smoothed states ok = (oxk , oyk , oak , obk )
state given the observation p(yt |xt ). of a tracked target and let πt−T :t−1 = {πok }k=t−T t−1
denote
In our implementation, we use the well known C ONDEN - the set of their weights. Let õt = ot−1 + dt denote the
SATION algorithm, where the posterior at time-step t − 1 is
current prediction on these sets. At time-step t, when the
represented by a finite particle set with normalized weights approximation to p(xt |y1:t ) becomes available, an average
(i)
wt−1 state x̂t = (x̂t , ŷt , ât , b̂t ) is calculated as
n oN
(i) (i)
p(xt−1 |y1:t−1 ) ≈ xt−1 , wt−1 , (3) N
i=1
(i) (i)
X
such that all the weights sum to one. At time-step t, the x̂t = wt xt , (6)
particles are resampled according to their weights in or- i=1
der to obtain an unweighted representation of the poste- and the weights wx̂t = p(yt |x̂t ) and wõt = p(yt |õt ) are
n oN
(i)
rior p(xt−1 |y1:t−1 ) ≈ x̃t−1 , N1 and are then sim- obtained via the likelihood function. The new smoothed
i=1 state ot is then calculated as a weighted sum
(i) (i)
ulated according to the dynamical model p(xt |x̃t−1 ), to
obtain a representation of the prediction p(xt |y1:t−1 ) ≈ õt · wõt + x̂t · wx̂t
oN ot = , (7)
wõt + wx̂t
n
(i)
xt , N1 . Finally, a weight is assigned to each parti-
i=1
(i)
cle according to the likelihood function wt = p(yt |xt ),
(i) and the corresponding weight is calculated as πot =
all weights are normalized, and the posterior at time-step p(yt |ot ).
t is approximated by a new particle set p(xt |y1:t ) ≈ The new time-constant drift dt+1 is then estimated as
n
(i) (i)
oN a weighted sum of successive past differences between the
xt , wt . smoothed average states
i=1
t
4 The Player model dt+1 = c−1
X
(ok − ok−1 )Gk (t), (8)
o
k=t−T +1
Our aim is to track players on the playground during
a sport event, which requires modelling the players them- where the weights are defined as
selves, or the regions that contain them. We chose an ellipse (k−t)2
− 21 2
to model these regions, since it provides a low-dimensional Gk (t) = πk πk−1 e , σo
(9)
presentation and is still robust enough to capture differ- Pt
ent appearances of a player. A state of the tracked player and where co = k=t−T +1 Gk (t) is the normalization fac-
xt ∈ X , with X denoting the state shape-space, is thus pa- tor. The first two terms in Eq. (9) are the weights of the cor-
rameterized by an ellipse xt = (xt , yt , at , bt ) with a center responding smoothed states and the last term is a Gaussian,
at (xt , yt ), and with parameters at and bt denoting the width which is used to assign higher a priori weights to the more
and height, respectively. recent states. The number of smoothed states in Eq. (8) is
It is usually argued that the player’s aim is to act in an for practical applications set to T = 3σo , since the a priori
unpredictable fashion to confuse the opponent, which im- weights of all older smoothed states are negligible .
plies that the dynamics should be modelled by a Brownian Assuming that a player can not radically change his or
motion. The player’s motion, however, is restricted by his hers direction and velocity within one half of a second, a
task and laws of physics; i.e. a player’s task is to travel from value for the parameter σo was chosen to comply with this
region A to region B, and during that traversal the position time frame. Since all our test sequences are recorded at a
cannot be changed instantly and arbitrarily due to the effects frame rate of 25 frames per second, we have chosen this pa-
of the inertia. We therefore define the state evolution model rameter to be σo = 4.3, which means, that in our application
p(xt |xt−1 ) as only T = 13 past smoothed states are considered.
2
changes rapidly. We therefore model the player by a color We have observed by empirical evaluation that the distance
histogram within an elliptical region. Such an approach has in Eq. (12) follows a Gamma like function and we define
proven on many occasions [2, 6] to be a powerful and ro- the likelihood function of state xt as
bust characterization of an object appearance. To increase
−D(hA ,ĥt ;hB )
robustness to clutter, a weighting kernel as in [2] is em- p(yt |xt ) ∝ D(hA , ĥt ; hB )γ1 −1 e γ2
, (14)
ployed, which assigns higher weights to the pixels that are
closer to the center of an ellipse, and lower weights to those where the parameters a and b were estimated on a large
farther away. Furthermore, if some a priori knowledge of number of successfully tracked objects and are given by
which pixels are not likely to belong to the player is avail-
able, it should be used in the construction of the histogram; γ̂1 = 2.242, γ̂2 = 0.055. (15)
i.e. these pixels should be ignored. An elegant way of dis-
carding the pixels that do not belong to the player is the use
of a mask function, which assigns an a priori zero weight 6 Generating mask
to those pixels that are not likely to have been generated by
the players and a value one to all the other pixels. A con- While histograms are powerful color cues for tracking
struction of such a mask function will be explained in the textured objects, they can fail severely when the object is
following sections. moving on a similarly textured background. This is usually
Let E = (x, y, a, b) be an elliptical region at some state due to their lack of the ability to capture the spatial relations
xt = (x, y, a, b). The RGB color histogram with B bins in texture, and the fact that they are always sub-sampled, in
hx = {hix }B i=1 , sampled within the ellipse E, is then de- order to increase their robustness. There is, however, still
fined as some useful information left in the current and the back-
X ground image; the difference between the two. By thresh-
hix = fh K(u)M (u)δi (b(u)), (10) olding this difference image with some appropriate thresh-
u∈E
old, we can construct a mask image, which filters out the
where u = (x, y) denotes a pixel within the elliptical re- pixels that are likely to belong to the background. Since
gion E, δi (·) is a Kronecker delta function positioned at the lighting conditions and the background color vary heav-
histogram bin i, b(u) ∈ {1...B} denotes a histogram bin in- ily across the playground, the threshold has to be estimated
dex associated with the color of a pixel at location u, K(·) dynamically.
is a weighting kernel as in [2] positioned at the center of an We propose a simple solution to estimating this thresh-
ellipse, M (u) is some a priori binary mask function, and fh old. At time t, a smoothed state ot is calculated via Eq. (7)
PB
is a normalizing constant such that i=1 hix = 1. and histograms hA and hB are sampled at that state from
The presence of a player in a given state is evaluated by the current image A(·) and the background image B(·),
comparing some candidate histograms to a reference his- respectively. If a distance %(hA , hB ) exceeds some pre-
togram of the tracked player. As a measure of distance be- scribed threshold value %thresh then a mask function
tween two histograms, say h1 and h2 , a Bhattacharyya dis-
tance was chosen due to its optimality when considering a 1 ; kA(u) − B(u)k ≥ Tmask
MD (u) = (16)
frequency coded data, and is defined as 0 ; otherwise
X p
%(h1 , h2 ) = 2(1 − h1i h2i ). (11) is generated with a threshold Tmask calculated such that at
i
least 25 percent of pixels in the mask function MD (·) within
Since the model of the background can be obtained, we the state’s ot ellipse are assigned to the background. If the
define a measure that also considers some information from threshold %thresh is not exceeded, the mask is set to unity,
the background. Let ĥt be a reference histogram of the tar- i.e. MD (u) = 1 for all pixels u.
get, let xt be some hypothesized state of the target and let
hA and hB be the histograms at that state, sampled on the
7 Adaptation
current and the background image, respectively. Further-
more, let β denote a portion of all pixels within the elliptical
region, that are not assigned to the background by the mask As the player moves across the playground, its texture
function M (·). The proposed measure is then defined as varies due to nonuniform lighting conditions, influences of
the background and variations in the player’s pose. To im-
D(hA , ĥt ; hB ) = β −1 Dn (hA , ĥt ; hB ), (12) prove tracking under these conditions, we adapt the refer-
ence histogram ĥt at time-step t with the histogram hot ,
where Dn (hA , ĥt ; hB ) is the normalized distance between sampled at the current smoothed state ot (Eq. 7) of the tar-
hA and ĥt given the background histogram hB and is de- get. The extent by which the model adapts to the target is
fined as determined by the normalized distance function defined in
%(hA , ĥt ) Eq. (13); that is, if the smoothed state is likely to have been
Dn (hA , ĥt ; hB ) = q . (13) falsely estimated, then the model histogram should be up-
%(hB , ĥt )2 + %(hA , ĥt )2 dated by a very small amount, or not at all, and it should be
3
adapted by some larger amount otherwise. The adaptation
s3
equation follows the form of
α = Ωmax · (1.0 − Dn (hA , ĥ− Figure 1. Four seeds divide the space into
t ; hB )), (18)
four pairwise-disjoint convex Voronoi parti-
where Ωmax denotes the maximal adaptation. tions.
4
is selected, its mask function (1) M (u) (Eq. 21) is con- wore white dresses, and the other six wore dark dresses.
structed and its C ONDENSATION iteration is processed. Af- The playground was mainly yellow and blue, with a few
ter completing this iteration, a smoothed estimate of the first advertisements, and it influenced the colors of the players
player’s average state (1) ot becomes available (Eq. 7) and severely: white players looked yellow on the yellow part
its corresponding weight is calculated as πot = p(yt |ot ). of the playground and became blue when they moved on
A histogram is sampled at that state on the current image the blue part. Since the handball is a sport where contacts
and the model (1) ĥt is adapted according to Eq. (17). As and near-occlusions happen frequently and some players of
described in the Sect. (6) a treshold for the mask function the same team often move close to one another, we feel it
(1)
MD (·) is calculated within the ellipse of the state (1) ot . is a good testing ground for multiple-player trackers. In
Finally, the seed s1 in the set S is replaced by the smoothed Fig. (2), an image from the entire test sequence is presented,
average state (1) ot and the Voronoi partitioning is recalcu- along with some of the contacts that happened during that
lated. This procedure is repeated for the tracker with index sequence and two examples of players moving over the ad-
j = 2 and continued in the same manner, until all of the vertisement.
trackers are processed. The algorithm is depicted by the
Alg. (8.1).
5
Table 1. Tracker parameters used in experiments.
ellipse size parameter space Hx , Hy ∈ [6, 10]
standard deviations of dynamics σxy = 0.3, σab = 0.05
standard deviation for the state smoothing σv = 4.3f rames or σv = 0.172s
color likelihood parameters γ̂1 = 2.242, γ̂2 = 0.055
number of cells in a color RGB histogram B = 8x8x8
maximal histogram adaptation Ωmax = 0.04
number of samples in a single particle filter N = 20
threshold for masking %thres = 1.56
the state smoothing parameter σo = 4.3