Football Analysis
Center for Automation Research, University of Maryland, College Park, MD 20742, USA
number $z_A$ can be estimated by Monte Carlo simulation, which is however not needed in this work.

To get a new instance of co-occurrence function $f$ of type $A$, we need to generate new samples from $P(f \mid A)$. For this purpose, we generate a function $f'$ in $\mathcal{T}_{\mu_A}$, the tangent space at $\mu_A$, such that $\langle f', f' \rangle = 1$. Then we generate a Gaussian random number $r \sim \mathcal{N}(0, \sigma_A^2)$ and obtain the new co-occurrence function as $f = \cos(r)\mu_A + \sin(r) f'$. One issue associated with this approach is that negative values of $f$ may occur, and in this case we discard the generated sample. Another issue comes from the integer requirement of $F$ which may not be satisfied by the generated function. To overcome it we change the generated co-occurrence function to the closest one of integer value.

2) Generating Ground Plane Motion Pattern. As mentioned, the complete motion trajectories in the ground plane consist of those involving both offensive players and others. We model the generation of these motions in two steps: 1) generating the offensive ones from the co-occurrence function, i.e., $P(D \mid f, A)$; and 2) generating the entire set of motions by $P(T \mid D)$.

$$P(v) = \sum_{k=1}^{K} \pi_k \, p_C(v; \mu_k, \Sigma_k) \qquad (3)$$

where $\pi_k$’s are the mixing probabilities, $p_C$ is a single ‘curved’ Gaussian distribution defined on $\mathbb{GL}(3)$, $\mu_k$ is the mean of each Gaussian, and $\Sigma_k$ the covariance defined in the tangent space at $\mu_k$.

We compute the plane homographies from each of the training videos by locating field markers in the image. The CGMM, from a collection of training views $\{v_j\}_{j=1}^{M}$, is then learned as follows. We first cluster $v_j$’s into different components by computing a pairwise intrinsic metric between each pair $(v_1, v_2)$ as $d(v_1, v_2) = \lVert \log(v_1^{-1} v_2) \rVert$. From the pairwise similarity metric one can employ any suitable unsupervised clustering technique to cluster $v_j$’s. Here we make use of the repeated quadratic programming algorithm used in [5]. Once we have obtained $K$ clusters (components), each of which contains $M_k$ samples, we may estimate the $k$th mixing probability as $\pi_k = M_k / M$. Then the center of each component is estimated from the samples clustered into that component, using exactly the iterations between the exponential map and the logarithmic map for Lie groups (see Appendix). Finally, the covariance $\Sigma_k$ is calculated
as normally done in the tangent space at $\mu_k$, which contains the logarithmically mapped component samples from $\mathbb{GL}(3)$. To simulate a new view from the learned CGMM, we randomly select a component according to the mixing probability, locate the center, generate a Gaussian random matrix with the covariance in the tangent plane,

[Figure 1 panel titles: (a) Sample snapshot; (b) Ground truth trajectories; (c) Tracking; (d) Trajectories from tracking]
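The CGMM steps described above (intrinsic metric, component center, tangent-space covariance, and view simulation) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the norm is taken to be Frobenius, views are assumed close enough that the principal matrix logarithm is real, no homography scale normalization is applied, and the clustering step from [5] is left out (cluster membership is taken as given). The exponential and logarithmic maps follow the matrix Lie group forms $v_m \exp(v_m^{-1} v')$ and $v_m \log(v_m^{-1} v)$.

```python
import numpy as np
from scipy.linalg import expm, logm

def intrinsic_distance(v1, v2):
    """d(v1, v2) = || log(v1^{-1} v2) ||; Frobenius norm is an assumption."""
    return np.linalg.norm(np.real(logm(np.linalg.solve(v1, v2))))

def log_map(v_m, v):
    """Tangent vector at v_m pointing to v: v_m log(v_m^{-1} v)."""
    return v_m @ np.real(logm(np.linalg.solve(v_m, v)))

def exp_map(v_m, v_t):
    """Group element reached from v_m along tangent v_t: v_m exp(v_m^{-1} v_t)."""
    return v_m @ expm(np.linalg.solve(v_m, v_t))

def intrinsic_mean(samples, iters=20, tol=1e-10):
    """Alternate log map, averaging, and exp map until the update is negligible."""
    mu = samples[0].copy()
    for _ in range(iters):
        delta = np.mean([log_map(mu, v) for v in samples], axis=0)
        mu = exp_map(mu, delta)
        if np.linalg.norm(delta) < tol:
            break
    return mu

def fit_component(samples):
    """Center and 9x9 tangent-space covariance of one cluster of 3x3 views."""
    mu = intrinsic_mean(samples)
    tangents = np.array([log_map(mu, v).ravel() for v in samples])
    return mu, np.cov(tangents, rowvar=False)

def sample_view(weights, mus, covs, rng):
    """Pick a component by its mixing probability, then draw around its center."""
    k = rng.choice(len(weights), p=weights)
    tangent = rng.multivariate_normal(np.zeros(9), covs[k]).reshape(3, 3)
    return exp_map(mus[k], tangent)
```

With cluster labels in hand, the mixing probabilities would simply be the cluster sizes divided by the total number of training views, and `fit_component` would be called once per cluster.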
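The co-occurrence sampling step described earlier, $f = \cos(r)\mu_A + \sin(r) f'$ with rejection of negative samples, can be illustrated with a short sketch. The text does not say how the unit tangent direction $f'$ is chosen, so drawing it uniformly by projecting a Gaussian vector onto the tangent space is an assumption here, as are the function name and the `max_tries` guard. The final rounding to integer values is omitted because the mapping between $f$ and $F$ is not given in this excerpt.

```python
import numpy as np

def sample_cooccurrence(mu_a, sigma_a, rng, max_tries=1000):
    """Draw one co-occurrence function around the unit-norm mean mu_a."""
    for _ in range(max_tries):
        # Unit tangent direction f' at mu_a: orthogonal to mu_a, <f', f'> = 1.
        # Uniform choice of direction is an assumption of this sketch.
        g = rng.standard_normal(mu_a.shape)
        f_t = g - np.dot(g, mu_a) * mu_a
        f_t /= np.linalg.norm(f_t)
        r = rng.normal(0.0, sigma_a)  # r ~ N(0, sigma_a^2)
        f = np.cos(r) * mu_a + np.sin(r) * f_t
        if np.all(f >= 0):  # discard samples with negative values
            return f
    raise RuntimeError("no non-negative sample found")
```

Because $\mu_A$ and $f'$ are orthonormal, every accepted sample stays on the unit sphere, which is what makes the cosine and sine weights a valid exponential map.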
Fig. 2. The recognition rates (%): D, M, and W stand for Dropback, Middle&right Run, and Wideleft Run respectively.

random division of sample collection into training (approximately 80%) and testing (approximately 20%) sets. The homographies corresponding to the view changes are determined by locating the landmark points on the football field. The free parameters 𝛼2 and 𝛾 are simply taken as the variation of all pairwise distances between trajectories and the normalizing factor $N$ cancels out. Since the number of training samples is limited, we augment the size of the training set during each training process as follows. We first learn a trajectory vocabulary for the trajectories in the original training set. Then we perturb each original trajectory to get new ones and search for the new ‘word’ label for the new trajectories. The perturbation is realized by adding 2-D isotropic Gaussian noise to the ground-plane coordinates at particular time instants (0%, 20%, 40%, 60%, 80%, 100% of the video duration) and polynomially interpolating the other time instants. Moreover, we shift the location of each trajectory (original and generated) entirely with another Gaussian. In this way, for each original training play we get 20 synthetic plays, and thus the eventual size of the training set is 20 times more than the original one.

For each testing play, we use the multi-object tracker reported in [9] to generate trajectories, due to its good performance on tracking soccer players. The tracking results are shown in Figure 1(c)(d), as snapshots with bounding boxes and the tracks in the ground plane. We generate $5 \times 10^4$ Monte Carlo samples for each testing run and assume a uniform prior class probability. To evaluate the effectiveness of the models for statistical view change and the optimal assignment, we design two baselines for a comparative study. The first, namely ‘random view selection’ (RVS), does not simulate a homography from the learned CGMM, but randomly picks out one from all available training ones. The second baseline, denoted as ‘nearest player selection’ (NPS), picks out the spatially closest player in the testing play to match with each of the simulated relevant players, instead of performing the Kuhn-Munkres assignment. The play recognition results are shown in Figure 2. The proposed method outperforms all baselines and an average recognition rate of approximately 70% is obtained.

5. DISCUSSION

We have proposed an algorithm to recognize football play strategies from realistic sports videos, and have shown preliminary empirical performance of the approach. We believe that techniques for extracting high-level semantics in sports videos are worth continuous investigation. For example, temporal detection of a particular play is also of interest. It is also useful to model the interactions between two groups (e.g., taking the defensive side into account as well). We may also consider incorporating articulated motion features, besides simply point motion paths, to establish a ‘panoramic’ characterization for a play, which may help achieve more accurate recognition.

6. APPENDIX

For spatial co-occurrence functions, the exponential map $\mathcal{E}_{f_m} : \mathcal{T}_{f_m} \to \mathcal{F}$ for $f' \in \mathcal{T}_{f_m}$ is defined as

$$\mathcal{E}_{f_m}(f') = \cos\left(\langle f', f' \rangle^{1/2}\right) f_m + \frac{\sin\left(\langle f', f' \rangle^{1/2}\right)}{\langle f', f' \rangle^{1/2}} f'.$$

The logarithmic map $\mathcal{L}_{f_m} : \mathcal{F} \to \mathcal{T}_{f_m}$ is then given by

$$\mathcal{L}_{f_m}(f) = \frac{\arccos(\langle f, f_m \rangle)}{\langle f^*, f^* \rangle^{1/2}} f^*, \quad \text{where } f^* = f - \langle f, f_m \rangle f_m.$$

For matrix Lie groups, the exponential map $\mathcal{E}_{v_m} : \mathcal{T}_{v_m} \to \mathbb{GL}(3)$ for $v' \in \mathcal{T}_{v_m}$ is given by $\mathcal{E}_{v_m}(v') = v_m \exp(v_m^{-1} v')$. The logarithmic map $\mathcal{L}_{v_m} : \mathbb{GL}(3) \to \mathcal{T}_{v_m}$, meanwhile, is $\mathcal{L}_{v_m}(v) = v_m \log(v_m^{-1} v)$.

Acknowledgment: This research was partially supported by DARPA VIRAT Phase I program.

7. REFERENCES

[1] M. Lazarescu and S. Venkatesh, “Using camera motion to identify different types of American football plays,” in ICME, 2003, pp. 181-184.
[2] T. Liu, W. Ma, and H. Zhang, “Effective feature extraction for play detection in American football video,” in MMM, 2005.
[3] C. Huang, H. Shih, and C. Chao, “Semantic analysis of soccer video using dynamic Bayesian network,” IEEE Transactions on Multimedia, vol. 8, no. 4, pp. 749-760, 2006.
[4] S. Intille and A. Bobick, “Recognizing planned, multiperson action,” Computer Vision and Image Understanding, vol. 81, pp. 414-445, 2001.
[5] R. Li and R. Chellappa, “Recognizing coordinated multi-object activity using a dynamic event ensemble model,” in ICASSP, 2009.
[6] R. Li, R. Chellappa, and S. Zhou, “Learning multi-modal densities on discriminative temporal interaction manifold for group activity recognition,” in CVPR, 2009.
[7] G. Zhu et al., “Trajectory based event tactics analysis in broadcast sports video,” in ACM MM, 2007.
[8] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, no. 2, pp. 83-97, 1955.
[9] S. W. Joo and R. Chellappa, “A multiple-hypothesis approach for multiobject visual tracking,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2849-2854, 2007.
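The training-set augmentation described in the experiments (Gaussian perturbation at six time instants, polynomial interpolation in between, then a whole-trajectory shift) could look roughly like the sketch below. It reflects one reading of the text: the perturbation offsets, rather than the coordinates themselves, are interpolated with a degree-5 polynomial and added to the original path. The function names and the two standard-deviation parameters are hypothetical, and the re-labeling of perturbed trajectories against the learned vocabulary is not shown.

```python
import numpy as np

def perturb_trajectory(traj, knot_sigma, shift_sigma, rng):
    """Return a perturbed copy of traj, a (T, 2) array of ground-plane points."""
    t = np.linspace(0.0, 1.0, len(traj))   # normalized time over the play
    knots = np.linspace(0.0, 1.0, 6)       # 0%, 20%, ..., 100% of the duration
    # Isotropic 2-D Gaussian offsets at the six knots.
    offsets = rng.normal(0.0, knot_sigma, size=(6, 2))
    out = traj.astype(float).copy()
    for d in range(2):
        # A degree-5 polynomial passes exactly through the six knot offsets.
        coeffs = np.polyfit(knots, offsets[:, d], deg=5)
        out[:, d] += np.polyval(coeffs, t)
    # Shift the whole trajectory with another Gaussian draw.
    out += rng.normal(0.0, shift_sigma, size=2)
    return out

def augment_play(play, n_copies, knot_sigma, shift_sigma, rng):
    """Make n_copies synthetic plays from one play (a list of trajectories)."""
    return [[perturb_trajectory(tr, knot_sigma, shift_sigma, rng) for tr in play]
            for _ in range(n_copies)]
```

Calling `augment_play(play, 20, ...)` would produce the 20 synthetic plays per original play mentioned in the text.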
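The difference between the ‘nearest player selection’ (NPS) baseline and a one-to-one optimal assignment can be seen in a toy example. The sketch below is a simplification under stated assumptions: players are reduced to single ground-plane points, Euclidean distance stands in for whatever matching cost the method uses, and the optimal matching is found by exhaustive search rather than by the Kuhn-Munkres algorithm, which returns the same minimum-cost matching but scales to realistic problem sizes.

```python
from itertools import permutations
from math import dist

def nearest_player_selection(simulated, tracked):
    """NPS baseline: each simulated player independently takes its closest
    tracked player, so one tracked player may be chosen more than once."""
    return [min(range(len(tracked)), key=lambda j: dist(s, tracked[j]))
            for s in simulated]

def optimal_assignment(simulated, tracked):
    """One-to-one matching with minimum total distance, by exhaustive search
    (a stand-in for Kuhn-Munkres that is only practical for tiny inputs)."""
    best = min(
        permutations(range(len(tracked)), len(simulated)),
        key=lambda p: sum(dist(s, tracked[j]) for s, j in zip(simulated, p)),
    )
    return list(best)

simulated = [(0.0, 0.0), (1.0, 0.0)]
tracked = [(0.6, 0.0), (3.0, 0.0)]
print(nearest_player_selection(simulated, tracked))  # [0, 0]
print(optimal_assignment(simulated, tracked))        # [0, 1]
```

In this example NPS maps both simulated players to the same tracked player, while the one-to-one assignment keeps them distinct at a slightly higher total distance.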