Video Object Tracking
Andrea Cavallaro
andrea.cavallaro@elec.qmul.ac.uk
http://www.elec.qmul.ac.uk/staffinfo/andrea
20/07/2006
Outline
• Introduction
• Tracking algorithms
• Mean-shift
• Particle filtering
• Graph-matching
• Integration of detection and tracking
• Integration of audio and video
• Event detection
Framework
[Figure: processing framework, with background update and the use of a priori information and information from other cameras]
Why is object detection not enough?
Object tracking
Object tracking: examples
Problem statement
• Objective
• To predict the target state (position, shape) over time
• Problems
• Changes in pose and illumination
• Partial and total occlusions
• Clutter and targets with similar appearance
• Steps
• Target representation → Normalised colour histogram
• Likelihood of a candidate → Based on the Bhattacharyya coefficient
• Tracking algorithms
• Mean shift (MS)
• Particle filter (PF)
Likelihood
• Colour → RGB space → 3D colour histogram h with 10x10x10 bins
• Bhattacharyya distance d between h and the reference histogram h_ref
$p(C \mid X_t) = e^{-d^2(h, h_{ref})/\sigma}$
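A minimal sketch of the colour likelihood above, assuming 8-bit RGB pixels and the common definition d = sqrt(1 - ρ), where ρ is the Bhattacharyya coefficient (the slide does not state how d is derived from ρ). The function names and the value of σ are illustrative, not taken from the original.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B), assumed


def colour_histogram(pixels: Sequence[Pixel], bins: int = 10) -> List[float]:
    """Normalised bins x bins x bins RGB histogram of a target region."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantise each 0..255 channel into `bins` levels.
        ir, ig, ib = (min(c * bins // 256, bins - 1) for c in (r, g, b))
        hist[(ir * bins + ig) * bins + ib] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Assumed definition d = sqrt(1 - rho); max() guards against rounding below zero.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


def colour_likelihood(h: Sequence[float], h_ref: Sequence[float], sigma: float) -> float:
    """p(C | X_t) = exp(-d^2(h, h_ref) / sigma)."""
    d = bhattacharyya_distance(h, h_ref)
    return math.exp(-d * d / sigma)
```

A candidate whose histogram matches the reference gets likelihood 1; a candidate with no colour overlap gets exp(-1/σ), so σ controls how sharply the likelihood separates good and bad candidates.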
Mean shift: description
• Mean shift
• Deterministic non-parametric approach
• Iterative procedure
• Kernel-based
• Gradient-based approach
• Effective when the distance function is smooth (kernel)
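The iterative, kernel-based gradient ascent described above can be sketched as follows. This is a generic mean-shift mode seeker over weighted 2D samples; the Gaussian kernel, the bandwidth and the stopping threshold are assumptions. In histogram-based tracking the samples would be pixel positions and the weights would come from comparing candidate and reference histograms.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_shift(start: Point, samples: Sequence[Point], weights: Sequence[float],
               bandwidth: float, max_iter: int = 50, eps: float = 1e-3) -> Point:
    """Climb to a local mode of a weighted kernel density estimate (Gaussian kernel)."""
    y = start
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (sx, sy), w in zip(samples, weights):
            d2 = ((sx - y[0]) ** 2 + (sy - y[1]) ** 2) / bandwidth ** 2
            k = w * math.exp(-0.5 * d2)  # kernel weight of this sample
            num_x += k * sx
            num_y += k * sy
            den += k
        if den == 0.0:  # no support under the kernel: stay put
            break
        new_y = (num_x / den, num_y / den)  # weighted mean = y + mean-shift vector
        shift = math.hypot(new_y[0] - y[0], new_y[1] - y[1])
        y = new_y
        if shift < eps:  # converged: the gradient step has become negligible
            break
    return y
```

Because each step moves to a kernel-weighted mean, the procedure is deterministic and needs no explicit gradient; a distant cluster contributes almost nothing, which is why mean shift finds a local (not global) mode.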
Particle filter: description
• State: $x_k = f_k(x_{k-1}, u_k)$
• Observation: $z_k = h_k(x_k, n_k)$
• Objective
• To estimate the unknown state $x_k$ based on a sequence of observations $z_k$, $k = 0, 1, \ldots$
• Find the posterior distribution $p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta(x_k - x_k^i)$
• Solution (Bayesian)
• Prediction step: based on the state equation
• Update step: based on the likelihood function
• Typical state transition models
• Zero-order model: $x_k = x_{k-1} + u_k$
• Limitation: random positioning of the particles
• First-order model: $x_k = x_{k-1} + \theta_{k-1} + u_k$
• Limitation: highly manoeuvring targets
• Average state velocity in the previous n frames: $C_k \propto \frac{1}{n} \sum_{t=k-n-1}^{k-1} |x_t - x_{t-1}|$
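One prediction/update cycle can be sketched for a scalar state, assuming the zero-order model with Gaussian noise and a hypothetical Gaussian likelihood (the slides use a colour likelihood instead). Re-sampling is deliberately left out of this step.

```python
import math
import random
from typing import Callable, List, Tuple


def gaussian_likelihood(z: float, x: float, std: float = 1.0) -> float:
    """Hypothetical p(z_k | x_k): Gaussian measurement noise around the state."""
    return math.exp(-0.5 * ((z - x) / std) ** 2)


def pf_step(particles: List[float], weights: List[float], z: float,
            likelihood: Callable[[float, float], float], noise_std: float,
            rng: random.Random) -> Tuple[List[float], List[float], float]:
    """One prediction/update cycle of a scalar particle filter."""
    # Prediction: zero-order model x_k = x_{k-1} + u_k, with u_k ~ N(0, noise_std^2).
    predicted = [x + rng.gauss(0.0, noise_std) for x in particles]
    # Update: w_k^i is proportional to w_{k-1}^i * p(z_k | x_k^i).
    new_weights = [w * likelihood(z, x) for w, x in zip(weights, predicted)]
    total = sum(new_weights)
    if total > 0.0:
        new_weights = [w / total for w in new_weights]
    else:  # every particle missed the observation: fall back to uniform weights
        new_weights = [1.0 / len(predicted)] * len(predicted)
    # Point estimate E[x_k] from the weighted particle set.
    estimate = sum(w * x for w, x in zip(new_weights, predicted))
    return predicted, new_weights, estimate
```

The weighted particles and their weights are exactly the terms $x_k^i$ and $w_k^i$ of the posterior approximation; repeating this step without re-sampling lets a few weights dominate.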
Re-sampling
• Problem: weight degeneration
• Solution: re-sampling (eliminates particles with small weights)
[Figure: posterior over x before and after re-sampling]
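A sketch of re-sampling, assuming systematic re-sampling (the slide does not name a scheme) and the usual effective-sample-size measure of weight degeneration. Particles with small weights are rarely or never drawn, while heavy ones are duplicated.

```python
import random
from typing import List, Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """N_eff = 1 / sum(w^2): N for uniform weights, 1 when one particle dominates."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(particles: Sequence[float], weights: Sequence[float],
                        rng: random.Random) -> List[float]:
    """Draw len(particles) particles with probability proportional to their weights."""
    n = len(particles)
    step = 1.0 / n
    offset = rng.uniform(0.0, step)  # a single random offset shared by all draws
    resampled: List[float] = []
    cumulative = 0.0  # total weight of the particles before index i
    i = 0
    for m in range(n):
        target = offset + m * step
        # Advance to the particle whose cumulative-weight interval contains target.
        while i < n - 1 and cumulative + weights[i] <= target:
            cumulative += weights[i]
            i += 1
        resampled.append(particles[i])
    return resampled
```

After re-sampling, all weights are reset to 1/N. A common (assumed) policy is to re-sample only when N_eff drops below a threshold such as N/2.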
Hybrid tracker
• Steps: re-sample particles → apply state transition → apply MS to each particle → estimate E[·]
• Adaptive state transition model: zero-order model with adaptive noise variances
$x_k = x_{k-1} + C_k u_k, \qquad C_k \propto \frac{1}{n} \sum_{t=k-n-1}^{k-1} |x_t - x_{t-1}|$
• Posterior: $p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta(x_k - MS(x_k^i))$
• The mean-shift operator acts on the 2D position state space only
• Advantages
- The extra computation is compensated by using fewer particles
[Figure: particle distribution and state estimate E[x]]
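The adaptive transition step can be sketched as below, assuming a 2D position state and a per-axis noise scale. For simplicity the sketch averages the absolute displacement over the last n steps rather than reproducing the slide's exact summation limits; `gain` (the proportionality constant) and `min_scale` are hypothetical parameters, and the per-particle mean-shift step is omitted.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # 2D position (x, y), assumed


def adaptive_scale(history: Sequence[State], n: int, gain: float = 1.0,
                   min_scale: float = 1.0) -> Tuple[float, float]:
    """Per-axis noise scale C_k proportional to the mean |x_t - x_{t-1}| over the last n steps.

    `gain` (the proportionality constant) and `min_scale` (a floor so the noise never
    vanishes for a static target) are hypothetical parameters.
    """
    recent = list(history)[-(n + 1):]  # n displacements need n + 1 estimates
    if len(recent) < 2:
        return (min_scale, min_scale)
    steps = len(recent) - 1
    vx = sum(abs(b[0] - a[0]) for a, b in zip(recent, recent[1:])) / steps
    vy = sum(abs(b[1] - a[1]) for a, b in zip(recent, recent[1:])) / steps
    return (max(min_scale, gain * vx), max(min_scale, gain * vy))


def propagate(particles: Sequence[State], scale: Tuple[float, float],
              rng: random.Random) -> List[State]:
    """Zero-order transition x_k = x_{k-1} + C_k u_k with u_k ~ N(0, I)."""
    cx, cy = scale
    return [(x + cx * rng.gauss(0.0, 1.0), y + cy * rng.gauss(0.0, 1.0))
            for x, y in particles]
```

A fast-moving target widens the particle spread, while a slow one keeps particles concentrated, which is what lets the hybrid tracker work with fewer particles.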
Results
• Initialisation
• Ground-truth initialisation of the target
• Parameters
• Histograms: 10x10x10 (RGB)
• MS: run 5 times with different kernel sizes (±10%)
• PF, HT (hybrid tracker): 3D state model (for comparison with MS): position and target size
• Transition model: $\sigma_x = \sigma_y = 14$; $\sigma_h = 0.013$; $k_s = 5$; $k_p = 10$
• PF: 150 samples; HT: 30 samples
• Presentation of results
• Videos
• Sample frames & objective measure
Evaluation
• Subjective evaluation
• Side-by-side visual comparison of tracking results
• $ASE = W^2 + H^2$, with W the width error and H the height error with respect to the ground truth
Results: highway
[Figure: sample frames for MS, PF and the proposed tracker]
Evaluation: highway
        MS      PF      Proposed
APE     0.95    12.8*   0.88
ASE     2.74    22.3*   3.58
[Plots: MS, PF, Proposed]
Results: soccer
[Figure: sample frames for MS, PF and the proposed tracker]
Evaluation: soccer
        MS      PF      Proposed
APE     242*    3.9     3.2
ASE     18.2*   10.8    9.8
[Plots: MS, PF, Proposed]
Results: table tennis
[Figure: sample frames for MS, PF and the proposed tracker]
[Plots: MS, PF, Proposed]
Results: emilio
[Figure: sample frames for MS, PF and the proposed tracker]
Multiple object tracking
$g(X_i^a, X_j^b) = \alpha\, g_1(X_i^a, X_j^b) + \beta\, g_2(X_i^a, X_j^b) + \gamma\, g_3(X_i^a, X_j^b) + \delta\, g_4(X_i^a, X_j^b) - (j - i - 1)\,\tau$
[Figure: graph with vertex sets V1, V2, V3 and vertices v(x11), ..., v(x43); labelled terms include velocity and size]
Graph matching: max path cover
[Figure: maximum path cover on the graph with vertex sets V1, V2, V3]
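A sketch of how the edge weight g and a path cover could be combined. It uses a greedy approximation (heaviest edges first, at most one successor and one predecessor per vertex), which is an assumption and is not guaranteed to find the maximum-weight cover. The node fields and the two similarity terms stand in for the slide's g1 to g4 and are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Node = Dict[str, float]  # hypothetical node: {"frame", "x", "y", "size"}
Term = Callable[[Node, Node], float]


def edge_weight(a: Node, b: Node, terms: Sequence[Term],
                coeffs: Sequence[float], tau: float) -> float:
    """g(X_i^a, X_j^b): weighted sum of similarity terms minus a frame-gap penalty."""
    gap = int(b["frame"] - a["frame"])
    return sum(c * g(a, b) for c, g in zip(coeffs, terms)) - (gap - 1) * tau


def greedy_path_cover(nodes: Sequence[Node], terms: Sequence[Term],
                      coeffs: Sequence[float], tau: float,
                      max_gap: int = 2) -> List[List[int]]:
    """Greedy approximation of a maximum-weight path cover; tracks are lists of node indices."""
    edges = []
    for u, a in enumerate(nodes):
        for v, b in enumerate(nodes):
            gap = b["frame"] - a["frame"]
            if 1 <= gap <= max_gap:
                w = edge_weight(a, b, terms, coeffs, tau)
                if w > 0:
                    edges.append((w, u, v))
    succ: Dict[int, int] = {}
    has_pred = set()
    # Accept the heaviest edges first, keeping at most one successor and one predecessor.
    for w, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        if u not in succ and v not in has_pred:
            succ[u] = v
            has_pred.add(v)
    tracks = []
    for start in range(len(nodes)):
        if start in has_pred:
            continue
        track = [start]
        while track[-1] in succ:  # frames strictly increase, so no cycles
            track.append(succ[track[-1]])
        tracks.append(track)
    return tracks


# Two hypothetical similarity terms (the slide's g1..g4 are not specified here).
def position_term(a: Node, b: Node) -> float:
    return math.exp(-math.hypot(b["x"] - a["x"], b["y"] - a["y"]) / 5.0)


def size_term(a: Node, b: Node) -> float:
    return min(a["size"], b["size"]) / max(a["size"], b["size"])
```

The penalty $(j - i - 1)\tau$ lets an edge skip frames (for example across a missed detection) only when the similarity is high enough to pay for the gap.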
• Detection
• Usually frame-based
• Can be improved with temporal features (e.g., pedestrians)
• Trained classifier
• Choice of training set
• Choice of negative examples
• Choice of poses covered in the training set
• Tracking
• Propagates the initialisation information
• Model: template, statistical representation, parts, …
• Should update the model
• Should self-initialise
→ Integration!
• Problem
• Detecting objects (e.g., faces) in clutter
• Tracking multiple objects (e.g., faces) under occlusions
Face classifier
• Approach
• Cascade of classifiers
• Integral image
• Haar features for face detection
• Training
• Set of scales
• Output
• Few false negatives
• Many false positives …
[Figure: results of the face detector only vs. the face detector with chromaticity segmentation]
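The integral image listed above can be sketched as follows: once it is built, the sum over any rectangle costs four lookups, which is what makes evaluating many Haar-like features at several scales affordable. The specific two-rectangle feature layout below is a hypothetical example.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups, any size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```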
Detection and tracking
• Removal of overlapping tracks
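One way to remove overlapping tracks is sketched below, assuming boxes given as (x, y, width, height), a per-track confidence score and an intersection-over-union (IoU) threshold. The slide does not specify the overlap criterion, so IoU and the 0.5 default are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping(tracks: Sequence[Tuple[float, Box]],
                       threshold: float = 0.5) -> List[Tuple[float, Box]]:
    """Greedily keep higher-scoring tracks, dropping any box with IoU >= threshold to a kept one."""
    kept: List[Tuple[float, Box]] = []
    for score, box in sorted(tracks, key=lambda t: t[0], reverse=True):
        if all(iou(box, other) < threshold for _, other in kept):
            kept.append((score, box))
    return kept
```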
Particle (temporal) filtering for face tracking with integration
Detection and tracking
• Particle spread around detections
• Colour model update
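The two integration steps named above can be sketched as follows, under stated assumptions: particles are spread around each detection with Gaussian position jitter (size kept), and the colour model is updated by blending the reference histogram with the newest observation. The jitter scale `pos_std` and blending factor `alpha` are hypothetical; the slides do not give the update rule.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def spread_particles(detections: Sequence[Box], n_per_detection: int,
                     pos_std: float, rng: random.Random) -> List[Box]:
    """Place particles around each detection by jittering its position (size kept)."""
    particles: List[Box] = []
    for x, y, w, h in detections:
        for _ in range(n_per_detection):
            particles.append((x + rng.gauss(0.0, pos_std), y + rng.gauss(0.0, pos_std), w, h))
    return particles


def update_colour_model(h_ref: Sequence[float], h_new: Sequence[float],
                        alpha: float) -> List[float]:
    """Blend the reference histogram towards the latest observation (0 <= alpha <= 1)."""
    blended = [(1.0 - alpha) * r + alpha * c for r, c in zip(h_ref, h_new)]
    total = sum(blended)
    return [v / total for v in blended] if total > 0 else blended
```

A small `alpha` makes the model robust to brief occlusions; a large one adapts faster to illumination and pose changes but risks drifting onto the occluder.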
Automatic tracking with a PTZ camera
Multi-modal data fusion using particle filter (PF)
State: $X_t = \{x, y, width, height\}$
[Figure: fusion block diagram. Audio: reverberation filtering (onset detection, multi-band analysis) and GCCF give $p(A \mid X_t)$. Video: the colour histogram feature gives $p(C \mid X_t)$; change detection gives a motion feature, modelled with a multivariate Gaussian, giving $p(M \mid X_t)$. The diagram also shows $p(D \mid X_t)$. The likelihoods feed the PF.]
• Overall likelihood
$p(O \mid X_t) = p(M \mid X_t)\, p(C \mid X_t)\, p(A \mid X_t)$
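Audio likelihood
• Time delay of arrival (TDOA)
$s_1(t) = v(t) + n_1(t)$
$s_2(t) = \lambda v(t + \tau) + n_2(t)$
where v(t) is the speaker signal, λ the attenuation, n_1(t) and n_2(t) the noise at the two microphones, and τ the time delay
[Figure: speaker at angle θ to microphones M1 and M2, a distance d apart; path difference d sin θ]
• Reverberation filtering
• Onset detection based on the precedence effect
• Multi-band analysis
[Figure: s_1(t) and s_2(t) are filtered and passed through onset detection; GCC-PHAT is computed in bands ω1, ω2, ω3 and summed to give $\hat{R}_{s_1 s_2}(f)$]
$p(A \mid X_t) = \frac{1}{\sigma_A \sqrt{2\pi}} \exp\left(-\frac{\left(\hat{\varsigma}_A(\hat{R}_{s_1 s_2}) - x_t\right)^2}{2\sigma_A^2}\right)$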
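A single-band sketch of the audio chain and the fusion rule. GCC-PHAT keeps only the phase of the cross-spectrum, so a pure delay becomes a sharp correlation peak. Reverberation filtering, onset detection and the multi-band sum are omitted, a naive O(N^2) DFT keeps the example dependency-free (a real system would use an FFT), and the mapping from the correlation peak to an image position is not given on the slides, so `audio_likelihood` takes that position as an input.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (keeps the sketch dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(s1: Sequence[float], s2: Sequence[float], max_lag: int) -> int:
    """Integer lag D (in samples) such that s2[t] is approximately s1[t - D]."""
    n = len(s1)
    spec1, spec2 = dft(s1), dft(s2)
    weighted = []
    for a, b in zip(spec1, spec2):
        cross = b * a.conjugate()
        mag = abs(cross)
        # PHAT weighting: keep only the phase of the cross-spectrum.
        weighted.append(cross / mag if mag > 1e-12 else 0j)
    corr = [v.real for v in idft(weighted)]
    # Circular correlation: negative lags are stored at the end of the array.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])


def audio_likelihood(x_audio: float, x_t: float, sigma_a: float) -> float:
    """Gaussian p(A | X_t) centred on the audio-derived horizontal position."""
    return (math.exp(-((x_audio - x_t) ** 2) / (2.0 * sigma_a ** 2))
            / (sigma_a * math.sqrt(2.0 * math.pi)))


def fused_likelihood(p_motion: float, p_colour: float, p_audio: float) -> float:
    """Overall likelihood p(O | X_t) as the product of the per-modality likelihoods."""
    return p_motion * p_colour * p_audio
```

The product rule treats the modalities as conditionally independent given the state, so a particle must be supported by motion, colour and audio at once to receive a high weight.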
Comparison
Audio Visual
50 313 425
Results – scene dynamics for teleconferencing
Event detection
Contextual information
• Scene modelling
• Gaussian for each area of interest
• Outside zone → modelled with multiple Gaussians
[Figure: two scene models in image coordinates, with zones and events labelled outside_zone, inside_zone, enters_zone, opens, go_down_stairs, go_up_stairs]
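The scene model above can be sketched by scoring a tracked position against each zone and reporting zone changes along a track. The axis-aligned Gaussians, the equally weighted mixture for zones with several Gaussians, and maximum-likelihood assignment are assumptions; the zone names follow the figure.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Gaussian = Tuple[float, float, float, float]  # (mean_x, mean_y, std_x, std_y), axis-aligned (assumed)


def gaussian_pdf(p: Point, g: Gaussian) -> float:
    mx, my, sx, sy = g
    zx, zy = (p[0] - mx) / sx, (p[1] - my) / sy
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * math.pi * sx * sy)


def zone_of(p: Point, zones: Dict[str, List[Gaussian]]) -> str:
    """Label of the zone whose (equally weighted) Gaussian mixture scores p highest."""
    def score(components: List[Gaussian]) -> float:
        return sum(gaussian_pdf(p, g) for g in components) / len(components)
    return max(zones, key=lambda name: score(zones[name]))


def zone_transitions(track: Sequence[Point],
                     zones: Dict[str, List[Gaussian]]) -> List[Tuple[int, str, str]]:
    """(frame index, previous zone, new zone) for every change of zone along a track."""
    events: List[Tuple[int, str, str]] = []
    previous: Optional[str] = None
    for k, p in enumerate(track):
        current = zone_of(p, zones)
        if previous is not None and current != previous:
            events.append((k, previous, current))
        previous = current
    return events
```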
Object information
• Object detection and tracking
• Observations O
• Model parameters
• States $\omega = \{\omega_1, \omega_2, \omega_3, \ldots, \omega_N\}$
• Probabilities
• State transition probabilities $A = \{a_{ij}\}$
• Emission probabilities $B = \{b_j(O)\}$
• Initial state $\omega(0)$
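With the parameters above, the probability that a model generated an observation sequence can be computed with the forward algorithm. This sketch assumes discrete observation symbols; how the original system uses the model (scoring sequences or decoding states) is not stated, so this is one possible use.

```python
from typing import Sequence


def forward_probability(obs: Sequence[int], A: Sequence[Sequence[float]],
                        B: Sequence[Sequence[float]], pi: Sequence[float]) -> float:
    """P(O | A, B, pi) for a discrete HMM, computed with the forward algorithm.

    A[i][j]: transition probability from state i to state j (a_ij)
    B[j][o]: probability of emitting observation symbol o in state j (b_j(O))
    pi[j]:   probability that the initial state omega(0) is j
    """
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        # alpha_j(t) = (sum_i alpha_i(t-1) * a_ij) * b_j(o_t)
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)
```

With one model per event type (an assumption), an observed track could be labelled with the event whose model gives the highest probability.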
Results
Results
Summary
• Tracking algorithms
• Mean-shift
• Particle filtering
• Graph-matching
• Integration of detection and tracking
• Integration of audio and video
• Event detection
Acknowledgements
Emilio Maggio
Murtaza Taj
Matteo Bregonzio
Huiyu Zhou
Stefan Karlsson
EU FP7 project APIDIS (2008 – 2010)