
Video object tracking

Andrea Cavallaro

Queen Mary, University of London

andrea.cavallaro@elec.qmul.ac.uk
http://www.elec.qmul.ac.uk/staffinfo/andrea

[Figure: sequence of video frames timestamped 20/07/2006]
Outline
• Introduction
• Tracking algorithms
• Mean-shift
• Particle filtering
• Graph-matching
• Integration of detection and tracking
• Integration of audio and video
• Event detection

Framework

[Figure: processing chain with the blocks input video, pre-filtering, change detection (with background update), post-processing, tracking, 3D analysis and classification, event detection and symbols; a priori information and info from other cameras feed into the chain]
Framework

[Figure: the same chain with object detection instead of change detection; a priori information and info from other cameras are available to several blocks]
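Why is object detection not enough?

[Figure: detections in frame n and frame n+m, with question marks over which detection corresponds to which object]

Object tracking

[Figure: the same objects tracked from frame n to frame n+m]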
Object tracking: examples

Problem statement

• Objective
• To estimate the target state over time → position, shape

• Problems
• Changes in pose and illumination
• Partial and total occlusions
• Clutter and targets with similar appearance

• Steps
• Target representation → normalised colour histogram
• Likelihood of a candidate → based on the Bhattacharyya coefficient
• Tracking algorithms
• Mean shift (MS)
• Particle filter (PF)

Likelihood
• Color → RGB space → 3D color histograms
• 10x10x10 histogram (h); Bhattacharyya distance (d) to the reference histogram h_ref

$p(C \mid X_t) = e^{-\left(\frac{d(h,\, h_{ref})}{\sigma}\right)^2}$
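A minimal sketch of the colour likelihood above. It assumes 8-bit RGB pixels, uses the 10x10x10 binning from the slide, and takes the Bhattacharyya distance as $d = \sqrt{1 - \rho}$, with ρ the Bhattacharyya coefficient; that definition and the value of σ are assumptions, not taken from the slides.

```python
import math


def rgb_histogram(pixels, bins=10):
    """Normalised bins x bins x bins RGB histogram of 8-bit (r, g, b) pixels."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # map each 0..255 channel value to a bin index in 0..bins-1
        i, j, k = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[(i * bins + j) * bins + k] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def bhattacharyya_distance(h, h_ref):
    """Assumed definition: d = sqrt(1 - rho), rho = sum_u sqrt(h_u * h_ref_u)."""
    rho = sum(math.sqrt(p * q) for p, q in zip(h, h_ref))
    return math.sqrt(max(0.0, 1.0 - rho))


def colour_likelihood(h, h_ref, sigma):
    """p(C | X_t) = exp(-(d(h, h_ref) / sigma)^2)."""
    d = bhattacharyya_distance(h, h_ref)
    return math.exp(-((d / sigma) ** 2))
```

A candidate whose histogram equals the reference gets likelihood 1; disjoint histograms give d = 1 and a likelihood of exp(−1/σ²), so σ controls how sharply the likelihood falls off.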
Mean shift: description

• Mean shift
• Deterministic non-parametric approach
• Iterative procedure
• Kernel-based
• Gradient-based approach
• Effective when the distance function is smooth, which the kernel ensures
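The iterative, kernel-based procedure can be sketched as follows. With an Epanechnikov kernel, each mean-shift step moves the window centre to the weighted mean of the samples inside the window. This is a simplification assumed for illustration: the per-sample weights are fixed (e.g. a back-projection weight map), whereas a colour-histogram tracker would recompute them from the candidate histogram at every iteration.

```python
import math


def mean_shift(points, weights, start, bandwidth, max_iter=50, eps=1e-3):
    """Move `start` to a nearby mode of the weighted sample density.

    With an Epanechnikov kernel, each mean-shift step reduces to the weighted
    mean of the samples inside a disc of radius `bandwidth`.
    Weights are held fixed here (simplifying assumption).
    """
    y = start
    for _ in range(max_iter):
        sx = sy = sw = 0.0
        for (px, py), w in zip(points, weights):
            if (px - y[0]) ** 2 + (py - y[1]) ** 2 <= bandwidth ** 2:
                sx += w * px
                sy += w * py
                sw += w
        if sw == 0.0:
            break  # no samples inside the window: stay where we are
        new_y = (sx / sw, sy / sw)
        shift = math.hypot(new_y[0] - y[0], new_y[1] - y[1])
        y = new_y
        if shift < eps:
            break  # converged to a local maximum
    return y
```

Starting from the previous frame position, the window climbs the density gradient without any random sampling, which is why mean shift is deterministic and fast but can lose targets that move further than the kernel size.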

[Figure: mean-shift search starting from the previous frame position]

Mean shift: example

Particle filter: description

• State $x_k = f_k(x_{k-1}, u_k)$
• Observation $z_k = h_k(x_k, n_k)$
• Objective
• To estimate the unknown state $x_k$ based on a sequence of observations $z_k$, $k = 0, 1, \ldots$
• Find the posterior distribution $p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta(x_k - x_k^i)$

• Solution (Bayesian)
• Prediction step
• Based on state equation
• Update step
• Based on likelihood function
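The prediction and update steps can be sketched for a 1D state as below. The zero-order prediction follows the state transition model on this slide deck; the Gaussian observation likelihood is an assumption made for the sketch (the trackers here use the colour likelihood instead).

```python
import math
import random


def pf_step(particles, weights, z, process_std, obs_std, rng: random.Random):
    """One prediction + update step of a particle filter with a 1D state.

    Prediction: zero-order model x_k = x_{k-1} + u_k, u_k ~ N(0, process_std^2).
    Update: w_k^i proportional to w_{k-1}^i * p(z_k | x_k^i), with a Gaussian
    likelihood (an assumption made for this sketch).
    """
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]
    new_w = [w * math.exp(-0.5 * ((z - x) / obs_std) ** 2)
             for x, w in zip(predicted, weights)]
    total = sum(new_w)
    if total == 0.0:
        # every particle has negligible likelihood: fall back to uniform weights
        return predicted, [1.0 / len(predicted)] * len(predicted)
    return predicted, [w / total for w in new_w]


def estimate(particles, weights):
    """Posterior mean E[x_k] of the weighted particle set."""
    return sum(w * x for x, w in zip(particles, weights))
```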

State transition model

• Typically
• Zero-order model $x_k = x_{k-1} + u_k$
• Limitation: random positioning of the particles

• First-order model $x_k = x_{k-1} + \theta_{k-1} + u_k$
• Limitation: highly manoeuvring targets

• Adaptive state transition model

• Zero-order model with adaptive noise variances $x_k = x_{k-1} + C_k u_k$

$C_k \propto \frac{1}{n} \sum_{t=k-n-1}^{k-1} \lvert x_t - x_{t-1} \rvert$ (average state velocity in the previous n frames)
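A sketch of the adaptive zero-order model for one state component. Assumptions: `history` holds the past state estimates, the average is taken over the last n frame-to-frame differences (a simplification of the summation limits above), and the proportionality constant `gain` and the minimum spread `floor` are hypothetical parameters.

```python
import random


def adaptive_noise_scale(history, n, gain=1.0, floor=1e-3):
    """C_k: mean absolute frame-to-frame change over the last n differences.

    `gain` (the proportionality constant) and `floor` are hypothetical.
    """
    start = max(1, len(history) - n)
    steps = [abs(history[t] - history[t - 1]) for t in range(start, len(history))]
    if not steps:
        return floor
    return max(floor, gain * sum(steps) / len(steps))


def propagate(particles, c_k, rng: random.Random):
    """Zero-order transition with adaptive noise: x_k = x_{k-1} + C_k * u_k, u_k ~ N(0, 1)."""
    return [x + c_k * rng.gauss(0.0, 1.0) for x in particles]
```

A fast-moving target produces large recent differences and hence a wide particle spread, while a static target keeps the particles concentrated; this is how the model copes with manoeuvring targets without an explicit velocity term.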
Re-sampling
• Problem
• Weight degeneration
• Solution
• Re-sampling (eliminates particles with small weights)

[Figure: posterior over x before and after re-sampling]
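The slides do not say which re-sampling scheme is used; systematic re-sampling is a common choice and is assumed in this sketch. A single random offset places N equally spaced pointers on the cumulative weight distribution, so particles with small weights are rarely selected and heavy ones are duplicated.

```python
import random


def systematic_resample(particles, weights, rng: random.Random):
    """Systematic re-sampling: returns N equally weighted particles drawn
    with probability proportional to their weights."""
    n = len(particles)
    total = sum(weights)
    cumulative = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against round-off
    u0 = rng.random() / n  # single random offset in [0, 1/n)
    out = []
    i = 0
    for j in range(n):
        u = u0 + j / n
        # advance to the first particle whose cumulative weight exceeds u
        while u >= cumulative[i]:
            i += 1
        out.append(particles[i])
    return out, [1.0 / n] * n
```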

Particle filter: example

Hybrid tracker

• Loop: re-sample particles → apply state transition → MS for each particle → re-weighting → E[.]
• Adaptive state transition model: zero-order model with adaptive noise variances
$x_k = x_{k-1} + C_k u_k$, $C_k \propto \frac{1}{n} \sum_{t=k-n-1}^{k-1} \lvert x_t - x_{t-1} \rvert$ (average state velocity in the previous n frames)
• The Mean Shift operator acts on the 2D position state space only:
$p(x_k \mid z_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta(x_k - MS(x_k^i))$

Hybrid tracker

• Advantages
• After MS → each particle is near a local maximum of the filtered posterior (2D position sub-space)
• The efficiency of the particles is increased
• Multi-modality of the posterior is maintained
• The extra computation is compensated by using fewer particles

[Figure: particles and E[x] after weighting and after Mean Shift]
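The hybrid loop can be sketched as one function on a 2D position state. Assumptions, none of them taken from the slides: multinomial re-sampling via `random.Random.choices`, a flat-window mean shift over integer pixel positions driven by a hypothetical per-pixel `weight_fn`, and a hypothetical `likelihood_fn` standing in for the colour likelihood.

```python
import math
import random


def mean_shift_2d(start, weight_fn, radius, iters=20, eps=1e-3):
    """Flat-window mean shift over integer pixel positions around `start`."""
    x, y = start
    r = int(math.ceil(radius))
    for _ in range(iters):
        sx = sy = sw = 0.0
        cx, cy = int(round(x)), int(round(y))
        for px in range(cx - r, cx + r + 1):
            for py in range(cy - r, cy + r + 1):
                if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                    w = weight_fn(px, py)  # hypothetical per-pixel weight
                    sx += w * px
                    sy += w * py
                    sw += w
        if sw == 0.0:
            break  # no support in the window: keep the current position
        nx, ny = sx / sw, sy / sw
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < eps:
            break
    return x, y


def hybrid_step(particles, weights, c_k, weight_fn, likelihood_fn, radius,
                rng: random.Random):
    """One iteration: re-sample, transition, MS per particle, re-weight, E[x]."""
    n = len(particles)
    # 1. re-sample (multinomial)
    particles = rng.choices(particles, weights=weights, k=n)
    # 2. zero-order transition with adaptive noise scale C_k
    particles = [(x + c_k * rng.gauss(0.0, 1.0), y + c_k * rng.gauss(0.0, 1.0))
                 for x, y in particles]
    # 3. mean shift on the 2D position of every particle
    particles = [mean_shift_2d(p, weight_fn, radius) for p in particles]
    # 4. re-weight with the (hypothetical) likelihood and normalise
    weights = [likelihood_fn(x, y) for x, y in particles]
    total = sum(weights)
    weights = [w / total for w in weights] if total > 0 else [1.0 / n] * n
    # 5. estimate E[x] as the weighted mean position
    ex = sum(w * x for (x, _), w in zip(particles, weights))
    ey = sum(w * y for (_, y), w in zip(particles, weights))
    return particles, weights, (ex, ey)
```

Because every particle is pushed towards a local mode before re-weighting, fewer particles are needed than in a plain particle filter, which is the trade-off listed among the advantages.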
Results

• Initialisation
• Ground-truth initialisation of the target

• Parameters
• Histograms: 10x10x10 (RGB)
• MS: run 5 times with different kernel sizes (±10%)
• PF, HT (hybrid tracker): 3D state model (position; target size), to compare with MS
• Transition model: σx = σy = 14; σh = 0.013; ks = 5; kp = 10
• PF: 150 samples; HT: 30 samples

• Presentation of results
• Videos
• Sample frames & objective measure

Evaluation

• Subjective evaluation
• Side-by-side visual comparison of tracking results

• Objective evaluation
• Deviation of the predicted target from the ground truth
• APE: average position error (pe)
• ASE: average size error, $ASE = \sqrt{W^2 + H^2}$
• W: width error
• H: height error

[Figure: predicted target vs. ground truth, with the position error pe]
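A sketch of the two objective measures. Assumptions: boxes are given per frame as (centre x, centre y, width, height), pe is the Euclidean distance between the predicted and ground-truth centres, and the size error $\sqrt{W^2 + H^2}$ is computed per frame and then averaged.

```python
import math


def ape_ase(predicted, ground_truth):
    """Average position error and average size error over a sequence.

    Boxes are (cx, cy, w, h) per frame (assumed format).
    """
    pos_errors, size_errors = [], []
    for (px, py, pw, ph), (gx, gy, gw, gh) in zip(predicted, ground_truth):
        pos_errors.append(math.hypot(px - gx, py - gy))   # pe
        size_errors.append(math.hypot(pw - gw, ph - gh))  # sqrt(W^2 + H^2)
    n = len(pos_errors)
    return sum(pos_errors) / n, sum(size_errors) / n
```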
Results: highway

[Video: MS, PF and proposed tracker]

Evaluation: highway
       MS      PF      Proposed
APE    0.95    12.8*   0.88
ASE    2.74    22.3*   3.58

[Figure: sample frames for MS, PF and the proposed tracker]
Results: soccer

[Video: MS, PF and proposed tracker]

Evaluation: soccer
       MS      PF      Proposed
APE    242*    3.9     3.2
ASE    18.2*   10.8    9.8

[Figure: sample frames for MS, PF and the proposed tracker]
Results: table tennis

[Video: MS, PF and proposed tracker]

Evaluation: table tennis
       MS      PF      Proposed
APE    43.2*   24.1*   2.0
ASE    6.7*    3.3*    2.8

[Figure: sample frames for MS, PF and the proposed tracker]
Results: emilio

[Video: MS, PF and proposed tracker]

Single vs. multiple target tracking

• Single target tracking


• Hybrid mean shift / particle filter tracker
• Faster and more accurate than the particle filter
• More reliable than mean shift with fast targets
• Adaptive transition model
• Deals with highly manoeuvring targets
• Copes with camera motion

• What about multiple targets?
• Need to consider target ‘interactions’
• NP-hard problem
• Complexity grows exponentially with the number of targets (PF)
Multiple object tracking

• Graph matching using weighted features
• Data association verified over several frames to validate the correctness of the tracks
• Supports track recovery in occlusion scenarios
• Features
• centre of mass
• velocity
• bounding box
• colour

$X = [x, y, \dot{x}, \dot{y}, w, h, H]$: position (x, y), velocity ($\dot{x}$, $\dot{y}$), size (w, h), appearance (colour, H)

$g(X_i^a, X_j^b) = \alpha\, g_1(X_i^a, X_j^b) + \beta\, g_2(X_i^a, X_j^b) + \gamma\, g_3(X_i^a, X_j^b) + \delta\, g_4(X_i^a, X_j^b) - (j - i - 1)\,\tau$

The last term penalises a link between frames i and j by τ for each of the (j − i − 1) frames it skips.
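A sketch of the edge weight g. Everything concrete here is hypothetical: the similarity functions g1 to g4, their assignment to position, velocity, size and appearance, the distance scales, the weights α, β, γ, δ and the penalty τ. States are lists [x, y, vx, vy, w, h, hist], and the position of object a is extrapolated over the frame gap with its velocity.

```python
import math


def similarity_terms(a, b, gap):
    """Hypothetical g1..g4 in [0, 1] for states [x, y, vx, vy, w, h, hist]."""
    # extrapolate the position of `a` over `gap` frames using its velocity
    px, py = a[0] + a[2] * gap, a[1] + a[3] * gap
    g_pos = math.exp(-math.hypot(px - b[0], py - b[1]) / 20.0)
    g_vel = math.exp(-math.hypot(a[2] - b[2], a[3] - b[3]) / 5.0)
    g_size = math.exp(-math.hypot(a[4] - b[4], a[5] - b[5]) / 10.0)
    g_app = sum(math.sqrt(p * q) for p, q in zip(a[6], b[6]))  # Bhattacharyya coefficient
    return g_pos, g_vel, g_size, g_app


def edge_weight(a, i, b, j, alpha=0.25, beta=0.25, gamma=0.25, delta=0.25, tau=0.1):
    """g(X_i^a, X_j^b): weighted feature similarity minus a penalty per skipped frame."""
    g1, g2, g3, g4 = similarity_terms(a, b, j - i)
    return alpha * g1 + beta * g2 + gamma * g3 + delta * g4 - (j - i - 1) * tau
```

Two identical observations in consecutive frames score 1.0 with these weights; the same pair one frame apart loses τ, which lets the matcher bridge short occlusions while preferring direct links.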

Graph matching: full graph

[Figure: full graph over the detection vertices v(x11), …, v(x43) in the frame sets V1, V2, V3 (there is no v(x42) in V2)]
Graph matching: max path cover

[Figure: maximum path cover of the same graph over V1, V2, V3]
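A maximum-weight path cover on a graph whose edges only point forward in time can be found as a matching: each vertex gets at most one successor and at most one predecessor. The sketch below finds that matching by exhaustive search, which is only viable for tiny graphs (a practical tracker would use an assignment solver such as the Hungarian algorithm); edges with non-positive weight are never used. The vertex names in the example are hypothetical.

```python
def max_path_cover(nodes, edges):
    """Maximum-weight path cover of a small DAG.

    nodes: vertex ids; edges: {(u, v): weight} with u earlier in time than v.
    Each vertex gets at most one successor and one predecessor.
    Exhaustive search: suitable for tiny graphs only.
    """
    succ_options = {u: [(v, w) for (a, v), w in edges.items() if a == u and w > 0]
                    for u in nodes}
    best = (0.0, {})

    def search(idx, used, chosen, total):
        nonlocal best
        if idx == len(nodes):
            if total > best[0]:
                best = (total, dict(chosen))
            return
        u = nodes[idx]
        search(idx + 1, used, chosen, total)  # u ends its path
        for v, w in succ_options[u]:
            if v not in used:  # v may have only one predecessor
                used.add(v)
                chosen[u] = v
                search(idx + 1, used, chosen, total + w)
                del chosen[u]
                used.remove(v)

    search(0, set(), {}, 0.0)
    total, succ = best
    # every path starts at a vertex without a predecessor
    has_pred = set(succ.values())
    paths = []
    for u in nodes:
        if u not in has_pred:
            path = [u]
            while path[-1] in succ:
                path.append(succ[path[-1]])
            paths.append(path)
    return total, paths
```

Each resulting path is one track; a path that jumps from V1 to V3 corresponds to a target missed in V2.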

Detection vs. tracking

• Detection
• Usually frame-based
• Can be improved with temporal features (e.g., pedestrians)
• Trained classifier
• Choice of training set
• Choice of negative examples
• Choice of poses covered in the training set
• Tracking
• Propagates the initialisation information
• Model: template, statistical representation, parts, …
• Should update the model
• Should self-initialise
→ Integration!

Integration of detection and tracking

• Problem
• Detecting objects (e.g., faces) in clutter
• Tracking multiple objects (e.g., faces) under occlusions

→ Integration of an AdaBoost face detector and a Bayesian tracker
Face classifier

• Approach
• Cascade of classifiers
• Integral image
• Training
• Set of scales

• Output
• Few false negatives
• Many false positives …

[Figure: Haar features for face detection]

→ Need additional evidence
→ Fusion of color analysis (chromaticity segmentation) and face classification
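The integral image lets any rectangle sum, and hence any Haar-like feature, be evaluated with four look-ups regardless of its size, which is what makes scanning many scales affordable. A minimal sketch, assuming a grayscale image given as a list of rows; the two-rectangle feature shown is one illustrative Haar-like pattern, and the classifier cascade and its training are not shown.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1]; padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_vertical(ii, x, y, w, h):
    """Illustrative two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return box_sum(ii, x, y, w, half) - box_sum(ii, x, y + half, w, half)
```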

Filtering through chromaticity segmentation

[Figure: face detector only vs. face detector with chromaticity segmentation]
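One way to use chromaticity as additional evidence, sketched under assumptions: each detection is represented by its RGB pixels, pixels are mapped to normalised chromaticity (r, g) = (R, G)/(R + G + B), and a detection is kept if enough of its pixels fall in a skin region. The box limits and the 0.4 threshold are hypothetical; in practice they would be learned from data.

```python
def chromaticity(r, g, b):
    """Normalised (r, g) chromaticity; black pixels map to the neutral point."""
    s = r + g + b
    if s == 0:
        return 1.0 / 3.0, 1.0 / 3.0
    return r / s, g / s


def skin_fraction(pixels, r_range=(0.36, 0.47), g_range=(0.28, 0.35)):
    """Fraction of pixels whose chromaticity lies in a hypothetical skin box."""
    if not pixels:
        return 0.0
    hits = 0
    for r, g, b in pixels:
        cr, cg = chromaticity(r, g, b)
        if r_range[0] <= cr <= r_range[1] and g_range[0] <= cg <= g_range[1]:
            hits += 1
    return hits / len(pixels)


def filter_detections(detections, min_skin=0.4):
    """Keep the ids of detections ({id: pixels}) with enough skin-coloured pixels."""
    return [det_id for det_id, pixels in detections.items()
            if skin_fraction(pixels) >= min_skin]
```

Because chromaticity discards intensity, the test is fairly robust to illumination changes, which suits its role of removing the classifier's many false positives.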
Detection and tracking

• Use particle filtering to track between detections


• Initialization
• Detection away from current particles → candidate track
• Candidate track → activated after successive detections (confidence)
• Filtering
• If two tracks overlap → keep the one with the highest confidence score, based on:
• number of tracked frames
• frequency of detections
• Termination
• segmentation cue (skin)
• detection cue (classifier)
• size cue (ratio and area)
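The initialisation and filtering rules can be sketched as simple track bookkeeping. Hypothetical choices, not from the slides: a track is activated after 3 successive detections, confidence mixes track age (saturating at 50 frames) with detection frequency, and two tracks overlap when their intersection over union reaches 0.5. Termination cues are not shown.

```python
from dataclasses import dataclass

ACTIVATION_HITS = 3  # hypothetical: successive detections needed to activate


@dataclass
class Track:
    box: tuple  # (x, y, w, h)
    tracked_frames: int = 0
    detections: int = 0
    consecutive_hits: int = 0
    active: bool = False

    def update(self, detected: bool):
        """Advance one frame; `detected` is True when a detection supports the track."""
        self.tracked_frames += 1
        if detected:
            self.detections += 1
            self.consecutive_hits += 1
        else:
            self.consecutive_hits = 0
        if self.consecutive_hits >= ACTIVATION_HITS:
            self.active = True

    def confidence(self):
        """Hypothetical score mixing number of tracked frames and detection frequency."""
        if self.tracked_frames == 0:
            return 0.0
        frequency = self.detections / self.tracked_frames
        return 0.5 * min(1.0, self.tracked_frames / 50.0) + 0.5 * frequency


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def remove_overlaps(tracks, threshold=0.5):
    """When two tracks overlap, keep the one with the higher confidence."""
    kept = []
    for tr in sorted(tracks, key=lambda t: t.confidence(), reverse=True):
        if all(iou(tr.box, k.box) < threshold for k in kept):
            kept.append(tr)
    return kept
```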

Detection and tracking

[Figure: removal of overlapping tracks]
Particle (temporal) filtering for face tracking

• Particle filtering integrated with face detector


• Link candidates from prediction (particles) with candidates from detection → connected detection (CD)

• Particle spread (temporal prediction)
• If no CD → zero-order motion model
• If CD → particles are partially spread in the detection area

• Object model (color histogram)
• If no CD → no update
• If CD → partial update (e.g., by 25%)
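A sketch of the two rules, assuming the colour model is a normalised histogram (a list of floats) and particles are 2D positions. The 25% update rate is the slide's example; the fraction of particles moved into the detection area (half) and the noise level are hypothetical.

```python
import random


def update_model(model, detected, has_cd, rate=0.25):
    """Colour model update: none without a connected detection (CD),
    otherwise blend in the detection histogram by `rate`."""
    if not has_cd:
        return list(model)
    return [(1.0 - rate) * m + rate * d for m, d in zip(model, detected)]


def spread_particles(particles, has_cd, det_box, rng: random.Random,
                     std=3.0, fraction=0.5):
    """Temporal prediction for 2D particles.

    No CD: zero-order model (each particle plus Gaussian noise).
    CD: a hypothetical `fraction` of the particles is redrawn uniformly inside
    the detection box (x, y, w, h); the rest follow the zero-order model.
    """
    n_det = int(fraction * len(particles)) if has_cd else 0
    out = []
    for idx, (x, y) in enumerate(particles):
        if idx < n_det:
            bx, by, bw, bh = det_box
            out.append((rng.uniform(bx, bx + bw), rng.uniform(by, by + bh)))
        else:
            out.append((x + rng.gauss(0.0, std), y + rng.gauss(0.0, std)))
    return out
```

Updating the model only on connected detections keeps occluders from being absorbed into the face model, while the partial rate lets it follow gradual appearance changes.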

Integrated detection and tracking


[Figure: results without particles around detections, without model update, and with integration]
Detection and tracking

[Figure: particle spread around detections; colour model update]

Face detection and tracking

Automatic tracking with a PTZ camera

Multi-modal data fusion using particle filter (PF)
$X_t = \{x, y, width, height\}$

[Figure: fusion scheme. Audio: onset detection, reverberation filtering, multi-band analysis and GCCF → $p(A \mid X_t)$. Video: colour histogram feature → $p(C \mid X_t)$; change detection → motion feature → multivariate Gaussian → $p(M \mid X_t)$. All likelihoods feed the PF.]

• Overall likelihood
$p(O \mid X_t) = p(M \mid X_t)\, p(C \mid X_t)\, p(A \mid X_t)$
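Multiplying the three likelihoods treats the motion, colour and audio cues as conditionally independent given the state. The product is safer to compute in the log domain; a minimal sketch that turns per-particle log-likelihoods (assumed to come from the individual models) into normalised particle weights:

```python
import math


def fused_weights(log_p_motion, log_p_colour, log_p_audio):
    """Normalised particle weights from p(O|X) = p(M|X) p(C|X) p(A|X).

    Inputs are per-particle log-likelihoods: summing logs equals multiplying
    likelihoods, and subtracting the maximum avoids underflow.
    """
    logs = [m + c + a for m, c, a in zip(log_p_motion, log_p_colour, log_p_audio)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```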

Audio likelihood
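• Time delay of arrival (TDOA)
$s_1(t) = v(t) + n_1(t)$
$s_2(t) = \lambda v(t + \tau) + n_2(t)$
(v: speaker signal; $n_1$, $n_2$: noise; λ: attenuation; τ: delay)
• Reverberation filtering
• Onset detection based on the precedence effect
• Multi-band analysis

[Figure: speaker at angle θ to microphones M1 and M2 at distance d, giving a path difference d sin θ]
[Figure: filter bank; $s_1(t)$ and $s_2(t)$ processed by GCC-PHAT in bands ω1, ω2, ω3 with onset detection, summed into $\hat{R}_{s_1 s_2}(f)$]

$p(A \mid X_t) = \frac{1}{\sigma_A \sqrt{2\pi}} \exp\left(-\frac{\left(\hat{\varsigma}_A(\hat{R}_{s_1 s_2}) - x_t\right)^2}{2\sigma_A^2}\right)$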
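A minimal sketch of TDOA estimation with GCC-PHAT for the two-microphone model above, plus the Gaussian audio likelihood. Assumptions: a naive O(N²) DFT (adequate for short frames; a real system would use an FFT), circular signals, a far-field geometry with d sin θ = cτ (speed of sound c = 343 m/s), and that the mapping ς̂_A from the correlation peak to an image coordinate has already produced `x_audio`. The onset detection and multi-band stages are not shown.

```python
import cmath
import math


def _dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_delay(s1, s2):
    """Delay tau (samples) such that s2[t] is approximately s1[t + tau] (circularly)."""
    n = len(s1)
    x1, x2 = _dft(s1), _dft(s2)
    cross = [a * b.conjugate() for a, b in zip(x1, x2)]
    # PHAT weighting: keep only the phase of the cross-spectrum
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    r = [v.real for v in _dft(phat, inverse=True)]
    lag = max(range(n), key=lambda m: r[m])
    return lag - n if lag > n // 2 else lag  # map to a signed delay


def doa_angle(tau_samples, fs, mic_distance, c=343.0):
    """Angle theta (radians) from d * sin(theta) = c * tau, far-field assumption."""
    s = c * tau_samples / (fs * mic_distance)
    return math.asin(max(-1.0, min(1.0, s)))


def audio_likelihood(x_t, x_audio, sigma_a):
    """Gaussian p(A | X_t) around the audio-derived position x_audio."""
    norm = sigma_a * math.sqrt(2.0 * math.pi)
    return math.exp(-((x_audio - x_t) ** 2) / (2.0 * sigma_a ** 2)) / norm
```

The PHAT weighting discards spectral magnitude, so the correlation peak stays sharp under reverberation, at the cost of amplifying noisy frequency bands; this is one reason to restrict the analysis to onsets and selected bands.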
Comparison

[Video: audio only, video only and audio-visual tracking]

Results – speaker detection

[Figure: speaker detection results; panels labelled 50, 313 and 425]
Results – scene dynamics for teleconferencing

[Video: original video and abstract representation]

Event detection

Contextual information

• Scene modelling
• One Gaussian for each area of interest
• Outside zone → modelled with multiple Gaussians

[Figure: zone models for Building entrance (Camera 1) and Airport (Camera 4); labels include outside_zone, inside_zone, enters_zone, opens, go_up_stairs and go_down_stairs]
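A sketch of how such a scene model could label an object position: score the point under each zone's 2D Gaussian and under the outside-zone mixture, and pick the highest. All zone names, means, covariances and mixture weights in the example are hypothetical; in practice they would be fitted to each camera view.

```python
import math


def gauss2d(p, mean, cov):
    """Density of a 2D Gaussian with covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Mahalanobis distance using the inverse ((c, -b), (-b, a)) / det
    m = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))


def classify_zone(p, zones, outside_components):
    """zones: {name: (mean, cov)}; outside_components: [(weight, mean, cov)]."""
    scores = {name: gauss2d(p, m, c) for name, (m, c) in zones.items()}
    scores["outside_zone"] = sum(w * gauss2d(p, m, c) for w, m, c in outside_components)
    return max(scores, key=scores.get)
```

A change of label between consecutive frames (e.g. from outside_zone to inside_zone) can then be reported as an event such as enters_zone.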
Object information
• Object detection and tracking
• Observations O
• Model parameters
• States $\omega = \{\omega_1, \omega_2, \omega_3, \ldots, \omega_N\}$
• Probability
• State transition probabilities $A = \{a_{ij}\}$
• Emission probabilities $B = \{b_j(O)\}$
• Initial state $\omega(0)$
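The parameters listed (states, transition probabilities A, emission probabilities B, initial state) are those of a hidden Markov model. As an illustration, assuming discrete observation symbols and an initial state distribution π, the forward algorithm gives the probability of an observation sequence under a model, which could be used to compare candidate event models. This is a sketch, not necessarily how the slides score events.

```python
def forward_likelihood(obs, pi, A, B):
    """P(O | model) for a discrete HMM via the forward algorithm.

    obs: observation symbol indices; pi[i]: initial state probabilities;
    A[i][j]: transition probability i -> j; B[j][o]: emission probability.
    """
    n = len(pi)
    # alpha[i] = P(o_1..o_t, state at t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)
```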

Results

Results

Summary
• Tracking algorithms
• Mean-shift
• Particle filtering
• Graph-matching
• Integration of detection and tracking
• Integration of audio and video
• Event detection

Acknowledgements

Emilio Maggio
Murtaza Taj
Matteo Bregonzio
Huiyu Zhou
Stefan Karlsson

http://www.elec.qmul.ac.uk/staffinfo/andrea

EU FP7 project APIDIS (2008 – 2010)
