
Object Detection with Discriminatively Trained Part-Based Models

P.F. Felzenszwalb, R.B. Girshick, D. McAllester and D. Ramanan
PAMI 2010

Presented by: Philippe Weinzaepfel
1st February 2013
Introduction Object localization

Object localization

Goal
Detect and localize objects from generic categories in static images
Training: bounding boxes around objects

Challenges
Illumination changes
Viewpoint
Intraclass variability
Non-rigid deformation

Philippe Weinzaepfel — Latent SVM for Object Detection — 1st February 2013
Introduction Part-based model

Part-based model
A collection of parts arranged in a deformable configuration
Part locations are not known: latent variables
In this paper: a star model (1 root + multiple parts)
Part filters are at twice the resolution of the root filter
Score of the detection:

    score(model, x) = score(root, x) + Σ_{p ∈ parts} max_y [ score(p, y) − cost(p, x, y) ]

Introduction Part-based model

Part-based model with latent variables

Simple model
Score:

    f(x) = β · Φ(x)

Part-based model
Score:

    f_β(x) = max_{z ∈ Z(x)} β · Φ(x, z)

β: model parameters
z: specification of the object configuration
Latent SVM to learn β, and data-mining for hard negative examples
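As a sketch, the latent scoring rule f_β(x) = max_{z ∈ Z(x)} β · Φ(x, z) fits in a few lines of Python; the toy feature map `phi` and candidate set `Z` below are hypothetical stand-ins for the real part-based features.

```python
import numpy as np

def latent_score(beta, phi, x, Z):
    """f(x) = max over latent values z of beta . phi(x, z)."""
    # Score every candidate configuration and keep the best one.
    scores = [float(beta @ phi(x, z)) for z in Z]
    best = int(np.argmax(scores))
    return scores[best], Z[best]

# Toy example: x is a 1-D "image", z is a window offset.
def phi(x, z):
    return x[z:z + 2]

x = np.array([0.0, 1.0, 3.0, 2.0])
beta = np.array([1.0, 1.0])
score, z_star = latent_score(beta, phi, x, Z=[0, 1, 2])
```

The latent value z_star that realizes the max is exactly what gets fixed during the relabeling step of training.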

Introduction Mixture of part-based models

Mixture of part-based models

Score: max over components


Include the component in z

Introduction Outline

Outline

1 Model

2 Latent SVM

3 Training Models

4 Features

5 Post-processing

6 Some experimental results

Model Feature pyramid

Linear filter of features

Each part: a linear filter over a feature map

Build a feature pyramid H
5 levels per octave at training, 10 at testing
Weight vector F
Position p = (x, y, l) (location and pyramid level)
Score of F at p: F′ · φ(H, p)
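The level spacing can be sketched as follows: with λ levels per octave, level k is scaled by 2^(−k/λ), so λ levels down the pyramid the resolution is exactly halved. This is a sketch of the scale schedule only, not of the feature computation itself.

```python
def pyramid_scales(num_octaves, levels_per_octave):
    """Scale factor of each feature pyramid level: each octave halves the
    resolution, with geometrically spaced levels in between."""
    return [2 ** (-k / levels_per_octave)
            for k in range(num_octaves * levels_per_octave)]

# 2 octaves with 5 levels each; level 5 is exactly half resolution.
s = pyramid_scales(num_octaves=2, levels_per_octave=5)
```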

Model Deformable part models

Deformable part models

A model
A root filter F_0
n parts P_i, each with:
  a filter F_i
  an anchor v_i
  a deformation cost d_i
A bias term b (to compare models in mixtures)

An object hypothesis
Position of root and parts: z = (p_0, ..., p_n)
p_i = (x_i, y_i, l_i)
Hypothesis: parts at twice the resolution of the root

Model Deformable part models

Deformable part models

Score of a hypothesis

    score(p_0, ..., p_n) = Σ_{i=0}^{n} F_i′ · φ(H, p_i) − Σ_{i=1}^{n} d_i · φ_d(dx_i, dy_i) + b

    (dx_i, dy_i) = (x_i, y_i) − (2(x_0, y_0) + v_i)
    φ_d(dx, dy) = (dx, dy, dx², dy²)

Link to Latent SVM

The score can be written as β · Ψ(H, z) with:

    β = (F_0′, ..., F_n′, d_1, ..., d_n, b)
    Ψ(H, z) = (φ(H, p_0), ..., φ(H, p_n), −φ_d(dx_1, dy_1), ..., −φ_d(dx_n, dy_n), 1)
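A direct transcription of the hypothesis score, where `phis[i]` stands in for the appearance features φ(H, p_i) under part i (hypothetical inputs; a real implementation scores dense filter responses over the pyramid):

```python
import numpy as np

def dpm_score(filters, phis, d, anchors, positions, b):
    """score(p_0..p_n) = sum_i F_i' . phi(H, p_i)
                         - sum_{i>=1} d_i . phi_d(dx_i, dy_i) + b"""
    appearance = sum(float(F @ p) for F, p in zip(filters, phis))
    x0, y0 = positions[0]
    deformation = 0.0
    for i in range(1, len(positions)):
        # Displacement of part i relative to its anchor; the root position
        # is doubled because parts live at twice the root resolution.
        dx = positions[i][0] - (2 * x0 + anchors[i][0])
        dy = positions[i][1] - (2 * y0 + anchors[i][1])
        phi_d = np.array([dx, dy, dx * dx, dy * dy])
        deformation += float(d[i] @ phi_d)
    return appearance - deformation + b

# One root and one part, with made-up features and parameters.
filters = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
phis    = [np.array([2.0, 3.0]), np.array([4.0, 5.0])]
d       = [np.zeros(4), np.array([0.1, 0.2, 0.1, 0.2])]
anchors = [(0, 0), (1, 0)]
positions = [(1, 1), (3, 3)]
s = dpm_score(filters, phis, d, anchors, positions, b=0.5)
```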

Model Detection process

Model Mixture of models

Mixture of models

Mixture M of m models
    M = (M_1, ..., M_m)
    Hypothesis: z = (c, p_0, ..., p_{n_c}) = (c, z′)
    score(z) = β_c · Ψ′(H, z′)

Score as β · Ψ(H, z)
    β = (β_1, ..., β_m)
    Ψ(H, z) = (0, ..., 0, Ψ′(H, z′), 0, ..., 0)

Latent SVM Problem

Problem

Classifier that scores an example x with:

    f_β(x) = max_{z ∈ Z(x)} β · Φ(x, z)

Z(x): set of possible latent values for x

As for an SVM, we learn β by minimizing

    L_D(β) = ½ ‖β‖² + C Σ_{i=1}^{n} max(0, 1 − y_i f_β(x_i))

with D = (⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩) a set of labeled examples
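The objective above can be evaluated directly; the feature map `phi` and latent set `Z` below are hypothetical toy stand-ins.

```python
import numpy as np

def lsvm_objective(beta, data, phi, Z, C):
    """L_D(beta) = 1/2 ||beta||^2 + C * sum_i max(0, 1 - y_i f(x_i)),
    where f(x) = max over z in Z of beta . phi(x, z)."""
    hinge = 0.0
    for x, y in data:
        f = max(float(beta @ phi(x, z)) for z in Z)
        hinge += max(0.0, 1.0 - y * f)
    return 0.5 * float(beta @ beta) + C * hinge

# Toy 1-D example with a hypothetical feature map phi(x, z) = [x * z].
beta = np.array([1.0])
phi = lambda x, z: np.array([x * z])
val = lsvm_objective(beta, data=[(1.0, 1), (1.0, -1)], phi=phi, Z=[1, 2], C=1.0)
```

Note how the max over z raises the score of negative examples too, which is exactly why the loss stays convex in β only on the negatives.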

Latent SVM Semi-convexity

Semi-convexity

    L_D(β) = ½ ‖β‖² + C Σ_{i=1}^{n} max(0, 1 − y_i f_β(x_i))

A latent SVM is semi-convex:
The loss function is convex in β for negative examples.
L_D(β) is convex when latent variables are specified for the positive examples.

SVM
Convexity: f_β is linear in β; the hinge loss is convex as the maximum of two convex functions.

LSVM
f_β is convex in β, as a maximum of convex (linear) functions.
For negative examples, the loss function is therefore convex.
If there is a single possible latent value for each positive example, f_β is linear for it, and L_D(β) is convex.
Latent SVM Optimization

Optimization

Z_p: a specification of a latent value for each positive example
D(Z_p): derived from D by restricting the latent values of the positive examples to Z_p

    L_D(β) = min_{Z_p} L_D(β, Z_p) = min_{Z_p} L_{D(Z_p)}(β)
    L_D(β) ≤ L_D(β, Z_p)

Algorithm
Relabel positive examples: optimize L_D(β, Z_p) over Z_p, i.e. select

    z_i = argmax_{z ∈ Z(x_i)} β · Φ(x_i, z)

Optimize β: stochastic gradient descent
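The alternation can be sketched as a loop; `relabel` and `sgd_fit` are hypothetical callables standing in for the two steps just described.

```python
def train_latent_svm(beta, positives, negatives, n_rounds, relabel, sgd_fit):
    """Coordinate descent on L_D(beta, Z_p): alternately fix the latent
    values of the positives, then optimize beta on the convex problem."""
    for _ in range(n_rounds):
        # Step 1: relabel -- best-scoring latent value per positive example.
        Zp = [relabel(beta, x) for x in positives]
        # Step 2: with Z_p fixed, L_{D(Z_p)} is convex; fit beta (e.g. SGD).
        beta = sgd_fit(beta, positives, Zp, negatives)
    return beta

# Trivial stubs just to exercise the control flow (no real learning here).
beta = train_latent_svm(0.0, [1, 2], [], 3,
                        relabel=lambda beta, x: 0,
                        sgd_fit=lambda beta, P, Zp, N: beta + 1.0)
```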

Latent SVM Optimization

Stochastic gradient descent

Gradient descent

    L_D(β) = ½ ‖β‖² + C Σ_{i=1}^{n} max(0, 1 − y_i f_β(x_i))

    ∇L_D(β) = β + C Σ_{i=1}^{n} h(β, x_i, y_i)   (subgradient)

    h(β, x_i, y_i) = 0                      if y_i f_β(x_i) ≥ 1
                     −y_i Φ(x_i, z_i(β))    otherwise

Issue
Too costly to go over all the data to compute the exact gradient.

Latent SVM Optimization

Stochastic gradient descent

Approximation of the gradient
Approximate Σ_{i=1}^{n} h(β, x_i, y_i) by n · h(β, x_i, y_i) for a single random example ⟨x_i, y_i⟩

Algorithm
Let α_t be the learning rate for iteration t
Let i be a random example
Let z_i = argmax_{z ∈ Z(x_i)} β · Φ(x_i, z)
Update β := β − α_t ∇_i L_D(β)

Convergence
Learning rate α_t = 1/t
Depends on the number of training examples
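One update of the algorithm above, written out; the feature map `phi` is a hypothetical toy example, and the sampled index is passed in rather than drawn at random so the step is deterministic.

```python
import numpy as np

def sgd_step(beta, x_i, y_i, phi, Z, C, n, t):
    """One stochastic subgradient step; the data term is approximated by
    n * h(beta, x_i, y_i) for the single sampled example."""
    alpha = 1.0 / t                          # learning rate alpha_t = 1/t
    # Best latent value for this example under the current beta.
    z_i = max(Z, key=lambda z: float(beta @ phi(x_i, z)))
    margin = y_i * float(beta @ phi(x_i, z_i))
    grad = beta.copy()                       # gradient of 1/2 ||beta||^2
    if margin < 1:                           # hinge active: h is nonzero
        grad = grad - C * n * y_i * phi(x_i, z_i)
    return beta - alpha * grad

# Hypothetical features: z selects the window ordering.
phi = lambda x, z: x if z == 0 else x[::-1]
beta = sgd_step(np.array([1.0, 0.0]), np.array([1.0, 2.0]), 1,
                phi, Z=[0, 1], C=1.0, n=100, t=1)
```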

Latent SVM Data-mining hard examples

Recall: Data-mining hard examples in SVM

Many negative instances
Bootstrapping: collect hard negatives as incorrectly classified examples from a previous model.

Hard and easy examples

    H(β, D) = {⟨x, y⟩ ∈ D | y f_β(x) < 1}
    E(β, D) = {⟨x, y⟩ ∈ D | y f_β(x) > 1}

    β*(D) = argmin_β L_D(β)   (unique)

Goal
Find a subset C ⊆ D such that β*(C) = β*(D).
Latent SVM Data-mining hard examples

Recall: Data-mining hard examples in SVM

Algorithm
    β_t := β*(C_t)   (model training)
    If H(β_t, D) ⊆ C_t, stop and return β_t
    Remove easy examples from C_t
    Add hard examples to C_t

Proof
    Let C ⊆ D. If H(β*(C), D) ⊆ C, then β*(C) = β*(D).
    The algorithm terminates.
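The bootstrapping loop above can be sketched as follows; `train`, `hard` and `easy` are hypothetical callables for β*(C_t), H(β_t, D) and E(β_t, C_t).

```python
def mine_hard_examples(D, C0, train, hard, easy, max_rounds=20):
    """Bootstrapping: train on a cache C_t, stop once it contains all hard
    examples of the full data set D, otherwise shrink/grow the cache."""
    C = set(C0)
    beta = None
    for _ in range(max_rounds):
        beta = train(C)                  # beta_t = beta*(C_t)
        H = hard(beta, D)                # H(beta_t, D): margin violators
        if H <= C:                       # H(beta_t, D) subset of C_t: done
            return beta, C
        C = (C - easy(beta, C)) | H      # remove easy, add hard examples
    return beta, C

# Trivial stubs just to exercise the loop (no real learning here).
beta, C = mine_hard_examples(
    D={1, 2, 3, 4}, C0={1, 2},
    train=lambda C: sorted(C),
    hard=lambda beta, D: {x for x in D if x % 2 == 0},
    easy=lambda beta, C: {x for x in C if x == 1})
```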

Latent SVM Data-mining hard examples

Data-mining hard examples in LSVM

Feature cache F: a set of pairs (i, v) with 1 ≤ i ≤ n and v = Φ(x_i, z) for some z ∈ Z(x_i) (a single one per positive example)
I(F): the examples indexed by F

    L_F(β) = ½ ‖β‖² + C Σ_{i ∈ I(F)} max(0, 1 − y_i max_{(i,v) ∈ F} β · v)

Modified stochastic gradient descent

Let α_t be the learning rate for iteration t
Let i ∈ I(F) be a random example indexed by F
Let v_i = argmax_{v ∈ V(i)} β · v, with V(i) the feature vectors cached for example i
Update β := β − α_t ∇_i L_F(β)

We would like to find a small F such that β*(F) = β*(D(Z_p))

    H(β, D) = {(i, Φ(x_i, z_i)) | z_i = argmax_{z ∈ Z(x_i)} β · Φ(x_i, z) and y_i (β · Φ(x_i, z_i)) < 1}
    E(β, F) = {(i, v) ∈ F | y_i (β · v) > 1}

Same procedure as in the SVM case.
Training Models Learning parameters

Training Models Initialization

Initialization
Phase 1: Initializing root filters
    Split positive examples into m groups according to aspect ratio
    Decide the area of F_i according to aspect ratio and area
    Train the root filter F_i with a classical SVM
Phase 2: Merging components
    Train the mixture of models with no parts (latent: component and p_0)
Phase 3: Initializing part filters
    Initialize 6 rectangles to cover the high-energy regions of the root filter
    Anchors at the center, or in symmetric pairs about the vertical axis
    d_i = (0, 0, 1, 1)

Features HOG, PCA and analytic dimension reduction

Features HOG, PCA and analytic dimension reduction

Features
36-dimensional HOG descriptors (9 orientations, 4 normalizations, non-contrast-sensitive)

PCA on these descriptors: 11 main eigenvectors
Projection onto them is costly at detection time
Similar results when using 9+4 analytic basis vectors (column and row sums of the 4×9 descriptor)
(18+9)+4 = 31 dimensions for contrast-sensitive and non-contrast-sensitive features
Post-processing Bounding box prediction

Bounding box prediction

Input z: the position of each part, plus the root width
Output (x1, y1, x2, y2): the predicted bounding box
How: for each component of the mixture, 4 linear functions learned by linear least-squares regression on the training data
Post-processing Non-Maxima Suppression

Non-Maxima Suppression

Issue
For one object, there are many overlapping detections

Solution
Sort detections by score
Add detections one by one, skipping any detection that overlaps an already-kept one by at least 50%
Post-processing Contextual information

Contextual information

Rescore each detection

Best scores for the k other categories in the image: c(I) = (s_1, s_2, ..., s_k)
For a detection (B, s), classify g = (σ(s), x1/w, y1/h, x2/w, y2/h, c(I))
Learned from training data
Some experimental results Models

Models

Some experimental results Detections

Detections

Conclusion

Conclusion

Recent extensions
Speeding up detection using a cascade classifier
Grammar models

Thanks for your attention.
Any questions?
