
Lecture: Object Recognition

Juan Carlos Niebles and Ranjay Krishna


Stanford Vision and Learning Lab

Stanford University
Lecture 14, 31 October 2017


-
What will we learn today?
• Introduction to object recognition
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
What are the different visual recognition tasks?



-
Classification:
Does this image contain a building? [yes/no]

Yes!



-
Classification:
Is this a beach?



-
Image search



-
Image Search

Organizing photo collections



-
Detection:
Does this image contain a car? [where?]

car



-
Detection:
Which objects does this image contain? [where?]

Building, clock, person, car



-
Detection:
Accurate localization (segmentation)

clock



-
Detection: Estimating object semantic and geometric attributes

Object: Building, 45º pose, 8-10 meters away; it has bricks
Object: Person, back view; 1-2 meters away
Object: Police car, side view, 4-5 meters away


-
Categorization vs Single instance
recognition
Does this image contain the Chicago Macy’s building?



-
Categorization vs Single instance
recognition
Where is the Crunchy Nut?



-
Applications of computer vision
• Recognizing landmarks on mobile platforms

+ GPS



-
Activity or Event recognition
What are these people doing?



-
Visual Recognition
• Design algorithms that have the capability to:
  – Classify images or videos
  – Detect and localize objects
  – Estimate semantic and geometric attributes
  – Classify human activities and events

Why is this challenging?


-
How many object categories are there?



-
Challenges: viewpoint variation

Michelangelo 1475-1564



-
Challenges: illumination

image credit: J. Koenderink



-
Challenges: scale



-
Challenges: deformation



-
Challenges: occlusion

Magritte, 1957



-
Art segue: Magritte



-
Challenges: background clutter

Kilmeny Niland. 1995



-
Challenges: intra-class variation



-
What will we learn today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
The machine learning framework

y = f(x)
where y is the output, f is the prediction function, and x is the image feature.

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)



Slide credit: L. Lazebnik
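To make the training/testing split concrete, here is a minimal Python/NumPy sketch (added for illustration, not from the slides); the nearest-class-mean rule used for f is an arbitrary stand-in for whatever classifier is actually chosen.

import numpy as np

def train_f(X_train, y_train):
    # "Training": estimate f from the labeled examples {(x1, y1), ..., (xN, yN)}.
    means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    def f(x):
        # Predict the label whose class-mean feature vector is closest to x.
        return min(means, key=lambda c: np.linalg.norm(x - means[c]))
    return f

# Toy usage: 20 random 5-dimensional "image features" with binary labels.
X, y = np.random.rand(20, 5), np.random.randint(0, 2, 20)
f = train_f(X, y)
print(f(np.random.rand(5)))   # "Testing": y = f(x) on a never-before-seen example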
-
Classification
• Assign an input vector to one of two or more classes
• Any decision rule divides the input space into decision regions separated by decision boundaries

Slide credit: L. Lazebnik



-
Nearest Neighbor Classifier
• Assign label of nearest training data point
to each test data point

[Figure: compute the distance between the test image and each training image, then choose the k nearest records.]

Source: N. Goyal



-
Nearest Neighbor Classifier
• Assign label of nearest training data point
to each test data point

[Figure: partitioning of feature space for two-category 2D and 3D data, from Duda et al.]
Source: D. Lowe



-
K-nearest neighbor
Distance measure: Euclidean

Dist(X^n, X^m) = \sqrt{ \sum_{i=1}^{D} (X_i^n - X_i^m)^2 }

where X^n and X^m are the n-th and m-th data points and D is the feature dimension.

[Figure: 2D feature space (axes x1, x2) with training points from two classes, marked x and o.]


-
1-nearest neighbor
Distance measure: Euclidean, as defined above.

[Figure: the same 2D feature space; the query point is assigned the label of its single nearest neighbor.]


-
3-nearest neighbor
Distance measure: Euclidean, as defined above.

[Figure: the same 2D feature space; the labels of the query point's 3 nearest neighbors vote on the prediction.]


-
5-nearest neighbor
Distance measure: Euclidean, as defined above.

[Figure: the same 2D feature space; the labels of the query point's 5 nearest neighbors vote on the prediction.]


-
K-NN: a very useful algorithm

• Simple, a good one to try first


• Very flexible decision boundaries
• With infinite examples, the 1-NN error is provably at most twice the Bayes optimal error (out of scope for this class).



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
[Figure: decision boundaries for K=1 (left) and K=15 (right).]



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!



-
Cross validation
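A hedged sketch of what this typically looks like in code: split the training data into folds, evaluate each candidate k on the held-out fold, and keep the best. It reuses the knn_predict helper from the earlier sketch (an assumption of this example, not something defined in the slides).

import numpy as np

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 15), n_folds=5):
    # Assumes the knn_predict helper from the earlier sketch is in scope.
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    scores = {}
    for k in candidate_ks:
        fold_accs = []
        for i in range(n_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            preds = [knn_predict(X[trn], y[trn], X[idx], k=k) for idx in val]
            fold_accs.append(np.mean(np.array(preds) == y[val]))
        scores[k] = float(np.mean(fold_accs))   # average held-out accuracy for this k
    best_k = max(scores, key=scores.get)
    return best_k, scores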



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)



-
Euclidean measure
111111111110 vs 011111111111   d = 1.4142
100000000000 vs 000000000001   d = 1.4142
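A quick NumPy check (added for illustration) reproduces these distances and previews the unit-length normalization suggested on the next slide.

import numpy as np

bits = lambda s: np.array([int(ch) for ch in s], dtype=float)
a, b = bits("111111111110"), bits("011111111111")   # first pair above
p, q = bits("100000000000"), bits("000000000001")   # second pair above

print(np.linalg.norm(a - b), np.linalg.norm(p - q))   # both 1.4142...

unit = lambda v: v / np.linalg.norm(v)
print(np.linalg.norm(unit(a) - unit(b)))   # ~0.43: nearly identical patterns now look close
print(np.linalg.norm(unit(p) - unit(q)))   # ~1.41: disjoint patterns stay far apart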



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length
• Curse of Dimensionality



-
Curse of dimensionality
• Assume 5000 points uniformly distributed in the unit hypercube, and we want to apply 5-NN. Suppose our query point is at the origin.
  – In 1 dimension, we must go a distance of 5/5000 = 0.001 on average to capture the 5 nearest neighbors.
  – In 2 dimensions, we must go (0.001)^{1/2} ≈ 0.032 to get a square that contains 0.001 of the volume.
  – In d dimensions, we must go (0.001)^{1/d}.
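A two-line check of that formula, added for illustration:

# Edge length of a hypercube containing a 0.001 fraction of the unit cube's volume.
for d in [1, 2, 3, 10, 100]:
    print(d, round(0.001 ** (1 / d), 3))   # 0.001, 0.032, 0.1, 0.501, 0.933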



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length
• Curse of Dimensionality
– Solution: no good one



-
Many classifiers to choose from. Which is the best one?
• K-nearest neighbor
• SVM
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized Forests
• Boosted Decision Trees
• RBMs
• Etc.
Slide credit: D. Hoiem



-
Generalization

Training set (labels known)          Test set (labels unknown)

• How well does a learned model generalize from the data it was trained on to a new test set?
Slide credit: L. Lazebnik



-
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).

• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Slide credit: D. Hoiem



-
Bias versus variance
• Components of generalization error
  – Bias: how much does the average model over all training sets differ from the true model?
    • Error due to inaccurate assumptions/simplifications made by the model
  – Variance: how much do models estimated from different training sets differ from each other?
• Underfitting: the model is too simple to represent all the relevant class characteristics
  – High bias and low variance
  – High training error and high test error
• Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data
  – Low bias and high variance
  – Low training error and high test error
Slide credit: L. Lazebnik
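As an added illustration of this trade-off (not from the slides), the sketch below fits polynomials of increasing degree to noisy samples of a sine wave, an assumed toy dataset, and compares training and test error.

import numpy as np

rng = np.random.default_rng(0)
x_tr, x_te = rng.uniform(0, 1, 15), rng.uniform(0, 1, 200)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.2, x_tr.size)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.2, x_te.size)

for degree in [1, 3, 9]:                        # too simple, about right, too complex
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(mse(x_tr, y_tr), 3), round(mse(x_te, y_te), 3))
# Low degree: high training and test error (underfitting, high bias).
# High degree: low training error but higher test error (overfitting, high variance).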



-
Bias versus variance trade-off



-
No Free Lunch Theorem

In a supervised learning setting, we can't tell which classifier will have the best generalization.
Slide credit: D. Hoiem



-
Remember…
• No classifier is inherently better than any other: you need to make assumptions to generalize

• Three kinds of error
  – Inherent: unavoidable
  – Bias: due to over-simplifications
  – Variance: due to inability to perfectly estimate parameters from limited data
Slide credit: D. Hoiem



-
How to reduce variance?

• Choose a simpler classifier

• Regularize the parameters

• Get more training data

How do you reduce bias?

Slide credit: D. Hoiem



-
Last remarks about applying machine
learning methods to object recognition
• There are many machine learning algorithms to choose from
• Know your data:
– How much supervision do you have?
– How many training examples can you afford?
– How noisy?
• Know your goal (i.e. task):
– Affects your choices of representation
– Affects your choices of learning algorithms
– Affects your choices of evaluation metrics
• Understand the math behind each machine learning
algorithm under consideration!



-
What will we learn today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
Object recognition:
a classification framework
• Apply a prediction function to a feature representation of
the image to get the desired output:

f([image of an apple]) = apple
f([image of a tomato]) = tomato
f([image of a cow]) = cow

Dataset: ETH-80, by B. Leibe. Slide credit: L. Lazebnik



-
A simple pipeline - Training

Training Images → Image Features



-
A simple pipeline - Training
Training Images → Image Features → Training
Training Labels → Training



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training

Test Image → Image Features



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training

Test Image → Image Features → Learned Classifier → Prediction
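Putting the whole diagram into code, here is a hedged end-to-end sketch in Python/NumPy. The per-channel mean/std feature and the 1-NN classifier are illustrative stand-ins, and load_image in the usage comment is a hypothetical helper, not a real API.

import numpy as np

def image_features(img):
    # Illustrative stand-in feature: per-channel mean and standard deviation (6-D vector).
    px = img.reshape(-1, 3).astype(float)
    return np.concatenate([px.mean(axis=0), px.std(axis=0)])

def train(train_images, train_labels):
    # "Training" a 1-NN classifier just stores the training features and labels.
    feats = np.stack([image_features(im) for im in train_images])
    def learned_classifier(test_image):
        dists = np.linalg.norm(feats - image_features(test_image), axis=1)
        return train_labels[int(np.argmin(dists))]   # the prediction
    return learned_classifier

# Hypothetical usage (load_image and the paths are assumed, not real APIs):
# clf = train([load_image(p) for p in train_paths], train_labels)
# print(clf(load_image("test_image.jpg")))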



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training

Test Image → Image Features → Learned Classifier → Prediction



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion
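One way "quantize RGB values" might be implemented, shown as an added sketch that assumes an H x W x 3 uint8 image; the 2-bit-per-channel quantization is an arbitrary illustrative choice. Because pixel positions are discarded, the histogram is translation invariant but not occlusion invariant.

import numpy as np

def quantized_color_hist(img, bits=2):
    # Quantize each RGB channel to `bits` bits (here 4 levels per channel), then
    # histogram the resulting palette indices over all pixels.
    q = (img.astype(np.uint8) >> (8 - bits)).reshape(-1, 3)
    idx = (q[:, 0] << (2 * bits)) | (q[:, 1] << bits) | q[:, 2]
    hist = np.bincount(idx, minlength=(1 << (3 * bits)))
    return hist / hist.sum()   # normalized 64-bin color descriptor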



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Local shape: shape context. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Local shape: shape context. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Local shape: shape context. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion

Texture: Filter banks. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion
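A minimal sketch of a filter-bank texture descriptor, added for illustration and assuming a 2D grayscale image; real systems use much richer banks (e.g., oriented Gaussian derivatives at several scales), whereas this toy version uses just three hand-rolled filters.

import numpy as np
from scipy.ndimage import convolve

def filter_bank_texture(gray):
    # Tiny illustrative bank: horizontal derivative, vertical derivative, and a box blur.
    dx = np.array([[-1.0, 0.0, 1.0]])
    dy = dx.T
    blur = np.ones((3, 3)) / 9.0
    responses = [convolve(gray.astype(float), f) for f in (dx, dy, blur)]
    # Descriptor: mean absolute response per filter. Positions are discarded,
    # so the descriptor is roughly translation invariant.
    return np.array([np.abs(r).mean() for r in responses])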



-
Image features
Input image

Color: Quantize RGB values. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Global shape: PCA space. Invariance?
  ? Translation  ? Scale  ? Rotation  ? Occlusion

Local shape: shape context. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion

Texture: Filter banks. Invariance?
  ? Translation  ? Scale  ? Rotation (in-plane)  ? Occlusion



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training

Test Image → Image Features → Learned Classifier → Prediction



-
Classifiers: Nearest neighbor

[Figure: training examples from class 1 and training examples from class 2 in feature space.]

Slide credit: L. Lazebnik



-
A simple pipeline - Training
Training Images → Image Features → Training → Learned Classifier
Training Labels → Training

Test Image → Image Features → Learned Classifier → Prediction



-
Classifiers: Nearest neighbor

[Figure: a test example placed among training examples from class 1 and training examples from class 2.]

Slide credit: L. Lazebnik



-
Results

Dataset: ETH-80, by B. Leibe, 2003



-
What have we learned today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline


