Vision in Robotics
Silvio Savarese
21-Jan-15
Everything is a sensor
night
thermal
Kinect
Computer vision
Computer vision studies the tools and theories that enable the design of machines
that can extract useful information from imagery data
(images and videos) toward the goal of interpreting the world
[Figure: sensing device, computational device, extract information, interpretation]
[Figure: timeline of applications, 1990 to 2010: EosSystems]
Fingerprint biometrics
Augmentation with 3D computer graphics
3D object prototyping
EosSystems PhotoModeler
[Figure: timeline, 1990 to 2010: EosSystems, AutoStitch]
Face detection
Web applications
Photometria
Panoramic Photography
kolor
3D modeling of landmarks
[Figure: timeline, 1990 to 2010: EosSystems, AutoStitch, A9, Kooaba]
Google Goggles
Augmented reality
Automotive safety
Factory inspection
Assistive technologies
Surveillance
Security
[Figure: timeline, 1990 to 2010: EosSystems, AutoStitch, A9, Kooaba, Google Goggles, Kinect]
[Figure: timeline, 1990 to 2010, with a 2D to 3D axis: EosSystems, Google Goggles]
Computer vision
3D Reconstruction: 3D shape recovery, 3D scene reconstruction, camera localization, pose estimation
2D Recognition: object detection, texture classification, target tracking, activity recognition
Camera systems
Establish a mapping from 3D to 2D
Pinhole camera
Pinhole perspective
projection
f = focal length
c = center of the camera
$(x, y, z) \rightarrow \left( f\,\dfrac{x}{z},\; f\,\dfrac{y}{z} \right)$
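The pinhole mapping can be tried out numerically. The following is a minimal sketch, assuming points are already expressed in the camera frame with z > 0; the function name `pinhole_project` and the sample values are illustrative and not part of the slides.

```python
import numpy as np

def pinhole_project(points_xyz, f):
    """Apply (x, y, z) -> (f*x/z, f*y/z) to points given in camera coordinates."""
    pts = np.asarray(points_xyz, dtype=float)
    if np.any(pts[:, 2] <= 0):
        raise ValueError("all points must lie in front of the camera (z > 0)")
    return f * pts[:, :2] / pts[:, 2:3]

# The same point moved twice as far from the camera projects at half the offset.
near = pinhole_project([[1.0, 2.0, 4.0]], f=2.0)
far = pinhole_project([[1.0, 2.0, 8.0]], f=2.0)
print(near, far)
```

Dividing by z is what makes distant objects look smaller.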
Projective camera
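[Figure: world frame O_w with axes i_w, j_w, k_w; camera frame O_c; the two frames are related by R, T; a world point P_w and its image P]
$P = M\,P_w = K\,[R \;\; T]\,P_w$
Internal parameters (K)
f = focal length
u_o, v_o = offset
α, β = non-square pixels
θ = skew angle
External parameters
R, T = rotation, translation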
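A small numerical sketch of the projective camera may help. It assumes a simplified intrinsic matrix in which `fx` and `fy` lump the focal length together with the pixel scale factors and the skew term is zero (a 90 degree skew angle); the helper names and all numbers are illustrative.

```python
import numpy as np

def intrinsic_matrix(fx, fy, u0, v0, skew=0.0):
    """Simplified K: fx, fy absorb focal length and pixel scaling (assumption)."""
    return np.array([[fx, skew, u0],
                     [0.0, fy, v0],
                     [0.0, 0.0, 1.0]])

def camera_matrix(K, R, T):
    """M = K [R T], a 3x4 matrix mapping homogeneous world points to the image."""
    return K @ np.hstack([R, np.reshape(T, (3, 1))])

def project(M, P_w):
    """Project a 3D world point and convert from homogeneous to pixel coordinates."""
    p = M @ np.append(P_w, 1.0)
    return p[:2] / p[2]

K = intrinsic_matrix(fx=800.0, fy=800.0, u0=320.0, v0=240.0)
R = np.eye(3)                      # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])      # world origin sits 5 units in front of the camera
M = camera_matrix(K, R, T)

pixel = project(M, np.array([1.0, -0.5, 5.0]))
print(pixel)
```

With these values the point ends up at depth 10 in the camera frame, so it lands 80 pixels right of and 40 pixels above the image offset (u_o, v_o).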
Properties of Projection
Points project to points
Lines project to lines
Distant objects look smaller
Angles are not preserved
Parallel lines meet!
One-point perspective
Masaccio, Trinity, Santa Maria Novella, Florence, 1425-28
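The claim that parallel lines meet in the image can be checked numerically. This is a sketch under assumed values: a camera at the origin with an illustrative matrix `K`, and two 3D lines sharing the direction `d`. Points far along either line project close to the image of the direction itself, which is the vanishing point.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project(K, P):
    """Project a point given in camera coordinates."""
    p = K @ P
    return p[:2] / p[2]

d = np.array([1.0, 0.0, 2.0])                  # common direction of the two lines
starts = [np.array([-1.0, 1.0, 4.0]),          # two different starting points,
          np.array([3.0, -2.0, 6.0])]          # hence two distinct parallel lines

vanishing_point = project(K, d)                # image of the point at infinity along d
far_points = [project(K, s + 1e6 * d) for s in starts]
print(vanishing_point, far_points)
```

Both far points agree with the vanishing point to within a small fraction of a pixel, even though the lines never meet in 3D.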
Calibration Problem
[Figure: a calibration rig and its image in the camera]
Calibration Procedure
Camera Calibration Toolbox for Matlab
J. Bouguet [1998-2000]
http://www.vision.caltech.edu/bouguetj/calib_doc/index.html#examples
[Figure: scene with world frame O_w and calibration rig; camera C with intrinsics K; $M = K\,[R \;\; T]$]
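One common way to estimate M from a rig is the direct linear transform (DLT). The slides point to the Bouguet toolbox and do not spell out an algorithm, so the sketch below is a swapped-in illustration rather than the toolbox's procedure. It assumes at least six non-coplanar rig points with known world coordinates; the rig layout and the ground-truth camera are synthetic.

```python
import numpy as np

def estimate_camera_matrix(world_pts, image_pts):
    """DLT: stack two linear equations per correspondence and solve A m = 0."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        P = [X, Y, Z, 1.0]
        rows.append(P + [0.0] * 4 + [-u * p for p in P])
        rows.append([0.0] * 4 + P + [-v * p for p in P])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)          # M is recovered only up to scale

def project(M, pts):
    """Project an (n, 3) array of world points to (n, 2) pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    p = pts_h @ M.T
    return p[:, :2] / p[:, 2:3]

# Synthetic rig: grid points on three mutually orthogonal planes (assumed layout).
grid = [1.0, 2.0]
world = np.array([[a, b, 0.0] for a in grid for b in grid] +
                 [[a, 0.0, b] for a in grid for b in grid] +
                 [[0.0, a, b] for a in grid for b in grid])

# Ground-truth camera, used only to generate the synthetic image points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T = np.array([[0.2], [-0.1], [6.0]])
M_true = K @ np.hstack([R, T])
image = project(M_true, world)

M_est = estimate_camera_matrix(world, image)
max_error = np.abs(project(M_est, world) - image).max()
print(max_error)
```

With noise-free data the estimated matrix reprojects the rig points onto their observations. Real images add noise, which is why practical tools refine such a linear estimate with a non-linear step.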
Why is it so difficult?
Intrinsic ambiguity of the mapping from 3D to image (2D)
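[Figure: two cameras with centers O1, O2 and known K, related by R, T; the image points x1, x2 define the rays l, l' that should meet at the unknown point X]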
Triangulation
Find X that minimizes
$d^2(x_1, M_1 X) + d^2(x_2, M_2 X)$
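A sketch of triangulation under stated assumptions: the linear (algebraic) solution below does not minimize the geometric cost directly, but it is a usual starting point for that minimization, and with noise-free observations the two coincide. The camera matrices and the test point are illustrative.

```python
import numpy as np

def project(M, X):
    p = M @ np.append(X, 1.0)
    return p[:2] / p[2]

def reprojection_cost(X, observations):
    """Sum of squared image distances d^2(x_i, M_i X) over the given views."""
    return sum(np.sum((x - project(M, X)) ** 2) for x, M in observations)

def triangulate_linear(observations):
    """Solve the homogeneous system built from u*m3 - m1 = 0 and v*m3 - m2 = 0."""
    rows = []
    for (u, v), M in observations:
        rows.append(u * M[2] - M[0])
        rows.append(v * M[2] - M[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted sideways

X_true = np.array([0.5, 0.2, 4.0])
observations = [(project(M1, X_true), M1), (project(M2, X_true), M2)]

X_est = triangulate_linear(observations)
cost = reprojection_cost(X_est, observations)
print(X_est, cost)
```

With noisy measurements the two rays no longer intersect, and `reprojection_cost` is the quantity a non-linear refinement would then reduce.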
Stereo-view geometry
Correspondence: Given a point x in one image, how can I find the corresponding point in another one?
Camera geometry: Given corresponding points in two images, find the camera matrices, position and pose.
Scene geometry: Find the coordinates of a 3D point from its projections into two or more images.
Epipolar geometry
[Figure: 3D point X, its images x1 and x2, epipoles e1 and e2, camera centers O1 and O2]
Epipolar Plane
Baseline
Epipolar Lines
Epipoles e1, e2
= intersections of baseline with image planes
= projections of the other camera center
[Figure: two views with centers O1, O2, image points x1, x2, and epipoles e1, e2]
Baseline intersects the image plane at infinity
Epipoles are at infinity
Epipolar lines are parallel to x axis
[Figure: epipoles e1 and e2 at infinity]
Epipolar Constraint
$p_1^T F\, p_2 = 0$
[Figure: corresponding points p1, p2, epipoles e1, e2, camera centers O1, O2]
Why is F useful?
- Suppose F is known
- No additional information about the scene and camera is given
- Given a point x on the left image, how can I find the corresponding point on the right image?
- It must lie on the epipolar line $l = F^T x$
F captures information about the epipolar geometry of 2 views + camera parameters
MORE IMPORTANTLY: F gives constraints on how the scene changes under viewpoint transformation (without reconstructing the scene!)
Powerful tool in:
3D reconstruction
Multi-view object/scene matching
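The epipolar constraint and the line $l = F^T x$ can be verified on synthetic data. The sketch assumes both cameras share an illustrative `K`, that camera 2 is related to camera 1 by X2 = R X1 + T, and that F follows the slide's convention $p_1^T F p_2 = 0$; all numbers are made up.

```python
import numpy as np

def skew(t):
    """Matrix form of the cross product: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T = np.array([-1.0, 0.0, 0.1])

# K^-T [T]x R K^-1 satisfies p2^T (.) p1 = 0; its transpose gives p1^T F p2 = 0.
K_inv = np.linalg.inv(K)
F = (K_inv.T @ skew(T) @ R @ K_inv).T

X = np.array([0.3, -0.2, 5.0])            # scene point in camera-1 coordinates
p1 = K @ X
p1 = p1 / p1[2]                           # homogeneous pixel in the left image
p2 = K @ (R @ X + T)
p2 = p2 / p2[2]                           # homogeneous pixel in the right image

residual = p1 @ F @ p2                    # epipolar constraint, close to zero
line_right = F.T @ p1                     # epipolar line a*u + b*v + c = 0 in the right image
distance = abs(line_right @ p2) / np.hypot(line_right[0], line_right[1])
print(residual, distance)
```

The search for the match of `p1` is thus reduced from the whole right image to a single line, using F alone.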
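[Figure: cameras M_1, M_2, ..., M_m observe the points X_j; x_1j, x_2j, ..., x_mj are the images of X_j]
$x_{ij} = M_i X_j$, with $M_i = K_i\,[R_i \;\; T_i]$, $i = 1, \dots, m$, $j = 1, \dots, n$
$x_{ij} = M_i X_j = (M_i H^{-1})(H X_j)$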
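The ambiguity is easy to reproduce: any invertible 4x4 matrix H changes both the camera and the point while leaving the image untouched. A minimal sketch with illustrative values follows; H is chosen strictly diagonally dominant so that it is certainly invertible, and its last row makes it a genuinely projective transformation.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
M = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # one camera
X = np.array([1.0, -0.5, 10.0, 1.0])               # one homogeneous 3D point

H = np.array([[2.0, 0.2, 0.0, 0.5],
              [0.0, 2.0, 0.3, -1.0],
              [0.1, 0.0, 3.0, 2.0],
              [0.05, -0.02, 0.01, 1.0]])

M_alt = M @ np.linalg.inv(H)     # transformed camera
X_alt = H @ X                    # transformed point

def to_pixels(p):
    return p[:2] / p[2]

x = to_pixels(M @ X)
x_alt = to_pixels(M_alt @ X_alt)
print(x, x_alt)
```

Both pairs explain the same measurement, so image observations alone cannot distinguish (M, X) from (M H^-1, H X).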
2010.12.18
Affine ambiguity
Projective ambiguity
Self-calibration
Prior knowledge on cameras or scene can be used to add
constraints and remove ambiguities
Obtain metric reconstruction (up to scale)
[Table: condition, number of views]
Bundle adjustment
Non-linear method for refining structure and motion
Minimizing re-projection error
It can be used before or after metric upgrade
$E(M, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_{ij}, M_i X_j)^2$
[Figure: point X_j seen by cameras P1, P2, P3; reprojections M_1 X_j, M_2 X_j, M_3 X_j against observations x_1j, x_2j, x_3j]
Advantages
Limitations
Large minimization problem (parameters grow with number of views)
Requires good initial condition
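The re-projection error itself is short to write down. The sketch below only evaluates E(M, X) for given cameras and points; a full bundle adjustment would pass this cost to a non-linear least-squares solver, which is not shown. The container layout (a dict keyed by (i, j)) and all values are assumptions made for illustration.

```python
import numpy as np

def project(M, X):
    p = M @ np.append(X, 1.0)
    return p[:2] / p[2]

def reprojection_error(cameras, points, observations):
    """E(M, X): sum over observed (i, j) of the squared distance D(x_ij, M_i X_j)^2."""
    return sum(np.sum((x_ij - project(cameras[i], points[j])) ** 2)
               for (i, j), x_ij in observations.items())

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
cameras = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
           K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
points = [np.array([0.5, 0.2, 4.0]), np.array([-0.3, 0.1, 6.0])]

# Perfect observations first, then one measurement displaced by (3, 4) pixels.
observations = {(i, j): project(cameras[i], points[j])
                for i in range(2) for j in range(2)}
E_exact = reprojection_error(cameras, points, observations)

observations[(1, 0)] = observations[(1, 0)] + np.array([3.0, 4.0])
E_noisy = reprojection_error(cameras, points, observations)
print(E_exact, E_noisy)
```

Keying the observations by (i, j) lets the sum skip points that are not visible in a given view.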
Levoy et al., 00
Hartley & Zisserman, 00
Dellaert et al., 00
Rusinkiewicz et al., 02
Nistér, 04
Brown & Lowe, 04
Schindler et al, 04
Lourakis & Argyros, 04
Colombo et al. 05
Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings), 2006.
Classification:
Does this image contain a building? [yes/no]
Yes!
Classification:
Is this a beach?
Image Search
Detection:
Does this image contain a car? [where?]
car
Detection:
Which object does this image contain? [where?]
Building
clock
person
car
Detection:
Accurate localization (segmentation)
clock
Assistive technologies
Surveillance
Computational photography
Security
Assistive driving
+ GPS
Visual Recognition
Design algorithms that are able to
Classify images or videos
Detect and localize objects
Estimate semantic and geometrical attributes
Classify human activities and events
Michelangelo 1475-1564
Challenges: illumination
Challenges: scale
Challenges: deformation
Challenges: occlusion
Magritte, 1957
Basic properties
Representation
How to represent an object category; which classification scheme?
Learning
How to learn the classifier, given training data
Recognition
How the classifier is to be used on novel data
Representation
Interest operators
Dense, uniformly
Randomly
Representation
- Building blocks: choice of descriptors [SIFT, HOG, codewords, ...]
Representation
Appearance only or location and appearance
Representation
Invariances
View point
Illumination
Occlusion
Scale
Deformation
Clutter
etc.
Representation
To handle intra-class variability, it is convenient to describe object categories using probabilistic models
Object models: Generative vs Discriminative vs hybrid
Object categorization: the statistical viewpoint
$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$
Bayes rule: $p(A \mid B) = \dfrac{p(B \mid A)\, p(A)}{p(B)}$
$\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \underbrace{\dfrac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\dfrac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$
Discriminative methods model the posterior
Generative methods model the likelihood and prior
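A worked example of the ratio form of Bayes rule, with made-up numbers: the likelihoods and the prior below are purely illustrative, as is the function name `posterior_ratio`.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Posterior ratio = likelihood ratio * prior ratio."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio

# Hypothetical values: the image is 40 times more likely under "zebra",
# but zebras are assumed to appear in only 1% of images.
ratio = posterior_ratio(lik_zebra=0.08, lik_no_zebra=0.002, prior_zebra=0.01)
posterior_zebra = ratio / (1.0 + ratio)
print(ratio, posterior_zebra)
```

Here a strong likelihood ratio is outweighed by a small prior, so "no zebra" remains the more probable answer (the posterior ratio is below 1).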
Discriminative models
Neural networks
Nearest neighbor (10^6 examples)
Latent SVM
Structural SVM
Felzenszwalb 00
Ramanan 03
Boosting
Generative models
Learning
Learning parameters: what are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
Level of supervision
Manual segmentation; bounding box; image labels; noisy labels
Batch/incremental
Priors
Training images:
Issue of overfitting
Negative images for discriminative methods
Recognition
Recognition task: classification, detection, etc.
Search strategy: Sliding Windows
Simple
Computational complexity (x, y, S, θ, N of classes)
- BSW by Lampert et al 08
- Also, Alexe, et al 10
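The sliding-window search and its cost can be sketched in a few lines. This toy version scans positions and window sizes only (no orientation, a single class) and uses the mean intensity of the window as a stand-in for a real classifier score; every name and value is illustrative.

```python
import numpy as np

def sliding_windows(height, width, sizes, stride):
    """Enumerate square windows (x, y, size) over all positions and sizes."""
    for size in sizes:
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield x, y, size

def detect(image, score_fn, sizes, stride, threshold):
    """Score every window and keep those at or above the threshold."""
    detections = []
    for x, y, size in sliding_windows(*image.shape, sizes, stride):
        score = score_fn(image[y:y + size, x:x + size])
        if score >= threshold:
            detections.append((x, y, size, score))
    return detections

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0                      # a bright 4x4 "object"

windows = list(sliding_windows(8, 8, sizes=[4, 8], stride=2))
hits = detect(image, score_fn=np.mean, sizes=[4, 8], stride=2, threshold=0.9)
print(len(windows), hits)
```

Even on this 8x8 image the classifier runs once per window, and the count multiplies with every extra search dimension, which is the complexity issue noted above.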
Localization
Objects are not boxes
Segmentation
Bottom up segmentation: Malik et al. 01; Maire et al. 08; Felzenszwalb and Huttenlocher, 2004
Semantic segmentation: Duygulu et al. 02
Prone to false positives
Non max suppression: Canny 86; ...; Desai et al, 2009
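A minimal sketch of non-maximum suppression for detection boxes. It assumes the greedy, overlap-based variant commonly paired with sliding-window detectors, which is not necessarily the formulation of the works cited above; the boxes, scores, and the 0.5 threshold are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Visit boxes by decreasing score; drop any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

The second box overlaps the first almost entirely and is suppressed as a duplicate response, while the distant third box survives.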
Recognition
Recognition task
Search strategy: Probabilistic heat maps
Fergus et al 03
Leibe et al 04
Search strategy: Hypothesis generation + verification
Recognition
Recognition task
Search strategy
Attributes
- It has metal
- It is glossy
- It has wheels
Farhadi et al 09
Lampert et al 09
Wang & Forsyth 09
Savarese, 2007
Sun et al 2009
Liebelt et al., 08, 10
Farhadi et al 09
Category: car
Azimuth = 225
Zenith = 30
Recognizing 3D objects
Xiang & Savarese, 2012-2014
BED
CHAIR
CAR
TABLE
Recognition
Recognition task
Search strategy
Attributes
Context
Semantic:
Torralba et al 03
Rabinovich et al 07
Gupta & Davis 08
Heitz & Koller 08
L-J Li et al 08
Bang & Fei-Fei 10
Geometric
Hoiem, et al 06
Gould et al 09
Bao, Sun, Savarese 10
Recognition in context
Recognition
Recognition task
Search strategy
Attributes
Context
Tracking
State-of-the-art
Object Tracking
[Figure: visual pathways in the brain: V1, the "what" pathway (ventral stream), pre-frontal cortex]
3D Geometric Phrases
A 3DGP encodes geometric and semantic relationships between groups of objects and space elements that frequently co-occur in spatially consistent configurations.
[Figure: training dataset and 3DGPs]
Monocular cameras
Un-calibrated cameras
Arbitrary motion
Summary
3D physical environment
Sensors
Objects