Introduction to Computer Vision for Robotics
Jan-Michael Frahm
Overview
− Camera model
− Multi-view geometry
− Camera pose estimation
− Feature tracking & matching
− Robust pose estimation
Homogeneous coordinates

Unified notation: include the origin in the affine basis. A point M with affine coordinates (a_1, a_2, a_3) in the basis (\vec{e}_1, \vec{e}_2, \vec{e}_3) with origin \vec{o} has homogeneous coordinates

M = \begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 & \vec{o} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ 1 \end{bmatrix}

The 4x4 matrix is the affine basis matrix.
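As a quick numerical check, the affine basis matrix can be applied with NumPy; the basis vectors and origin below are assumed example values, not from the slides:

```python
import numpy as np

# Assumed affine basis: three basis vectors e1, e2, e3 and an origin o.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
e3 = np.array([0.0, 0.0, 1.0])
o = np.array([2.0, 3.0, 4.0])

# Homogeneous affine basis matrix: columns [e1 e2 e3 o], last row [0 0 0 1].
B = np.eye(4)
B[:3, 0], B[:3, 1], B[:3, 2], B[:3, 3] = e1, e2, e3, o

a = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous coordinates (a1, a2, a3, 1)
M = B @ a                           # point in world coordinates
print(M)                            # [3. 5. 7. 1.]
```

The last row (0, 0, 0, 1) guarantees that the result is again a finite point with w = 1.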
Projective geometry

Projective space P^2 is the space of rays emerging from O:
− the viewpoint O forms the projection center for all rays
− rays v emerge from the viewpoint into the scene
− a ray g is called a projective point, defined as a scaled v: g = \lambda v

v = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \qquad g(\lambda) = \lambda v = \begin{bmatrix} \lambda x' \\ \lambda y' \\ \lambda \end{bmatrix} = \begin{bmatrix} x \\ y \\ w \end{bmatrix}, \quad \lambda \in \mathbb{R} \setminus \{0\}

On the plane \Pi (P^2) at w = 1, the ray g intersects at

v = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \frac{g}{w} = \begin{bmatrix} x/w \\ y/w \\ 1 \end{bmatrix}
Finite and infinite points

All rays g that are not parallel to \Pi intersect \Pi at an affine point v. A ray with w = 0 does not intersect \Pi; hence v_\infty is not an affine point but a direction. Directions have homogeneous coordinates with last component 0, e.g.

g_\infty = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}

Projective space combines affine space with the infinite points (directions).

[Figure: rays g_1, g_2 intersect \Pi at v_1, v_2; a ray parallel to \Pi defines the direction v_\infty]
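The distinction between finite points and directions can be sketched as a small dehomogenization helper; the function name and tolerance are illustrative choices, not from the slides:

```python
import numpy as np

def dehomogenize(g, eps=1e-12):
    """Return the affine point for a ray g = (x, y, w)^T,
    or None if g is a direction (w = 0, an infinite point)."""
    g = np.asarray(g, dtype=float)
    if abs(g[-1]) < eps:
        return None                  # direction: no intersection with Pi
    return g[:-1] / g[-1]            # divide by w -> point on plane w = 1

print(dehomogenize([4.0, 2.0, 2.0]))  # finite point (2, 1) on the plane
print(dehomogenize([4.0, 2.0, 0.0]))  # None: direction (x, y, 0)^T
```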
Pinhole Camera (Camera obscura)

[Figure: camera obscura (France, 1830): light from the object passes through the aperture and forms an inverted image on the image sensor; image center (c_x, c_y), view direction, lens, focal length f]
Perspective projection

Perspective projection in P^3 models the pinhole camera:
− scene geometry is affine P^3 space with coordinates M = (X, Y, Z, 1)^T
− camera focal point at O = (0, 0, 0, 1)^T, camera viewing direction along Z (the optical axis)
− image plane (x, y) in \Pi (P^2), aligned with (X, Y) at Z = Z_0
− a scene point M projects onto the point M_p on the plane:

M_p = \begin{bmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{Z_0}{Z} X \\ \frac{Z_0}{Z} Y \\ Z_0 \\ 1 \end{bmatrix}
Projective Transformation

A projective transformation maps M onto M_p in P^3 space:

\rho M_p = T_p M \;\Rightarrow\; \rho \begin{bmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{Z_0} & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}

with the projective scale factor \rho = \frac{Z}{Z_0}.
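A minimal numerical sketch of the transformation T_p, with Z_0 = 1 and an assumed example point:

```python
import numpy as np

Z0 = 1.0                                   # image-plane distance
Tp = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 1 / Z0, 0]], dtype=float)

M = np.array([2.0, 4.0, 8.0, 1.0])         # scene point (X, Y, Z, 1)
Mp = Tp @ M                                # rho * (xp, yp, Z0, 1)
Mp /= Mp[-1]                               # divide by rho = Z / Z0
print(Mp)                                  # projected point (0.25, 0.5, 1, 1)
```

Dividing by the last component carries out the perspective division, recovering x_p = Z_0 X / Z and y_p = Z_0 Y / Z.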
Perspective Projection

The projected point m_p = (x_p, y_p, 1)^T on the image plane is mapped to pixel coordinates by the image-sensor mapping

m = K m_p, \qquad K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

Pixel coordinates are related to image coordinates by the affine transformation K with five parameters:
− image center c = (c_x, c_y)^T at the optical axis
− focal length and pixel size determine the pixel scales f_x, f_y
− image skew s models the angle between pixel rows and columns
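The sensor mapping m = K m_p can be checked numerically; the calibration values below are assumed examples:

```python
import numpy as np

# Assumed calibration: focal scales fx, fy in pixels, zero skew s,
# image center (cx, cy).
fx, fy, s, cx, cy = 800.0, 820.0, 0.0, 320.0, 240.0
K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])

mp = np.array([0.1, -0.05, 1.0])   # image coordinates (xp, yp, 1)
m = K @ mp                          # pixel coordinates m = K mp
print(m)                            # pixel coordinates (400, 199, 1)
```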
Projection in general pose

Camera pose: rotation R and projection center C, with world coordinates M:

T_{cam} = \begin{bmatrix} R & C \\ 0^T & 1 \end{bmatrix}, \qquad T_{scene} = T_{cam}^{-1} = \begin{bmatrix} R^T & -R^T C \\ 0^T & 1 \end{bmatrix}

Projection: \rho m_p = P M

Projection matrix P

The camera projection matrix P = K P_0 T_{scene} combines:
− the inverse pose transformation T_{cam}^{-1} from general pose to the origin
− the perspective projection P_0 onto the image plane at Z_0 = 1
− the affine mapping K from image to sensor coordinates

P_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = [I \;|\; 0], \qquad K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
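Assembling P = K P_0 T_scene can be sketched end to end; the pose and calibration values are assumed examples:

```python
import numpy as np

# Assumed calibration and pose.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0.,   0.,   1.]])
R = np.eye(3)                        # camera orientation (here: axis-aligned)
C = np.array([0., 0., -5.])          # camera center in world coordinates

P0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # perspective projection [I | 0]
Tscene = np.eye(4)                               # inverse pose transformation
Tscene[:3, :3] = R.T
Tscene[:3, 3] = -R.T @ C
P = K @ P0 @ Tscene                              # 3x4 projection matrix

M = np.array([1., 1., 5., 1.])       # world point
m = P @ M
m /= m[-1]                           # perspective division
print(m)                             # pixel coordinates (400, 320, 1)
```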
2-view geometry: F-Matrix

Projection onto two views (infinite homography H_\infty, epipole e_1):

\rho_1 m_1 = \rho_0 H_\infty m_0 + e_1

so m_1 lies on the epipolar line of m_0. Decomposing the scene point into its direction and the camera center:

M = \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} = M_\infty + O
Estimation of F from correspondences

Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates:

\sum_{n} \left( m_{1,n}^T F\, m_{0,n} \right)^2 \;\Rightarrow\; \min

Since the operator [e]_\times, and hence F, has rank 2, F has only 7 independent parameters (all epipolar lines intersect at e). For calibrated cameras the essential matrix is

E = [e]_\times R, \qquad [e]_\times = \begin{bmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{bmatrix}
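The linear estimate can be sketched in the style of the 8-point algorithm; this is a minimal version that omits the Hartley coordinate normalization used in practice, and the function name is illustrative:

```python
import numpy as np

def estimate_F(m0, m1):
    """Linear estimate of F from Nx2 corresponding points (N >= 8).

    Minimizes sum_n (m1_n^T F m0_n)^2 subject to ||F|| = 1 via SVD,
    then enforces rank 2 by zeroing the smallest singular value.
    """
    A = []
    for (x0, y0), (x1, y1) in zip(m0, m1):
        # One row of the linear system per correspondence.
        A.append([x1 * x0, x1 * y0, x1,
                  y1 * x0, y1 * y0, y1,
                  x0, y0, 1.0])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    F = Vt[-1].reshape(3, 3)          # null-space solution, ||F|| = 1
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                        # enforce rank 2
    return U @ np.diag(S) @ Vt
```

On noise-free synthetic correspondences the recovered F satisfies the epipolar constraint m_1^T F m_0 = 0 to machine precision; with real data the normalized variant is strongly preferred for conditioning.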
Estimation of P from E

From E we can obtain a camera projection matrix pair via the SVD E = U diag(1, 1, 0) V^T.
3D Feature Reconstruction

• a corresponding point pair (m_0, m_1) is projected from a 3D feature point M
• M is reconstructed from (m_0, m_1) by triangulation
• M lies at the minimum distance d between the two viewing rays l_0, l_1:

d^2 \Rightarrow \min, \qquad \text{constraints:} \quad l_0^T d = 0, \; l_1^T d = 0

• alternatively, minimize the reprojection error:

(m_0 - P_0 M)^2 + (m_1 - P_1 M)^2 \Rightarrow \min
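A common linear alternative to the midpoint construction is DLT triangulation, sketched below with two assumed cameras and a known test point:

```python
import numpy as np

def triangulate(P0, P1, m0, m1):
    """Linear (DLT) triangulation of one 3D point from two views.

    Each observed coordinate contributes one row of a homogeneous
    system A M = 0; the solution is the right null vector of A.
    """
    A = np.vstack([
        m0[0] * P0[2] - P0[0],
        m0[1] * P0[2] - P0[1],
        m1[0] * P1[2] - P1[0],
        m1[1] * P1[2] - P1[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]
    return M / M[-1]                 # homogeneous 3D point (X, Y, Z, 1)

# Assumed setup: identity camera and a translated camera, known point.
P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
M_true = np.array([0.5, 0.2, 4.0, 1.0])
m0 = (P0 @ M_true)[:2] / (P0 @ M_true)[2]
m1 = (P1 @ M_true)[:2] / (P1 @ M_true)[2]
print(triangulate(P0, P1, m0, m1))   # recovers (0.5, 0.2, 4, 1)
```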
Multi View Tracking

Each 3D point M_k is observed as a 2D match m_{k,i} in view i (cameras P_0 … P_N). Minimize the global reprojection error:

\sum_{i=0}^{N} \sum_{k=0}^{K} \left( m_{k,i} - P_i M_k \right)^2 \;\Rightarrow\; \min

Correspondences for the reconstruction come from a SIFT-matcher or a KLT-tracker.
SIFT-detector

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
Difference of Gaussian for scale invariance

[Figure: Gaussian pyramid; adjacent Gaussian-blurred images are subtracted to form the Difference-of-Gaussian]
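The difference-of-Gaussian construction can be sketched in pure NumPy; `gauss_blur` is an illustrative helper (a simple separable filter without the boundary handling of a real implementation), and the base scale 1.6 with step √2 follows Lowe's choices:

```python
import numpy as np

def gauss_blur(img, sigma):
    """Separable Gaussian blur: 1D kernel applied along rows, then columns."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()                                 # normalize the kernel
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

sigma, step = 1.6, 2 ** 0.5
rng = np.random.default_rng(0)
image = rng.random((64, 64))                     # stand-in test image

# Blur at two nearby scales and subtract: approximates the
# scale-normalized Laplacian used for scale-invariant detection.
dog = gauss_blur(image, step * sigma) - gauss_blur(image, sigma)
print(dog.shape)                                 # (64, 64)
```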
Key point localization

• Detect maxima and minima of the difference-of-Gaussian in scale space
• Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002), using a Taylor expansion around the point
• Orientation normalization: a gradient-orientation histogram over [0, 2\pi] selects the dominant orientation

[Figure: blur/subtract pyramid; extrema compared against scale-space neighbors]
Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach).

[Figure: detected keypoints; © Lowe]
RANSAC: hypothesize-and-test loop

− Select m samples
− Compute the n-parameter solution (hypothesis)
− Evaluate the hypothesis on the set of potential points → {inlier samples}
− Best solution so far? If yes, keep it
− Repeat
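The loop can be sketched for a toy 2D line-fitting problem (minimal sample size m = 2); `ransac_line` and its parameter values are illustrative, not from the slides:

```python
import numpy as np

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    """RANSAC for a 2D line: hypothesize from 2 random points,
    count inliers by point-line distance, keep the best hypothesis."""
    rng = np.random.default_rng(seed)
    best_inliers, best_line = np.array([], dtype=int), None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        n = np.array([-d[1], d[0]])          # line normal
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                          # degenerate sample
        n /= norm
        resid = np.abs((points - p) @ n)      # point-line distances
        inliers = np.flatnonzero(resid < thresh)
        if len(inliers) > len(best_inliers):  # best solution so far? keep it
            best_inliers, best_line = inliers, (p, d / np.linalg.norm(d))
    return best_line, best_inliers

# 30 points on y = x plus three gross outliers:
xs = np.linspace(0, 1, 30)
pts = np.vstack([np.column_stack([xs, xs]),
                 [[0.1, 0.9], [0.9, 0.1], [0.5, 0.0]]])
line, inliers = ransac_line(pts)
print(len(inliers))                           # 30: the outliers are rejected
```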
RANSAC: Evaluate Hypotheses

Evaluate the cost function for each hypothesis: every potential point contributes its residual \varepsilon to the score, and residuals are truncated at a threshold so that gross outliers cannot dominate the cost.

[Figure: plane hypotheses evaluated on the potential points; each hypothesis scores residuals \varepsilon_0, \varepsilon_1, \ldots, \varepsilon_n]

From the best E, recover the projection pair P_0, P_1 and triangulate the points.
References

[Lowe 2004] David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", IJCV, 60(2), 2004, pp. 91-110.
[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", Proceedings International Joint Conference on Artificial Intelligence, 1981.
[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[Baker & Matthews 2004] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, 56(3):221-255, March 2004.
[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", CACM, 24(6), June 1981.
[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", CVPR 2003.
[Lindeberg 1998] T. Lindeberg, "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, vol. 30, no. 2, 1998.
[Hartley 2003] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", 2nd edition, Cambridge University Press, 2003.