
Introduction to Computer Vision for Robotics

Jan-Michael Frahm

Overview

• Camera model
• Multi-view geometry
• Camera pose estimation
• Feature tracking & matching
• Robust pose estimation

Homogeneous coordinates

Unified notation: include the origin in the affine basis. The homogeneous coordinates of a point M are

$M = \underbrace{\begin{pmatrix} \vec e_1 & \vec e_2 & \vec e_3 & o \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{\text{affine basis matrix}} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ 1 \end{pmatrix}$

Properties of affine transformations

The transformation T_affine combines a linear mapping and a coordinate shift in homogeneous coordinates:
− linear mapping by a 3×3 matrix A
− coordinate shift by a translation 3-vector t

$T_{affine} = \begin{pmatrix} A_{3\times3} & t_3 \\ 0_{1\times3} & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M' = T_{affine}\, M$

• Parallelism is preserved
• Ratios of length, area, and volume are preserved
• Transformations can be concatenated:
if $M_1 = T_1 M$ and $M_2 = T_2 M_1$, then $M_2 = T_2 T_1 M = T_{21} M$
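To make the concatenation rule concrete, here is a minimal numpy sketch; the 4×4 layout follows the T_affine block structure above, and all numeric values are illustrative:

```python
import numpy as np

def affine_T(A, t):
    """Build a 4x4 homogeneous transform from a 3x3 linear map A and translation t."""
    T = np.eye(4)
    T[:3, :3] = A
    T[:3, 3] = t
    return T

# Two transforms: a rotation about Z, then a translation.
theta = np.pi / 4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
T1 = affine_T(Rz, np.zeros(3))
T2 = affine_T(np.eye(3), np.array([1.0, 2.0, 3.0]))

M = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous point (a1, a2, a3, 1)
M2 = T2 @ T1 @ M                     # concatenation: M2 = T2 T1 M = T21 M
```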
Projective geometry

• Projective space P² is the space of rays emerging from O
− the viewpoint O forms the projection center for all rays
− rays v emerge from the viewpoint into the scene
− a ray g is called a projective point, defined as a scaled v: g = λv

$v = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}, \qquad g(\lambda) = \begin{pmatrix} x \\ y \\ w \end{pmatrix} = \lambda v, \quad \lambda \in \mathbb{R},\; \lambda \neq 0$

Projective and homogeneous points

• Given: plane Π in R² embedded in P² at coordinate w = 1
− the viewing ray g intersects the plane at v (homogeneous coordinates)
− all points on ray g project onto the same homogeneous point v
− the projection of g onto Π is defined by scaling: v = g/λ = g/w

$g(\lambda) = \lambda v = \lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ w \end{pmatrix}, \qquad v = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \frac{g}{w} = \begin{pmatrix} x/w \\ y/w \\ 1 \end{pmatrix}$
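A tiny illustrative sketch of the scaling step v = g/w in code:

```python
import numpy as np

def to_affine(g):
    """Project a homogeneous point/ray g = (x, y, w) onto the plane w = 1."""
    g = np.asarray(g, dtype=float)
    if np.isclose(g[-1], 0.0):
        raise ValueError("point at infinity (direction): w = 0 has no affine image")
    return g / g[-1]

print(to_affine([4.0, 2.0, 2.0]))  # -> [2. 1. 1.]: every scaling of the ray maps to the same v
```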

Finite and infinite points

• All rays g that are not parallel to Π intersect Π at an affine point v.
• A ray g with w = 0 does not intersect Π. Hence v∞ is not an affine point but a direction. Directions have the coordinates $(x, y, z, 0)^T$.
• Projective space combines affine space with infinite points (directions).

$g_\infty = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$

Affine and projective transformations

• An affine transformation leaves infinite points at infinity:

$M'_\infty = T_{affine} M_\infty \;\Rightarrow\; \begin{pmatrix} X' \\ Y' \\ Z' \\ 0 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$

• A projective transformation moves infinite points into finite affine space:

$M' = T_{projective} M_\infty \;\Rightarrow\; \lambda \begin{pmatrix} x_p \\ y_p \\ z_p \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X' \\ Y' \\ Z' \\ w' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ w_{41} & w_{42} & w_{43} & w_{44} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$

Example: parallel lines intersect at the horizon (the line of infinite points).
We can see this intersection due to perspective projection!

Pinhole Camera (Camera obscura)

(Figures: interior of a camera obscura, Sunday Magazine, 1838; camera obscura, France, 1830.)

Pinhole camera model

(Figure: pinhole camera with lens and aperture; the image sensor on the image plane, the image center (cx, cy), the view direction along the optical axis, the object, and the focal length f.)

Perspective projection

• Perspective projection in P³ models the pinhole camera:
− scene geometry is affine P³ space with coordinates $M = (X, Y, Z, 1)^T$
− camera focal point at $O = (0, 0, 0, 1)^T$, camera viewing direction along Z
− image plane (x, y) in Π(P²) aligned with (X, Y) at Z = Z0
− a scene point M projects onto the point Mp on the plane surface:

$M_p = \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{Z_0}{Z} X \\ \frac{Z_0}{Z} Y \\ Z_0 \\ 1 \end{pmatrix} = \frac{Z_0}{Z} \begin{pmatrix} X \\ Y \\ Z \\ \frac{Z}{Z_0} \end{pmatrix}$

Projective Transformation

• The projective transformation maps M onto Mp in P³ space:

$\rho M_p = T_p M \;\Rightarrow\; \rho \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{Z_0} & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \rho = \frac{Z}{Z_0} \;\text{(projective scale factor)}$

• The projective transformation linearizes the projection.

Perspective Projection

Dimension reduction from P³ into P² by projection onto Π(P²):

$\begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix}$

Perspective projection P0 from P³ onto P²:

$\rho\, m_p = D_p T_p M = P_0 M \;\Rightarrow\; \rho \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{Z_0} & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \rho = \frac{Z}{Z_0}$

Image plane and image sensor

• A sensor with picture elements (pixels) is added onto the image plane.

Image-sensor mapping: $m = K m_p$

• Pixel coordinates are related to image coordinates by an affine transformation K with five parameters:
− the image center $c = (c_x, c_y)^T$ lies at the optical axis
− the distance Z0 (focal length) and the pixel size determine the pixel scale $f_x, f_y$
− the image skew s models the angle between pixel rows and columns

$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$

Projection in general pose

Camera pose: $T_{cam} = \begin{pmatrix} R & C \\ 0^T & 1 \end{pmatrix}$ with rotation R and projection center C.

Scene-to-camera transformation (world coordinates): $T_{scene} = T_{cam}^{-1} = \begin{pmatrix} R^T & -R^T C \\ 0^T & 1 \end{pmatrix}$

Projection: $\rho\, m_p = P M$

Projection matrix P

• The camera projection matrix P combines:
− the inverse pose transformation Tcam⁻¹ from general pose to the origin
− the perspective projection P0 to the image plane at Z0 = 1
− the affine mapping K from image to sensor coordinates

scene pose transformation: $T_{scene} = \begin{pmatrix} R^T & -R^T C \\ 0^T & 1 \end{pmatrix}$

projection: $P_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = [\, I \mid 0 \,]$, sensor calibration: $K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$

$\Rightarrow\; \rho\, m = P M, \qquad P = K P_0 T_{scene} = K \,[\, R^T \mid -R^T C \,]$
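A short numpy sketch composing P = K[Rᵀ | −RᵀC] and projecting a scene point; all numeric values are made-up examples:

```python
import numpy as np

# Sensor calibration K (focal lengths fx, fy; zero skew; image center cx, cy)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                    # camera rotation (identity: looking along Z)
C = np.array([0.0, 0.0, -5.0])   # camera center in world coordinates

# P = K [R^T | -R^T C]
P = K @ np.hstack([R.T, (-R.T @ C).reshape(3, 1)])

M = np.array([1.0, 0.5, 10.0, 1.0])  # homogeneous scene point
m = P @ M
m = m / m[2]                          # divide out the projective scale rho
print(m[:2])                          # pixel coordinates
```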

2-view geometry: F-Matrix

Projection onto two views:

$P_0 = K_0 R_0^{-1} [\, I \mid 0 \,], \qquad P_1 = K_1 R_1^{-1} [\, I \mid -C_1 \,]$

$\rho_0 m_0 = P_0 M = K_0 R_0^{-1} [\, I \mid 0 \,] M, \qquad \rho_1 m_1 = P_1 M = K_1 R_1^{-1} [\, I \mid -C_1 \,] M$

Split M into a direction and the origin: $M = (X, Y, Z, 0)^T + (0, 0, 0, 1)^T = M_\infty + O$. Then

$\rho_0 m_0 = K_0 R_0^{-1} [\, I \mid 0 \,] M_\infty$
$\rho_1 m_1 = K_1 R_1^{-1} [\, I \mid 0 \,] M_\infty + K_1 R_1^{-1} [\, I \mid -C_1 \,] O = K_1 R_1^{-1} R_0 K_0^{-1} \rho_0 m_0 - K_1 R_1^{-1} C_1$

$\Rightarrow\; \rho_1 m_1 = \rho_0 H_\infty m_0 + e_1$

with the infinite homography $H_\infty = K_1 R_1^{-1} R_0 K_0^{-1}$ and the epipole $e_1 = -K_1 R_1^{-1} C_1$.
(Figure: the epipolar line through e1 and H∞m0 in image 1.)

The Fundamental Matrix F

• The projective points e1 and H∞m0 define a plane in camera 1 (the epipolar plane Πe).
• The epipolar plane intersects image plane 1 in a line (the epipolar line ue).
• The corresponding point m1 lies on that line: $m_1^T u_e = 0$.
• Since the points e1, m1, and H∞m0 are collinear:

$m_1^T \left( [e_1]_\times H_\infty m_0 \right) = 0, \qquad [e]_\times = \begin{pmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{pmatrix}$

Fundamental matrix (3×3): $F = [e_1]_\times H_\infty$. Epipolar constraint: $m_1^T F m_0 = 0$.

Estimation of F from correspondences

• Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates.
• Since the epipolar constraint is homogeneous up to scale, only eight elements are independent.
• Since the operator [e]× and hence F have rank 2, F has only 7 independent parameters (all epipolar lines intersect at e).
• Each correspondence gives one collinearity constraint:

$m_{1,i}^T F m_{0,i} = 0, \qquad \det(F) = 0 \;\text{(rank-2 constraint)}$

⇒ F can be solved with a minimum of 7 correspondences.

For N > 7 correspondences, minimize the algebraic point-line distance:

$\sum_{n=1}^{N} \left( m_{1,n}^T F m_{0,n} \right)^2 \;\Rightarrow\; \min$
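A hedged numpy sketch of the linear (eight-point) estimate with the rank-2 constraint enforced by SVD; for numerical stability the coordinates should be normalized first, which is omitted here [Hartley 2003]:

```python
import numpy as np

def estimate_F_linear(m0, m1):
    """Linear estimate of F from N >= 8 correspondences.
    m0, m1: (N, 2) arrays of pixel coordinates in images 0 and 1."""
    N = m0.shape[0]
    A = np.zeros((N, 9))
    for i in range(N):
        x0, y0 = m0[i]
        x1, y1 = m1[i]
        # each row encodes m1^T F m0 = 0 for F flattened row-major
        A[i] = [x1*x0, x1*y0, x1, y1*x0, y1*y0, y1, x0, y0, 1.0]
    # F is the null vector of A: the smallest right singular vector
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce the rank-2 constraint det(F) = 0
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```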

The Essential Matrix E

• F is the most general constraint on an image pair. If the camera calibration matrix K is known, more constraints are available.
• Essential Matrix E:

$m_1^T F m_0 = (K \tilde m_1)^T F (K \tilde m_0) = \tilde m_1^T \underbrace{K^T F K}_{E} \, \tilde m_0$

$E = [e]_\times R \quad \text{with} \quad [e]_\times = \begin{pmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{pmatrix}$

• E holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix R and 2 from the direction of translation e, the epipole.

Estimation of P from E

• From E we can obtain a camera projection matrix pair via the SVD $E = U\,\mathrm{diag}(1, 1, 0)\,V^T$.
• $P_0 = [\, I_{3\times3} \mid 0_{3\times1} \,]$, and there are four choices for P1:

$P_1 = [\, U W V^T \mid +u_3 \,]$ or $[\, U W V^T \mid -u_3 \,]$ or $[\, U W^T V^T \mid +u_3 \,]$ or $[\, U W^T V^T \mid -u_3 \,]$

$\text{with} \quad W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

Of the four possible configurations, only one places the reconstructed 3D points in front of both cameras.
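A brief numpy sketch of this decomposition; the cheirality test that picks the single valid pose is indicated only as a comment:

```python
import numpy as np

def decompose_E(E):
    """Return the four candidate (R, t) pairs for P1 = [R | t] from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # keep proper rotations (det = +1)
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    u3 = U[:, 2]
    candidates = [(U @ W @ Vt,    u3), (U @ W @ Vt,   -u3),
                  (U @ W.T @ Vt,  u3), (U @ W.T @ Vt, -u3)]
    # Select a candidate by triangulating a point and checking that its
    # depth is positive in both cameras (cheirality test).
    return candidates
```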

3D Feature Reconstruction

• A corresponding point pair (m0, m1) is projected from a 3D feature point M.
• M is reconstructed from (m0, m1) by triangulation.
• The viewing rays l0 and l1 generally do not intersect exactly; M is placed at the point of minimum distance d between them:

$d^2 \Rightarrow \min \quad \text{with constraints} \quad l_0^T d = 0, \;\; l_1^T d = 0$

Alternatively, minimize the reprojection error:

$(m_0 - P_0 M)^2 + (m_1 - P_1 M)^2 \;\Rightarrow\; \min$
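A compact linear (DLT) triangulation sketch, one common way to initialize M before the nonlinear reprojection-error refinement (the slides do not fix the method):

```python
import numpy as np

def triangulate(P0, P1, m0, m1):
    """Linear (DLT) triangulation of one point.
    P0, P1: (3, 4) projection matrices; m0, m1: (2,) pixel coordinates."""
    A = np.vstack([
        m0[0] * P0[2] - P0[0],   # x0 * (row 3) - (row 1) = 0
        m0[1] * P0[2] - P0[1],
        m1[0] * P1[2] - P1[0],
        m1[1] * P1[2] - P1[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]
    return M / M[3]              # homogeneous 3D point (X, Y, Z, 1)
```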

Multi View Tracking

• 2D match: image correspondence (m1, mi)
• 3D match: correspondence transfer (mi, M) via P1
• 3D pose estimation of Pi with $\| m_i - P_i M \| \Rightarrow \min$

Minimize the global reprojection error:

$\sum_{i=0}^{N} \sum_{k=0}^{K} \left\| m_{k,i} - P_i M_k \right\|^2 \;\Rightarrow\; \min$
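This global reprojection error is what bundle adjustment minimizes. A minimal sketch of the residual it evaluates; the visibility mask is an assumption for points not seen in every view:

```python
import numpy as np

def reprojection_error(Ps, Ms, ms, visible):
    """Sum of squared reprojection errors over cameras i and points k.
    Ps: list of (3, 4) projection matrices; Ms: (K, 4) homogeneous points;
    ms: (N, K, 2) observed pixels; visible: (N, K) boolean visibility mask."""
    total = 0.0
    for i, P in enumerate(Ps):
        for k, M in enumerate(Ms):
            if not visible[i, k]:
                continue
            m = P @ M
            m = m[:2] / m[2]                   # projected pixel
            total += np.sum((ms[i, k] - m) ** 2)
    return total
```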

Correspondences: matching vs. tracking

• Image-to-image correspondences are essential to 3D reconstruction.

SIFT matcher: extract features independently in each image, then match by comparing descriptors [Lowe 2004]. Handles potentially large differences between views.

KLT tracker: extract features in the first image and find the same features again in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994]. Assumes small differences between consecutive frames.

SIFT detector

Scale- and image-plane-rotation-invariant feature descriptor [Lowe 2004].

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.
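As a practical aside (not part of the original slides), a sketch using OpenCV's SIFT implementation, assuming OpenCV ≥ 4.4 where cv2.SIFT_create is available; file names are placeholders:

```python
import cv2

img0 = cv2.imread("view0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input images
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, desc0 = sift.detectAndCompute(img0, None)
kp1, desc1 = sift.detectAndCompute(img1, None)

# Match 128-D descriptors; Lowe's ratio test rejects ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(desc0, desc1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```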

Difference of Gaussian for scale invariance

(Figure: a Gaussian pyramid and the difference-of-Gaussian images computed from adjacent scales.)

• Difference-of-Gaussian with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].
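A minimal DoG-pyramid sketch with scipy; the parameters are illustrative, not Lowe's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Difference-of-Gaussian stack with a constant ratio k between scales."""
    image = image.astype(float)
    blurred = [gaussian_filter(image, sigma0 * k ** i) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]

dogs = dog_pyramid(np.random.rand(64, 64))  # extrema of this stack are keypoint candidates
```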

Key point localization

• Detect maxima and minima of the difference-of-Gaussian in scale space.
• Fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002).
• Taylor expansion of D around the sample point:

$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$

• Offset of the extremum (use finite differences for the derivatives):

$\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$

Orientation normalization

• Histogram of local gradient directions computed at the selected scale.
• Assign the principal orientation at the peak of the smoothed histogram.
• Each key specifies stable 2D coordinates (x, y, scale, orientation).

(Figure: gradient-orientation histogram over the range 0 to 2π.)

Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):

(a) 233×189 image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures

(Images courtesy of Lowe.)

SIFT vector formation

• Thresholded image gradients are sampled over a 16×16 array of locations in scale space.
• Create an array of orientation histograms.
• 8 orientations × 4×4 histogram array = 128 dimensions.

(Figure: a 2×2 histogram-array example. © Lowe)

SIFT feature detector

(Figure: example of detected SIFT features.)

Robust data selection: RANSAC

• Example: estimation of a plane from point data. Loop until the confidence bound is met (see the sketch below):

1. Select m samples from {potential plane points}.
2. Compute the n-parameter solution (plane hypothesis).
3. Evaluate the hypothesis on {potential plane points} → {inlier samples}.
4. Best solution so far? If yes, keep it.
5. Stop once $1 - \left( 1 - \left( \#\text{inliers} / |\{\text{potential plane points}\}| \right)^n \right)^{steps} > 0.99$

Result: the best solution, {inliers}, {outliers}.
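A hedged Python sketch of this loop for the plane example; the inlier threshold and the minimal sample size of 3 points are standard choices for planes, not specified on the slide:

```python
import numpy as np

def ransac_plane(points, thresh=0.05, conf=0.99, max_steps=1000):
    """Fit a plane (unit normal n, offset d with n.x + d = 0) to 3D points (N, 3)."""
    rng = np.random.default_rng()
    best_inliers, best_model = np.zeros(len(points), bool), None
    steps, k = 0, max_steps
    while steps < k:
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-12:
            steps += 1
            continue                                # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ p0
        inliers = np.abs(points @ n + d) < thresh   # point-plane distances
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
            w = inliers.mean()                      # inlier ratio
            denom = np.log(max(1 - w ** 3, 1e-12))  # P(a 3-sample is not all-inlier)
            # steps needed so that 1 - (1 - w^3)^steps > conf
            k = min(max_steps, int(np.log(1 - conf) / denom) + 1)
        steps += 1
    return best_model, best_inliers
```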

RANSAC: Evaluate Hypotheses

• Each plane hypothesis is scored against all potential points with a robust (truncated quadratic) cost: a point with residual ε contributes roughly $\lambda^2 \varepsilon^2$ while the scaled residual stays below a threshold governed by the constant c, and a constant cost of 1 once it exceeds it. Bounding the per-point cost caps the influence of outliers on the score.

(Figure: several plane hypotheses evaluated on the potential points, with residuals ε0 … εn per hypothesis.)

Robust Pose Estimation: Calibrated Camera

Bootstrap, from {2D-2D correspondences}:
1. E-RANSAC (5-point algorithm) → E
2. Estimate P0, P1 from E
3. Nonlinear refinement with all inliers → P0, P1
4. Triangulate points

With known 3D points, from {2D-3D correspondences}:
1. RANSAC (3-point algorithm) → P
2. Nonlinear refinement with all inliers → P
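A sketch of the bootstrap branch using OpenCV's robust estimators (a library choice assumed for illustration; the slides do not prescribe one). The correspondences here are placeholders:

```python
import cv2
import numpy as np

# pts0, pts1: (N, 2) float32 arrays of matched pixels; K: 3x3 calibration matrix
pts0 = (np.random.rand(100, 2) * 640).astype(np.float32)   # placeholder correspondences
pts1 = pts0 + 5
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# RANSAC over the 5-point algorithm yields E and the inlier mask
E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
# recoverPose runs the cheirality test to pick the one valid (R, t) of the four
_, R, t, mask = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
```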

References

[Lowe 2004] David Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2), 2004, pp. 91-110.
[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[Baker & Matthews 2004] S. Baker and I. Matthews. Lucas-Kanade 20 Years On: A Unifying Framework. International Journal of Computer Vision, 56(3):221-255, March 2004.
[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. CACM, 24(6), June 1981.

[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. CVPR 2003.
[Lindeberg 1998] T. Lindeberg. Feature Detection with Automatic Scale Selection. International Journal of Computer Vision, 30(2), 1998.
[Hartley 2003] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2nd edition, Cambridge University Press, 2003.
