
Introduction to Computer Vision for Robotics

Jan-Michael Frahm

Overview

• Camera model
• Multi-view geometry
• Camera pose estimation
• Feature tracking & matching
• Robust pose estimation

Homogeneous coordinates

Unified notation: include the origin in the affine basis. The homogeneous coordinates of a point M are

$M = \underbrace{\begin{pmatrix} \vec e_1 & \vec e_2 & \vec e_3 & o \\ 0 & 0 & 0 & 1 \end{pmatrix}}_{\text{affine basis matrix}} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ 1 \end{pmatrix}$

Properties of affine transformations

The transformation T_affine combines a linear mapping and a coordinate shift in homogeneous coordinates:
− linear mapping by a 3×3 matrix A
− coordinate shift by a translation 3-vector t

$T_{affine} = \begin{pmatrix} A_{3\times3} & t_3 \\ 0_{1\times3} & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M' = T_{affine}\, M$

• Parallelism is preserved
• Ratios of length, area, and volume are preserved
• Transformations can be concatenated:
if $M_1 = T_1 M$ and $M_2 = T_2 M_1$, then $M_2 = T_2 T_1 M = T_{21} M$
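To make the concatenation rule concrete, here is a minimal numpy sketch; the 4×4 layout follows the T_affine block structure above, and all numeric values are illustrative:

```python
import numpy as np

def affine_T(A, t):
    """Build a 4x4 homogeneous transform from a 3x3 linear map A and translation t."""
    T = np.eye(4)
    T[:3, :3] = A
    T[:3, 3] = t
    return T

# Two transforms: a rotation about Z, then a translation.
theta = np.pi / 4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
T1 = affine_T(Rz, np.zeros(3))
T2 = affine_T(np.eye(3), np.array([1.0, 2.0, 3.0]))

M = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous point (a1, a2, a3, 1)
M2 = T2 @ T1 @ M                     # concatenation: M2 = T2 T1 M = T21 M
```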
Projective geometry

• Projective space P² is the space of rays emerging from O
− the viewpoint O forms the projection center for all rays
− rays v emerge from the viewpoint into the scene
− a ray g is called a projective point, defined as a scaled v: g = λv

$v = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}, \qquad g(\lambda) = \begin{pmatrix} x \\ y \\ w \end{pmatrix} = \lambda v, \quad \lambda \in \mathbb{R},\; \lambda \neq 0$

Projective and homogeneous points

• Given: plane Π in R² embedded in P² at coordinate w = 1
− the viewing ray g intersects the plane at v (homogeneous coordinates)
− all points on ray g project onto the same homogeneous point v
− the projection of g onto Π is defined by scaling: v = g/λ = g/w

$g(\lambda) = \lambda v = \lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ w \end{pmatrix}, \qquad v = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \frac{g}{w} = \begin{pmatrix} x/w \\ y/w \\ 1 \end{pmatrix}$
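A tiny illustrative sketch of the scaling step v = g/w in code:

```python
import numpy as np

def to_affine(g):
    """Project a homogeneous point/ray g = (x, y, w) onto the plane w = 1."""
    g = np.asarray(g, dtype=float)
    if np.isclose(g[-1], 0.0):
        raise ValueError("point at infinity (direction): w = 0 has no affine image")
    return g / g[-1]

print(to_affine([4.0, 2.0, 2.0]))  # -> [2. 1. 1.]: every scaling of the ray maps to the same v
```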

Finite and infinite points

• All rays g that are not parallel to Π intersect Π at an affine point v.
• A ray g with w = 0 does not intersect Π. Hence v∞ is not an affine point but a direction. Directions have the coordinates $(x, y, z, 0)^T$.
• Projective space combines affine space with infinite points (directions).

$g_\infty = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$

Affine and projective transformations

• An affine transformation leaves infinite points at infinity:

$M'_\infty = T_{affine} M_\infty \;\Rightarrow\; \begin{pmatrix} X' \\ Y' \\ Z' \\ 0 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$

• A projective transformation moves infinite points into finite affine space:

$M' = T_{projective} M_\infty \;\Rightarrow\; \lambda \begin{pmatrix} x_p \\ y_p \\ z_p \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X' \\ Y' \\ Z' \\ w' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ w_{41} & w_{42} & w_{43} & w_{44} \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}$

Example: parallel lines intersect at the horizon (the line of infinite points).
We can see this intersection due to perspective projection!

Pinhole Camera (Camera obscura)

(Figures: interior of a camera obscura, Sunday Magazine, 1838; camera obscura, France, 1830.)

Pinhole camera model

(Figure: pinhole camera with lens and aperture; the image sensor on the image plane, the image center (cx, cy), the view direction along the optical axis, the object, and the focal length f.)

Perspective projection

• Perspective projection in P³ models the pinhole camera:
− scene geometry is affine P³ space with coordinates $M = (X, Y, Z, 1)^T$
− camera focal point at $O = (0, 0, 0, 1)^T$, camera viewing direction along Z
− image plane (x, y) in Π(P²) aligned with (X, Y) at Z = Z0
− a scene point M projects onto the point Mp on the plane surface:

$M_p = \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{Z_0}{Z} X \\ \frac{Z_0}{Z} Y \\ Z_0 \\ 1 \end{pmatrix} = \frac{Z_0}{Z} \begin{pmatrix} X \\ Y \\ Z \\ \frac{Z}{Z_0} \end{pmatrix}$

Projective Transformation

• The projective transformation maps M onto Mp in P³ space:

$\rho M_p = T_p M \;\Rightarrow\; \rho \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{Z_0} & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \rho = \frac{Z}{Z_0} \;\text{(projective scale factor)}$

• The projective transformation linearizes the projection.

Perspective Projection

Dimension reduction from P³ into P² by projection onto Π(P²):

$\begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix}$

Perspective projection P0 from P³ onto P²:

$\rho\, m_p = D_p T_p M = P_0 M \;\Rightarrow\; \rho \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{Z_0} & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \rho = \frac{Z}{Z_0}$

Image plane and image sensor

• A sensor with picture elements (pixels) is added onto the image plane.

Image-sensor mapping: $m = K m_p$

• Pixel coordinates are related to image coordinates by an affine transformation K with five parameters:
− the image center $c = (c_x, c_y)^T$ lies at the optical axis
− the distance Z0 (focal length) and the pixel size determine the pixel scale $f_x, f_y$
− the image skew s models the angle between pixel rows and columns

$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$

Projection in general pose

Camera pose: $T_{cam} = \begin{pmatrix} R & C \\ 0^T & 1 \end{pmatrix}$ with rotation R and projection center C.

Scene-to-camera transformation (world coordinates): $T_{scene} = T_{cam}^{-1} = \begin{pmatrix} R^T & -R^T C \\ 0^T & 1 \end{pmatrix}$

Projection: $\rho\, m_p = P M$

Projection matrix P

• The camera projection matrix P combines:
− the inverse pose transformation Tcam⁻¹ from general pose to the origin
− the perspective projection P0 to the image plane at Z0 = 1
− the affine mapping K from image to sensor coordinates

scene pose transformation: $T_{scene} = \begin{pmatrix} R^T & -R^T C \\ 0^T & 1 \end{pmatrix}$

projection: $P_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = [\, I \mid 0 \,]$, sensor calibration: $K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$

$\Rightarrow\; \rho\, m = P M, \qquad P = K P_0 T_{scene} = K \,[\, R^T \mid -R^T C \,]$
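A short numpy sketch composing P = K[Rᵀ | −RᵀC] and projecting a scene point; all numeric values are made-up examples:

```python
import numpy as np

# Sensor calibration K (focal lengths fx, fy; zero skew; image center cx, cy)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                    # camera rotation (identity: looking along Z)
C = np.array([0.0, 0.0, -5.0])   # camera center in world coordinates

# P = K [R^T | -R^T C]
P = K @ np.hstack([R.T, (-R.T @ C).reshape(3, 1)])

M = np.array([1.0, 0.5, 10.0, 1.0])  # homogeneous scene point
m = P @ M
m = m / m[2]                          # divide out the projective scale rho
print(m[:2])                          # pixel coordinates
```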

2-view geometry: F-Matrix

Projection onto two views:

$P_0 = K_0 R_0^{-1} [\, I \mid 0 \,], \qquad P_1 = K_1 R_1^{-1} [\, I \mid -C_1 \,]$

$\rho_0 m_0 = P_0 M = K_0 R_0^{-1} [\, I \mid 0 \,] M, \qquad \rho_1 m_1 = P_1 M = K_1 R_1^{-1} [\, I \mid -C_1 \,] M$

Split M into a direction and the origin: $M = (X, Y, Z, 0)^T + (0, 0, 0, 1)^T = M_\infty + O$. Then

$\rho_0 m_0 = K_0 R_0^{-1} [\, I \mid 0 \,] M_\infty$
$\rho_1 m_1 = K_1 R_1^{-1} [\, I \mid 0 \,] M_\infty + K_1 R_1^{-1} [\, I \mid -C_1 \,] O = K_1 R_1^{-1} R_0 K_0^{-1} \rho_0 m_0 - K_1 R_1^{-1} C_1$

$\Rightarrow\; \rho_1 m_1 = \rho_0 H_\infty m_0 + e_1$

with the infinite homography $H_\infty = K_1 R_1^{-1} R_0 K_0^{-1}$ and the epipole $e_1 = -K_1 R_1^{-1} C_1$.
(Figure: the epipolar line through e1 and H∞m0 in image 1.)

The Fundamental Matrix F

• The projective points e1 and H∞m0 define a plane in camera 1 (the epipolar plane Πe).
• The epipolar plane intersects image plane 1 in a line (the epipolar line ue).
• The corresponding point m1 lies on that line: $m_1^T u_e = 0$.
• Since the points e1, m1, and H∞m0 are collinear:

$m_1^T \left( [e_1]_\times H_\infty m_0 \right) = 0, \qquad [e]_\times = \begin{pmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{pmatrix}$

Fundamental matrix (3×3): $F = [e_1]_\times H_\infty$. Epipolar constraint: $m_1^T F m_0 = 0$.

Estimation of F from correspondences

• Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates.
• Since the epipolar constraint is homogeneous up to scale, only eight elements are independent.
• Since the operator [e]× and hence F have rank 2, F has only 7 independent parameters (all epipolar lines intersect at e).
• Each correspondence gives one collinearity constraint:

$m_{1,i}^T F m_{0,i} = 0, \qquad \det(F) = 0 \;\text{(rank-2 constraint)}$

⇒ F can be solved with a minimum of 7 correspondences.

For N > 7 correspondences, minimize the algebraic point-line distance:

$\sum_{n=1}^{N} \left( m_{1,n}^T F m_{0,n} \right)^2 \;\Rightarrow\; \min$
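A hedged numpy sketch of the linear (eight-point) estimate with the rank-2 constraint enforced by SVD; for numerical stability the coordinates should be normalized first, which is omitted here [Hartley 2003]:

```python
import numpy as np

def estimate_F_linear(m0, m1):
    """Linear estimate of F from N >= 8 correspondences.
    m0, m1: (N, 2) arrays of pixel coordinates in images 0 and 1."""
    N = m0.shape[0]
    A = np.zeros((N, 9))
    for i in range(N):
        x0, y0 = m0[i]
        x1, y1 = m1[i]
        # each row encodes m1^T F m0 = 0 for F flattened row-major
        A[i] = [x1*x0, x1*y0, x1, y1*x0, y1*y0, y1, x0, y0, 1.0]
    # F is the null vector of A: the smallest right singular vector
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce the rank-2 constraint det(F) = 0
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```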

The Essential Matrix E

• F is the most general constraint on an image pair. If the camera calibration matrix K is known, more constraints are available.
• Essential Matrix E:

$m_1^T F m_0 = (K \tilde m_1)^T F (K \tilde m_0) = \tilde m_1^T \underbrace{K^T F K}_{E} \, \tilde m_0$

$E = [e]_\times R \quad \text{with} \quad [e]_\times = \begin{pmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{pmatrix}$

• E holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix R and 2 from the direction of translation e, the epipole.

Estimation of P from E

• From E we can obtain a camera projection matrix pair via the SVD $E = U\,\mathrm{diag}(1, 1, 0)\,V^T$.
• $P_0 = [\, I_{3\times3} \mid 0_{3\times1} \,]$, and there are four choices for P1:

$P_1 = [\, U W V^T \mid +u_3 \,]$ or $[\, U W V^T \mid -u_3 \,]$ or $[\, U W^T V^T \mid +u_3 \,]$ or $[\, U W^T V^T \mid -u_3 \,]$

$\text{with} \quad W = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

Of the four possible configurations, only one places the reconstructed 3D points in front of both cameras.
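A brief numpy sketch of this decomposition; the cheirality test that picks the single valid pose is indicated only as a comment:

```python
import numpy as np

def decompose_E(E):
    """Return the four candidate (R, t) pairs for P1 = [R | t] from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # keep proper rotations (det = +1)
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    u3 = U[:, 2]
    candidates = [(U @ W @ Vt,    u3), (U @ W @ Vt,   -u3),
                  (U @ W.T @ Vt,  u3), (U @ W.T @ Vt, -u3)]
    # Select a candidate by triangulating a point and checking that its
    # depth is positive in both cameras (cheirality test).
    return candidates
```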

3D Feature Reconstruction

• A corresponding point pair (m0, m1) is projected from a 3D feature point M.
• M is reconstructed from (m0, m1) by triangulation.
• The viewing rays l0 and l1 generally do not intersect exactly; M is placed at the point of minimum distance d between them:

$d^2 \Rightarrow \min \quad \text{with constraints} \quad l_0^T d = 0, \;\; l_1^T d = 0$

Alternatively, minimize the reprojection error:

$(m_0 - P_0 M)^2 + (m_1 - P_1 M)^2 \;\Rightarrow\; \min$
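A compact linear (DLT) triangulation sketch, one common way to initialize M before the nonlinear reprojection-error refinement (the slides do not fix the method):

```python
import numpy as np

def triangulate(P0, P1, m0, m1):
    """Linear (DLT) triangulation of one point.
    P0, P1: (3, 4) projection matrices; m0, m1: (2,) pixel coordinates."""
    A = np.vstack([
        m0[0] * P0[2] - P0[0],   # x0 * (row 3) - (row 1) = 0
        m0[1] * P0[2] - P0[1],
        m1[0] * P1[2] - P1[0],
        m1[1] * P1[2] - P1[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1]
    return M / M[3]              # homogeneous 3D point (X, Y, Z, 1)
```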

Multi View Tracking

• 2D match: image correspondence (m1, mi)
• 3D match: correspondence transfer (mi, M) via P1
• 3D pose estimation of Pi with $\| m_i - P_i M \| \Rightarrow \min$

Minimize the global reprojection error:

$\sum_{i=0}^{N} \sum_{k=0}^{K} \left\| m_{k,i} - P_i M_k \right\|^2 \;\Rightarrow\; \min$
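This global reprojection error is what bundle adjustment minimizes. A minimal sketch of the residual it evaluates; the visibility mask is an assumption for points not seen in every view:

```python
import numpy as np

def reprojection_error(Ps, Ms, ms, visible):
    """Sum of squared reprojection errors over cameras i and points k.
    Ps: list of (3, 4) projection matrices; Ms: (K, 4) homogeneous points;
    ms: (N, K, 2) observed pixels; visible: (N, K) boolean visibility mask."""
    total = 0.0
    for i, P in enumerate(Ps):
        for k, M in enumerate(Ms):
            if not visible[i, k]:
                continue
            m = P @ M
            m = m[:2] / m[2]                   # projected pixel
            total += np.sum((ms[i, k] - m) ** 2)
    return total
```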

Correspondences: matching vs. tracking

• Image-to-image correspondences are essential to 3D reconstruction.

SIFT matcher: extract features independently in each image, then match by comparing descriptors [Lowe 2004]. Handles potentially large differences between views.

KLT tracker: extract features in the first image and find the same features again in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994]. Assumes small differences between consecutive frames.

SIFT detector

Scale- and image-plane-rotation-invariant feature descriptor [Lowe 2004].

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.
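As a practical aside (not part of the original slides), a sketch using OpenCV's SIFT implementation, assuming OpenCV ≥ 4.4 where cv2.SIFT_create is available; file names are placeholders:

```python
import cv2

img0 = cv2.imread("view0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input images
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, desc0 = sift.detectAndCompute(img0, None)
kp1, desc1 = sift.detectAndCompute(img1, None)

# Match 128-D descriptors; Lowe's ratio test rejects ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(desc0, desc1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```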

Difference of Gaussian for scale invariance

(Figure: a Gaussian pyramid and the difference-of-Gaussian images computed from adjacent scales.)

• Difference-of-Gaussian with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].
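A minimal DoG-pyramid sketch with scipy; the parameters are illustrative, not Lowe's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Difference-of-Gaussian stack with a constant ratio k between scales."""
    image = image.astype(float)
    blurred = [gaussian_filter(image, sigma0 * k ** i) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]

dogs = dog_pyramid(np.random.rand(64, 64))  # extrema of this stack are keypoint candidates
```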

Key point localization

• Detect maxima and minima of the difference-of-Gaussian in scale space.
• Fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002).
• Taylor expansion of D around the sample point:

$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$

• Offset of the extremum (use finite differences for the derivatives):

$\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$

Orientation normalization

• Histogram of local gradient directions computed at the selected scale.
• Assign the principal orientation at the peak of the smoothed histogram.
• Each key specifies stable 2D coordinates (x, y, scale, orientation).

(Figure: gradient-orientation histogram over the range 0 to 2π.)

Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):

(a) 233×189 image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures

(Images courtesy of Lowe.)

SIFT vector formation

• Thresholded image gradients are sampled over a 16×16 array of locations in scale space.
• Create an array of orientation histograms.
• 8 orientations × 4×4 histogram array = 128 dimensions.

(Figure: a 2×2 histogram-array example. © Lowe)

SIFT feature detector

(Figure: example of detected SIFT features.)

Robust data selection: RANSAC

• Example: estimation of a plane from point data. Loop until the confidence bound is met (see the sketch below):

1. Select m samples from {potential plane points}.
2. Compute the n-parameter solution (plane hypothesis).
3. Evaluate the hypothesis on {potential plane points} → {inlier samples}.
4. Best solution so far? If yes, keep it.
5. Stop once $1 - \left( 1 - \left( \#\text{inliers} / |\{\text{potential plane points}\}| \right)^n \right)^{steps} > 0.99$

Result: the best solution, {inliers}, {outliers}.
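A hedged Python sketch of this loop for the plane example; the inlier threshold and the minimal sample size of 3 points are standard choices for planes, not specified on the slide:

```python
import numpy as np

def ransac_plane(points, thresh=0.05, conf=0.99, max_steps=1000):
    """Fit a plane (unit normal n, offset d with n.x + d = 0) to 3D points (N, 3)."""
    rng = np.random.default_rng()
    best_inliers, best_model = np.zeros(len(points), bool), None
    steps, k = 0, max_steps
    while steps < k:
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-12:
            steps += 1
            continue                                # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ p0
        inliers = np.abs(points @ n + d) < thresh   # point-plane distances
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
            w = inliers.mean()                      # inlier ratio
            denom = np.log(max(1 - w ** 3, 1e-12))  # P(a 3-sample is not all-inlier)
            # steps needed so that 1 - (1 - w^3)^steps > conf
            k = min(max_steps, int(np.log(1 - conf) / denom) + 1)
        steps += 1
    return best_model, best_inliers
```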

RANSAC: Evaluate Hypotheses

• Each plane hypothesis is scored against all potential points with a robust (truncated quadratic) cost: a point with residual ε contributes roughly $\lambda^2 \varepsilon^2$ while the scaled residual stays below a threshold governed by the constant c, and a constant cost of 1 once it exceeds it. Bounding the per-point cost caps the influence of outliers on the score.

(Figure: several plane hypotheses evaluated on the potential points, with residuals ε0 … εn per hypothesis.)

Robust Pose Estimation: Calibrated Camera

Bootstrap, from {2D-2D correspondences}:
1. E-RANSAC (5-point algorithm) → E
2. Estimate P0, P1 from E
3. Nonlinear refinement with all inliers → P0, P1
4. Triangulate points

With known 3D points, from {2D-3D correspondences}:
1. RANSAC (3-point algorithm) → P
2. Nonlinear refinement with all inliers → P
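A sketch of the bootstrap branch using OpenCV's robust estimators (a library choice assumed for illustration; the slides do not prescribe one). The correspondences here are placeholders:

```python
import cv2
import numpy as np

# pts0, pts1: (N, 2) float32 arrays of matched pixels; K: 3x3 calibration matrix
pts0 = (np.random.rand(100, 2) * 640).astype(np.float32)   # placeholder correspondences
pts1 = pts0 + 5
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# RANSAC over the 5-point algorithm yields E and the inlier mask
E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
# recoverPose runs the cheirality test to pick the one valid (R, t) of the four
_, R, t, mask = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
```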

References

[Lowe 2004] David Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2), 2004, pp. 91-110.
[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[Baker & Matthews 2004] S. Baker and I. Matthews. Lucas-Kanade 20 Years On: A Unifying Framework. International Journal of Computer Vision, 56(3):221-255, March 2004.
[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. CACM, 24(6), June 1981.

[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. CVPR 2003.
[Lindeberg 1998] T. Lindeberg. Feature Detection with Automatic Scale Selection. International Journal of Computer Vision, 30(2), 1998.
[Hartley 2003] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2nd edition, Cambridge University Press, 2003.
