
Computer Vision –

Transformations,
Imaging Geometry
and
Stereo Vision

Dr. S. Das
IIT Madras, Chennai-36
BASICS
Representation of points in the 3D world: a vector of length 3,

X = [x y z]^T

(Figure: a right-handed coordinate system with axes x, y, z, showing a point P(x, y, z) mapped to P'(x', y', z').)

Transformations of points in 3D – the 4 basic (affine) transformations:
• Translation
• Rotation
• Scaling
• Shear
Basics: 3D Transformation equations
• Translation: P' = P + ΔP

\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \end{bmatrix}

• Scaling: P' = SP

S = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & S_z \end{bmatrix}

• Rotation: about an axis, P' = RP
ROTATION - 2D

x' = x cos(θ) − y sin(θ)
y' = x sin(θ) + y cos(θ)

In matrix form, this is:

R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}

(Figure: a point rotated by θ = 30° about the origin, plotted on X–Y axes.)

Positive rotations: counter-clockwise about the origin.
For rotations, |R| = 1 and [R]^T = [R]^{-1}: rotation matrices are orthogonal.
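As a concrete check of these properties, here is a minimal numpy sketch (function name and sample point are my own illustrative choices) that builds R for θ = 30°, rotates a point, and verifies |R| = 1 and R^T = R^{-1}:

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for a counter-clockwise rotation by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta = np.deg2rad(30)            # the 30-degree example from the slide
R = rot2d(theta)
p = np.array([4.0, 1.0])          # a sample point

print(R @ p)                                  # rotated point
print(np.isclose(np.linalg.det(R), 1.0))      # |R| = 1
print(np.allclose(R.T @ R, np.eye(2)))        # R^T = R^-1 (orthogonality)
```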
Rotation about an arbitrary
point P in space

As we mentioned before, rotations are


applied about the origin. So to rotate about
any arbitrary point P in space, translate so
that P coincides with the origin, then rotate,
then translate back. Steps are:

• Translate by (-Px, -Py)

• Rotate

• Translate by (Px, Py)


Rotation about an arbitrary point P in space

(Figure: a house at P1 is translated so that P1 moves to the origin, rotated by θ, then translated back to P1.)
2D Transformation equations (revisited)
• Translation: P' = P + ΔP

\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}

Can translation also be written as a single matrix product, e.g. using \begin{bmatrix} 1 & 0 & \Delta x \\ 0 & 1 & \Delta y \end{bmatrix}? (This motivates homogeneous coordinates.)

• Rotation: about an axis, P' = RP

R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad \begin{bmatrix} x'' \\ y'' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}
Rotation about an arbitrary point P in space

Rgen = T1(−Px, −Py) · R2(θ) · T3(Px, Py)

= \begin{bmatrix} 1 & 0 & -P_x \\ 0 & 1 & -P_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & P_x \\ 0 & 1 & P_y \\ 0 & 0 & 1 \end{bmatrix}

= \begin{bmatrix} \cos\theta & -\sin\theta & P_x(\cos\theta - 1) - P_y\sin\theta \\ \sin\theta & \cos\theta & P_y(\cos\theta - 1) + P_x\sin\theta \\ 0 & 0 & 1 \end{bmatrix}
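A small sketch of this translate-rotate-translate composition in 2D homogeneous coordinates (helper names and the sample point are my own). Note that with column vectors the matrix applied first is written rightmost:

```python
import numpy as np

def translate2d(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotate2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

Px, Py, theta = 2.0, 3.0, np.deg2rad(30)

# Column-vector convention: translate P to the origin, rotate, translate back.
Rgen = translate2d(Px, Py) @ rotate2d(theta) @ translate2d(-Px, -Py)

p = np.array([4.0, 5.0, 1.0])             # homogeneous 2D point
print(Rgen @ p)

# Sanity check: the pivot point P itself must stay fixed.
print(Rgen @ np.array([Px, Py, 1.0]))     # -> [Px, Py, 1]
```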
Using Homogeneous system
Homogeneous representation of a point in 3D space:

P = [x y z w]^T    (w = 1 for a 3D point)

Transformations will thus be represented by 4×4 matrices:

P' = A·P
Homogeneous Coordinate systems
• In order to apply a sequence of transformations to produce composite transformations, we introduce a fourth coordinate.
• Homogeneous representation of a 3D point: [x y z h]^T (h = 1 for a 3D point; a dummy coordinate).
• Transformations will be represented by 4×4 matrices.
T = \begin{bmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad S = \begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Homogeneous Translation matrix          Homogeneous Scaling matrix
R_\alpha = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}     Rotation about the x axis by angle α

R_\beta = \begin{bmatrix} \cos\beta & 0 & \sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}     Rotation about the y axis by angle β

R_\gamma = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}     Rotation about the z axis by angle γ

Change of sign? (Note where the minus sign on the sin term sits in each of the three matrices.)
How can one do a Rotation about an arbitrary Axis in
Space?
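Before answering that, here is a minimal numpy sketch of the 4×4 homogeneous matrices above (my own constructor names), assuming column vectors [x y z 1]^T:

```python
import numpy as np

def translation(dx, dy, dz):
    T = np.eye(4); T[:3, 3] = [dx, dy, dz]; return T

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rot_x(a):   # rotation about the x axis by angle alpha
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]], float)

def rot_y(b):   # rotation about the y axis by angle beta (note the sign pattern)
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]], float)

def rot_z(g):   # rotation about the z axis by angle gamma
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], float)

P = np.array([1.0, 2.0, 3.0, 1.0])                  # homogeneous 3D point
print(translation(5, 0, 0) @ rot_z(np.pi / 2) @ P)  # rotate about z, then translate
```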
3D Transformation equations (3)
Rotation About an Arbitrary Axis in Space
Assume we want to perform a rotation about an axis in space, passing through the point (x0, y0, z0), with direction cosines (cx, cy, cz), by θ degrees.
1) First of all, translate by −(x0, y0, z0): |T|.
2) Next, rotate the axis into one of the principal axes. Let's pick Z (|Rx|, |Ry|).
3) Rotate by θ degrees about Z (|Rz(θ)|).
4) Then undo the rotations that aligned the axis.
5) Undo the translation: translate by (x0, y0, z0).

The tricky part is step (2) above. It takes two rotations:
i) about x (to place the axis in the x-z plane), and
ii) about y (to place the result coincident with the z axis).
Rotation about x by α: how do we determine α?
Project the unit vector along OP into the y-z plane. Its y and z components are cy and cz, the direction cosines of the unit vector along the arbitrary axis. From the diagram it can be seen that:

d = sqrt(cy² + cz²),   cos(α) = cz/d,   sin(α) = cy/d

After this rotation the axis direction becomes P(cx, 0, d), lying in the x-z plane.

Rotation about y by β: how do we determine β?
Similar to the above: determine the angle β that rotates this result into the Z axis. The x component is cx and the z component is d:

cos(β) = d (= d / length of the unit vector),   sin(β) = cx (= cx / length of the unit vector)

Final Transformation:

M = |T|^{-1} |Rx|^{-1} |Ry|^{-1} |Rz(θ)| |Ry| |Rx| |T|

If you are given two points instead, you can calculate the direction cosines as follows:

V = [(x1 − x0)  (y1 − y0)  (z1 − z0)]^T

cx = (x1 − x0)/|V|,   cy = (y1 − y0)/|V|,   cz = (z1 − z0)/|V|
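Putting steps (1)-(5) together, here is a sketch of the composite M (the helper constructors from the earlier sketch are repeated so the snippet runs on its own; the sign of β depends on the Ry convention, as noted in the comments):

```python
import numpy as np

def translation(dx, dy, dz):
    T = np.eye(4); T[:3, 3] = [dx, dy, dz]; return T

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]], float)

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]], float)

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], float)

def rotation_about_axis(p0, c, theta):
    """4x4 rotation by theta about the axis through p0 with unit direction cosines c."""
    cx, cy, cz = c
    d = np.hypot(cy, cz)                 # d = sqrt(cy^2 + cz^2)
    alpha = np.arctan2(cy, cz)           # rotation about x: puts the axis in the x-z plane
    beta = np.arctan2(cx, d)             # rotation about y: puts it onto the z axis
    T = translation(-p0[0], -p0[1], -p0[2])
    A = rot_y(-beta) @ rot_x(alpha) @ T  # note the sign of beta for this Ry convention
    return np.linalg.inv(A) @ rot_z(theta) @ A

# Points on the axis itself must be left unchanged:
M = rotation_about_axis(np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0, 1.0]), np.deg2rad(45))
print(M @ np.array([1.0, 2.0, 5.0, 1.0]))   # -> [1, 2, 5, 1]
```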
Inverse transformations

T^{-1} = \begin{bmatrix} 1 & 0 & 0 & -\Delta x \\ 0 & 1 & 0 & -\Delta y \\ 0 & 0 & 1 & -\Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad S^{-1} = \begin{bmatrix} 1/S_x & 0 & 0 & 0 \\ 0 & 1/S_y & 0 & 0 \\ 0 & 0 & 1/S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Inverse Translation                      Inverse Scaling

Inverse Rotation:

R_\alpha^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha & 0 \\ 0 & -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad R_\beta^{-1} = \begin{bmatrix} \cos\beta & 0 & -\sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

(Each inverse rotation is simply the transpose of the corresponding rotation matrix.)
Concatenation of transformations
• The 4×4 representation is used to perform a sequence of transformations.
• Thus the application of several transformations in a particular sequence can be represented by a single transformation matrix:

v' = R(S(T v)) = A v,   where A = R·S·T

• The order of application is important: matrix multiplication is not, in general, commutative.
Commutativity of Transformations
If we scale, then translate to the origin, and then translate back, is that equivalent to: translate to the origin, scale, then translate back?
When is the order of matrix multiplication unimportant?
When does T1 * T2 = T2 * T1?

Cases where T1 * T2 = T2 * T1:

T1               T2
translation      translation
scale            scale
rotation         rotation
scale (uniform)  rotation
COMPOSITE TRANSFORMATIONS
If we want to apply a series of transformations T1, T2, T3 to a set of points, we can do it in two ways:

1) Calculate p' = T1*p, p'' = T2*p', p''' = T3*p''.
2) Calculate T = T3*T2*T1 (the first transformation applied appears rightmost), then p''' = T*p.

Method 2 saves a large number of additions and multiplications (computational time); it needs approximately 1/3 as many operations.
Therefore, we concatenate or compose the
matrices into one final transformation matrix,
and then apply that to the points.
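A small sketch contrasting the two methods on a batch of points (random placeholder transforms; with column vectors the first-applied matrix sits rightmost in the composite):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_affine():
    """A placeholder 4x4 affine transform (random 3x3 block plus translation)."""
    M = np.eye(4)
    M[:3, :3] = rng.standard_normal((3, 3))
    M[:3, 3] = rng.standard_normal(3)
    return M

T1, T2, T3 = random_affine(), random_affine(), random_affine()
points = np.vstack([rng.standard_normal((3, 100)), np.ones((1, 100))])  # homogeneous points

# Method 1: apply the transforms one after another to every point.
p3_sequential = T3 @ (T2 @ (T1 @ points))

# Method 2: concatenate once, then apply the single composite matrix.
T = T3 @ T2 @ T1          # first-applied transform (T1) sits rightmost
p3_composite = T @ points

print(np.allclose(p3_sequential, p3_composite))   # True
```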
Spaces
Object Space
  Definition of objects. Also called Modeling space.

World Space
  Where the scene and viewing specification are made.

Eye Space (Normalized Viewing Space)
  Where the eye point (COP) is at the origin, looking down the Z axis.

3D Image Space
  A 3D perspected space.
  Dimensions: -1:1 in x & y, 0:1 in z.
  Where image-space hidden-surface algorithms work.

Screen Space (2D)
Projections
We will look at several planar geometric 3D-to-2D projections:

- Parallel projections
    Orthographic
    Oblique

- Perspective

The projection of a 3D object is defined by straight projection rays (projectors) emanating from the center of projection (COP), passing through each point of the object, and intersecting the projection plane.
Perspective Projections

The distance from the COP to the projection plane is finite. The projectors are not parallel, and we specify a center of projection.

The Center of Projection is also called the Perspective Reference Point: COP = PRP.

• Perspective foreshortening: the size of the perspective projection of an object varies inversely with the distance of the object from the center of projection.

• Vanishing Point: the perspective projections of any set of parallel lines that are not parallel to the projection plane converge to a vanishing point.

(Figure: projection planes and projectors for the top, side and front views of an object.)

Example of Orthographic Projection (figure).

Example of Isometric Projection (figure: projection plane, projector, projection-plane normal).

Example of Oblique Projection (figure: projection plane, projector, projection-plane normal, with x, y, z axes).
END OF BASICS
THE CAMERA MODEL: perspective projection

(Figure: camera lens with its center of lens (COL) at the origin O, image plane I at focal length f; a 3D world point p(x, y, z) projects to the image point P(X, Y).)

(x, y, z) - 3D world coordinates
(X, Y) - 2D image-plane coordinates
f - focal length
Perspective Geometry and Camera Models

The following cases differ in where the image plane (PP) lies relative to the lens and where the coordinate origin is placed.

CASE 1
(Figure: image plane PP at the origin, center of lens (COL) at O, focal length f; world point p(x, y, z), image point P(-X, -Y).)

By similarity of triangles:

X/f = x/(f - z),   Y/f = y/(f - z)

X = xf/(f - z) = x/(1 - z/f),   Y = yf/(f - z) = y/(1 - z/f)

• Image plane before the camera lens
• Origin of the coordinate system at the image plane
CASE 2
(Figure: image plane PP, center of lens (COL) at the origin O, focal length f; world point p(x, y, z), image point P(-X, -Y).)

By similarity of triangles:

-X/f = x/z,   -Y/f = y/z

X = -xf/z = -x/(z/f),   Y = -yf/z = -y/(z/f)

• Image plane before the camera lens
• Origin of the coordinate system at the camera lens
CASE 3
(Figure: image plane PP behind the lens, center of lens (COL) at the origin O, focal length f; world point p(x, y, z), image point P(X, Y).)

By similarity of triangles:

X/f = x/z,   Y/f = y/z

X = xf/z = x/(z/f),   Y = yf/z = y/(z/f)

• Image plane after the camera lens
• Origin of the coordinate system at the camera lens
• Focal length f
CASE 4
(Figure: image plane PP behind the lens, with the image-plane origin coinciding with the 3D world origin; center of lens (COL) at O, focal length f; world point p(x, y, z), image point P(X, Y).)

By similarity of triangles:

X/f = x/(f + z),   Y/f = y/(f + z)

X = xf/(f + z) = x/(1 + z/f),   Y = yf/(f + z) = y/(1 + z/f)

• Image plane after the camera lens
• Origin of the coordinate system not at the COP
• Image-plane origin coincides with the 3D world origin
Consider the first case ….

(Figure: Case 1 geometry: image plane PP at the origin, COP at O, focal length f; world point p(x, y, z), image point P(-X, -Y).)

• Note that the equations are non-linear.
• We can develop a matrix formulation of the equations given below using homogeneous coordinates:

X = x/(1 - z/f),   Y = y/(1 - z/f)

\begin{bmatrix} kx \\ ky \\ kz \\ k(1 - z/f) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1/f & 1 \end{bmatrix} \begin{bmatrix} kx \\ ky \\ kz \\ k \end{bmatrix}

Dividing the first two components by the fourth gives X and Y above. (Z is not important and is eliminated.)
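A numpy sketch of this matrix formulation for Case 1 (function names and the numeric values are illustrative assumptions): multiply the homogeneous world point by the 4×4 perspective matrix, then divide by the fourth component:

```python
import numpy as np

def perspective_matrix(f):
    """4x4 perspective transform for Case 1 (origin at the image plane)."""
    P = np.eye(4)
    P[3, 2] = -1.0 / f
    return P

def project(p, f, k=1.0):
    """Project a 3D point (x, y, z) to image coordinates (X, Y)."""
    wh = k * np.array([p[0], p[1], p[2], 1.0])     # homogeneous world point
    ch = perspective_matrix(f) @ wh
    return ch[:2] / ch[3]                          # X = x/(1 - z/f), Y = y/(1 - z/f)

f = 0.05                      # assumed focal length (metres)
p = (0.2, 0.1, 2.0)           # an assumed world point at depth z = 2
print(project(p, f))          # negative X, Y: the image is inverted, cf. P(-X, -Y) in the figure
print(p[0] / (1 - p[2] / f), p[1] / (1 - p[2] / f))   # same result, straight from the formula
```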
Inverse perspective projection

(Figure: image point P(X0, Y0) and the corresponding world point p(x0, y0, z0).)

P^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/f & 1 \end{bmatrix}

Applying the inverse to an image point represented as [kX_0  kY_0  0  k]^T:

w_h = P^{-1} \begin{bmatrix} kX_0 \\ kY_0 \\ 0 \\ k \end{bmatrix} = \begin{bmatrix} kX_0 \\ kY_0 \\ 0 \\ k \end{bmatrix} \Rightarrow (x_0, y_0, z_0) = (X_0, Y_0, 0)

Hence no 3D (depth) information can be retrieved with the inverse transformation.

So we introduce a dummy variable, i.e. the depth Z, and let the image point be represented as [kX_0  kY_0  kZ  k]^T:

w_h = P^{-1} \begin{bmatrix} kX_0 \\ kY_0 \\ kZ \\ k \end{bmatrix} = \begin{bmatrix} kX_0 \\ kY_0 \\ kZ \\ kZ/f + k \end{bmatrix}

Dividing by the fourth component:

x_0 = \frac{fX_0}{f + Z}, \quad y_0 = \frac{fY_0}{f + Z}, \quad z_0 = \frac{fZ}{f + Z}

From z_0 = fZ/(f + Z) we get Z = f z_0/(f - z_0), and substituting back:

x_0 = \frac{X_0}{f}(f - z_0), \qquad y_0 = \frac{Y_0}{f}(f - z_0)

i.e. the image point only constrains the world point to lie on a ray; its position on that ray depends on the unknown depth z_0.
CASE 1 (summary)

(Figure: image plane PP at the origin, COL at O, focal length f; world point p(x, y, z), image point P(-X, -Y).)

Forward, 3D to 2D:

X/f = x/(f - z),   Y/f = y/(f - z)
X = xf/(f - z) = x/(1 - z/f),   Y = yf/(f - z) = y/(1 - z/f)

Inverse, 2D to 3D:

x_0 = (X_0/f)(f - z_0),   y_0 = (Y_0/f)(f - z_0)
CASE 3 (summary)

(Figure: image plane PP behind the lens, COL at O, focal length f; world point p(x, y, z), image point P(X, Y).)

Forward, 3D to 2D:

X/f = x/z,   Y/f = y/z
X = xf/z = x/(z/f),   Y = yf/z = y/(z/f)

Inverse, 2D to 3D:

x_0 = z_0 X_0 / f,   y_0 = z_0 Y_0 / f
Pinhole Camera schematic diagram

(Figure: schematic of a pinhole camera.)

Camera Image formulation
• The action of the eye is simulated by an abstract camera model (the pinhole camera model).
• The 3D real world is captured on the image plane: the image is the projection of a 3D object onto a 2D plane.

F : (X_w, Y_w, Z_w) -> (x_i, y_i)

(Figure: camera centre C, world point X_world = (X_w, Y_w, Z_w), its image x_image, focal length f.)

x_image = (f X_w / Z_w,  f Y_w / Z_w)
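This mapping is the Case 3 forward projection seen earlier. A quick round-trip sketch (my own function names and sample values) showing that the inverse recovers (x, y) only when the depth z is supplied:

```python
def case3_forward(x, y, z, f):
    """Case 3 (origin at the lens): X = x*f/z, Y = y*f/z."""
    return x * f / z, y * f / z

def case3_inverse(X0, Y0, z0, f):
    """Inverse mapping for Case 3, valid only if the depth z0 is known."""
    return z0 * X0 / f, z0 * Y0 / f

f = 0.05                             # assumed focal length
x, y, z = 0.3, -0.1, 2.0             # assumed world point
X0, Y0 = case3_forward(x, y, z, f)
print(case3_inverse(X0, Y0, z, f))   # recovers (x, y) only because z was supplied
```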
Camera Geometry
• The camera can be considered as a projection matrix: x = P_{3×4} X
• A pinhole camera has the projection matrix P = diag(f, f, 1)[I | 0]:

(X, Y, Z)^T -> (fX/Z, fY/Z)^T

• Principal point offset p = (p_x, p_y)^T:

(X, Y, Z)^T -> (fX/Z + p_x, fY/Z + p_y)^T

x = K[I | 0] X,   where   K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}

• Camera with rotation and translation:

x = K[R | t] X

(Figure: camera axes x_cam, y_cam with principal point p on the image plane; the world frame X, Y, Z is related to the camera frame by R and t.)
Camera Geometry

(Figure: camera centre C, world point X_world, image point x_image; image axes x_img, y_img; principal point p = (p_x, p_y).)

Camera internal parameters:

K = \begin{bmatrix} \alpha_x & s & p_x \\ 0 & \alpha_y & p_y \\ 0 & 0 & 1 \end{bmatrix}

α_x - scale factor in the x coordinate direction
α_y - scale factor in the y coordinate direction
s   - camera skew
α_y / α_x - aspect ratio

Camera matrix: P = K[R | t],   where R is the rotation and t the translation vector.
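A sketch of this finite projective camera (all numeric values are illustrative assumptions): build K from f and the principal point, stack [R | t], and project a homogeneous world point to pixel coordinates:

```python
import numpy as np

f, px, py = 800.0, 320.0, 240.0            # assumed focal length (pixels) and principal point
K = np.array([[f, 0.0, px],
              [0.0, f, py],
              [0.0, 0.0, 1.0]])            # zero skew, unit aspect ratio

R = np.eye(3)                              # assumed camera orientation (identity for simplicity)
t = np.array([[0.0], [0.0], [5.0]])        # assumed translation: world origin 5 units ahead

P = K @ np.hstack([R, t])                  # 3x4 camera matrix, P = K [R | t]

Xw = np.array([0.5, -0.2, 3.0, 1.0])       # homogeneous world point
x = P @ Xw
u, v = x[:2] / x[2]                        # dehomogenise to pixel coordinates
print(u, v)                                # -> (370.0, 220.0)
```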
Observations about Perspective projection
• The mapping from the 3D scene to the image plane is well defined: every world point projects to a unique image point.
• For a given image point, however, no unique world coordinate can be found.
• So depth information cannot be retrieved using a single image. What to do?
• Would two (2) images of the same object, taken from different viewing angles, help?
• This approach is termed Stereo Vision.
Stereo Vision

(Figure: two cameras with lens centers separated by a baseline B and parallel optical axes; the world point p(x, y, z) projects to (X1, Y1) in Image 1 and to (X2, Y2) in Image 2.)
Stereo Vision (2)
• Stereo imaging involves obtaining two separate image views of an object (in this discussion, the world point).
• The distance between the centers of the two lenses is called the baseline width.
• The projections of the world point on the two image planes are (X1, Y1) and (X2, Y2).
• The assumption is that the cameras are identical.
• The coordinate systems of both cameras are perfectly aligned, differing only in the x-coordinate location of the origin.
• The world coordinate system is also brought into coincidence with one of the image X, Y planes (say image plane 1), so the y and z coordinates are the same for both cameras.
Top view of the stereo imaging system with origin at the center of the first imaging plane.

(Figure: image planes 1 and 2 at focal length f from the lens centers O1 and O2, separated by the baseline B; world point W(x, y, z); image points (X1, Y1) and (X2, Y2); depths z1, z2.)
First bringing the first camera into coincidence with the world coordinate system, and then using the second camera coordinate system, and directly applying the Case 1 formula, we get:

x_1 = (X_1/f)(f - z_1),   x_2 = (X_2/f)(f - z_2)

Because the separation between the two cameras is B:

x_2 = x_1 + B,   z_1 = z_2 = z (?)    /* Solve it now */

x_1 = (X_1/f)(f - z),   x_1 + B = (X_2/f)(f - z)

Subtracting: B = ((X_2 - X_1)/f)(f - z), hence

z = f - fB/(X_2 - X_1)

• The equation above gives the depth directly from the coordinates of the two image points.
• The quantity given below is called the disparity:

D = (X_2 - X_1) = fB/(f - z)

• The most difficult task is to find the two corresponding points in the different images of the same scene: the correspondence problem.
• Once the correspondence problem is solved (non-analytically), we get D. Then obtain the depth using:

z = f - fB/(X_2 - X_1) = f[1 - B/D]
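A small sketch of this depth-from-disparity relation for the first model (values are illustrative assumptions); it round-trips a chosen depth through D = fB/(f - z) and back through z = f(1 - B/D):

```python
def depth_from_disparity(D, f, B):
    """First model: z = f(1 - B/D), where the disparity D = X2 - X1 = fB/(f - z)."""
    return f * (1.0 - B / D)

f, B = 0.05, 0.2                 # assumed focal length and baseline width (metres)
z_true = -3.0                    # assumed depth of the world point in this coordinate frame
D = f * B / (f - z_true)         # disparity this depth would produce
print(D, depth_from_disparity(D, f, B))   # recovers z_true
```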
Alternate Model – Case III

X/f = x/z,   Y/f = y/z   =>   x = Xz/f,   y = Yz/f

x_1 - x_2 = B,   y_1 = y_2 = y,   z_1 = z_2 = z (?)

x_1 = X_1 z/f,   x_2 = x_1 - B = X_2 z/f

=>  B = (X_1 - X_2) z/f,   hence   z = B·f/(X_1 - X_2) = B·f/D
Top view of the stereo imaging system with origin at the center of the first camera lens.

(Figure: image planes IP 1 and IP 2 at focal length f behind the lens centers O1 and O2, separated by the baseline B; world point W(x, y, z); image points (X1, Y1) and (X2, Y2); depths z1, z2.)
Compare the two solutions

First model (origin at the center of the first imaging plane):

z = f - fB/(X_2 - X_1) = f[1 - B/D],   D = (X_2 - X_1) = fB/(f - z)

Alternate model (origin at the center of the first camera lens):

z = B·f/D,   D = (X_1 - X_2) = fB/z

What do you think of D?
The Correspondence Problem

z = B·f/D,   D = (X_1 - X_2) = fB/z,   Y_1 = Y_2

If D -> 0, then X_2 -> X_1.

(Figure: the corresponding points (X1, Y1) in Image Plane I and (X2, Y2) in Image Plane II lie on the same horizontal line: the EPIPOLAR line.)
Error in Depth Estimation

z = B·f/D   =>   ∂z/∂D = -B·f/D²

Expressing this in terms of the depth z (using D = B·f/z), we have:

|δz| = (B·f/D²)·|δD| = (z²/(B·f))·|δD|

What is the maximum value of depth z you can measure using a stereo setup?

z_max = B·f   (for the smallest measurable disparity, D = 1)

Even if correspondence is solved correctly, the computation of D may have an error, with an upper bound of 0.5; i.e. (δD)_max = 0.5. That may cause a depth error of:

|δz| ≤ z²/(2·B·f)

A larger baseline width and focal length (of the camera) reduce the error and increase the maximum value of depth that may be estimated.
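A sketch of these error bounds (units and numbers are illustrative assumptions: f and D in pixels, B and z in metres):

```python
def depth_error(z, f, B, dD=0.5):
    """Upper bound on the depth error for a disparity error dD: |dz| ~ (z**2 / (B * f)) * dD."""
    return (z ** 2) / (B * f) * dD

f, B = 800.0, 0.2                       # assumed focal length (pixels) and baseline (metres)
for z in (1.0, 5.0, 20.0):
    print(z, depth_error(z, f, B))      # the error grows quadratically with depth

print(B * f)   # z_max when the smallest measurable disparity is one unit (here: one pixel)
```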

What about the minimum value of depth (the object closest to the cameras)?

z_min = B·f / D_max

What is D_max?   D_max = X_max

X_max depends on f and the image resolution (in other words, on the angle of the field of view, or FOV).
General Stereo Views (figures)
Perfect Stereo Views (figures)

We can also have an arbitrary pair of views from two cameras:
• The baseline may not lie along any of the principal axes
• The viewing axes of the cameras may not be parallel
• The focal lengths of the cameras may be unequal
• The coordinate systems of the image planes may not be aligned
Take-home exercises/problems:

What about the epipolar line in the cases above?
How do you derive the equation of an epipolar line?

In the case of a set of arbitrary views used for 3-D reconstruction (object structure, surface geometry, modeling etc.), the methods used involve:
- KLT (Kanade-Lucas-Tomasi) tracker
- Bundle adjustment
- 8-point DLT algorithm
- Zhang's homography
- Tri-focal tensors
- Cheirality and the DIAC
- Auto-calibration
- Metric reconstruction
- RANSAC
End of Lectures on -

Transformations,
Imaging Geometry
and
Stereo Vision
