Computer Vision - Transformations, Imaging Geometry and Stereo Vision
Transformations, Imaging Geometry and Stereo Vision
Dr. S. Das
IIT Madras, Chennai-36
BASICS
Representation of points in the 3D world: a vector of length 3
$X = [x \;\; y \;\; z]^T$
[Figure: right-handed coordinate system with axes x, y, z; a transformation T maps the point P(x,y,z) to P'(x',y',z')]
Transformations of points in 3D
4 basic transformations (affine transformations):
• Translation
• Rotation
• Scaling
• Shear
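Basics 3D Transformation equations
• Translation: P' = P + ∆P
$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta z \end{bmatrix}$$
• Scaling: P' = SP
$$S = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & S_z \end{bmatrix}$$
• Rotation: about an axis, P' = RP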
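ROTATION - 2D
$$x' = x\cos\theta - y\sin\theta$$
$$y' = x\sin\theta + y\cos\theta$$
In matrix form, this is:
$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
[Figure: a shape in the X-Y plane (axes marked 0 to 5) rotated by θ = 30°]
Positive rotations: counter clockwise about the origin.
For rotations, $|R| = 1$ and $[R]^T = [R]^{-1}$.
Rotation matrices are orthogonal.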
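The two properties stated above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`rotation_2d`, `matmul`, `transpose`, `apply`) are illustrative choices, not part of the slides.

```python
import math

def rotation_2d(theta_deg):
    """2x2 counter-clockwise rotation matrix for an angle given in degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s],
            [s,  c]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def apply(m, p):
    """Multiply matrix m by the column vector p."""
    return [sum(m[i][j] * p[j] for j in range(len(p))) for i in range(len(m))]

R = rotation_2d(30)                              # theta = 30 degrees, as in the figure
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]      # should be 1
RtR = matmul(transpose(R), R)                    # should be the identity
p_rot = apply(R, [1.0, 0.0])                     # (1, 0) rotated by 30 degrees
```

Here `p_rot` comes out as (cos 30°, sin 30°), which confirms that a positive angle turns the point counter clockwise.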
Rotation about an arbitrary point P in space
[Figure: a house rotated about the point P1, shown in four stages]
• House at P1
• Translation of P1 to origin
• Rotation by θ
• Translation back to P1
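2D Transformation equations (revisited)
• Translation: P' = P + ∆P
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$
Can this be written as a product with a matrix T built from $\begin{bmatrix} 1 & 0 & \Delta x \\ 0 & 1 & \Delta y \end{bmatrix}$ ?
• Rotation: about an axis, P' = RP
$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
$$\begin{bmatrix} x'' \\ y'' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}$$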
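Rotation about an arbitrary point P in space
$$\begin{bmatrix} 1 & 0 & P_x \\ 0 & 1 & P_y \\ 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} 1 & 0 & -P_x \\ 0 & 1 & -P_y \\ 0 & 0 & 1 \end{bmatrix}$$
$$= \begin{bmatrix} \cos\theta & -\sin\theta & -P_x(\cos\theta - 1) + P_y\sin\theta \\ \sin\theta & \cos\theta & -P_y(\cos\theta - 1) - P_x\sin\theta \\ 0 & 0 & 1 \end{bmatrix}$$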
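The translate, rotate, translate-back product above can be sketched with 3x3 homogeneous matrices. This is a minimal illustration in plain Python; the function names and the pivot (2, 1) are assumptions chosen for the example.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation_2d(dx, dy):
    return [[1.0, 0.0, dx],
            [0.0, 1.0, dy],
            [0.0, 0.0, 1.0]]

def rotation_2d_h(theta_deg):
    """Homogeneous counter-clockwise rotation about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def rotate_about_point(px, py, theta_deg):
    """T(P) * R(theta) * T(-P): move P to the origin, rotate, move back."""
    return matmul(translation_2d(px, py),
                  matmul(rotation_2d_h(theta_deg), translation_2d(-px, -py)))

def apply(m, x, y):
    v = [x, y, 1.0]
    out = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
    return out[0], out[1]

M = rotate_about_point(2.0, 1.0, 90)   # hypothetical pivot P = (2, 1)
pivot_image = apply(M, 2.0, 1.0)       # the pivot itself must not move
moved = apply(M, 3.0, 1.0)             # one unit to the right of the pivot
```

With θ = 90° the point one unit to the right of the pivot ends up one unit above it, and the third column of `M` agrees with the closed-form entries of the product matrix.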
Using Homogeneous system
Homogeneous representation of a point in 3D space:
$P = |x \;\; y \;\; z \;\; w|^T$ (w = 1, for a 3D point)
Transformations will thus be represented by 4x4 matrices:
P' = A.P
Homogeneous Coordinate systems
• In order to apply a sequence of transformations to produce composite transformations, we introduce the fourth coordinate
• Homogeneous representation of a 3D point: $|x \;\; y \;\; z \;\; h|^T$ (h = 1 for a 3D point, dummy coordinate)
• Transformations will be represented by 4x4 matrices.
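Homogeneous translation matrix:
$$T = \begin{bmatrix} 1 & 0 & 0 & \Delta x \\ 0 & 1 & 0 & \Delta y \\ 0 & 0 & 1 & \Delta z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Homogeneous scaling matrix:
$$S = \begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$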
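Once translation and scaling are both 4x4 matrices, a composite transformation is a single matrix product, and the order of the factors matters. A minimal sketch in plain Python follows; the helper names and the numeric values are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(dx, dy, dz):
    return [[1.0, 0.0, 0.0, dx],
            [0.0, 1.0, 0.0, dy],
            [0.0, 0.0, 1.0, dz],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, p):
    """Apply a 4x4 matrix to a 3D point using the dummy coordinate h = 1."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return (out[0] / out[3], out[1] / out[3], out[2] / out[3])

T = translation(1.0, 2.0, 3.0)
S = scaling(2.0, 2.0, 2.0)
p = (1.0, 1.0, 1.0)

scale_then_translate = apply(matmul(T, S), p)   # S acts first, then T
translate_then_scale = apply(matmul(S, T), p)   # T acts first, then S
```

The matrix written on the right acts on the point first, so `matmul(T, S)` scales and then translates, giving (3, 4, 5), while `matmul(S, T)` gives (4, 6, 8).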
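Rotation about x axis by angle α:
$$R_\alpha = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha & 0 \\ 0 & \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Rotation about y axis by angle β (change of sign?):
$$R_\beta = \begin{bmatrix} \cos\beta & 0 & \sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Rotation about z axis by angle γ:
$$R_\gamma = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 & 0 \\ \sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$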
How can one do a Rotation about an arbitrary Axis in
Space?
3D Transformation equations (3)
Rotation About an Arbitrary Axis in Space
Assume we want to perform a rotation about an axis in space, passing through the point (x0, y0, z0) with direction cosines (cx, cy, cz), by θ degrees.
1) First of all, translate by -(x0, y0, z0) = |T|.
2) Next, rotate the axis into one of the principal axes. Let's pick Z (|Rx|, |Ry|).
3) Next, rotate by θ degrees about Z (|Rz(θ)|).
4) Then undo the rotations that aligned the axis.
5) Finally, undo the translation: translate by (x0, y0, z0).
Final Transformation:
$M = |T|^{-1} \, |R_x|^{-1} \, |R_y|^{-1} \, |R_z| \, |R_y| \, |R_x| \, |T|$
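The five steps can be sketched as one function that builds M as the product above. This is a minimal plain-Python illustration, assuming right-handed, counter-clockwise-positive rotation matrices. The particular alignment angles (`alpha` about x, then `beta` about y) are one possible choice and are not given in the slides.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(dx, dy, dz):
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotate_about_axis(point, direction, theta_deg):
    """M = T^-1 Rx^-1 Ry^-1 Rz(theta) Ry Rx T for an axis through `point`."""
    x0, y0, z0 = point
    n = math.sqrt(sum(c * c for c in direction))
    cx, cy, cz = (c / n for c in direction)   # normalise to direction cosines
    d = math.hypot(cy, cz)                    # length of the axis projected on the yz-plane
    alpha = math.atan2(cy, cz)                # Rx(alpha) brings the axis into the xz-plane
    beta = -math.atan2(cx, d)                 # Ry(beta) then aligns it with +z
    theta = math.radians(theta_deg)
    steps = [translation(-x0, -y0, -z0),      # 1) T
             rot_x(alpha), rot_y(beta),       # 2) Rx, Ry
             rot_z(theta),                    # 3) Rz(theta)
             rot_y(-beta), rot_x(-alpha),     # 4) Ry^-1, Rx^-1
             translation(x0, y0, z0)]         # 5) T^-1
    m = steps[0]
    for s in steps[1:]:
        m = matmul(s, m)                      # each later step multiplies on the left
    return m

def apply(m, p):
    v = [p[0], p[1], p[2], 1.0]
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(3))

# A 120 degree turn about the (1, 1, 1) direction permutes the axes: x -> y.
q = apply(rotate_about_axis((0, 0, 0), (1, 1, 1), 120), (1, 0, 0))
# A 90 degree turn about a vertical axis through (1, 2, 3).
r = apply(rotate_about_axis((1, 2, 3), (0, 0, 1), 90), (2, 2, 3))
```

Because each step is multiplied on the left of the running product, the first step applied (T) ends up as the rightmost factor, matching the order in the formula for M.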
World Space
where the scene and viewing specification is made
3D Image Space
A 3D perspective-transformed space.
Dimensions: -1:1 in x & y, 0:1 in z.
Where image-space hidden surface algorithms work.
- Parallel Projections
  Orthographic
  Oblique
- Perspective
The Center of Projection is also called the Perspective Reference Point:
COP = PRP
• Perspective foreshortening: the size of the
perspective projection of the object varies inversely
with the distance of the object from the center of
projection.
[Figure: orthographic projection, showing the projection plane in side view and in front view, the projectors for the front view, and the projection-plane normal]
Example Oblique Projection
[Figure: oblique projection, showing the x, y, z axes, the projection plane, a projector, and the projection-plane normal]
END OF BASICS
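THE CAMERA MODEL: perspective projection
[Figure: the camera model. Labels: image plane (IP) with axes X, Y aligned with the world axes x, y; optical axis z; camera lens with center of lens (COL) at O; focal length f; 3D world point p(x,y,z) and its image point P(X,Y); projection plane (PP)]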
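CASE 1
[Figure: projection plane (PP) with axes X, Y aligned with x, y; center of lens (COL) O on the Z axis; world point p(x,y,z)]
• Image plane before the camera lens
• Origin of coordinate systems at the image plane
• Image plane at origin of coordinate system
$$\frac{X}{f} = \frac{x}{f - z}, \quad \frac{Y}{f} = \frac{y}{f - z}$$
$$X = \frac{xf}{f - z}, \quad Y = \frac{yf}{f - z}$$
$$X = \frac{x}{1 - z/f}, \quad Y = \frac{y}{1 - z/f}$$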
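CASE 2
[Figure: projection plane (PP) with axes X, Y; center of lens (COL) O on the Z axis at focal length f; world point p(x,y,z) with inverted image point P(-X,-Y)]
• Image plane before the camera lens
• Origin of coordinate systems at the camera lens
• Image plane at origin of coordinate system
By similarity of triangles:
$$\frac{X}{f} = -\frac{x}{z}, \quad \frac{Y}{f} = -\frac{y}{z}$$
$$X = -\frac{xf}{z}, \quad Y = -\frac{yf}{z}$$
$$X = -\frac{x}{z/f}, \quad Y = -\frac{y}{z/f}$$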
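CASE 3
[Figure: projection plane (PP) with axes X, Y aligned with x, y; focal length f; world point p(x,y,z)]
CASE 4
[Figure: projection plane with axes X, Y aligned with x, y; center of lens (COL) O on the Z axis at focal length f; world point p(x,y,z) with image point P(X,Y)]
• Image plane after the camera lens
• Origin of coordinate system not at COP
• Image plane origin coincides with 3D world origin
By similarity of triangles:
$$\frac{X}{f} = \frac{x}{f + z}, \quad \frac{Y}{f} = \frac{y}{f + z}$$
$$X = \frac{xf}{f + z}, \quad Y = \frac{yf}{f + z}$$
$$X = \frac{x}{1 + z/f}, \quad Y = \frac{y}{1 + z/f}$$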
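Consider the first case ….
[Figure: projection plane (PP) with axes X, Y aligned with x, y; world point p(x,y,z); image point P(-X,-Y); center of projection O (COP) on the Z axis at focal length f]
• Note that the equations are non-linear
• We can develop a matrix formulation of the equations given below
$$X = \frac{x}{1 - z/f}, \quad Y = \frac{y}{1 - z/f}$$
$$\begin{bmatrix} X_h \\ Y_h \\ Z_h \\ k' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1/f & 1 \end{bmatrix} \begin{bmatrix} kx \\ ky \\ kz \\ k \end{bmatrix}$$
Here $k' = k(1 - z/f)$, so dividing by the fourth component gives $X = X_h/k'$ and $Y = Y_h/k'$.
(Z is not important and is eliminated)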
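The matrix formulation can be sketched as follows for the first case (image plane at the origin, lens center at distance f along z). This is a minimal plain-Python illustration; the function names and the numeric values are assumptions made for the example.

```python
def perspective_matrix(f):
    """4x4 perspective matrix for CASE 1; the last row carries the -1/f term."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, -1.0 / f, 1.0]]

def project(point, f, k=1.0):
    """Project a world point (x, y, z) to image coordinates (X, Y)."""
    x, y, z = point
    w_h = [k * x, k * y, k * z, k]             # homogeneous world point
    P = perspective_matrix(f)
    c_h = [sum(P[i][j] * w_h[j] for j in range(4)) for i in range(4)]
    # Divide by the fourth component k' = k (1 - z/f); Z is discarded.
    return c_h[0] / c_h[3], c_h[1] / c_h[3]

f = 0.05                                       # hypothetical focal length
X, Y = project((0.3, 0.2, 2.0), f)
X_k, Y_k = project((0.3, 0.2, 2.0), f, k=7.0)  # arbitrary non-zero scale k
```

The result does not depend on the scale k, and it matches the non-linear form X = x / (1 - z/f); the matrix keeps the operation linear until the final division.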
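Inverse perspective projection
[Figure: image point P(X0,Y0) and world point p(x0,y0,z0)]
$$P^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/f & 1 \end{bmatrix}$$
$$w_h = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/f & 1 \end{bmatrix} \begin{bmatrix} kX_0 \\ kY_0 \\ 0 \\ k \end{bmatrix} = \begin{bmatrix} kX_0 \\ kY_0 \\ 0 \\ k \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} = \begin{bmatrix} X_0 \\ Y_0 \\ 0 \end{bmatrix}$$
Hence no 3D information can be retrieved with the inverse transformation.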
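So we introduce the dummy variable i.e. the depth Z
[Figure: image point P(-X,-Y); center of lens (COL) O on the Z axis at focal length f]
$$\frac{X}{f} = \frac{x}{f - z}, \quad \frac{Y}{f} = \frac{y}{f - z}$$
$$X = \frac{xf}{f - z}, \quad Y = \frac{yf}{f - z}$$
$$X = \frac{x}{1 - z/f}, \quad Y = \frac{y}{1 - z/f}$$
Inverse: 2D to 3D
$$x_0 = \frac{X_0}{f}(f - z_0), \quad y_0 = \frac{Y_0}{f}(f - z_0)$$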
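The forward and inverse equations can be checked as a round trip. This is a minimal plain-Python sketch with hypothetical values: once the depth z0 is supplied, the world point is recovered, and a different depth gives a different world point for the same image point.

```python
def project(x, y, z, f):
    """Forward (3D to 2D): X = x f / (f - z), Y = y f / (f - z)."""
    return x * f / (f - z), y * f / (f - z)

def back_project(X0, Y0, z0, f):
    """Inverse (2D to 3D) for a known depth z0: x0 = X0 (f - z0) / f."""
    return X0 * (f - z0) / f, Y0 * (f - z0) / f

f = 0.05                                  # hypothetical focal length
x, y, z = 0.3, 0.2, 2.0                   # hypothetical world point

X0, Y0 = project(x, y, z, f)
x_rec, y_rec = back_project(X0, Y0, z, f)         # correct depth supplied
x_other, y_other = back_project(X0, Y0, 4.0, f)   # same image point, other depth
```

Every point on the ray through (X0, Y0) and the lens center shares that image point, which is why the depth has to come from somewhere other than a single image.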
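CASE 3
[Figure: projection plane (PP) with axes X, Y aligned with x, y; center of lens (COL) O on the Z axis at focal length f; world point p(x,y,z) with image point P(X,Y)]
Forward: 3D to 2D
$$\frac{X}{f} = \frac{x}{z}, \quad \frac{Y}{f} = \frac{y}{z}$$
$$X = \frac{xf}{z}, \quad Y = \frac{yf}{z}$$
$$X = \frac{x}{z/f}, \quad Y = \frac{y}{z/f}$$
Inverse: 2D to 3D
$$x_0 = \frac{z_0 X_0}{f}, \quad y_0 = \frac{z_0 Y_0}{f}$$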
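Pinhole Camera schematic diagram
Camera Image formulation
Action of eye is simulated by an abstract camera model (pinhole camera model).
3D real world is captured on the image plane. Image is projection of 3D object on a 2D plane.
$$F : (X_w, Y_w, Z_w) \rightarrow (x_i, y_i)$$
[Figure: world point $\mathbf{X}_{world}$ with axes X, Y projected to the image point $\mathbf{x}$]
$$\mathbf{x} = P_{3 \times 4} \mathbf{X}$$
A pinhole camera has the projection matrix
$$P = \mathrm{diag}(f, f, 1)\,[I \mid 0]$$
Principal point offset
[Figure: image axes x, y and camera axes $x_{cam}$, $y_{cam}$ with principal point p]
$$(X, Y, Z)^T \mapsto (fX/Z + p_x, \; fY/Z + p_y)^T$$
$$\mathbf{x} = K[I \mid 0]\mathbf{X}, \quad K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}, \quad p = (p_x, p_y)^T$$
Camera with rotation and translation
[Figure: camera frame related to the world frame (X, Y, Z) by R, t]
$$\mathbf{x} = K[R \mid t]\mathbf{X}$$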
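Camera Geometry
[Figure: world point $\mathbf{X}_{world}$ with axes X, Y, Z; camera center C; image point $\mathbf{x}_{image}$; image axes $x_{img}$, $y_{img}$; principal point $p = (p_x, p_y)$]
Camera internal parameters:
$$K = \begin{bmatrix} \alpha_x & s & p_x \\ 0 & \alpha_y & p_y \\ 0 & 0 & 1 \end{bmatrix}$$
$\alpha_x$: Scale factor in x-coordinate direction
$\alpha_y$: Scale factor in y-coordinate direction
s: Camera skew
$\alpha_x / \alpha_y$: Aspect ratio
Camera matrix: $P = K[R \mid t]$
R: Rotation
t: Translation vector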
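A camera matrix of the form P = K[R | t] can be assembled and applied as below. This is a minimal plain-Python sketch. The intrinsic values (scale factors of 800, principal point (320, 240), zero skew), the identity rotation, and the translations are hypothetical numbers chosen for illustration.

```python
def camera_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]     # append t as a fourth column
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]

def project(P, Xw):
    """Map a world point (X, Y, Z) to pixel coordinates (x, y)."""
    Xh = [Xw[0], Xw[1], Xw[2], 1.0]
    x = [sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]            # divide by the third component

K = [[800.0, 0.0, 320.0],                      # alpha_x, s = 0, p_x
     [0.0, 800.0, 240.0],                      # alpha_y, p_y
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],                          # camera axes aligned with world axes
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

P0 = camera_matrix(K, R, [0.0, 0.0, 0.0])
P1 = camera_matrix(K, R, [0.1, 0.0, 0.0])      # camera frame shifted along x

pix0 = project(P0, (0.5, -0.25, 2.0))
pix1 = project(P1, (0.5, -0.25, 2.0))
```

With R = I and t = 0 this reduces to the principal-point form (fX/Z + p_x, fY/Z + p_y); changing t moves the image point without changing K.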
Observations about Perspective projection
• The mapping from the 3D scene to the image plane is unique: every world point has exactly one image point
• The converse does not hold: for an image point, no unique world coordinate can be found
• So depth information cannot be retrieved using a single image. What to do?
• Would two (2) images of the same object (from different viewing angles) help?
• This is termed Stereo Vision
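Stereo Vision
[Figure: two cameras separated by the baseline B. Image 1 (axes X, Y) contains the point (X1,Y1) and Image 2 (axes X, Y) contains the point (X2,Y2); both are projections of the world point p(x,y,z) through the lens centers along the optical axes]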
Stereo Vision (2)
• Stereo imaging involves obtaining two separate image views of an object (in this discussion, the world point)
• The distance between the centers of the two lenses is called the baseline width.
• The projections of the world point on the two image planes are (X1, Y1) and (X2, Y2)
• The assumption is that the cameras are identical
• The coordinate systems of both cameras are perfectly aligned, differing only in the x-coordinate location of the origin.
• The world coordinate system is also brought into coincidence with one of the image X, Y planes (say image plane 1). So the y, z coordinates are the same for both cameras.
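Top view of the stereo imaging system with origin at center of first imaging plane.
[Figure: top view along the X axis. Image 1 with point (X1,Y1), lens center O1 at focal length f, depth z1; Image 2 with point (X2,Y2), lens center O2 at focal length f, depth z2; baseline B between the cameras; world point W(x, y, z)]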
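First bringing the first camera into coincidence with the world coordinate system, then using the second camera coordinate system, and directly applying the formula, we get:
$$x_1 = \frac{X_1}{f}(f - z_1), \quad x_2 = \frac{X_2}{f}(f - z_2)$$
Because the separation between the two cameras is B:
$$x_2 = x_1 + B, \quad y_1 = y_2 = y, \quad z_1 = z_2 = z$$
$$x_1 = \frac{X_1}{f}(f - z), \quad x_1 + B = \frac{X_2}{f}(f - z)$$
Subtracting the first equation from the second:
$$B = \frac{(X_2 - X_1)}{f}(f - z) \;\Rightarrow\; z = f - \frac{fB}{X_2 - X_1}$$
Top view of the stereo imaging system with origin at center of first camera lens.
[Figure: top view along the X axis. Image plane IP 1 with point (X1,Y1), lens center O1 at focal length f, depth z1; image plane IP 2 with point (X2,Y2), lens center O2, depth z2; baseline B; world point W(x, y, z)]
With the origin at the lens center the image is inverted, so the projection equations are
$$x = -\frac{Xz}{f}, \quad y = -\frac{Yz}{f}$$
$$x_1 = -\frac{X_1 z}{f}, \quad x_1 + B = -\frac{X_2 z}{f}$$
$$\frac{(X_1 - X_2)z}{f} = B \;\Rightarrow\; z = \frac{B \cdot f}{X_1 - X_2} = \frac{B \cdot f}{D}$$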
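Compare the two solutions
Origin at the center of the first imaging plane:
$$z = f - \frac{fB}{X_2 - X_1} = f\left[1 - \frac{B}{D}\right], \quad D = (X_2 - X_1) = \frac{fB}{f - z}$$
Origin at the center of the first camera lens:
$$z = \frac{fB}{X_1 - X_2} = \frac{B \cdot f}{D}, \quad D = (X_1 - X_2) = \frac{fB}{z}$$
If D = 0, then X2 = X1.
[Figure: Image Plane - I and Image Plane - II, with the corresponding points (X1, Y1) and (X2, Y2) lying on the EPIPOLAR line]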
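The two depth formulas can be checked on synthetic data: project a known world point into both cameras, form the disparity D, and recover z. This is a minimal plain-Python sketch. The focal length, baseline and world point are hypothetical, and the sign conventions are the ones assumed here (second camera origin offset so that x2 = x1 + B, inverted image when the origin is at the lens).

```python
def depth_origin_at_image_plane(X1, X2, f, B):
    """z = f - f B / D with D = X2 - X1 (origin at the first image plane)."""
    D = X2 - X1
    return f - f * B / D

def depth_origin_at_lens(X1, X2, f, B):
    """z = f B / D with D = X1 - X2 (origin at the first lens center)."""
    D = X1 - X2
    return f * B / D

f, B = 0.05, 0.1            # hypothetical focal length and baseline
x, z = 0.3, 2.0             # hypothetical world point (x and depth only)

# Origin at the image plane: X = f x / (f - z); camera 2 sees x2 = x + B.
X1_a = f * x / (f - z)
X2_a = f * (x + B) / (f - z)
z_a = depth_origin_at_image_plane(X1_a, X2_a, f, B)

# Origin at the lens center: X = -f x / z; camera 2 sees x2 = x + B.
X1_b = -f * x / z
X2_b = -f * (x + B) / z
z_b = depth_origin_at_lens(X1_b, X2_b, f, B)
```

Both functions divide by D, so a zero disparity (X2 = X1) has to be handled separately in any real use.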
Error in Depth Estimation
$$z = \frac{B \cdot f}{D} \;\Rightarrow\; \frac{\delta(z)}{\delta D} = -\frac{B \cdot f}{D^2}$$
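To see how the depth error scales with depth itself, one can differentiate z = B·f/D and then substitute D = B·f/z. The substitution step is an addition here and does not appear on the slide.

```latex
z = \frac{B f}{D}
\;\Rightarrow\;
\delta z \approx -\frac{B f}{D^{2}}\,\delta D
= -\frac{z^{2}}{B f}\,\delta D
\qquad \text{(using } D = \tfrac{B f}{z}\text{)}
```

For a fixed disparity error δD, the depth error grows with the square of the depth and shrinks as the baseline B or the focal length f increases.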