
Computer Vision
Image Processing

Image formation
Geometric primitives and transformations

Cosimo Distante
Cosimo.distante@cnr.it
Cosimo.distante@unisalento.it
Image Processing: Geometric primitives and transformations

2.1.1 Geometric primitives

Image Processing: Geometric primitives

Geometric primitives form the basic building blocks used to describe three-dimensional shapes. In this section, we introduce points, lines, and planes. Later sections of the book discuss curves (Sections 5.1 and 11.2), surfaces (Section 12.3), and volumes (Section 12.5).
2D points. 2D points (pixel coordinates in an image) can be denoted using a pair of values, x = (x, y) ∈ R², or alternatively,

x = (x, y)ᵀ.    (2.1)

(As stated in the introduction, we use the (x1, x2, . . .) notation to denote column vectors.)

2D points can also be represented using homogeneous coordinates, x̃ = (x̃, ỹ, w̃) ∈ P², where vectors that differ only by scale are considered to be equivalent. P² = R³ − (0, 0, 0) is called the 2D projective space.

A homogeneous vector x̃ can be converted back into an inhomogeneous vector x by dividing through by the last element w̃, i.e.,

x̃ = (x̃, ỹ, w̃) = w̃(x, y, 1) = w̃x̄,    (2.2)

where x̄ = (x, y, 1) is the augmented vector. Homogeneous points whose last element is w̃ = 0 are called ideal points or points at infinity and do not have an equivalent inhomogeneous representation.
Image Processing: Geometric primitives

Points at infinity

On a plane, we know that two non-parallel lines intersect at a point, but two parallel lines cannot.

Imagine that two parallel lines do meet at a point that is a special point we call a point at infinity for that group of parallel lines.

Adding these missing points at infinity to the finite point set of the plane R² gives an extended plane we call a projective plane:

projective plane = R² + {points at infinity}.
Image Processing: Geometric primitives

Points at infinity

For a homogeneous point x̃ = (x̃, ỹ, w̃), the inhomogeneous coordinates are

x = x̃ / w̃,   y = ỹ / w̃.

Intuitively,

w̃ → 0  ⇒  x̃ / w̃ → ∞.

The point represented by the homogeneous coordinate (x̃, ỹ, 0) is the point at infinity. It cannot be represented by the symbol ∞, which is merely a notation, not a number.
Image Processing: Geometric primitives

Points at infinity (cont'd)

Of course, the point at infinity has only homogeneous coordinates and does not have any inhomogeneous representation, as we cannot divide by zero.

(1, 1, 0) is a point at infinity.
(0, 0, 1) is the origin.
(0, 0, 0) is invalid and does not represent any point.
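These three cases can be checked mechanically. Below is a minimal Python/NumPy sketch of the conversion rules; the helper names are ours, not from any library:

```python
import numpy as np

def to_homogeneous(x):
    """Append w = 1 to an inhomogeneous 2D point, giving the augmented vector x_bar."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(xt):
    """Divide through by the last element w; fails for points at infinity (w = 0)."""
    xt = np.asarray(xt, dtype=float)
    if xt[-1] == 0.0:
        raise ValueError("point at infinity has no inhomogeneous representation")
    return xt[:-1] / xt[-1]

p = to_homogeneous([3.0, 4.0])           # (3, 4, 1)
q = from_homogeneous([6.0, 8.0, 2.0])    # (3, 4): overall scale does not matter
```

Calling `from_homogeneous([1.0, 1.0, 0.0])` raises an error, matching the rule that (1, 1, 0) is a point at infinity with no inhomogeneous equivalent.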
Image Processing: Geometric primitives

Remember the definition of projective space?

This projective space, obtained as the quotient space of homogeneous coordinates, is so far only a space that does not inherit the algebraic structures of R³. It is not a vector space; it does not even have a zero! The only structure it inherits is the notion of linear dependence of points, which encodes collinearity.
Image Processing: Geometric primitives

2D lines

2D lines can also be represented using homogeneous coordinates l̃ = (a, b, c). The corresponding line equation is

x̄ · l̃ = ax + by + c = 0.    (2.3)

We can normalize the line equation vector so that l = (n̂x, n̂y, d) = (n̂, d) with ‖n̂‖ = 1. In this case, n̂ is the normal vector perpendicular to the line and d is its distance to the origin (Figure 2.2a). (The one exception to this normalization is the line at infinity l̃ = (0, 0, 1), which includes all (ideal) points at infinity.)

We can also express n̂ as a function of rotation angle θ, n̂ = (n̂x, n̂y) = (cos θ, sin θ) (Figure 2.2a). This representation is commonly used in the Hough transform line-finding algorithm.

[Figure 2.2: (a) 2D line equation and (b) 3D plane equation, expressed in terms of the normal n̂ and distance to the origin d.]
The Hough line-finding algorithm is discussed in Section 4.3.2; the combination (θ, d) is also known as polar coordinates.

Image Processing: Geometric primitives

When using homogeneous coordinates, we can compute the intersection of two lines as

x̃ = l̃1 × l̃2,

where × is the cross product operator. Similarly, the line joining two points can be written as

l̃ = x̃1 × x̃2.

When trying to fit an intersection point to multiple 2D lines or, conversely, a line to multiple 2D points, least squares techniques (Section 6.1.1 and Appendix A.2) can be used, as discussed in Exercise 2.1.
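Both formulas are a single cross product each. A short NumPy sketch (our helper names; the sign convention for d follows the line equation above):

```python
import numpy as np

def normalize_line(l):
    """Rescale l = (a, b, c) so that (a, b) has unit norm, giving (n_x, n_y, d)."""
    a, b, c = np.asarray(l, dtype=float)
    s = np.hypot(a, b)
    if s == 0.0:
        raise ValueError("the line at infinity (0, 0, c) cannot be normalized")
    return np.array([a, b, c]) / s

# Intersection of the lines x = 1 and y = 2, written as (a, b, c) with ax + by + c = 0.
l1 = np.array([1.0, 0.0, -1.0])
l2 = np.array([0.0, 1.0, -2.0])
x_tilde = np.cross(l1, l2)        # homogeneous intersection point
x = x_tilde[:2] / x_tilde[2]      # inhomogeneous result: (1, 2)

# Line joining two points (the dual formula).
p1, p2 = np.array([1.0, 2.0, 1.0]), np.array([3.0, 4.0, 1.0])
l_join = np.cross(p1, p2)         # satisfies p1 . l = p2 . l = 0
```

For two parallel lines, the same cross product yields a point with third coordinate 0, i.e., their point at infinity, which ties back to the earlier discussion.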
2D conics. There are other algebraic curves that can be expressed with simple polynomial homogeneous equations. For example, the conic sections (so called because they arise as the intersection of a plane and a 3D cone) can be written using a quadric equation

x̃ᵀ Q x̃ = 0.

Quadric equations play useful roles in the study of multi-view geometry and camera calibration (Hartley and Zisserman 2004; Faugeras and Luong 2001) but are not used extensively in this book.
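As a concrete instance of the quadric equation (our own example, not from the book): the unit circle x² + y² − 1 = 0 corresponds to Q = diag(1, 1, −1):

```python
import numpy as np

Q = np.diag([1.0, 1.0, -1.0])   # unit circle x^2 + y^2 - 1 = 0 as a quadric

def on_conic(x, y, Q, tol=1e-9):
    """Test whether the augmented point (x, y, 1) satisfies x~^T Q x~ = 0."""
    xt = np.array([x, y, 1.0])
    return abs(xt @ Q @ xt) < tol
```

Here `on_conic(1.0, 0.0, Q)` holds, since (1, 0) lies on the circle, while `on_conic(1.0, 1.0, Q)` does not.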
Image Processing: Geometric primitives

3D points. Point coordinates in three dimensions can be written using inhomogeneous coordinates x = (x, y, z) ∈ R³ or homogeneous coordinates x̃ = (x̃, ỹ, z̃, w̃) ∈ P³. As before, it is sometimes useful to denote a 3D point using the augmented vector x̄ = (x, y, z, 1) with x̃ = w̃x̄.
[Figure 2.3: 3D line equation, r = (1 − λ)p + λq.]

Image Processing: Geometric primitives

3D planes. 3D planes can also be represented as homogeneous coordinates m̃ = (a, b, c, d), with a corresponding plane equation

x̄ · m̃ = ax + by + cz + d = 0.    (2.7)
We can also normalize the plane equation as m = (n̂x, n̂y, n̂z, d) = (n̂, d) with ‖n̂‖ = 1. In this case, n̂ is the normal vector perpendicular to the plane and d is its distance to the origin (Figure 2.2b). As with the case of 2D lines, the plane at infinity m̃ = (0, 0, 0, 1), which contains all the points at infinity, cannot be normalized (i.e., it does not have a unique normal or a finite distance).

We can express n̂ as a function of two angles (θ, φ),

n̂ = (cos θ cos φ, sin θ cos φ, sin φ),    (2.8)

i.e., using spherical coordinates, but these are less commonly used than polar coordinates since they do not uniformly sample the space of possible normal vectors.
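The plane normalization and the distance interpretation of d can be sketched as follows (our helper names; the signed distance follows the x̄ · m̃ convention above):

```python
import numpy as np

def normalize_plane(m):
    """Rescale m = (a, b, c, d) so that the normal (a, b, c) has unit length."""
    m = np.asarray(m, dtype=float)
    s = np.linalg.norm(m[:3])
    if s == 0.0:
        raise ValueError("the plane at infinity (0, 0, 0, d) cannot be normalized")
    return m / s

def signed_distance(x, m):
    """Signed distance of an inhomogeneous 3D point x from a normalized plane m."""
    return float(np.dot(m[:3], x) + m[3])

m = normalize_plane([0.0, 0.0, 2.0, -4.0])   # plane z = 2 becomes (0, 0, 1, -2)
d = signed_distance([1.0, 1.0, 5.0], m)      # 5 - 2 = 3 units above the plane
```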

Image Processing: Geometric primitives

3D lines. Lines in 3D are less elegant than either lines in 2D or planes in 3D. One possible representation is to use two points on the line, (p, q). Any other point on the line can be expressed as a linear combination of these two points,

r = (1 − λ)p + λq,    (2.9)

as shown in Figure 2.3. If we restrict 0 ≤ λ ≤ 1, we get the line segment joining p and q. If we use homogeneous coordinates, we can write the line as

r̃ = μp̃ + λq̃.    (2.10)

(Images on these slides are sourced from the Szeliski book and the Wolfram website.)
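Equation (2.9) translates directly into code; a minimal sketch for sampling points on a 3D line:

```python
import numpy as np

def point_on_line(p, q, lam):
    """r = (1 - lambda) p + lambda q; lambda in [0, 1] stays on the segment pq."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return (1.0 - lam) * p + lam * q

p = np.array([0.0, 0.0, 0.0])
q = np.array([2.0, 4.0, 6.0])
mid = point_on_line(p, q, 0.5)    # midpoint (1, 2, 3)
```

Values of λ outside [0, 1] still lie on the infinite line, just beyond the segment endpoints.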
A special case of this is when the second point is at infinity, i.e., q̃ = (d̂x, d̂y, d̂z, 0) = (d̂, 0). Here, we see that d̂ is the direction of the line. We can then re-write the inhomogeneous 3D line equation as

r = p + λd̂.
Image Processing: Geometric Transformation

2D transformations

[Figure 2.4: Basic set of 2D planar transformations: translation, Euclidean, similarity, affine, projective. From Computer Vision: Algorithms and Applications (September 3, 2010 draft).]

Translation. 2D translations can be written as x′ = x + t or

x′ = [I  t] x̄.
Image Processing: Geometric Transformation

2D Translation

2D translations can be written as x′ = x + t or

x′ = [I  t] x̄,

where I is the (2 × 2) identity matrix, t = (tx, ty) is the translation vector, and [I  t] is a 2 × 3 matrix. Using a 2 × 3 matrix results in a more compact notation. Alternatively,

x̄′ = [ I  t ; 0ᵀ  1 ] x̄

using a full-rank 3 × 3 matrix, which can be obtained from the 2 × 3 matrix by appending a [0ᵀ 1] row, where 0 is the zero vector. Appending this row makes it possible to chain such transformations using matrix multiplication. Note that in any equation where an augmented vector such as x̄ appears on both sides, it can always be replaced with a full homogeneous vector x̃.
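The payoff of the full-rank 3 × 3 form is that chaining becomes plain matrix multiplication; a quick sketch (our helper name):

```python
import numpy as np

def translation(tx, ty):
    """Full-rank 3x3 matrix [[I, t], [0^T, 1]] acting on homogeneous 2D points."""
    T = np.eye(3)
    T[0, 2] = tx
    T[1, 2] = ty
    return T

x_bar = np.array([1.0, 2.0, 1.0])                         # augmented vector (x, y, 1)
chained = translation(5.0, 0.0) @ translation(0.0, -3.0)  # compose by multiplication
x_new = chained @ x_bar                                   # (6, -1, 1)
```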
Image Processing: Geometric Transformation

2D Rotation + Translation

This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as x′ = Rx + t or

x′ = [R  t] x̄,

where

R = [ cos θ  −sin θ ; sin θ  cos θ ]

is an orthonormal rotation matrix with RRᵀ = I and |R| = 1.

Image Processing: Geometric Transformation

2D Scaled Rotation

Also known as the similarity transform, this transformation can be expressed as x′ = sRx + t, where s is an arbitrary scale factor. It can also be written as

x′ = [sR  t] x̄ = [ a  −b  tx ; b  a  ty ] x̄,

where we no longer require that a² + b² = 1 (the constraint is not enforced). The similarity transform preserves angles between lines.
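A numerical check of both parameterizations, R built from θ and the similarity packing a = s cos θ, b = s sin θ (a minimal sketch, helper names ours):

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def similarity(s, theta, tx, ty):
    """2x3 matrix [[a, -b, tx], [b, a, ty]] with a = s cos(theta), b = s sin(theta)."""
    a, b = s * np.cos(theta), s * np.sin(theta)
    return np.array([[a, -b, tx],
                     [b,  a, ty]])

R = rotation(np.deg2rad(30.0))
# Orthonormality conditions: R R^T = I and det(R) = 1.
ok = np.allclose(R @ R.T, np.eye(2)) and np.isclose(np.linalg.det(R), 1.0)

M = similarity(2.0, np.deg2rad(90.0), 1.0, 0.0)
x_new = M @ np.array([1.0, 0.0, 1.0])   # scale by 2, rotate 90 deg, shift: (1, 2)
```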


Image Processing: Geometric Transformation

2D Affine

The affine transformation is written as x′ = Ax̄, where A is an arbitrary 2 × 3 matrix, i.e.,

x′ = [ a00  a01  a02 ; a10  a11  a12 ] x̄.

(You can also use a 3 × 3 matrix by adding the row [0ᵀ 1].) Parallel lines remain parallel under affine transformations.
Image Processing: Geometric Transformation

2D Projective

This transformation, also known as a perspective transform or homography, operates on homogeneous coordinates,

x̃′ = H̃ x̃,

where H̃ is an arbitrary 3 × 3 matrix. Note that H̃ is homogeneous, i.e., it is only defined up to a scale, and two H̃ matrices that differ only by scale are equivalent. The resulting homogeneous coordinate x̃′ must be normalized in order to obtain an inhomogeneous result x, i.e.,

x′ = (h00 x + h01 y + h02) / (h20 x + h21 y + h22)   and   y′ = (h10 x + h11 y + h12) / (h20 x + h21 y + h22).

Perspective transformations preserve straight lines (i.e., they remain straight after the transformation).

Hierarchy of 2D transformations. The preceding set of transformations is illustrated in Figure 2.4 and summarized in Table 2.1. The easiest way to think of them is as a set of (potentially restricted) 3 × 3 matrices operating on 2D homogeneous coordinate vectors. Hartley and Zisserman (2004) contains a more detailed description of the hierarchy of 2D planar transformations.
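The normalization step in the homography formulas above can be sketched as follows (H here is our own toy example, not one from the book):

```python
import numpy as np

def apply_homography(H, x, y):
    """Apply a 3x3 homography and divide through by the third coordinate."""
    xt = H @ np.array([x, y, 1.0])
    if xt[2] == 0.0:
        raise ValueError("the point maps to a point at infinity")
    return xt[0] / xt[2], xt[1] / xt[2]

H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])       # toy projective matrix: w' = 0.5 x + 1
u, v = apply_homography(H, 2.0, 4.0)  # w' = 2, so (u, v) = (1, 2)

# H is only defined up to scale: 3 H maps every point identically.
u2, v2 = apply_homography(3.0 * H, 2.0, 4.0)
```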
Image Processing: Geometric Transformation – Hierarchy 2D

Table 2.1 Hierarchy of 2D coordinate transformations:

Transformation      | Matrix          | # DoF | Preserves
translation         | [I  t] (2×3)    | 2     | orientation
rigid (Euclidean)   | [R  t] (2×3)    | 3     | lengths
similarity          | [sR  t] (2×3)   | 4     | angles
affine              | [A] (2×3)       | 6     | parallelism
projective          | [H̃] (3×3)      | 8     | straight lines
Image Processing: Geometric Transformation – Hierarchy 3D

Table 2.2 Hierarchy of 3D coordinate transformations:

Transformation      | Matrix          | # DoF | Preserves
translation         | [I  t] (3×4)    | 3     | orientation
rigid (Euclidean)   | [R  t] (3×4)    | 6     | lengths
similarity          | [sR  t] (3×4)   | 7     | angles
affine              | [A] (3×4)       | 12    | parallelism
projective          | [H̃] (4×4)      | 15    | straight lines

Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 3 × 4 matrices are extended with a fourth [0ᵀ 1] row to form a full 4 × 4 matrix. Hartley and Zisserman (2004, Section 2.4) give a more detailed description of the hierarchy of groups.
Computer Vision: Algorithms and Applicatio
can
n. 3D betranslations
writtencanasbex
Computer Vision:
Translation
0
= asxand
Algorithms
written x+ ort or
0 Applications (September
=tx+
h i
is the (3 ⇥ 3)
h
identity
i
xmatrix
0
= I and t 0x̄is the zero vector.
he (3 ⇥ 3) identity matrix and 0 is the zero vector.
x =
0
I t x̄
omputer Vision: Algorithms and Applications (September 3, 2010 draft)
3×3

3) identity matrix and 0 is the zero vector.


Rotation + translation

Also known as 3D rigid body motion or the 3D Euclidean transformation, it can be written as x′ = Rx + t or

    x′ = [ R | t ] x̄,                                  (2.24)

where R is a 3 × 3 orthonormal rotation matrix with R Rᵀ = I and |R| = 1.
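The rigid transform above can be sketched numerically. This is a minimal example (the rotation about z and the translation values are made up for illustration) showing that the direct form x′ = Rx + t and the matrix form x′ = [R | t] x̄ agree, and that R is orthonormal:

```python
import numpy as np

# A minimal sketch of the 3D rigid (Euclidean) transform x' = R x + t,
# written both directly and via the 3x4 matrix [R | t] acting on the
# augmented vector x_bar = (x, y, z, 1).

def rotation_z(theta):
    """3x3 rotation about the z axis (an easy-to-verify orthonormal R)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

R = rotation_z(np.pi / 2)
t = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, 0.0])

# Direct form: x' = R x + t
x_direct = R @ x + t

# Matrix form: x' = [R | t] x_bar
Rt = np.hstack([R, t[:, None]])          # 3x4 matrix [R | t]
x_bar = np.append(x, 1.0)                # augmented vector (x, y, z, 1)
x_matrix = Rt @ x_bar

assert np.allclose(x_direct, x_matrix)
# R is orthonormal: R R^T = I and det(R) = +1
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
print(x_direct)  # [1. 3. 3.]
```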
Rotation + translation (cont’d)

It is sometimes useful to rotate about a given point c,

    x′ = R(x − c) = Rx − Rc,

where c is the center of rotation (often the camera center).

Compactly parameterizing a 3D rotation is a non-trivial task, which we describe in more detail below.
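A short sketch of the identity above, using an arbitrary (randomly generated) proper rotation and made-up values for c and x, confirming that rotating about a center c is just a rigid motion with translation t = −Rc:

```python
import numpy as np

# Sketch: rotating about a chosen center c is a rigid motion with
# translation t = -R c, since x' = R(x - c) = R x - R c.

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthonormal matrix
if np.linalg.det(R) < 0:                           # make it a proper rotation
    R[:, 0] *= -1

c = np.array([0.5, -1.0, 2.0])                     # center of rotation
x = np.array([1.0, 1.0, 1.0])

lhs = R @ (x - c)
rhs = R @ x - R @ c        # same motion written as x' = R x + t with t = -R c
assert np.allclose(lhs, rhs)
assert np.allclose(R @ (c - c), np.zeros(3))       # the center maps to the origin
```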

Scaled rotation

The 3D similarity transform can be expressed as x′ = sRx + t, where s is an arbitrary scale factor. It can also be written as

    x′ = [ sR | t ] x̄.                                 (2.26)

This transformation preserves angles between lines and planes.
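The angle-preservation property can be checked numerically. A minimal sketch with a simple rotation about x and made-up scale and direction vectors (the translation t cancels when transforming directions, so it is omitted):

```python
import numpy as np

# Sketch: a 3D similarity x' = s R x + t preserves the angle between
# direction vectors u and v (chosen arbitrarily for illustration).

def angle(u, v):
    return np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = 0.7                                  # rotation angle about the x axis
R = np.array([[1, 0, 0],
              [0, np.cos(a), -np.sin(a)],
              [0, np.sin(a),  np.cos(a)]])
s = 2.5                                  # arbitrary scale factor

u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0])

# Directions transform without t, which cancels out in differences.
assert np.isclose(angle(u, v), angle(s * R @ u, s * R @ v))
```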
Affine

The affine transform is written as x′ = A x̄, where A is an arbitrary 3 × 4 matrix, i.e.,

          ⎡ a00  a01  a02  a03 ⎤
    x′ =  ⎢ a10  a11  a12  a13 ⎥ x̄.
          ⎣ a20  a21  a22  a23 ⎦

Parallel lines and planes remain parallel under affine transformations.
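The parallelism property can be verified directly. A minimal sketch with a randomly generated 3 × 4 matrix A and two made-up parallel lines sharing a direction d:

```python
import numpy as np

# Sketch: under x' = A x_bar with an arbitrary 3x4 matrix A, parallel
# lines stay parallel. Two parallel 3D lines p + alpha*d and q + alpha*d
# both map to lines with direction A[:, :3] @ d.

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

d = np.array([1.0, 2.0, -1.0])        # shared direction of the two lines
p = np.array([0.0, 0.0, 0.0])
q = np.array([3.0, -1.0, 5.0])        # offset line, same direction

def transform(x):
    return A @ np.append(x, 1.0)      # apply A to the augmented vector

# Direction of each transformed line (difference of two transformed points):
d_p = transform(p + d) - transform(p)
d_q = transform(q + d) - transform(q)
assert np.allclose(d_p, d_q)          # identical directions: still parallel
```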
Projective

This transformation operates on homogeneous coordinates,

    x̃′ = H̃ x̃,
        (4×4)(4×1)

where H̃ is an arbitrary 4 × 4 homogeneous matrix. As in 2D, the resulting homogeneous coordinate x̃′ must be normalized in order to obtain an inhomogeneous result x. Perspective transformations preserve straight lines (i.e., they remain straight after the transformation).
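A minimal sketch of applying a 4 × 4 projective transform and then normalizing by the last component; the matrix H below is a hypothetical example with a non-trivial last row:

```python
import numpy as np

# Sketch: apply a 4x4 projective transform H to a homogeneous 3D point,
# then normalize by the last element to recover inhomogeneous coordinates.

H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5, 1.0]])   # hypothetical H with a projective row

x = np.array([2.0, 4.0, 2.0])          # inhomogeneous 3D point
x_h = np.append(x, 1.0)                # homogeneous (x, y, z, 1)

y_h = H @ x_h                          # transformed homogeneous point
y = y_h[:3] / y_h[3]                   # normalize by the last element
print(y)  # [1. 2. 1.]
```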
3D to 2D projections

Now that we know how to represent 2D and 3D geometric primitives and how to transform them spatially, we need to specify how 3D primitives are projected onto the image plane. We can do this using a linear 3D to 2D projection matrix. The simplest model is orthography, which requires no division to get the final (inhomogeneous) result. The more commonly used model is perspective, since this more accurately models the behavior of real cameras.

Orthography and para-perspective
An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote 2D points.) This can be written as

    x = [ I_2×2 | 0 ] p,    with  p = (X, Y, Z)ᵀ.

If we are using homogeneous (projective) coordinates, we can write

         ⎡ 1 0 0 0 ⎤
    x̃ =  ⎢ 0 1 0 0 ⎥ p̃.
         ⎣ 0 0 0 1 ⎦
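Both forms of the orthographic projection can be sketched in a few lines (the point p is a made-up example):

```python
import numpy as np

# Sketch: orthographic projection x = [I_2x2 | 0] p keeps only (X, Y),
# in both inhomogeneous and homogeneous form.

p = np.array([3.0, -2.0, 7.0])                        # 3D point (X, Y, Z)

# Inhomogeneous form: 2x3 matrix [I | 0]
P_ortho = np.hstack([np.eye(2), np.zeros((2, 1))])
x = P_ortho @ p
print(x)  # [ 3. -2.]

# Homogeneous form: drop z, keep w
P_h = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1]], dtype=float)
x_h = P_h @ np.append(p, 1.0)
assert np.allclose(x_h[:2] / x_h[2], x)
```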
Orthography and para-perspective (cont’d)

i.e., we drop the z component but keep the w component. Orthography is an approximate model for long focal length (telephoto) lenses and objects whose depth is shallow relative to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric lenses (Baker and Nayar 1999, 2001).

In practice, world coordinates (which may measure dimensions in meters) need to fit onto an image sensor (physically measured in millimeters, but ultimately measured in pixels). For this reason, scaled orthography is actually more commonly used,

    x = [ sI_2×2 | 0 ] p.

This model is equivalent to first projecting the world points onto a local fronto-parallel image plane and then scaling this image using regular perspective projection. The scaling can be the same for all parts of the scene (Figure 2.7b) or it can be different for objects that are being modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to frame when estimating structure from motion, which can better model the scale change that occurs as an object approaches the camera.

Figure 2.7  Commonly used projection models: (a) 3D view, (b) orthography, (c) scaled orthography, (d) para-perspective, (e) perspective. Each diagram also shows a top-down view of the projection. Note how the box sides remain parallel in the non-perspective projections.
Para-perspective

Scaled orthography is a popular model for reconstructing the 3D shape of objects far away from the camera, since it greatly simplifies certain computations. For example, pose (camera orientation) can be estimated using simple least squares (Section 6.2.1). Under orthography, structure and motion can simultaneously be estimated using factorization (singular value decomposition), as discussed in Section 7.3 (Tomasi and Kanade 1992).

A closely related projection model is para-perspective (Aloimonos 1990; Poelman and Kanade 1997). In this model, object points are again first projected onto a local reference plane parallel to the image plane. However, rather than being projected orthogonally to this plane, they are projected parallel to the line of sight to the object center (Figure 2.7d). This is followed by the usual projection onto the final image plane, which again amounts to a scaling. The combination of these two projections is therefore affine and can be written as

         ⎡ a00  a01  a02  a03 ⎤
    x̃ =  ⎢ a10  a11  a12  a13 ⎥ p̃.                    (2.49)
         ⎣  0    0    0    1  ⎦

Note how parallel lines in 3D remain parallel after projection in Figure 2.7b–d. Para-perspective provides a more accurate projection model than scaled orthography, without incurring the added complexity of per-pixel perspective division.
Perspective

Points are projected onto the image plane by dividing them by their z component. Using inhomogeneous coordinates, this can be written as

                     ⎡ x/z ⎤
    x̄ = P_z(p) =     ⎢ y/z ⎥ .                        (2.50)
                     ⎣  1  ⎦

In homogeneous coordinates, the projection has a simple linear form,

         ⎡ 1 0 0 0 ⎤
    x̃ =  ⎢ 0 1 0 0 ⎥ p̃,                               (2.51)
         ⎣ 0 0 1 0 ⎦

i.e., we drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor.
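A minimal sketch of the two forms above (the point p is a made-up example), showing that the inhomogeneous division by z and the linear homogeneous form followed by normalization agree:

```python
import numpy as np

# Sketch: perspective projection divides by the z component (Eq. 2.50);
# the same map in homogeneous form is a linear 3x4 matrix (Eq. 2.51).

def project_perspective(p):
    """Inhomogeneous form: (x, y, z) -> (x/z, y/z, 1)."""
    return np.array([p[0] / p[2], p[1] / p[2], 1.0])

p = np.array([4.0, 2.0, 2.0])
x_bar = project_perspective(p)

# Homogeneous form: drop the w component of p_tilde = (x, y, z, 1)
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
x_tilde = P @ np.append(p, 1.0)
x_norm = x_tilde / x_tilde[2]          # normalize by the last element

assert np.allclose(x_bar, x_norm)      # both forms agree: (2, 1, 1)
```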
A form often seen in computer graphics systems is a two-step projection that first projects 3D coordinates into normalized device coordinates and then rescales these to integer pixel coordinates.
3D to 2D projections

Camera intrinsics

Once the 3D point is projected, we must transform the resulting coordinates according to the pixel sensor spacing and the geometry involved. In this section, we first present a mapping from 2D pixel coordinates to 3D rays using a sensor homography M_s, since this is easier to explain in terms of physically measurable quantities. We then relate these quantities to the more commonly used camera intrinsic matrix K, which is used to map 3D camera-centered points p_c to 2D pixel coordinates x̃_s.

Figure 2.8  Projection of a 3D camera-centered point p_c onto the sensor planes. O_c is the camera center (nodal point), c_s is the 3D origin of the sensor plane coordinate system, and s_x and s_y are the pixel spacings.

Image sensors return pixel values indexed by integer pixel coordinates (x_s, y_s), often with the coordinates starting at the upper-left corner of the image and moving down and to the right. (This convention is not obeyed by all imaging libraries, but the adjustment for other coordinate systems is straightforward.) To map pixel centers to 3D coordinates, we first scale the (x_s, y_s) values by the pixel spacings (s_x, s_y) (sometimes expressed in microns for solid-state sensors) and then describe the orientation of the sensor array relative to the camera projection center O_c with an origin c_s and a 3D rotation R_s (Figure 2.8).

The combined 2D to 3D projection can then be written as

                        ⎡ s_x   0   0 ⎤  ⎡ x_s ⎤
    p = [ R_s | c_s ]   ⎢  0   s_y  0 ⎥  ⎢ y_s ⎥  = M_s x̄_s.      (2.53)
                        ⎢  0    0   0 ⎥  ⎣  1  ⎦
                        ⎣  0    0   1 ⎦

The first two columns of the 3 × 3 matrix M_s are the 3D vectors corresponding to unit steps in the image pixel array along the x_s and y_s directions, while the third column is the 3D image array origin c_s.

The matrix M_s has 8 unknowns:
•  3 parameters describing the rotation R_s
•  3 parameters describing the translation c_s
•  2 scale factors (s_x, s_y)

Note, however, that the 2D to 3D mapping has only seven degrees of freedom, since the distance of the sensor from the origin cannot be teased apart from the sensor spacing, based on external image measurements alone.
Camera intrinsics (cont’d)

Estimating a camera model M_s with the required seven degrees of freedom (i.e., where the first two columns are orthogonal after an appropriate re-scaling) is impractical, so most practitioners assume a general 3 × 3 homogeneous matrix form.

The relationship between the 3D pixel center p and the 3D camera-centered point p_c is given by an unknown scaling s, p = s p_c. We can therefore write the complete projection between p_c and a homogeneous version of the pixel address as

    x̃_s = α M_s⁻¹ p_c = K p_c.

The 3 × 3 matrix K is called the calibration matrix and describes the camera intrinsics.

Why, then, do most textbooks on 3D computer vision and multi-view geometry (Faugeras 1993; Hartley and Zisserman 2004; Faugeras and Luong 2001) treat K as an upper-triangular matrix with five degrees of freedom, rather than with the full dimensionality of a 3 × 3 homogeneous matrix? While this is usually not made explicit in these books, it is because we cannot recover the full K matrix based on external measurement alone. When calibrating a camera based on external 3D points or other measurements (Tsai 1987), we end up estimating the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously (the camera extrinsics describe the camera’s orientation and position in space, i.e., the camera pose) using a series of measurements,

    x̃_s = K [ R | t ] p_w = P p_w,

where p_w are known 3D world coordinates and

    P = K [ R | t ]
       (3×4)

is known as the camera matrix. Inspecting this equation, we see that we can post-multiply K by a rotation and pre-multiply [ R | t ] by its transpose and still end up with a valid calibration, so K cannot be fully determined from such measurements.

There are several ways to write the upper-triangular form of K. One possibility is

         ⎡ f_x   s   c_x ⎤
    K =  ⎢  0   f_y  c_y ⎥ ,                          (2.57)
         ⎣  0    0    1  ⎦

which uses independent focal lengths f_x and f_y for the sensor x and y dimensions. The entry s encodes any possible skew between the sensor axes due to the sensor not being mounted perpendicular to the optical axis, and (c_x, c_y) denotes the optical center expressed in pixel coordinates. Another possibility is

         ⎡ f   s   c_x ⎤
    K =  ⎢ 0  a f  c_y ⎥ ,                            (2.58)
         ⎣ 0   0    1  ⎦

where the aspect ratio a has been made explicit and a common focal length f is used. The upper-triangular form can be obtained from a general matrix by factorization (Golub and Van Loan 1996). (Note the unfortunate clash of terminologies: in matrix algebra textbooks, R represents an upper-triangular (right of the diagonal) matrix; in computer vision, R is an orthogonal rotation.)

In practice, for many applications an even simpler form can be obtained by setting a = 1 and s = 0,

         ⎡ f  0  c_x ⎤
    K =  ⎢ 0  f  c_y ⎥ .                              (2.59)
         ⎣ 0  0   1  ⎦
Camera intrinsics (cont’d)

Figure 2.9  Simplified camera intrinsics showing the focal length f and the optical center (c_x, c_y). The image width and height are W and H.

Usually, setting (c_x, c_y) = (W/2, H/2) results in only one unknown: the focal length f.
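A minimal sketch of the simplified calibration matrix of (2.59) with (c_x, c_y) = (W/2, H/2), applied to a hypothetical camera-centered point (the image size, focal length, and point below are made-up illustration values):

```python
import numpy as np

# Sketch: x_tilde_s = K p_c with the simplified K (a = 1, s = 0),
# then perspective division to get pixel coordinates.

W, H = 640, 480
f = 500.0                               # hypothetical focal length in pixels
K = np.array([[f, 0.0, W / 2],
              [0.0, f, H / 2],
              [0.0, 0.0, 1.0]])

p_c = np.array([0.2, -0.1, 2.0])        # 3D point in camera coordinates
x_tilde = K @ p_c
x_pixel = x_tilde[:2] / x_tilde[2]      # normalize by the last element

# The point lands 50 px right of and 25 px above the image center.
assert np.allclose(x_pixel, [W / 2 + f * 0.2 / 2.0, H / 2 - f * 0.1 / 2.0])
```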
The sense of the y axis has also been flipped to get a coordinate system compatible with the way that most imaging libraries treat the vertical (row) coordinate. Certain graphics libraries, such as Direct3D, use a left-handed coordinate system, which can lead to some confusion.

A note on focal lengths

The issue of how to express focal lengths is one that often causes confusion when implementing computer vision algorithms and discussing their results. This is because the focal length depends on the units used to measure pixels.

If we number pixel coordinates using integer values, say [0, W) × [0, H), the focal length f and camera center (c_x, c_y) in (2.59) can be expressed as pixel values. How do these quantities relate to the more familiar focal lengths used by photographers?

Figure 2.10 illustrates the relationship between the focal length f, the sensor width W, and the field of view θ, which obey the formula

    tan(θ/2) = W / (2f)    or    f = (W/2) [tan(θ/2)]⁻¹.

Figure 2.10  Central projection, showing the relationship between the 3D and 2D coordinates, p and x, as well as the relationship between the focal length f, image width W, and the field of view θ.

For conventional film cameras, W = 35mm, and hence f is also expressed in millimeters.
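The formula above can be sketched with Python's standard library; the image width and field of view are made-up illustration values:

```python
import math

# Sketch of tan(theta/2) = W / (2 f): converting a field of view to a
# focal length in pixels, and then to a 35mm-equivalent value.

W = 640                                   # image width in pixels
theta = math.radians(60.0)                # hypothetical 60-degree field of view

f_pixels = (W / 2) / math.tan(theta / 2)  # f = (W/2) [tan(theta/2)]^-1
f_35mm = f_pixels * 35.0 / W              # 35mm-equivalent focal length

# Round trip: recover the field of view from the focal length.
theta_back = 2 * math.atan(W / (2 * f_pixels))
assert math.isclose(theta_back, theta)
```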
To go from a unitless focal length f expressed in pixels to the equivalent 35mm focal length, multiply by 35/W.
Camera matrix

Now that we have shown how to parameterize the calibration matrix K, we can put the camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix

    P = K [ R | t ].                                   (2.63)

It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not dropping the last row in the P matrix,

         ⎡ K   0 ⎤ ⎡ R   t ⎤
    P̃ =  ⎣ 0ᵀ  1 ⎦ ⎣ 0ᵀ  1 ⎦  = K̃ E,                  (2.64)

where E is a 3D rigid-body (Euclidean) transformation and K̃ is the full-rank calibration matrix. The 4 × 4 camera matrix P̃ can be used to map directly from 3D world coordinates p̄_w = (x_w, y_w, z_w, 1) to screen coordinates (plus disparity), x_s = (x_s, y_s, 1, d),

    x_s ∼ P̃ p̄_w,                                      (2.65)

where ∼ indicates equality up to scale. Note that after multiplication by P̃, the vector is divided by the third element of the vector to obtain the normalized form x_s = (x_s, y_s, 1, d).
Mapping from one camera to another

What happens when we take two images of a 3D scene from different camera positions or orientations (Figure 2.12a)? Using the full rank 4 × 4 camera matrix P̃ = K̃E from (2.64), we can write the projection from world to screen coordinates as

    x̃_0 ∼ K̃_0 E_0 p = P̃_0 p.                          (2.68)

Figure 2.12  A point is projected into two images: (a) relationship between the 3D point coordinate (X, Y, Z, 1) and the 2D projected point (x, y, 1, d); (b) planar homography induced by points all lying on a common plane n̂_0 · p + c_0 = 0.

Assuming that we know the z-buffer or disparity value d_0 for a pixel in one image, we can compute the 3D point location p using

    p ∼ E_0⁻¹ K̃_0⁻¹ x̃_0                                (2.69)

and then project it into another image, yielding

    x̃_1 ∼ K̃_1 E_1 p = K̃_1 E_1 E_0⁻¹ K̃_0⁻¹ x̃_0 = P̃_1 P̃_0⁻¹ x̃_0 = M_10 x̃_0.    (2.70)

Unfortunately, we do not usually have access to the depth coordinates of pixels in a regular photographic image. However, for a planar scene, as discussed above, we can replace the last row of P_0 in (2.64) with a general plane equation, n̂_0 · p + c_0, that maps points on the plane to d_0 = 0 values (Figure 2.12b). Thus, if we set d_0 = 0, we can ignore the last column of M_10 in (2.70) and also its last row, since we do not care about the final z-buffer depth. The mapping equation (2.70) thus reduces to

    x̃_1 ∼ H̃_10 x̃_0,

where H̃_10 is a general 3 × 3 homogeneous matrix.
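The camera-to-camera mapping of (2.70) can be sketched end-to-end. The two cameras below (focal lengths, principal points, and the sideways shift of camera 1) are made-up illustration values:

```python
import numpy as np

# Sketch of Eqs. (2.64) and (2.70): build full-rank 4x4 camera matrices
# P0 = K0 E0 and P1 = K1 E1 for two hypothetical cameras, form the
# camera-to-camera map M10 = P1 P0^{-1}, and check it against projecting
# the 3D point directly into camera 1.

def K_tilde(f, cx, cy):
    """Full-rank 4x4 calibration matrix (simplified intrinsics)."""
    return np.array([[f, 0, cx, 0],
                     [0, f, cy, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)

def E(R, t):
    """4x4 rigid-body (Euclidean) transformation."""
    out = np.eye(4)
    out[:3, :3] = R
    out[:3, 3] = t
    return out

K0 = K_tilde(500.0, 320.0, 240.0)
K1 = K_tilde(520.0, 320.0, 240.0)
E0 = E(np.eye(3), np.array([0.0, 0.0, 0.0]))
E1 = E(np.eye(3), np.array([-0.1, 0.0, 0.0]))    # camera 1 shifted sideways

P0 = K0 @ E0
P1 = K1 @ E1
M10 = P1 @ np.linalg.inv(P0)                     # Eq. (2.70)

p_w = np.array([0.5, 0.2, 2.0, 1.0])             # homogeneous world point

x0 = P0 @ p_w                                    # screen coords (up to scale)
x1_direct = P1 @ p_w
x1_mapped = M10 @ x0

# Equal up to scale: compare after normalizing by the third element.
assert np.allclose(x1_direct / x1_direct[2], x1_mapped / x1_mapped[2])
```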