Image Processing
Image formation

Cosimo Distante
Cosimo.distante@cnr.it
Cosimo.distante@unisalento.it

Geometric primitives and transformations
2.1.1 Geometric primitives

Geometric primitives form the basic building blocks used to describe three-dimensional shapes. In this section, we introduce points, lines, and planes. Later sections of the book discuss curves (Sections 5.1 and 11.2), surfaces (Section 12.3), and volumes (Section 12.5).
2D points. 2D points (pixel coordinates in an image) can be denoted using a pair of values, x = (x, y) ∈ R², or alternatively, as the column vector

x = [ x ]
    [ y ].   (2.1)

(As stated in the introduction, we use the (x1, x2, ...) notation to denote column vectors.)
2D points can also be represented using homogeneous coordinates, x̃ = (x̃, ỹ, w̃) ∈ P², where vectors that differ only by scale are considered to be equivalent. P² = R³ − (0, 0, 0) is called the 2D projective space.
A homogeneous vector x̃ can be converted back into an inhomogeneous vector x by dividing through by the last element w̃, i.e.,

x̃ = (x̃, ỹ, w̃) = w̃(x, y, 1) = w̃ x̄,   (2.2)
where x̄ = (x, y, 1) is the augmented vector. Homogeneous points whose last element is w̃ = 0 are called ideal points or points at infinity and do not have an equivalent inhomogeneous representation.
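The conversion in (2.2) is a one-liner in code. A minimal numpy sketch (the helper names `to_homogeneous` and `from_homogeneous` are ours, chosen for illustration):

```python
import numpy as np

def to_homogeneous(x):
    """Append w = 1, producing the augmented vector x_bar = (x, y, 1)."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(x_tilde):
    """Divide through by the last element w; ideal points (w = 0) have no
    inhomogeneous representation, so we refuse them."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    if x_tilde[-1] == 0:
        raise ValueError("point at infinity has no inhomogeneous representation")
    return x_tilde[:-1] / x_tilde[-1]

# (2, 4, 2) and (1, 2, 1) differ only by scale, so they are the same 2D point.
same_point = from_homogeneous([2.0, 4.0, 2.0])
```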
2D lines. 2D lines can also be represented using homogeneous coordinates, l̃ = (a, b, c). The corresponding line equation is

x̄ · l̃ = ax + by + c = 0.   (2.3)

We can normalize the line equation vector so that l = (n̂x, n̂y, d) = (n̂, d) with ‖n̂‖ = 1. In this case, n̂ is the normal vector perpendicular to the line and d is its distance to the origin.

Points at infinity. On a plane, we know that two non-parallel lines intersect at a point, but two parallel lines cannot. The point at infinity where parallel lines "meet" cannot be represented with ordinary coordinates; it is sometimes denoted by the symbol ∞, but this is merely a notation, not a number.
Writing the conversion explicitly for a point x̃ = (x̃, ỹ, w̃), we have

x = x̃ / w̃,   y = ỹ / w̃.

Intuitively, as w̃ → 0, x̃/w̃ → ∞. The point represented by the homogeneous coordinates (x̃, ỹ, 0) is therefore a point at infinity.
Intersection of two lines. The intersection of two lines can be computed as

x̃ = l̃1 × l̃2,

where × is the cross product operator. Similarly, the line joining two points can be written as

l̃ = x̃1 × x̃2.

When trying to fit an intersection point to multiple lines or, conversely, a line to multiple points, least squares techniques (Section 6.1.1 and Appendix A.2) can be used, as discussed in Exercise 2.1.
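These cross-product formulas are easy to check numerically. A small numpy sketch (the function names are illustrative, not from the text):

```python
import numpy as np

def line_through(x1, x2):
    """Homogeneous line joining two homogeneous 2D points: l = x1 x x2."""
    return np.cross(x1, x2)

def line_intersection(l1, l2):
    """Homogeneous intersection of two homogeneous 2D lines: x = l1 x l2."""
    return np.cross(l1, l2)

# The x axis (y = 0) is (0, 1, 0); the vertical line x = 1 is (1, 0, -1).
p = line_intersection(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, -1.0]))
p = p / p[-1]                        # normalize: the lines meet at the point (1, 0)

# Two parallel vertical lines x = 0 and x = 1 meet at an ideal point (w = 0).
q = line_intersection(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, -1.0]))
```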
2D conics. There are other algebraic curves that can be expressed with simple polynomial homogeneous equations. For example, the conic sections (so called because they arise as the intersection of a plane and a 3D cone) can be written using a quadric equation

x̃ᵀ Q x̃ = 0.

Quadric equations play useful roles in the study of multi-view geometry and camera calibration (Hartley and Zisserman 2004; Faugeras and Luong 2001) but are not used extensively in this book.
3D points. Point coordinates in three dimensions can be written using inhomogeneous coordinates x = (x, y, z) ∈ R³ or homogeneous coordinates x̃ = (x̃, ỹ, z̃, w̃) ∈ P³. As before, it is sometimes useful to denote a 3D point using the augmented vector x̄ = (x, y, z, 1), with x̃ = w̃ x̄.
Figure 2.3 3D line equation, r = (1 − λ)p + λq.
2D transformations

Figure 2.4 Basic set of 2D planar transformations: translation, Euclidean, similarity, affine, and projective.
Translation. 2D translations can be written as x′ = x + t or

x′ = [ I | t ] x̄,

where I is the (2 × 2) identity matrix. Using a 2 × 3 matrix results in a more compact notation, whereas using a full-rank 3 × 3 matrix,

x̄′ = [ I   t ] x̄,
      [ 0ᵀ  1 ]

where 0 is the zero vector (this matrix can be obtained from the 2 × 3 matrix by appending a [0ᵀ 1] row), makes it possible to chain transformations using matrix multiplication. Note that in any equation where an augmented vector such as x̄ appears on both sides, it can always be replaced with a full homogeneous vector x̃.
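Chaining with the full-rank 3 × 3 form can be sketched in a few lines of numpy (the helper name `translation` is ours):

```python
import numpy as np

def translation(tx, ty):
    """Full-rank 3x3 form of a 2D translation, [I t; 0^T 1]."""
    T = np.eye(3)
    T[0, 2], T[1, 2] = tx, ty
    return T

x_bar = np.array([2.0, 3.0, 1.0])                  # augmented vector (x, y, 1)
chained = translation(5.0, -1.0) @ translation(1.0, 1.0)
x_out = chained @ x_bar                            # translate by (1, 1), then (5, -1)
```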
Rotation + translation. This transformation is also known as 2D rigid body motion or the 2D Euclidean transformation (since Euclidean distances are preserved). It can be written as x′ = Rx + t or

x′ = [ R | t ] x̄,

where

R = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

is an orthonormal rotation matrix with RRᵀ = I and |R| = 1.
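A quick numerical check of these properties (the numbers are illustrative):

```python
import numpy as np

theta = np.pi / 2                                   # 90 degree rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([1.0, 0.0])

# R is orthonormal with unit determinant: R R^T = I and |R| = 1.
ortho_ok = np.allclose(R @ R.T, np.eye(2)) and np.isclose(np.linalg.det(R), 1.0)

# Apply x' = R x + t: (1, 0) rotates to (0, 1), then translates to (1, 1).
x_prime = R @ np.array([1.0, 0.0]) + t
```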
Affine. The 2D affine transformation can be written as

x′ = A x̄ = [ a00  a01  a02 ] x̄.
            [ a10  a11  a12 ]

The affine transform preserves parallelism between lines: parallel lines remain parallel under affine transformations.

Projective. This transformation, also known as a perspective transform or homography, operates on homogeneous coordinates,

x̃′ = H̃ x̃,

where H̃ is an arbitrary 3 × 3 matrix. Note that H̃ is homogeneous, i.e., it is defined only up to scale: two H̃ matrices that differ only by scale are equivalent. The resulting homogeneous coordinate x̃′ must be normalized to obtain an inhomogeneous result,

x′ = (h00 x + h01 y + h02) / (h20 x + h21 y + h22),
y′ = (h10 x + h11 y + h12) / (h20 x + h21 y + h22).

Perspective transformations preserve straight lines.

Hierarchy of 2D transformations. The preceding set of transformations is illustrated in Figure 2.4 and summarized in Table 2.1. The easiest way to think of them is as a set of (potentially restricted) 3 × 3 matrices operating on 2D homogeneous coordinate vectors. Hartley and Zisserman (2004) contains a more detailed description of the hierarchy of 2D transformations.
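The normalization step after a homography is worth seeing in code. A minimal sketch (the matrix entries are made up for illustration):

```python
import numpy as np

def apply_homography(H, xy):
    """Map an inhomogeneous 2D point through a 3x3 homography, then renormalize."""
    x_tilde = H @ np.array([xy[0], xy[1], 1.0])
    return x_tilde[:2] / x_tilde[2]

H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])

a = apply_homography(H, (2.0, 4.0))
b = apply_homography(3.0 * H, (2.0, 4.0))   # same result: H is defined only up to scale
```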
Table 2.1 Hierarchy of 2D coordinate transformations, listing each transformation's matrix, number of degrees of freedom (DoF), and the properties it preserves.
3D to 2D projections

Orthography. An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x. (In this section, we use p to denote 3D points and x to denote 2D points.) This can be written as

x = [ I2×2 | 0 ] p.

If we are using homogeneous (projective) coordinates, with p̃ = (X, Y, Z, 1), we can write

x̃ = [ 1 0 0 0 ]
     [ 0 1 0 0 ] p̃,
     [ 0 0 0 1 ]

i.e., we drop the z component but keep the w component.
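In code, orthography really is just dropping z (an illustrative sketch):

```python
import numpy as np

P_ortho = np.hstack([np.eye(2), np.zeros((2, 1))])   # [ I_{2x2} | 0 ]
p = np.array([0.3, -0.2, 7.0])                       # camera-centered 3D point
x = P_ortho @ p                                      # the depth 7.0 is discarded
```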
Orthography is a reasonable approximation for objects whose depth is shallow relative to their distance to the camera (Sawhney and Hanson 1991). It is exact only for telecentric lenses (Baker and Nayar 1999, 2001).

Scaled orthography. In practice, world coordinates (which may measure dimensions in meters) need to fit onto an image sensor (physically measured in millimeters, but ultimately measured in pixels). For this reason, scaled orthography is actually more commonly used,

x = [ s I2×2 | 0 ] p.

This model is equivalent to first projecting the world points onto a local fronto-parallel image plane and then scaling this image using regular perspective projection. The scaling can be the same for all parts of the scene (Figure 2.7b) or it can be different for objects that are modeled independently (Figure 2.7c). More importantly, the scaling can vary from frame to frame when estimating structure from motion, which can better model the scale change that occurs as an object approaches the camera.
Perspective. Points are projected onto the image plane by dividing them by their z component. In homogeneous coordinates, the projection has a simple linear form,

x̃ = [ 1 0 0 0 ]
     [ 0 1 0 0 ] p̃,   (2.51)
     [ 0 0 1 0 ]

i.e., we drop the w component of p. Thus, after projection, it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor. A form often seen in computer graphics systems is a two-step projection that first projects 3D coordinates into normalized device coordinates and then rescales these to integer pixel coordinates.

Figure 2.7 Commonly used projection models: (a) 3D view, (b) orthography, (c) scaled orthography, (d) para-perspective, (e) perspective. Each diagram also shows a top-down view of the projection; note how the parallel box sides remain parallel in the non-perspective projections.
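Contrast this with orthography in code: the projection of (2.51) keeps z, so the final normalization divides by depth (illustrative sketch):

```python
import numpy as np

P_persp = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])          # the matrix of (2.51)

p_tilde = np.array([2.0, 4.0, 4.0, 1.0])            # 3D point at depth z = 4
x_tilde = P_persp @ p_tilde                         # drops w, keeps z
x = x_tilde[:2] / x_tilde[2]                        # divide by depth; depth itself is lost
```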
Camera intrinsics

Figure 2.8 Projection of a 3D camera-centered point pc onto the sensor plane. Oc is the camera center (nodal point), cs is the 3D origin of the sensor plane coordinate system, and sx and sy are the pixel spacings.

To understand the geometry involved, we first present a mapping from 2D pixel coordinates to 3D rays using a sensor homography Ms, since this is easier to explain in terms of physically measurable quantities. We then relate these quantities to the more commonly used camera intrinsic matrix K, which is used to map 3D camera-centered points pc to 2D pixel coordinates x̃s.

Image sensors return pixel values indexed by integer pixel coordinates (xs, ys), often with the coordinates starting at the upper-left corner of the image and moving down and to the right. (This convention is not obeyed by all imaging libraries, but the adjustment for other coordinate systems is straightforward.) To map pixel centers to 3D coordinates, we first scale the (xs, ys) values by the pixel spacings (sx, sy) (sometimes expressed in microns for solid-state sensors) and then describe the orientation of the sensor array relative to the camera projection center Oc with an origin cs and a 3D rotation Rs (Figure 2.8).

The combined 2D to 3D projection can then be written as

p = [ Rs | cs ] [ sx  0  0 ] [ xs ]
                [ 0  sy  0 ] [ ys ] = Ms x̄s.
                [ 0   0  0 ] [ 1  ]
                [ 0   0  1 ]

The matrix Ms contains 8 unknowns: 3 parameters describing the rotation Rs, 3 parameters describing the translation cs, and 2 scale factors (sx, sy). In practice, however, it has only seven degrees of freedom, since the distance of the sensor from the origin cannot be teased apart from the sensor spacing based on external image measurements alone.
Once we have projected a 3D point through an ideal pinhole, the matrix K that maps the resulting camera-centered point pc to pixel coordinates x̃s is called the calibration matrix and describes the camera intrinsics. In principle, K has 8 degrees of freedom (the full dimensionality of a 3 × 3 homogeneous matrix). Why, then, do most textbooks on 3D computer vision and multi-view geometry (Hartley and Zisserman 2004; Faugeras and Luong 2001) treat K as an upper-triangular matrix with five degrees of freedom?

While this is usually not made explicit in these books, it is because we cannot recover the full K matrix based on external measurement alone. When calibrating a camera based on external 3D points or other measurements (Tsai 1987), we end up estimating the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously using a series of measurements,

x̃s = K [ R | t ] pw = P pw,

where pw are known 3D world coordinates, the extrinsics (R, t) describe the camera's orientation and pose in space, and

P = K [ R | t ]

is the 3 × 4 matrix known as the camera matrix.
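Composing P = K [R | t] and projecting a world point can be sketched as follows (all parameter values are invented for illustration):

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                                  # camera aligned with the world axes
t = np.array([[0.0], [0.0], [4.0]])            # world origin 4 units in front of camera
P = K @ np.hstack([R, t])                      # 3 x 4 camera matrix

p_w = np.array([1.0, 1.0, 0.0, 1.0])           # world point, homogeneous
x_tilde = P @ p_w
x_pix = x_tilde[:2] / x_tilde[2]               # normalize to pixel coordinates
```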
There are several ways to write the upper-triangular form of K. One possibility is

K = [ fx  s  cx ]
    [ 0  fy  cy ],   (2.57)
    [ 0   0   1 ]

which uses independent focal lengths fx and fy for the sensor x and y dimensions. The entry s encodes any possible skew between the sensor axes due to the sensor not being mounted perpendicular to the optical axis, and (cx, cy) denotes the optical center expressed in pixel coordinates. Another possibility is

K = [ f  s  cx ]
    [ 0  af  cy ],   (2.58)
    [ 0  0   1 ]

where the aspect ratio a has been made explicit and a common focal length f is used.

In practice, for many applications an even simpler form can be obtained by setting a = 1 and s = 0,

K = [ f  0  cx ]
    [ 0  f  cy ].
    [ 0  0   1 ]

Figure 2.9 Simplified camera intrinsics showing the focal length f and the optical center (cx, cy). The image width and height are W and H.

(Note the unfortunate clash of terminologies: in matrix algebra textbooks, R represents an upper-triangular, i.e., right of the diagonal, matrix appearing in a factorization (Golub and Van Loan 1996); in computer vision, R is an orthogonal rotation.)

Usually, setting (cx, cy) = (W/2, H/2) results in only one unknown: the focal length f.
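A sketch of this simplified, one-unknown form in use (the helper name `simple_K` and all numbers are ours, for illustration):

```python
import numpy as np

def simple_K(f, W, H):
    """Simplified calibration matrix: a = 1, s = 0, optical center at the image center."""
    return np.array([[f, 0.0, W / 2.0],
                     [0.0, f, H / 2.0],
                     [0.0, 0.0, 1.0]])

K = simple_K(f=500.0, W=640, H=480)
p_c = np.array([0.2, -0.1, 2.0])        # camera-centered 3D point
x_tilde = K @ p_c
x_s = x_tilde[:2] / x_tilde[2]          # pixel coordinates
```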
The sense of the y axis has also been flipped to get a coordinate system compatible with the way that most imaging libraries treat the vertical (row) coordinate. Certain graphics libraries, such as Direct3D, use a left-handed coordinate system, which can lead to some confusion.

Notes on focal length

Figure 2.10 Central projection, showing the relationship between the 3D and 2D coordinates, p and x, as well as the relationship between the focal length f, image width W, and the field of view θ.

The focal length f, the image width W, and the field of view θ are related by

tan(θ/2) = W / (2f)   or   f = (W/2) · [tan(θ/2)]⁻¹.
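The second form gives the focal length for a desired field of view directly (a small sketch using the standard library only):

```python
import math

def focal_from_fov(theta_deg, W):
    """f = (W / 2) / tan(theta / 2), theta being the full field of view across width W."""
    return (W / 2.0) / math.tan(math.radians(theta_deg) / 2.0)

f_pixels = focal_from_fov(90.0, 640)    # 90 deg FoV on a 640-pixel-wide image
f_35mm = focal_from_fov(90.0, 35.0)     # same FoV expressed on 35mm film
```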
For conventional film cameras, W = 35mm, and hence f is also expressed in millimeters. To convert a focal length to its equivalent 35mm focal length, multiply by 35/W; the conversion between the various representations (a unitless f, f expressed in pixels, or the equivalent 35mm focal length) is straightforward.

Camera matrix

Now that we have shown how to parameterize the calibration matrix K, we can put the camera intrinsics and extrinsics together to obtain a single 3 × 4 camera matrix

P = K [ R | t ].   (2.63)

It is sometimes preferable to use an invertible 4 × 4 matrix, which can be obtained by not dropping the last row in the P matrix,

P̃ = [ K   0 ] [ R   t ] = K̃ E,   (2.64)
     [ 0ᵀ  1 ] [ 0ᵀ  1 ]

where E is a 3D rigid-body (Euclidean) transformation and K̃ is the full-rank calibration matrix. The 4 × 4 camera matrix P̃ can be used to map directly from 3D world coordinates p̄w = (xw, yw, zw, 1) to screen coordinates (plus disparity), xs = (xs, ys, 1, d),

xs ∼ P̃ p̄w,   (2.65)

where ∼ indicates equality up to scale. Note that after multiplication by P̃, the vector is divided by its third element to obtain the normalized form xs = (xs, ys, 1, d).
Mapping from one camera to another

Figure 2.12 A point is projected into two images: (a) relationship between the 3D point coordinate (X, Y, Z, 1) and the 2D projected point (x, y, 1, d); (b) planar homography induced by points all lying on a common plane n̂0 · p + c0 = 0.

What happens when we take two images of a 3D scene from different camera positions or orientations (Figure 2.12a)? Using the full rank 4 × 4 camera matrix P̃ = K̃E from (2.64), we can write the projection from world to screen coordinates as

x̃0 ∼ K̃0 E0 p = P̃0 p.   (2.68)

Assuming that we know the z-buffer or disparity value d0 for a pixel in one image, we can compute the 3D point location p using

p ∼ E0⁻¹ K̃0⁻¹ x̃0   (2.69)

and then project it into another image, yielding

x̃1 ∼ K̃1 E1 p = K̃1 E1 E0⁻¹ K̃0⁻¹ x̃0 = P̃1 P̃0⁻¹ x̃0 = M10 x̃0.   (2.70)

Unfortunately, we do not usually have access to the depth coordinates of pixels in a regular photographic image. However, for a planar scene, we can replace the last row of P0 in (2.64) with a general plane equation, n̂0 · p + c0, that maps points on the plane to d0 = 0 values (Figure 2.12b). Thus, if we set d0 = 0, we can ignore the last column of M10 in (2.70) and also its last row, since we do not care about the final z-buffer depth. The mapping equation (2.70) thus reduces to

x̃1 ∼ H̃10 x̃0,

where H̃10 is a general 3 × 3 homography matrix and x̃1 and x̃0 are now 2D homogeneous coordinates.
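The transfer in (2.68)–(2.70) can be sketched end to end with numpy (all camera parameters here are invented for illustration, and `full_P` is our own helper):

```python
import numpy as np

def full_P(K3, E):
    """Full-rank 4x4 camera matrix, P = K_full E, as in (2.64)."""
    K_full = np.eye(4)
    K_full[:3, :3] = K3
    return K_full @ E

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
E0 = np.eye(4)                           # camera 0 at the world origin
E1 = np.eye(4)
E1[0, 3] = -1.0                          # camera 1 translated one unit along x

P0, P1 = full_P(K, E0), full_P(K, E1)
M10 = P1 @ np.linalg.inv(P0)             # maps (x0, y0, 1, d0) directly to image 1

x0 = np.array([320.0, 240.0, 1.0, 0.5])  # a pixel with known inverse depth d0
x1 = M10 @ x0
x1 = x1 / x1[2]                          # renormalize so the third element is 1
```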