Part 09

Visual Information Interpretation: 3D from stereopsis
Ji Hui
National University of Singapore
October 11, 2021
Multi-camera systems for 3D perception
Using multiple (stereo) cameras to capture 3D information from two or more images
Multi-camera surveillance
Stereo camera for driverless car
Depth from Stereopsis
Shape from “disparity” between two views

Recover the 3D shape of a scene from two (or more) images taken from different viewpoints
Main idea (figure labels: scene point, image plane, optical center)
3D perception for stereo imaging
Depth is represented by color.
World co-ordinate vs image co-ordinate
Geometric camera calibration
Camera calibration estimates the co-ordinate mapping among the image frame, the camera frame and the world frame
intrinsics: pixel co-ordinates ↔ pinhole camera co-ordinates
extrinsics: camera co-ordinates ↔ world co-ordinates
Usage of calibration mapping

measure the size of an object in world units
determine the location of the camera in the world
warp the images to preferred co-ordinates
Many applications in 3D scene reconstruction, object detection, visual
navigation, and many others.
Perspective transform
Pinhole camera model
Projection of a pinhole camera when the world co-ordinate frame is aligned with the camera co-ordinate frame:
the origin of the world co-ordinates is the optical center of the camera
no rotation
$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}.$$
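Notation in homogeneous co-ordinates:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u/w \\ v/w \end{pmatrix}, \qquad \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$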
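A minimal NumPy sketch of the homogeneous projection above, assuming the points are already expressed in the camera frame (world and camera frames aligned). The function name `project_pinhole` is illustrative, not from the slides:

```python
import numpy as np

def project_pinhole(P, f):
    """Project N x 3 camera-frame points with focal length f; returns N x 2 (x, y)."""
    P = np.atleast_2d(np.asarray(P, dtype=float))
    # 3x4 perspective projection matrix from the slide
    M = np.array([[f, 0.0, 0.0, 0.0],
                  [0.0, f, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    P_h = np.hstack([P, np.ones((P.shape[0], 1))])   # rows (X, Y, Z, 1)
    uvw = P_h @ M.T                                  # rows (u, v, w), with w = Z
    return uvw[:, :2] / uvw[:, 2:3]                  # (x, y) = (u/w, v/w)

xy = project_pinhole([[2.0, 1.0, 4.0]], f=2.0)
# x = fX/Z = 2*2/4 = 1.0 and y = fY/Z = 2*1/4 = 0.5
```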
Extrinsic mapping
Extrinsic mapping describes the relation between camera co-ordinates (camera frame) and world co-ordinates (world frame)
T describes the position of the origin of the camera frame with respect to the world frame
R describes the rotation aligning the camera frame with the world frame
For example, rotation around the z-axis:
$$R_z = \begin{bmatrix} \cos\omega_z & -\sin\omega_z & 0 \\ \sin\omega_z & \cos\omega_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
3D rotation matrix: $R = R_x R_y R_z$
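A sketch of the elementary rotations composed as R = Rx Ry Rz. It assumes the right-handed, counterclockwise sign convention used for R_z above (texts differ on this sign); the helper names `rot_x`, `rot_y`, `rot_z` are hypothetical:

```python
import numpy as np

def rot_x(a):
    """Rotation by angle a (radians) around the x-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """Rotation by angle a (radians) around the y-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    """Rotation by angle a (radians) around the z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# R = Rx Ry Rz, composed in the order written on the slide (angles illustrative)
R = rot_x(0.1) @ rot_y(-0.2) @ rot_z(0.3)
```

Any product of rotations is again orthogonal with determinant +1, which is what the test below checks.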
Mapping between camera co-ordinates and world co-ordinates
Consider a point P
$P_c$: co-ordinates of the point in the camera frame
$P_w$: co-ordinates of the point in the world frame
The mapping between the two co-ordinates:
$$P_c = R(P_w - T).$$
In homogeneous form:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & -RT \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},$$
where $R = R_x R_y R_z$.
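The homogeneous form can be checked numerically against $P_c = R(P_w - T)$. This sketch assumes a rotation about the z-axis and arbitrary illustrative values for T and $P_w$; `extrinsic_matrix` is a hypothetical helper name:

```python
import numpy as np

def extrinsic_matrix(R, T):
    """4x4 homogeneous matrix [[R, -R T], [0 0 0 1]] mapping world to camera co-ordinates."""
    Ext = np.eye(4)
    Ext[:3, :3] = R
    Ext[:3, 3] = -R @ T
    return Ext

theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 2.0, 3.0])        # camera origin expressed in the world frame
Pw = np.array([4.0, -1.0, 5.0])      # a world point

Pc_direct = R @ (Pw - T)                           # Pc = R (Pw - T)
Pc_h = extrinsic_matrix(R, T) @ np.append(Pw, 1.0)  # homogeneous version
```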
Intrinsic mapping
Intrinsic mapping describes the relation between the camera frame and the image frame
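The mapping between the two, in homogeneous form:
$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f_x & s & x_0 & 0 \\ 0 & f_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}, \qquad x_{im} = \frac{u}{w}, \quad y_{im} = \frac{v}{w}$$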
Calibration matrix M
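Combining both extrinsic and intrinsic mapping:
$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = M_{int} M_{ext} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},$$
with
$$M_{int} = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}; \qquad M_{ext} = [R, -RT] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & -R_1^\top T \\ r_{21} & r_{22} & r_{23} & -R_2^\top T \\ r_{31} & r_{32} & r_{33} & -R_3^\top T \end{bmatrix},$$
where $r_{ij}$ are the entries of R and $R_i^\top$ denotes the i-th row of R.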

$M = M_{int} M_{ext}$ is a 3×4 matrix defined up to scale; it has 11 degrees of freedom (5 intrinsic, 3 rotation, 3 translation parameters).
The 3×3 submatrix $M' = M_{int} R$ is non-singular ($M_{int}$ is upper triangular and R is orthogonal, so M' can be factored back into the two by what is essentially a QR decomposition).
$$M = M_{int} M_{ext} = M_{int} R\,[I_3, -T] = M'\,[I_3, -T]$$
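A small sketch composing M from illustrative intrinsic and extrinsic parameters (all numeric values are assumptions) and checking the identity $M = M'[I_3, -T]$:

```python
import numpy as np

Mint = np.array([[800.0, 0.5, 320.0],    # fx, s, x0
                 [0.0, 780.0, 240.0],    # fy, y0
                 [0.0, 0.0, 1.0]])
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([0.2, -0.1, -2.0])

Mext = np.hstack([R, (-R @ T)[:, None]])              # [R, -RT], 3x4
M = Mint @ Mext                                        # 3x4 calibration matrix
M_prime = Mint @ R                                     # left 3x3 block M'
M_alt = M_prime @ np.hstack([np.eye(3), -T[:, None]])  # M' [I3, -T]
```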
Calibration pattern
Show a checkerboard pattern to the camera in different poses to estimate the calibration matrix
Calibration pattern for radial distortion
Image of checkerboard pattern and its edge map
(Figure 4.2: Original image of a calibration checkerboard pattern, taken with a Canon 24mm EF lens. The straight lines in several orientations throughout this image are used to determine the pattern of radial lens distortion. The letter "P" in the center is used to record the orientation of the grid with respect to the camera.)
Camera calibration using known patterns
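Calibration with known correspondences
Point correspondence: matching the corners of the board, $\{\tilde p_k, \tilde P_k\}_{k=1}^K$:
$$\tilde p \equiv \begin{bmatrix} u \\ v \\ w \end{bmatrix} = M_{int} M_{ext} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M_{int} M_{ext} \tilde P$$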
Calibration: how to estimate M
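Use the following equation to estimate the parameters of the calibration matrix:
$$\tilde p = M \tilde P \;\Longrightarrow\; \tilde p \times M \tilde P = \tilde p \times \tilde p = 0,$$
which holds even though $\tilde p$ and $M \tilde P$ are only equal up to scale.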
How to estimate M from point correspondences:
$$\tilde p_k \times (M \tilde P_k) = 0, \quad k = 1, \dots, K$$
Let $\vec m \in \mathbb{R}^{12}$ denote the 12 entries of matrix M. The constraints above can be expressed as a linear system (each correspondence contributes two independent equations)
$$A \vec m = 0.$$
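It can be solved by least-squares estimation:
$$\min_{\vec m} \|A \vec m\|_2^2 \quad \text{subject to} \quad \|\vec m\|_2 = 1,$$
whose solution is the eigenvector of $A^\top A$ with the smallest eigenvalue (equivalently, the right singular vector of A with the smallest singular value).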
This solution is not optimal, as it does not enforce the structure of the matrix $M = M_{int} M_{ext}$.
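A sketch of this linear (DLT-style) estimate, assuming K ≥ 6 correspondences whose world points are not all coplanar (a single planar board view is degenerate for a full 3×4 M). With $\tilde p = (u, v, 1)$, the two independent rows of $\tilde p \times (M \tilde P) = 0$ are $v\,m_3 \tilde P - m_2 \tilde P = 0$ and $m_1 \tilde P - u\,m_3 \tilde P = 0$, where $m_i$ are the rows of M. The function `dlt_calibrate` and the synthetic camera are illustrative:

```python
import numpy as np

def dlt_calibrate(p, P):
    """Estimate the 3x4 matrix M (up to scale) from K >= 6 correspondences.

    p: K x 2 pixel co-ordinates (u/w, v/w); P: K x 3 world points, not all coplanar.
    """
    rows = []
    for (u, v), Pw in zip(p, P):
        Ph = np.append(Pw, 1.0)          # homogeneous world point
        z = np.zeros(4)
        # two independent rows of p x (M P) = 0; m stacks the rows of M
        rows.append(np.concatenate([z, -Ph, v * Ph]))
        rows.append(np.concatenate([Ph, z, -u * Ph]))
    A = np.array(rows)
    # min ||A m|| s.t. ||m|| = 1: right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

# Synthetic, noise-free test camera (values are assumptions)
rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(10, 3)) + np.array([0.0, 0.0, 5.0])
M_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, -20.0],
                   [0.0, 0.0, 1.0, 0.5]])
uvw = np.hstack([P, np.ones((10, 1))]) @ M_true.T
p = uvw[:, :2] / uvw[:, 2:3]

M_est = dlt_calibrate(p, P)
uvw_est = np.hstack([P, np.ones((10, 1))]) @ M_est.T
p_est = uvw_est[:, :2] / uvw_est[:, 2:3]     # reprojected pixels
```

With noisy data the result is only an initial guess; as the slide notes, it ignores the structure of M.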
Preliminaries: QR decomposition
QR decomposition is a decomposition of a matrix A:
$$A = QR,$$
where Q is an orthogonal matrix satisfying $Q^\top Q = QQ^\top = I$ and R is an upper triangular matrix.
QR is associated with the Gram-Schmidt process.
Consider the matrix $A = [a_1, a_2, \dots, a_K]$.
The G-S process converts $\{a_1, \dots, a_K\}$ to an orthonormal basis $\{e_1, \dots, e_K\}$ by
$$u_1 = a_1, \qquad e_1 = \frac{u_1}{\|u_1\|_2},$$
$$u_k = a_k - \sum_{j=1}^{k-1} \frac{\langle u_j, a_k \rangle}{\|u_j\|_2^2}\, u_j, \qquad e_k = \frac{u_k}{\|u_k\|_2}, \qquad k = 2, \dots, K.$$
The relationship:
$$Q = [e_1, e_2, \dots, e_K]; \qquad R_{i,j} = \begin{cases} \langle e_i, a_j \rangle, & i \le j, \\ 0, & \text{otherwise.} \end{cases}$$
numpy.linalg.qr: Compute the QR factorization of a matrix.
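The Gram-Schmidt construction above can be written directly and compared with `numpy.linalg.qr`. This is classical Gram-Schmidt (numerically less stable than the Householder-based LAPACK routine NumPy uses) and assumes A has full column rank; `gram_schmidt_qr` is a hypothetical name:

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR via classical Gram-Schmidt: Q has orthonormal columns e_k, R[i, j] = <e_i, a_j>."""
    A = np.asarray(A, dtype=float)
    n, K = A.shape
    Q = np.zeros((n, K))
    R = np.zeros((K, K))
    for k in range(K):
        u = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # <e_j, a_k>
            u -= R[j, k] * Q[:, j]        # remove the component along e_j
        R[k, k] = np.linalg.norm(u)       # ||u_k|| = <e_k, a_k>
        Q[:, k] = u / R[k, k]
    return Q, R

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])
Q, R = gram_schmidt_qr(A)
Q_np, R_np = np.linalg.qr(A)   # may differ from (Q, R) by the signs of columns/rows
```

NumPy does not force a positive diagonal in R, so the two factorizations agree only up to those signs.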
Calibration: Refinement
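Estimate the calibration matrix M from 8 point correspondences (at least 6 are needed, since each correspondence gives two independent equations and M has 11 degrees of freedom).
Decompose M into the internal and external mappings, $M = M_{int} M_{ext}$: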
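1 Estimating translation: let $\tilde T = (T, 1)$ denote T in homogeneous co-ordinates.
In non-homogeneous co-ordinates, recall that $P_c = R(P_w - T)$. Then, for the world point $P_w = T$ (the camera center), we have
$$P_c = R(T - T) = 0.$$
In homogeneous co-ordinates,
$$M_{ext} \tilde T = 0 \;\Longrightarrow\; M \tilde T = M_{int} R\,[I_3, -T]\,\tilde T = 0.$$
Thus, $\tilde T$ spans the null space of M: it can be estimated as the right singular vector of M with the smallest singular value (M is 3×4, so it has no eigenvalues), divided by its last entry.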
2 Estimating camera rotation and intrinsic parameters.
Recall that
$$M = M_{int} M_{ext} = M_{int} R\,[I_3, -T] = M'\,[I_3, -T] = [M', -M'T],$$
where $M' = M_{int} R$, $M_{int}$ is upper triangular and R is an orthogonal (rotation) matrix.
Run the QR decomposition (in its RQ form, since the triangular factor $M_{int}$ is on the left) on the left 3×3 block of M, M[0:3, 0:3] in NumPy indexing, to obtain R and $M_{int}$.
Update M using the estimated $M_{ext}$ and $M_{int}$, and use it as the initial guess for an iterative scheme solving
$$\min_M \sum_k \|\tilde p_k \times M \tilde P_k\|^2,$$
e.g. a few iterations of the steepest descent method.
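Steps 1 and 2 can be sketched in NumPy as follows. Assumptions: the RQ factorization is built from `numpy.linalg.qr` by reversing rows and columns, the sign of M (arbitrary, since M is defined up to scale) is chosen so that det M' > 0, and $M_{int}$ is normalized so its bottom-right entry is 1. The iterative refinement is not shown, and the function names are hypothetical:

```python
import numpy as np

def rq(A):
    """RQ decomposition A = K @ Q (K upper triangular, Q orthogonal) via numpy's QR."""
    Q, R = np.linalg.qr(np.flipud(A).T)   # (P A)^T = Q R, P = row-reversal
    K = np.flipud(R.T)[:, ::-1]           # P R^T P is upper triangular
    Q = Q.T[::-1, :]                      # P Q^T is orthogonal
    return K, Q

def decompose_projection(M):
    """Split a 3x4 M (defined up to scale) into Mint, rotation R and translation T."""
    M = np.asarray(M, dtype=float)
    if np.linalg.det(M[:, :3]) < 0:       # fix the arbitrary sign of the scale
        M = -M
    # Step 1: T~ spans the null space of M; de-homogenize by its last entry
    _, _, Vt = np.linalg.svd(M)
    T = Vt[-1, :3] / Vt[-1, 3]
    # Step 2: RQ-factor the left 3x3 block M' = Mint R
    Mint, R = rq(M[:, :3])
    D = np.diag(np.sign(np.diag(Mint)))   # make the diagonal of Mint positive
    Mint, R = Mint @ D, D @ R
    return Mint / Mint[2, 2], R, T

# Usage on a synthetic camera with a negative overall scale (values illustrative)
a, b = 0.3, -0.2
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true = Rx @ Rz
Mint_true = np.array([[800.0, 1.5, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
T_true = np.array([0.5, -1.0, 2.0])
M = -2.5 * Mint_true @ R_true @ np.hstack([np.eye(3), -T_true[:, None]])
Mint_est, R_est, T_est = decompose_projection(M)
```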
3D vision: Stereopsis
Two viewpoints provide disparity, which translates to depth
Multi-view images can provide a 3D model of an object
Stereo vision
For a point in the world, a single view cannot determine its location in space: all points on the projective line through P map to the same image point p.
Depth from two views: stereo
For a point in the world, two different views can determine its location in space.
The 3D co-ordinates of the point P can be found by triangulation.

Figure: I can get a point in 3D by triangulation!
Pixel correspondence
Triangulation needs dense pixel correspondence.
RANSAC-based matching only gives sparse point correspondence.
It appears that, given a pixel in the left image, finding its correspondence in the right image requires a 2D search over the whole image.
How do we match a point in the first image to a point in the second? How can we constrain our search?
Epipolar geometry: Pixel correspondence
Find pairs of points that correspond to the same scene point
Epipolar geometry: for each pixel, its correspondence lies on a line, the epipolar line.
Consider two parallel cameras with the same focal length. For each pixel in the left image, its correspondence lies on the same horizontal scan line in the right image.
Epipolar geometry
epipoles e, e':
intersection of the baseline with the image plane
projection of the other camera's projection center
vanishing point of the camera motion direction
epipolar plane = plane containing the baseline (1-D family)
epipolar line = intersection of an epipolar plane with the image
Example
Converging camera
Forward camera motion
Calibrated camera: essential matrix
Suppose we know the intrinsic mappings of the two cameras: $M_{int}$ and $M'_{int}$.
Convert to normalized co-ordinates p by pre-multiplying all points $\hat p$ with the inverse of the calibration matrix:
$$p = M_{int}^{-1} \hat p; \qquad p' = (M'_{int})^{-1} \hat p'.$$
Set the first camera's co-ordinate system as the world co-ordinates and define R and t that map from the second camera's frame to the first:
$$p \simeq P; \qquad p' \simeq P'; \qquad P = RP' + t,$$
where $\simeq$ means equal up to the depths Z and Z'.
The relationship
$$Zp = P = RP' + t = Z' R p' + t$$
shows that the vectors p, t and Rp' are co-planar, as the vector Zp is the sum of the vectors t and Z'Rp'. That means
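$$p \perp (t \times Rp') \;\Longrightarrow\; p^\top (t \times Rp') = 0 \;\Longrightarrow\; p^\top [t_\times] R p' = 0,$$
where
$$[t_\times] = \begin{bmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{bmatrix}, \qquad p = (u, v, 1)^\top, \quad p' = (u', v', 1)^\top.$$
Thus, we have
$$p^\top E p' = 0 \quad \text{with} \quad E = [t_\times] R.$$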
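A numerical check of the epipolar constraint, assuming an illustrative rotation about the y-axis and translation t; `skew` builds $[t_\times]$ so that `skew(t) @ v` equals `np.cross(t, v)`:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x], so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative relative pose: P = R P' + t
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, -0.1])
E = skew(t) @ R

P2 = np.array([0.3, -0.4, 5.0])   # point in the second camera frame (P')
P1 = R @ P2 + t                   # same point in the first camera frame (P)
p = P1 / P1[2]                    # normalized co-ordinates (u, v, 1)
p2 = P2 / P2[2]                   # (u', v', 1)
residual = p @ E @ p2             # p^T E p', zero up to rounding
```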
Uncalibrated camera: fundamental matrix
If we do not know the intrinsic parameters, then
$$\hat p = M_{int}\, p, \qquad \hat p' = M'_{int}\, p',$$
and $p^\top E p' = 0$ leads to
$$\hat p^\top F \hat p' = 0,$$
where the fundamental matrix is $F = M_{int}^{-\top} E\, (M'_{int})^{-1}$.
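The same check in pixel co-ordinates, with two assumed intrinsic matrices K1 (playing $M_{int}$) and K2 (playing $M'_{int}$); `fundamental_from_essential` is a hypothetical helper and all values are illustrative:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x]."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def fundamental_from_essential(E, K1, K2):
    """F = K1^{-T} E K2^{-1}, so that p_hat^T F p_hat' = 0 in pixel co-ordinates."""
    return np.linalg.inv(K1).T @ E @ np.linalg.inv(K2)

a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, -0.1])
E = skew(t) @ R
K1 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[650.0, 0.0, 300.0], [0.0, 660.0, 250.0], [0.0, 0.0, 1.0]])
F = fundamental_from_essential(E, K1, K2)

P2 = np.array([0.3, -0.4, 5.0])      # point in the second camera frame (P')
P1 = R @ P2 + t                      # same point in the first camera frame (P)
p_hat = K1 @ (P1 / P1[2])            # pixel co-ordinates in the first image
p2_hat = K2 @ (P2 / P2[2])           # pixel co-ordinates in the second image
residual = p_hat @ F @ p2_hat        # p_hat^T F p_hat'
```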
Properties of fundamental (essential) matrix
Fundamental matrix F:
F is of rank 2
Has 7 degrees of freedom
There are 9 elements, but the scale can be omitted and det F = 0
Essential matrix E:
E is of rank 2
Its two nonzero singular values are equal
Has only 5 degrees of freedom: 3 for rotation, 2 for translation (t is only defined up to scale)
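These rank and singular-value properties can be verified numerically on an illustrative $E = [t_\times] R$ and the corresponding F for an assumed intrinsic matrix K (used for both cameras here):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x]."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, -0.1])
E = skew(t) @ R
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)

s_E = np.linalg.svd(E, compute_uv=False)   # expected: (|t|, |t|, 0)
s_F = np.linalg.svd(F, compute_uv=False)   # expected: two nonzero values and one ~0
```

The two equal singular values of E come from $[t_\times]$ itself (its singular values are $\|t\|, \|t\|, 0$); multiplying by the orthogonal R does not change them.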
8-point algorithm
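Recall that the fundamental matrix is determined by the correspondence pairs $\{(x_i, y_i)^\top, (x'_i, y'_i)^\top\}_{i=1}^n$:
$$\mathbf{x}'^\top_i F \mathbf{x}_i = 0, \qquad \mathbf{x}_i = (x_i, y_i, 1)^\top, \quad \mathbf{x}'_i = (x'_i, y'_i, 1)^\top.$$
Let f denote the entries of the matrix F (row by row); we have
$$Af = 0,$$
where
$$A = \begin{bmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{bmatrix}.$$
Since F is defined up to scale (8 unknowns), n ≥ 8 pairs determine f as the null vector of A.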
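A sketch of the basic (unnormalized) 8-point estimate: build A row by row as above, take the right singular vector of the smallest singular value as f, and reshape it row-major into F. The synthetic data use normalized co-ordinates (K = I), so F should match $E = [t_\times]R$ up to scale; all names and values are illustrative:

```python
import numpy as np

def eight_point_A(x, xp):
    """Rows [x'x, x'y, x', y'x, y'y, y', x, y, 1], so that A @ F.ravel() = 0."""
    X, Y = x[:, 0], x[:, 1]
    Xp, Yp = xp[:, 0], xp[:, 1]
    return np.column_stack([Xp * X, Xp * Y, Xp, Yp * X, Yp * Y, Yp, X, Y, np.ones(len(x))])

def eight_point(x, xp):
    """Least-squares F (unit Frobenius norm) from n >= 8 pairs, without normalization."""
    _, _, Vt = np.linalg.svd(eight_point_A(x, xp))
    return Vt[-1].reshape(3, 3)       # f holds the entries of F row by row

# Synthetic calibrated pair (illustrative): P' = R P + t, so x'^T [t_x] R x = 0
rng = np.random.default_rng(1)
P = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
Pp = P @ R.T + t
x, xp = P[:, :2] / P[:, 2:], Pp[:, :2] / Pp[:, 2:]
E_true = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]) @ R

F = eight_point(x, xp)
```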
Normalized 8-point method
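Normalize the points by shifting their centroid to the origin (and scaling them, typically so that the mean distance to the origin is √2)
Compute F by SVD, minimizing the mean squared error (MSE)
Enforce the rank-2 constraint
Output F after undoing the normalization (shifting back)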
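A sketch of the normalized variant, assuming Hartley-style normalization (centroid at the origin, mean distance √2) and synthetic pixel co-ordinates produced by an assumed intrinsic matrix K. If $\hat x = T x$ and $\hat x' = T' x'$, undoing the normalization gives $F = T'^\top \hat F T$. Function names and values are illustrative:

```python
import numpy as np

def normalize_points(pts):
    """Similarity T moving the centroid to the origin with mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def normalized_eight_point(x, xp):
    xn, T = normalize_points(x)
    xpn, Tp = normalize_points(xp)
    X, Y, Xp, Yp = xn[:, 0], xn[:, 1], xpn[:, 0], xpn[:, 1]
    A = np.column_stack([Xp * X, Xp * Y, Xp, Yp * X, Yp * Y, Yp, X, Y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(A)                   # least-squares F in normalized co-ordinates
    Fn = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(Fn)                  # enforce rank 2: zero the smallest singular value
    Fn = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = Tp.T @ Fn @ T                             # undo the normalization
    return F / np.linalg.norm(F)

def to_pixels(Q, K):
    """Project camera-frame points (rows of Q) to pixel co-ordinates."""
    h = Q @ K.T
    return h[:, :2] / h[:, 2:3]

rng = np.random.default_rng(2)
P = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
x, xp = to_pixels(P, K), to_pixels(P @ R.T + t, K)

F = normalized_eight_point(x, xp)
xh = np.column_stack([x, np.ones(len(x))])
xph = np.column_stack([xp, np.ones(len(xp))])
residuals = np.einsum('ij,jk,ik->i', xph, F, xh)  # x'_i^T F x_i for every pair
```

Normalization matters mostly with noisy pixel data, where the raw columns of A differ in scale by several orders of magnitude.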
Triangulation: calibrated camera
Find P as the midpoint of the common perpendicular to the two rays in space.
Linear triangulation:
$$\begin{cases} x = MX \\ x' = M'X \end{cases} \;\Longrightarrow\; \begin{cases} x \times MX = 0 \\ x' \times M'X = 0 \end{cases} \;\Longrightarrow\; AX = 0,$$
where A is determined by the pairs (x, x').
The linear system can be solved by
$$\min_X \|AX\|_2^2 \quad \text{subject to} \quad \|X\|_2 = 1.$$
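A sketch of linear triangulation: each view contributes the two independent rows of $x \times (MX) = 0$, and X is the right singular vector of A with the smallest singular value, de-homogenized at the end. The cameras K[I | 0] and K[R | t], the test point and the function names are illustrative assumptions:

```python
import numpy as np

def triangulate_linear(x1, x2, M1, M2):
    """Stack x x (M X) = 0 for both views and solve A X = 0 with ||X|| = 1."""
    (u1, v1), (u2, v2) = x1, x2
    A = np.array([v1 * M1[2] - M1[1],
                  M1[0] - u1 * M1[2],
                  v2 * M2[2] - M2[1],
                  M2[0] - u2 * M2[2]])
    _, _, Vt = np.linalg.svd(A)      # min ||A X|| subject to ||X|| = 1
    X = Vt[-1]
    return X[:3] / X[3]              # back to non-homogeneous co-ordinates

def project(M, X):
    """Pixel co-ordinates of the 3D point X under the 3x4 camera M."""
    h = M @ np.append(X, 1.0)
    return h[:2] / h[2]

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.0])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera: K [I | 0]
M2 = K @ np.hstack([R, t[:, None]])                # second camera: K [R | t]

X_true = np.array([0.3, -0.2, 6.0])
X_est = triangulate_linear(project(M1, X_true), project(M2, X_true), M1, M2)
```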
Reconstruction via minimizing geometric error
Find a pair $(\hat x, \hat x')$ whose back-projected rays intersect and which is close to (x, x'):
$$\min_{\hat x, \hat x'} d(x, \hat x)^2 + d(x', \hat x')^2 \quad \text{subject to} \quad \hat{x}'^\top F \hat x = 0,$$
Example of d: $d(x, \hat x) = \|x - \hat x\|_2$.
This is equivalent to minimizing the reprojection error:
$$\min_X d(x, MX)^2 + d(x', M'X)^2$$
Rectification
Camera rectification simplifies reconstruction
Re-project the image planes onto a common plane such that all epipolar lines are horizontal, i.e. the two cameras become parallel
The distance T between the two optical centers is called the stereo baseline and is assumed to be known.
Depth and disparity from rectified camera
Point correspondence in a rectified camera pair with baseline T
$x_l$ and $x_r$: the co-ordinates of the matched point pair, each in its own image frame.
The measurement $d = x_l - x_r$ is called the disparity of the matched point pair.
The distance between the two image points is
$$T - d = T - x_l + x_r.$$
Notice that, by similar triangles,
$$\frac{T}{Z} = \frac{T - d}{Z - f} \;\Longrightarrow\; Z = f\,\frac{T}{d}.$$
Thus, the disparity d is proportional to the inverse depth $1/|Z|$:
$$d \propto \frac{1}{|Z|}.$$
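A direct use of $Z = fT/d$, assuming f is given in pixels and the baseline T in metres (so Z comes out in metres); mapping non-positive disparities to infinity is an illustrative choice, and `depth_from_disparity` is a hypothetical name:

```python
import numpy as np

def depth_from_disparity(d, f, baseline):
    """Z = f * T / d for a rectified pair; zero or negative disparity -> inf."""
    d = np.asarray(d, dtype=float)
    safe = np.where(d > 0, d, 1.0)               # avoid dividing by zero
    return np.where(d > 0, f * baseline / safe, np.inf)

Z = depth_from_disparity([42.0, 21.0, 0.0], f=700.0, baseline=0.12)
# f * T = 84, so Z = [2.0, 4.0, inf]: halving the disparity doubles the depth
```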
Correlation-based dense correspondence
3D scene reconstruction requires dense correspondences
For a rectified camera pair, correspondence is done as follows:
for each epipolar line, for each point in the left image, find the point in the right image with the closest intensity.
Often, this is not possible without additional constraints, e.g. window matching
Issue: ambiguity exists when comparing only single intensities
Idea: compare the neighboring windows
Window matching
For each window (e.g. 3×3), match it to the closest window on the epipolar line in the other image.
Two commonly used matching metrics:
$$SSD = \sum_{[i,j] \in \Omega} |f[i,j] - g[i,j]|^2$$
$$C_{f,g} = \sum_{[i,j] \in \Omega} f[i,j]\, g[i,j]$$
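Both metrics can be written directly over a window Ω; note that SSD is a cost (lower is better) while $C_{f,g}$ is a similarity (higher is better). The function names are illustrative:

```python
import numpy as np

def ssd(f, g):
    """Sum of squared differences over the window (lower = more similar)."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return float(np.sum((f - g) ** 2))

def correlation(f, g):
    """Plain correlation C_{f,g} = sum of f * g over the window (higher = more similar)."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return float(np.sum(f * g))

f = np.array([[1, 2], [3, 4]])
g = np.array([[1, 2], [3, 5]])
print(ssd(f, g), correlation(f, g))   # 1.0 34.0
```

Plain correlation grows with window brightness, which is why normalized cross-correlation (subtracting the window means and dividing by the window norms) is a common variant; it is not part of these slides.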
Additional constraints on dense correspondence

Ordering constraint: the order of points is the same in both images.
Smoothness constraint: disparity does not change too quickly.
Uniqueness constraint: each feature has at most one match.
Occlusion and disparity are connected.
Demonstration of matching in stereo
Stereo: Parallel Calibrated Cameras
For each point $p_l = (x_l, y_l)$, how do I get $p_r = (x_r, y_r)$? By matching: the patch around $(x_r, y_r)$ should look similar to the patch around $(x_l, y_l)$.
Given a point in the left image, we scan the scanline in the right image and find the local window (patch) that is most similar to the one in the left image.
The similarity of two patches is measured by a matching cost, which could be SSD or (negated) correlation.
The corresponding pixel is the one with the lowest matching cost.
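A sketch of the scanline search for a single pixel, assuming a rectified pair, SSD as the matching cost, a (2·half+1)×(2·half+1) window and candidate disparities 0..max_disp with $x_r = x_l - d$. The left window is assumed to lie inside the image; `match_along_scanline` and the synthetic images are illustrative:

```python
import numpy as np

def match_along_scanline(left, right, y, x_l, half=1, max_disp=16):
    """Return the disparity d whose right window (centred at x_l - d on row y) has the lowest SSD."""
    win_l = left[y - half:y + half + 1, x_l - half:x_l + half + 1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        x_r = x_l - d
        if x_r - half < 0:                # window would leave the right image
            break
        win_r = right[y - half:y + half + 1, x_r - half:x_r + half + 1]
        cost = np.sum((win_l - win_r) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -5, axis=1)         # synthetic pair: right[:, x] = left[:, x + 5]
d = match_along_scanline(left, right, y=10, x_l=20)   # expected disparity: 5
```

Running this for every pixel gives a dense disparity map; larger `half` trades detail for smoothness, as the results below illustrate.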
Results
Stereo: Parallel Calibrated Cameras, with different patch sizes
Smaller patches: more detail, but noisy. Bigger patches: less detail, but smooth.
(Slide credit: Sanja Fidler, CSC420: Intro to Image Understanding)

Demonstration: depth map from stereo
(Figure panels: two input images; original image; ground truth; window matching; window matching with constraints)
Solutions with other sensors
Kinect: Structured infrared light
Lidar in iPhone
