
Visual Information Interpretation: 3D from stereopsis

Ji Hui

National University of Singapore

October 11, 2021

Ji Hui (National University of Singapore) Visual Information Interpretation: 3D from stereopsis October 11, 2021 1 / 43
Multi-camera systems for 3D perception

Using multiple (stereo) cameras to capture 3D information from two or more images

Multi-camera surveillance
Stereo camera for driverless car

Depth from Stereopsis

Shape from “disparity” between two views


3D shape of scene from two (multiple) images from different viewpoints

Main idea (figure labels): scene point, image plane, optical center

3D perception for stereo imaging

Depth is represented by color.

World co-ordinate vs image co-ordinate

Geometric camera calibration
Camera calibration estimates the co-ordinate mapping among image frame, camera frame and world frame
intrinsics: pixel co-ordinates ⇔ pinhole camera co-ordinates
extrinsics: camera co-ordinates ⇔ world co-ordinates

Usage of calibration mapping


measure the size of an object in world units
determine the location of the camera in the world
warp the images to preferred co-ordinates
Many applications in 3D scene reconstruction, object detection, visual
navigation, and many others.

Perspective transform
Pinhole camera model

Projection of pinhole camera when world co-ordinate is aligned with


camera co-ordinate
the origin of world co-ordinate is optical center of camera
No rotation
\[
x = \frac{X f}{Z}, \qquad y = \frac{Y f}{Z}.
\]
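Notation in homogeneous co-ordinates:
\[
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u/w \\ v/w \end{pmatrix},
\qquad
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]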
Extrinsic mapping
Extrinsic mapping describes the relation between camera co-ordinates (camera frame) and world co-ordinates (world frame)

T describes the position of the origin of camera frame with respect to


world frame
R describes the rotation aligning the camera frame with the world frame
For example, rotation around z-axis:
\[
R_z = \begin{bmatrix} \cos\omega_z & -\sin\omega_z & 0 \\ \sin\omega_z & \cos\omega_z & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
3D rotation matrix: $R = R_x R_y R_z$
Mapping between camera co-ordinate and world
co-ordinate

Consider a point P
$P_c$: coordinate of the point in camera frame
$P_w$: coordinate of the point in world frame
The mapping between the two coordinates:

\[
P_c = R(P_w - T).
\]

In homogeneous form:
\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} =
\begin{bmatrix} R & -RT \\ 0\;\,0\;\,0 & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\]

where $R = R_x R_y R_z$

Intrinsic mapping

Intrinsic mapping describes the relation between camera frame and


image frame

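The mapping between the two, in homogeneous form:

\[
\begin{pmatrix} x_{im} \\ y_{im} \end{pmatrix} = \begin{pmatrix} u/w \\ v/w \end{pmatrix},
\qquad
\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} f_x & s & x_0 & 0 \\ 0 & f_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
\]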

Calibration matrix M
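Combining both extrinsic and intrinsic mapping:
\[
\begin{bmatrix} u \\ v \\ w \end{bmatrix}
= M_{int} M_{ext} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\]

with
\[
M_{int} = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix};
\qquad
M_{ext} = [R, -RT] =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & -R_1^\top T \\
r_{21} & r_{22} & r_{23} & -R_2^\top T \\
r_{31} & r_{32} & r_{33} & -R_3^\top T
\end{bmatrix}
\]

where $r_{i,j}$ are the entries of $R$ and $R_i^\top$ is the $i$-th row of $R$. Here $M_{int}$ is written as a 3x3 matrix, i.e. the intrinsic matrix with its zero fourth column dropped.

$M = M_{int} M_{ext}$, a 3x4 matrix defined up to scale, has 11 degrees of freedom (5 internal, 3 rotation, 3 translation parameters)
The 3x3 submatrix $M' = M_{int} R$ is non-singular ($M_{int}$ is upper triangular, $R$ is orthogonal → essentially a QR decomposition; with the triangular factor on the left, this ordering is usually called an RQ factorization)

\[
M = M_{int} M_{ext} = M_{int} R\,[I_3, -T] = M'\,[I_3, -T]
\]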
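As a minimal sketch of the composition above, the following NumPy snippet builds $M = M_{int}[R, -RT]$ and projects one world point to pixel co-ordinates. The function names and all numeric values (focal length, principal point, rotation angle, camera position) are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def rotation_z(omega_z):
    """Rotation about the z-axis by angle omega_z (radians)."""
    c, s = np.cos(omega_z), np.sin(omega_z)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def calibration_matrix(M_int, R, T):
    """Compose M = M_int [R, -R T] (3x4), following P_c = R (P_w - T)."""
    M_ext = np.hstack([R, -R @ T.reshape(3, 1)])
    return M_int @ M_ext

def project(M, P_w):
    """Map a 3D world point to pixel co-ordinates (u/w, v/w)."""
    u, v, w = M @ np.append(P_w, 1.0)
    return np.array([u / w, v / w])

# Hypothetical camera: f_x = f_y = 800, zero skew, principal point (320, 240).
M_int = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
R = rotation_z(np.deg2rad(10.0))
T = np.array([0.5, -0.2, -4.0])   # camera origin expressed in the world frame
M = calibration_matrix(M_int, R, T)

# A point 4 units in front of the camera, on its optical axis.
pixel = project(M, np.array([0.5, -0.2, 0.0]))
```

Because the chosen point lies on the optical axis, it should land on the principal point, which gives a quick sanity check of the sign convention $-RT$.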

Calibration pattern
Show checkerboard pattern to cameras with different poses for
estimating calibration matrix

Calibration pattern for radial distortion

Image of checkerboard pattern and its edge map

Figure 4.2: Original image of a calibration checkerboard pattern, taken with a Canon 24mm EF lens. The straight lines in several orientations throughout this image are used to determine the pattern of radial lens distortion. The letter "P" in the center is used to record the orientation of the grid with respect to the camera.

Text visible on the embedded figure page: "... localization. As a counterexample, this is not the case for the corners of a white square on a black background. If the image is blurred somewhat, changing the image gamma will cause the square to shrink or enlarge, which will affect corner localization."
Camera calibration using known patterns

Calibration with known correspondence

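Point correspondence: matching the corners of the board: $\{\vec{p}_k, \vec{P}_k\}_{k=1}^{K}$
\[
\vec{p} \equiv \begin{bmatrix} u \\ v \\ w \end{bmatrix}
= M_{int} M_{ext} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M_{int} M_{ext} \vec{P}
\]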
Calibration: how to estimate M
Use the following equation to estimate the parameters of the calibration matrix:

\[
\vec{p} = M\vec{P} \;\Longrightarrow\; \vec{p} \times M\vec{P} = \vec{p} \times \vec{p} = 0.
\]

How to estimate $M$ from point correspondences:

\[
\vec{p}_k \times (M\vec{P}_k) = 0, \qquad k = 1, \ldots, K
\]

Let $\vec{m} \in \mathbb{R}^{12}$ denote the 12 entries of the matrix $M$. The constraints above can be expressed as a linear system

\[
A\vec{m} = 0
\]

It can be solved by least squares estimation:

\[
\min \|A\vec{m}\|_2^2, \quad \text{subject to } \|\vec{m}\| = 1,
\]

whose solution is the eigenvector of $A^\top A$ with the smallest eigenvalue in magnitude.
The solution is not optimal, as it does not reflect the structure of the matrix $M = M_{int} M_{ext}$.
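The linear estimate above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `estimate_projection_dlt` and `skew` are hypothetical helper names, $\vec{m}$ is taken to be the row-major flattening of $M$, and the eigenvector of $A^\top A$ with the smallest eigenvalue is obtained as the last right singular vector of $A$.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that [v]_x @ a equals the cross product v x a."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def estimate_projection_dlt(pts2d, pts3d):
    """Estimate the 3x4 matrix M (up to scale) from K >= 6 correspondences.

    pts2d: (K, 2) pixel co-ordinates; pts3d: (K, 3) world co-ordinates.
    """
    rows = []
    for (u, v), P in zip(pts2d, pts3d):
        p_h = np.array([u, v, 1.0])
        P_h = np.append(P, 1.0)
        # M P = (I_3 kron P^T) m, hence p x (M P) = [p]_x (I_3 kron P^T) m.
        rows.append(skew(p_h) @ np.kron(np.eye(3), P_h[None, :]))
    A = np.vstack(rows)                # (3K, 12); each 3-row block has rank 2
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)        # unit-norm minimiser of ||A m||
```

The returned matrix has unit Frobenius norm and an arbitrary sign, which is consistent with $M$ being defined only up to scale.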
Preliminaries: QR decomposition
QR decomposition is a decomposition of a matrix $A$:

\[
A = QR,
\]

$Q$ is an orthogonal matrix satisfying $Q^\top Q = QQ^\top = I$
$R$ is an upper triangular matrix.
QR is associated with the Gram-Schmidt process
Consider the matrix $A = [a_1, a_2, \ldots, a_K]$ (with linearly independent columns)
The G-S process converts $\{a_1, \ldots, a_K\}$ to an orthonormal basis $\{e_1, \ldots, e_K\}$ by
\[
u_1 = a_1, \qquad e_1 = \frac{1}{\|u_1\|_2} u_1,
\]
\[
u_k = a_k - \sum_{j=1}^{k-1} \frac{\langle u_j, a_k \rangle}{\|u_j\|_2^2}\, u_j, \qquad e_k = \frac{1}{\|u_k\|_2} u_k, \qquad k = 2, \ldots, K.
\]

The relationship:

\[
Q = [e_1, e_2, \ldots, e_K]; \qquad
R_{i,j} = \begin{cases} \langle e_i, a_j \rangle, & i \le j \\ 0, & \text{otherwise.} \end{cases}
\]

numpy.linalg.qr: Compute the QR factorization of a matrix.
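To make the link between Gram-Schmidt and QR concrete, here is a minimal sketch of the classical process that fills $Q$ and $R$ column by column. `gram_schmidt_qr` is a hypothetical helper name, and the code assumes the columns of $A$ are linearly independent; in practice `numpy.linalg.qr` is the numerically safer choice.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: returns Q (orthonormal columns) and upper-triangular R."""
    A = np.asarray(A, dtype=float)
    n, K = A.shape
    Q = np.zeros((n, K))
    R = np.zeros((K, K))
    for k in range(K):
        u = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # R_{j,k} = <e_j, a_k>
            u -= R[j, k] * Q[:, j]        # remove the component along e_j
        R[k, k] = np.linalg.norm(u)       # equals <e_k, a_k>
        Q[:, k] = u / R[k, k]
    return Q, R
```

The result agrees with `numpy.linalg.qr` up to the signs of the columns of $Q$ (and the matching rows of $R$), since this sketch always produces a positive diagonal in $R$.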

Calibration: Refinement
Estimate the calibration matrix $M$ from 8 point correspondences.
Decompose $M$ into internal and external mappings $M = M_{int} M_{ext}$
1 Estimating translation: Let $\tilde{T}$ denote the estimate of $T$ in homogeneous co-ordinates.
In non-homogeneous co-ordinates, recall that $P_c = R(P_w - T)$. Then, for the world point $T$, we have
\[
P_c = R(T - T) = 0.
\]
In homogeneous co-ordinates,
\[
M_{ext}\tilde{T} = 0 \;\Longrightarrow\; M\tilde{T} = M_{int} R\,[I_3, -T]\,\tilde{T} = 0.
\]
Thus, $\tilde{T}$ is the null vector of $M$. Since $M$ is 3x4, it can be estimated as the right singular vector of $M$ with the smallest singular value (equivalently, the eigenvector of $M^\top M$ whose eigenvalue has the smallest magnitude).
2 Estimating camera rotation and intrinsic parameters.
Recall that
\[
M = M_{int} M_{ext} = M_{int} R\,[I_3, -T] = M'\,[I_3, -T] = [M', -M'T]
\]
where $M' = M_{int} R$, $M_{int}$: upper-triangular, $R$: orthogonal matrix.
Run the QR-type decomposition on the left 3x3 block of $M$ (rows and columns 0 to 2) to obtain $R$ and $M_{int}$. Because the triangular factor comes first, this is an RQ factorization.
Update $M$ using the estimated $M_{ext}$ and $M_{int}$, and use it as the initial guess for an iterative scheme for solving
\[
\min_M \sum_k |\vec{p}_k \times M\vec{P}_k|^2,
\]
e.g. the steepest descent method for several iterations.
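The two decomposition steps can be sketched as follows. This is an illustration under assumptions rather than the lecture's own code: `rq3` and `decompose_projection` are hypothetical names, the RQ factorization is built from `numpy.linalg.qr` with row and column flips, and the sign and scale ambiguities of $M$ are resolved by requiring a positive diagonal in $M_{int}$, $\det R = +1$, and a bottom-right entry of 1 in $M_{int}$.

```python
import numpy as np

def rq3(A):
    """RQ factorization of a 3x3 matrix: A = K @ R, K upper triangular
    with positive diagonal, R orthogonal."""
    P = np.fliplr(np.eye(3))              # reverses row/column order
    Qt, Rt = np.linalg.qr((P @ A).T)      # (P A)^T = Qt Rt
    K = P @ Rt.T @ P                      # upper triangular
    R = P @ Qt.T                          # orthogonal
    D = np.diag(np.sign(np.diag(K)))      # flip signs so diag(K) > 0
    return K @ D, D @ R

def decompose_projection(M):
    """Split a 3x4 matrix M ~ M_int R [I, -T] into (M_int, R, T)."""
    # Step 1: camera position T from the null vector of M.
    _, _, Vt = np.linalg.svd(M)
    T = Vt[-1, :3] / Vt[-1, 3]
    # Step 2: intrinsics and rotation from the left 3x3 block.
    K, R = rq3(M[:, :3])
    if np.linalg.det(R) < 0:              # M is only defined up to sign
        R = -R
    return K / K[2, 2], R, T
```

The output would then serve as the initial guess for the iterative refinement described above.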
3D vision: Stereopsis
Two viewing points provide disparity, which translates to depth

Multi-view images can provide a 3D model of object

Depth from Two Views: Stereo
Stereo vision
For a point in the world, a single view cannot determine its location in space
All points on the projective line to P map to p
Depth from Two Views: Stereo
Stereo vision
For a point in the world, two different views can determine its location in space

The 3D co-ordinates of the point P can be found by triangulation.

Figure: I can get a point in 3D by triangulation!
Pixel correspondence

Triangulation needs dense pixel correspondence

RANSAC-based matching only gives sparse point correspondences

It appears that, given a pixel in the left image, finding its correspondence in the right image requires a 2D search over the whole image
How do we match a point in the first image to a point in the second? How can we constrain our search?

Epipolar geometry: Pixel correspondence
Find pairs of points that correspond to same scene point

Epipolar geometry: For each pixel, its correspondence lies on a line, the epipolar line.

Consider two parallel cameras with the same focal length. For each pixel in the left image, its correspondence lies on a horizontal scan line

Epipolar geometry

epipoles $e$, $e'$
intersection of baseline with image plane
projection of projection center in other image
vanishing point of camera motion direction
epipolar plane = plane containing baseline (1-D family)
epipolar line = intersection of epipolar plane with image
Example
Converging camera

Forward camera motion

Calibrated camera: essential matrix

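Suppose we know the intrinsic mappings of the cameras: $M_{int}$ and $M'_{int}$.
Convert to normalized coordinates $p$ by pre-multiplying all points $\hat{p}$ with the inverse of the calibration matrix
\[
p = M_{int}^{-1}\hat{p}; \qquad p' = (M'_{int})^{-1}\hat{p}'.
\]
Set the first camera's coordinate system as world coordinates and define $R$ and $t$ that map from $P'$ (second camera frame) to $P$ (first camera frame)
\[
p \simeq P; \qquad p' \simeq P'; \qquad P = RP' + t
\]
where $\simeq$ denotes equality up to scale, since normalized coordinates are homogeneous.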
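The relationship
\[
p \simeq P = RP' + t, \qquad P' \simeq p'
\]
shows that the vectors $p$, $t$ and $Rp'$ are co-planar: $P$ is the sum of the vectors $t$ and $RP'$, and $p$, $p'$ are scalar multiples of $P$, $P'$. That means,

\[
p \perp (t \times Rp') \;\Longrightarrow\; p^\top (t \times Rp') = 0 \;\Longrightarrow\; p^\top [t_\times] R p' = 0,
\]

where
\[
[t_\times] = \begin{bmatrix} 0 & -t_3 & t_2 \\ t_3 & 0 & -t_1 \\ -t_2 & t_1 & 0 \end{bmatrix};
\qquad
\begin{cases} p = (u, v, 1)^\top \\ p' = (u', v', 1)^\top \end{cases}
\]

Thus, we have
\[
p^\top E p' = 0 \quad \text{with } E = [t_\times] R.
\]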
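As a quick numerical check of the epipolar constraint, the sketch below builds $E = [t_\times]R$ for a given relative pose. The helper names `skew` and `essential_matrix` are illustrative assumptions; the pose convention is the one above, $P = RP' + t$.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x], so that skew(t) @ a equals t x a."""
    t1, t2, t3 = t
    return np.array([[0.0, -t3, t2], [t3, 0.0, -t1], [-t2, t1, 0.0]])

def essential_matrix(R, t):
    """E = [t_x] R for the pose convention P = R P' + t."""
    return skew(t) @ R

def epipolar_residual(E, p, p_prime):
    """Value of p^T E p' for normalized homogeneous points p, p'."""
    return float(p @ E @ p_prime)
```

For any 3D point observed in both cameras, `epipolar_residual` should vanish up to rounding, and the two nonzero singular values of `E` should both equal $\|t\|$.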

Uncalibrated camera: fundamental matrix

If we do not know the intrinsic parameters, then

\[
\hat{p} = M_{int}\,p; \qquad \hat{p}' = M'_{int}\,p'; \qquad p^\top E p' = 0
\]
leads to
\[
\hat{p}^\top F \hat{p}' = 0,
\]
where the fundamental matrix is $F = M_{int}^{-\top} E\,(M'_{int})^{-1}$.
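The step from the essential to the fundamental matrix is a direct substitution; written out with the same symbols as above (no new assumptions beyond invertibility of $M_{int}$ and $M'_{int}$):

```latex
\begin{aligned}
0 = p^\top E\, p'
  &= \left(M_{int}^{-1}\hat{p}\right)^\top E \left((M'_{int})^{-1}\hat{p}'\right) \\
  &= \hat{p}^\top \underbrace{M_{int}^{-\top} E\,(M'_{int})^{-1}}_{F}\, \hat{p}'
   = \hat{p}^\top F\, \hat{p}' .
\end{aligned}
```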
Properties of fundamental (essential) matrix

Fundamental matrix $F$:
$F$ is of rank 2,
Has 7 degrees of freedom
There are 9 elements, but the overall scale can be omitted and $\det F = 0$
Essential matrix $E$:
$E$ is of rank 2
Its two nonzero singular values are equal.
Has only 5 degrees of freedom, 3 for rotation, 2 for translation (only the direction of $t$ is recoverable)

8-point algorithm
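Recall that the fundamental matrix is determined by the correspondence pairs $\{(x_i, y_i)^\top, (x'_i, y'_i)^\top\}_{i=1}^{n}$:

\[
\mathbf{x}_i'^\top F\, \mathbf{x}_i = 0, \qquad \mathbf{x}_i = (x_i, y_i, 1)^\top, \quad \mathbf{x}'_i = (x'_i, y'_i, 1)^\top.
\]

Let $f$ denote the entries of the matrix $F$ (in row-major order). We have

\[
Af = 0,
\]

where
\[
A = \begin{bmatrix}
x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1
\end{bmatrix}
\]
Normalized 8-point method
Normalize the points by shifting their centroid to the origin (the standard variant also rescales them)
Compute $F$ by SVD, minimizing the mean squared error (MSE)
Enforce the rank-2 constraint.
Output $F$ after undoing the normalization.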
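The four steps of the normalized 8-point method can be sketched as below. Names such as `normalize_points` and `eight_point` are hypothetical, and the normalization used here (centroid at the origin, mean distance $\sqrt{2}$) is the commonly used Hartley variant, assumed rather than taken from the slide, which mentions only shifting. The convention is $\mathbf{x}'^\top F \mathbf{x} = 0$.

```python
import numpy as np

def normalize_points(pts):
    """Shift the centroid to the origin and scale to mean distance sqrt(2).
    Returns homogeneous normalized points and the 3x3 transform N."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    N = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (N @ pts_h.T).T, N

def eight_point(x, xp):
    """Estimate F with xp^T F x = 0 from n >= 8 pairs; x, xp are (n, 2) arrays."""
    xn, N = normalize_points(x)
    xpn, Np = normalize_points(xp)
    # One row of A per correspondence, f in row-major order.
    A = np.column_stack([
        xpn[:, 0] * xn[:, 0], xpn[:, 0] * xn[:, 1], xpn[:, 0],
        xpn[:, 1] * xn[:, 0], xpn[:, 1] * xn[:, 1], xpn[:, 1],
        xn[:, 0], xn[:, 1], np.ones(len(xn))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization: x'^T (Np^T F N) x = 0 in pixel co-ordinates.
    F = Np.T @ F @ N
    return F / np.linalg.norm(F)
```

The result is returned with unit Frobenius norm, since $F$ is only defined up to scale.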

Triangulation: calibrated camera
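Finding $P$ as the midpoint of the common perpendicular to the two rays in space.

Linear triangulation:
\[
\begin{cases} x = MX \\ x' = M'X \end{cases}
\;\Longrightarrow\;
\begin{cases} x \times MX = 0 \\ x' \times M'X = 0 \end{cases}
\;\Longrightarrow\; AX = 0,
\]
where $A$ is determined by the pair $(x, x')$ and the camera matrices $M$, $M'$.
The linear system can be solved by
\[
\min \|AX\|_2^2, \quad \text{subject to } \|X\|_2 = 1.
\]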
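A minimal sketch of linear triangulation, assuming both 3x4 camera matrices are known: each view contributes two independent rows of the cross-product constraint, and the unit-norm minimiser of $\|AX\|$ is the last right singular vector of $A$. The function name `triangulate_linear` is an illustrative assumption.

```python
import numpy as np

def triangulate_linear(x, xp, M, Mp):
    """Recover the 3D point X from pixel observations x (camera M) and xp (camera Mp).

    x, xp: length-2 pixel co-ordinates; M, Mp: 3x4 camera matrices.
    """
    # Two independent rows of x cross (M X) = 0 per view.
    A = np.vstack([
        x[0] * M[2] - M[0],
        x[1] * M[2] - M[1],
        xp[0] * Mp[2] - Mp[0],
        xp[1] * Mp[2] - Mp[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                  # homogeneous solution with ||X_h|| = 1
    return X_h[:3] / X_h[3]       # back to non-homogeneous co-ordinates
```

With noisy observations the two rays no longer intersect, and this solution minimizes an algebraic rather than a geometric error.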
Reconstruction via minimizing geometric error
Finding a pair $(\hat{x}, \hat{x}')$ whose rays intersect and which is close to $(x, x')$:
\[
\min_{\hat{x}, \hat{x}'} \; d(x, \hat{x})^2 + d(x', \hat{x}')^2, \quad \text{subject to } \hat{x}'^\top F \hat{x} = 0,
\]

Example of $d$: $d(x, \hat{x}) = \|x - \hat{x}\|_2$.

which is equivalent to minimizing the reprojection error:

\[
\min_X \; d(x, MX)^2 + d(x', M'X)^2
\]
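For illustration, the reprojection error of a candidate 3D point can be evaluated as follows, taking $d$ to be the Euclidean distance in the image. The names `project` and `reprojection_error` are hypothetical; the minimization itself would be left to an iterative solver initialized, for example, by a linear triangulation.

```python
import numpy as np

def project(M, X):
    """Project a 3D point X with a 3x4 camera matrix M to image co-ordinates."""
    u, v, w = M @ np.append(X, 1.0)
    return np.array([u / w, v / w])

def reprojection_error(X, x, xp, M, Mp):
    """d(x, M X)^2 + d(x', M' X)^2 with d the Euclidean image distance."""
    r = x - project(M, X)
    rp = xp - project(Mp, X)
    return float(r @ r + rp @ rp)
```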

Rectification
Camera rectification for simplifying reconstruction
Re-project the image planes onto a common plane such that all epipolar lines are horizontal, i.e. the two cameras are parallel

The distance $T$ between the two optical centers is called the stereo baseline, and is assumed to be known.
Depth and disparity from rectified camera
Point correspondence in rectified camera with baseline $T$

$p_l$ and $p_r$: the matched point pair
$x_l$ and $x_r$: the coordinates of the pair in their own image frame.
The measurement $d = x_l - x_r$ is called the disparity of the matched point pair
The distance between the two points is
\[
T - d = T - x_l + x_r
\]

Notice that, by similar triangles,
\[
\frac{T}{Z} = \frac{T - d}{Z - f} \;\Longrightarrow\; Z = f\,\frac{T}{d}
\]
Thus, the disparity $d$ is proportional to the inverse depth $\frac{1}{|Z|}$:

\[
d \propto \frac{1}{|Z|}.
\]
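The relation $Z = fT/d$ translates directly into code. The sketch below is an illustration only: `depth_from_disparity` is a hypothetical name, the focal length is assumed to be in pixels and the baseline in metres, and non-positive disparities are mapped to infinite depth as one possible convention.

```python
import numpy as np

def depth_from_disparity(d, f, T):
    """Depth Z = f * T / d for a rectified stereo pair.

    d: disparity in pixels (scalar or array), f: focal length in pixels,
    T: baseline in world units. Non-positive disparities give Z = inf.
    """
    d = np.atleast_1d(np.asarray(d, dtype=float))
    Z = np.full(d.shape, np.inf)
    valid = d > 0
    Z[valid] = f * T / d[valid]
    return Z
```

For example, with `f = 700` and `T = 0.12`, doubling the disparity from 35 to 70 pixels halves the depth, which is the inverse proportionality stated above.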

Correlation-based dense correspondence
3D scene reconstruction requires dense correspondences

For a rectified camera pair, correspondence is done as follows:
For each epipolar line, for each point in the left image, find the point in the right image with the closest intensity.
Often, this is not possible without additional constraints.

Window matching
Issue: ambiguity exists when comparing only a single intensity
Idea: compare the neighboring window
Window matching
For each window (e.g. 3 × 3), match to the closest window on the epipolar line in the other image.

Two often-seen matching metrics:
\[
\mathrm{SSD} = \sum_{[i,j] \in \Omega} |f[i,j] - g[i,j]|^2
\]
\[
C_{f,g} = \sum_{[i,j] \in \Omega} f[i,j]\, g[i,j]
\]

Additional constraints on dense correspondence

Ordering constraint: the order of points in the two images is the same.
Smoothness constraint: disparity doesn't change too quickly
Uniqueness constraint: each feature has at most one match
Occlusion and disparity are connected.
Demonstration of matching in stereo
Stereo: Parallel Calibrated Cameras
For each point $p_l = (x_l, y_l)$, how do I get $p_r = (x_r, y_r)$? By matching. The patch around $(x_r, y_r)$ should look similar to the patch around $(x_l, y_l)$.
Given a point in the left image, we scan the scanline in the right image and find the local window patch that is most similar to the one in the left.
The similarity of two patches is measured by a matching cost, which could be SSD or correlation.
The corresponding pixel is the one with the lowest matching cost (lowest SSD, or highest correlation).
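The scanline search described above can be sketched with SSD as the matching cost. This is a brute-force illustration under assumptions: `match_scanline` is a hypothetical name, the images are rectified grayscale arrays, the disparity convention is $d = x_l - x_r \ge 0$, and the caller keeps the window inside the image.

```python
import numpy as np

def ssd(f, g):
    """Sum of squared differences between two equally sized patches."""
    return float(np.sum((f - g) ** 2))

def match_scanline(left, right, row, col, half=1, max_disp=16):
    """Disparity of pixel (row, col) of the left image, found by sliding a
    (2*half+1)^2 window along the same row of the right image."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        c = col - d                        # candidate column in the right image
        if c - half < 0:                   # window would leave the image
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        cost = ssd(patch, cand)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Running this for every pixel gives a dense disparity map; the window size `half` trades detail against noise, and none of the additional constraints (ordering, smoothness, uniqueness) are enforced in this sketch.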

Results with different patch sizes
Stereo: Parallel Calibrated Cameras
Smaller patches: more detail, but noisy. Bigger: less detail, but smooth

Source slide: Sanja Fidler, CSC420: Intro to Image Understanding


Demonstration

two images

Depth map from Stereo

Panels: original image; ground truth; window matching; window matching w/ constraints

Solutions with other sensors

Kinect: Structured infrared light

Solutions with other sensors

LiDAR in iPhone
