
Module II

Local Image Features and Stereo Vision



Image Gradients

Computing the Image Gradient

Gradient Based Edge and Corner Detection.

Stereopsis

Binocular Camera Geometry

Epipolar Constraint

Binocular Reconstruction

Local Methods for Binocular Fusion

Global Methods for Binocular Fusion.


Image Gradients
The image gradient is the directional change in image intensity.

It is a measure of the change in the image function f(x, y) along the x and y directions.

By computing the gradient over a small area of the image and repeating the process for the entire image, edges can be detected in the image.

Edge detection is pervasive in applications such as fingerprint matching, medical diagnosis, and number plate detection.

These applications highlight the areas where the image intensity changes drastically and ignore everything else.

Sharp changes in brightness cause large image gradients.



COMPUTING THE IMAGE GRADIENT

The image derivatives can be approximated from differences of neighbouring pixel values, for example ∂f/∂x ≈ f(x + 1, y) − f(x, y). These kinds of derivative estimates are known as finite differences.


Gradient computation involves computing the gradient magnitude and direction.

To compute the magnitude, first compute the gradients in the x (Gx) and y (Gy) directions.

To compute the gradient Gx, move from left to right and highlight the points where the image intensity changes drastically.

This gives the points along the vertical edges of the rectangle.

Similarly, to compute the gradient Gy, move from top to bottom and highlight the points where the image intensity changes drastically.

This gives the points along the horizontal edges of the rectangle.

Gradient Direction: θ = arctan(Gy / Gx)

Gradient Magnitude: |∇f| = √(Gx² + Gy²)
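As a rough sketch (not from the notes), the gradient components, magnitude, and direction can be estimated with NumPy finite differences; the image below is a synthetic placeholder.

import numpy as np

def image_gradient(image):
    """Estimate Gx, Gy, gradient magnitude and direction using finite differences."""
    f = image.astype(float)
    gy, gx = np.gradient(f)              # np.gradient returns (d/d rows, d/d columns)
    magnitude = np.sqrt(gx**2 + gy**2)   # |grad f| = sqrt(Gx^2 + Gy^2)
    direction = np.arctan2(gy, gx)       # theta = atan2(Gy, Gx), in radians
    return gx, gy, magnitude, direction

# Example: a bright rectangle gives strong responses in Gx along its vertical
# edges and in Gy along its horizontal edges.
img = np.zeros((64, 64))
img[16:48, 16:48] = 255.0
gx, gy, mag, theta = image_gradient(img)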



Image noise tends to result in pixels not looking like their neighbours, so simple finite differences tend to give strong responses to noise.

As a result, just taking one finite difference for x and one for y gives noisy gradient estimates.

The way to deal with this problem is to smooth the image and then differentiate it.



Smoothing works because any image gradient has effects over a pool of pixels.

For example, the contour of an object can result in a long chain of points where the image derivative is large.

Smoothing the image before differentiation will tend to suppress noise at the scale of individual pixels, because it will tend to make pixels look like their neighbours.

However, gradients that are supported by evidence over multiple pixels will tend not to be smoothed out.

This suggests differentiating a smoothed image.



Figure: derivative estimates for a signal without noise, with noise, and when the signal is smoothed first.


Derivative of Gaussian Filters
Smoothing an image and then differentiating it is the same as convolving it with the derivative of a smoothing kernel.

Differentiation is linear and shift invariant, so there is some kernel K that differentiates; that is, given a function I(x, y),

∂I/∂x = K_(∂/∂x) ∗ I.

We require the derivative of a smoothed function. Write the convolution kernel for the smoothing as S.

Convolution is associative, so

K_(∂/∂x) ∗ (S ∗ I) = (K_(∂/∂x) ∗ S) ∗ I = (∂S/∂x) ∗ I.



The most commonly used form is the one in which the smoothing function S is a Gaussian.

We then need only convolve with the derivative of the Gaussian, rather than convolve and then differentiate.
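As an illustrative sketch (not from the notes), the equivalence can be checked numerically with SciPy, which can filter directly with a derivative-of-Gaussian kernel; the random image below is only a stand-in.

import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)   # stand-in for a real grayscale image
sigma = 2.0

# Route 1: smooth with a Gaussian, then take finite-difference derivatives.
smoothed = ndimage.gaussian_filter(img, sigma)
gy1, gx1 = np.gradient(smoothed)
mag1 = np.hypot(gx1, gy1)

# Route 2: convolve directly with the x- and y-derivatives of the Gaussian
# (order=1 along the chosen axis).
gx2 = ndimage.gaussian_filter(img, sigma, order=(0, 1))
gy2 = ndimage.gaussian_filter(img, sigma, order=(1, 0))
mag2 = np.hypot(gx2, gy2)

# The two gradient-magnitude estimates agree closely; the small residual comes
# from the finite-difference approximation used in route 1.
print(np.max(np.abs(mag1 - mag2)))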



Gradient-Based Edge Detectors
Sharp changes in image intensity can be thought of as lying on curves in the image, which are known as edges.

The curves are made up of edge points.

Many effects can cause edges, but an effect that can cause an edge is not guaranteed to produce one.

For example, an object may happen to be the same intensity as the background, and so the occluding contour will not result in an edge.

This means that interpreting edge points can be very difficult.

The gradient magnitude can be estimated by smoothing an image and then
differentiating it.

This is equivalent to convolving with the derivative of a smoothing kernel.

The extent of the smoothing affects the gradient magnitude.

At the center, the gradient magnitude is estimated using the derivatives of a Gaussian with σ = 1 pixel; on the right, using the derivatives of a Gaussian with σ = 2 pixels.

Large values of the gradient magnitude form thick trails.


Start by computing an estimate of the gradient magnitude.

The gradient magnitude is large along a thick trail in the image, but occluding contours
are curves, so must obtain a curve of the most distinctive points on this trail.



The gradient magnitude can be thought of as a chain of low hills.

Marking local maxima would mark isolated points.

A better criterion is to slice the gradient magnitude along the gradient direction, which should be perpendicular to the edge, and mark the points along the slice where the magnitude is maximal.

This gives a chain of points along the crown of the hills.

Each point in the chain can be used to predict the location of the next point, which will be in a direction roughly at right angles to the gradient at the edge point.

Forming these chains is called non-maximum suppression.
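A minimal sketch of the idea (not the notes' algorithm): instead of interpolating along the true gradient direction, this simplified version quantizes the direction to the nearest of four neighbour pairs, which is a common approximation.

import numpy as np

def nonmax_suppress(mag, theta):
    """Simplified non-maximum suppression: keep a pixel only if its gradient
    magnitude is at least as large as both neighbours along the (quantized)
    gradient direction.  mag and theta are 2-D arrays (magnitude, radians)."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.rad2deg(theta) % 180.0          # quantize to 0, 45, 90, 135 degrees
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:          # roughly horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                      # one diagonal
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                     # roughly vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                               # the other diagonal
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]           # this pixel survives: candidate edge point
    return out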



Gradient magnitude is at a maximum along the direction of the gradient.
The dots are the pixel grid.

From pixel q, attempting to determine whether the gradient is at a maximum; the gradient direction through q does not pass through any convenient pixels in the forward or backward direction.

Must interpolate to obtain the values of the gradient magnitude at p and r.

If the value at q is larger than both, q is an edge point.



On the right, a sketch of how to find candidates for the next edge point, given that q is an edge point.

An appropriate search direction is perpendicular to the gradient, so points s and r should be considered for the next edge point.

There is no need to restrict attention to pixel points on the image grid, because the predicted position lies between s and r.

Hence, again interpolate to obtain gradient values for points off the grid.



Stereopsis
Fusing the pictures recorded by our two eyes and exploiting the difference (or disparity) between them allows us to gain a strong sense of depth.

The design and implementation of algorithms that mimic our ability to perform this task is known as stereopsis.

Applications
1. Image segmentation for object recognition.
2. Construction of three-dimensional scene models.
3. Robot navigation

Stereo vision involves two processes:

1. The fusion of features observed by two (or more) eyes.

2. The reconstruction of their three-dimensional preimage.


Binocular Camera Geometry
Given a stereo image pair, any pixel in the first (or left) image may match any
pixel in the second (or right) one.

Matching pairs of pixels are in fact restricted to lie on corresponding epipolar lines in the two pictures.

This constraint plays a fundamental role in the stereo fusion process.



The binocular fusion problem:

Left, there is no ambiguity, and stereo reconstruction is simple.

Right, any of the four points in the left picture may match any of the four points in the right one. Only four of these correspondences are correct.

The other ones yield the incorrect reconstructions shown as small gray discs.



Epipolar Geometry

Consider the images p and p’ of a point P observed by two cameras with optical
centers O and O’.

These five points all belong to the epipolar plane defined by the two intersecting rays
OP and O’P.

In particular, the point p’ lies on the line l’ where this plane and the retina Π’ of the second camera intersect.
The line l’ is the epipolar line associated with the point p, and it
passes through the point e’ where the baseline joining the optical
centers O and O’ intersects Π’.

Likewise, the point p lies on the epipolar line l associated with the point p’, and this line passes through the intersection e of the baseline with the plane Π.

The points e and e’ are called the epipoles of the two cameras.



The epipole e’ is the projection of the optical center O of the first camera in
the image observed by the second camera, and vice versa.

As noted before, if p and p’ are images of the same point, then p’ must lie on
the epipolar line associated with p.

This epipolar constraint plays a fundamental role in stereo vision and motion
analysis.



The most difficult part of constructing an artificial stereo vision system is to find effective methods for establishing correspondences between the two images, that is, deciding which points in the second picture match the points in the first one.

The epipolar constraint greatly limits the search for these correspondences.



Assume that the rig is calibrated; then the coordinates of the point p completely determine the ray joining O and p, and thus the associated epipolar plane OO’p and epipolar line l’.

The search for matches can be restricted to this line instead of the whole image.

Epipolar constraint: Given a calibrated stereo rig, the set of possible matches
for the point p is constrained to lie on the associated epipolar line l’.

The Essential Matrix

The intrinsic parameters of each camera are known, and we work in normalized image coordinates.

According to the epipolar constraint, the three vectors Op, OO’, and O’p’ must be coplanar.

Equivalently, one of them must lie in the plane spanned by the other two, or

Op · [OO’ × O’p’] = 0.

We can rewrite this coordinate-independent equation in the coordinate frame associated with the first camera as

p · [t × (Rp’)] = 0,   (7.1)

where p and p’ denote the homogeneous normalized image coordinate vectors of p and p’, t is the coordinate vector of the translation OO’ separating the two coordinate systems, and R is the rotation matrix such that a free vector with coordinates w’ in the second coordinate system has coordinates Rw’ in the first one.

The two projection matrices are given in the coordinate system attached to the first camera by (Id 0) and (Rᵀ −Rᵀt).

Equation (7.1) can be rewritten as

pᵀ E p’ = 0,   where E = [t×] R

and [t×] is the skew-symmetric matrix such that [t×]x = t × x for any vector x. The matrix E is called the essential matrix.
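A small numerical sketch (my own synthetic setup, not from the notes) that builds E = [t×]R for a hypothetical rig and checks the constraint for one scene point:

import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical rig: R maps frame-2 coordinates into frame-1 coordinates,
# t is the coordinate vector of OO' expressed in frame 1.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

E = skew(t) @ R                  # essential matrix E = [t]_x R

# One scene point, projected into normalized image coordinates in both cameras.
P = np.array([0.3, -0.2, 4.0])   # coordinates in the first camera frame
p = P / P[2]                     # p = (x, y, 1) in the first camera
P2 = R.T @ (P - t)               # the same point in the second camera frame
p2 = P2 / P2[2]                  # p' in the second camera

print(p @ E @ p2)                # ~0: the epipolar constraint p^T E p' = 0 holds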


The Fundamental Matrix

The essential matrix is defined in normalized image coordinates. If we work instead in native (pixel) image coordinates q = Kp and q’ = K’p’, where K and K’ are the calibration matrices of the two cameras, the epipolar constraint becomes qᵀ F q’ = 0, where F = K⁻ᵀ E K’⁻¹ is called the fundamental matrix.

The fundamental matrix also tells how pixels (points) in each image are related to epipolar lines in the other image: Fq’ is the epipolar line associated with q’ in the first image, and Fᵀq is the epipolar line associated with q in the second image.
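A tiny hedged sketch (hypothetical placeholder values, not from the notes) of how F maps a pixel to its epipolar line in the other image:

import numpy as np

# Hypothetical fundamental matrix and a pixel in the first image.
F = np.array([[0.0,     -1.2e-6,  4.0e-4],
              [1.5e-6,   0.0,    -3.1e-3],
              [-6.0e-4,  3.3e-3,  1.0]])
q = np.array([412.0, 188.0, 1.0])     # homogeneous pixel coordinates in image 1

# l' = F^T q is the epipolar line in the second image: any candidate match
# (u', v') must satisfy a*u' + b*v' + c = 0.
a, b, c = F.T @ q
print(f"epipolar line in image 2: {a:.3e} u' + {b:.3e} v' + {c:.3e} = 0")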

BINOCULAR RECONSTRUCTION
Given a calibrated stereo rig and two matching image points p and p’, it is in principle straightforward to reconstruct the corresponding scene point by intersecting the two rays R = Op and R’ = O’p’.

However, the rays R and R’ never actually intersect in practice, due to calibration and feature localization errors.

Method 1

Consider the line segment perpendicular to R and R’ that intersects both rays; its midpoint P is the closest point to the two rays and can be taken as the preimage of p and p’.
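A minimal sketch of Method 1 (my own notation and made-up numbers): the two rays are written as O + a·d and O’ + b·d’, the closest pair of points is found by solving a 2 × 2 linear system, and their midpoint is returned.

import numpy as np

def midpoint_triangulation(O, d, O2, d2):
    """Closest point to the rays O + a*d and O2 + b*d2, taken as the midpoint
    of the common perpendicular segment joining them."""
    r = O2 - O
    A = np.array([[d @ d,  -(d @ d2)],
                  [d @ d2, -(d2 @ d2)]])
    rhs = np.array([d @ r, d2 @ r])
    a, b = np.linalg.solve(A, rhs)       # ray parameters of the two endpoints
    P1 = O + a * d                       # closest point on the first ray
    P2 = O2 + b * d2                     # closest point on the second ray
    return 0.5 * (P1 + P2)               # midpoint: the reconstructed scene point

# Hypothetical optical centers and viewing directions (rays through p and p'):
O  = np.zeros(3);                 d  = np.array([0.10, -0.05, 1.0])
O2 = np.array([0.5, 0.0, 0.0]);   d2 = np.array([-0.02, -0.05, 1.0])
print(midpoint_triangulation(O, d, O2, d2))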



Method 2

A scene point can also be reconstructed using a purely algebraic approach.

Given the projection matrices M and M’ and the matching points p and p’, we can rewrite the constraints Zp = MP and Z’p’ = M’P as

p × MP = 0 and p’ × M’P = 0.

This is a system of four independent linear equations in the homogeneous coordinates of P that is solved using linear least-squares techniques.
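A hedged sketch of Method 2 (hypothetical cameras and point, and my own helper names): two rows of each cross-product constraint are stacked and the homogeneous solution is taken from the SVD.

import numpy as np

def linear_triangulation(M, p, M2, p2):
    """Solve p x (M P) = 0 and p' x (M' P) = 0 for P by linear least squares.
    M, M2: 3x4 projection matrices; p, p2: homogeneous image points."""
    rows = []
    for proj, q in ((M, p), (M2, p2)):
        # Two independent equations from q x (proj @ P) = 0:
        rows.append(q[0] * proj[2] - q[2] * proj[0])
        rows.append(q[1] * proj[2] - q[2] * proj[1])
    A = np.array(rows)                   # 4 x 4 homogeneous system A P = 0
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                           # singular vector of the smallest singular value
    return P / P[3]                      # de-homogenize

# Hypothetical setup: an identity camera and a camera translated along x.
M  = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
P_true = np.array([0.2, 0.1, 4.0, 1.0])
p, p2 = M @ P_true, M2 @ P_true          # noise-free projections
print(linear_triangulation(M, p / p[2], M2, p2 / p2[2]))   # ~ [0.2, 0.1, 4.0, 1.0]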



Method 3

Estimate the scene point as the point Q that minimizes the summed squared distance d²(p, q) + d²(p’, q’), where q and q’ are the projections of Q into the two images.
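A hedged sketch of Method 3 (hypothetical cameras, observations, and initial guess) that minimizes the reprojection error with SciPy:

import numpy as np
from scipy.optimize import least_squares

def project(M, Q):
    """Project a 3-D point Q with a 3x4 projection matrix M."""
    v = M @ np.append(Q, 1.0)
    return v[:2] / v[2]

def refine_point(M, obs, M2, obs2, Q0):
    """Find Q minimizing |project(M, Q) - obs|^2 + |project(M2, Q) - obs2|^2,
    starting from an initial guess Q0 (e.g., from the linear method)."""
    def residuals(Q):
        return np.concatenate([project(M, Q) - obs, project(M2, Q) - obs2])
    return least_squares(residuals, Q0).x

# Hypothetical cameras and slightly noisy 2-D observations of (0.2, 0.1, 4.0):
M  = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
obs  = np.array([0.051, 0.026])
obs2 = np.array([-0.074, 0.024])
print(refine_point(M, obs, M2, obs2, Q0=np.array([0.0, 0.0, 3.0])))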



Image Rectification
Image rectification is a transformation process used to project images onto a common image plane.

The calculations associated with stereo algorithms are often considerably simplified when the images of interest have been rectified, that is, replaced by two equivalent pictures with a common image plane parallel to the baseline joining the two optical centers.

The rectification process can be implemented by projecting the original pictures onto
the new image plane.

With an appropriate choice of coordinate system, the rectified epipolar lines are scan
lines of the new images, and they are also parallel to the baseline.



There are two degrees of freedom involved in the choice of the rectified
image plane:

(a) the distance between this plane and the baseline, which is essentially
irrelevant because modifying it only changes the scale of the rectified
pictures—an effect easily balanced by an inverse scaling of the image
coordinate axes; and

(b) the direction of the rectified plane normal in the plane perpendicular to
the baseline.



Image rectification warps both images such that they appear as if they have
been taken with only a horizontal displacement and as a consequence all
epipolar lines are horizontal, which slightly simplifies the stereo matching
process.
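In practice, rectification is often performed with OpenCV; a rough sketch under assumed calibration data (all values below are placeholders, not from the notes):

import numpy as np
import cv2

# Hypothetical calibration results: intrinsics, distortion, relative pose.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = K1.copy()
dist1 = np.zeros(5); dist2 = np.zeros(5)
R = np.eye(3)                                  # relative rotation
T = np.array([[-0.1], [0.0], [0.0]])           # baseline along x
left = np.zeros((480, 640), np.uint8)          # stand-ins for the real images
right = np.zeros((480, 640), np.uint8)
h, w = left.shape

# Rectifying rotations and new projection matrices for the common image plane.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, dist1, K2, dist2, (w, h), R, T)

# Per-pixel lookup maps, then warp both images onto the rectified plane.
map1x, map1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, (w, h), cv2.CV_32FC1)
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
# After rectification, corresponding points lie on the same scan line (row).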
Local Methods for Binocular Fusion
Simple methods for stereo fusion exploit local information, such as the similarity of brightness patterns near candidate matches, to establish correspondences.

Correspondence using window matching

Matching points across images is important for:

object identification (instance recognition)
object (class) recognition
motion estimation



1. Correlation

Correlation methods find pixel-wise image correspondences by comparing intensity profiles in the neighbourhood of potential matches, and they are among the first techniques proposed to solve the binocular fusion problem.

Consider a rectified stereo pair and a point (x, y) in the first image.

Associate with it the window of size p = (2m + 1) × (2n + 1) centered at (x, y).

The vector w(x, y) ∈ R^p is obtained by scanning the window values one row at a time.

Now, given a potential match (x + d, y) in the second image, construct a second vector w’(x + d, y) and define the corresponding normalized correlation function as

C(d) = ((w − w̄) · (w’ − w̄’)) / (‖w − w̄‖ ‖w’ − w̄’‖),

where w̄ denotes the vector whose coordinates are all equal to the mean of the coordinates of w (and likewise for w̄’).

C is the cosine of the angle between the vectors w − w̄ and w’ − w̄’ (cosine similarity).

The normalized correlation function C clearly ranges from −1 to +1.

2. Multi-Scale Edge Matching

Slanted surfaces pose problems for correlation-based matchers.

Correspondences should be found at a variety of scales, with matches between physically significant image features, such as edges, preferred to matches between raw pixel intensities.



A Laplacian filter is one of the edge detectors used to compute the second spatial derivatives of an image. It measures the rate at which the first derivatives change.



Zero crossings are the points where the Laplacian changes sign.
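A short sketch (my own helper, synthetic image) of finding zero crossings of the Laplacian of a Gaussian-smoothed image with SciPy:

import numpy as np
from scipy import ndimage

def zero_crossings(image, sigma=2.0):
    """Mark pixels where the Laplacian of the Gaussian-smoothed image changes
    sign between horizontally or vertically adjacent pixels."""
    log = ndimage.gaussian_laplace(image.astype(float), sigma)
    crossings = np.zeros(log.shape, dtype=bool)
    crossings[:, :-1] |= (log[:, :-1] * log[:, 1:]) < 0   # sign change to the right
    crossings[:-1, :] |= (log[:-1, :] * log[1:, :]) < 0   # sign change downwards
    return crossings

# Example: a bright square on a dark background gives zero crossings along its edges.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
print(zero_crossings(img).sum())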



GLOBAL METHODS FOR BINOCULAR FUSION

Global approaches to stereo fusion formulate the problem as the minimization of a function incorporating ordering or smoothness constraints among adjacent pixels.
Ordering Constraints and Dynamic Programming

The order of matching image features along a pair of epipolar lines is the
inverse of the order of the corresponding surface attributes along the curve
where the epipolar plane intersects the observed object’s boundary.



Ordering Constraints

The order of feature points along the two (oriented) epipolar lines is the same.



Ordering Constraints

A small object lies in front of a larger one.

Some of the surface points are not visible in one of the images.
(A is not visible in the right image)

The order of the image points is not the same in the two pictures: b is on the right of d in the left image, but b is on the left of d in the right image.
Dynamic programming is used to establish stereo correspondences.

Dynamic programming is a combinatorial optimization algorithm aimed at minimizing an error function (a path cost) over some discrete variables (correspondences between pairs of features).

Assume that a number of feature points (say, edges) have been found on corresponding epipolar lines.

The objective here is to match the intervals separating those points along the two intensity profiles.

According to the ordering constraint, the order of the feature points must be the same, although the occasional interval in either image may be reduced to a single point, corresponding to missing correspondences associated with occlusion and/or noise.
This setting allows us to recast the matching problem as the optimization of a path’s cost over a graph whose nodes correspond to pairs of left and right image features, and whose arcs represent matches between left and right intensity profile intervals bounded by the features of the corresponding nodes.

The cost of an arc measures the discrepancy between the corresponding intervals (e.g., the squared difference of the mean intensity values).

This optimization problem can be solved, exactly and efficiently, using dynamic programming.
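A simplified sketch of dynamic programming along one rectified scan line (per-pixel squared differences instead of the interval costs described above; the occlusion cost and data are my own placeholders):

import numpy as np

def dp_scanline(left_row, right_row, occlusion_cost=0.1):
    """Order-preserving matching of two scan lines by dynamic programming.
    D[i, j] is the minimal cost of matching the first i left pixels with the
    first j right pixels; skips model occluded (unmatched) pixels."""
    n, m = len(left_row), len(right_row)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occlusion_cost * np.arange(m + 1)
    D[:, 0] = occlusion_cost * np.arange(n + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            D[i, j] = min(D[i - 1, j - 1] + match,       # match left i with right j
                          D[i - 1, j] + occlusion_cost,  # left pixel unmatched
                          D[i, j - 1] + occlusion_cost)  # right pixel unmatched
    return D                      # backtracking through D recovers the correspondences

# Tiny example: the right scan line is the left one shifted by 2 pixels.
left_row = np.linspace(0.0, 1.0, 40)
right_row = np.roll(left_row, 2)
D = dp_scanline(left_row, right_row)
print(D[-1, -1])                  # cost of the optimal order-preserving matching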



Two intensity profiles along matching epipolar lines. The polygons joining the two profiles indicate matches between successive intervals (some of the matched intervals may have zero length). An arc (thick line segment) joins two nodes (i, i’) and (j, j’) when the intervals (i, j) and (i’, j’) of the intensity profiles match each other.

End
