
Module II

Local Image Features and Stereo Vision



Image Gradients

Computing the Image Gradient

Gradient Based Edge and Corner Detection.

Stereopsis

Binocular Camera Geometry

Epipolar Constraint

Binocular Reconstruction

Local Methods for Binocular Fusion

Global Methods for Binocular Fusion.


Image Gradients
The image gradient is the directional change in image intensity.

It is a measure of the change in the image function f(x, y) along the x and y directions.

By computing the gradient over a small area of the image and repeating the process for the entire image, edges can be detected in the image.

Edge detection is pervasive in applications such as fingerprint matching, medical diagnosis, and number plate detection.

These applications highlight the areas where the image intensity changes drastically and ignore everything else.

Sharp changes in brightness cause large image gradients.



COMPUTING THE IMAGE GRADIENT

The image derivatives can be approximated from differences of neighbouring pixel values, for example ∂f/∂x ≈ f(x + 1, y) − f(x, y). These kinds of derivative estimates are known as finite differences.


Gradient computation involves computing the gradient magnitude and direction.

To compute the magnitude, first compute the gradients in the x (Gx) and y (Gy) directions.

To compute the gradient Gx, move from left to right and highlight the points where the image intensity changes drastically.

This gives the points along the vertical edges of the rectangle.

Similarly, to compute the gradient Gy, move from top to bottom and highlight the points where the image intensity changes drastically.

This gives the points along the horizontal edges of the rectangle.

Gradient Direction: θ = arctan(Gy / Gx)

Gradient Magnitude: |∇f| = √(Gx² + Gy²)
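As a rough sketch (not from the notes), the gradient components, magnitude, and direction can be estimated with NumPy finite differences; the image below is a synthetic placeholder.

import numpy as np

def image_gradient(image):
    """Estimate Gx, Gy, gradient magnitude and direction using finite differences."""
    f = image.astype(float)
    gy, gx = np.gradient(f)              # np.gradient returns (d/d rows, d/d columns)
    magnitude = np.sqrt(gx**2 + gy**2)   # |grad f| = sqrt(Gx^2 + Gy^2)
    direction = np.arctan2(gy, gx)       # theta = atan2(Gy, Gx), in radians
    return gx, gy, magnitude, direction

# Example: a bright rectangle gives strong responses in Gx along its vertical
# edges and in Gy along its horizontal edges.
img = np.zeros((64, 64))
img[16:48, 16:48] = 255.0
gx, gy, mag, theta = image_gradient(img)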



Image noise tends to result in pixels not looking like their neighbours, so simple finite differences tend to give strong responses to noise.

As a result, just taking one finite difference for x and one for y gives noisy gradient estimates.

The way to deal with this problem is to smooth the image and then differentiate it.



Smoothing works because any image gradient has effects over a pool of pixels.

For example, the contour of an object can result in a long chain of points where the image derivative is large.

Smoothing the image before differentiation will tend to suppress noise at the scale of individual pixels, because it will tend to make pixels look like their neighbours.

However, gradients that are supported by evidence over multiple pixels will tend not to be smoothed out.

This suggests differentiating a smoothed image.



Figure: derivative estimates for a signal without noise, with noise, and when the signal is smoothed first.


Derivative of Gaussian Filters
Smoothing an image and then differentiating it is the same as convolving it with the derivative of a smoothing kernel.

Differentiation is linear and shift invariant, so there is some kernel K that differentiates; that is, given a function I(x, y),

∂I/∂x = K_(∂/∂x) ∗ I.

We require the derivative of a smoothed function. Write the convolution kernel for the smoothing as S.

Convolution is associative, so

K_(∂/∂x) ∗ (S ∗ I) = (K_(∂/∂x) ∗ S) ∗ I = (∂S/∂x) ∗ I.



The most commonly used form is the one in which the smoothing function S is a Gaussian.

We then need only convolve with the derivative of the Gaussian, rather than convolve and then differentiate.
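As an illustrative sketch (not from the notes), the equivalence can be checked numerically with SciPy, which can filter directly with a derivative-of-Gaussian kernel; the random image below is only a stand-in.

import numpy as np
from scipy import ndimage

img = np.random.rand(128, 128)   # stand-in for a real grayscale image
sigma = 2.0

# Route 1: smooth with a Gaussian, then take finite-difference derivatives.
smoothed = ndimage.gaussian_filter(img, sigma)
gy1, gx1 = np.gradient(smoothed)
mag1 = np.hypot(gx1, gy1)

# Route 2: convolve directly with the x- and y-derivatives of the Gaussian
# (order=1 along the chosen axis).
gx2 = ndimage.gaussian_filter(img, sigma, order=(0, 1))
gy2 = ndimage.gaussian_filter(img, sigma, order=(1, 0))
mag2 = np.hypot(gx2, gy2)

# The two gradient-magnitude estimates agree closely; the small residual comes
# from the finite-difference approximation used in route 1.
print(np.max(np.abs(mag1 - mag2)))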



Gradient-Based Edge Detectors
Sharp changes in image intensity can be thought of as lying on curves in the image, which are known as edges.

The curves are made up of edge points.

Many effects can cause edges, but an effect that can cause an edge is not guaranteed to produce one.

For example, an object may happen to be the same intensity as the background, and so the occluding contour will not result in an edge.

This means that interpreting edge points can be very difficult.

The gradient magnitude can be estimated by smoothing an image and then
differentiating it.

This is equivalent to convolving with the derivative of a smoothing kernel.

The extent of the smoothing affects the gradient magnitude.

At the center, the gradient magnitude is estimated using the derivatives of a Gaussian with σ = 1 pixel; on the right, using the derivatives of a Gaussian with σ = 2 pixels.

Large values of the gradient magnitude form thick trails.


Start by computing an estimate of the gradient magnitude.

The gradient magnitude is large along a thick trail in the image, but occluding contours
are curves, so must obtain a curve of the most distinctive points on this trail.



The gradient magnitude can be thought of as a chain of low hills.

Marking local maxima would mark isolated points.

A better criterion is to slice the gradient magnitude along the gradient direction, which should be perpendicular to the edge, and mark the points along the slice where the magnitude is maximal.

This gives a chain of points along the crown of the hills.

Each point in the chain can be used to predict the location of the next point, which will be in a direction roughly at right angles to the gradient at the edge point.

Forming these chains is called non-maximum suppression.
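A minimal sketch of the idea (not the notes' algorithm): instead of interpolating along the true gradient direction, this simplified version quantizes the direction to the nearest of four neighbour pairs, which is a common approximation.

import numpy as np

def nonmax_suppress(mag, theta):
    """Simplified non-maximum suppression: keep a pixel only if its gradient
    magnitude is at least as large as both neighbours along the (quantized)
    gradient direction.  mag and theta are 2-D arrays (magnitude, radians)."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.rad2deg(theta) % 180.0          # quantize to 0, 45, 90, 135 degrees
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:          # roughly horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                      # one diagonal
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                     # roughly vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                               # the other diagonal
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]           # this pixel survives: candidate edge point
    return out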



Gradient magnitude is at a maximum along the direction of the gradient.
The dots are the pixel grid.

From pixel q, attempting to determine whether the gradient is at a maximum; the gradient direction through q does not pass through any convenient pixels in the forward or backward direction.

Must interpolate to obtain the values of the gradient magnitude at p and r.

If the value at q is larger than both, q is an edge point.



On the right, a sketch of how to find candidates for the next edge point, given that q is an edge point.

An appropriate search direction is perpendicular to the gradient, so points s and r should be considered for the next edge point.

There is no need to restrict attention to pixel points on the image grid, because the predicted position lies between s and r.

Hence, again interpolate to obtain gradient values for points off the grid.



Stereopsis
Fusing the pictures recorded by our two eyes and exploiting the difference (or disparity) between them allows us to gain a strong sense of depth.

The design and implementation of algorithms that mimic our ability to perform this task is known as stereopsis.

Applications
1. Image segmentation for object recognition.
2. Construction of three-dimensional scene models.
3. Robot navigation

Stereo vision involves two processes:

1. The fusion of features observed by two (or more) eyes.

2. The reconstruction of their three-dimensional preimage.


Binocular Camera Geometry
Given a stereo image pair, any pixel in the first (or left) image may match any
pixel in the second (or right) one.

Matching pairs of pixels are in fact restricted to lie on corresponding epipolar lines in the two pictures.

This constraint plays a fundamental role in the stereo fusion process.



The binocular fusion problem:

Left, there is no ambiguity, and stereo reconstruction is simple.

Right, any of the four points in the left picture may match any of the four points in the right one. Only four of these correspondences are correct.

The other ones yield the incorrect reconstructions shown as small gray discs.



Epipolar Geometry

Consider the images p and p’ of a point P observed by two cameras with optical
centers O and O’.

These five points all belong to the epipolar plane defined by the two intersecting rays
OP and O’P.

In particular, the point p’ lies on the line l’ where this plane and the retina Π’ of the second camera intersect.
The line l’ is the epipolar line associated with the point p, and it
passes through the point e’ where the baseline joining the optical
centers O and O’ intersects Π’.

Likewise, the point p lies on the epipolar line l associated with the point p’, and this line passes through the intersection e of the baseline with the plane Π.

The points e and e’ are called the epipoles of the two cameras.



The epipole e’ is the projection of the optical center O of the first camera in
the image observed by the second camera, and vice versa.

As noted before, if p and p’ are images of the same point, then p’ must lie on
the epipolar line associated with p.

This epipolar constraint plays a fundamental role in stereo vision and motion
analysis.



The most difficult part of constructing an artificial stereo vision system is to find effective methods for establishing correspondences between the two images, that is, deciding which points in the second picture match the points in the first one.

The epipolar constraint greatly limits the search for these correspondences.



Assume that the rig is calibrated; then the coordinates of the point p completely determine the ray joining O and p, and thus the associated epipolar plane OO’p and epipolar line l’.

The search for matches can be restricted to this line instead of the whole image.

Epipolar constraint: Given a calibrated stereo rig, the set of possible matches
for the point p is constrained to lie on the associated epipolar line l’.

The Essential Matrix

The intrinsic parameters of each camera are known, and we work in normalized image coordinates.

According to the epipolar constraint, the three vectors Op, OO’, and O’p’ must be coplanar.

Equivalently, one of them must lie in the plane spanned by the other two, or

Op · [OO’ × O’p’] = 0.

We can rewrite this coordinate-independent equation in the coordinate frame associated with the first camera as

p · [t × (Rp’)] = 0,   (7.1)

where p and p’ denote the homogeneous normalized image coordinate vectors of p and p’, t is the coordinate vector of the translation OO’ separating the two coordinate systems, and R is the rotation matrix such that a free vector with coordinates w’ in the second coordinate system has coordinates Rw’ in the first one.

The two projection matrices are given in the coordinate system attached to the first camera by (Id 0) and (Rᵀ −Rᵀt).

Equation (7.1) can be rewritten as

pᵀ E p’ = 0,   where E = [t×] R

and [t×] is the skew-symmetric matrix such that [t×]x = t × x for any vector x. The matrix E is called the essential matrix.
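A small numerical sketch (my own synthetic setup, not from the notes) that builds E = [t×]R for a hypothetical rig and checks the constraint for one scene point:

import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical rig: R maps frame-2 coordinates into frame-1 coordinates,
# t is the coordinate vector of OO' expressed in frame 1.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

E = skew(t) @ R                  # essential matrix E = [t]_x R

# One scene point, projected into normalized image coordinates in both cameras.
P = np.array([0.3, -0.2, 4.0])   # coordinates in the first camera frame
p = P / P[2]                     # p = (x, y, 1) in the first camera
P2 = R.T @ (P - t)               # the same point in the second camera frame
p2 = P2 / P2[2]                  # p' in the second camera

print(p @ E @ p2)                # ~0: the epipolar constraint p^T E p' = 0 holds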


The Fundamental Matrix

The essential matrix is defined in normalized image coordinates. If we work instead in native (pixel) image coordinates q = Kp and q’ = K’p’, where K and K’ are the calibration matrices of the two cameras, the epipolar constraint becomes qᵀ F q’ = 0, where F = K⁻ᵀ E K’⁻¹ is called the fundamental matrix.

The fundamental matrix also tells how pixels (points) in each image are related to epipolar lines in the other image: Fq’ is the epipolar line associated with q’ in the first image, and Fᵀq is the epipolar line associated with q in the second image.
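A tiny hedged sketch (hypothetical placeholder values, not from the notes) of how F maps a pixel to its epipolar line in the other image:

import numpy as np

# Hypothetical fundamental matrix and a pixel in the first image.
F = np.array([[0.0,     -1.2e-6,  4.0e-4],
              [1.5e-6,   0.0,    -3.1e-3],
              [-6.0e-4,  3.3e-3,  1.0]])
q = np.array([412.0, 188.0, 1.0])     # homogeneous pixel coordinates in image 1

# l' = F^T q is the epipolar line in the second image: any candidate match
# (u', v') must satisfy a*u' + b*v' + c = 0.
a, b, c = F.T @ q
print(f"epipolar line in image 2: {a:.3e} u' + {b:.3e} v' + {c:.3e} = 0")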

BINOCULAR RECONSTRUCTION
Given a calibrated stereo rig and two matching image points p and p’, it is in principle straightforward to reconstruct the corresponding scene point by intersecting the two rays R = Op and R’ = O’p’.

However, the rays R and R’ never actually intersect in practice, due to calibration and feature localization errors.

Method 1

Consider the line segment perpendicular to R and R’ that intersects both rays; its midpoint P is the closest point to the two rays and can be taken as the preimage of p and p’.
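A minimal sketch of Method 1 (my own notation and made-up numbers): the two rays are written as O + a·d and O’ + b·d’, the closest pair of points is found by solving a 2 × 2 linear system, and their midpoint is returned.

import numpy as np

def midpoint_triangulation(O, d, O2, d2):
    """Closest point to the rays O + a*d and O2 + b*d2, taken as the midpoint
    of the common perpendicular segment joining them."""
    r = O2 - O
    A = np.array([[d @ d,  -(d @ d2)],
                  [d @ d2, -(d2 @ d2)]])
    rhs = np.array([d @ r, d2 @ r])
    a, b = np.linalg.solve(A, rhs)       # ray parameters of the two endpoints
    P1 = O + a * d                       # closest point on the first ray
    P2 = O2 + b * d2                     # closest point on the second ray
    return 0.5 * (P1 + P2)               # midpoint: the reconstructed scene point

# Hypothetical optical centers and viewing directions (rays through p and p'):
O  = np.zeros(3);                 d  = np.array([0.10, -0.05, 1.0])
O2 = np.array([0.5, 0.0, 0.0]);   d2 = np.array([-0.02, -0.05, 1.0])
print(midpoint_triangulation(O, d, O2, d2))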



Method 2

A scene point can also be reconstructed using a purely algebraic approach.

Given the projection matrices M and M’ and the matching points p and p’, we can rewrite the constraints Zp = MP and Z’p’ = M’P as

p × MP = 0 and p’ × M’P = 0.

This is a system of four independent linear equations in the homogeneous coordinates of P that is solved using linear least-squares techniques.
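A hedged sketch of Method 2 (hypothetical cameras and point, and my own helper names): two rows of each cross-product constraint are stacked and the homogeneous solution is taken from the SVD.

import numpy as np

def linear_triangulation(M, p, M2, p2):
    """Solve p x (M P) = 0 and p' x (M' P) = 0 for P by linear least squares.
    M, M2: 3x4 projection matrices; p, p2: homogeneous image points."""
    rows = []
    for proj, q in ((M, p), (M2, p2)):
        # Two independent equations from q x (proj @ P) = 0:
        rows.append(q[0] * proj[2] - q[2] * proj[0])
        rows.append(q[1] * proj[2] - q[2] * proj[1])
    A = np.array(rows)                   # 4 x 4 homogeneous system A P = 0
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                           # singular vector of the smallest singular value
    return P / P[3]                      # de-homogenize

# Hypothetical setup: an identity camera and a camera translated along x.
M  = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
P_true = np.array([0.2, 0.1, 4.0, 1.0])
p, p2 = M @ P_true, M2 @ P_true          # noise-free projections
print(linear_triangulation(M, p / p[2], M2, p2 / p2[2]))   # ~ [0.2, 0.1, 4.0, 1.0]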



Method 3

Estimate the scene point as the point Q that minimizes the summed squared distance d²(p, q) + d²(p’, q’), where q and q’ are the projections of Q into the two images.
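A hedged sketch of Method 3 (hypothetical cameras, observations, and initial guess) that minimizes the reprojection error with SciPy:

import numpy as np
from scipy.optimize import least_squares

def project(M, Q):
    """Project a 3-D point Q with a 3x4 projection matrix M."""
    v = M @ np.append(Q, 1.0)
    return v[:2] / v[2]

def refine_point(M, obs, M2, obs2, Q0):
    """Find Q minimizing |project(M, Q) - obs|^2 + |project(M2, Q) - obs2|^2,
    starting from an initial guess Q0 (e.g., from the linear method)."""
    def residuals(Q):
        return np.concatenate([project(M, Q) - obs, project(M2, Q) - obs2])
    return least_squares(residuals, Q0).x

# Hypothetical cameras and slightly noisy 2-D observations of (0.2, 0.1, 4.0):
M  = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
obs  = np.array([0.051, 0.026])
obs2 = np.array([-0.074, 0.024])
print(refine_point(M, obs, M2, obs2, Q0=np.array([0.0, 0.0, 3.0])))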



Image Rectification
Image rectification is a transformation process used to project images onto a common image plane.

The calculations associated with stereo algorithms are often considerably simplified when the images of interest have been rectified, that is, replaced by two equivalent pictures with a common image plane parallel to the baseline joining the two optical centers.

The rectification process can be implemented by projecting the original pictures onto
the new image plane.

With an appropriate choice of coordinate system, the rectified epipolar lines are scan
lines of the new images, and they are also parallel to the baseline.



There are two degrees of freedom involved in the choice of the rectified
image plane:

(a) the distance between this plane and the baseline, which is essentially
irrelevant because modifying it only changes the scale of the rectified
pictures—an effect easily balanced by an inverse scaling of the image
coordinate axes; and

(b) the direction of the rectified plane normal in the plane perpendicular to
the baseline.



Image rectification warps both images such that they appear as if they have
been taken with only a horizontal displacement and as a consequence all
epipolar lines are horizontal, which slightly simplifies the stereo matching
process.
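In practice, rectification is often performed with OpenCV; a rough sketch under assumed calibration data (all values below are placeholders, not from the notes):

import numpy as np
import cv2

# Hypothetical calibration results: intrinsics, distortion, relative pose.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = K1.copy()
dist1 = np.zeros(5); dist2 = np.zeros(5)
R = np.eye(3)                                  # relative rotation
T = np.array([[-0.1], [0.0], [0.0]])           # baseline along x
left = np.zeros((480, 640), np.uint8)          # stand-ins for the real images
right = np.zeros((480, 640), np.uint8)
h, w = left.shape

# Rectifying rotations and new projection matrices for the common image plane.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, dist1, K2, dist2, (w, h), R, T)

# Per-pixel lookup maps, then warp both images onto the rectified plane.
map1x, map1y = cv2.initUndistortRectifyMap(K1, dist1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, dist2, R2, P2, (w, h), cv2.CV_32FC1)
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
# After rectification, corresponding points lie on the same scan line (row).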
Local Methods for Binocular Fusion
Simple methods for stereo fusion exploit local information, such as the similarity of brightness patterns near candidate matches, to establish correspondences.

Correspondence using window matching

Matching points across images is important for:

object identification (instance recognition)
object (class) recognition
motion estimation



1. Correlation

Correlation methods find pixel-wise image correspondences by comparing intensity profiles in the neighbourhood of potential matches, and they are among the first techniques proposed to solve the binocular fusion problem.

Consider a rectified stereo pair and a point (x, y) in the first image.

Associate with it the window of size p = (2m + 1) × (2n + 1) centered at (x, y).

The vector w(x, y) ∈ R^p is obtained by scanning the window values one row at a time.

Now, given a potential match (x + d, y) in the second image, construct a second vector w’(x + d, y) and define the corresponding normalized correlation function as

C(d) = ((w − w̄) · (w’ − w̄’)) / (‖w − w̄‖ ‖w’ − w̄’‖),

where w̄ denotes the vector whose coordinates are all equal to the mean of the coordinates of w (and likewise for w̄’).

C is the cosine of the angle between the vectors w − w̄ and w’ − w̄’ (cosine similarity).

The normalized correlation function C clearly ranges from −1 to +1.

2. Multi-Scale Edge Matching

Slanted surfaces pose problems for correlation-based matchers.

Correspondences should be found at a variety of scales, with matches between physically significant image features, such as edges, preferred to matches between raw pixel intensities.



A Laplacian filter is one of the edge detectors used to compute the second spatial derivatives of an image. It measures the rate at which the first derivatives change.



Zero crossings are the points where the Laplacian changes sign.
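A short sketch (my own helper, synthetic image) of finding zero crossings of the Laplacian of a Gaussian-smoothed image with SciPy:

import numpy as np
from scipy import ndimage

def zero_crossings(image, sigma=2.0):
    """Mark pixels where the Laplacian of the Gaussian-smoothed image changes
    sign between horizontally or vertically adjacent pixels."""
    log = ndimage.gaussian_laplace(image.astype(float), sigma)
    crossings = np.zeros(log.shape, dtype=bool)
    crossings[:, :-1] |= (log[:, :-1] * log[:, 1:]) < 0   # sign change to the right
    crossings[:-1, :] |= (log[:-1, :] * log[1:, :]) < 0   # sign change downwards
    return crossings

# Example: a bright square on a dark background gives zero crossings along its edges.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
print(zero_crossings(img).sum())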



GLOBAL METHODS FOR BINOCULAR FUSION

Global approaches to stereo fusion formulate the problem as the minimization of a function incorporating ordering or smoothness constraints among adjacent pixels.
Ordering Constraints and Dynamic Programming

The order of matching image features along a pair of epipolar lines is the
inverse of the order of the corresponding surface attributes along the curve
where the epipolar plane intersects the observed object’s boundary.



Ordering Constraints

The order of feature points along the two (oriented) epipolar lines is the same.



Ordering Constraints

A small object lies in front of a larger one.

Some of the surface points are not visible in one of the images.
(A is not visible in the right image)

The order of the image points is not the same in the two pictures: b is on the right of d in the left image, but b is on the left of d in the right image.
Dynamic programming is used to establish stereo correspondences.

Dynamic programming is a combinatorial optimization algorithm aimed at minimizing an error function (a path cost) over some discrete variables (correspondences between pairs of features).

Assume that a number of feature points (say, edges) have been found on corresponding epipolar lines.

The objective here is to match the intervals separating those points along the two intensity profiles.

According to the ordering constraint, the order of the feature points must be the same, although the occasional interval in either image may be reduced to a single point, corresponding to missing correspondences associated with occlusion and/or noise.
This setting allows us to recast the matching problem as the optimization of a path’s cost over a graph whose nodes correspond to pairs of left and right image features, and whose arcs represent matches between left and right intensity profile intervals bounded by the features of the corresponding nodes.

The cost of an arc measures the discrepancy between the corresponding intervals (e.g., the squared difference of the mean intensity values).

This optimization problem can be solved, exactly and efficiently, using dynamic programming.
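A simplified sketch of dynamic programming along one rectified scan line (per-pixel squared differences instead of the interval costs described above; the occlusion cost and data are my own placeholders):

import numpy as np

def dp_scanline(left_row, right_row, occlusion_cost=0.1):
    """Order-preserving matching of two scan lines by dynamic programming.
    D[i, j] is the minimal cost of matching the first i left pixels with the
    first j right pixels; skips model occluded (unmatched) pixels."""
    n, m = len(left_row), len(right_row)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occlusion_cost * np.arange(m + 1)
    D[:, 0] = occlusion_cost * np.arange(n + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = (float(left_row[i - 1]) - float(right_row[j - 1])) ** 2
            D[i, j] = min(D[i - 1, j - 1] + match,       # match left i with right j
                          D[i - 1, j] + occlusion_cost,  # left pixel unmatched
                          D[i, j - 1] + occlusion_cost)  # right pixel unmatched
    return D                      # backtracking through D recovers the correspondences

# Tiny example: the right scan line is the left one shifted by 2 pixels.
left_row = np.linspace(0.0, 1.0, 40)
right_row = np.roll(left_row, 2)
D = dp_scanline(left_row, right_row)
print(D[-1, -1])                  # cost of the optimal order-preserving matching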



Two intensity profiles along matching epipolar lines. The polygons joining the two profiles indicate matches between successive intervals (some of the matched intervals may have zero length). An arc (thick line segment) joins two nodes (i, i’) and (j, j’) when the intervals (i, j) and (i’, j’) of the intensity profiles match each other.

End
