
AUTONOMOUS NAVIGATION USING STEREO VISUAL ODOMETRY

1. INTRODUCTION

Localization is an essential capability for autonomous vehicles, and Visual Odometry (VO) has therefore been a well-investigated area in robot vision. VO augments localization where conventional sensors such as wheel odometers, and inertial sensors such as gyroscopes and accelerometers, fail to give correct information. It estimates vehicle motion from a sequence of images taken by an onboard camera, producing a full 6-DOF (degrees of freedom) motion estimate: the translation along, and the rotation around, each coordinate axis.
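A 6-DOF estimate is commonly packed into a 4x4 homogeneous transform combining a 3x3 rotation and a translation vector. A minimal sketch (the Euler-angle order and axis conventions below are our illustrative choice, not something prescribed by the text):

```python
import numpy as np

def pose_matrix(rx, ry, rz, tx, ty, tz):
    # build a 4x4 rigid transform from three Euler angles (applied in
    # Z-Y-X order) and a translation vector
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # rotation part (3 DOF)
    T[:3, 3] = [tx, ty, tz]    # translation part (3 DOF)
    return T
```

Chaining such transforms is how frame-to-frame motions are later combined into a trajectory.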

FIG 1: 6-DOF (degrees of freedom)


The term ‘Visual Odometry’ was chosen for its resemblance to wheel odometry, in which the trajectory of a vehicle is estimated by counting the turns of its wheels over time. Similarly, in VO the position of the vehicle is estimated from the changes that motion induces on the images obtained from stereo cameras attached to the vehicle. For VO to work efficiently, there must be sufficient illumination in the environment, and the cameras should capture consecutive frames with sufficient scene overlap. VO is especially important in environments such as the air and underwater.
DEPT of Electronics Engineering, MPTC Mattakkara

2. IMPORTANCE AND ACCURACY OF VISUAL ODOMETRY

IMPORTANCE: Visual odometry allows for enhanced navigational accuracy in robots or vehicles using any type of locomotion on any surface. VO is unaffected by wheel slippage on uneven terrain or in other unfavorable conditions. Furthermore, VO works effectively in GPS-denied environments. Unlike laser and sonar localization systems, VO does not emit any detectable energy into the environment, and unlike GPS, it does not depend on an external signal. Compared with other sensors, using cameras for robot localization reduces cost and allows simple integration of ego-motion data into other vision-based algorithms, such as obstacle, pedestrian, and lane detection, without the need for calibration between sensors. Cameras are small, cheap, lightweight, low-powered, and versatile, so they can be employed on any vehicle (land, underwater, air) and for other robotic tasks (e.g., object detection and recognition).

ACCURACY: Visual odometry is an inexpensive alternative odometry technique that is more accurate than conventional techniques such as GPS, wheel odometry, and sonar localization, with a relative position error ranging from 0.1% to 2%. The local drift rate of VO is smaller than that of wheel encoders and low-precision INS. VO can be integrated with GPS and INS for maximum accuracy.


3. TYPES OF VISUAL ODOMETRY

Monocular Visual Odometry: A single camera is used to capture motion. Usually a five-point relative pose estimation method is used, and the computed motion is known only up to scale. Monocular VO is typically used in hybrid methods where other sensor data are also available.
Stereo Visual Odometry: A calibrated stereo camera pair is used, which allows the depth of features to be computed at each time step. The computed output is actual motion (to scale). If only faraway features are tracked, stereo VO degenerates to the monocular case.

FIG 2: Stereo vision camera

4. STEREO VISION

Stereo vision uses two or more cameras, with the condition that the cameras must be coplanar and parallel to each other. The position of an object in 3D space can be found using stereo vision. An example of stereo vision is the human visual system: each person has two eyes that see two slightly different views of the observer's environment. An object seen by the right eye is in a slightly different position in the observer's field of view than the same object seen by the left eye.

FIG 3: Example of stereo vision

5. ALGORITHM DESCRIPTION


FIG 4: Stereo Visual Odometry Pipeline


We have used the KITTI visual odometry dataset for experimentation. All computation is done on grayscale images.
KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for mobile robotics and autonomous driving.
In digital images, grayscale means that the value of each pixel represents only the intensity of the light. Such images range from the darkest black to the brightest white; in other words, the image contains only black, white, and multiple levels of gray.
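The conversion from a color image to grayscale is typically a weighted sum of the red, green, and blue channels; the BT.601 luma weights below are the ones commonly used (e.g., by OpenCV's RGB-to-gray conversion):

```python
import numpy as np

def to_grayscale(rgb):
    # weighted sum over the last (channel) axis; works for a single
    # [R, G, B] pixel or a full HxWx3 image array
    return rgb @ np.array([0.299, 0.587, 0.114])
```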

FIG 5: Example of a grayscale image

6. INPUT IMAGE SEQUENCE


An image sequence is a series of sequential still images that represent frames of an animation. Commonly, the images are saved in one folder and labeled with an incrementing file name to preserve chronological order, that is, the order in which the frames occurred from first to last. They have the same pixel resolution, size, and file format.
A stereo image pair is captured at time T and at time T+1. The images are then processed to compensate for lens distortion. Lens distortion is a deviation from the ideal projection assumed by the pinhole camera model; it is a form of optical aberration in which straight lines in the scene do not remain straight in the image. Examples of lens distortion are barrel distortion and pincushion distortion.

FIG 6: Example of lens distortion
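The radial component of such distortion is commonly modeled with even powers of the distance from the image center. A minimal sketch of that model (the coefficient values in the check are illustrative; a negative k1 gives barrel distortion, a positive one pincushion):

```python
import numpy as np

def radial_distort(pts, k1, k2=0.0):
    # pts: Nx2 normalized image coordinates, origin at the principal point;
    # each point is scaled by 1 + k1*r^2 + k2*r^4
    r2 = (pts ** 2).sum(axis=1, keepdims=True)
    return pts * (1 + k1 * r2 + k2 * r2 ** 2)
```

Undistortion inverts this mapping (usually iteratively) so that straight scene lines become straight in the image again.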

To simplify disparity map computation, stereo rectification is performed so that the epipolar lines become horizontal.
A disparity map encodes the apparent pixel difference, or motion, between a pair of stereo images. To experience this, close one of your eyes and then rapidly open it while closing the other: objects that are close to you will appear to jump a significant distance, while objects farther away will move very little.
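For a rectified pair, disparity relates directly to depth through the focal length f and the baseline B of the stereo rig: Z = f·B/d. A minimal sketch (the numbers in the check are illustrative, not KITTI's actual calibration):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    # Z = f * B / d for a rectified stereo pair; disparity in pixels,
    # baseline in meters, so the returned depth is in meters
    return focal_px * baseline_m / disparity_px
```

Close objects have large disparity and hence small depth, matching the eye-switching experiment described above.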


FIG 7: Example of disparity


Image stereo-rectification is the process by which two images of the same scene undergo homographic transforms so that their corresponding epipolar lines coincide and become parallel to the x-axis of the image. A pair of stereo-rectified images is helpful for dense stereo matching, which is the task of finding a corresponding point for each pixel from two or more images obtained at slightly different viewpoints.


FIG 8: Rectification
In the KITTI dataset, the input images are already corrected for lens distortion and stereo rectified.

7. FEATURE DETECTION
Features are generated on the left camera image at time T using the FAST (Features from Accelerated Segment Test) corner detector. FAST is a corner detection method that can be used to extract feature points, which are later used to track and map objects in many computer vision tasks.
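The segment test at the heart of FAST examines a circle of 16 pixels around each candidate: the candidate is a corner if enough contiguous circle pixels are all brighter, or all darker, than the center by a threshold. A simplified FAST-9-style sketch (the real detector adds a high-speed pretest and non-maximum suppression, omitted here):

```python
import numpy as np

# 16 pixel offsets on a Bresenham circle of radius 3 (standard FAST layout)
CIRCLE = [(3, 0), (3, 1), (2, 2), (1, 3), (0, 3), (-1, 3), (-2, 2), (-3, 1),
          (-3, 0), (-3, -1), (-2, -2), (-1, -3), (0, -3), (1, -3), (2, -2), (3, -1)]

def is_fast_corner(img, x, y, t=50, n=9):
    p = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    for sign in (+1, -1):  # check for a run of brighter, then darker, pixels
        flags = [sign * (v - p) > t for v in ring]
        run = best = 0
        for f in flags + flags:  # doubled list handles wrap-around runs
            run = run + 1 if f else 0
            best = max(best, run)
        if min(best, 16) >= n:
            return True
    return False
```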


FIG 9: FAST corner detection

To accurately compute the motion between image frames, feature bucketing is used. The image is divided into several non-overlapping rectangles, and a maximum number of feature points with the highest response values is selected from each bucket.

There are two benefits of bucketing:

i) Input features are well distributed throughout the image, which results in higher accuracy in motion estimation.

ii) With fewer features, the computational complexity of the algorithm is reduced, which matters in low-latency applications.
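A minimal sketch of such bucketing, assuming each feature comes with a response score (the grid size and per-bucket cap below are illustrative choices):

```python
import numpy as np

def bucket_features(pts, scores, img_w, img_h, gx=4, gy=3, per_bucket=2):
    # keep at most `per_bucket` highest-scoring features per grid cell;
    # pts is Nx2 (x, y), scores is length N
    kept = []
    cell = (np.floor_divide(pts[:, 0] * gx, img_w)
            + gx * np.floor_divide(pts[:, 1] * gy, img_h))
    for c in np.unique(cell):
        idx = np.where(cell == c)[0]
        best = idx[np.argsort(scores[idx])[::-1][:per_bucket]]
        kept.extend(best.tolist())
    return sorted(kept)
```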

A disparity map for time T is also generated using the left and right image pair.

8. FEATURE TRACKING

The features generated in the previous step are then searched for in the image at time T+1. The original paper [1] matches features by computing feature descriptors and comparing them between the images at the two time instances. More recent literature uses the KLT (Kanade-Lucas-Tomasi) tracker for feature matching.

KLT is an implementation, in the C programming language, of a feature tracker for the computer vision community; its source code is in the public domain, available for both commercial and non-commercial use. In computer vision, the Kanade-Lucas-Tomasi feature tracker is an approach to feature extraction, proposed mainly to address the problem that traditional image registration techniques are generally costly.


Features from the image at time T are tracked into the image at time T+1 using a 15x15 search window and a 3-level image pyramid. The KLT tracker outputs the corresponding coordinates for each input feature, together with accuracy and error measures for each track. Feature points tracked with high error or low accuracy are dropped from further computation.
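At its core, each KLT iteration solves a small 2x2 linear system built from image gradients inside the window. A single-level, whole-image sketch of that step on a synthetic image (the tracker described above additionally uses a 15x15 window and a 3-level pyramid):

```python
import numpy as np

def lk_step(img0, img1):
    # one Lucas-Kanade step: spatial gradients of the first image plus the
    # temporal difference give the normal equations  A d = b
    gy, gx = np.gradient(img0)
    it = img1 - img0
    A = np.array([[(gx * gx).sum(), (gx * gy).sum()],
                  [(gx * gy).sum(), (gy * gy).sum()]])
    b = -np.array([(gx * it).sum(), (gy * it).sum()])
    return np.linalg.solve(A, b)  # estimated (dx, dy) displacement
```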

FIG 10: Features at time T

FIG 11: KLT tracked features at time T+1


9. 3D POINT CLOUD GENERATION

Now that we have the 2D points at times T and T+1, the corresponding 3D points with respect to the left camera are generated using the disparity information and the camera projection matrices. In computer vision, a camera matrix or (camera) projection matrix describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image. For each feature point, a system of equations for the corresponding 3D (world) coordinates is formed from the left and right image pair and solved using singular value decomposition to obtain the 3D point.

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It is a widely used technique for decomposing a matrix into component matrices, exposing many useful and interesting properties of the original matrix.
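The triangulation step can be sketched as a linear (DLT) solve: each camera observation contributes two equations, and the 3D point is the right singular vector belonging to the smallest singular value. The camera parameters in the check below are illustrative, not KITTI's calibration:

```python
import numpy as np

def triangulate(P_left, P_right, uv_left, uv_right):
    # build the 4x4 DLT system from the two projection matrices and the
    # two observed pixel positions, then solve it with SVD
    rows = []
    for P, (u, v) in ((P_left, uv_left), (P_right, uv_right)):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize to a 3D point
```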


10. INLIER DETECTION


This paper uses an inlier detection algorithm that exploits the rigidity of scene points to find a subset of 3D points that is consistent across both time steps. The key idea is the observation that, although the absolute positions of two feature points differ between time steps, the relative distance between them remains the same. If any such distance is not preserved, then either there is an error in the 3D triangulation of at least one of the two features, or at least one of them has been tracked incorrectly.
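One way to realize this rigidity test, following the spirit of [1], is to build a pairwise consistency matrix and greedily grow a clique of mutually consistent points; the greedy strategy and the threshold below are our illustrative choices, not necessarily the paper's exact procedure:

```python
import numpy as np

def inlier_clique(pts_a, pts_b, thresh=0.1):
    # pairwise distances within each time step; rigid points keep them equal
    da = np.linalg.norm(pts_a[:, None] - pts_a[None], axis=2)
    db = np.linalg.norm(pts_b[:, None] - pts_b[None], axis=2)
    W = np.abs(da - db) < thresh       # consistency matrix
    np.fill_diagonal(W, False)
    clique = [int(W.sum(1).argmax())]  # seed: most consistent point
    while True:
        # candidates must be consistent with every current clique member
        cand = [i for i in range(len(pts_a))
                if i not in clique and all(W[i, j] for j in clique)]
        if not cand:
            break
        clique.append(max(cand, key=lambda i: W[i].sum()))
    return sorted(clique)
```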


11. MOTION ESTIMATION


Frame-to-frame camera motion is estimated by minimizing the image re-projection error over all matching feature points. Re-projection here means that for a pair of corresponding matching points Ja and Jb at times T and T+1, there exist corresponding world coordinates Wa and Wb. The world coordinates are re-projected back into the image using a transform (delta) to estimate the 2D points at the complementary time step, and the distance between the true and projected 2D points is minimized using Levenberg-Marquardt least-squares optimization. In mathematics and computing, the Levenberg-Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve nonlinear least-squares problems.
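The quantity being minimized can be sketched as follows: given a candidate motion delta, re-project Wb into the frame at time T and Wa into the frame at time T+1, and stack the 2D differences into one residual vector. The names follow the text; the projection matrix and point values in the check are illustrative:

```python
import numpy as np

def project(P, T, W):
    # transform Nx4 homogeneous points W by pose T, project with 3x4 matrix P
    x = (P @ (T @ W.T)).T
    return x[:, :2] / x[:, 2:3]

def reprojection_residuals(delta, P, Wa, Wb, Ja, Jb):
    ra = project(P, delta, Wb) - Ja                  # Wb --> frame at time T
    rb = project(P, np.linalg.inv(delta), Wa) - Jb   # Wa --> frame at time T+1
    return np.concatenate([ra.ravel(), rb.ravel()])
```

Levenberg-Marquardt then adjusts delta to drive this residual vector toward zero; for the true motion and noise-free points the residuals vanish.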


FIG 12: Image Reprojection : Wb --> Ja and Wa --> Jb

12. TRAJECTORY COMPUTATION

We have implemented the above algorithm using Python 3 and OpenCV 3.0, and the source code is maintained in an online repository. The KITTI visual odometry dataset is used for evaluation.
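Once the per-frame motion delta is available, the global trajectory is obtained by chaining the transforms; the vehicle position at each step is the translation part of the accumulated pose. A minimal sketch:

```python
import numpy as np

def accumulate_trajectory(deltas):
    # chain per-frame 4x4 motions into global poses; the position at each
    # step is the translation column of the accumulated pose
    pose = np.eye(4)
    positions = [pose[:3, 3].copy()]
    for d in deltas:
        pose = pose @ d
        positions.append(pose[:3, 3].copy())
    return np.array(positions)
```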


FIG 13: Output trajectory


13. ADVANTAGES

Visual odometry allows for enhanced navigational accuracy in robots or vehicles using any type of locomotion on any surface. It is more accurate than conventional techniques and is also inexpensive: using a consumer-grade camera instead of expensive sensors or systems, such as GPS, INS, and laser-based localization systems, is a straightforward and cheap way to estimate location. VO is unaffected by wheel slippage on uneven terrain or in other unfavorable conditions, and it works effectively in GPS-denied environments.


14. APPLICATIONS

Visual odometry has a wide range of applications and has been effectively applied in several fields, including robotics, automotive, and wearable computing. VO is applied in many types of mobile robotic systems, such as ground, underwater, aerial, and space robots. In space exploration, for example, VO is used to estimate the ego-motion of the NASA Mars rovers; ego-motion is defined as the 3D motion of a camera within an environment. NASA uses VO to track the motion of the rovers as a supplement to dead reckoning.

VO is mainly used for navigation, both to reach targets efficiently and to avoid obstacles while driving. It is also applied in unmanned aerial vehicles to perform autonomous take-off, landing, and point-to-point navigation. Moreover, VO plays a significant role in autonomous underwater vehicles and coral-reef inspection systems. Given that the GPS signal degrades or becomes unavailable underwater, underwater vehicles cannot rely on GPS for pose estimation; VO is therefore considered a cost-effective solution for underwater localization. In the automotive industry, VO also plays a big role: it is applied in numerous driver assistance systems, such as vision-based assisted braking, and it is used in agricultural field robots to estimate the robot's position relative to the crops.


15. CONCLUSION

VO is the localization of a robot using only a stream of images acquired from a camera attached to the robot. VO is a highly accurate solution for estimating the ego-motion of robots, and it avoids most of the drawbacks of other sensors: it is inexpensive and unaffected by wheel slippage on uneven terrain. The main challenges in VO systems relate to computational cost and to lighting and imaging conditions (i.e., directional sunlight, shadows, image blur, and image scale/rotation variance). Most VO systems proposed in the existing literature fail, or cannot work effectively, in outdoor environments with shadows and directional sunlight. These conditions disturb the estimation of pixel displacement between image frames and lead to errors in vehicle position estimation.


16. REFERENCES

[1] A. Howard, "Real-time stereo visual odometry for autonomous ground vehicles," in IEEE Int. Conf. on Intelligent Robots and Systems, Sep. 2008.

[2] http://www.cvlibs.net/datasets/kitti/eval_odometry.php

[3] C. B. Choy, J. Gwak, S. Savarese, and M. Chandraker, "Universal Correspondence Network," NIPS, 2016.
