Odometry using sensor fusion of stereo vision, IMU & GPS

by

Muhammad Abdul Wasae

A thesis submitted to the Faculty of Graduate and Postdoctoral


Affairs in partial fulfillment of the requirements for the degree of

Master of Applied Science

in

Mechanical Engineering

Carleton University
Ottawa, Ontario

© 2023
Muhammad Abdul Wasae
Abstract

In this thesis, the author develops a sensor fusion model that combines information from stereo vision, IMU & GPS to provide odometry for autonomous vehicles. The proposed model addresses three main problems in the automobile industry: (1) it gives high-precision, accurate odometry for real-time applications where the margin of error is low; (2) unlike most models, it does not use LiDAR, which is expensive and impractical; (3) the model is robust & reliable, meaning that if it loses information from one of the sensors it will still give a reasonably good pose estimate. The system was tested on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) Vision Benchmark Suite.

Acknowledgements

I would like to acknowledge my friends & family. I am grateful to the science communicators who motivated me to pursue math & engineering for life.

I would like to thank the faculty of Carleton University for motivating me and showing me what professional math & engineering is like.

I would like to acknowledge my supervisor, Jurek Z. Sasiadek, who guided me throughout this thesis.

Table of Contents

Abstract.............................................................................................................................. ii

Acknowledgements .......................................................................................................... iii

Table of Contents ............................................................................................................. iv

List of Illustrations ......................................................................................................... viii

(Page intentionally left blank) ............................................................................................ x

Chapter 1: Introduction .................................................................................................. 1


1.1 Thesis Motivation & Contribution .................................................................................. 1

1.2 Thesis Summary .............................................................................................................. 2

Chapter 2: Relevant Work .............................................................................................. 3

Chapter 3: Visual Odometry........................................................................................... 6


3.1 Camera Calibration ......................................................................................................... 7

3.2 Stereo Depth .................................................................................................................. 10

3.3 Visual odometry ............................................................................................................ 12

3.3.1 Epipolar geometry..................................................................................................... 13

3.3.2 Scale-Invariant Feature Transform (SIFT) ............................................................... 14

3.3.3 Projective 3-point algorithm ..................................................................................... 14

3.3.4 Random Sample Consensus ...................................................................................... 17

3.3.5 Combined algorithm for visual odometry ................................................................. 18

3.4 Calculating standard deviation from ground truth ........................................................ 19

3.4.1 Position accuracy ...................................................................................................... 19

3.4.2 Orientation accuracy ................................................................................................. 20

Chapter 4: Inertial Measurement Unit ........................................................................ 22


4.1 Bias Error ...................................................................................................................... 23

4.2 Euler angle .................................................................................................................... 24

4.3 Calculating standard deviation from ground truth ........................................................ 25

4.3.1 Position accuracy ...................................................................................................... 25

4.3.2 Orientation accuracy: ................................................................................................ 26

Chapter 5: Global Positioning System (GPS) .............................................................. 27


5.1 Standard deviation from manufacturer’s specification .............................................. 29

5.2 Standard deviation of error from ground truth .............................................................. 31

Chapter 6: Sensor Fusion .............................................................................................. 31


6.1 Bayes Rule .................................................................................................................... 33

6.1.1 Bayesian Inference .................................................................................................... 34

6.1.2 Bayes filter ................................................................................................................ 35

6.1.3 Bayes filter algorithm ............................................................................................... 36

6.2 Particle filter .................................................................................................................. 36

6.2.1 Particle filter algorithm ............................................................................................. 38

6.3 Building sensor fusion method...................................................................................... 39

6.3.1 First Model for position using Particle filter ............................................................ 39

6.3.2 Second Model for orientation using Particle filter.................................................... 41

Chapter 7: Experimental Results ................................................................................. 43


7.1 Set up............................................................................................................................. 43

7.2 Sensor’s Description ..................................................................................................... 44

7.3 Position Results ............................................................................................................. 44

7.4 Orientation Results ........................................................................................................ 46

Chapter 8: Conclusion ................................................................................................... 47


8.1 Position estimates .......................................................................................................... 48

8.2 Orientation estimates..................................................................................................... 48

References ........................................................................................................................ 49

List of Tables

Table 1 Parameters of camera intrinsic matrix .............................................................. 10

Table 2 Types of GPS services and their respective accuracy....................................... 30

Table 3 Circular error probable...................................................................................... 30

Table 4 Sensor types and their respective standard deviation of error for position

estimate ............................................................................................................................. 42

Table 5 Sensor types and their respective standard deviation of error for orientation

estimate ............................................................................................................................. 42

Table 6 Each sensor model and their respective RMS error for position estimate ........ 46

Table 7 Each sensor model and their respective RMS error for orientation estimate ... 47

List of Illustrations

Illustration 1 Visual Odometry ......................................................................................... 7

Illustration 2 Relation between three reference frames .................................................... 8

Illustration 3 Transformation of world coordinates to image coordinates using camera

intrinsic and extrinsic matrix .............................................................................................. 8

Illustration 4 Camera calibration to find unknown parameters of intrinsic matrix using

corner detection algorithm .................................................................................................. 9

Illustration 5 Color Image of our dataset ........................................................................ 10

Illustration 6 Greyscale image of our dataset ................................................................. 11

Illustration 7 Disparity map calculation using StereoBM algorithm .............................. 11

Illustration 8 Disparity map calculation using StereoSGBM algorithm ......................... 12

Illustration 9 Epipolar geometry ..................................................................................... 13

Illustration 10 SIFT matches ........................................................................................... 14

Illustration 11 Projective 3-point algorithm geometry.................................................... 15

Illustration 12 Four solutions to 4-degree polynomial equation ..................................... 16

Illustration 13 RANSAC model fitting ........................................................................... 18

Illustration 14 Position accuracy for stereo vision .......................................................... 20

Illustration 15 Position accuracy for stereo vision (True scale) ..................................... 20

Illustration 16 Orientation accuracy for stereo vision..................................................... 21

Illustration 17 Robot position & orientation from linear acceleration and angular

velocity .............................................................................................................................. 22

Illustration 18 Bias error ................................................................................................. 23

Illustration 19 Pitch, Roll and Yaw Axes ....................................................................... 24

Illustration 20 Position accuracy for IMU ...................................................................... 26

Illustration 21 Position accuracy for IMU (True scale) .................................................. 26

Illustration 22 Orientation accuracy for IMU ................................................................. 27

Illustration 23 GPS coordinates & ENU coordinates relation ........................................ 29

Illustration 24 Position accuracy for GPS ....................................................................... 31

Illustration 25 Fusion algorithms .................................................................................... 32

Illustration 26 Bayesian inference .................................................................................. 34

Illustration 27 Bayes filter algorithm [] .......................................................................... 36

Illustration 28 Particle filter steps ................................................................................... 37

Illustration 29 Particle filter algorithm [] ........................................................................ 39

Illustration 30 Information flow for position estimate .................................................... 40

Illustration 31 Information flow for orientation estimate ............................................... 41

Illustration 32 Set up of sensors in KITTI dataset .......................................................... 44

Illustration 33 Vision & IMU fused position readings ................................................... 45

Illustration 34 Vision, IMU and GPS fused position readings ........................................ 45

Illustration 35 Vision, IMU and GPS fused position readings (Zoomed in) .................. 46

Illustration 36 Vision and IMU fused orientation readings ............................................ 47

Chapter 1: Introduction

Odometry is a method used in robotics and autonomous navigation to estimate the

position and orientation of a mobile robot by analyzing data from its own sensors. The

term "odometry" is derived from the Greek words "odos" (meaning path) and "metron"

(meaning measure), and it essentially involves measuring the robot's path or motion.

There are many ways to perform this task. We can use wheel encoders, an inertial measurement unit (IMU), LiDAR, or anything else that gives us information about the robot's location and orientation. Each sensor has its own advantages and disadvantages. Sensor fusion is the method used to combine sensors to obtain a more accurate and reliable estimate than each individual sensor can provide alone.

1.1 Thesis Motivation & Contribution

Precision and Accuracy:

Precision and accuracy of sensing are essential in advanced robotics. For example, surgical robots now used in hospitals must balance force and displacement, which requires high precision and accuracy. Autonomous cars need precise and accurate odometry when they are moving fast and the margin of error is very low. In the future, machine learning and robotics will bring robots into our homes to do tasks for us, which will also require highly precise and accurate odometry. Our method yields odometry that is both more precise and more accurate than each individual sensor alone.

No LiDAR:

Autonomous automobile and robotics companies use LiDAR sensors to perceive the environment better, but the additional cost of a LiDAR sensor is a significant issue. LiDAR's weight and size are also problems in both cars and humanoid robots. Unlike most approaches, we have not used LiDAR in our sensor fusion method, making it a more practical method for industry adoption.

Reliability:

Our method is reliable because it does not rely on a single sensor for odometry. For example, if a car or robot is in a place where it does not receive GPS signals, such as an underground tunnel, it can still predict accurate and precise odometry from the IMU and vision. Likewise, if bad weather causes poor visibility, the model can still navigate using information from the IMU and GPS.

1.2 Thesis Summary

We have used a sensor fusion method called the particle filter, which is derived from the Bayes filter, to combine the information we get from vision, the IMU, and GPS into high-precision, robust, and reliable odometry that does not drift with time.

This requires that we first calculate the pose from each sensor individually and find the standard deviation of its error by comparing it to the ground truth. Since the errors we get may not be Gaussian and the models may not be linear, we cannot use the Kalman filter. Instead, we use the particle filter, a derivative of the Bayes filter that is computationally efficient enough to be a very good candidate for real-time robotics applications such as autonomous cars or robots.

We used the KITTI dataset to evaluate our model and conclude the thesis with our findings.

Chapter 2: Relevant Work

In 1948, Claude Shannon [1] introduced a mathematical way to quantify information and

defined it as a measure of uncertainty or surprise. He developed a unit called "bit" (short

for binary digit) to represent the smallest amount of information. This allowed for the

precise measurement of information content.

Claude Shannon also showed, in his master's thesis at MIT, that logic gates can perform computation. In 1936, Alan Turing introduced the concept of the Turing machine, a theoretical model of a simple computational device that can simulate the logic of any computer algorithm. Both mathematicians started a revolution in information processing.

This gave birth to new fields such as computer vision and, soon after, visual odometry, which uses computer vision to find the pose (position and orientation) of a robot. The term "Visual Odometry" was coined by D. Nister, O. Naroditsky, and J. Bergen [2].

In the 1980s, Moravec [3] studied the problem of ego-motion estimation of a vehicle by observing a sequence of images. Moravec also developed a corner detection method and later published "Sensor Fusion in Certainty Grids for Mobile Robots" in 1988.

In 1987, Matthies and Shafer [4] revisited Moravec's work and extended it by deriving the motion error model using 3D Gaussian distributions instead of a scalar model.

In 1988, Chris Harris and Mike Stephens [5] developed a corner detection operator that improved upon Moravec's corner detection method.

Much of the early research on visual odometry was done at NASA for the Mars Exploration Rovers by Y. Cheng, M. W. Maimone, and L. H. Matthies [6].

In 2004, David Lowe [7] introduced a method to find image features from scale-invariant keypoints, which proved to be a powerful tool for visual odometry that is both robust and accurate. SIFT (Scale-Invariant Feature Transform) is often used in combination with RANSAC (Random Sample Consensus), developed by M. A. Fischler and R. C. Bolles [8].

Visual odometry can be combined with an IMU and GPS using multi-sensor fusion. Different architectures for multi-sensor fusion have been proposed in the literature, for example by Luo and Kay [9]–[11]. One type of architecture is built upon Bayesian inference.

The most popular of the Bayesian inference methods is the Kalman filter, developed by R. E. Kalman [12] in 1960. The Kalman filter was used in the guidance and navigation of the Apollo spacecraft. Its limitation is that it only works for linear models with Gaussian noise. In 1993, Gordon, Salmond, and Smith [13] developed the particle filter, which can handle nonlinear models and non-Gaussian noise.

Chapter 3: Visual Odometry

Odometry is the task of finding the pose (position & orientation) of a robot. Visual odometry is the method where we use vision to determine the pose of the robot. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers [14].

Illustration 1 Visual Odometry

The method of getting the pose of the robot using vision can be broken into three main

parts:

1. Camera Calibration

2. Stereo Depth

3. Visual Odometry

3.1 Camera Calibration

We have three reference frames in our system. The first is the image coordinate frame, which gives the image coordinates of the pixel corresponding to a 3-dimensional point. The second is the world coordinate frame, which gives the coordinates of a 3-dimensional point from the world's perspective, and the third is the camera coordinate frame, which gives the coordinates of a 3-dimensional point with respect to the camera's center.

Illustration 2 Relation between three reference frames

Illustration 3 Transformation of world coordinates to image coordinates using camera intrinsic and

extrinsic matrix

Where:

(X, Y, Z) are the world coordinates of the point.

(u, v) are the image coordinates of the point.

(cx, cy) are the principal point coordinates (the optical center).

fx and fy are the focal lengths of the camera.

s is the scaling factor.
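
For reference, the transformation shown in Illustration 3 is the standard pinhole projection; using the symbols defined above it can be written as

$$ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}}_{\text{intrinsic matrix } K} \underbrace{\begin{pmatrix} R & t \end{pmatrix}}_{\text{extrinsic matrix}} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} $$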

We have four unknowns (fx, fy, cx, cy) that we need to calculate so that we can transform between image coordinates and world coordinates.

To find these 4 unknowns, we take an image of a chessboard whose world coordinates with respect to one edge can easily be calculated, since a chessboard has a grid-like structure. We then know the world coordinates of some points and their respective image coordinates.

Illustration 4 Camera calibration to find unknown parameters of intrinsic matrix using corner

detection algorithm

Knowing the world coordinates and the corresponding image coordinates, we can calculate the projection matrix (the intrinsic matrix multiplied by the extrinsic matrix).

Since the extrinsic matrix is just a translation of known length in the horizontal direction with no change in orientation, we also know the extrinsic matrix from the baseline of the stereo images. We can therefore recover the intrinsic matrix from the extrinsic matrix and the projection matrix.
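
As a minimal sketch of this calibration procedure using OpenCV (the chessboard size, square size, and image folder below are illustrative assumptions, not values from the thesis):

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)      # inner corners of the chessboard (assumed)
square_size = 0.025        # square edge length in metres (assumed)

# World coordinates of the corners w.r.t. one corner of the board (Z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for fname in glob.glob("calibration_images/*.png"):   # hypothetical folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# calibrateCamera returns the intrinsic matrix K, which contains fx, fy, cx, cy.
_, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print(K)
```

For the KITTI sequences used later, this step is not strictly required, since the calibration matrices are provided with the dataset.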

The values of the 4 unknown parameters of the camera intrinsic matrix are shown below:

fx 718.856

fy 718.856

cx 607

cy 185

Table 1 Parameters of camera intrinsic matrix

3.2 Stereo Depth

To find world coordinates from image coordinates, we first have to create a depth map that assigns a depth value to each pixel in an image.

To create a depth map, we need a disparity map between the stereo images. We start with the left and right images and compute the disparity between them. Suppose we have an image like this.

Illustration 5 Color Image of our dataset

This is an RGB (color) image, which contains three times more information than a grayscale image. However, color is not useful in our stereo vision algorithm; it only complicates things, so we will use grayscale images for all our calculations.

Illustration 6 Greyscale image of our dataset

Disparity is the difference in pixel position (horizontal in our case) between the projections of the same world point in the two stereo images. A disparity map can be calculated using matching algorithms and then used to obtain depth.

Illustration 7 Disparity map calculation using StereoBM algorithm

We notice that the StereoBM algorithm leaves holes in the disparity map and is not continuous. Another algorithm that calculates a better disparity map, without holes, is the StereoSGBM algorithm.

Illustration 8 Disparity map calculation using StereoSGBM algorithm

If we know the disparity, baseline (given) and focal length (given) then we can get the

depth map from disparity map using the formula:

Disparity = (Baseline × Focal Length) / Depth

Where:

Baseline is the distance between the two stereo cameras (camera separation).

Focal Length is the focal length of the camera lens.
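
A minimal sketch of this disparity-to-depth computation with OpenCV's StereoSGBM (file names and matcher parameters are illustrative; the focal length matches Table 1 and the baseline is a typical KITTI-like value):

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # rectified left image (illustrative path)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)   # rectified right image

# Semi-global block matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=11)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0   # SGBM returns disparity * 16

fx = 718.856          # focal length in pixels (Table 1)
baseline = 0.54       # stereo baseline in metres (assumed, KITTI-like)

disparity[disparity <= 0.0] = 0.1         # avoid division by zero on invalid pixels
depth = fx * baseline / disparity         # Depth = (Baseline x Focal Length) / Disparity
```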

3.3 Visual odometry

This is the part where we use the camera intrinsic matrix and the depth map to get the pose of the robot using epipolar geometry, the SIFT algorithm, the RANSAC algorithm, and the P3P algorithm.

3.3.1 Epipolar geometry

Illustration 9 Epipolar geometry

Here the points O1 and O2 are the camera centers and Xw is a world point. Our goal is to find the relative position and orientation of the camera at time t and at the next time step t+1. The camera center at time t is O1 and the camera center at time t+1 is O2. The relative orientation and translation between the two camera centers can be described by epipolar geometry; here we are trying to compute the rotation R and translation t shown in the illustration. The points O1, O2, and Xw form a plane called the epipolar plane.

3.3.2 Scale-Invariant Feature Transform (SIFT)

SIFT is a computer vision algorithm used for keypoint detection, description, and

matching in images. SIFT was developed by David Lowe in 1999 and has been widely

adopted in various computer vision applications due to its robustness and invariance

properties.

We will use SIFT to get matching keypoints between two consecutive images and then the RANSAC algorithm to find the best fit.

Illustration 10 SIFT matches
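
A minimal sketch of this matching step with OpenCV (img_t and img_t1 are assumed to be the grayscale frames at times t and t+1):

```python
import cv2

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(img_t, None)      # keypoints and descriptors at time t
kp_t1, des_t1 = sift.detectAndCompute(img_t1, None)   # keypoints and descriptors at time t+1

# Brute-force matching with Lowe's ratio test to keep only distinctive matches.
bf = cv2.BFMatcher()
matches = bf.knnMatch(des_t, des_t1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```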

3.3.3 Projective 3-point algorithm

The projective 3-point (P3P) algorithm is a method for finding the camera orientation and translation in world coordinates when we know the world coordinates of 3 points and their corresponding image coordinates. Although 3 is the minimum number of points required, it gives 4 possible solutions; with 4 points we get only one possible orientation and translation of the camera. So we will be using 4 points.

Illustration 11 Projective 3-point algorithm geometry

The P3P method works only if the camera is internally calibrated, that is, the intrinsic matrix is known. Let A, B, C be the 3 world points and O the projection center of the camera; together they form a tetrahedron, that is, 4 triangles connecting these points.

This is a 2-step approach. The first step is to calculate the lengths x, y, z of the projection rays, and the second step is the computation of translation and orientation.

In the first step we exploit the geometry of the triangles, using the law of cosines to relate the different lengths of each triangle to one another. Combining these relations results in a polynomial of degree 4 whose coefficients can be computed once the 3 world points are known.
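
As a sketch of these cosine relations, with x, y, z denoting the ray lengths OA, OB, OC and α, β, γ the angles between the ray pairs (OB, OC), (OA, OC), and (OA, OB), which are known from the calibrated image coordinates:

$$ y^2 + z^2 - 2yz\cos\alpha = |BC|^2, \qquad x^2 + z^2 - 2xz\cos\beta = |AC|^2, \qquad x^2 + y^2 - 2xy\cos\gamma = |AB|^2 $$

Eliminating two of the unknowns from these three equations yields the degree-4 polynomial mentioned above.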

Since we have a polynomial of degree 4, solving it yields 4 possible solutions. If we have one extra point, we can verify which of these 4 solutions is the correct one. In other words, we get 4 solutions using 3 world points but only one solution using 4 world points.

Illustration 12 Four solutions to 4-degree polynomial equation

In the second step we obtain the translation and rotation of the camera at time t+1 relative to time t. The details of this step are given in the section where all the algorithms are combined. The P3P method is simple and, with 4 points, uses the minimum number of points needed to give us the translation and rotation.

3.3.4 Random Sample Consensus

RANSAC takes a random subset (not all) of the points and, using the P3P algorithm, computes the relative orientation and translation between the camera at time t and at time t+1. After obtaining the translation and rotation from this subset, it calculates the error between where the remaining points (those not included in the calculation) actually are and where the computed translation and rotation predict them to be.

After a number of such samples, it keeps the sample that gives the minimum error. With SIFT and RANSAC together we have a robust way of getting keypoint matches between the images at time t and time t+1.

Illustration 13 RANSAC model fitting
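
The idea can be sketched generically as follows; estimate_pose and reproject are hypothetical placeholders for a P3P solver and the camera projection function, and matches_3d / matches_2d are NumPy arrays of corresponding 3D and 2D points:

```python
import numpy as np

def ransac_pose(matches_3d, matches_2d, estimate_pose, reproject, n_iters=200, thresh=2.0):
    """Keep the pose, fitted on a random minimal sample, that explains the most points."""
    best_pose, best_inliers = None, 0
    for _ in range(n_iters):
        idx = np.random.choice(len(matches_3d), 4, replace=False)   # minimal sample of 4 points
        pose = estimate_pose(matches_3d[idx], matches_2d[idx])
        if pose is None:
            continue
        # Reprojection error of all points under the candidate pose.
        errors = np.linalg.norm(reproject(pose, matches_3d) - matches_2d, axis=1)
        inliers = int((errors < thresh).sum())
        if inliers > best_inliers:
            best_pose, best_inliers = pose, inliers
    return best_pose
```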

3.3.5 Combined algorithm for visual odometry

This section describes how SIFT, P3P, RANSAC, and epipolar geometry are combined to calculate the pose of the camera at time t+1 relative to time t.

1. We have already computed the depth map and the intrinsic matrix (K).

2. Using the SIFT algorithm, we find matching keypoints between the images taken by the camera at time t and at time t+1.

3. Using the depth map, we compute the world coordinates of each pixel in the camera image at time t.

4. We use the P3P algorithm to compute the translation and rotation of the camera at time t+1 relative to time t, using the image coordinates at time t+1 and the world coordinates obtained in step 3 from the depth map at time t. The formula for this step is as follows:

Image coordinates (t + 1) = Projection matrix × World coordinates matrix (t)

Where Projection matrix = Intrinsic matrix × Extrinsic matrix (t+1 relative to t)

Here the extrinsic matrix is the only unknown. We know the intrinsic matrix, the image coordinates at t+1, and their SIFT-matched image coordinates at time t, which correspond to the world points at time t.

Since we get hundreds of SIFT keypoint matches but need only 4 matching points to get the pose, we can use the remaining points to make the algorithm more accurate by rejecting outliers, using RANSAC to select the 4 points that give the best fit to our data. A minimal sketch of this step is shown below.
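
The following sketch implements steps 3 and 4 with OpenCV's combined PnP/RANSAC solver; K, depth, kp_t, kp_t1, and good are assumed to come from the earlier calibration, stereo depth, and SIFT steps:

```python
import cv2
import numpy as np

pts_3d, pts_2d = [], []
for m in good:                                   # SIFT matches between frames t and t+1
    u, v = kp_t[m.queryIdx].pt                   # pixel location at time t
    z = depth[int(v), int(u)]                    # depth from the stereo depth map at time t
    if z <= 0 or z > 100:                        # discard invalid or far-away depths
        continue
    # Back-project the pixel at time t into 3D camera-frame coordinates.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_3d.append([x, y, z])
    pts_2d.append(kp_t1[m.trainIdx].pt)          # matching pixel location at time t+1

# P3P inside RANSAC recovers the extrinsic matrix of time t+1 relative to time t.
_, rvec, tvec, inliers = cv2.solvePnPRansac(
    np.array(pts_3d, dtype=np.float32), np.array(pts_2d, dtype=np.float32), K, None,
    flags=cv2.SOLVEPNP_P3P, reprojectionError=2.0)
R, _ = cv2.Rodrigues(rvec)                       # rotation matrix; tvec is the translation
```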

3.4 Calculating standard deviation from ground truth

We can obtain the standard deviation of the error, to be used later in sensor fusion, by comparing the data we get from the sequence using this algorithm with the ground truth.
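
One plausible way to compute this, assuming est and gt are N×2 NumPy arrays of estimated and ground-truth positions for the sequence (array names are illustrative):

```python
import numpy as np

errors = np.linalg.norm(est - gt, axis=1)   # per-frame position error
sigma_position = np.std(errors)             # standard deviation used later in sensor fusion
```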

3.4.1 Position accuracy

Illustration 14 Position accuracy for stereo vision

Illustration 15 Position accuracy for stereo vision (True scale)

Standard deviation for position is 0.03867

3.4.2 Orientation accuracy

Illustration 16 Orientation accuracy for stereo vision

Standard deviation for orientation is 0.0427

Chapter 4: Inertial Measurement Unit

An IMU can be used for odometry as well, but it is usually less accurate than visual odometry. An IMU works by measuring linear acceleration and angular velocity. These measurements are then integrated to get both the position and orientation of the robot. Errors in the IMU are also integrated, so the estimate drifts with time. GPS, on the other hand, never drifts with time but is less accurate than the IMU over short intervals. The two can be fused to give a more robust and accurate translation and orientation; they are complementary to each other.

The angular velocity and linear acceleration are converted to orientation and position in the following way:

Illustration 17 Robot position & orientation from linear acceleration and angular velocity

This equation is used to find the change in position from acceleration. Because we have discrete data, we take the average of two consecutive acceleration readings and multiply it by delta t = 0.1 (for a 10 Hz rate) to get an almost instantaneous velocity, which is again multiplied by delta t = 0.1 to get the change in position. Similarly, we can get the orientation from the discrete angular velocity data.
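
A minimal sketch of this double integration, assuming acc is an (N, 3) array of bias-corrected accelerations in the navigation frame:

```python
import numpy as np

dt = 0.1                                   # 10 Hz sensor rate
vel = np.zeros(3)
pos = np.zeros(3)
positions = [pos.copy()]
for k in range(1, len(acc)):
    a_avg = 0.5 * (acc[k - 1] + acc[k])    # average of two consecutive accelerations
    vel = vel + a_avg * dt                 # integrate acceleration to velocity
    pos = pos + vel * dt                   # integrate velocity to change in position
    positions.append(pos.copy())
```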

4.1 Bias Error

Error bias in an Inertial Measurement Unit (IMU) refers to a systematic or constant offset

in the sensor's measurements. This bias can result in a consistent error in the IMU's

output, which can affect the accuracy of the IMU's measurements over time. It needs to

be calibrated or compensated for to ensure accurate orientation and position estimation.

Illustration 18 Bias error

We can correct for a linear bias in both the position (accelerometer) and orientation (gyroscope) channels by taking IMU readings at rest, with only gravitational acceleration acting on the sensor, and then taking the mean of the data. This gives the linear bias offset, which we can then correct for.
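
A minimal sketch of this bias correction, assuming static_acc and static_gyro hold readings recorded at rest and that the sensor's z axis is aligned with gravity (an illustrative assumption):

```python
import numpy as np

g = np.array([0.0, 0.0, 9.81])                 # gravity in the sensor frame at rest (assumed)
acc_bias = static_acc.mean(axis=0) - g         # constant accelerometer offset
gyro_bias = static_gyro.mean(axis=0)           # constant gyroscope offset

acc_corrected = acc - acc_bias                 # subtract the offsets from subsequent readings
gyro_corrected = gyro - gyro_bias
```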

4.2 Euler angle

The Euler angles are three angles introduced by Leonhard Euler to describe the orientation of a rigid body with respect to a fixed coordinate system. They carry the same information as a rotation matrix.

Illustration 19 Pitch, Roll and Yaw Axes

We can convert the rotation matrix into Euler angles using the following standard algorithm:
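
A sketch of the standard conversion for the ZYX (yaw-pitch-roll) convention:

```python
import numpy as np

def rotation_matrix_to_euler(R):
    """Convert a 3x3 rotation matrix to (roll, pitch, yaw) in the ZYX convention."""
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:                              # regular case
        roll = np.arctan2(R[2, 1], R[2, 2])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = np.arctan2(R[1, 0], R[0, 0])
    else:                                      # gimbal lock: pitch is close to +/- 90 degrees
        roll = np.arctan2(-R[1, 2], R[1, 1])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = 0.0
    return roll, pitch, yaw
```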

4.3 Calculating standard deviation from ground truth

We can compare the position and orientation that we get from the IMU with the ground truth to obtain the standard deviation of the IMU model's error. This will be useful in the sensor fusion step.

4.3.1 Position accuracy

Illustration 20 Position accuracy for IMU

Illustration 21 Position accuracy for IMU (True scale)

Standard deviation for position is 0.5018

4.3.2 Orientation accuracy:

Illustration 22 Orientation accuracy for IMU

Standard deviation for orientation is 0.012435

Chapter 5: Global Positioning System (GPS)

GPS stands for Global Positioning System, and it is a satellite-based navigation system

that allows users to determine their precise location and track movement almost

anywhere on Earth. The system works by using a network of satellites in orbit around the

Earth. These satellites continuously transmit signals that are received by GPS receivers,

such as those in smartphones or dedicated GPS devices.

GPS can be used to calculate the robot's position, but it is not very accurate. Unlike the IMU, its error never drifts with time and stays roughly constant: GPS has only short-term errors, not long-term ones. The two sensors can therefore be fused to get a more accurate position of the robot.

We get data from the GPS sensor in ENU coordinates, which can be converted into Cartesian coordinates on the surface of the Earth using the following formula. Given the ENU coordinates (E, N, U) and the azimuth angle θ of the local frame:

L (Local East) = E × cos(θ) − N × sin(θ)

M (Local North) = E × sin(θ) + N × cos(θ)

N (Local Up) = U
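
Written in matrix form, this is a planar rotation by the azimuth angle θ about the Up axis:

$$ \begin{pmatrix} L \\ M \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} E \\ N \end{pmatrix}, \qquad \text{Up} = U $$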

Illustration 23 GPS coordinates & ENU coordinates relation

5.1 Standard deviation from manufacturer’s specification

We can get the standard deviation of GPS either by looking at its specifications or by

comparing it to the ground truth. This information tells us how accurate our GPS system

is and how much it can be trusted. This will be required if we want to do sensor fusion

with other sensors.

The GPS accuracy given by the manufacturer is listed in the table below:

Service Accuracy

SPS 1.5 m

SBAS 0.6 m

DGPS 0.4 m

PPP 0.1 m

RTK 0.01 m

Table 2 Types of GPS services and their respective accuracy

The following table shows how to obtain the standard deviation from other units of error specification. This conversion table is also used to convert values expressed at one percentile level to another, which will be needed for sensor fusion.

Conversion RMS CEP DRMS R95 2DRMS R99.7

RMS 1 1.18 1.41 2.45 2.83 3.41

CEP 0.849 1 1.2 2.08 2.4 2.9

DRMS 0.707 0.833 1 1.73 2 2.41

R95 0.409 0.481 0.578 1 1.16 1.39

2DRMS 0.354 0.416 0.5 0.865 1 1.21

R99.7 0.293 0.345 0.415 0.718 0.830 1

Table 3 Circular error probable

For example, to convert the manufacturer's error, which is given as R95 (a 95 percent chance that the reading is within the specified radius), to RMS:

0.409 × R95 error (given by the manufacturer) = RMS error
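
As an illustrative example, if the SPS figure of 1.5 m in Table 2 is interpreted as an R95 value, the corresponding RMS error is about 0.409 × 1.5 m ≈ 0.61 m.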

5.2 Standard deviation of error from ground truth

Illustration 24 Position accuracy for GPS

Standard deviation for position is 0.0404

Chapter 6: Sensor Fusion

Sensor fusion, also known as sensor data fusion, is the process of integrating data from

multiple sensors to obtain a more accurate, complete, and reliable understanding of a

physical or environmental phenomenon. This technology is commonly used in various

fields, including robotics, autonomous systems, navigation, and artificial intelligence.

The primary goals of sensor fusion are to improve data accuracy, reduce uncertainty, enhance situational awareness, and make more informed decisions. Classical fusion algorithms can be classified into the following categories:

Illustration 25 Fusion algorithms

At the core of probabilistic robotics is the idea of estimating state from sensor data.

Probability theory and statistics are the building blocks of sensor fusion.

Each sensor conveys some information with a different degree of certainty. By integrating different sensors, we can combine their advantages while their disadvantages cancel each other out.

For example, an IMU gives a very accurate estimate of short-term motion, but it is bad at long-term prediction because its measurement errors are integrated, so we become less sure of the position as time passes. GPS gives us position with less short-term accuracy than the IMU, but it does not suffer from the problem of error growing with time; its error remains roughly constant.

6.1 Bayes Rule

If the probability of event A happening is P(A) and the probability of event B happening is P(B), then the probability that both happen, P(A AND B), is given by:

Independent events: P(A AND B) = P(A) × P(B)

Dependent events: P(A AND B) = P(A|B) × P(B)

Also, P(A AND B) = P(B|A) × P(A)

There is a difference between the two cases. When the events are no longer independent, we must ask for the probability that A happens given that B has occurred, multiplied by the probability that B occurs. P(A|B) is called the conditional probability: in probability theory, it is a measure of the probability of an event occurring given that another event has already occurred.

Also notice that we can write the joint probability in two ways, either P(A|B) × P(B) or P(B|A) × P(A), because both are equal. This identity is used to formulate Bayes' rule.

Bayes' Rule:

P(A|B) × P(B) = P(B|A) × P(A)

P(A|B) = [P(B|A) × P(A)] / P(B)

Here A and B can be events or discrete or continuous random variables.

6.1.1 Bayesian Inference

Illustration 26 Bayesian inference

P(A Given B) = [P(B Given A) ⋅ P(A)] / P(B)

Prior: P(A) is our prior belief.

Likelihood: P(B|A) is the probability of observing the evidence/data given our prior belief.

Posterior: P(A|B) is our new, updated belief, which is closer to reality.

Evidence: P(B) is the normalization term that scales the probability distribution so it sums to 1.
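
As an illustrative numerical example (with made-up values): if the prior is P(A) = 0.3, the likelihood is P(B|A) = 0.8, and the evidence is P(B) = 0.4, then the posterior is P(A|B) = 0.8 × 0.3 / 0.4 = 0.6.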

6.1.2 Bayes filter

A Bayes filter is a powerful and widely used framework for estimating the state of a

dynamic system in the presence of noise and uncertainty. This filter is crucial in various

fields, including robotics, autonomous vehicles, finance, and signal processing.

The Bayes filter operates through a series of steps:

1. Prediction (Prior Update): In this step, the filter predicts the new state of the system

based on the previous state and a mathematical model that describes how the system

evolves over time. The result is a probability distribution representing the expected state,

often referred to as the "prior belief."

2. Measurement (Likelihood Update): When a new measurement becomes available, the

filter calculates the likelihood of the measurement given the predicted state. Bayes'

theorem is employed to update the prior belief with this measurement information,

resulting in a new probability distribution, the "posterior belief."

3. Estimation (Posterior Estimate): From the posterior belief, an estimate of the system's

state is computed. Common estimates include the mean or mode of the posterior

distribution, providing the best estimate of the current state.

4. Resampling (Optional): In some filter variants like the particle filter, resampling is

performed to ensure a representative set of particles approximating the posterior

distribution. This helps prevent particle depletion and maintains a more accurate state

estimate.

5. Iteration: The process repeats as new measurements arrive, using the posterior belief

from the previous step as the prior belief for the next prediction. This recursive approach

continually refines the state estimate as more data is acquired.

Bayes filters are versatile, with variants like the Kalman filter for linear systems and the

Particle filter for non-linear and non-Gaussian systems. They play a fundamental role in

enabling systems to make informed decisions and navigate uncertainty, making them

indispensable tools for state estimation in dynamic environments.

6.1.3 Bayes filter algorithm

Illustration 27 Bayes filter algorithm []
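
A generic discrete-state sketch of one Bayes filter update (not the exact algorithm of the figure); motion_model(x, u, x_prev) and measurement_model(z, x) are assumed user-supplied probability functions, and belief maps each state to its probability:

```python
def bayes_filter_update(belief, control, measurement, motion_model, measurement_model, states):
    """One recursive prediction + measurement update of a discrete Bayes filter (generic sketch)."""
    # Prediction step: propagate the previous belief through the motion model.
    predicted = {x: sum(motion_model(x, control, xp) * belief[xp] for xp in states)
                 for x in states}
    # Measurement update: weight the prediction by the likelihood of the observation.
    posterior = {x: measurement_model(measurement, x) * predicted[x] for x in states}
    # Normalize so the belief sums to one.
    eta = sum(posterior.values())
    return {x: p / eta for x, p in posterior.items()}
```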

6.2 Particle filter

The particle filter is a derivative of the Bayes filter, and it is the sensor fusion method we will be using. A particle filter, also known as a sequential Monte Carlo filter, is a

recursive Bayesian filtering method used for estimating the state of a dynamic system.

Particle filters are particularly useful in situations where the state estimation process

involves nonlinear and non-Gaussian systems, where traditional filters like the Kalman

filter may not be applicable. Particle filters work by representing the probability

distribution of the state as a set of discrete, weighted samples called particles.

Illustration 28 Particle filter steps

Steps are as follows:

1. Initialization: At the beginning of the estimation process, a set of particles is generated

to represent the possible states of the system. These particles are drawn from the prior

state distribution (initial belief about the state), which may include information from

sensors or prior knowledge.

2. Prediction: In each time step, the particles are propagated forward in time based on the

system's dynamics. This accounts for how the system is expected to evolve. Each

particle's state is adjusted according to the system's motion model.

3. Measurement Update: After collecting sensor measurements, each particle is assigned

a weight that represents how well it agrees with the observed measurements. Particles

that are more consistent with the measurements receive higher weights, while those that

are less consistent receive lower weights.

4. Resampling: Particles are resampled with replacement from the existing set, and

particles with higher weights have a higher chance of being selected. This process

emphasizes the particles that are more likely to represent the true state.

5. State Estimation: The state estimate is calculated as a weighted average of the

resampled particles. This estimate represents the most probable state of the system given

the measurements and the motion model.

6.2.1 Particle filter algorithm

Illustration 29 Particle filter algorithm []
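
A generic sketch of one particle filter cycle (not the exact algorithm of the figure); motion_model and likelihood are assumed user-supplied functions and particles is a NumPy array with one row per particle:

```python
import numpy as np

def particle_filter_step(particles, control, measurement, motion_model, likelihood):
    """One prediction / update / resampling cycle of a particle filter (generic sketch)."""
    # Prediction: propagate every particle through the (noisy) motion model.
    particles = np.array([motion_model(p, control) for p in particles])
    # Update: weight each particle by how well it explains the measurement.
    weights = np.array([likelihood(measurement, p) for p in particles])
    weights = weights / weights.sum()
    # Resampling with replacement: high-weight particles are drawn more often.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # State estimate: mean of the resampled particles.
    return particles, particles.mean(axis=0)
```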

6.3 Building sensor fusion method

We will build two separate models: one for the position and one for the orientation of the robot. We have cleaned and processed the data and have already calculated the standard deviations of error for the visual odometry, IMU, and GPS measurements.

6.3.1 First Model for position using Particle filter

Following are the steps:

1. We predict the position of the particles from the IMU measurement. (Prediction step)

2. Each particle is assigned a weight that represents how well it agrees with the observed measurements from visual odometry. (Update step)

3. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

4. The combined vision and IMU position estimate is calculated as the average of the resampled particles. (State estimation)

5. We predict the position of the particles from the fused vision and IMU position. (Prediction step)

6. Each particle is assigned a weight that represents how well it agrees with the observed GPS measurements. (Update step)

7. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

8. The final position estimate is calculated as the average of the resampled particles. (State estimation)

A code sketch of these steps is given after Illustration 30.

The following chart illustrates the flow of information:

Illustration 30 Information flow for position estimate
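
A compact sketch of one possible reading of the eight steps above, using the standard deviations from Table 4 as Gaussian noise parameters; this is an illustration under those assumptions, not the exact implementation used in the thesis:

```python
import numpy as np

SIGMA_IMU, SIGMA_VISION, SIGMA_GPS = 0.5018, 0.03867, 0.0404   # Table 4

def gaussian_weight(z, p, sigma):
    return np.exp(-np.sum((z - p) ** 2) / (2.0 * sigma ** 2))

def fuse_position(particles, imu_delta, vision_pos, gps_pos):
    n = len(particles)
    # Steps 1-4: predict from the IMU, weight by visual odometry, resample, average.
    particles = particles + imu_delta + np.random.normal(0.0, SIGMA_IMU, particles.shape)
    w = np.array([gaussian_weight(vision_pos, p, SIGMA_VISION) for p in particles])
    w = w / w.sum()
    particles = particles[np.random.choice(n, n, p=w)]
    vision_imu_estimate = particles.mean(axis=0)
    # Steps 5-8: weight the fused particles by the GPS reading, resample, average.
    w = np.array([gaussian_weight(gps_pos, p, SIGMA_GPS) for p in particles])
    w = w / w.sum()
    particles = particles[np.random.choice(n, n, p=w)]
    return particles, vision_imu_estimate, particles.mean(axis=0)
```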

6.3.2 Second Model for orientation using Particle filter

Following are the steps:

1. We predict the orientation of the particles from the visual odometry measurement. (Prediction step)

2. Each particle is assigned a weight that represents how well it agrees with the observed IMU orientation measurements. (Update step)

3. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

4. The combined vision and IMU orientation estimate is calculated as the average of the resampled particles. (State estimation)

The following chart illustrates the flow of information:

Illustration 31 Information flow for orientation estimate

In the next chapter we test this model using the KITTI dataset and the standard deviations of error calculated in the last sections of Chapters 3, 4, and 5.

Following are the tables for position and orientation standard deviation of error from

ground truth:

Position Standard deviation

Vision 0.03867

IMU 0.5018

GPS 0.0404

Table 4 Sensor types and their respective standard deviation of error for position estimate

Orientation Standard deviation

Vision 0.0427

IMU 0.012435

Table 5 Sensor types and their respective standard deviation of error for orientation estimate

Chapter 7: Experimental Results

7.1 Set up

We have used KITTI dataset for our calculations. The KITTI dataset, or the KITTI

Vision Benchmark Suite, is a widely used collection of datasets for various computer

vision and robotics tasks, primarily focused on autonomous driving and scene

understanding.

Illustration 32 Set up of sensors in KITTI dataset

It contains a GPS/IMU unit and stereo grayscale cameras, which are used in our calculations. The image above helps us calibrate the sensors' locations and orientations with respect to each other.

1. We have synchronized the readings from all three sensors.

2. The images that we get are not raw; they are rectified.

3. The sensor readings are synchronized at 10 hertz, that is, 10 readings per second.

7.2 Sensor’s Description

Following are the sensors used in the KITTI dataset:

1. 2 x Grayscale cameras, 1.4 Megapixels: Point Grey Flea 2 (FL2-14S3M-C)

2. 1 x Inertial Navigation System (GPS/IMU): OXTS RT 3003

We load the readings that we get from our sequence into our sensor fusion model to get

experimental results.

7.3 Position Results

IMU and Vision Combined:

Illustration 33 Vision & IMU fused position readings

IMU, Vision & GPS combined:

Illustration 34 Vision, IMU and GPS fused position readings

IMU, Vision & GPS combined (Zoomed in)

Illustration 35 Vision, IMU and GPS fused position readings (Zoomed in)

The following table shows the Root mean squared (RMS) error for position:

Position RMS error

IMU 34.084

Vision 5.5174

IMU & Vision combined 0.1103

GPS 0.0703

IMU, Vision & GPS combined 0.0606

Table 6 Each sensor model and their respective RMS error for position estimate

7.4 Orientation Results

IMU and Vision combined

Illustration 36 Vision and IMU fused orientation readings

The following table shows the Root mean squared (RMS) error for orientation:

Orientation RMS error

IMU 0.0220

Vision 0.0169

IMU & Vision combined 0.0065

Table 7 Each sensor model and their respective RMS error for orientation estimate

Chapter 8: Conclusion

8.1 Position estimates

The first sensor fusion model for position, combining the IMU and stereo vision, gives a very good estimate that is comparable to GPS; but because both the IMU and visual odometry are bad at long-term accuracy, this model will start to lose accuracy with time.

The second sensor fusion model, which combines all three sensors, gives the best estimate and the lowest root mean squared error compared to each individual sensor. The fused path lies most of the time between the GPS path and the IMU–stereo vision path. This combination will not drift with time, as GPS ensures that there are no long-term errors. The model is reliable, with higher accuracy and precision than each individual sensor alone.

8.2 Orientation estimates

The orientation calculated by this model is better than the orientation given by each individual sensor alone. However, this model will drift with time, because small errors integrate over time and there is no long-term correction such as GPS.

Just as in the position estimate, the orientation curve lies most of the time between the curves for the IMU and stereo vision, and the fused model has the lowest RMS error.

References

1. C. E. Shannon, "A Mathematical Theory of Communication."

2. D. Nister, O. Naroditsky, and J. Bergen, "Visual Odometry."

3. H. P. Moravec, "Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover."

4. L. Matthies and S. A. Shafer, "Error Modeling in Stereo Navigation," IEEE J. Robot. Autom., vol. 3, no. 3, pp. 239–248, 1987.

5. C. Harris and M. Stephens, "A Combined Corner and Edge Detector," in Proceedings of the Fourth Alvey Vision Conference, 1988, pp. 147–151.

6. Y. Cheng, M. W. Maimone, and L. H. Matthies, "Visual Odometry on the Mars Exploration Rovers," IEEE Robot. Autom. Mag., vol. 13, no. 2, pp. 54–62, 2006.

7. D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

8. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

9. R. C. Luo and M. G. Kay, "A Tutorial on Multisensor Integration and Fusion," in Proc. 16th Annu. Conf. IEEE Ind. Electron., 1990, vol. 1, pp. 707–722.

10. R. C. Luo and M. G. Kay, "Multisensor Fusion and Integration in Intelligent Systems," IEEE Trans. Syst., Man, Cybern., vol. 19, no. 5, pp. 901–931, Sep./Oct. 1989.

11. R. C. Luo and M. G. Kay, Multisensor Integration and Fusion for Intelligent Machines and Systems. Norwood, MA: Ablex Publishing, 1995.

12. R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," ASME Journal of Basic Engineering, vol. 82 (Series D), pp. 35–45, 1960.

13. N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," IEE Proceedings F – Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, Apr. 1993.

14. M. Maimone, Y. Cheng, and L. Matthies, "Two Years of Visual Odometry on the Mars Exploration Rovers," Journal of Field Robotics, vol. 24, no. 3, pp. 169–186, 2007.

15. D. J. C. MacKay, Information Theory, Inference and Learning Algorithms.

16. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.

17. S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics.

18. The KITTI dataset.

19. T. W. Hungerford, Abstract Algebra.

20. R. C. Luo, C. C. Chang, and C. C. Lai, "Multisensor Fusion and Integration: Theories, Applications, and its Perspectives."

21. Cunto and J. Z. Sasiadek, "Sensor Fusion INS/GNSS Based on Fuzzy Logic Weighted Kalman Filter."

22. J. Z. Sasiadek, "Sensor Fusion."

23. P. Zhang, J. Gu, E. E. Milios, and P. Huynh, "Navigation with IMU/GPS/Digital Compass with Unscented Kalman Filter."

24. Wikipedia.
