Odometry using sensor fusion of stereo vision, IMU & GPS

by

Muhammad Abdul Wasae

A thesis submitted to the Faculty of Graduate and Postdoctoral


Affairs in partial fulfillment of the requirements for the degree of

Master of Applied Science

in

Mechanical Engineering

Carleton University
Ottawa, Ontario

© 2023
Muhammad Abdul Wasae
Abstract

In this thesis, the author develops a sensor fusion model that combines information from stereo vision, IMU & GPS to provide odometry for autonomous vehicles. The proposed model addresses three main problems in the automobile industry: (1) it gives high-precision, accurate odometry for real-time applications where the margin of error is low; (2) unlike most models, it does not use LiDAR, which is expensive and impractical; (3) the model is robust & reliable, meaning that if it loses information from one of the sensors it will still give a reasonably good pose estimate. The system was tested on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) Vision Benchmark Suite.

Acknowledgements

I would like to acknowledge my friends & family. I am grateful to the science communicators who motivated me to pursue math & engineering for life.

I would like to thank the faculty of Carleton University for motivating me and showing me what professional math & engineering is like.

I would like to acknowledge my supervisor, Jurek Z. Sasiadek, who guided me throughout this thesis.

Table of Contents

Abstract.............................................................................................................................. ii

Acknowledgements .......................................................................................................... iii

Table of Contents ............................................................................................................. iv

List of Illustrations ......................................................................................................... viii

(Page intentionally left blank) ............................................................................................ x

Chapter 1: Introduction .................................................................................................. 1


1.1 Thesis Motivation & Contribution .................................................................................. 1

1.2 Thesis Summary .............................................................................................................. 2

Chapter 2: Relevant Work .............................................................................................. 3

Chapter 3: Visual Odometry........................................................................................... 6


3.1 Camera Calibration ......................................................................................................... 7

3.2 Stereo Depth .................................................................................................................. 10

3.3 Visual odometry ............................................................................................................ 12

3.3.1 Epipolar geometry..................................................................................................... 13

3.3.2 Scale-Invariant Feature Transform (SIFT) ............................................................... 14

3.3.3 Projective 3-point algorithm ..................................................................................... 14

3.3.4 Random Sample Consensus ...................................................................................... 17

3.3.5 Combined algorithm for visual odometry ................................................................. 18

3.4 Calculating standard deviation from ground truth ........................................................ 19

3.4.1 Position accuracy ...................................................................................................... 19

3.4.2 Orientation accuracy ................................................................................................. 20

Chapter 4: Inertial Measurement Unit ........................................................................ 22


4.1 Bias Error ...................................................................................................................... 23

4.2 Euler angle .................................................................................................................... 24

4.3 Calculating standard deviation from ground truth ........................................................ 25

4.3.1 Position accuracy ...................................................................................................... 25

4.3.2 Orientation accuracy: ................................................................................................ 26

Chapter 5: Global Positioning System (GPS) .............................................................. 27


5.1 Standard deviation from manufacturer’s specification .............................................. 29

5.2 Standard deviation of error from ground truth .............................................................. 31

Chapter 6: Sensor Fusion .............................................................................................. 31


6.1 Bayes Rule .................................................................................................................... 33

6.1.1 Bayesian Inference .................................................................................................... 34

6.1.2 Bayes filter ................................................................................................................ 35

6.1.3 Bayes filter algorithm ............................................................................................... 36

6.2 Particle filter .................................................................................................................. 36

6.2.1 Particle filter algorithm ............................................................................................. 38

6.3 Building sensor fusion method...................................................................................... 39

6.3.1 First Model for position using Particle filter ............................................................ 39

6.3.2 Second Model for orientation using Particle filter.................................................... 41

Chapter 7: Experimental Results ................................................................................. 43


7.1 Set up............................................................................................................................. 43

7.2 Sensor’s Description ..................................................................................................... 44

7.3 Position Results ............................................................................................................. 44

7.4 Orientation Results ........................................................................................................ 46

Chapter 8: Conclusion ................................................................................................... 47


8.1 Position estimates .......................................................................................................... 48

8.2 Orientation estimates..................................................................................................... 48

References ........................................................................................................................ 49

List of Tables

Table 1 Parameters of camera intrinsic matrix .............................................................. 10

Table 2 Types of GPS services and their respective accuracy....................................... 30

Table 3 Circular error probable...................................................................................... 30

Table 4 Sensor types and their respective standard deviation of error for position

estimate ............................................................................................................................. 42

Table 5 Sensor types and their respective standard deviation of error for orientation

estimate ............................................................................................................................. 42

Table 6 Each sensor model and their respective RMS error for position estimate ........ 46

Table 7 Each sensor model and their respective RMS error for orientation estimate ... 47

List of Illustrations

Illustration 1 Visual Odometry ......................................................................................... 7

Illustration 2 Relation between three reference frames .................................................... 8

Illustration 3 Transformation of world coordinates to image coordinates using camera

intrinsic and extrinsic matrix .............................................................................................. 8

Illustration 4 Camera calibration to find unknown parameters of intrinsic matrix using

corner detection algorithm .................................................................................................. 9

Illustration 5 Color Image of our dataset ........................................................................ 10

Illustration 6 Greyscale image of our dataset ................................................................. 11

Illustration 7 Disparity map calculation using StereoBM algorithm .............................. 11

Illustration 8 Disparity map calculation using StereoSGBM algorithm ......................... 12

Illustration 9 Epipolar geometry ..................................................................................... 13

Illustration 10 SIFT matches ........................................................................................... 14

Illustration 11 Projective 3-point algorithm geometry.................................................... 15

Illustration 12 Four solutions to 4-degree polynomial equation ..................................... 16

Illustration 13 RANSAC model fitting ........................................................................... 18

Illustration 14 Position accuracy for stereo vision .......................................................... 20

Illustration 15 Position accuracy for stereo vision (True scale) ..................................... 20

Illustration 16 Orientation accuracy for stereo vision..................................................... 21

Illustration 17 Robot position & orientation from linear acceleration and angular

velocity .............................................................................................................................. 22

Illustration 18 Bias error ................................................................................................. 23

Illustration 19 Pitch, Roll and Yaw Axes ....................................................................... 24

Illustration 20 Position accuracy for IMU ...................................................................... 26

Illustration 21 Position accuracy for IMU (True scale) .................................................. 26

Illustration 22 Orientation accuracy for IMU ................................................................. 27

Illustration 23 GPS coordinates & ENU coordinates relation ........................................ 29

Illustration 24 Position accuracy for GPS ....................................................................... 31

Illustration 25 Fusion algorithms .................................................................................... 32

Illustration 26 Bayesian inference .................................................................................. 34

Illustration 27 Bayes filter algorithm [] .......................................................................... 36

Illustration 28 Particle filter steps ................................................................................... 37

Illustration 29 Particle filter algorithm [] ........................................................................ 39

Illustration 30 Information flow for position estimate .................................................... 40

Illustration 31 Information flow for orientation estimate ............................................... 41

Illustration 32 Set up of sensors in KITTI dataset .......................................................... 44

Illustration 33 Vision & IMU fused position readings ................................................... 45

Illustration 34 Vision, IMU and GPS fused position readings ........................................ 45

Illustration 35 Vision, IMU and GPS fused position readings (Zoomed in) .................. 46

Illustration 36 Vision and IMU fused orientation readings ............................................ 47

Chapter 1: Introduction

Odometry is a method used in robotics and autonomous navigation to estimate the

position and orientation of a mobile robot by analyzing data from its own sensors. The

term "odometry" is derived from the Greek words "odos" (meaning path) and "metron"

(meaning measure), and it essentially involves measuring the robot's path or motion.

There are many ways to perform this task. We can use wheel encoders, an inertial measurement unit (IMU), LiDAR, or anything else that gives us information about the robot's location and orientation. Each sensor has its own advantages and disadvantages. Sensor fusion is the method used to combine sensors to obtain a more accurate and reliable estimate than each individual sensor can provide alone.

1.1 Thesis Motivation & Contribution

Precision and Accuracy:

Precision and accuracy of sensing are essential in advanced robotics. For example, surgical robots now used in hospitals must balance force and displacement, which requires high precision and accuracy. Autonomous cars need precise and accurate odometry when they are moving fast and the margin of error is very low. In the future, machine learning and robotics will bring robots into our homes to do tasks for us, which will also require highly precise and accurate odometry. Our method yields odometry that is both more precise and more accurate than each individual sensor alone.

No LiDAR:

Autonomous automobile and robotics companies use LiDAR sensors to perceive the environment better, but the additional cost of a LiDAR sensor is a significant issue. LiDAR's weight and size are also problems in both cars and humanoid robots. Unlike most approaches, we have not used LiDAR in our sensor fusion method, making it a more practical method for industry adoption.

Reliability:

Our method is reliable because it does not rely on a single sensor for odometry. For example, if a car or robot is in a place where it does not receive GPS signals, such as an underground tunnel, it can still predict accurate and precise odometry from the IMU and vision. Likewise, if bad weather causes poor visibility, the model can still navigate using information from the IMU and GPS.

1.2 Thesis Summary

We have used a sensor fusion method called the particle filter, which is derived from the Bayes filter, to combine the information we get from vision, the IMU, and GPS into high-precision, robust, and reliable odometry that does not drift with time.

This requires that we first calculate the pose from each sensor individually and find the standard deviation of its error by comparing it to the ground truth. Since the errors we get may not be Gaussian and the models may not be linear, we cannot use the Kalman filter. Instead, we use the particle filter, a derivative of the Bayes filter that is computationally efficient enough to be a very good candidate for real-time robotics applications such as autonomous cars or robots.

We used the KITTI dataset to evaluate our model and conclude the thesis with our findings.

Chapter 2: Relevant Work

In 1948, Claude Shannon [1] introduced a mathematical way to quantify information and

defined it as a measure of uncertainty or surprise. He developed a unit called "bit" (short

for binary digit) to represent the smallest amount of information. This allowed for the

precise measurement of information content.

Claude Shannon also showed, in his master's thesis at MIT, that logic gates can perform computation. In 1936, Alan Turing introduced the concept of the Turing machine, a theoretical model of a simple computational device that can simulate the logic of any computer algorithm. Both mathematicians started a revolution in information processing.

This gave birth to new fields such as computer vision and, soon after, visual odometry, which uses computer vision to find the pose (position and orientation) of a robot. The term "Visual Odometry" was coined by D. Nister, O. Naroditsky, and J. Bergen [2].

In the 1980s, Moravec [3] studied the problem of ego-motion estimation of a vehicle by observing a sequence of images. Moravec also developed a corner detection method and later published "Sensor Fusion in Certainty Grids for Mobile Robots" in 1988.

In 1987, Matthies and Shafer [4] revisited Moravec's work and extended it by deriving the motion error model using 3D Gaussian distributions instead of a scalar model.

In 1988, Chris Harris and Mike Stephens [5] developed a corner detection operator that improved upon Moravec's corner detection method.

Much of the early research on visual odometry was done at NASA for the Mars Exploration Rovers by Y. Cheng, M. W. Maimone, and L. H. Matthies [6].

In 2004, David Lowe [7] introduced a method to find image features from scale-invariant keypoints, which proved to be a powerful tool for visual odometry that is both robust and accurate. SIFT (Scale-Invariant Feature Transform) is often used in combination with RANSAC (Random Sample Consensus), developed by M. A. Fischler and R. C. Bolles [8].

Visual odometry can be combined with an IMU and GPS using multi-sensor fusion. Different architectures for multi-sensor fusion have been proposed in the literature, for example by Luo and Kay [9]–[11]. One type of architecture is built upon Bayesian inference.

The most popular of the Bayesian inference methods is the Kalman filter, developed by R. E. Kalman [12] in 1960. The Kalman filter was used in the guidance and navigation of the Apollo spacecraft. Its limitation is that it only works for linear models with Gaussian noise. In 1993, Gordon, Salmond, and Smith [13] developed the particle filter, which can handle nonlinear models and non-Gaussian noise.

Chapter 3: Visual Odometry

Odometry is the task of finding the pose (position & orientation) of a robot. Visual odometry is the method where we use vision to determine the pose of the robot. It has been used in a wide variety of robotic applications, such as on the Mars Exploration Rovers [14].

Illustration 1 Visual Odometry

The method of getting the pose of the robot using vision can be broken into three main

parts:

1. Camera Calibration

2. Stereo Depth

3. Visual Odometry

3.1 Camera Calibration

We have three reference frames in our system. The first is the image coordinate frame, which gives the image coordinates of the pixel corresponding to a 3-dimensional point. The second is the world coordinate frame, which gives the coordinates of a 3-dimensional point from the world's perspective, and the third is the camera coordinate frame, which gives the coordinates of a 3-dimensional point with respect to the camera's center.

Illustration 2 Relation between three reference frames

Illustration 3 Transformation of world coordinates to image coordinates using camera intrinsic and

extrinsic matrix

Where:

(X, Y, Z) are the world coordinates of the point.

(u, v) are the image coordinates of the point.

(cx, cy) are the principal point coordinates (the optical center).

fx and fy are the focal lengths of the camera.

s is the scaling factor.
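
For reference, the transformation shown in Illustration 3 is the standard pinhole projection; using the symbols defined above it can be written as

$$ s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}}_{\text{intrinsic matrix } K} \underbrace{\begin{pmatrix} R & t \end{pmatrix}}_{\text{extrinsic matrix}} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} $$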

We have four unknowns (fx, fy, cx, cy) that we need to calculate so that we can transform between image coordinates and world coordinates.

To find these 4 unknowns, we take an image of a chessboard whose world coordinates with respect to one edge can easily be calculated, since a chessboard has a grid-like structure. We then know the world coordinates of some points and their respective image coordinates.

Illustration 4 Camera calibration to find unknown parameters of intrinsic matrix using corner

detection algorithm

Knowing the world coordinates and the corresponding image coordinates, we can calculate the projection matrix (the intrinsic matrix multiplied by the extrinsic matrix).

Since the extrinsic matrix is just a translation of known length in the horizontal direction with no change in orientation, we also know the extrinsic matrix from the baseline of the stereo images. We can therefore recover the intrinsic matrix from the extrinsic matrix and the projection matrix.
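
As a minimal sketch of this calibration procedure using OpenCV (the chessboard size, square size, and image folder below are illustrative assumptions, not values from the thesis):

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)      # inner corners of the chessboard (assumed)
square_size = 0.025        # square edge length in metres (assumed)

# World coordinates of the corners w.r.t. one corner of the board (Z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for fname in glob.glob("calibration_images/*.png"):   # hypothetical folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# calibrateCamera returns the intrinsic matrix K, which contains fx, fy, cx, cy.
_, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print(K)
```

For the KITTI sequences used later, this step is not strictly required, since the calibration matrices are provided with the dataset.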

The values of the 4 unknown parameters of the camera intrinsic matrix are shown below:

fx 718.856

fy 718.856

cx 607

cy 185

Table 1 Parameters of camera intrinsic matrix

3.2 Stereo Depth

To find world coordinates from image coordinates, we first have to create a depth map that assigns a depth value to each pixel in an image.

To create a depth map, we need a disparity map between the stereo images. We start with the left and right images and compute the disparity between them. Suppose we have an image like this.

Illustration 5 Color Image of our dataset

This is an RGB (color) image, which contains three times more information than a grayscale image. However, color is not useful in our stereo vision algorithm; it only complicates things, so we will use grayscale images for all our calculations.

Illustration 6 Greyscale image of our dataset

Disparity is the difference in pixel position (horizontal in our case) between the projections of the same world point in the two stereo images. A disparity map can be calculated using matching algorithms and then used to obtain depth.

Illustration 7 Disparity map calculation using StereoBM algorithm

We notice that the StereoBM algorithm leaves holes in the disparity map and is not continuous. Another algorithm that calculates a better disparity map, without holes, is the StereoSGBM algorithm.

Illustration 8 Disparity map calculation using StereoSGBM algorithm

If we know the disparity, baseline (given) and focal length (given) then we can get the

depth map from disparity map using the formula:

Disparity = (Baseline × Focal Length) / Depth

Where:

Baseline is the distance between the two stereo cameras (camera separation).

Focal Length is the focal length of the camera lens.
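
A minimal sketch of this disparity-to-depth computation with OpenCV's StereoSGBM (file names and matcher parameters are illustrative; the focal length matches Table 1 and the baseline is a typical KITTI-like value):

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # rectified left image (illustrative path)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)   # rectified right image

# Semi-global block matching; numDisparities must be a multiple of 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=11)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0   # SGBM returns disparity * 16

fx = 718.856          # focal length in pixels (Table 1)
baseline = 0.54       # stereo baseline in metres (assumed, KITTI-like)

disparity[disparity <= 0.0] = 0.1         # avoid division by zero on invalid pixels
depth = fx * baseline / disparity         # Depth = (Baseline x Focal Length) / Disparity
```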

3.3 Visual odometry

This is the part where we use the camera intrinsic matrix and the depth map to get the pose of the robot using epipolar geometry, the SIFT algorithm, the RANSAC algorithm, and the P3P algorithm.

3.3.1 Epipolar geometry

Illustration 9 Epipolar geometry

Here the points O1 and O2 are the camera centers and Xw is a world point. Our goal is to find the relative position and orientation of the camera at time t and at the next time step t+1. The camera center at time t is O1 and the camera center at time t+1 is O2. The relative orientation and translation between the two camera centers can be described by epipolar geometry; here we are trying to compute the rotation R and translation t shown in the illustration. The points O1, O2, and Xw form a plane called the epipolar plane.

3.3.2 Scale-Invariant Feature Transform (SIFT)

SIFT is a computer vision algorithm used for keypoint detection, description, and

matching in images. SIFT was developed by David Lowe in 1999 and has been widely

adopted in various computer vision applications due to its robustness and invariance

properties.

We will use SIFT to get matching keypoints between two consecutive images and then the RANSAC algorithm to find the best fit.

Illustration 10 SIFT matches
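
A minimal sketch of this matching step with OpenCV (img_t and img_t1 are assumed to be the grayscale frames at times t and t+1):

```python
import cv2

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(img_t, None)      # keypoints and descriptors at time t
kp_t1, des_t1 = sift.detectAndCompute(img_t1, None)   # keypoints and descriptors at time t+1

# Brute-force matching with Lowe's ratio test to keep only distinctive matches.
bf = cv2.BFMatcher()
matches = bf.knnMatch(des_t, des_t1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```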

3.3.3 Projective 3-point algorithm

The projective 3-point (P3P) algorithm is a method for finding the camera orientation and translation in world coordinates when we know the world coordinates of 3 points and their corresponding image coordinates. Although 3 is the minimum number of points required, it gives 4 possible solutions; with 4 points we get only one possible orientation and translation of the camera. So we will be using 4 points.

Illustration 11 Projective 3-point algorithm geometry

The P3P method works only if the camera is internally calibrated, that is, the intrinsic matrix is known. Let A, B, C be the 3 world points and O the projection center of the camera; together they form a tetrahedron, that is, 4 triangles connecting these points.

This is a 2-step approach. The first step is to calculate the lengths x, y, z of the projection rays, and the second step is the computation of translation and orientation.

In the first step we exploit the geometry of the triangles, using the law of cosines to relate the different lengths of each triangle to one another. Combining these relations results in a polynomial of degree 4 whose coefficients can be computed once the 3 world points are known.
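
As a sketch of these cosine relations, with x, y, z denoting the ray lengths OA, OB, OC and α, β, γ the angles between the ray pairs (OB, OC), (OA, OC), and (OA, OB), which are known from the calibrated image coordinates:

$$ y^2 + z^2 - 2yz\cos\alpha = |BC|^2, \qquad x^2 + z^2 - 2xz\cos\beta = |AC|^2, \qquad x^2 + y^2 - 2xy\cos\gamma = |AB|^2 $$

Eliminating two of the unknowns from these three equations yields the degree-4 polynomial mentioned above.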

Since we have a polynomial of degree 4, solving it yields 4 possible solutions. If we have one extra point, we can verify which of these 4 solutions is the correct one. In other words, we get 4 solutions using 3 world points but only one solution using 4 world points.

Illustration 12 Four solutions to 4-degree polynomial equation

In the second step we obtain the translation and rotation of the camera at time t+1 relative to time t. The details of this step are given in the section where all the algorithms are combined. The P3P method is simple and, with 4 points, uses the minimum number of points needed to give us the translation and rotation.

3.3.4 Random Sample Consensus

RANSAC takes a random subset (not all) of the points and, using the P3P algorithm, computes the relative orientation and translation between the camera at time t and at time t+1. After obtaining the translation and rotation from this subset, it calculates the error between where the remaining points (those not included in the calculation) actually are and where the computed translation and rotation predict them to be.

After a number of such samples, it keeps the sample that gives the minimum error. With SIFT and RANSAC together we have a robust way of getting keypoint matches between the images at time t and time t+1.

Illustration 13 RANSAC model fitting
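
The idea can be sketched generically as follows; estimate_pose and reproject are hypothetical placeholders for a P3P solver and the camera projection function, and matches_3d / matches_2d are NumPy arrays of corresponding 3D and 2D points:

```python
import numpy as np

def ransac_pose(matches_3d, matches_2d, estimate_pose, reproject, n_iters=200, thresh=2.0):
    """Keep the pose, fitted on a random minimal sample, that explains the most points."""
    best_pose, best_inliers = None, 0
    for _ in range(n_iters):
        idx = np.random.choice(len(matches_3d), 4, replace=False)   # minimal sample of 4 points
        pose = estimate_pose(matches_3d[idx], matches_2d[idx])
        if pose is None:
            continue
        # Reprojection error of all points under the candidate pose.
        errors = np.linalg.norm(reproject(pose, matches_3d) - matches_2d, axis=1)
        inliers = int((errors < thresh).sum())
        if inliers > best_inliers:
            best_pose, best_inliers = pose, inliers
    return best_pose
```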

3.3.5 Combined algorithm for visual odometry

This section describes how SIFT, P3P, RANSAC, and epipolar geometry are combined to calculate the pose of the camera at time t+1 relative to time t.

1. We have already computed the depth map and the intrinsic matrix (K).

2. Using the SIFT algorithm, we find matching keypoints between the images taken by the camera at time t and at time t+1.

3. Using the depth map, we compute the world coordinates of each pixel in the camera image at time t.

4. We use the P3P algorithm to compute the translation and rotation of the camera at time t+1 relative to time t, using the image coordinates at time t+1 and the world coordinates obtained in step 3 from the depth map at time t. The formula for this step is as follows:

Image coordinates (t + 1) = Projection matrix × World coordinates matrix (t)

Where Projection matrix = Intrinsic matrix × Extrinsic matrix (t+1 relative to t)

Here the extrinsic matrix is the only unknown. We know the intrinsic matrix, the image coordinates at t+1, and their SIFT-matched image coordinates at time t, which correspond to the world points at time t.

Since we get hundreds of SIFT keypoint matches but need only 4 matching points to get the pose, we can use the remaining points to make the algorithm more accurate by rejecting outliers, using RANSAC to select the 4 points that give the best fit to our data. A minimal sketch of this step is shown below.
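
The following sketch implements steps 3 and 4 with OpenCV's combined PnP/RANSAC solver; K, depth, kp_t, kp_t1, and good are assumed to come from the earlier calibration, stereo depth, and SIFT steps:

```python
import cv2
import numpy as np

pts_3d, pts_2d = [], []
for m in good:                                   # SIFT matches between frames t and t+1
    u, v = kp_t[m.queryIdx].pt                   # pixel location at time t
    z = depth[int(v), int(u)]                    # depth from the stereo depth map at time t
    if z <= 0 or z > 100:                        # discard invalid or far-away depths
        continue
    # Back-project the pixel at time t into 3D camera-frame coordinates.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_3d.append([x, y, z])
    pts_2d.append(kp_t1[m.trainIdx].pt)          # matching pixel location at time t+1

# P3P inside RANSAC recovers the extrinsic matrix of time t+1 relative to time t.
_, rvec, tvec, inliers = cv2.solvePnPRansac(
    np.array(pts_3d, dtype=np.float32), np.array(pts_2d, dtype=np.float32), K, None,
    flags=cv2.SOLVEPNP_P3P, reprojectionError=2.0)
R, _ = cv2.Rodrigues(rvec)                       # rotation matrix; tvec is the translation
```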

3.4 Calculating standard deviation from ground truth

We can obtain the standard deviation of the error, to be used later in sensor fusion, by comparing the data we get from the sequence using this algorithm with the ground truth.
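
One plausible way to compute this, assuming est and gt are N×2 NumPy arrays of estimated and ground-truth positions for the sequence (array names are illustrative):

```python
import numpy as np

errors = np.linalg.norm(est - gt, axis=1)   # per-frame position error
sigma_position = np.std(errors)             # standard deviation used later in sensor fusion
```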

3.4.1 Position accuracy

Illustration 14 Position accuracy for stereo vision

Illustration 15 Position accuracy for stereo vision (True scale)

Standard deviation for position is 0.03867

3.4.2 Orientation accuracy

Illustration 16 Orientation accuracy for stereo vision

Standard deviation for orientation is 0.0427

Chapter 4: Inertial Measurement Unit

An IMU can be used for odometry as well, but it is usually less accurate than visual odometry. An IMU works by measuring linear acceleration and angular velocity. These measurements are then integrated to get both the position and orientation of the robot. Errors in the IMU are also integrated, so the estimate drifts with time. GPS, on the other hand, never drifts with time but is less accurate than the IMU over short intervals. The two can be fused to give a more robust and accurate translation and orientation; they are complementary to each other.

The angular velocity and linear acceleration are converted to orientation and position in the following way:

Illustration 17 Robot position & orientation from linear acceleration and angular velocity

This equation is used to find the change in position from acceleration. Because we have discrete data, we take the average of two consecutive acceleration readings and multiply it by delta t = 0.1 (for a 10 Hz rate) to get an almost instantaneous velocity, which is again multiplied by delta t = 0.1 to get the change in position. Similarly, we can get the orientation from the discrete angular velocity data.
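
A minimal sketch of this double integration, assuming acc is an (N, 3) array of bias-corrected accelerations in the navigation frame:

```python
import numpy as np

dt = 0.1                                   # 10 Hz sensor rate
vel = np.zeros(3)
pos = np.zeros(3)
positions = [pos.copy()]
for k in range(1, len(acc)):
    a_avg = 0.5 * (acc[k - 1] + acc[k])    # average of two consecutive accelerations
    vel = vel + a_avg * dt                 # integrate acceleration to velocity
    pos = pos + vel * dt                   # integrate velocity to change in position
    positions.append(pos.copy())
```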

4.1 Bias Error

Error bias in an Inertial Measurement Unit (IMU) refers to a systematic or constant offset

in the sensor's measurements. This bias can result in a consistent error in the IMU's

output, which can affect the accuracy of the IMU's measurements over time. It needs to

be calibrated or compensated for to ensure accurate orientation and position estimation.

Illustration 18 Bias error

We can correct for a linear bias in both the position (accelerometer) and orientation (gyroscope) channels by taking IMU readings at rest, with only gravitational acceleration acting on the sensor, and then taking the mean of the data. This gives the linear bias offset, which we can then correct for.
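
A minimal sketch of this bias correction, assuming static_acc and static_gyro hold readings recorded at rest and that the sensor's z axis is aligned with gravity (an illustrative assumption):

```python
import numpy as np

g = np.array([0.0, 0.0, 9.81])                 # gravity in the sensor frame at rest (assumed)
acc_bias = static_acc.mean(axis=0) - g         # constant accelerometer offset
gyro_bias = static_gyro.mean(axis=0)           # constant gyroscope offset

acc_corrected = acc - acc_bias                 # subtract the offsets from subsequent readings
gyro_corrected = gyro - gyro_bias
```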

4.2 Euler angle

The Euler angles are three angles introduced by Leonhard Euler to describe the orientation of a rigid body with respect to a fixed coordinate system. They carry the same information as a rotation matrix.

Illustration 19 Pitch, Roll and Yaw Axes

We can convert the rotation matrix into Euler angles using the following standard algorithm:
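
A sketch of the standard conversion for the ZYX (yaw-pitch-roll) convention:

```python
import numpy as np

def rotation_matrix_to_euler(R):
    """Convert a 3x3 rotation matrix to (roll, pitch, yaw) in the ZYX convention."""
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:                              # regular case
        roll = np.arctan2(R[2, 1], R[2, 2])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = np.arctan2(R[1, 0], R[0, 0])
    else:                                      # gimbal lock: pitch is close to +/- 90 degrees
        roll = np.arctan2(-R[1, 2], R[1, 1])
        pitch = np.arctan2(-R[2, 0], sy)
        yaw = 0.0
    return roll, pitch, yaw
```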

4.3 Calculating standard deviation from ground truth

We can compare the position and orientation that we get from the IMU with the ground truth to obtain the standard deviation of the IMU model's error. This will be useful in the sensor fusion step.

4.3.1 Position accuracy

Illustration 20 Position accuracy for IMU

Illustration 21 Position accuracy for IMU (True scale)

Standard deviation for position is 0.5018

4.3.2 Orientation accuracy:

Illustration 22 Orientation accuracy for IMU

Standard deviation for orientation is 0.012435

Chapter 5: Global Positioning System (GPS)

GPS stands for Global Positioning System, and it is a satellite-based navigation system

that allows users to determine their precise location and track movement almost

anywhere on Earth. The system works by using a network of satellites in orbit around the

Earth. These satellites continuously transmit signals that are received by GPS receivers,

such as those in smartphones or dedicated GPS devices.

GPS can be used to calculate the robot's position, but it is not very accurate. Unlike the IMU, its error never drifts with time and stays roughly constant: GPS has only short-term errors, not long-term ones. The two sensors can therefore be fused to get a more accurate position of the robot.

We get data from the GPS sensor in ENU coordinates, which can be converted into Cartesian coordinates on the surface of the Earth using the following formula. Given the ENU coordinates (E, N, U) and the azimuth angle θ of the local frame:

L (Local East) = E × cos(θ) − N × sin(θ)

M (Local North) = E × sin(θ) + N × cos(θ)

N (Local Up) = U
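
Written in matrix form, this is a planar rotation by the azimuth angle θ about the Up axis:

$$ \begin{pmatrix} L \\ M \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} E \\ N \end{pmatrix}, \qquad \text{Up} = U $$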

Illustration 23 GPS coordinates & ENU coordinates relation

5.1 Standard deviation from manufacturer’s specification

We can get the standard deviation of GPS either by looking at its specifications or by

comparing it to the ground truth. This information tells us how accurate our GPS system

is and how much it can be trusted. This will be required if we want to do sensor fusion

with other sensors.

The GPS accuracy given by the manufacturer is listed in the table below:

Service Accuracy

SPS 1.5 m

SBAS 0.6 m

DGPS 0.4 m

PPP 0.1 m

RTK 0.01 m

Table 2 Types of GPS services and their respective accuracy

The following table shows how to obtain the standard deviation from other units of error specification. This conversion table is also used to convert values expressed at one percentile level to another, which will be needed for sensor fusion.

Conversion RMS CEP DRMS R95 2DRMS R99.7

RMS 1 1.18 1.41 2.45 2.83 3.41

CEP 0.849 1 1.2 2.08 2.4 2.9

DRMS 0.707 0.833 1 1.73 2 2.41

R95 0.409 0.481 0.578 1 1.16 1.39

2DRMS 0.354 0.416 0.5 0.865 1 1.21

R99.7 0.293 0.345 0.415 0.718 0.830 1

Table 3 Circular error probable

For example, to convert the manufacturer's error, which is given as R95 (a 95 percent chance that the reading is within the specified radius), to RMS:

0.409 × R95 error (given by the manufacturer) = RMS error
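
As an illustrative example, if the SPS figure of 1.5 m in Table 2 is interpreted as an R95 value, the corresponding RMS error is about 0.409 × 1.5 m ≈ 0.61 m.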

5.2 Standard deviation of error from ground truth

Illustration 24 Position accuracy for GPS

Standard deviation for position is 0.0404

Chapter 6: Sensor Fusion

Sensor fusion, also known as sensor data fusion, is the process of integrating data from

multiple sensors to obtain a more accurate, complete, and reliable understanding of a

physical or environmental phenomenon. This technology is commonly used in various

fields, including robotics, autonomous systems, navigation, and artificial intelligence.

The primary goals of sensor fusion are to improve data accuracy, reduce uncertainty, enhance situational awareness, and make more informed decisions. Classical fusion algorithms can be classified into the following categories:

Illustration 25 Fusion algorithms

At the core of probabilistic robotics is the idea of estimating state from sensor data.

Probability theory and statistics are the building blocks of sensor fusion.

Each sensor conveys some information with a different degree of certainty. By integrating different sensors, we can combine their advantages while their disadvantages cancel each other out.

For example, an IMU gives a very accurate estimate of short-term motion, but it is bad at long-term prediction because its measurement errors are integrated, so we become less sure of the position as time passes. GPS gives us position with less short-term accuracy than the IMU, but it does not suffer from the problem of error growing with time; its error remains roughly constant.

6.1 Bayes Rule

If the probability of event A happening is P(A) and the probability of event B happening is P(B), then the probability that both happen, P(A AND B), is given by:

Independent events: P(A AND B) = P(A) × P(B)

Dependent events: P(A AND B) = P(A|B) × P(B)

Also, P(A AND B) = P(B|A) × P(A)

There is a difference between the two cases. When the events are no longer independent, we must ask for the probability that A happens given that B has occurred, multiplied by the probability that B occurs. P(A|B) is called the conditional probability: in probability theory, it is a measure of the probability of an event occurring given that another event has already occurred.

Also notice that we can write the joint probability in two ways, either P(A|B) × P(B) or P(B|A) × P(A), because both are equal. This identity is used to formulate Bayes' rule.

Bayes' Rule:

P(A|B) × P(B) = P(B|A) × P(A)

P(A|B) = [P(B|A) × P(A)] / P(B)

Here A and B can be events or discrete or continuous random variables.

6.1.1 Bayesian Inference

Illustration 26 Bayesian inference

P(A Given B) = [P(B Given A) ⋅ P(A)] / P(B)

Prior: P(A) is our prior belief.

Likelihood: P(B|A) is the probability of observing the evidence/data given our prior belief.

Posterior: P(A|B) is our new, updated belief, which is closer to reality.

Evidence: P(B) is the normalization term that scales the probability distribution so it sums to 1.
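
As an illustrative numerical example (with made-up values): if the prior is P(A) = 0.3, the likelihood is P(B|A) = 0.8, and the evidence is P(B) = 0.4, then the posterior is P(A|B) = 0.8 × 0.3 / 0.4 = 0.6.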

6.1.2 Bayes filter

A Bayes filter is a powerful and widely used framework for estimating the state of a

dynamic system in the presence of noise and uncertainty. This filter is crucial in various

fields, including robotics, autonomous vehicles, finance, and signal processing.

The Bayes filter operates through a series of steps:

1. Prediction (Prior Update): In this step, the filter predicts the new state of the system

based on the previous state and a mathematical model that describes how the system

evolves over time. The result is a probability distribution representing the expected state,

often referred to as the "prior belief."

2. Measurement (Likelihood Update): When a new measurement becomes available, the

filter calculates the likelihood of the measurement given the predicted state. Bayes'

theorem is employed to update the prior belief with this measurement information,

resulting in a new probability distribution, the "posterior belief."

3. Estimation (Posterior Estimate): From the posterior belief, an estimate of the system's

state is computed. Common estimates include the mean or mode of the posterior

distribution, providing the best estimate of the current state.

4. Resampling (Optional): In some filter variants like the particle filter, resampling is

performed to ensure a representative set of particles approximating the posterior

distribution. This helps prevent particle depletion and maintains a more accurate state

estimate.

5. Iteration: The process repeats as new measurements arrive, using the posterior belief

from the previous step as the prior belief for the next prediction. This recursive approach

continually refines the state estimate as more data is acquired.

Bayes filters are versatile, with variants like the Kalman filter for linear systems and the

Particle filter for non-linear and non-Gaussian systems. They play a fundamental role in

enabling systems to make informed decisions and navigate uncertainty, making them

indispensable tools for state estimation in dynamic environments.

6.1.3 Bayes filter algorithm

Illustration 27 Bayes filter algorithm []
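
A generic discrete-state sketch of one Bayes filter update (not the exact algorithm of the figure); motion_model(x, u, x_prev) and measurement_model(z, x) are assumed user-supplied probability functions, and belief maps each state to its probability:

```python
def bayes_filter_update(belief, control, measurement, motion_model, measurement_model, states):
    """One recursive prediction + measurement update of a discrete Bayes filter (generic sketch)."""
    # Prediction step: propagate the previous belief through the motion model.
    predicted = {x: sum(motion_model(x, control, xp) * belief[xp] for xp in states)
                 for x in states}
    # Measurement update: weight the prediction by the likelihood of the observation.
    posterior = {x: measurement_model(measurement, x) * predicted[x] for x in states}
    # Normalize so the belief sums to one.
    eta = sum(posterior.values())
    return {x: p / eta for x, p in posterior.items()}
```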

6.2 Particle filter

The particle filter is a derivative of the Bayes filter, and it is the sensor fusion method we will be using. A particle filter, also known as a sequential Monte Carlo filter, is a

recursive Bayesian filtering method used for estimating the state of a dynamic system.

Particle filters are particularly useful in situations where the state estimation process

involves nonlinear and non-Gaussian systems, where traditional filters like the Kalman

filter may not be applicable. Particle filters work by representing the probability

distribution of the state as a set of discrete, weighted samples called particles.

Illustration 28 Particle filter steps

Steps are as follows:

1. Initialization: At the beginning of the estimation process, a set of particles is generated

to represent the possible states of the system. These particles are drawn from the prior

state distribution (initial belief about the state), which may include information from

sensors or prior knowledge.

2. Prediction: In each time step, the particles are propagated forward in time based on the

system's dynamics. This accounts for how the system is expected to evolve. Each

particle's state is adjusted according to the system's motion model.

3. Measurement Update: After collecting sensor measurements, each particle is assigned

a weight that represents how well it agrees with the observed measurements. Particles

that are more consistent with the measurements receive higher weights, while those that

are less consistent receive lower weights.

4. Resampling: Particles are resampled with replacement from the existing set, and

particles with higher weights have a higher chance of being selected. This process

emphasizes the particles that are more likely to represent the true state.

5. State Estimation: The state estimate is calculated as a weighted average of the

resampled particles. This estimate represents the most probable state of the system given

the measurements and the motion model.

6.2.1 Particle filter algorithm

Illustration 29 Particle filter algorithm []
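
A generic sketch of one particle filter cycle (not the exact algorithm of the figure); motion_model and likelihood are assumed user-supplied functions and particles is a NumPy array with one row per particle:

```python
import numpy as np

def particle_filter_step(particles, control, measurement, motion_model, likelihood):
    """One prediction / update / resampling cycle of a particle filter (generic sketch)."""
    # Prediction: propagate every particle through the (noisy) motion model.
    particles = np.array([motion_model(p, control) for p in particles])
    # Update: weight each particle by how well it explains the measurement.
    weights = np.array([likelihood(measurement, p) for p in particles])
    weights = weights / weights.sum()
    # Resampling with replacement: high-weight particles are drawn more often.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    # State estimate: mean of the resampled particles.
    return particles, particles.mean(axis=0)
```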

6.3 Building sensor fusion method

We will build two separate models: one for the position and one for the orientation of the robot. We have cleaned and processed the data and have already calculated the standard deviations of error for the visual odometry, IMU, and GPS measurements.

6.3.1 First Model for position using Particle filter

Following are the steps:

1. We predict the position of the particles from the IMU measurement. (Prediction step)

2. Each particle is assigned a weight that represents how well it agrees with the observed measurements from visual odometry. (Update step)

3. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

4. The combined vision and IMU position estimate is calculated as the average of the resampled particles. (State estimation)

5. We predict the position of the particles from the fused vision and IMU position. (Prediction step)

6. Each particle is assigned a weight that represents how well it agrees with the observed GPS measurements. (Update step)

7. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

8. The final position estimate is calculated as the average of the resampled particles. (State estimation)

A code sketch of these steps is given after Illustration 30.

The following chart illustrates the flow of information:

Illustration 30 Information flow for position estimate
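
A compact sketch of one possible reading of the eight steps above, using the standard deviations from Table 4 as Gaussian noise parameters; this is an illustration under those assumptions, not the exact implementation used in the thesis:

```python
import numpy as np

SIGMA_IMU, SIGMA_VISION, SIGMA_GPS = 0.5018, 0.03867, 0.0404   # Table 4

def gaussian_weight(z, p, sigma):
    return np.exp(-np.sum((z - p) ** 2) / (2.0 * sigma ** 2))

def fuse_position(particles, imu_delta, vision_pos, gps_pos):
    n = len(particles)
    # Steps 1-4: predict from the IMU, weight by visual odometry, resample, average.
    particles = particles + imu_delta + np.random.normal(0.0, SIGMA_IMU, particles.shape)
    w = np.array([gaussian_weight(vision_pos, p, SIGMA_VISION) for p in particles])
    w = w / w.sum()
    particles = particles[np.random.choice(n, n, p=w)]
    vision_imu_estimate = particles.mean(axis=0)
    # Steps 5-8: weight the fused particles by the GPS reading, resample, average.
    w = np.array([gaussian_weight(gps_pos, p, SIGMA_GPS) for p in particles])
    w = w / w.sum()
    particles = particles[np.random.choice(n, n, p=w)]
    return particles, vision_imu_estimate, particles.mean(axis=0)
```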

6.3.2 Second Model for orientation using Particle filter

Following are the steps:

1. We predict the orientation of the particles from the visual odometry measurement. (Prediction step)

2. Each particle is assigned a weight that represents how well it agrees with the observed IMU orientation measurements. (Update step)

3. We generate a new sample set by resampling (with replacement) such that particles with higher weights have a higher probability of being drawn. (Resampling)

4. The combined vision and IMU orientation estimate is calculated as the average of the resampled particles. (State estimation)

The following chart illustrates the flow of information:

Illustration 31 Information flow for orientation estimate

In the next chapter we test this model using the KITTI dataset and the standard deviations of error calculated in the last sections of Chapters 3, 4, and 5.

Following are the tables for position and orientation standard deviation of error from

ground truth:

Position Standard deviation

Vision 0.03867

IMU 0.5018

GPS 0.0404

Table 4 Sensor types and their respective standard deviation of error for position estimate

Orientation Standard deviation

Vision 0.0427

IMU 0.012435

Table 5 Sensor types and their respective standard deviation of error for orientation estimate

Chapter 7: Experimental Results

7.1 Set up

We have used KITTI dataset for our calculations. The KITTI dataset, or the KITTI

Vision Benchmark Suite, is a widely used collection of datasets for various computer

vision and robotics tasks, primarily focused on autonomous driving and scene

understanding.

Illustration 32 Set up of sensors in KITTI dataset

It contains a GPS/IMU unit and stereo grayscale cameras, which are used in our calculations. The image above helps us calibrate the sensors' locations and orientations with respect to each other.

1. We have synchronized the readings from all three sensors.

2. The images that we get are not raw; they are rectified.

3. The sensor readings are synchronized at 10 hertz, that is, 10 readings per second.

7.2 Sensor’s Description

Following are the sensors used in the KITTI dataset:

1. 2 x Grayscale cameras, 1.4 Megapixels: Point Grey Flea 2 (FL2-14S3M-C)

2. 1 x Inertial Navigation System (GPS/IMU): OXTS RT 3003

We load the readings that we get from our sequence into our sensor fusion model to get

experimental results.

7.3 Position Results

IMU and Vision Combined:

Illustration 33 Vision & IMU fused position readings

IMU, Vision & GPS combined:

Illustration 34 Vision, IMU and GPS fused position readings

IMU, Vision & GPS combined (Zoomed in)

Illustration 35 Vision, IMU and GPS fused position readings (Zoomed in)

The following table shows the Root mean squared (RMS) error for position:

Position RMS error

IMU 34.084

Vision 5.5174

IMU & Vision combined 0.1103

GPS 0.0703

IMU, Vision & GPS combined 0.0606

Table 6 Each sensor model and their respective RMS error for position estimate

7.4 Orientation Results

IMU and Vision combined

Illustration 36 Vision and IMU fused orientation readings

The following table shows the Root mean squared (RMS) error for orientation:

Orientation RMS error

IMU 0.0220

Vision 0.0169

IMU & Vision combined 0.0065

Table 7 Each sensor model and their respective RMS error for orientation estimate

Chapter 8: Conclusion

8.1 Position estimates

The first sensor fusion model for position, combining the IMU and stereo vision, gives a very good estimate that is comparable to GPS; but because both the IMU and visual odometry are bad at long-term accuracy, this model will start to lose accuracy with time.

The second sensor fusion model, which combines all three sensors, gives the best estimate and the lowest root mean squared error compared to each individual sensor. The fused path lies most of the time between the GPS path and the IMU–stereo vision path. This combination will not drift with time, as GPS ensures that there are no long-term errors. The model is reliable, with higher accuracy and precision than each individual sensor alone.

8.2 Orientation estimates

The orientation calculated by this model is better than the orientation given by each individual sensor alone. However, this model will drift with time, because small errors integrate over time and there is no long-term correction such as GPS.

Just as in the position estimate, the orientation curve lies most of the time between the curves for the IMU and stereo vision, and the fused model has the lowest RMS error.

References

1. C. E. Shannon, "A Mathematical Theory of Communication."

2. D. Nister, O. Naroditsky, and J. Bergen, "Visual Odometry."

3. H. P. Moravec, "Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover."

4. L. Matthies and S. A. Shafer, "Error Modeling in Stereo Navigation," IEEE J. Robot. Autom., vol. 3, no. 3, pp. 239–248, 1987.

5. C. Harris and M. Stephens, "A Combined Corner and Edge Detector," in Proceedings of the Fourth Alvey Vision Conference, 1988, pp. 147–151.

6. Y. Cheng, M. W. Maimone, and L. H. Matthies, "Visual Odometry on the Mars Exploration Rovers," IEEE Robot. Autom. Mag., vol. 13, no. 2, pp. 54–62, 2006.

7. D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

8. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

9. R. C. Luo and M. G. Kay, "A Tutorial on Multisensor Integration and Fusion," in Proc. 16th Annu. Conf. IEEE Ind. Electron., 1990, vol. 1, pp. 707–722.

10. R. C. Luo and M. G. Kay, "Multisensor Fusion and Integration in Intelligent Systems," IEEE Trans. Syst., Man, Cybern., vol. 19, no. 5, pp. 901–931, Sep./Oct. 1989.

11. R. C. Luo and M. G. Kay, Multisensor Integration and Fusion for Intelligent Machines and Systems. Norwood, MA: Ablex Publishing, 1995.

12. R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," ASME Journal of Basic Engineering, vol. 82 (Series D), pp. 35–45, 1960.

13. N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," IEE Proceedings F – Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, Apr. 1993.

14. M. Maimone, Y. Cheng, and L. Matthies, "Two Years of Visual Odometry on the Mars Exploration Rovers," Journal of Field Robotics, vol. 24, no. 3, pp. 169–186, 2007.

15. D. J. C. MacKay, Information Theory, Inference and Learning Algorithms.

16. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.

17. S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics.

18. The KITTI dataset.

19. T. W. Hungerford, Abstract Algebra.

20. R. C. Luo, C. C. Chang, and C. C. Lai, "Multisensor Fusion and Integration: Theories, Applications, and its Perspectives."

21. Cunto and J. Z. Sasiadek, "Sensor Fusion INS/GNSS Based on Fuzzy Logic Weighted Kalman Filter."

22. J. Z. Sasiadek, "Sensor Fusion."

23. P. Zhang, J. Gu, E. E. Milios, and P. Huynh, "Navigation with IMU/GPS/Digital Compass with Unscented Kalman Filter."

24. Wikipedia.
