M3 Kalman Filter Equations
Girish Chowdhary
Associate Professor,
Director, Distributed Autonomous Systems Lab
University of Illinois at Urbana-Champaign
www.daslab.illinois.edu
p(A) denotes the probability that the event A is true.
Axioms of probability
(Discrete) random variable (RV): a variable that can take one of many values
∑_{x∈X} p(x) = 1 (something happens)
e.g. p(even die roll) = 3/6
For RV X able to take any value x_i and RV Y able to take any value y_i:
Question: what is the joint probability that I will get "snake eyes", i.e. x_i = 1, y_i = 1?
p(A) = ∑_b p(A, B = b) = ∑_b p(A|B = b) p(B = b)
Sum rule
p(X) = ∑_Y p(X, Y) = ∑_y p(X|Y = y) p(Y = y)
Product rule
p(X , Y ) = p(Y |X )p(X )
Bayes Theorem
p(X|Y) = p(X, Y)/p(Y) = p(Y|X) p(X)/p(Y)
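As a quick numeric illustration of the sum rule and Bayes theorem, here is a minimal Python sketch; the obstacle-detection scenario and all of the numbers in it are hypothetical, chosen only for illustration.

```python
# Hypothetical prior and likelihoods for a binary "obstacle detected" sensor.
p_obstacle = 0.1                      # p(X): prior probability an obstacle is present
p_detect_given_obstacle = 0.9         # p(Y|X): sensor fires when an obstacle is present
p_detect_given_clear = 0.2            # p(Y|not X): false-alarm rate

# Sum rule: p(Y) = sum over X of p(Y|X) p(X)
p_detect = (p_detect_given_obstacle * p_obstacle
            + p_detect_given_clear * (1.0 - p_obstacle))

# Bayes theorem: p(X|Y) = p(Y|X) p(X) / p(Y)
p_obstacle_given_detect = p_detect_given_obstacle * p_obstacle / p_detect
print(p_obstacle_given_detect)        # about 0.33
```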
p(x) ≥ 0
∫_{−∞}^{∞} p(x) dx = 1
Useful identity: Variance
Var[f] = E[f(x)²] − E[f(x)]²
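A short Python check of this identity on sampled data; the choice of distribution and of f(x) = sin(x) is arbitrary and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)    # samples of x
f = np.sin(x)                                       # an arbitrary f(x)

var_direct   = np.mean((f - np.mean(f))**2)         # Var[f] by definition
var_identity = np.mean(f**2) - np.mean(f)**2        # E[f^2] - E[f]^2

print(var_direct, var_identity)   # the two agree up to floating-point/sampling error
```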
Figure: multivariate normal distribution (image source: https://upload.wikimedia.org/wikipedia/commons/8/8e/MultivariateNormal.png)
Goals
State
The current state, the control inputs, and the system model completely define the future response.
State Variables: the variables that make up the state of the dynamic system. If n variables (x1, x2, . . . , xn) are required to completely describe the behavior of the system, then these n variables form a set of state variables.
Assume that a system has n state variables, m inputs, and r outputs. Then the general form of the state space model for the system is
ẋ1(t) = f1(x1, x2, . . . , xn; u1, u2, . . . , um; t)
ẋ2(t) = f2(x1, x2, . . . , xn; u1, u2, . . . , um; t)
..
.
ẋn(t) = fn(x1, x2, . . . , xn; u1, u2, . . . , um; t)
These first-order differential equations in the state variables are called state differential equations or state space equations.
They can be nonlinear. The outputs are
y1 (t) = g1 (x1 , x2 , . . . , xn ; u1 , u2 , . . . , um ; t)
y2 (t) = g2 (x1 , x2 , . . . , xn ; u1 , u2 , . . . , um ; t)
..
.
yr (t) = gr (x1 , x2 , . . . , xn ; u1 , u2 , . . . , um ; t)
• In compact vector form, we can write the state differential equations and
outputs as
ẋ(t) = f (x, u, t)
y (t) = g (x, u, t)
where x(t) is the state vector, u(t) is the input vector, and y (t) is the
output vector.
If the system dynamics is linear and time-invariant, then the state space model takes the form
ẋ(t) = Ax(t) + Bu(t)
y(t) = Cx(t) + Du(t)
State models are not unique: One can choose many different sets of
state variables to describe the system in state space form
Commonly used variables include:
variables associated with energy storage elements;
sets including one variable and its successive derivatives;
variables that give a canonical (standard) form for the state space
representation
[Figure: mass-spring-damper system with mass m, spring constant k, damper b, displacement x, and applied force u(t)]
• The state space model will depend on the set of state variables that are
chosen.
• The variables associated with energy storage elements are the force in the
spring Fk (t) and the velocity of the mass v (t).
Elemental equations
x1 = Fk
x2 = v
and assume the measured output is the displacement of the mass. Then the
state variable equations and output equation are
ẋ1 = x2    (11)
ẋ2 = −(k/m) x1 − (b/m) x2 + (1/m) u    (12)
y = x1    (13)
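A minimal simulation sketch of this model in Python using forward-Euler integration; the numerical values of m, k, b, the time step, and the initial condition are hypothetical. Here the state x1 is treated as the mass displacement, which is the reading consistent with equations (11)-(13).

```python
import numpy as np

# Hypothetical parameters for the mass-spring-damper example
m, k, b = 1.0, 4.0, 0.5
dt, T = 0.001, 10.0

A = np.array([[0.0, 1.0],
              [-k / m, -b / m]])        # from eqs. (11)-(12)
B = np.array([0.0, 1.0 / m])
C = np.array([1.0, 0.0])                # output y = x1, eq. (13)

x = np.array([1.0, 0.0])                # initial condition x0
ys = []
for _ in range(int(T / dt)):
    u = 0.0                             # unforced response; substitute any input signal
    x = x + dt * (A @ x + B * u)        # forward-Euler step of x_dot = Ax + Bu
    ys.append(C @ x)
```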
• The solution of the LTI state equation is
x(t) = e^{At} x0 + ∫_0^t e^{A(t−τ)} B u(τ) dτ
Notice that the solution consists of two parts: the first part is due to the initial condition x0, and the second part is due to the external input u(t).
Φ(t) := e^{At}    (20)
• For the scalar case we have
e^{at} = 1 + at + (at)²/2! + (at)³/3! + · · ·    (22)
• For the matrix case we have
e^{At} = I + At + (At)²/2! + (At)³/3! + · · ·    (23)
where I denotes the n × n identity matrix.
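The series in (23) can be checked numerically; below is a small Python sketch comparing a truncated series against SciPy's matrix exponential. The matrix A and the time t are placeholder values.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-4.0, -0.5]])            # placeholder system matrix
t = 0.1

# Truncated power series from eq. (23): e^{At} ~ I + At + (At)^2/2! + ...
Phi_series = np.eye(2)
term = np.eye(2)
for n in range(1, 20):
    term = term @ (A * t) / n           # builds (At)^n / n!
    Phi_series += term

Phi_exact = expm(A * t)                 # library matrix exponential for comparison
print(np.max(np.abs(Phi_series - Phi_exact)))   # essentially zero
```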
ẋ = Ax + Bu
Consider the full state feedback case: C = I.
Then the state feedback controller always has the form
u = −Kx
and the closed-loop dynamics become
ẋ = (A − BK)x
x_1 = Φ x_0    (27)
x_2 = Φ x_1 = Φ² x_0    (28)
⋮    (29)
x_n = Φ x_{n−1} = Φⁿ x_0    (30)
In this class we will learn about Kalman Filters. The learning outcomes are:
Understand the Origin, need, and applications of Kalman Filters (KF)
Learn about applications in robotics
Get familiar with foundations of necessary probability theory
Get familiar with the terminology, including process noise, measurement
noise, predictive covariance, Gaussian white noise etc.
Mathematically formulate the Kalman filtering problem
Understand the KF algorithm and how to tune the filter
Recognize when advanced filtering techniques are necessary and when
KF assumptions are violated
Figure: Noisy data (noisy obs) and mean m(t).
Figure: Estimate of the mean m̄(t) with 2Σ predictive covariance.
Why is the Kalman Filter so popular?
It works!
Well... it works well enough for many real-world applications.
Why?
Sensor noise does tend to be Gaussian in the limit of data received
(central limit theorem)
Most systems behave linearly in local regions
Kalman filter utilizes feedback, which makes it robust to uncertainties
It's convenient to implement in an online manner to process streaming data
Amenable to real-time implementation for many problems
The Kalman filter finds many applications across pretty much all important scientific disciplines.
An early application was trajectory estimation for the Apollo spacecraft.
Since then, it has been applied to many dynamical system state estimation problems, including cellphones, GPS, weather monitoring, precision agriculture, digital cameras, and ag-sensors.
The Kalman filter will return a mean (average) of the quantity being estimated and provide a predictive variance on its estimate, given that:
The process and measurement noise variances are known
The dynamical system model is known
The Kalman filter is guaranteed to be the optimal unbiased estimator for the following case:
The noise in the sensors is Gaussian
The dynamical system is linear
A sufficient number of states of the dynamical system is measured (the system is observable)
If these assumptions are violated, the Kalman filter will be sub-optimal.
ẋ = Ax + B ω(t)    (32)
y = Hx + v(t)    (33)
x_{k+1} = Φ_k x_k + Γ_k ω_k    (34)
y_k = H_k x_k + v_k    (35)
x ∈ R^{n×1}: state vector
ω ∈ R^{n×1}: (additive) process noise
y ∈ R^{l×1}: sensor measurements
v ∈ R^{l×1}: (additive) measurement noise
Φ_k ∈ R^{n×n}: discretized state transition matrix
Γ_k ∈ R^{n×m}: discretized input matrix
H_k ∈ R^{l×n}: output matrix
So we have
ω_k = ∫_{t_k}^{t_k+Δt} e^{A(t_{k+1}−λ)} B(λ) ω(λ) dλ    (37)
Hence
x̂_k^- = Φ_k x̂_{k−1}^+    (46)
P_{k+1}^- = Φ_k P_k^+ Φ_k^T + Q_k    (47)
Remember that we are propagating the mean x̂_k. Now we use the observer equation to set up the correction:
x̂_{k+1}^+ = Φ_k x_k + L_k (y_k − H_k x̂_k^-)    (48)
So now
x̂_k^+ = x̂_k^- + L_k δy_k^-    (49)
Innovation term
δy_k^- = y_k − ŷ_k^- = y_k − H_k x̂_k^-
Now
x̂_k^- = Φ_k x̂_{k−1}^+
P_{k+1}^- = Φ_k P_k^+ Φ_k^T + Q_k
Posterior covariance
P_k^+ = (I − L_k H_k) P_k^- (I − L_k H_k)^T + L_k R_k L_k^T    (60)
What does optimal mean here? What if we choose the gain to minimize the error variance!
The error variance is contained in the diagonal of P_k^+.
Optimal gain L_k
To compute the optimal gain L_k we minimize trace(P_k^+) with respect to L_k. To do this, solve ∂trace(P_k^+)/∂L_k = 0 for L_k.
Multiply the terms out and drop the subscript k for convenience:
∂trace(P_k^+)/∂L_k = −2(HP^-)^T + 2L(HP^-H^T + R)    (63)
Hence
Kalman Gain
L = P^- H^T (HP^- H^T + R)^{-1}    (65)
Initialize x_0 ∼ N(0, P_0)
Prediction step
x_k^- = Φ_k x_{k−1}
P_k^- = Φ_k P_{k−1} Φ_k^T + Q_k
Correction step
e_k = y_k − H_k x_k^-
S_k = H_k P_k^- H_k^T + R_k
L_k = P_k^- H_k^T S_k^{-1}
x_k^+ = x_k^- + L_k e_k
P_k^+ = P_k^- − L_k S_k L_k^T
Prediction step
x_k^- = Φ x_{k−1}
P_k^- = Φ P_{k−1} Φ^T + Q
Correction step
L_k = P_k^- H^T (H P_k^- H^T + R)^{-1}    (66)
S_k = H P_k^- H^T + R    (67)
x_k = x_k^- + L_k (y_k − H x_k^-)    (68)
P_k = P_k^- − L_k S_k L_k^T    (69)
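The algorithm above maps almost line-for-line to code. Below is a minimal NumPy sketch of one predict/correct cycle (equations 66-69); the function name and the matrices in the usage snippet are illustrative placeholders, not from the slides.

```python
import numpy as np

def kf_step(x, P, y, Phi, Q, H, R):
    """One discrete Kalman filter cycle: prediction (Phi, Q) then correction (H, R)."""
    # Prediction step
    x_minus = Phi @ x
    P_minus = Phi @ P @ Phi.T + Q
    # Correction step
    S = H @ P_minus @ H.T + R                 # innovation covariance, eq. (67)
    L = P_minus @ H.T @ np.linalg.inv(S)      # Kalman gain, eq. (66)
    x_plus = x_minus + L @ (y - H @ x_minus)  # state update, eq. (68)
    P_plus = P_minus - L @ S @ L.T            # covariance update, eq. (69)
    return x_plus, P_plus

# Illustrative usage: a scalar random-walk state measured directly.
Phi = np.array([[1.0]]); Q = np.array([[0.01]])
H = np.array([[1.0]]);   R = np.array([[0.25]])
x, P = np.array([0.0]), np.array([[1.0]])
for y in [0.9, 1.1, 1.05, 0.95]:              # fake measurements
    x, P = kf_step(x, P, np.array([y]), Phi, Q, H, R)
```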
KF algorithm
Prediction step: Initialize x̂^-(t) = x̂(t = k), P(t) = P_k(t = k)
x̂˙^- = A x̂
Ṗ^- = A P + P A^T + Q
Correction step
P_k^- = P(t = k)    (70)
L_k = P_k^- C^T (C P_k^- C^T + R)^{-1}    (71)
x_k = x_k^- + L_k (y_k − C x_k^-)    (72)
P_k = [I − L_k C] P^-(k)    (73)
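A sketch of how this continuous-discrete form might be implemented: Euler-integrate the mean and covariance between measurements, then apply the discrete correction (71)-(73) when a measurement arrives. The function names and the step-size handling are assumptions for illustration.

```python
import numpy as np

def propagate(x, P, A, Q, dt, n_steps):
    """Euler integration of x_hat_dot = A x_hat and P_dot = A P + P A^T + Q between measurements."""
    for _ in range(n_steps):
        x = x + dt * (A @ x)
        P = P + dt * (A @ P + P @ A.T + Q)
    return x, P

def correct(x_minus, P_minus, y, C, R):
    """Discrete correction at a measurement instant, eqs. (71)-(73)."""
    L = P_minus @ C.T @ np.linalg.inv(C @ P_minus @ C.T + R)   # eq. (71)
    x = x_minus + L @ (y - C @ x_minus)                        # eq. (72)
    P = (np.eye(P_minus.shape[0]) - L @ C) @ P_minus           # eq. (73)
    return x, P
```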
[Figure: simulation results; output y vs. time in unit time (0 to 10), two panels]
ẋ = Ax + Bu + ζ
with B = [1 0]^T. The input is sinusoidal, and the system is discretized to work with the framework outlined in class.
ẋ = f (x, u) (74)
y = h(x) (75)
Correction step
y(k) = h(x)    (76)
H_k = ∂h(x)/∂x |_{x = x̂_{k−1}}    (77)
L_k = P_k^- H^T (H P_k^- H^T + R)^{-1}    (78)
S_k = H P_k^- H^T + R    (79)
x_k = x_k^- + L_k (y_k − H x_k^-)    (80)
P_k = P_k^- − L_k S_k L_k^T    (81)
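A minimal sketch of this EKF correction step in Python; h and H_jacobian are user-supplied callables (the measurement model and its Jacobian from eq. 77), and here the predicted state is used as the linearization point.

```python
import numpy as np

def ekf_correct(x_minus, P_minus, y, h, H_jacobian, R):
    """EKF correction, eqs. (76)-(81): linearize h(.) at the current estimate, then apply the KF update."""
    H = H_jacobian(x_minus)                   # eq. (77)
    S = H @ P_minus @ H.T + R                 # eq. (79)
    L = P_minus @ H.T @ np.linalg.inv(S)      # eq. (78)
    x_plus = x_minus + L @ (y - h(x_minus))   # eq. (80)
    P_plus = P_minus - L @ S @ L.T            # eq. (81)
    return x_plus, P_plus
```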
p̂˙(t) = v̂(t),    (85)
v̂˙(t) = â(t)    (86)
What should b̂˙(t) be? Remember that E[ḃ(t)] = 0 because ω_b is Gaussian white noise, so we have our third equation
b̂˙(t) = 0    (87)
Now we will consider designing a navigation system for a ground robot that
has GPS and encoders. We will use encoders to mechanize the (extended)
Kalman filter and use GPS to correct it.
Kinematic model of the Dubins car robot from Module 2, assuming no slip and no lateral velocity in the body frame:
ẋ = u sin ψ (93)
ẏ = u cos ψ (94)
ψ̇ = ω (95)
Let v^b = [u, v]^T. The velocity vector in the tangent frame is
v^t = R_{b→t} v^b
where
R_{b→t} = [ cos ψ   −sin ψ
            sin ψ    cos ψ ]
u = (1/2)(U_L + U_R)    (96)
ω = (1/L)(U_L − U_R)    (97)
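A small Python sketch combining the wheel-speed map (96)-(97) with an Euler step of the kinematic model (93)-(95); the wheelbase L, the wheel speeds, and the time step are placeholder values, and the sin/cos convention follows the slides.

```python
import numpy as np

def wheel_speeds_to_body(UL, UR, L):
    """Left/right wheel speeds -> forward speed u and turn rate omega, eqs. (96)-(97)."""
    u = 0.5 * (UL + UR)
    omega = (UL - UR) / L
    return u, omega

def propagate_pose(x, y, psi, u, omega, dt):
    """One Euler step of the no-slip kinematic model, eqs. (93)-(95)."""
    x_next = x + dt * u * np.sin(psi)
    y_next = y + dt * u * np.cos(psi)
    psi_next = psi + dt * omega
    return x_next, y_next, psi_next

# Placeholder usage: constant wheel speeds for one second at 100 Hz.
u, omega = wheel_speeds_to_body(UL=1.0, UR=0.8, L=0.5)
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = propagate_pose(*pose, u, omega, dt=0.01)
```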
Used in aircraft that do not have GPS to hold attitude; does not work in the presence of strong acceleration.
Uses inertial sensors only, plus magnetometers:
3-axis accelerometers, gyroscopes, and magnetometers
Idea: integrate gyro outputs to predict attitude and then correct with the gravity direction sensed from the accelerometers.
States to estimate:
p: x, y , z position in NED frame (which we are going to assume is
inertial)
v : U, V , W velocities in NED frame
q: Attitude quaternion (note the discussion on the error quaternion in a later slide)
bω : Bias of the rate gyro
ba : Bias of the accelerometer
Using the following measurements:
ã: 3 axis accelerometers
ω̃: 3 axis gyro
m̃ : 3 axis magnetometer (we won’t really use this)
p̃: position from GPS
ṽ : velocity from GPS
• Acceleration and angular rate measurements
ã = a − ba (108)
ω̃ = ω − bω (109)
Let the quaternion denoting the rotation from body to inertial frame be q = [q_1, q_2, q_3, q_4], where q_1 is the scalar part denoting the rotation and [q_2, q_3, q_4] is the axis of rotation.
R_{b→I} = [ q_1² + q_2² − q_3² − q_4²    2(q_2q_3 + q_1q_4)            2(q_2q_4 − q_1q_3)
            2(q_2q_3 − q_1q_4)           q_1² − q_2² + q_3² − q_4²     2(q_3q_4 + q_1q_2)
            2(q_2q_4 + q_1q_3)           2(q_3q_4 − q_1q_2)            q_1² − q_2² − q_3² + q_4² ]    (110)
And,
R_{I→b} = R_{b→I}^T    (111)
•Don’t forget to normalize the quaternion in your code every time-step, you
q
do this by setting q ← norm(q)
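A Python sketch of the quaternion utilities implied by (110), (111), and the normalization note; scalar-first ordering q = [q1, q2, q3, q4] is assumed, as in the slides.

```python
import numpy as np

def normalize_quaternion(q):
    """q <- q / ||q||, applied every time step to keep q a unit quaternion."""
    return q / np.linalg.norm(q)

def quat_to_rotation(q):
    """Body-to-inertial rotation matrix from q = [q1, q2, q3, q4] (scalar first), eq. (110)."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1**2 + q2**2 - q3**2 - q4**2, 2*(q2*q3 + q1*q4),             2*(q2*q4 - q1*q3)],
        [2*(q2*q3 - q1*q4),             q1**2 - q2**2 + q3**2 - q4**2, 2*(q3*q4 + q1*q2)],
        [2*(q2*q4 + q1*q3),             2*(q3*q4 - q1*q2),             q1**2 - q2**2 - q3**2 + q4**2],
    ])

# R_{I->b} is just the transpose, eq. (111):
# R_I_to_b = quat_to_rotation(q).T
```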
F_pv = ∂(^I p̂˙)/∂(^I v̂) = I_{3×3}    (112)
F_vq = ∂(^I v̂˙)/∂(^I q̂) = (∂R_{b→I}/∂q) a^b    (113)
F_vba = ∂(^I v̂˙)/∂(^I b̂_a) = −R_{b→I}    (114)
F_qq = ∂(^I q̂˙)/∂(^I q̂) = −(1/2) Ω(ω)    (115)
F_qbω = ∂(^I q̂˙)/∂b̂_ω    (116)
∂(^I v̂˙)/∂(^I q̂) = (∂R_{b→I}/∂q) a    (117)
∂(^I v̂˙)/∂(^I q̂) =
[ 2(q_1a_x − q_4a_y + q_3a_z)    2(q_2a_x + q_3a_y + q_4a_z)    2(−q_3a_x + q_2a_y + q_1a_z)    2(−q_4a_x − q_1a_y + q_2a_z)
  2(q_4a_x + q_1a_y − q_2a_z)    2(q_3a_x − q_2a_y − q_1a_z)    2(q_2a_x + q_3a_y + q_4a_z)     2(q_1a_x − q_4a_y + q_3a_z)
  2(−q_3a_x + q_2a_y + q_1a_z)   2(q_4a_x + q_1a_y − q_2a_z)    2(−q_1a_x + q_4a_y − q_3a_z)    2(q_2a_x + q_3a_y + q_4a_z) ]    (118)
∂(^I q̂˙)/∂b̂_ω = −0.5 ∂(Ω(ω)q)/∂b_ω    (119)
= (1/2) [  q_2    q_3    q_4
          −q_1    q_4   −q_3
          −q_4   −q_1    q_2
           q_3   −q_2   −q_1 ]    (120)
•Don’t forget to discretize A. If you want to use continuous A, i.e. 121 then
use Ṗ = AP + PAT + Q
H_xq = [ −r_GPS(1)2q_1    −r_GPS(1)2q_2    r_GPS(1)2q_3     r_GPS(1)2q_4
         −r_GPS(1)2q_4    −r_GPS(1)2q_3    −r_GPS(1)2q_2    −r_GPS(1)2q_1
         r_GPS(1)2q_3     −r_GPS(1)2q_4    r_GPS(1)2q_1     −r_GPS(1)2q_2 ]    (124)

H_vq = [ r_GPS(1)2q_3 Q + r_GPS(1)2q_4 R     r_GPS(1)2q_4 Q − r_GPS(1)2q_3 R     r_GPS(1)2q_1 Q − r_GPS(1)2q_2 R     r_GPS(1)2q_2 Q + r_GPS(1)2q_1 R
         −r_GPS(1)2q_2 Q − r_GPS(1)2q_1 R    r_GPS(1)2q_2 R − r_GPS(1)2q_1 Q     r_GPS(1)2q_4 Q − r_GPS(1)2q_3 R     r_GPS(1)2q_3 Q + r_GPS(1)2q_4 R
         r_GPS(1)2q_1 Q − r_GPS(1)2q_2 R     −r_GPS(1)2q_2 Q − r_GPS(1)2q_1 R    −r_GPS(1)2q_3 Q − r_GPS(1)2q_4 R    r_GPS(1)2q_4 Q − r_GPS(1)2q_3 R ]    (125)

H = [ I(3×3)   Z(3×3)   H_xq   Z(3×6)
      Z(3×3)   I(3×3)   H_vq   Z(3×6) ]    (126)
δq ◦ q̂ = q    (127)
In practice, this error quaternion takes on very small values, so it is OK to set ŝ = 0. If we do this, we can simplify some calculations. See the JFR paper by Chowdhary et al.
ẋ = f (x, θ, u) (128)
θ̇ = 0 (129)
y = h(x, θ)    (130)
When you do the correction step, make sure you do it every second. The prediction should happen as fast as the inertial sensors provide data; assume 100 Hz.
Observability condition
The pair (C , A) is observable if and only if the matrix O has full column rank
O = [ C
      CA
      CA²
      ⋮
      CA^{n−1} ]    (135)
Consider the HW system from Q3; the linearized matrices are given in (133) and the measurement matrix in (134).
Let us use MATLAB to check the observability condition; try it for the following cases (see the sketch below):
θ = 45°, x, y measured
θ = 45°, x, y, v measured
θ = 0°, x, y measured
θ = 90°, x, y measured
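The slides suggest checking this in MATLAB (for example with obsv and rank); an equivalent NumPy sketch is given below. The linearized A and the measurement matrix C from (133)-(134) are not reproduced here, so they enter as placeholders to be filled in for each case.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{n-1} as in eq. (135)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, C):
    """The pair (C, A) is observable iff O has full column rank."""
    O = observability_matrix(A, C)
    return np.linalg.matrix_rank(O) == A.shape[0]

# Plug in the linearized A and measurement matrix C from (133)-(134)
# for each case (theta = 45, 0, 90 degrees; x, y or x, y, v measured).
```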