

Module 1 | Lesson 3

Least Squares and Maximum Likelihood
By the end of this video, you will be able to…


• State the connection between the method of least squares and maximum likelihood with Gaussian random variables
Revisiting the Least Squares Criterion

• Recall the least squares criterion

\mathcal{L}_{LS}(x) = (y_1 - x)^2 + (y_2 - x)^2 + \dots + (y_m - x)^2

• We’ve said that the optimal estimate, \hat{x}, is the one that minimizes this ‘loss’:

\hat{x}_{LS} = \operatorname{argmin}_x \mathcal{L}_{LS}(x) = \operatorname{argmin}_x \left( e_1^2 + e_2^2 + \dots + e_m^2 \right)
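As a quick illustration (not from the original slides), here is a minimal NumPy sketch with made-up example measurements; the helper name ls_loss is just for this sketch. It shows that the x minimizing the least squares loss is simply the sample mean of the measurements:

```python
import numpy as np

# Made-up scalar measurements of the same quantity x.
y = np.array([1068.0, 988.0, 1002.0, 996.0])

def ls_loss(x, y):
    """Least squares criterion: sum of squared errors (y_i - x)^2."""
    return np.sum((y - x) ** 2)

# Brute-force the minimizer over a fine grid of candidate values.
candidates = np.linspace(900.0, 1100.0, 2001)
x_hat_grid = candidates[np.argmin([ls_loss(x, y) for x in candidates])]

# The closed-form minimizer of this criterion is the sample mean.
x_hat_mean = y.mean()

print(x_hat_grid, x_hat_mean)  # both come out to 1013.5 for these values
```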

Why squared errors?


Least Squares and Maximum Likelihood

Gauss showed the connection between his method of least squares and maximum likelihood with Gaussian measurement models.

Carl Friedrich Gauss, ‘Princeps mathematicorum’
The Method of Maximum Likelihood

• We can ask which x makes our measurement most likely. In other words, which x maximizes the conditional probability of y:

\hat{x} = \operatorname{argmax}_x p(y \mid x)

[Figure: candidate Gaussian likelihoods p(y | x_a), p(y | x_b), p(y | x_c), p(y | x_d) plotted against y, with the measured value y_meas marked. Which x is the most likely given the measurement?]
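A small sketch of the idea in the figure (the candidate values, the measurement, and the noise standard deviation below are all made up, and gaussian_pdf is just an illustrative helper): evaluate the Gaussian density p(y_meas | x) for each candidate x and keep the largest.

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return np.exp(-(z - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

y_meas = 1002.0            # a single measurement (made-up value)
sigma = 20.0               # assumed measurement standard deviation
candidates = {"x_a": 960.0, "x_b": 990.0, "x_c": 1005.0, "x_d": 1040.0}

# p(y_meas | x) for each candidate parameter value.
likelihoods = {name: gaussian_pdf(y_meas, x, sigma) for name, x in candidates.items()}

# The maximum likelihood choice is the candidate whose density at y_meas is largest.
best = max(likelihoods, key=likelihoods.get)
print(best, likelihoods[best])   # x_c is closest to y_meas, so it wins here
```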
Measurement Model

• Recall our simple measurement model:


y = x + v

• We can convert this into a conditional probability on our measurement, by assuming some probability density for v. For example, if

v \sim \mathcal{N}(0, \sigma^2)

• Then:

p(y \mid x) = \mathcal{N}(x, \sigma^2)

[Figure: the density p(y | x) plotted against y, peaked at y = x]
Least Squares and Maximum Likelihood
• The probability density function of a Gaussian is:

\mathcal{N}(z; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(z - \mu)^2}{2\sigma^2}}
• Our conditional measurement likelihood is

p(y \mid x) = \mathcal{N}(y; x, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - x)^2}{2\sigma^2}}

• If we have multiple independent measurements, then:

p(y \mid x) \propto \mathcal{N}(y_1; x, \sigma^2)\,\mathcal{N}(y_2; x, \sigma^2) \times \dots \times \mathcal{N}(y_m; x, \sigma^2)
= \frac{1}{(2\pi)^{m/2}\sigma^m} \exp\left( \frac{-\sum_{i=1}^{m} (y_i - x)^2}{2\sigma^2} \right)
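A quick numerical check of this factorization (the measurements, candidate x, and σ below are assumed example values): the product of the per-measurement Gaussian densities matches the combined exponential expression above.

```python
import numpy as np

y = np.array([1068.0, 988.0, 1002.0, 996.0])   # independent measurements (example values)
x, sigma = 1000.0, 20.0                         # candidate parameter and assumed noise std
m = len(y)

# Product of the per-measurement Gaussian densities N(y_i; x, sigma^2).
per_measurement = np.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
product_form = np.prod(per_measurement)

# Combined expression from the slide.
joint_form = np.exp(-np.sum((y - x) ** 2) / (2 * sigma ** 2)) / ((2 * np.pi) ** (m / 2) * sigma ** m)

print(np.isclose(product_form, joint_form))  # True
```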
Least Squares and Maximum Likelihood
• The maximum likelihood estimate (MLE) is given by

\hat{x}_{MLE} = \operatorname{argmax}_x p(y \mid x)

• Instead of trying to optimize the likelihood directly, we can take its logarithm (the logarithm is monotonically increasing, so the maximizer is unchanged):

\hat{x}_{MLE} = \operatorname{argmax}_x p(y \mid x) = \operatorname{argmax}_x \log p(y \mid x)

• Resulting in:

\log p(y \mid x) = -\frac{1}{2\sigma^2} \left( (y_1 - x)^2 + \dots + (y_m - x)^2 \right) + C

where C collects the terms that do not depend on x; we can ignore it for the purposes of maximization.
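A short sketch (made-up measurements and σ) confirming that log p(y | x) and the scaled sum of squared errors differ only by a constant that does not depend on x:

```python
import numpy as np

y = np.array([1068.0, 988.0, 1002.0, 996.0])   # example measurements
sigma = 20.0                                    # assumed noise standard deviation
m = len(y)

def log_likelihood(x):
    """log p(y | x) for m independent Gaussian measurements."""
    return -np.sum((y - x) ** 2) / (2 * sigma ** 2) - 0.5 * m * np.log(2 * np.pi * sigma ** 2)

def scaled_sse(x):
    """-1/(2 sigma^2) * sum of squared errors (the x-dependent part only)."""
    return -np.sum((y - x) ** 2) / (2 * sigma ** 2)

# The difference is the constant C from the slide, the same for every x.
for x in (980.0, 1000.0, 1020.0):
    print(log_likelihood(x) - scaled_sse(x))   # identical value each time
```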
Least Squares and Maximum Likelihood
• Since

\operatorname{argmax}_z f(z) = \operatorname{argmin}_z \left( -f(z) \right)

• The maximum likelihood problem can therefore be written as

\hat{x}_{MLE} = \operatorname{argmin}_x \left( -\log p(y \mid x) \right) = \operatorname{argmin}_x \frac{1}{2\sigma^2} \left( (y_1 - x)^2 + \dots + (y_m - x)^2 \right)

Least Squares and Maximum Likelihood
• So:

\hat{x}_{MLE} = \operatorname{argmin}_x \frac{1}{2\sigma^2} \left( (y_1 - x)^2 + \dots + (y_m - x)^2 \right)

• Finally, if we assume each measurement has a different variance, we can derive

\hat{x}_{MLE} = \operatorname{argmin}_x \frac{1}{2} \left( \frac{(y_1 - x)^2}{\sigma_1^2} + \dots + \frac{(y_m - x)^2}{\sigma_m^2} \right)

In both cases,

\hat{x}_{MLE} = \hat{x}_{LS} = \operatorname{argmin}_x \mathcal{L}_{LS}(x) = \operatorname{argmin}_x \mathcal{L}_{MLE}(x)
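For the unequal-variance case, a sketch (with assumed measurements and per-measurement standard deviations) showing that minimizing the weighted criterion numerically agrees with the closed-form inverse-variance weighted mean:

```python
import numpy as np

y = np.array([1068.0, 988.0, 1002.0, 996.0])     # measurements (example values)
sigma = np.array([40.0, 10.0, 10.0, 20.0])        # assumed per-measurement std devs

def weighted_ls_loss(x):
    """0.5 * sum_i (y_i - x)^2 / sigma_i^2 (the negative log-likelihood up to a constant)."""
    return 0.5 * np.sum((y - x) ** 2 / sigma ** 2)

# Brute-force minimization over a fine grid.
candidates = np.linspace(900.0, 1100.0, 20001)
x_hat_grid = candidates[np.argmin([weighted_ls_loss(x) for x in candidates])]

# Closed form: the inverse-variance weighted mean.
w = 1.0 / sigma ** 2
x_hat_weighted = np.sum(w * y) / np.sum(w)

print(x_hat_grid, x_hat_weighted)   # nearly identical values (about 997.1 with these numbers)
```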
The Central Limit Theorem
• In realistic systems like self-driving cars, there are many sources of ‘noise’

Central Limit Theorem: When independent random variables are added, their normalized sum tends towards a normal distribution.

• Why use the method of least squares?

1. Central Limit Theorem: the sum of different errors will tend to be ‘Gaussian’-ish (a short simulation sketch follows below)
2. Least squares is equivalent to maximum likelihood under Gaussian noise
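A quick simulation (arbitrary parameters, not from the slides) illustrating the theorem: the normalized sum of many independent uniform noise sources behaves very much like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each total "error" is the sum of 30 independent uniform noise sources (non-Gaussian individually).
n_sources, n_trials = 30, 100_000
sources = rng.uniform(-1.0, 1.0, size=(n_trials, n_sources))
total_error = sources.sum(axis=1)

# Normalize the sum: uniform(-1, 1) has mean 0 and variance 1/3.
normalized = total_error / np.sqrt(n_sources / 3.0)

# Compare with what a standard normal would give.
print(normalized.mean(), normalized.std())   # close to 0 and 1
print(np.mean(np.abs(normalized) < 1.0))     # close to 0.6827, the Gaussian one-sigma mass
```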
Least Squares | Some Caveats

• ‘Poor’ measurements (e.g. outliers) have a significant effect on the method of least squares

• It’s important to check that the measurements roughly follow a Gaussian distribution

Under the Gaussian PDF, samples ‘far away’ from the mean are ‘very improbable’.

Without the outlier:

#   Resistance (Ohms)
1   1068
2   988
3   1002
4   996

\hat{x} = 1013.5

With the outlier:

#   Resistance (Ohms)
1   1068
2   988
3   1002
4   996
5 (outlier)   1430

\hat{x} = 1096.8
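The table’s estimates can be reproduced directly; the following sketch recomputes the least squares estimate (the sample mean for a single scalar) with and without the outlier:

```python
import numpy as np

resistance = np.array([1068.0, 988.0, 1002.0, 996.0])    # Ohms, from the table
resistance_with_outlier = np.append(resistance, 1430.0)   # add the outlier reading

# The least squares estimate of a single scalar is just the sample mean.
print(resistance.mean())               # 1013.5
print(resistance_with_outlier.mean())  # 1096.8
```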
Summary | Least Squares and Maximum Likelihood
• Least squares (LS) and weighted least squares (WLS) produce the same estimates as maximum likelihood, assuming Gaussian noise

• The Central Limit Theorem states that errors arising from many independent sources will tend towards a Gaussian distribution
• Least squares estimates are significantly affected by outliers
