
IN5340 / IN9340 Lecture 5
Parameter Estimation II
Roy Edgar Hansen, January 2022

Outline

1 Introduction
2 Parameter estimation
3 Cramér-Rao lower bound
  Mathematical description
  General CRLB for signal in WGN
  Cramér-Rao lower bound summarised
  Example: Time delay estimation
4 Least Squares Estimation
  Mathematical description
  Least Squares Estimation summarised
  Examples
5 Summary

What do we learn

Parameter estimation part II
Ch 6 in Therrien and the whole book of Kay:

S. M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory, Volume I. Prentice Hall International Editions, 1993.
C. W. Therrien. Discrete Random Signals and Statistical Signal Processing. Prentice Hall, 1992.

Additional information

The course EE 522 by Prof. Mark Fowler at Binghamton University has excellent lecture notes. "Google" for EE522 and download / read:

Notes #4 Ch 3 Cramer Rao Bound Pt. A
Notes #5 Ch 3 Cramer Rao Bound Pt. B
Notes #8 Ch 3 CRLB Examples
Notes #16 Ch 8A LSE

Note 4 (5, 6, 7) introduces the Cramér-Rao lower bound
Note 8 has the example of CRLB for time delay estimation
Note 16 (17, 18, 19) covers Least Squares Estimation
Least Squares Estimation is excellently covered on Wikipedia

http://www.ws.binghamton.edu/fowler/fowler personal page/EE522.htm
Motivation

Example: Bearing estimation
The task: find an optimal estimator and its performance

Parameter estimation

Parameter estimation is to estimate a certain parameter from a (random) sequence.
This can be mathematically formulated by the probability density function, where the data is dependent on the unknown parameter:

$$f_x(\mathbf{x};\theta)$$

As a function of the parameter, this is called the likelihood function.
The performance of any estimator can be characterised by estimator properties such as bias and variance.

Parameter estimation grand goal: finding an optimal estimator and assessing its performance.

Optimal estimators

There are a number of different optimisation criteria.
The optimal estimator does not always exist...
Maximising the likelihood function leads to the maximum likelihood estimator.
Choosing the unbiased estimator with minimum variance leads to the Minimum Variance Unbiased (MVU) estimator.

Parameter estimation example

Typical model: signal with unknown parameters in noise

$$x[n] = s[n;\theta] + w[n]$$

Example: a constant unknown value A in AWGN:

$$x[n] = A + w[n]$$

The PDF (or likelihood function) becomes

$$f_x(\mathbf{x};\theta) = \frac{1}{(2\pi\sigma_x^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)^2\right)$$

We remember that the sample average

$$\hat{A} = \frac{1}{N}\sum_{n=1}^{N} x[n]$$

is an unbiased estimator that is consistent. It is also the maximum likelihood estimator.
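As a quick sanity check, here is a minimal Monte Carlo sketch of the unbiasedness claim; all numeric values (A = 1, sigma_x = 2, N = 50, the trial count) are assumptions chosen for illustration:

```matlab
% Minimal sketch: the sample average of x[n] = A + w[n] is unbiased.
% All numeric values are assumed for illustration.
A = 1; sigma_x = 2; N = 50; trials = 1e5;
x = A + sigma_x * randn(N, trials);   % each column is one data record
Ahat = mean(x, 1);                    % sample average per record
fprintf('mean of A-hat = %.4f (true A = %.4f)\n', mean(Ahat), A);
```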
Cramér-Rao lower bound

The Cramér-Rao lower bound (CRLB) is the theoretical lower bound on the variance of any unbiased estimator.

If θ̂ is an unbiased estimator of θ, then

$$\mathrm{Var}(\hat{\theta}) \geq \mathrm{CRLB}(\theta) \quad \text{for all estimators } \hat{\theta}$$

Note that the CRLB is only a function of the mathematical model.
The CRLB is not a function of the estimator.
The CRLB states the best possible performance any estimator can have.

Applications of the Cramér-Rao lower bound

Modeling of sensor performance
Benchmarking of new estimators:
  Simulate a given scenario with known θ
  Design and implement the new estimator
  Compare the measured error with the CRLB
Feasibility studies:
  Can some sensor actually meet the specifications?
  Has the system been oversold?
Assess estimator performance: how close are we?
Can be used to check if an estimator is the MVU estimator

Intuitive description of Cramér-Rao lower bound

Recall that the maximum likelihood estimate was found by solving

$$\left.\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{ml}} = 0$$

The sharpness of the likelihood function sets the accuracy.
The sharpness can be measured using curvature:

$$\left.-\frac{\partial^2 \ln f_x(\mathbf{x};\theta)}{\partial\theta^2}\right|_{\mathbf{x}=\text{given data},\ \theta=\text{true value}}$$

Mathematical description of Cramér-Rao lower bound

Theorem. Assume that the regularity condition¹ is met:

$$E\left\{\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta}\right\} = 0 \quad \text{for all } \theta$$

where E{·} is taken with respect to the likelihood function. Then the variance of any unbiased estimator must satisfy

$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{-E\left\{\dfrac{\partial^2 \ln f_x(\mathbf{x};\theta)}{\partial\theta^2}\right\}}$$

where the derivative is taken at the true value of θ.
The expectation must be taken because the first and second derivatives are random variables dependent on x.

¹ Generally true except when the domain of the PDF for which it is nonzero depends on the unknown parameter θ.
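A small numerical sketch of the sharpness argument (assumed values throughout): for the constant-in-AWGN model, a finite-difference estimate of the curvature of the log-likelihood at the true value grows with the record length N, matching −N/σx²:

```matlab
% Sketch: the log-likelihood gets sharper as N grows (assumed A = 1, sigma_x = 1).
% Curvature at the true value is estimated with a central second difference.
A = 1; sigma_x = 1; dA = 1e-3;
for N = [10 1000]
    x = A + sigma_x * randn(N, 1);
    llf = @(a) -N/2*log(2*pi*sigma_x^2) - sum((x - a).^2)/(2*sigma_x^2);
    curv = (llf(A + dA) - 2*llf(A) + llf(A - dA)) / dA^2;
    fprintf('N = %4d: curvature = %.1f (theory -N/sigma^2 = %.1f)\n', ...
            N, curv, -N/sigma_x^2);
end
```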
Cookbook for finding the Cramér-Rao lower bound

1 Write the log likelihood function (LLF), i.e. the PDF as a function of θ: ln f_x(x; θ)
2 Take the second derivative of the LLF: ∂²ln f_x(x;θ)/∂θ²
3 If still a function of x, take the expectation with respect to x
4 If still a function of θ, evaluate at the desired value
5 Negate and invert

Example: CRLB for estimating a constant in AWGN

Example: constant unknown value A in AWGN: x[n] = A + w[n].
The likelihood function is

$$f_x(\mathbf{x};\theta) = \frac{1}{(2\pi\sigma_x^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)^2\right)$$

The log likelihood function becomes

$$\ln f_x(\mathbf{x};A) = -\frac{N}{2}\ln(2\pi\sigma_x^2) - \frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)^2$$

The first derivative of the LLF:

$$\frac{\partial \ln f_x(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma_x^2}\sum_{n=1}^{N}(x[n]-A) = \frac{N}{\sigma_x^2}(\bar{x}-A)$$

where x̄ = (1/N)Σ x[n] is the sample mean.

Example: CRLB for estimating a constant in AWGN (continued)

The second derivative of the LLF becomes

$$\frac{\partial}{\partial A}\left[\frac{N}{\sigma_x^2}(\bar{x}-A)\right] = -\frac{N}{\sigma_x^2}$$

Final stage, negate and invert:

$$\mathrm{CRLB} = \frac{\sigma_x^2}{N}$$

The variance of all unbiased estimators of a constant value in AWGN must satisfy

$$\mathrm{Var}(\hat{A}) \geq \frac{\sigma_x^2}{N}$$

The CRLB for estimating a constant value in AWGN decreases with an increasing observation set, and increases with increasing noise standard deviation.

Mathematical description of Cramér-Rao lower bound, continued

Continuation of the theorem. There exists an unbiased estimator that attains the Cramér-Rao lower bound if and only if

$$\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta} = I(\theta)\,(g(\mathbf{x})-\theta)$$

for some functions g and I.
That estimator, which is the MVU estimator, is

$$\hat{\theta} = g(\mathbf{x})$$

The minimum variance is

$$\mathrm{CRLB}(\theta) = \frac{1}{I(\theta)}$$

I(θ) is called the Fisher Information.

[Portrait: Ronald A. Fisher, 1890-1962, British statistician and geneticist. From Wikipedia.]
Example: A constant value in AWGN revisited

Example: our signal in noise x[n] = A + w[n].
The likelihood function is

$$f_x(\mathbf{x};A) = \frac{1}{(2\pi\sigma_x^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)^2\right)$$

The log likelihood function was found to be

$$\ln f_x(\mathbf{x};A) = -\frac{N}{2}\ln(2\pi\sigma_x^2) - \frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)^2$$

And the first derivative of the log likelihood function was

$$\frac{\partial \ln f_x(\mathbf{x};A)}{\partial A} = \frac{N}{\sigma_x^2}(\bar{x}-A)$$

Continuing, we immediately recognise the right hand side as

$$I(\theta)(g(\mathbf{x})-\theta)$$

where

$$\hat{\theta} = \bar{x}$$

is the MVU estimator, and

$$\mathrm{CRLB}(\theta) = \frac{1}{I(\theta)} = \frac{\sigma_x^2}{N}$$
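A Monte Carlo sketch of this result, with the same assumed values as before (A = 1, sigma_x = 2, N = 50): the empirical variance of the sample mean should sit at the CRLB, since the estimator is efficient here:

```matlab
% Sketch: the sample mean attains the CRLB sigma_x^2/N for a constant in AWGN.
% All numeric values are assumed for illustration.
A = 1; sigma_x = 2; N = 50; trials = 1e5;
Ahat = mean(A + sigma_x * randn(N, trials), 1);
fprintf('empirical var = %.5f, CRLB = %.5f\n', var(Ahat), sigma_x^2/N);
```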

Alternative form of Cramér-Rao lower bound

Recall that

$$\mathrm{CRLB}(\theta) = \frac{1}{I(\theta)}$$

where I(θ) is the Fisher Information. It can be represented in two equivalent forms:

$$I(\theta) = -E\left\{\frac{\partial^2 \ln f_x(\mathbf{x};\theta)}{\partial\theta^2}\right\} = \underbrace{E\left\{\left(\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta}\right)^{\!2}\right\}}_{\text{alternative form}}$$

See Appendix 3A in [Kay(1993)] for the proof.

CRLB for a constant value in AWGN - again

Our simple example. We recall (after regrouping)

$$\frac{\partial \ln f_x(\mathbf{x};A)}{\partial A} = \frac{1}{\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)$$

The alternative form is

$$I(\theta) = E\left\{\left(\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta}\right)^{\!2}\right\} = E\left\{\left(\frac{1}{\sigma_x^2}\sum_{n=1}^{N}(x[n]-A)\right)^{\!2}\right\}$$

Since the noise is AWGN, the w[n] are uncorrelated and the cross terms become zero:

$$I(\theta) = \frac{1}{\sigma_x^4}\sum_{n=1}^{N}E\left\{(x[n]-A)^2\right\} = \frac{1}{\sigma_x^4}\,N\sigma_x^2 = \frac{N}{\sigma_x^2}$$

Hence, the Cramér-Rao lower bound is (again)

$$\mathrm{Var}(\hat{A}) \geq 1/I(\theta) = \sigma_x^2/N$$
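The equivalence of the two forms can also be checked by Monte Carlo (assumed values again): the average of the squared score should approach N/σx²:

```matlab
% Sketch: checking the alternative Fisher information form E{score^2} = N/sigma^2.
% All numeric values are assumed for illustration.
A = 1; sigma_x = 2; N = 50; trials = 1e5;
x = A + sigma_x * randn(N, trials);
score = sum(x - A, 1) / sigma_x^2;        % d/dA of the log likelihood, per record
fprintf('E{score^2} = %.4f, N/sigma^2 = %.4f\n', mean(score.^2), N/sigma_x^2);
```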
General CRLB for signals in WGN (1)

Since it is common to assume some signal in AWGN, it is fruitful to derive the generalised CRLB for this case.
Assume some signal dependent on a parameter in WGN:

$$x[n] = s[n;\theta] + w[n], \quad n = 1, 2, \ldots, N$$

The likelihood function is

$$f_x(\mathbf{x};\theta) = \frac{1}{(2\pi\sigma_x^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_x^2}\sum_{n=1}^{N}(x[n]-s[n;\theta])^2\right)$$

The derivative of the log likelihood function is

$$\frac{\partial \ln f_x(\mathbf{x};\theta)}{\partial\theta} = \frac{1}{\sigma_x^2}\sum_{n=1}^{N}(x[n]-s[n;\theta])\frac{\partial s[n;\theta]}{\partial\theta}$$

General CRLB for signals in WGN (2)

The second derivative becomes

$$\frac{\partial}{\partial\theta}\left\{\frac{1}{\sigma_x^2}\sum_{n=1}^{N}(x[n]-s[n;\theta])\frac{\partial s[n;\theta]}{\partial\theta}\right\} = \frac{1}{\sigma_x^2}\sum_{n=1}^{N}\left((x[n]-s[n;\theta])\frac{\partial^2 s[n;\theta]}{\partial\theta^2} - \left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^{\!2}\right)$$

Remember the product rule (f(x)g(x))′ = f′(x)g(x) + f(x)g′(x).
Remember the chain rule (f(g(x)))′ = f′(g(x))g′(x).

General CRLB for signals in WGN (3)

The expectation of the second derivative of the log likelihood function then becomes

$$E\left\{\frac{\partial^2 \ln f_x(\mathbf{x};\theta)}{\partial\theta^2}\right\} = \frac{1}{\sigma_x^2}\sum_{n=1}^{N}\Bigg(\underbrace{E\{x[n]-s[n;\theta]\}}_{=0,\ \text{since } E\{x[n]\}=s[n;\theta]}\frac{\partial^2 s[n;\theta]}{\partial\theta^2} - E\left\{\left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^{\!2}\right\}\Bigg)$$

General CRLB for signals in WGN (4)

Finally, taking the expectation of the second derivative:

$$E\left\{\frac{\partial^2 \ln f_x(\mathbf{x};\theta)}{\partial\theta^2}\right\} = -\frac{1}{\sigma_x^2}\sum_{n=1}^{N}\left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^{\!2}$$

The Cramér-Rao lower bound (the lower bound on the variance) becomes

$$\mathrm{Var}(\hat{\theta}) \geq \frac{\sigma_x^2}{\displaystyle\sum_{n=1}^{N}\left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^{\!2}}$$

Note the signal dependence: signals that change rapidly with the parameter will result in accurate estimators.
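This formula lends itself to numerical evaluation. A minimal sketch (all numeric values assumed), approximating the parameter derivative of the signal model with a central finite difference, and checked against the constant-in-WGN case:

```matlab
% Sketch: numerical CRLB for a scalar parameter in WGN,
% sigma^2 / sum (ds/dtheta)^2, with ds/dtheta by central finite difference.
% All numeric values are assumed examples.
sigma_x = 2; N = 50; dth = 1e-5;
crlb = @(s, th) sigma_x^2 / sum(((s(th + dth) - s(th - dth)) / (2*dth)).^2);

s_const = @(A) A * ones(N, 1);   % constant-in-WGN model: s[n; A] = A
fprintf('CRLB = %.5f (theory sigma^2/N = %.5f)\n', crlb(s_const, 1), sigma_x^2/N);
```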
CRLB for a constant value in AWGN - again again

OK. Let's check our new approach on our classic example of a constant value in AWGN.
The model:

$$x[n] = A + w[n]$$

This gives the rather simple expression

$$s[n;\theta] = \theta \quad\Rightarrow\quad \frac{\partial s[n;\theta]}{\partial\theta} = 1$$

Hence, the CRLB becomes

$$\mathrm{Var}(\hat{\theta}) \geq \frac{\sigma_x^2}{\displaystyle\sum_{n=1}^{N}\left(\frac{\partial s[n;\theta]}{\partial\theta}\right)^{\!2}} = \frac{\sigma_x^2}{N}$$

Another example: Frequency estimation in WGN

Assume again a signal dependent on a parameter in WGN:

$$x[n] = s[n;\theta] + w[n], \quad n = 1, 2, \ldots, N$$

where

$$s[n;\theta] = A\cos(2\pi n f_0 + \phi), \quad 0 < f_0 < 1/2$$

with known amplitude A, known phase ϕ, and unknown frequency f₀ = θ.

Another example: Frequency estimation in WGN (2)

Simple calculus gives us

$$\frac{\partial s[n;f_0]}{\partial f_0} = -A\,2\pi n\,\sin(2\pi n f_0 + \phi)$$

Using the general CRLB for a signal in AWGN, we get

$$\mathrm{Var}(\hat{f}_0) \geq \frac{\sigma_x^2}{\displaystyle\sum_{n=1}^{N}\left(\frac{\partial s[n;f_0]}{\partial f_0}\right)^{\!2}} = \frac{\sigma_x^2}{\displaystyle\sum_{n=1}^{N}A^2\left[2\pi n\sin(2\pi n f_0 + \phi)\right]^2}$$

Cramér-Rao lower bound summarised

The CRLB is the theoretical bound on the variance of any unbiased estimator.
It is found by calculating the negative inverse of the expectation of the second derivative of the log likelihood function.
Physically, this can be interpreted as the sharpness (or curvature) of the peak of the likelihood function.
If an unbiased estimator achieves a variance equal to the lower bound, it is said to be an efficient estimator.
It is then also the MVU estimator. Not vice versa.
The CRLB may not exist (the regularity condition may fail).
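Returning to the frequency-estimation example above, the bound is a one-line sum to evaluate numerically; in this sketch all values (A, ϕ, f₀, σx, N) are assumed for illustration:

```matlab
% Sketch: numerical evaluation of the frequency-estimation CRLB.
% All numeric values are assumed for illustration.
A = 1; phi = 0; f0 = 0.1; sigma_x = 1; N = 100;
n = (1:N).';
dsdf = -A * 2*pi*n .* sin(2*pi*n*f0 + phi);   % derivative of s[n; f0]
crlb_f0 = sigma_x^2 / sum(dsdf.^2);
fprintf('CRLB for f0: %.3e\n', crlb_f0);
```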
Cramér-Rao lower bound extensions

There are several extensions not covered here:
The CRLB can be extended to complex random sequences.
The CRLB can be written in a similar manner for estimation of multiple parameters (vector notation).
In some cases where the theoretical derivation of the CRLB is difficult, it might be possible to calculate an asymptotic CRLB.
These extensions are not too difficult to understand, but the mathematics is slightly more involved.

Time delay estimation in sonar (1)

Consider a transmitted pulse that is reflected by an object at an unknown distance.
The range r to the target is equivalent to the two-way travel time of the acoustic pulse:

$$\tau_0 = 2r/c$$

where c is the sound velocity.
Challenge: given the sonar settings, what is the theoretical lower bound on time delay estimation (equivalent to range estimation)?

Time delay estimation in sonar (2)

We describe the time delay estimation problem mathematically as follows.
Consider a transmit pulse s(t) and a received time series s(t − τ₀), where τ₀ is the wanted delay.
Our signal model becomes

$$x(t) = s(t-\tau_0) + w(t)$$

where w(t) is bandlimited additive Gaussian noise.
The transmit signal is assumed to be non-zero during a time interval of Ts (the pulse length).

Time delay estimation in sonar (3)

We choose the noise bandwidth to be fully covered by the sampling system, such that the sampled version of the noise appears to be white:

$$x[n] = s(n\Delta - \tau_0) + w[n], \quad w[n]\ \text{WGN}$$

where ∆ is the sampling interval.
The signal is only non-zero during the pulse length:

$$x[n] = \begin{cases} w[n] & 0 \leq n \leq n_0 - 1 \\ s(n\Delta-\tau_0) + w[n] & n_0 \leq n \leq n_0 + M - 1 \\ w[n] & n_0 + M \leq n \leq N - 1 \end{cases}$$

where the received echo has M non-zero samples starting at n₀, and the discrete time delay is given by τ₀ = n₀∆.
Time delay estimation in sonar (4)

We now apply the general CRLB for signals in WGN, replacing θ with the desired parameter τ₀:

$$\mathrm{Var}(\hat{\tau}_0) \geq \frac{\sigma_x^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\tau_0]}{\partial\tau_0}\right)^{\!2}} = \frac{\sigma_x^2}{\displaystyle\sum_{n=n_0}^{n_0+M-1}\left(\frac{\partial s(n\Delta-\tau_0)}{\partial\tau_0}\right)^{\!2}} = \frac{\sigma_x^2}{\displaystyle\sum_{n=n_0}^{n_0+M-1}\left(\left.\frac{\partial s(t)}{\partial t}\right|_{t=n\Delta-\tau_0}\right)^{\!2}} = \frac{\sigma_x^2}{\displaystyle\sum_{n=0}^{M-1}\left(\left.\frac{\partial s(t)}{\partial t}\right|_{t=n\Delta}\right)^{\!2}}$$

Note that τ₀ = n₀∆.

Time delay estimation in sonar (5)

To continue, we assume that the sampling interval is small, such that the sum can be replaced by an integral:

$$\sum_{n=0}^{M-1}\left(\left.\frac{\partial s(t)}{\partial t}\right|_{t=n\Delta}\right)^{\!2} \approx \frac{1}{\Delta}\int_0^{T_s}\left(\frac{\partial s(t)}{\partial t}\right)^{\!2} dt$$

The signal energy is

$$\mathcal{E} = \int_0^{T_s} s^2(t)\,dt$$

The RMS signal bandwidth (not to be confused with the signal bandwidth) is defined as

$$\beta_{rms}^2 = \frac{\displaystyle\int_{-\infty}^{\infty}(2\pi f)^2|S(f)|^2\,df}{\displaystyle\int_{-\infty}^{\infty}|S(f)|^2\,df}$$

Time delay estimation in sonar (6)

Parseval's theorem:

$$\int_{-\infty}^{\infty}|s(t)|^2\,dt = \int_{-\infty}^{\infty}|S(f)|^2\,df$$

Property of the Fourier transform (time derivative):

$$\mathcal{F}\left\{\frac{d}{dt}s(t)\right\} = j2\pi f\,S(f)$$

Giving the RMS signal bandwidth

$$\beta_{rms}^2 = \frac{\displaystyle\int_{-\infty}^{\infty}(2\pi f)^2|S(f)|^2\,df}{\displaystyle\int_{-\infty}^{\infty}|S(f)|^2\,df} = \frac{\displaystyle\int_0^{T_s}\left(\frac{\partial s(t)}{\partial t}\right)^{\!2} dt}{\displaystyle\int_0^{T_s}s^2(t)\,dt}$$

Time delay estimation in sonar (7)

For a flat spectrum S(f) = S₀ over the signal frequency band [f₁, f₂] = f₀ + [−B/2, B/2]:

$$\beta_{rms}^2 = \frac{\displaystyle\int_{f_1}^{f_2}(2\pi f)^2 S_0^2\,df}{\displaystyle\int_{f_1}^{f_2}S_0^2\,df} = (2\pi)^2\,\frac{\left(f_2^3 - f_1^3\right)/3}{f_2 - f_1} = (2\pi)^2\left(f_0^2 + \frac{B^2}{12}\right)$$

Or approximately β²_rms ≈ (2π)²f₀² for B ≪ f₀.
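A quick numeric check of the flat-spectrum result, with assumed example values f₀ = 100 kHz and B = 20 kHz; the two integrals are evaluated by trapezoidal integration:

```matlab
% Sketch: numeric check of beta_rms^2 = (2*pi)^2 (f0^2 + B^2/12) for a flat spectrum.
% f0, B and S0 are assumed example values.
f0 = 100e3; B = 20e3; S0 = 1;
f = linspace(f0 - B/2, f0 + B/2, 1e5);
beta2_num = trapz(f, (2*pi*f).^2 * S0^2) / trapz(f, S0^2 * ones(size(f)));
beta2_closed = (2*pi)^2 * (f0^2 + B^2/12);
fprintf('numeric %.6e vs closed form %.6e\n', beta2_num, beta2_closed);
```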
Time delay estimation in sonar (8)

Using E and β²_rms, the lower bound becomes

$$\mathrm{Var}(\hat{\tau}_0) \geq \frac{\Delta\sigma_x^2}{\mathcal{E}\,\beta_{rms}^2}$$

Further, 1/∆ = 2B is the sampling frequency, and the variance of the noise is σx² = N₀B (the noise level N₀ times the total bandwidth), so that ∆σx² = N₀/2.
Using this, β_rms ≈ 2πf₀, and realising that E/(N₀/2) is some kind of signal-to-noise ratio (SNR), we obtain

$$\mathrm{Var}(\hat{\tau}_0) \geq \frac{1}{\mathrm{SNR}\times\beta_{rms}^2} \approx \frac{1}{\mathrm{SNR}\times(2\pi)^2 f_0^2}$$

The Cramér-Rao lower bound for time delay estimation decreases with increasing SNR and increasing frequency.

Time delay estimation in sonar (9): Example

The signal level is approximately 7 (around the peak) and the noise level is approximately 0.5, giving an SNR of 14.
The center frequency is 100 kHz ⇒ RMS bandwidth β_rms ≈ 6.28 × 10⁵ rad/s.

$$\mathrm{Var}(\hat{\tau}_0) \geq \frac{1}{\mathrm{SNR}\times(2\pi)^2 f_0^2} = \frac{1}{14\times 4\times 10^{11}}\ \mathrm{s}^2$$

The standard deviation (square root of the variance) gives a lower limit on the accuracy. In this case this limit is 0.4 µs, which equals 0.04 wave periods at the carrier frequency.
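The arithmetic of this example, reproduced in a few lines (SNR = 14 and f₀ = 100 kHz are the slide's own numbers):

```matlab
% The example above in code: lower bound on time delay accuracy.
SNR = 14; f0 = 100e3;
var_tau = 1 / (SNR * (2*pi)^2 * f0^2);   % lower bound on Var(tau-hat), s^2
std_tau = sqrt(var_tau);                 % approx. 0.4 microseconds
fprintf('std >= %.2f us = %.3f wave periods\n', std_tau*1e6, std_tau*f0);
```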

Least Squares Estimation

All the previous cases we have studied require a probabilistic model: we have needed f_x(x; θ).
For cases with signal in noise, we actually required both a signal model and a noise model.
Least squares is not statistically based: it does not require a probabilistic model.
It does, however, require a deterministic signal model.
This approach is not new: in 1795 Carl Friedrich Gauss used this method to estimate planetary motions.
It is very similar to regression in statistical analysis.

[Portrait: Carl Friedrich Gauss, 1777-1855, German mathematician. From Wikipedia.]

Least Squares Estimation

We define the least squares cost function as

$$J(\theta) = \sum_{n=1}^{N}\varepsilon^2[n] = \sum_{n=1}^{N}(x[n]-s[n;\theta])^2$$

We seek to minimise J. The Least Squares Estimator (LSE) chooses the value for which s[n] is closest to the observations x[n].
This can be done by setting the partial derivative to zero:

$$\left.\frac{\partial J(\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{LS}} = 0$$
Weighted Least Squares Estimation

The least squares approach is easily extended to a weighted version:

$$J(\theta) = \sum_{n=1}^{N} w_n\,(x[n]-s[n;\theta])^2$$

where w_n is a weight chosen to increase or decrease the individual importance of each sample in the observation.
Typically, this weight is related to data quality or SNR.

A word of caution: if the chosen model does not reflect the observations in a proper way, the estimate will be biased. This applies to both weighted and ordinary LS estimation.

Linear Least Squares Estimation

The linear least squares problem arises when the parameter observation model is linear.
Assume that we have p unknown variables to be estimated: θ = [θ₁, θ₂, …, θ_p]ᵀ.
The observation model is linear, s = Hθ, where H is the known N × p observation matrix.
Example: fitting a straight line s[n] = An + B, n = 0, 1, …, N−1, to the data:

$$\mathbf{s} = \underbrace{\begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ \vdots & \vdots \\ N-1 & 1 \end{bmatrix}}_{\mathbf{H}} \underbrace{\begin{bmatrix} A \\ B \end{bmatrix}}_{\boldsymbol{\theta}}$$

Linear Least Squares Estimation

The cost function for linear least squares becomes

$$J(\boldsymbol{\theta}) = \sum_{n=1}^{N}(x[n]-s[n;\theta])^2 = (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$$

Written out:

$$J(\boldsymbol{\theta}) = \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}\boldsymbol{\theta} - \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{x} + \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{H}\boldsymbol{\theta}$$

The gradient is

$$\frac{\partial J(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = -2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\boldsymbol{\theta}$$

Setting this equal to zero gives the normal equations

$$\mathbf{H}^T\mathbf{H}\boldsymbol{\theta} = \mathbf{H}^T\mathbf{x}$$

which yield a closed form of the linear least squares estimator:

$$\hat{\boldsymbol{\theta}}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$

See Solving Linear Systems of Equations in the matlab manual.
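A line-fit sketch of the closed form, using the H from the example above and the matlab backslash operator; the true parameters and noise level are assumed synthetic values:

```matlab
% Sketch: linear LS fit of s[n] = A*n + B (synthetic data; A, B, noise assumed).
N = 100; n = (0:N-1).';
A = 0.5; B = 2;
x = A*n + B + 0.5*randn(N, 1);    % noisy observations of a line
H = [n, ones(N, 1)];              % observation matrix for s = A*n + B
theta_ne = (H'*H) \ (H'*x);       % normal equations, closed form
theta_bs = H \ x;                 % equivalent: matlab least squares solve
fprintf('A-hat = %.3f, B-hat = %.3f\n', theta_bs(1), theta_bs(2));
```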
Summary of Least Squares Estimation

Least Squares Estimation (LSE) is essentially fitting a model s(θ) to a data set x by minimising the squared error.
The sum of the squared errors is called the cost function.
LSE does not require any probabilistic information.
LSE does, however, require a deterministic model.
If the model can be described as a linear combination of the wanted parameters, linear LSE can be applied.
Linear LSE provides a closed form solution.
There are numerous extensions of the least squares approach.
If the model does not represent the observations properly, the estimates will be poorer and probably biased.

Example: Fine time delay estimation

We will now calculate the linear least squares estimator for a simple case of time delay estimation.
Consider the time delay example in the previous section.
Assume that the received echo around the maximum value is a Gaussian:

$$x[n] = I_0\exp\!\left(-(t-\tau_0)^2/2\sigma_\tau^2\right), \quad t = n\Delta$$

This model has three unknown parameters we wish to estimate: peak intensity I₀, time delay τ₀, and pulse width στ.
We realise that this model becomes a quadratic function if we take the logarithm:

$$x'[n] = \ln x[n] = \ln I_0 - (t-\tau_0)^2/2\sigma_\tau^2$$

We can reorganise this equation to fit the form θ₀ + θ₁t + θ₂t², as shown below.
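Expanding the square makes the identification of the θ coefficients explicit:

$$\ln I_0 - \frac{(t-\tau_0)^2}{2\sigma_\tau^2} = \underbrace{\left(\ln I_0 - \frac{\tau_0^2}{2\sigma_\tau^2}\right)}_{\theta_0} + \underbrace{\frac{\tau_0}{\sigma_\tau^2}}_{\theta_1}\,t + \underbrace{\left(-\frac{1}{2\sigma_\tau^2}\right)}_{\theta_2}\,t^2$$

Inverting these three relations gives the parameter recovery formulas on the next slide.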

Example: Fine time delay estimation (continued)

The observation matrix then becomes

$$\mathbf{H} = \begin{bmatrix} 1 & t[1] & t[1]^2 \\ 1 & t[2] & t[2]^2 \\ 1 & t[3] & t[3]^2 \\ \vdots & \vdots & \vdots \\ 1 & t[N] & t[N]^2 \end{bmatrix}$$

The least squares estimator is

$$\hat{\boldsymbol{\theta}}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}'$$

This is simply theta = H\x in matlab.
See Solving Linear Systems of Equations in the matlab manual.

The least squares estimated values

$$\hat{\boldsymbol{\theta}}_{LS} = [\hat{\theta}_0, \hat{\theta}_1, \hat{\theta}_2]^T$$

can be reorganised into the wanted parameters (by simple algebra) as follows:

$$\hat{\sigma}_\tau^2 = -\frac{1}{2\hat{\theta}_2}, \qquad \hat{\tau}_0 = -\frac{\hat{\theta}_1}{2\hat{\theta}_2}, \qquad \hat{I}_0 = \exp\!\left(\hat{\theta}_0 - \frac{\hat{\theta}_1^2}{4\hat{\theta}_2}\right)$$
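A minimal end-to-end sketch of this fit on a noise-free synthetic echo; the pulse parameters, sample times, and time unit (milliseconds) are all assumed for illustration:

```matlab
% Sketch: log-quadratic Gaussian-peak fit (synthetic, assumed values; times in ms).
I0 = 5; tau0 = 12; sig = 0.02;                % assumed "true" peak parameters
t = tau0 + sig*linspace(-1, 1, 9).';          % samples around the peak
x = I0 * exp(-(t - tau0).^2 / (2*sig^2));     % noise-free Gaussian echo
H = [ones(size(t)), t, t.^2];
theta = H \ log(x);                           % linear LS on x' = ln x
sig_hat  = sqrt(-1/(2*theta(3)));
tau0_hat = -theta(2) / (2*theta(3));
I0_hat   = exp(theta(1) - theta(2)^2/(4*theta(3)));
fprintf('I0 = %.2f, tau0 = %.2f ms, sigma = %.4f ms\n', I0_hat, tau0_hat, sig_hat);
```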
Fine time delay estimation - Algorithm

Upsample the timeseries to get a decent number of samples on the peak (a factor of 4 or 8).
Locate the sample delay of the maximum value in the timeseries.
Select the samples around the peak whose values are above half the peak value.
Run the linear least squares method to estimate amplitude, time delay, and pulse width.
The cost function can be used as a measure of model misfit.

Example result: σ̂τ = 0.0194 ms, τ̂₀ = 14.39 ms, Î₀ = 6.73.

Summary of lecture V

List of terms from this lecture:
Parameter estimation
Likelihood function
Maximum likelihood estimation
Log likelihood function (LLF)
Cramér-Rao lower bound (CRLB)
Fisher Information
Efficient estimator
Least Squares Estimator
