VE564 Summer 2023: Lecture 3-1: Maximum Likelihood Estimation and Least Squares
Prof. H. Qiao
UM-SJTU Joint Institute
May 24, 2023
Outline
Least Squares
Basic Least Squares
Variant 1: Least Squares with Unknown Model Order
Variant 2: Least Squares with Incoming Data
Variant 3: Least Squares with Constraints
Variant 4: Nonlinear Least Squares
Maximum Likelihood Estimator
The maximum likelihood (ML) estimator is an alternative to the MVU estimator. The ML principle is the most popular approach to designing practical estimators, and the ML estimator attains asymptotically optimal performance when a large volume of data is available.
Examples
Consider the model
$$x[n] = A + w[n], \qquad n = 0, 1, \dots, N-1$$
where $w[n]$ is WGN with variance $A > 0$ (the unknown DC level also sets the noise power). The CRLB for this problem is
$$\mathrm{Var}(\hat{A}) \ge \frac{A^2}{N\left(A + \tfrac{1}{2}\right)}$$
We first try to find an optimal estimator that may achieve the CRLB. Differentiating the log-likelihood gives
$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{N}{2A} + \frac{1}{A}\sum_{n=0}^{N-1}\left(x[n] - A\right) + \frac{1}{2A^2}\sum_{n=0}^{N-1}\left(x[n] - A\right)^2$$
Then $T(\mathbf{x}) = \sum_{n=0}^{N-1} x^2[n]$ is a sufficient statistic. However, it is not obvious how to find a function $h(\cdot)$ of $T$ such that
$$E_A\left[\, h\!\left( \sum_{n=0}^{N-1} x^2[n] \right) \right] = A$$
since $E\left( \sum_{n=0}^{N-1} x^2[n] \right) = N(A + A^2)$. We could also compute $E(x[0] \mid T)$ for the simple unbiased estimator $x[0]$, but the computation is formidable.
Example
By setting $\partial \ln p(\mathbf{x}; A)/\partial A = 0$, we have the MLE (taking the positive root)
$$\hat{A} = -\frac{1}{2} + \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x^2[n] + \frac{1}{4}}$$
Property of MLE
Asymptotically (for large data records), the MLE is distributed as
$$\hat{\theta} \stackrel{a}{\sim} \mathcal{N}\left(\theta, I^{-1}(\theta)\right)$$
where $I(\theta)$ is the Fisher information evaluated at the true value of the unknown parameter.
Example: MLE of the Sinusoidal Phase
Now we fix the data record length and vary the SNR.
MLE for Transformed Parameters
By the invariance property of the MLE, if $\alpha = g(\theta)$, the MLE of the transformed parameter is
$$\hat{\alpha} = g(\hat{\theta})$$
Example: Transformed DC Level in WGN
Example: Power of WGN in dB
Consider estimating the power of WGN in decibels,
$$P = 10 \log_{10} \sigma^2$$
By the invariance property, the MLE is then
$$\hat{P} = 10 \log_{10} \hat{\sigma}^2 = 10 \log_{10} \frac{1}{N}\sum_{n=0}^{N-1} x^2[n]$$
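A minimal numerical illustration of this invariance, assuming NumPy; $\sigma^2$ and N are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
sigma2, N = 4.0, 10000
x = rng.normal(0.0, np.sqrt(sigma2), N)

sigma2_hat = np.mean(x**2)              # MLE of sigma^2
P_hat = 10.0 * np.log10(sigma2_hat)     # MLE of P by invariance
print(P_hat, 10.0 * np.log10(sigma2))   # estimate vs. true value in dB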
Extension to a Vector Parameter
The invariance property carries over to vector parameters: $\hat{\boldsymbol{\alpha}} = \mathbf{g}(\hat{\boldsymbol{\theta}})$.
Example: Signal in Non-Gaussian Noise
Consider $x[n] = s[n] + w[n]$, where $w[n]$ is zero-mean i.i.d. noise with the Laplacian PDF
$$p(w[n]) = \frac{1}{4}\exp\left(-\frac{1}{2}\,|w[n]|\right)$$
All signal samples $\{s[0], s[1], \dots, s[N-1]\}$ are to be estimated. The PDF of the data is
$$p(\mathbf{x}; \boldsymbol{\theta}) = \prod_{n=0}^{N-1} \frac{1}{4}\exp\left(-\frac{1}{2}\,|x[n] - s[n]|\right)$$
Maximizing the likelihood amounts to minimizing $\sum_{n=0}^{N-1}|x[n] - s[n]|$, which is achieved term by term with
$$\hat{s}[n] = x[n]$$
In vector form, $\hat{\boldsymbol{\theta}} = \mathbf{x}$.
MLE for the Linear (Affine) Model
Consider the linear model
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
where $\mathbf{H} \in \mathbb{R}^{N \times p}$ has full column rank $p$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$. Then the MLE of $\boldsymbol{\theta}$ is
$$\hat{\boldsymbol{\theta}} = \left(\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$$
which coincides with the MVU estimator for this model.
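A small numerical sketch of this estimator, assuming NumPy; the particular H, C, and θ below are illustrative, not from the slides.

import numpy as np

rng = np.random.default_rng(2)
N, p = 100, 2
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])  # known model
theta = np.array([1.0, 0.05])
C = np.diag(1.0 + 0.1 * np.arange(N))         # known noise covariance
x = H @ theta + rng.multivariate_normal(np.zeros(N), C)

Cinv_H = np.linalg.solve(C, H)                # C^{-1} H without explicit inverse
theta_hat = np.linalg.solve(H.T @ Cinv_H, Cinv_H.T @ x)
print(theta_hat)                              # close to theta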
Consistency of MLEs*
Suppose that the observations $\{x_k\}_{k=1}^{\infty}$ are an i.i.d. sequence of random variables with density $p(x; \theta)$. Define
$$\psi(x_k; \theta) = \frac{\partial \ln p(x_k; \theta)}{\partial \theta}, \qquad J(\theta; \theta') = E_{\theta}\left[\psi(x_1; \theta')\right]$$
Under suitable regularity conditions, the likelihood equation then admits a consistent sequence of roots $\{\hat{\theta}_n\}$. (V. Poor, Proposition IV.D.1)
Asymptotic Normality of MLEs*
Suppose that the observations $\{x_k\}_{k=1}^{\infty}$ are an i.i.d. sequence of random variables with density $p(x; \theta)$, and assume that $\{\hat{\theta}_n\}_{n=1}^{\infty}$ is a consistent sequence of roots of the likelihood equation. Suppose further that $\psi$ satisfies the following regularity conditions:

The derivatives
$$\psi'(x_1; \theta') := \frac{\partial \psi(x_1; \theta')}{\partial \theta'}, \qquad \psi''(x_1; \theta') := \frac{\partial^2 \psi(x_1; \theta')}{\partial \theta'^2}$$
exist.

(3) There is a function $M(x_1)$ such that $|\psi''(x_1; \theta')| \le M(x_1)$ for all $\theta' \in \Omega$ and $E_{\theta}[M(x_1)] < \infty$.
(4) $J(\theta; \theta) = 0$

(5) Differentiation and integration can be interchanged:
$$\int \frac{\partial^2 p(x; \theta)}{\partial \theta^2}\, \mu(dx) = \frac{\partial^2}{\partial \theta^2}\int p(x; \theta)\, \mu(dx)$$

Under these conditions, $\sqrt{n}(\hat{\theta}_n - \theta)$ converges in distribution to $\mathcal{N}(0, 1/I(\theta))$. (V. Poor, Proposition IV.D.2)
Example: One-parameter exponential family
Since
$$\frac{d}{d\eta}\, E_{\eta}[T(x_j)] = \mathrm{Var}_{\eta}(T(x_j)) > 0,$$
the MLE solution is unique. Furthermore, since the Fisher information for the natural parameter is $I(\eta) = \mathrm{Var}_{\eta}(T)$, we have
$$\sqrt{n}(\hat{\eta} - \eta) \to \mathcal{N}\left(0, \frac{1}{\mathrm{Var}_{\eta}(T)}\right)$$
Least Squares

Pros: no probabilistic assumptions about the data are required, and the linear LSE has a simple closed form.
Cons: no general optimality claims can be made; the estimator's statistical performance depends on the actual noise behavior.
C. F. Gauss claimed that he came up with the least squares method in 1795.
The Least Squares Approach
Note that we do not make any probabilistic assumptions about the data $x[n]$.
Examples
For the DC-level model $s[n] = A$, the LSE minimizes $J(A) = \sum_{n=0}^{N-1}(x[n] - A)^2$, which gives $\hat{A}_{LS} = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$.
Remark
Note that $\hat{A}_{LS}$ may not be optimal in any sense, as we make no statistical assumptions. It is MVU for the WGN case but will be biased if the noise is not zero-mean.
Examples
Now consider the general linear signal model
$$\mathbf{s} = \mathbf{H}\boldsymbol{\theta}$$
where the observation matrix $\mathbf{H} \in \mathbb{R}^{N \times p}$ has full column rank $p$. Different from previous discussions, we do not assume that the noise follows a particular distribution. The LSE is found by minimizing
$$J(\boldsymbol{\theta}) = \sum_{n=0}^{N-1}\left(x[n] - s[n]\right)^2 = (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})$$

Weighted Least Squares
More generally, with a symmetric positive definite weighting matrix $\mathbf{W}$, minimizing $J(\boldsymbol{\theta}) = (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T \mathbf{W} (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})$ yields
$$\hat{\boldsymbol{\theta}} = \left(\mathbf{H}^T\mathbf{W}\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{W}\mathbf{x}$$
with error
$$J_{\min} = \mathbf{x}^T\left(\mathbf{W} - \mathbf{W}\mathbf{H}(\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\right)\mathbf{x}$$
The unweighted LSE is the special case $\mathbf{W} = \mathbf{I}$: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$.
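The following sketch, assuming NumPy with illustrative H, W, and data, computes the weighted LSE and verifies the $J_{\min}$ expression two ways.

import numpy as np

rng = np.random.default_rng(3)
N = 50
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
W = np.diag(rng.uniform(0.5, 2.0, N))        # positive definite weights
x = H @ np.array([2.0, -0.1]) + rng.normal(0, 1, N)

HtWH = H.T @ W @ H
theta_hat = np.linalg.solve(HtWH, H.T @ W @ x)

# Jmin two ways: directly from the residual, and via the quadratic form above
r = x - H @ theta_hat
Jmin_direct = r.T @ W @ r
Jmin_form = x.T @ (W - W @ H @ np.linalg.solve(HtWH, H.T @ W)) @ x
print(Jmin_direct, Jmin_form)                # the two agree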
Geometrical Interpretation
If $\mathbf{H}$ has full column rank, each vector in the column space of $\mathbf{H}$ corresponds to a unique parameter vector $\boldsymbol{\theta}$.
The LS solution $\hat{\mathbf{s}} = \mathbf{H}\hat{\boldsymbol{\theta}}$ makes the error orthogonal to the columns of $\mathbf{H}$ (illustrated for $p = 2$):
$$(\mathbf{x} - \hat{\mathbf{s}}) \perp \{\mathbf{h}_1, \mathbf{h}_2\}$$
Equivalently, the error is orthogonal to the whole column space:
$$(\mathbf{x} - \hat{\mathbf{s}}) \perp \mathcal{C}(\mathbf{H}) \iff (\mathbf{x} - \mathbf{H}\hat{\boldsymbol{\theta}})^T\mathbf{H} = \mathbf{0}^T$$
which again gives
$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
Note that $\hat{\mathbf{s}} = \mathbf{P}\mathbf{x}$, where
$$\mathbf{P} = \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$$
is the orthogonal projection matrix onto $\mathcal{C}(\mathbf{H})$.
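A quick numerical check of this geometry, assuming NumPy with randomly generated H and x.

import numpy as np

rng = np.random.default_rng(4)
N, p = 30, 3
H = rng.normal(size=(N, p))                          # full column rank w.p. 1
x = rng.normal(size=N)

theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
P = H @ np.linalg.solve(H.T @ H, H.T)                # projection onto C(H)
s_hat = H @ theta_hat

print(np.allclose(H.T @ (x - s_hat), 0))             # residual orthogonal to C(H)
print(np.allclose(s_hat, P @ x))                     # s_hat = P x
print(np.allclose(P @ P, P), np.allclose(P, P.T))    # P idempotent, symmetric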
In many cases we do not know the signal model exactly, and we may try different models of increasing order. For example, we may try to fit a set of data first with a constant, then with a straight line, and so on.
Variant 1: Order-Recursive Least Squares
The simplest (order-1) model is a constant, $s_1[n] = A$, whose LSE is $\hat{A}_1 = \bar{x}$.
In general, for the order-$k$ model with observation matrix $\mathbf{H}_k \in \mathbb{R}^{N \times k}$, the LSE is
$$\hat{\boldsymbol{\theta}}_k = (\mathbf{H}_k^T\mathbf{H}_k)^{-1}\mathbf{H}_k^T\mathbf{x}$$
Recomputing this from scratch for every candidate order is wasteful; order-recursive LS instead updates the solution as columns are added.
Let $\mathbf{H}_{k+1} = [\mathbf{H}_k, \mathbf{h}_{k+1}] \in \mathbb{R}^{N \times (k+1)}$. To update $\hat{\boldsymbol{\theta}}_k$ and $J_{\min,k}$, we use
$$\hat{\boldsymbol{\theta}}_{k+1} = \begin{bmatrix} \hat{\boldsymbol{\theta}}_k - \dfrac{(\mathbf{H}_k^T\mathbf{H}_k)^{-1}\mathbf{H}_k^T\mathbf{h}_{k+1}\,\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{x}}{\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{h}_{k+1}} \\[2ex] \dfrac{\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{x}}{\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{h}_{k+1}} \end{bmatrix}$$
where $\mathbf{P}_k^{\perp} = \mathbf{I} - \mathbf{H}_k(\mathbf{H}_k^T\mathbf{H}_k)^{-1}\mathbf{H}_k^T$. To avoid inverting $\mathbf{H}_k^T\mathbf{H}_k$, we let
$$\mathbf{D}_k = (\mathbf{H}_k^T\mathbf{H}_k)^{-1}$$
so that $\mathbf{P}_k^{\perp} = \mathbf{I} - \mathbf{H}_k\mathbf{D}_k\mathbf{H}_k^T$. The minimum LS error is updated as
$$J_{\min,k+1} = J_{\min,k} - \frac{(\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{x})^2}{\mathbf{h}_{k+1}^T\mathbf{P}_k^{\perp}\mathbf{h}_{k+1}}$$
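The sketch below performs one order-update step and checks it against the batch LSE; it assumes NumPy, with illustrative random data.

import numpy as np

rng = np.random.default_rng(5)
N, k = 40, 2
Hk = rng.normal(size=(N, k))
h_new = rng.normal(size=N)
x = rng.normal(size=N)

Dk = np.linalg.inv(Hk.T @ Hk)                # D_k = (H_k^T H_k)^{-1}
Pk_perp = np.eye(N) - Hk @ Dk @ Hk.T         # orthogonal-complement projector
theta_k = Dk @ Hk.T @ x
Jmin_k = x @ Pk_perp @ x

denom = h_new @ Pk_perp @ h_new
num = h_new @ Pk_perp @ x
theta_top = theta_k - Dk @ Hk.T @ h_new * (num / denom)
theta_next = np.append(theta_top, num / denom)
Jmin_next = Jmin_k - num**2 / denom

# Batch solution for comparison
Hk1 = np.column_stack([Hk, h_new])
theta_batch = np.linalg.solve(Hk1.T @ Hk1, Hk1.T @ x)
print(np.allclose(theta_next, theta_batch))  # True: recursion matches batch LS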
This recursive procedure determines the LSE for all lower-order models along the way. We make several observations:
- The error update can be written as $J_{\min,k+1} = J_{\min,k}(1 - r_{k+1}^2)$, where the coefficient $r_{k+1}^2$ is defined as
$$0 \le r_{k+1}^2 = \frac{\langle \mathbf{P}_k^{\perp}\mathbf{h}_{k+1},\, \mathbf{P}_k^{\perp}\mathbf{x} \rangle^2}{\|\mathbf{P}_k^{\perp}\mathbf{h}_{k+1}\|_2^2\, \|\mathbf{P}_k^{\perp}\mathbf{x}\|_2^2} \le 1$$
If $\mathbf{P}_k^{\perp}\mathbf{x}$ and $\mathbf{P}_k^{\perp}\mathbf{h}_{k+1}$ are collinear, then the residual can be perfectly modeled by $\mathbf{h}_{k+1}$ and $r_{k+1}^2 = 1$.
- The projection matrix itself admits a rank-one update:
$$\mathbf{P}_{k+1} = \mathbf{P}_k + \frac{(\mathbf{I} - \mathbf{P}_k)\mathbf{h}_{k+1}\mathbf{h}_{k+1}^T(\mathbf{I} - \mathbf{P}_k)}{\mathbf{h}_{k+1}^T(\mathbf{I} - \mathbf{P}_k)\mathbf{h}_{k+1}}$$
Variant 2: Sequential Least Squares
In the earlier discussions we were given the full data set in advance. In many cases, however, the samples arrive sequentially, and it is not desirable to wait for all the data before computing the LSE.
Collect the observations up to time $n$ into $\mathbf{x}[n] = [x[0], x[1], \dots, x[n]]^T$, let $\sigma_n^2$ denote the variance of the noise sample $w[n]$, and let $\hat{\boldsymbol{\theta}}[n]$ be the LSE of $\boldsymbol{\theta}$ based on $\mathbf{x}[n]$. The LSE can then be updated recursively as new samples arrive.
Estimator Update:
$$\hat{\boldsymbol{\theta}}[n] = \hat{\boldsymbol{\theta}}[n-1] + \mathbf{K}[n]\left(x[n] - \mathbf{h}^T[n]\hat{\boldsymbol{\theta}}[n-1]\right)$$
where
$$\mathbf{K}[n] = \frac{\boldsymbol{\Sigma}[n-1]\mathbf{h}[n]}{\sigma_n^2 + \mathbf{h}^T[n]\boldsymbol{\Sigma}[n-1]\mathbf{h}[n]}$$
Covariance Update:
$$\boldsymbol{\Sigma}[n] = \left(\mathbf{I} - \mathbf{K}[n]\mathbf{h}^T[n]\right)\boldsymbol{\Sigma}[n-1]$$
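Putting the two updates together gives the sequential LS algorithm. A compact sketch, assuming NumPy; the regressor, noise variances, and the large-Σ initialization are illustrative choices (one can also initialize from a small batch solve, as in the Fourier example below).

import numpy as np

rng = np.random.default_rng(6)
N, p = 200, 2
theta = np.array([1.0, -0.5])
sigma2 = np.full(N, 0.25)

def h(n):
    return np.array([1.0, np.cos(0.2 * np.pi * n)])      # regressor at time n

theta_hat = np.zeros(p)                 # initial estimate
Sigma = 1e6 * np.eye(p)                 # large initial covariance (vague prior)
for n in range(N):
    hn = h(n)
    xn = theta @ hn + rng.normal(0, np.sqrt(sigma2[n]))
    K = Sigma @ hn / (sigma2[n] + hn @ Sigma @ hn)       # gain update
    theta_hat = theta_hat + K * (xn - hn @ theta_hat)    # estimator update
    Sigma = (np.eye(p) - np.outer(K, hn)) @ Sigma        # covariance update
print(theta_hat)                        # approaches theta as n grows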
Example: Fourier Analysis
To initialize the algorithm, we first acquire two samples $x[0], x[1]$ and compute the batch estimate
$$\hat{\boldsymbol{\theta}}[1] = \left(\mathbf{H}^T[1]\frac{\mathbf{I}}{\sigma^2}\mathbf{H}[1]\right)^{-1}\mathbf{H}^T[1]\frac{\mathbf{I}}{\sigma^2}\mathbf{x}[1] = (\mathbf{H}^T[1]\mathbf{H}[1])^{-1}\mathbf{H}^T[1]\mathbf{x}[1]$$
where
$$\mathbf{H}[1] = \begin{bmatrix} 1 & 0 \\ \cos(2\pi f_0) & \sin(2\pi f_0) \end{bmatrix}, \qquad \mathbf{x}[1] = [x[0], x[1]]^T$$
$$\boldsymbol{\Sigma}[1] = \sigma^2(\mathbf{H}^T[1]\mathbf{H}[1])^{-1}$$
When $x[2]$ arrives, the gain is
$$\mathbf{K}[2] = \frac{\boldsymbol{\Sigma}[1]\mathbf{h}[2]}{\sigma^2 + \mathbf{h}^T[2]\boldsymbol{\Sigma}[1]\mathbf{h}[2]}, \qquad \mathbf{h}[2] = [\cos(4\pi f_0), \sin(4\pi f_0)]^T$$
giving $\hat{\boldsymbol{\theta}}[2] = \hat{\boldsymbol{\theta}}[1] + \mathbf{K}[2](x[2] - \mathbf{h}^T[2]\hat{\boldsymbol{\theta}}[1])$ with variance
$$\boldsymbol{\Sigma}[2] = (\mathbf{I} - \mathbf{K}[2]\mathbf{h}^T[2])\boldsymbol{\Sigma}[1]$$
Variant 3: Constrained Least Squares
So far we have not imposed any restrictions on the values of $\boldsymbol{\theta}$, and the geometrical interpretation applies. Now assume the parameter satisfies the linear constraint
$$\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$$
where $\mathbf{A} \in \mathbb{R}^{r \times p}$ has full rank $r < p$; full rank means the constraints are linearly independent. The constrained LSE must satisfy
$$\mathbf{A}\hat{\boldsymbol{\theta}}_c = \mathbf{b}$$
Minimizing the Lagrangian $J(\boldsymbol{\theta}) + \boldsymbol{\lambda}^T(\mathbf{A}\boldsymbol{\theta} - \mathbf{b})$ gives the multiplier
$$\frac{\boldsymbol{\lambda}}{2} = \left[\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right]^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$$
where $\hat{\boldsymbol{\theta}}$ is the unconstrained LSE. Eventually, we have
$$\hat{\boldsymbol{\theta}}_c = \hat{\boldsymbol{\theta}} - (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\left[\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right]^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$$
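A short numerical sketch of the constrained LSE, assuming NumPy; the constraint that θ sums to one is an illustrative choice.

import numpy as np

rng = np.random.default_rng(7)
N, p = 50, 3
H = rng.normal(size=(N, p))
x = rng.normal(size=N)
A = np.array([[1.0, 1.0, 1.0]])            # constraint: components of theta sum to 1
b = np.array([1.0])

HtH_inv = np.linalg.inv(H.T @ H)
theta_u = HtH_inv @ H.T @ x                # unconstrained LSE
M = A @ HtH_inv @ A.T                      # r x r, invertible for full-rank A
theta_c = theta_u - HtH_inv @ A.T @ np.linalg.solve(M, A @ theta_u - b)

print(A @ theta_c)                         # equals b: constraint satisfied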
Example: Piecewise-polynomial fitting (S. Boyd, EE 103)
Example: Least norm problem (S. Boyd, EE 103)

When there are fewer constraints than unknowns, the least norm problem picks the smallest solution: minimize $\|\boldsymbol{\theta}\|_2^2$ subject to $\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$, with solution $\hat{\boldsymbol{\theta}} = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}\mathbf{b}$. A classic instance is finding the smallest force sequence that drives a mass to a desired final position and velocity.
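Below is a hedged sketch of the least-norm solution; the unit-mass force program is a guess at the EE 103 example (the slides' actual data are not shown), and NumPy is assumed.

import numpy as np

N = 10                                     # number of unit-time force steps
# Rows of C map the force sequence to final velocity and final position for a
# unit mass: v_N = sum(f_i), x_N = sum((N - i - 1/2) f_i).
C = np.vstack([np.ones(N), N - np.arange(N) - 0.5])
d = np.array([0.0, 1.0])                   # end at rest, at position 1

f = C.T @ np.linalg.solve(C @ C.T, d)      # least-norm force sequence
print(f)                                   # force profile, affine in time
print(C @ f)                               # reproduces d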
Variant 4: Nonlinear Least Squares
The nonlinear LS problem minimizes
$$J(\boldsymbol{\theta}) = (\mathbf{x} - \mathbf{s}(\boldsymbol{\theta}))^T(\mathbf{x} - \mathbf{s}(\boldsymbol{\theta}))$$
where $\mathbf{s}(\boldsymbol{\theta})$ is the signal model for $\mathbf{x}$ that depends on $\boldsymbol{\theta}$. In the linear LS problem we take the convenient model $\mathbf{s}(\boldsymbol{\theta}) = \mathbf{H}\boldsymbol{\theta}$, which has a simple closed-form solution. But in many cases $\mathbf{s}(\boldsymbol{\theta})$ is a nonlinear function of $\boldsymbol{\theta}$, and no closed-form minimizer exists.
There are two methods that can reduce the complexity of the problem:
1. transformation of parameters
2. separability of parameters
Transformation of parameters: if an invertible mapping $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ makes the model linear in $\boldsymbol{\alpha}$, we solve the linear LS problem for $\hat{\boldsymbol{\alpha}}$ and recover
$$\hat{\boldsymbol{\theta}} = \mathbf{g}^{-1}(\hat{\boldsymbol{\alpha}})$$
Example: for the sinusoid $s[n] = A\cos(2\pi f_0 n + \phi)$ with unknown $A$ and $\phi$ (known $f_0$), let
$$\alpha_1 = A\cos\phi, \qquad \alpha_2 = -A\sin\phi$$
Then $\mathbf{s} = \mathbf{H}\boldsymbol{\alpha}$ with
$$\mathbf{H} = \begin{bmatrix} 1 & 0 \\ \cos(2\pi f_0) & \sin(2\pi f_0) \\ \vdots & \vdots \\ \cos(2\pi f_0(N-1)) & \sin(2\pi f_0(N-1)) \end{bmatrix}$$
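A sketch of this transformation approach, assuming NumPy; f0, A, φ, and the noise level are illustrative.

import numpy as np

rng = np.random.default_rng(8)
N, f0, A, phi = 100, 0.08, 1.5, 0.7
n = np.arange(N)
x = A * np.cos(2 * np.pi * f0 * n + phi) + rng.normal(0, 0.3, N)

H = np.column_stack([np.cos(2 * np.pi * f0 * n), np.sin(2 * np.pi * f0 * n)])
alpha = np.linalg.solve(H.T @ H, H.T @ x)       # linear LS in alpha

# Invert the mapping alpha1 = A cos(phi), alpha2 = -A sin(phi)
A_hat = np.hypot(alpha[0], alpha[1])
phi_hat = np.arctan2(-alpha[1], alpha[0])
print(A_hat, phi_hat)                           # close to (A, phi)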
Separability of parameters: suppose the model is linear in part of the parameters,
$$\mathbf{s} = \mathbf{H}(\boldsymbol{\alpha})\boldsymbol{\beta}$$
For fixed $\boldsymbol{\alpha}$, the optimal $\boldsymbol{\beta}$ is $\hat{\boldsymbol{\beta}}(\boldsymbol{\alpha}) = (\mathbf{H}^T(\boldsymbol{\alpha})\mathbf{H}(\boldsymbol{\alpha}))^{-1}\mathbf{H}^T(\boldsymbol{\alpha})\mathbf{x}$, so the problem reduces to a search over $\boldsymbol{\alpha}$ alone.
Example: consider the damped exponential model
$$s[n] = A_1 r^n + A_2 r^{2n} + A_3 r^{3n}$$
to be fitted over $0 < r < 1$, where
$$\mathbf{H}(r) = \begin{bmatrix} 1 & 1 & 1 \\ r & r^2 & r^3 \\ \vdots & \vdots & \vdots \\ r^{N-1} & r^{2(N-1)} & r^{3(N-1)} \end{bmatrix}$$
With the amplitudes $\boldsymbol{\beta} = [A_1, A_2, A_3]^T$ eliminated, only a one-dimensional search over $r$ remains.
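A sketch of the separable approach for this model, assuming NumPy: for each candidate r on a grid, the amplitudes are found by linear LS, and the best r minimizes the residual.

import numpy as np

rng = np.random.default_rng(9)
N = 50
n = np.arange(N)
r_true, A_true = 0.9, np.array([1.0, 0.5, -0.3])

def Hr(r):
    # Columns r^n, r^{2n}, r^{3n}, as in H(r) above
    return np.column_stack([r**n, r**(2 * n), r**(3 * n)])

x = Hr(r_true) @ A_true + rng.normal(0, 0.05, N)

best = (np.inf, None, None)
for r in np.linspace(0.05, 0.99, 400):           # grid over 0 < r < 1
    H = Hr(r)
    beta = np.linalg.lstsq(H, x, rcond=None)[0]  # optimal amplitudes for this r
    J = np.sum((x - H @ beta) ** 2)
    if J < best[0]:
        best = (J, r, beta)
print(best[1], best[2])                          # close to r_true, A_true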
Algorithm for NLS: Levenberg-Marquardt Algorithm (S. Boyd, EE 103, Stanford)
Main structure: at iteration $k$, linearize $\mathbf{f}$ around $\mathbf{x}^{(k)}$ and solve the regularized problem
$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}} \left\|\mathbf{f}(\mathbf{x}^{(k)}) + D\mathbf{f}(\mathbf{x}^{(k)})(\mathbf{x} - \mathbf{x}^{(k)})\right\|_2^2 + \lambda^{(k)}\left\|\mathbf{x} - \mathbf{x}^{(k)}\right\|_2^2$$
for a $\lambda^{(k)} > 0$.
- We impose the regularizer (second term) because we need the affine approximation to hold near $\mathbf{x}^{(k)}$.
Adjusting $\lambda$:
1. Idea:
   - if $\lambda^{(k)}$ is too big, $\mathbf{x}^{(k+1)}$ is close to $\mathbf{x}^{(k)}$ and progress is slow
   - if $\lambda^{(k)}$ is too small, $\mathbf{x}^{(k+1)}$ is far from $\mathbf{x}^{(k)}$ and the linear approximation is poor
2. Practical update mechanism:
   - if $\|\mathbf{f}(\mathbf{x}^{(k+1)})\|_2^2 < \|\mathbf{f}(\mathbf{x}^{(k)})\|_2^2$, accept the update and reduce $\lambda$: $\lambda^{(k+1)} = 0.8\lambda^{(k)}$
   - otherwise, increase $\lambda$ and do not update: $\lambda^{(k+1)} = 2\lambda^{(k)}$, $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)}$
In the scalar case, the update is
$$x^{(k+1)} = x^{(k)} - \frac{f'(x^{(k)})}{\lambda^{(k)} + \left(f'(x^{(k)})\right)^2}\, f(x^{(k)})$$
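A minimal scalar implementation following exactly the update and λ-adjustment rules above; the test function f(x) = e^x - 2 is an illustrative choice.

import math

def lm_scalar(f, fprime, x0, lam0=1.0, iters=50):
    x, lam = x0, lam0
    for _ in range(iters):
        x_new = x - fprime(x) * f(x) / (lam + fprime(x) ** 2)
        if f(x_new) ** 2 < f(x) ** 2:
            x, lam = x_new, 0.8 * lam      # accept step and reduce lambda
        else:
            lam = 2.0 * lam                # reject step and increase lambda
    return x

# Example: solve f(x) = e^x - 2 = 0 in the least-squares sense
root = lm_scalar(lambda x: math.exp(x) - 2.0, lambda x: math.exp(x), x0=0.0)
print(root, math.log(2.0))                 # close to ln 2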
Example: Location from range measurements (S. Boyd, EE 103, Stanford)

Given noisy range measurements $\rho_i$ to beacons at known positions $\mathbf{a}_i$, the unknown location $\mathbf{x}$ is estimated by minimizing $\sum_i \left(\|\mathbf{x} - \mathbf{a}_i\| - \rho_i\right)^2$, a nonlinear LS problem well suited to the Levenberg-Marquardt algorithm.
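A hedged sketch of this example, assuming NumPy; the beacon layout, true position, and noise level are invented for illustration (the slides' actual data are not shown).

import numpy as np

rng = np.random.default_rng(10)
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pos_true = np.array([3.0, 7.0])
rho = np.linalg.norm(beacons - pos_true, axis=1) + rng.normal(0, 0.05, 4)

x, lam = np.array([5.0, 5.0]), 1.0           # initial guess and lambda
for _ in range(50):
    d = np.linalg.norm(beacons - x, axis=1)
    f = d - rho                              # residuals f_i(x) = ||x - a_i|| - rho_i
    J = (x - beacons) / d[:, None]           # Jacobian of the residuals
    # Regularized linearized step: (J^T J + lam I) dx = -J^T f
    dx = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ f)
    f_new = np.linalg.norm(beacons - (x + dx), axis=1) - rho
    if f_new @ f_new < f @ f:
        x, lam = x + dx, 0.8 * lam           # accept step, reduce lambda
    else:
        lam = 2.0 * lam                      # reject step, increase lambda
print(x)                                     # close to pos_true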