
Notes on Maximum Likelihood Estimator

Usual inference setup: I have a variable Y on a population of interest, with distribution Y ∼ f(y; θ). The parameter θ (scalar or vector) is unknown. In an effort to estimate it:

• I collect a simple random sample from Y, which is Y = (Y_1, Y_2, ..., Y_N).

• I consider the joint density of the whole sample, which is f(y_1, y_2, ..., y_N; θ).

Once I have my observations, I can change perspective and focus on the unknown parameter:

f(\theta; y_1, y_2, \dots, y_N)

This is called the likelihood function: it depends on the value of θ, given that I have observed my sample. I look for the value of θ that maximizes the probability of observing my sample, hence the term maximum likelihood. The function can be expressed omitting the sample from the notation, focusing on the unknown parameter: L(θ).
Furthermore, given that I have a random sample in which all the components are independent and identically distributed (i.i.d.) as Y, the joint density can be factored as

f(y_1, y_2, \dots, y_N; \theta) = \prod_{i=1}^{N} f(y_i; \theta)

so that

L(\theta) = \prod_{i=1}^{N} f(y_i; \theta)

Since I have to find the maximum of the function, I need to compute the first derivative of L(θ) with respect to the unknown θ, set it to zero, and solve for θ. It is equivalent, and often algebraically simpler, to maximize l(θ) = log L(θ).
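In practice the maximization can also be carried out numerically. Below is a minimal sketch in Python (assuming NumPy and SciPy are available; the simulated data and parameter values are purely illustrative): it minimizes the negative log-likelihood −l(θ) for an i.i.d. normal sample with known σ = 1, so that the only unknown parameter is θ = µ.

import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative i.i.d. sample; here Y ~ N(theta, 1) with known variance,
# so the only unknown parameter is theta.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=200)

def neg_log_likelihood(theta):
    # -l(theta) = -sum_i log f(y_i; theta); minimizing it maximizes l(theta).
    return -np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2)

res = minimize_scalar(neg_log_likelihood)
print(res.x)  # for this model the numerical maximizer matches y.mean()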

Example
Y ∼ N(µ, σ²), both parameters unknown.

f(y_i; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \left( \frac{y_i - \mu}{\sigma} \right)^2}
The likelihood is

L(\mu, \sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \left( \frac{y_i - \mu}{\sigma} \right)^2}
The log-likelihood is

l(\mu, \sigma^2) = \sum_{i=1}^{N} \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \left( \frac{y_i - \mu}{\sigma} \right)^2} \right)
= \sum_{i=1}^{N} \log 1 - \sum_{i=1}^{N} \log \sqrt{2\pi} - \frac{1}{2} \sum_{i=1}^{N} \log \sigma^2 + \sum_{i=1}^{N} \log e^{-\frac{1}{2} \left( \frac{y_i - \mu}{\sigma} \right)^2}
= -N \log \sqrt{2\pi} - \frac{N}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2

Computing the derivative with respect to µ, with further simplifications, and solving for µ:

\frac{\partial l(\mu, \sigma^2)}{\partial \mu} = -\frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \mu)(-1) = 0

\Rightarrow \sum_{i=1}^{N} y_i - N\mu = 0

\Rightarrow \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i

Computing the derivative with respect to σ², with further simplifications, and solving for σ² (in the last step µ is replaced by its estimate µ̂):

\frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} - \frac{1}{2} \sum_{i=1}^{N} (y_i - \mu)^2 \, \frac{1}{\sigma^4} (-1) = 0

\Rightarrow -\frac{\sigma^4}{\sigma^2} + \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2 = 0

\Rightarrow \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{\mu})^2

It can be proved that (µ̂, σ̂²) represents a global maximum of the likelihood.
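As a quick numerical check of these closed-form estimators, here is a short Python sketch on simulated data (values purely illustrative):

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=1_000)  # true mu = 5, sigma^2 = 4

mu_hat = y.mean()                        # (1/N) * sum(y_i)
sigma2_hat = np.mean((y - mu_hat) ** 2)  # divides by N, not N - 1
print(mu_hat, sigma2_hat)                # close to 5 and 4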

Remarks
• The maximum likelihood estimator is generally not unbiased (note that the estimator for σ² is the biased one, dividing by N rather than N − 1; see the simulation sketch after this list).

• It can be proved that the maximum likelihood estimator is generally consistent.

• It can be proved that the maximum likelihood estimator is asymptotically Gaussian.

• Invariance principle: for any one-to-one function g(·), the maximum likelihood estimator of g(θ) is g(θ̂).
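As referenced in the first remark, the following Python sketch (with purely illustrative simulated data) shows the bias of σ̂² by Monte Carlo: averaged over many replications it falls below the true σ² by the factor (N − 1)/N. The last line also illustrates the invariance principle with g(·) = √·, which is one-to-one on σ² > 0.

import numpy as np

rng = np.random.default_rng(2)
N, reps = 10, 100_000
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, N))  # true sigma^2 = 1

# MLE of sigma^2 in each replication: divides by N, not N - 1.
sigma2_hat = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
print(sigma2_hat.mean())  # close to (N - 1) / N = 0.9, not 1.0

# Invariance: the MLE of sigma is the square root of the MLE of sigma^2.
sigma_hat = np.sqrt(sigma2_hat)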

Exercise. Given the density function (exponential random variable)

f(y; \theta) = \theta e^{-\theta y}, \quad y > 0, \quad \theta > 0,

find the maximum likelihood estimator for θ and 1/θ.
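Once the closed-form answer has been derived, it can be checked numerically; here is an illustrative Python sketch (the true θ and the sample size are arbitrary) that maximizes the exponential log-likelihood l(θ) = N log θ − θ Σᵢ yᵢ directly:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
theta_true = 2.0
y = rng.exponential(scale=1.0 / theta_true, size=500)  # NumPy's scale is 1/theta

def neg_log_likelihood(theta):
    # l(theta) = sum_i log(theta * exp(-theta * y_i)) = N log(theta) - theta * sum(y_i)
    return -(len(y) * np.log(theta) - theta * y.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print(res.x)  # compare with the closed-form MLE of theta; 1/res.x then estimates 1/theta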
