Lecture 03 Maximum Likelihood Estimation
Maximum Likelihood Estimation
The Maximum Likelihood Estimation (MLE) method is an estimation procedure that, given a probabilistic model, estimates its parameters so that they are as consistent as possible with the observed data.
Assume we have $N = 6$ i.i.d. observations $\mathcal{D} = \{y_1, y_2, \ldots, y_6\}$, where $y_i \sim \mathcal{N}(\mu, \sigma^2)$.
[Figure: the Gaussian pdf with the six observations $y_1, \ldots, y_6$ on the horizontal axis; blue dots mark the pdf values $f_y(y_i)$ at each observation.]
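This setup can be sketched numerically, for instance in Python; the true values of $\mu$ and $\sigma^2$ below are arbitrary assumptions chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical true parameters, chosen only for illustration
mu_true, sigma_true = 0.0, 1.0

rng = np.random.default_rng(seed=0)
y = rng.normal(mu_true, sigma_true, size=6)        # the 6 i.i.d. observations D

# Height of the Gaussian pdf at each observation (the "blue dots" in the figure)
pdf_values = norm.pdf(y, loc=mu_true, scale=sigma_true)
print(y)
print(pdf_values)
```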
Maximum Likelihood Estimation
Define the data vector $Y = [y_1, y_2, \ldots, y_N]^\top$. The joint pdf of the data vector $Y$ is

$$f_Y(y_1, y_2, \ldots, y_N \mid \mu, \sigma^2) = \prod_{i=1}^{N} f_y(y_i \mid \mu, \sigma^2) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid \mu, \sigma^2)$$

The value assumed by the joint pdf $f_Y(Y \mid \mu, \sigma^2)$, with known $\mu$ and $\sigma^2$, evaluated at the data $\mathcal{D}$, is the product of the blue dots in the previous example, where we had $N = 6$ observations.
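Continuing the sketch above (the observations are again purely illustrative), the value of the joint pdf at the observed data is just the product of the individual pdf values:

```python
import numpy as np
from scipy.stats import norm

# Illustrative observations and known parameters (any values would do)
y = np.array([-0.3, 1.2, 0.4, -1.1, 0.8, 0.1])
mu, sigma = 0.0, 1.0

# Joint pdf of the data vector Y at the observed sample:
# the product of the N individual pdf values, thanks to independence
joint_pdf_value = np.prod(norm.pdf(y, loc=mu, scale=sigma))
print(joint_pdf_value)
```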
Maximum Likelihood Estimation
Seen as a function of the data $Y$, the joint pdf is a multivariate distribution. But we know the value of $Y$, since we observed those data.
If we also knew $\mu$ and $\sigma$, we could compute the probability of having observed $Y$. But we do not know $\mu$ and $\sigma$! That is exactly what we want to estimate!
When $f_Y(Y \mid \mu, \sigma^2)$ (the joint pdf) is seen as a function of the parameters $\mu$ and $\sigma$, it is called the likelihood $\mathcal{L}(\mu, \sigma^2 \mid Y)$.
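A minimal sketch of this change of viewpoint (the observed values below are assumptions for illustration): the same product, now written as a function of the parameters with the data held fixed.

```python
import numpy as np
from scipy.stats import norm

# Observed data: fixed and known (illustrative values)
y_obs = np.array([-0.3, 1.2, 0.4, -1.1, 0.8, 0.1])

def likelihood(mu, sigma2):
    """L(mu, sigma^2 | Y): the joint pdf of the fixed data,
    seen as a function of the unknown parameters."""
    return np.prod(norm.pdf(y_obs, loc=mu, scale=np.sqrt(sigma2)))

# The likelihood can now be compared across candidate parameter values
print(likelihood(0.0, 1.0), likelihood(2.0, 1.0))
```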
Maximum Likelihood Estimation
Summary:
• Joint pdf $f_Y(Y \mid \mu, \sigma^2)$: the data $Y$ are the variables (not known), the parameters $\mu, \sigma^2$ are KNOWN
• Likelihood $\mathcal{L}(\mu, \sigma^2 \mid Y)$: the data $Y$ are KNOWN, the parameters $\mu, \sigma^2$ are the variables (not known)
Maximum Likelihood Estimation
The MLE is the value of the parameter vector $\boldsymbol{\theta}$ that maximizes the likelihood $\mathcal{L}(\boldsymbol{\theta} \mid Y)$.
QUIZ!
In this example, a single observation $y_1 = \bar{y}$ is evaluated under two candidate models, $f_y(y \mid \mu = 1, \sigma^2 = 1)$ and $f_y(y \mid \mu = 2, \sigma^2 = 1)$. The maximum likelihood estimate is:
❑ $\hat{\mu} = 2\bar{y}$
❑ $\hat{\mu} = \bar{y}$
❑ $\hat{\mu} = 2$
[Figure: the two pdfs plotted against $y$; at the observed value $y_1 = \bar{y}$ the plot marks the likelihoods $\mathcal{L}(\mu = 2 \mid y_1 = \bar{y})$ and $\mathcal{L}(\mu = 1 \mid y_1 = \bar{y})$.]
Maximum Likelihood Estimation
The maximum likelihood estimate for the previous example can be expressed as:

$$\hat{\boldsymbol{\theta}}_{\mathrm{ML}} = \begin{bmatrix} \hat{\mu} \\ \hat{\sigma}^2 \end{bmatrix}_{2 \times 1} = \arg\max_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta} \mid Y) = \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^{N} \mathcal{N}(y_i \mid \mu, \sigma^2)$$
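A brute-force sketch of this arg max (illustrative data, and a coarse grid search rather than a proper optimizer):

```python
import numpy as np
from scipy.stats import norm

y_obs = np.array([-0.3, 1.2, 0.4, -1.1, 0.8, 0.1])   # illustrative data

def likelihood(mu, sigma2):
    return np.prod(norm.pdf(y_obs, loc=mu, scale=np.sqrt(sigma2)))

# Coarse grid search over theta = (mu, sigma^2)
mus = np.linspace(-2.0, 2.0, 201)
sigma2s = np.linspace(0.1, 3.0, 146)
best = max(((likelihood(m, s2), m, s2) for m in mus for s2 in sigma2s))
_, mu_ml, sigma2_ml = best

print(mu_ml, sigma2_ml)             # grid approximation of theta_ML
print(y_obs.mean(), y_obs.var())    # closed-form Gaussian MLE: sample mean, 1/N variance
```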
Maximum Likelihood Estimation
Often, instead of maximizing $\mathcal{L}(\boldsymbol{\theta} \mid Y)$, we maximize its natural logarithm.
• Since the logarithm is a monotonically increasing function, $\ln \mathcal{L}(\boldsymbol{\theta} \mid Y)$ has the same maximizer as $\mathcal{L}(\boldsymbol{\theta} \mid Y)$
Except in special (lucky) cases, the optimization is carried out with iterative methods.
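A sketch of such an iterative approach, using a generic numerical optimizer on the log-likelihood of a Gaussian model (the data values are again only an assumption):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

y_obs = np.array([-0.3, 1.2, 0.4, -1.1, 0.8, 0.1])   # illustrative data

def neg_log_likelihood(theta):
    mu, log_sigma = theta                  # optimize log(sigma) so that sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(y_obs, loc=mu, scale=sigma))

# Maximizing ln L is the same as minimizing -ln L (see also the final slide)
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_ml, sigma_ml = res.x[0], np.exp(res.x[1])
print(mu_ml, sigma_ml**2)                  # close to the sample mean and 1/N variance
```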
Maximum Likelihood Estimate: properties
The maximum likelihood estimator has good properties. In fact, it is:
1. Asymptotically correct (unbiased): $\lim_{N \to +\infty} \mathbb{E}[\hat{\boldsymbol{\theta}}_{\mathrm{ML}}] = \boldsymbol{\theta}_0$, where $\boldsymbol{\theta}_0$ is the true parameter value

For finite $N$ the estimator can be biased. For example, the maximum likelihood estimator of the variance of a Gaussian population is biased.
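A small Monte Carlo sketch of this bias (all numbers below are illustrative assumptions): with $\sigma^2 = 1$ and $N = 5$, the average of the ML variance estimates stays close to $\frac{N-1}{N}\sigma^2 = 0.8$ rather than 1.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N, sigma2_true = 5, 1.0
n_trials = 100_000

# ML estimator of the variance: mean of squared deviations from the sample mean
samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_trials, N))
sigma2_ml = samples.var(axis=1, ddof=0)     # divides by N (the ML estimator)

print(sigma2_ml.mean())                     # ~0.8 = (N-1)/N * sigma2_true
```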
MLE of the mean of a Gaussian distribution
Let us consider the case in which we want to estimate the mean $\mu$ of a population of i.i.d. Gaussian random variables, assuming the variance of the distribution is known (in the following example, $\sigma^2 = 1$):

$$f_y(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{y_i - \mu}{\sigma}\right)^2\right) \;\overset{\sigma^2 = 1}{=}\; \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y_i - \mu)^2\right)$$
MLE of the mean of a Gaussian distribution
The values assumed by the pdf at the two observations $y_1 = 4$ and $y_2 = 6$ are:

$$f_y(y_1 = 4 \mid \mu, \sigma^2 = 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(4 - \mu)^2\right), \qquad f_y(y_2 = 6 \mid \mu, \sigma^2 = 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(6 - \mu)^2\right)$$

The joint distribution is the product of the two individual pdfs (since the data are i.i.d.):

$$p(y_1, y_2 \mid \mu, \sigma^2 = 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(4 - \mu)^2\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(6 - \mu)^2\right)$$
MLE of the mean of a Gaussian distribution
The joint pdf is then a function of $\mu$ only, since the values of the data are known. With this interpretation, the joint pdf is the likelihood function:

$$\mathcal{L}(\mu \mid y_1 = 4, y_2 = 6) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(4 - \mu)^2\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(6 - \mu)^2\right)$$
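As a quick numerical check of this example (nothing is assumed beyond the two observations), the likelihood can be evaluated on a grid of candidate values of $\mu$; its maximum falls at $\mu = 5$:

```python
import numpy as np
from scipy.stats import norm

y_obs = np.array([4.0, 6.0])

def likelihood(mu):
    # L(mu | y1 = 4, y2 = 6) with known variance sigma^2 = 1
    return np.prod(norm.pdf(y_obs, loc=mu, scale=1.0))

mus = np.linspace(0.0, 10.0, 1001)
L = np.array([likelihood(m) for m in mus])
print(mus[np.argmax(L)])    # 5.0, the sample mean of {4, 6}
```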
MLE of the mean of a Gaussian distribution
It is more convenient to maximize the logarithm of the likelihood. This new function (the log-likelihood) has the same maximizer as the likelihood:

$$\ln \mathcal{L} = \ln\left[\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(4 - \mu)^2\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(6 - \mu)^2\right)\right]$$
$$= \ln\left[\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(4 - \mu)^2\right)\right] + \ln\left[\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(6 - \mu)^2\right)\right]$$
$$= \ln\frac{1}{\sqrt{2\pi}} + \ln \exp\left(-\frac{1}{2}(4 - \mu)^2\right) + \ln\frac{1}{\sqrt{2\pi}} + \ln \exp\left(-\frac{1}{2}(6 - \mu)^2\right)$$
$$= 2 \ln\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(4 - \mu)^2 - \frac{1}{2}(6 - \mu)^2$$
MLE of the mean of a Gaussian distribution
By maximizing the obtained expression with respect to $\mu$ we get:

$$\frac{\partial \ln \mathcal{L}}{\partial \mu} = 0 \;\Rightarrow\; (4 - \mu) + (6 - \mu) = 0 \;\Rightarrow\; \hat{\mu}_{\mathrm{ML}} = \frac{4 + 6}{2} = 5$$

The maximum likelihood estimate of the parameter $\mu$ for the Gaussian model is equal to the estimate obtained using the sample mean estimator!
This result, although not generalizable, makes the maximum likelihood estimator very interpretable and intuitive.
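The same derivation can be checked symbolically, for example with SymPy (the log-likelihood below is the expression obtained on the previous slide):

```python
import sympy as sp

mu = sp.symbols("mu", real=True)

# Log-likelihood for y1 = 4, y2 = 6 with sigma^2 = 1 (constant term included)
log_lik = (2 * sp.log(1 / sp.sqrt(2 * sp.pi))
           - sp.Rational(1, 2) * (4 - mu) ** 2
           - sp.Rational(1, 2) * (6 - mu) ** 2)

# Setting the derivative to zero recovers mu_hat = 5, i.e. the sample mean
mu_hat = sp.solve(sp.Eq(sp.diff(log_lik, mu), 0), mu)
print(mu_hat)   # [5]
```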
MLE of the mean of a Gaussian distribution
Observation: maximizing the «log-likelihood» is equivalent to minimizing the «negative log-likelihood».