
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION

[Figure: top panel — the normal density p plotted against X (0 to 8) for a trial value of m; bottom panel — the joint density (likelihood) L plotted against m (0 to 8).]
This sequence introduces the principle of maximum likelihood estimation and illustrates it
with some simple examples.

Suppose that you have a normally distributed random variable X with unknown population
mean m and standard deviation s, and that you have a sample of two observations, 4 and 6.
For the time being, we will assume that s is equal to 1.
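A minimal numerical sketch of the calculation that follows, assuming SciPy is available (the `likelihood` helper is written here for illustration and is not part of the original lecture):

```python
# Joint density of the sample (4, 6) under N(m, s^2), with s = 1.
from scipy.stats import norm

def likelihood(m, sample=(4, 6), s=1.0):
    """Product of the normal densities of the observations."""
    L = 1.0
    for x in sample:
        L *= norm.pdf(x, loc=m, scale=s)
    return L

print(round(norm.pdf(4, loc=3.5), 4))  # 0.3521
print(round(norm.pdf(6, loc=3.5), 4))  # 0.0175
print(round(likelihood(3.5), 4))       # 0.0062
```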
Suppose initially you consider the hypothesis m = 3.5. Under this hypothesis the probability
density at 4 would be 0.3521 and that at 6 would be 0.0175.

The joint probability density, shown in the bottom chart, is the product of these, 0.0062.

Next consider the hypothesis m = 4.0. Under this hypothesis the probability densities
associated with the two observations are 0.3989 and 0.0540, and the joint probability
density is 0.0215.
Under the hypothesis m = 4.5, the probability densities are 0.3521 and 0.1295, and the joint
probability density is 0.0456.

Under the hypothesis m = 5.0, the probability densities are both 0.2420 and the joint
probability density is 0.0585.

Under the hypothesis m = 5.5, the probability densities are 0.1295 and 0.3521 and the joint
probability density is 0.0456.

m     p(4)     p(6)     L
3.5   0.3521   0.0175   0.0062
4.0   0.3989   0.0540   0.0215
4.5   0.3521   0.1295   0.0456
5.0   0.2420   0.2420   0.0585
5.5   0.1295   0.3521   0.0456

[Figure: top panel — the densities at X = 4 and X = 6 for each trial m; bottom panel — the likelihood L plotted against m, peaking at m = 5.]
The complete joint density function for all values of m has now been plotted in the lower
diagram. We see that it peaks at m = 5.
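A short grid search, reusing the `likelihood` helper sketched above, reproduces the table and the location of the peak:

```python
import numpy as np

ms = np.arange(3.5, 6.0, 0.5)
Ls = [likelihood(m) for m in ms]
for m, L in zip(ms, Ls):
    print(f"m = {m:.1f}  L = {L:.4f}")

print("argmax:", ms[int(np.argmax(Ls))])  # 5.0
```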

$$f(X) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-m}{s}\right)^2}$$

Now we will look at the mathematics of the example. If X is normally distributed with mean
m and standard deviation s, its density function is as shown.

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-m)^2}$$

For the time being, we are assuming s is equal to 1, so the density function simplifies to the
second expression.

$$f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-m)^2}, \qquad f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-m)^2}$$

Hence we obtain the probability densities for the observations where X = 4 and X = 6.

$$\text{joint density} = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-m)^2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-m)^2}\right)$$

The joint probability density for the two observations in the sample is just the product of
their individual densities.

• Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data.
  – Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data, given the chosen probability distribution model.
• This expression contains the unknown model parameters.
• The values of these parameters that maximize the sample likelihood are known as the Maximum Likelihood Estimates, or MLEs.

In maximum likelihood estimation we choose as our estimate of m the value that gives us the
greatest joint density for the observations in our sample. This value is associated with the
greatest probability, or maximum likelihood, of obtaining the observations in the sample.
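Equivalently, this value can be found numerically by minimizing the negative log-likelihood (a sketch, assuming SciPy; `neg_log_likelihood` is a helper written for this example):

```python
# Minimizing the negative log-likelihood is equivalent to maximizing L.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(m, sample=(4, 6)):
    const = len(sample) * 0.5 * np.log(2 * np.pi)
    return const + sum(0.5 * (x - m) ** 2 for x in sample)

res = minimize_scalar(neg_log_likelihood, bounds=(0, 8), method="bounded")
print(res.x)  # ~5.0
```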
In the graphical treatment we saw that this occurs when m is equal to 5. We will prove this
must be the case mathematically.

$$L(m \mid 4,6) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-m)^2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-m)^2}\right)$$

To do this, we treat the sample values X = 4 and X = 6 as given and we use the calculus to
determine the value of m that maximizes the expression.


When it is regarded in this way, the expression is called the likelihood function for m, given
the sample observations 4 and 6. This is the meaning of L(m | 4,6).


To maximize the expression, we could differentiate with respect to m and set the result equal
to 0. This would be a little laborious. Fortunately, we can simplify the problem with a trick.

$$\log L = \log\left[\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-m)^2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-m)^2}\right)\right]$$
$$= \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-m)^2}\right) + \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-m)^2}\right)$$
$$= \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(4-m)^2} + \log\frac{1}{\sqrt{2\pi}} + \log e^{-\frac{1}{2}(6-m)^2}$$
$$= 2\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(4-m)^2 - \frac{1}{2}(6-m)^2$$
log L is a monotonically increasing function of L (meaning that log L increases if L
increases and decreases if L decreases).

It follows that the value of m which maximizes log L is the same as the one that maximizes L.
As it so happens, it is easier to maximize log L with respect to m than it is to maximize L.
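This is easy to check numerically: the position of the maximum is the same for L and for log L (a sketch, reusing the `likelihood` helper from earlier):

```python
import numpy as np

ms = np.linspace(0, 8, 1601)
Ls = np.array([likelihood(m) for m in ms])
print(ms[np.argmax(Ls)])          # 5.0
print(ms[np.argmax(np.log(Ls))])  # 5.0, the same m
```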

The logarithm of the product of the density functions can be decomposed as the sum of
their logarithms.

Using the product rule a second time, we can decompose each term as shown.

We will now choose m so as to maximize this expression.

$$\log L = 2\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(4-m)^2 - \frac{1}{2}(6-m)^2$$
$$-\frac{1}{2}(a-m)^2 = -\frac{1}{2}\left(a^2 - 2am + m^2\right) = -\frac{1}{2}a^2 + am - \frac{1}{2}m^2$$
$$\frac{d}{dm}\left[-\frac{1}{2}(a-m)^2\right] = a - m$$
$$\frac{d\log L}{dm} = (4-m) + (6-m)$$
$$\frac{d\log L}{dm} = 0 \quad\Rightarrow\quad \hat{m} = 5$$
Thus from the first order condition we confirm that 5 is the value of m that maximizes the
log-likelihood function, and hence the likelihood function.
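The first-order condition can also be verified symbolically (a sketch, assuming SymPy):

```python
import sympy as sp

m = sp.symbols("m")
logL = (2 * sp.log(1 / sp.sqrt(2 * sp.pi))
        - sp.Rational(1, 2) * (4 - m) ** 2
        - sp.Rational(1, 2) * (6 - m) ** 2)

print(sp.solve(sp.diff(logL, m), m))  # [5]
print(sp.diff(logL, m, 2))            # -2: negative, so a maximum
```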

Note also that the second differential of log L with respect to m is -2. Since this is negative,
we have found a maximum, not a minimum.

$$f(X_i) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_i-m)^2}$$

We will generalize this result to a sample of n observations X1, ..., Xn. The probability density for Xi is given above.
$$\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-m)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-m)^2}\right)$$

The joint density function for a sample of n observations is the product of their individual
densities.

$$L(m \mid X_1, \ldots, X_n) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-m)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-m)^2}\right)$$

Now treating the sample values as fixed, we can re-interpret the joint density function as the
likelihood function for m, given this sample.
We will find the value of m that maximizes it.
$$\log L = \log\left[\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-m)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-m)^2}\right)\right]$$
$$= \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1-m)^2}\right) + \cdots + \log\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n-m)^2}\right)$$
$$= n\log\frac{1}{\sqrt{2\pi}} - \frac{1}{2}(X_1-m)^2 - \cdots - \frac{1}{2}(X_n-m)^2$$

We will do this indirectly, as before, by maximizing log L with respect to m.


The logarithm decomposes as shown.
$$\frac{d\log L}{dm} = (X_1 - m) + \cdots + (X_n - m)$$

We differentiate log L with respect to m.

$$\frac{d\log L}{dm} = 0 \quad\Rightarrow\quad \sum X_i - n\hat{m} = 0$$

The first order condition is that the differential be equal to zero.

$$\Rightarrow\quad \hat{m} = \frac{1}{n}\sum X_i = \bar{X}$$

Thus we have demonstrated that the maximum likelihood estimator of m is the sample
mean.
The second differential, -n, is negative, confirming that we have maximized log L.
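A quick numerical sketch with a larger sample (the data are simulated here purely for illustration, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=1.0, size=100)

ms = np.linspace(3.0, 7.0, 4001)
logL = np.array([norm.logpdf(sample, loc=m, scale=1.0).sum() for m in ms])
print(ms[np.argmax(logL)])  # matches the sample mean to grid precision
print(sample.mean())
```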
$$f(X_i) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-m}{s}\right)^2}$$

So far we have assumed that s, the standard deviation of the distribution of X, is equal to 1.
We will now relax this assumption and find the maximum likelihood estimator of s as well.

[Figure: top panel — the normal density with m = 5 for a trial value of s; bottom panel — the likelihood L plotted against s (0 to 4).]
We will illustrate the process graphically with the two-observation example, keeping m fixed
at 5. We will start with s equal to 2.
With s equal to 2, the probability density is 0.1760 for both X = 4 and X = 6, and the joint
density is 0.0310.

Now try s equal to 1. The individual densities are 0.2420 and so the joint density, 0.0586,
has increased.
Now try putting s equal to 0.5. The individual densities have fallen and the joint density is
only 0.0117.
s     p(4)     p(6)     L
2.0   0.1760   0.1760   0.0310
1.0   0.2420   0.2420   0.0586
0.5   0.1080   0.1080   0.0117

[Figure: top panel — the densities at X = 4 and X = 6 for each trial s; bottom panel — the likelihood L plotted against s, peaking at s = 1.]
The joint density has now been plotted as a function of s in the lower diagram. You can see
that in this example it is greatest for s equal to 1.
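A grid over s, with m held at 5, reproduces this (a sketch, assuming SciPy; `likelihood_s` is a helper written for this example):

```python
import numpy as np
from scipy.stats import norm

def likelihood_s(s, sample=(4, 6), m=5.0):
    L = 1.0
    for x in sample:
        L *= norm.pdf(x, loc=m, scale=s)
    return L

ss = np.linspace(0.1, 4.0, 391)  # step 0.01
Ls = [likelihood_s(s) for s in ss]
print(ss[int(np.argmax(Ls))])    # 1.0
```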

We will now look at this mathematically, starting with the probability density function for X
given m and s.

$$\left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-m}{s}\right)^2}\right) \times \cdots \times \left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-m}{s}\right)^2}\right)$$

The joint density function for the sample of n observations is the product above.

$$L(m, s \mid X_1, \ldots, X_n) = \left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-m}{s}\right)^2}\right) \times \cdots \times \left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-m}{s}\right)^2}\right)$$

As before, we can re-interpret this function as the likelihood function for m and s, given the
sample of observations.

$$\log L = \log\left[\left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-m}{s}\right)^2}\right) \times \cdots \times \left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-m}{s}\right)^2}\right)\right]$$
We will find the values of m and s that maximize this function.


We will do this indirectly by maximizing log L.
$$\log L = \log\left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-m}{s}\right)^2}\right) + \cdots + \log\left(\frac{1}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-m}{s}\right)^2}\right)$$
$$= n\log\frac{1}{s\sqrt{2\pi}} - \frac{1}{2}\left(\frac{X_1-m}{s}\right)^2 - \cdots - \frac{1}{2}\left(\frac{X_n-m}{s}\right)^2$$
$$= n\log\frac{1}{s} + n\log\frac{1}{\sqrt{2\pi}} + \frac{1}{s^2}\left[-\frac{1}{2}(X_1-m)^2 - \cdots - \frac{1}{2}(X_n-m)^2\right]$$

We can decompose the logarithm as shown.


To maximize it, we will set the partial derivatives with respect to m and s equal to zero.
$$\log L = -n\log s + n\log\frac{1}{\sqrt{2\pi}} - \frac{s^{-2}}{2}\sum (X_i-m)^2$$
$$\frac{\partial \log L}{\partial m} = \frac{1}{s^2}\left[(X_1-m) + \cdots + (X_n-m)\right] = \frac{1}{s^2}\left(\sum X_i - nm\right)$$

When differentiating with respect to m, the first two terms disappear.


We have already seen how to differentiate the other terms.
$$\frac{\partial \log L}{\partial m} = 0 \quad\Rightarrow\quad \hat{m} = \bar{X}$$

Setting the first differential equal to 0, the maximum likelihood estimate of m is the sample
mean, as before.


Next, we take the partial differential of the log-likelihood function with respect to s.

$$\frac{\partial \log L}{\partial s} = -\frac{n}{s} + s^{-3}\sum (X_i-m)^2$$
$$\frac{\partial \log L}{\partial s} = 0 \quad\Rightarrow\quad -\frac{n}{\hat{s}} + \hat{s}^{-3}\sum (X_i-\hat{m})^2 = 0$$

Setting the first derivative of log L to zero gives us a condition that must be satisfied by the
maximum likelihood estimator.
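The condition can also be solved symbolically (a sketch, assuming SymPy; Q is a symbol introduced here to stand for the sum of squared deviations):

```python
import sympy as sp

s, n, Q = sp.symbols("s n Q", positive=True)  # Q = sum of (X_i - m)^2
logL = -n * sp.log(s) + n * sp.log(1 / sp.sqrt(2 * sp.pi)) - Q / (2 * s**2)

print(sp.solve(sp.diff(logL, s), s))  # the positive root sqrt(Q/n)
```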

$$\Rightarrow\quad -n\hat{s}^2 + \sum (X_i - \bar{X})^2 = 0$$

We have already demonstrated that the maximum likelihood estimator of m is the sample
mean.

$$\Rightarrow\quad \hat{s}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$$
Hence the maximum likelihood estimator of the population variance is the mean square
deviation of X.

Note that it is biased. The unbiased estimator is obtained by dividing by n – 1, not n.
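In NumPy terms (a sketch): np.var uses the 1/n divisor by default, which is exactly the maximum likelihood estimator; ddof=1 gives the unbiased version.

```python
import numpy as np

x = np.array([4.0, 6.0])
print(np.var(x))          # 1.0 -> ML estimate, divisor n
print(np.var(x, ddof=1))  # 2.0 -> unbiased estimate, divisor n - 1
```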

However it can be shown that the maximum likelihood estimator is asymptotically efficient,
in the sense of having a smaller mean square error than the unbiased estimator in large
samples.
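A small simulation illustrates the mean-square-error comparison (a sketch; the sample size, number of replications, and seed are arbitrary choices made for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, true_var = 10, 100_000, 1.0

samples = rng.normal(0.0, 1.0, size=(reps, n))
mle = samples.var(axis=1, ddof=0)       # divisor n
unbiased = samples.var(axis=1, ddof=1)  # divisor n - 1

print(np.mean((mle - true_var) ** 2))       # MSE of the ML estimator
print(np.mean((unbiased - true_var) ** 2))  # MSE of the unbiased estimator
```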
MLE Problems
Question No. 1: Suppose X1, X2, …, Xn are i.i.d. random variables with density function

f(x | σ) = exp(−|x|/σ) / (2σ).

Find the maximum likelihood estimate of σ.

Question No. 2: Suppose that X is a discrete random variable with P(X = 1) = θ and P(X = 2) = 1 − θ. Three independent observations of X are made: x1 = 1, x2 = 2, x3 = 2.
– What is the likelihood function?
– What is the MLE of θ?

Question No. 3: A sample of 3 observations, (x1 = 0.4, x2 = 0.7, x3 = 0.9), is collected from a continuous distribution with density f(x) = θx^(θ−1) for 0 < x < 1. Estimate θ by the method of maximum likelihood.
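Answers can be self-checked numerically; for example, for Question No. 3, maximizing the log-likelihood on a computer should agree with your closed-form solution (a sketch, assuming SciPy):

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([0.4, 0.7, 0.9])

def neg_log_lik(theta):
    # log f(x) = log(theta) + (theta - 1) * log(x)
    return -(len(data) * np.log(theta) + (theta - 1) * np.log(data).sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(res.x)  # numerical MLE of theta
```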
THANK YOU
