
Statistics 512 Notes 18:

Multiparameter maximum likelihood estimation


We consider $X_1, \ldots, X_n$ iid with pdf $f(x; \theta)$, where $\theta = (\theta_1, \ldots, \theta_p)$ is p-dimensional.
As before,
$$L(\theta) = \prod_{i=1}^n f(x_i; \theta_1, \ldots, \theta_p)$$
$$l(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i; \theta_1, \ldots, \theta_p)$$
The maximum likelihood estimate is
$$\hat{\theta}_{MLE} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} l(\theta)$$
We can find critical points of the likelihood function by solving the vector equation
$$\frac{\partial l}{\partial \theta_1}(\theta_1, \ldots, \theta_p) = 0, \quad \frac{\partial l}{\partial \theta_2}(\theta_1, \ldots, \theta_p) = 0, \quad \ldots, \quad \frac{\partial l}{\partial \theta_p}(\theta_1, \ldots, \theta_p) = 0.$$
We then need to verify that the critical point is a global maximum.
Example 1: Normal distribution
$X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$
$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$$
$$l(\mu, \sigma) = -n \log \sigma - \frac{n}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu)^2$$
The partials with respect to $\mu$ and $\sigma$ are
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu)$$
$$\frac{\partial l}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (X_i - \mu)^2$$
Setting the first partial equal to zero and solving for the MLE, we obtain
$$\hat{\mu}_{MLE} = \bar{X}.$$
Setting the second partial equal to zero and substituting the MLE for $\mu$, we find that the MLE for $\sigma$ is
$$\hat{\sigma}_{MLE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2}.$$
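In R these closed-form MLEs are one-liners. A minimal sketch, assuming a hypothetical data vector x (the simulated values mean=2, sd=3 are arbitrary):
# Closed-form normal MLEs for a data vector x
x=rnorm(100,mean=2,sd=3)                 # simulated data for illustration
muhatmle=mean(x)                         # muhat_MLE = sample mean
sigmahatmle=sqrt(mean((x-muhatmle)^2))   # divisor n, not n-1
Note that sigmahatmle uses divisor $n$, so it differs slightly from R's sd(x), which uses divisor $n-1$.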
To verify that this critical point is a maximum, we need to check the following second derivative conditions:
(1) The two second-order partial derivatives are negative at the critical point:
$$\left. \frac{\partial^2 l}{\partial \mu^2} \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0 \quad \text{and} \quad \left. \frac{\partial^2 l}{\partial \sigma^2} \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} < 0$$
(2) The determinant of the matrix of second-order partial derivatives (the Hessian) is positive:
$$\left| \begin{matrix} \dfrac{\partial^2 l}{\partial \mu^2} & \dfrac{\partial^2 l}{\partial \mu \partial \sigma} \\[6pt] \dfrac{\partial^2 l}{\partial \sigma \partial \mu} & \dfrac{\partial^2 l}{\partial \sigma^2} \end{matrix} \right|_{\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}} > 0$$
See additional sheet for verification of (1) and (2) for
normal distribution.
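These conditions can also be checked numerically. A minimal sketch on simulated data (the values mean=2, sd=3 are arbitrary): optim minimizes the negative log likelihood, so at the MLE the Hessian it returns should be positive definite, which matches conditions (1) and (2) for $l$.
# Numerical check of the second-order conditions for the normal MLE
negloglik=function(theta,x) -sum(dnorm(x,mean=theta[1],sd=theta[2],log=TRUE))
x=rnorm(100,mean=2,sd=3)                       # simulated data
fit=optim(c(mean(x),sd(x)),negloglik,x=x,hessian=TRUE)
fit$hessian                  # Hessian of -l at the MLE
eigen(fit$hessian)$values    # both eigenvalues positive: l has a strict local max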
Example 2: Gamma distribution
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha-1} e^{-x/\beta}, & 0 < x < \infty \\ 0, & \text{elsewhere} \end{cases}$$
$$l(\alpha, \beta) = \sum_{i=1}^n \left[ -\log \Gamma(\alpha) - \alpha \log \beta + (\alpha - 1) \log X_i - X_i/\beta \right]$$
The partial derivatives are
$$\frac{\partial l}{\partial \alpha} = -n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n \log \beta + \sum_{i=1}^n \log X_i$$
$$\frac{\partial l}{\partial \beta} = -\frac{n\alpha}{\beta} + \frac{1}{\beta^2} \sum_{i=1}^n X_i$$
Setting the second partial derivative equal to zero, we find
$$\hat{\beta}_{MLE} = \frac{\sum_{i=1}^n X_i}{n \hat{\alpha}_{MLE}} = \frac{\bar{X}}{\hat{\alpha}_{MLE}}$$
When this solution is substituted into the first partial derivative, we obtain a nonlinear equation for the MLE of $\alpha$:
$$-n \frac{\Gamma'(\hat{\alpha}_{MLE})}{\Gamma(\hat{\alpha}_{MLE})} + n \log \hat{\alpha}_{MLE} - n \log \bar{X} + \sum_{i=1}^n \log X_i = 0$$
This equation cannot be solved in closed form. Newton's method or another iterative method can be used.
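For instance, a bare-bones Newton iteration, as a sketch: it uses R's built-in digamma and trigamma functions for $\Gamma'(\alpha)/\Gamma(\alpha)$ and its derivative, and the starting value 0.5 is an arbitrary choice.
# Newton's method for the nonlinear equation in alphahat_MLE
newtonalpha=function(xvec,alpha=0.5,tol=1e-8){
  n=length(xvec)
  repeat{
    g=-n*digamma(alpha)+n*log(alpha)-n*log(mean(xvec))+sum(log(xvec))
    gprime=-n*trigamma(alpha)+n/alpha    # derivative of g with respect to alpha
    alphanew=alpha-g/gprime
    if(abs(alphanew-alpha)<tol) return(alphanew)
    alpha=alphanew
  }
}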
digamma(x) = function in R that computes the derivative of the log of the gamma function of x, $\Gamma'(x)/\Gamma(x)$.
uniroot(f,interval) = function in R that finds an approximate zero of a function on the interval. There should be only one zero, and the function values at the lower and upper endpoints of the interval should have opposite signs.
alphahatfunc=function(alpha,xvec){
  # Left-hand side of the nonlinear equation for alphahat_MLE;
  # the root of this function in alpha is the MLE of alpha
  n=length(xvec)
  eq=-n*digamma(alpha)-n*log(mean(xvec))+n*log(alpha)+
    sum(log(xvec))
  eq
}
> alphahatfunc(.3779155,illinoisrainfall)
[1] 65.25308
> alphahatfunc(.5,illinoisrainfall)
[1] -45.27781
> alpharoot=uniroot(alphahatfunc,interval=c(.377,.5),xvec=illinoisrainfall)
> alpharoot
$root
[1] 0.4407967
$f.root
[1] -0.004515694
$iter
[1] 4
$estim.prec
[1] 6.103516e-05
> betahatmle=mean(illinoisrainfall)/.4407967
> betahatmle
[1] 0.5090602

$$\hat{\alpha}_{MLE} \approx .4408, \quad \hat{\beta}_{MLE} \approx .5091$$

Consistency, asymptotic distribution and optimality of MLE for multiparameter estimation

Theorem 6.4.1: Let $X_1, \ldots, X_n$ be iid with pdf $f(x; \theta)$, $\theta = (\theta_1, \ldots, \theta_p)$, for $\theta \in \Omega$. Assume the regularity conditions (R6-R9) hold [similar to (R0)-(R5), assumptions that the log likelihood is smooth]. Then
(a) $\hat{\theta}_{MLE} \xrightarrow{P} \theta$
(b) $\sqrt{n}\,(\hat{\theta}_{MLE} - \theta) \xrightarrow{D} N_p(0, I(\theta)^{-1})$
where $I(\theta)$ is the Fisher information matrix of $\theta$,
$$I(\theta) = \mathrm{Cov}\left( \frac{\partial}{\partial \theta_1} \log f(X; \theta), \ldots, \frac{\partial}{\partial \theta_p} \log f(X; \theta) \right).$$
As in the univariate case, the Fisher information matrix can be expressed in terms of the second derivatives of the log likelihood function under the regularity conditions:
$$I_{jk}(\theta) = \mathrm{Cov}\left( \frac{\partial}{\partial \theta_j} \log f(X; \theta), \frac{\partial}{\partial \theta_k} \log f(X; \theta) \right) = -E\left[ \frac{\partial^2}{\partial \theta_j \partial \theta_k} \log f(X; \theta) \right].$$
Corollary 6.4.1: Let $X_1, \ldots, X_n$ be iid with pdf $f(x; \theta)$, $\theta = (\theta_1, \ldots, \theta_p)$, for $\theta \in \Omega$. Assume the regularity conditions (R6-R9) hold. Then $\hat{\theta}_{MLE}$ is an asymptotically efficient estimate in the sense that the asymptotic covariance matrix of any other consistent estimate is at least as large (in particular, the asymptotic variance of each component of $\hat{\theta}_{MLE}$ is at least as large).
Note on practical use of theorem:
It is also true that
$$\sqrt{n}\, I(\theta)^{1/2} (\hat{\theta}_{MLE} - \theta) \xrightarrow{D} N_p(0, \text{identity matrix}).$$
Thus,
$$\hat{\theta}_{MLE} \approx N\left( \theta, \frac{1}{n} I(\theta)^{-1} \right),$$
which can be used to form approximate confidence intervals.
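In R this recipe translates into a small helper. A sketch, where thetahat and fisherinfo are hypothetical placeholders for the MLE vector and the single-observation information matrix $I(\hat{\theta})$:
# Approximate Wald intervals from thetahat ~ N(theta, (1/n) I(thetahat)^{-1})
waldci=function(thetahat,fisherinfo,n,level=0.95){
  se=sqrt(diag(solve(fisherinfo))/n)    # standard errors from (1/n) I^{-1}
  z=qnorm(1-(1-level)/2)                # 1.96 for level=0.95
  cbind(lower=thetahat-z*se,upper=thetahat+z*se)
}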
Example 1: $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$.
$$I(\mu, \sigma) = -E \begin{bmatrix} -\dfrac{1}{\sigma^2} & -\dfrac{2(X-\mu)}{\sigma^3} \\[6pt] -\dfrac{2(X-\mu)}{\sigma^3} & \dfrac{1}{\sigma^2} - \dfrac{3(X-\mu)^2}{\sigma^4} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\sigma^2} & 0 \\[6pt] 0 & \dfrac{2}{\sigma^2} \end{bmatrix}$$
Thus,
$$I(\mu, \sigma)^{-1} = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \dfrac{\sigma^2}{2} \end{bmatrix}$$
Thus,
$$(\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}) \approx N\left( (\mu, \sigma),\ \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[6pt] 0 & \dfrac{\sigma^2}{2n} \end{pmatrix} \right)$$
To form approximate confidence intervals in practice, we can substitute the MLE estimates into the covariance matrix:
$$(\hat{\mu}_{MLE}, \hat{\sigma}_{MLE}) \approx N\left( (\mu, \sigma),\ \begin{pmatrix} \dfrac{\hat{\sigma}_{MLE}^2}{n} & 0 \\[6pt] 0 & \dfrac{\hat{\sigma}_{MLE}^2}{2n} \end{pmatrix} \right)$$
Thus, an approximate 95% confidence interval for $\mu$ is
$$\hat{\mu}_{MLE} \pm 1.96 \frac{\hat{\sigma}_{MLE}}{\sqrt{n}}$$
and an approximate 95% confidence interval for $\sigma$ is
$$\hat{\sigma}_{MLE} \pm 1.96 \frac{\hat{\sigma}_{MLE}}{\sqrt{2n}}.$$
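A sketch of these two intervals in R, on the same kind of simulated data as above:
# Approximate 95% CIs for mu and sigma of a normal sample
x=rnorm(100,mean=2,sd=3)                    # simulated data for illustration
n=length(x)
muhat=mean(x)
sigmahat=sqrt(mean((x-muhat)^2))
muhat+c(-1,1)*1.96*sigmahat/sqrt(n)         # CI for mu
sigmahat+c(-1,1)*1.96*sigmahat/sqrt(2*n)    # CI for sigma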
Example 2: Gamma distribution:
$$I(\alpha, \beta) = -E \begin{bmatrix} -\dfrac{\Gamma''(\alpha)\Gamma(\alpha) - [\Gamma'(\alpha)]^2}{\Gamma(\alpha)^2} & -\dfrac{1}{\beta} \\[6pt] -\dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} - \dfrac{2X}{\beta^3} \end{bmatrix} = \begin{bmatrix} \dfrac{\Gamma''(\alpha)\Gamma(\alpha) - [\Gamma'(\alpha)]^2}{\Gamma(\alpha)^2} & \dfrac{1}{\beta} \\[6pt] \dfrac{1}{\beta} & \dfrac{\alpha}{\beta^2} \end{bmatrix}$$
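The upper-left entry is the trigamma function $\psi'(\alpha) = \frac{d}{d\alpha}\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$, which R provides directly:
trigamma(.4408)    # upper-left entry of the estimated information, about 6.133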
For the Illinois rainfall data, $\hat{\alpha}_{MLE} = .4408$, $\hat{\beta}_{MLE} = .5091$. Thus,
$$I(\hat{\alpha}_{MLE}, \hat{\beta}_{MLE}) = \begin{bmatrix} \dfrac{\Gamma''(.4408)\Gamma(.4408) - [\Gamma'(.4408)]^2}{\Gamma(.4408)^2} & \dfrac{1}{.5091} \\[6pt] \dfrac{1}{.5091} & \dfrac{.4408}{.5091^2} \end{bmatrix} = \begin{bmatrix} 6.133 & 1.964 \\ 1.964 & 1.701 \end{bmatrix}$$
> infmat=matrix(c(6.133,1.964,1.964,1.704),ncol=2)
> invinfmat=solve(infmat)
> invinfmat
           [,1]       [,2]
[1,]  0.2584428 -0.2978765
[2,] -0.2978765  0.9301816
Thus,
$$(\hat{\alpha}_{MLE}, \hat{\beta}_{MLE}) \approx N\left( (\alpha, \beta),\ \begin{pmatrix} \dfrac{0.259}{227} & \dfrac{-0.298}{227} \\[6pt] \dfrac{-0.298}{227} & \dfrac{0.930}{227} \end{pmatrix} \right)$$
Thus, approximate 95% confidence intervals for $\alpha$ and $\beta$ are
$$\alpha: \quad 0.441 \pm 1.96\sqrt{\frac{0.259}{227}} = (0.375, 0.507)$$
$$\beta: \quad 0.509 \pm 1.96\sqrt{\frac{0.930}{227}} = (0.384, 0.634)$$
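These intervals can be read off from invinfmat computed above (a sketch; 227 is the Illinois rainfall sample size):
# 95% CIs for (alpha, beta) from the inverse information matrix
n=227
se=sqrt(diag(invinfmat)/n)        # standard errors of (alphahat, betahat)
c(0.441,0.509)-1.96*se            # lower endpoints
c(0.441,0.509)+1.96*se            # upper endpoints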
Note: We can also use the observed Fisher information to form confidence intervals based on maximum likelihood estimates, where in place of the information matrix we use the observed information matrix $O$, where
$$O_{jk} = -\sum_{i=1}^n \left. \frac{\partial^2}{\partial \theta_j \partial \theta_k} \log f(X_i; \theta) \right|_{\theta = \hat{\theta}_{MLE}}.$$
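For the gamma example, the observed information can be approximated numerically with optimHess. A sketch, writing the log likelihood with R's dgamma and plugging in the MLEs from above:
# Observed information = minus the Hessian of the log likelihood at the MLE
loglik=function(theta,x) sum(dgamma(x,shape=theta[1],scale=theta[2],log=TRUE))
obsinfo=-optimHess(c(0.4408,0.5091),loglik,x=illinoisrainfall)
solve(obsinfo)    # estimated covariance matrix; compare with invinfmat/227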
We could also use the parametric bootstrap to form confidence intervals based on maximum likelihood estimates, where we resample from $f(x; \hat{\theta}_{MLE})$.
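A sketch of a parametric bootstrap percentile interval for $\alpha$, reusing alphahatfunc from above; B=1000 resamples is an arbitrary choice, and the interval passed to uniroot is assumed wide enough to bracket the root for every resample:
# Parametric bootstrap: resample from f(x; thetahat_MLE) and refit
B=1000
n=length(illinoisrainfall)
alphastar=numeric(B)
for(b in 1:B){
  xstar=rgamma(n,shape=.4408,scale=.5091)   # data simulated from the fitted gamma
  alphastar[b]=uniroot(alphahatfunc,interval=c(.05,5),xvec=xstar)$root
}
quantile(alphastar,c(.025,.975))    # approximate 95% CI for alpha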