
Examples · Conditional MLE: Introduction · Identification · Asymptotic Normality · Hypothesis Testing

Extremum Estimators (Estimadores Extremos)
Maximum Likelihood Estimation (MLE)
Cristine Campos de Xavier Pinto
CEDEPLAR/UFMG
May 2010
Instead of using conditional mean and variance assumptions, we are going to use a full distributional assumption.

We assume that we have an i.i.d. sample $\{(x_i, y_i)\}_{i=1}^N$, where $x_i \in \mathbb{R}^K$ and $y_i \in \mathbb{R}^G$, and we are interested in estimating a model for the conditional distribution of $y_i$ given $x_i$.

Assumption: The density of $y_i$ given $x_i$ is known up to a finite number of unknown parameters.

We impose a parametric model for the conditional density.

The vector $y_i$ can be continuous or discrete, or it can have both continuous and discrete characteristics.
Example 1: Suppose we have a latent variable $y_i^*$ that follows the linear model
$$y_i^* = x_i \beta + \varepsilon_i$$
where $\varepsilon_i$ is independent of $x_i$.

$x_i$ is a $1 \times K$ vector with the first element equal to unity.

$\beta$ is a $K \times 1$ vector of parameters.

$\varepsilon_i \sim N(0, 1)$.

Instead of observing $y_i^*$, we observe only a binary variable that equals the sign of $y_i^*$:
$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \leq 0 \end{cases}$$
Using the assumptions above, we need to obtain the distribution of $y_i$ given $x_i$:
$$\begin{aligned}
\Pr[y_i = 1 \mid x_i] &= \Pr[y_i^* > 0 \mid x_i] \\
&= \Pr[x_i \beta + \varepsilon_i > 0 \mid x_i] \\
&= \Pr[\varepsilon_i > -x_i \beta \mid x_i] \\
&= 1 - \Phi(-x_i \beta) = \Phi(x_i \beta)
\end{aligned}$$
and
$$\Pr[y_i = 0 \mid x_i] = 1 - \Phi(x_i \beta)$$

Using the information above, the density of $y_i$ given $x_i$ is
$$f(y \mid x_i) = [\Phi(x_i \beta)]^y \, [1 - \Phi(x_i \beta)]^{1-y}, \quad y = 0, 1$$

Given the support conditions, $f(y \mid x_i)$ is zero if $y \notin \{0, 1\}$.
Example 2: Let $\{y_i\}_{i=1}^N$ be independent with common distribution defined by
$$y_i \sim \begin{cases} N(\mu_1, \sigma_1^2) & \text{with probability } \pi \\ N(\mu_2, \sigma_2^2) & \text{with probability } 1 - \pi \end{cases}$$

In this case, we are doing unconditional MLE, and we are interested in estimating $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \pi)$.

In this case the density of $y_i$ is
$$f(y_i) = \frac{\pi}{\sqrt{2\pi\sigma_1^2}} \exp\left[-\frac{(y_i - \mu_1)^2}{2\sigma_1^2}\right] + \frac{1 - \pi}{\sqrt{2\pi\sigma_2^2}} \exp\left[-\frac{(y_i - \mu_2)^2}{2\sigma_2^2}\right]$$
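A minimal sketch of this two-component mixture density, with illustrative parameter values (the mixing weight and component parameters below are arbitrary choices, not from the text); the crude Riemann sum confirms the density integrates to approximately one:

```python
# Two-component normal mixture density from Example 2 (illustrative parameters).
from math import exp, pi, sqrt

def normal_pdf(y, mu, sig2):
    # Density of N(mu, sig2) at y.
    return exp(-(y - mu) ** 2 / (2.0 * sig2)) / sqrt(2.0 * pi * sig2)

def mixture_pdf(y, w, mu1, sig2_1, mu2, sig2_2):
    # f(y) = w * N(mu1, sig2_1) + (1 - w) * N(mu2, sig2_2)
    return w * normal_pdf(y, mu1, sig2_1) + (1.0 - w) * normal_pdf(y, mu2, sig2_2)

# Crude check that the mixture density integrates to (approximately) one.
grid = [-15.0 + 0.01 * k for k in range(3001)]
mass = sum(mixture_pdf(y, 0.3, -1.0, 1.0, 2.0, 0.5) for y in grid) * 0.01
print(round(mass, 3))  # ~ 1.0
```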
$p_0(y \mid x)$: true conditional density of $Y_i$ given $X_i = x$.

$\mathcal{X} \subset \mathbb{R}^K$: all possible values for $X_i$; $\mathcal{Y}$: all possible values for $Y_i$. $\mathcal{X}$ and $\mathcal{Y}$ are the supports of the random vectors $X_i$ and $Y_i$.

For all $x \in \mathcal{X}$, we assume that $p_0(\cdot \mid x)$ is a density with respect to a $\sigma$-finite measure, denoted by $\nu(dy)$.

We can choose $\nu(dy)$ in such a way that $Y_i$ can be discrete, continuous, or a mixture of the two.
In MLE, we minimize the distance between the conditional density of $Y$ implied by the model and the true density.

Conditional Kullback-Leibler Information Inequality: For any nonnegative function $f(\cdot \mid x)$ such that
$$\int_{\mathcal{Y}} f(y \mid x) \, \nu(dy) = 1 \quad \text{for all } x \in \mathcal{X},$$
the Kullback-Leibler information inequality is
$$\mathcal{K}(f; x) = \int_{\mathcal{Y}} \log\left[\frac{p_0(y \mid x)}{f(y \mid x)}\right] p_0(y \mid x) \, \nu(dy) \geq 0, \quad \text{for all } x \in \mathcal{X}$$

Note that this integral is equal to zero for $f = p_0$.

For each $x$, $\mathcal{K}(f; x)$ is minimized at $f = p_0$.
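The inequality is easy to illustrate numerically in the simplest discrete case, a Bernoulli outcome, where the integral is a two-term sum (the probabilities 0.3 and 0.7 below are arbitrary illustrative values):

```python
# Kullback-Leibler divergence K(f) for a Bernoulli y with true success
# probability p0 and candidate density with success probability p.
from math import log

def klic(p0, p):
    # K = sum over y in {0, 1} of log(p0(y) / f(y)) * p0(y)
    return p0 * log(p0 / p) + (1.0 - p0) * log((1.0 - p0) / (1.0 - p))

print(klic(0.3, 0.7))  # strictly positive for f != p0
print(klic(0.3, 0.3))  # exactly zero at f = p0
```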
Let's apply this inequality to a parametric model for $p_0(y \mid x)$.

A parametric model for $p_0(y \mid x)$ can be defined as
$$\left\{f(\cdot \mid x; \theta), \ \theta \in \Theta, \ \Theta \subset \mathbb{R}^P\right\}$$
for which $\int_{\mathcal{Y}} f(y \mid x; \theta) \, \nu(dy) = 1$ for each $x \in \mathcal{X}$ and each $\theta \in \Theta$.

This parametric model is a correctly specified model of the conditional density $p_0(\cdot \mid \cdot)$ if for some $\theta_0 \in \Theta$,
$$f(\cdot \mid x; \theta_0) = p_0(\cdot \mid x) \quad \text{for all } x \in \mathcal{X}$$
Notice that for each $x \in \mathcal{X}$, we can write $\mathcal{K}(f; x)$ as
$$E[\log p_0(Y_i \mid X_i) \mid X_i = x] - E[\log f(Y_i \mid X_i; \theta) \mid X_i = x]$$
and if the parametric model is correctly specified, we have
$$E[\log f(Y_i \mid X_i; \theta_0) \mid X_i = x] \geq E[\log f(Y_i \mid X_i; \theta) \mid X_i = x]$$

In terms of the conditional log-likelihood for observation $i$,
$$E[\ell_i(\theta_0) \mid X_i = x] \geq E[\ell_i(\theta) \mid X_i = x]$$
where
$$\ell_i(\theta) = \ell(y_i, x_i, \theta) = \log f(y_i \mid x_i; \theta)$$
Taking expectations of the expression above, we can see that $\theta_0$ solves
$$\max_{\theta \in \Theta} \underbrace{E[\log f(Y \mid X; \theta)]}_{Q_0(\theta)}$$

Using the sample analog, the CMLE estimator $\hat{\theta}$ maximizes
$$Q_N(\theta) = \frac{1}{N} \sum_{i=1}^N \log f(y_i \mid x_i; \theta)$$
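As a sketch of the sample analog, the unconditional case of Example 2 can be simulated: draw from a two-component normal mixture and profile $Q_N$ over $\mu_1$, holding the other parameters at their (illustrative, not from the text) data-generating values:

```python
# Profile of Q_N(theta) over mu_1 for the mixture of Example 2.
import random
from math import exp, log, pi, sqrt

def normal_pdf(y, mu, sig2):
    return exp(-(y - mu) ** 2 / (2.0 * sig2)) / sqrt(2.0 * pi * sig2)

def Q_N(sample, mu1, w=0.5, mu2=2.0, sig2=1.0):
    # Q_N = (1/N) * sum_i log f(y_i; theta), other parameters held fixed.
    return sum(log(w * normal_pdf(y, mu1, sig2) + (1 - w) * normal_pdf(y, mu2, sig2))
               for y in sample) / len(sample)

random.seed(42)
# True mixture: N(-2, 1) with probability 0.5, N(2, 1) otherwise.
sample = [random.gauss(-2.0 if random.random() < 0.5 else 2.0, 1.0)
          for _ in range(2000)]

grid = [-3.0 + 0.1 * k for k in range(21)]   # candidate values for mu_1
best = max(grid, key=lambda m: Q_N(sample, m))
print(best)  # the sample objective peaks near the true mu_1 = -2.0
```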

$\theta_0$ is identified if $\theta \neq \theta_0$ implies that $f(y \mid X; \theta) \neq f(y \mid X; \theta_0)$.

Information Inequality: If $\theta_0$ is identified and $E[\,|\log f(y \mid X; \theta)|\,] < \infty$ for all $\theta \in \Theta$, then $Q_0(\theta) = E[\log f(y \mid X; \theta)]$ has a unique maximum at $\theta_0$.
Consistency: Let $\{(x_i, y_i) : i = 1, 2, \ldots\}$ be a random sample with $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. Let $\Theta \subset \mathbb{R}^P$ be the parameter set and denote the parametric model of the conditional density by $\{f(\cdot \mid x, \theta) : x \in \mathcal{X}, \theta \in \Theta\}$. Assume that:

(i) $f(\cdot \mid x, \theta)$ is a density with respect to the measure $\nu(dy)$ for all $x \in \mathcal{X}$ and $\theta \in \Theta$;

(ii) for some $\theta_0 \in \Theta$, $p_0(\cdot \mid x) = f(\cdot \mid x, \theta_0)$ for all $x \in \mathcal{X}$, and if $\theta \neq \theta_0$, then $f(Y \mid X; \theta) \neq f(Y \mid X; \theta_0)$;

(iii) $\Theta$ is compact;

(iv) $\log f(Y \mid X; \theta)$ is continuous at each $\theta \in \Theta$ with probability one;

(v) $E[\sup_{\theta \in \Theta} |\log f(Y \mid X; \theta)|] < \infty$;

then $\hat{\theta} \stackrel{p}{\to} \theta_0$.
Example: Back to our first example.

In this case, the log-likelihood function for observation $i$ is
$$\ell_i(\beta) = y_i \log \Phi(x_i \beta) + (1 - y_i) \log[1 - \Phi(x_i \beta)]$$

$\hat{\beta}$ solves the following maximization problem:
$$\max_{\beta} \frac{1}{N} \sum_{i=1}^N \left\{y_i \log \Phi(x_i \beta) + (1 - y_i) \log[1 - \Phi(x_i \beta)]\right\}$$

Note that this function is continuous in $\beta$.

MLE only works when the density is correctly specified.

If the latent model is not linear, or if $\varepsilon$ is not independent of $x_i$ and normally distributed, the density of $y_i$ given $x_i$ is incorrect.
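This maximization can be sketched on simulated data; the following is a minimal Fisher-scoring (Newton-type) solver, where the sample size and the data-generating value of $\beta$ are illustrative assumptions:

```python
# Probit CMLE by Fisher scoring on simulated data (illustrative DGP).
import numpy as np
from math import erf, sqrt, pi

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

def phi(z):
    # Standard normal pdf (vectorized through numpy arithmetic).
    return np.exp(-z ** 2 / 2.0) / sqrt(2.0 * pi)

rng = np.random.default_rng(0)
N = 5000
beta_true = np.array([0.5, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = (X @ beta_true + rng.normal(size=N) > 0).astype(float)

beta = np.zeros(2)
for _ in range(25):                   # scoring iterations
    xb = X @ beta
    p = Phi(xb)
    score = X.T @ (phi(xb) * (y - p) / (p * (1.0 - p)))  # sum_i s_i(beta)
    W = phi(xb) ** 2 / (p * (1.0 - p))
    H = -(X * W[:, None]).T @ X       # expected Hessian, summed over i
    beta = beta - np.linalg.solve(H, score)

print(beta)  # should be close to the true (0.5, -1.0)
```

The weighting matrix used here anticipates the expected-Hessian formula derived for this example later in the notes.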
To get the asymptotic linear representation of the MLE, we need to assume that $\theta_0$ is in the interior of $\Theta$ and that $\ell_i(\theta)$ is twice continuously differentiable on the interior of $\Theta$.

The score of the log-likelihood for each observation is
$$s_i(\theta) = \nabla_\theta \ell_i(\theta)' = \left(\frac{\partial \ell_i(\theta)}{\partial \theta_1}, \frac{\partial \ell_i(\theta)}{\partial \theta_2}, \ldots, \frac{\partial \ell_i(\theta)}{\partial \theta_P}\right)'$$

Under some regularity conditions, we can show that
$$E[s_i(\theta_0)] = 0$$

Let's try to show this.
Using the definition of expectation,
$$E[s_i(\theta_0) \mid x_i] = \int_{\mathcal{Y}} s(y, x_i, \theta_0) \, f(y \mid x_i; \theta_0) \, \nu(dy)$$

If integration and differentiation can be interchanged on $\mathrm{int}(\Theta)$,
$$\nabla_\theta \left[\int_{\mathcal{Y}} f(y \mid x; \theta) \, \nu(dy)\right] = \int_{\mathcal{Y}} \nabla_\theta f(y \mid x; \theta) \, \nu(dy)$$
for all $x \in \mathcal{X}$, $\theta \in \mathrm{int}(\Theta)$.

Since $\int_{\mathcal{Y}} f(y \mid x; \theta) \, \nu(dy) = 1$ for all $\theta \in \Theta$,
$$\nabla_\theta \left[\int_{\mathcal{Y}} f(y \mid x; \theta) \, \nu(dy)\right] = 0, \quad \text{and} \quad \int_{\mathcal{Y}} \nabla_\theta f(y \mid x; \theta) \, \nu(dy) = 0$$
Notice that
$$\nabla_\theta f(y \mid x; \theta) = \nabla_\theta \log f(y \mid x; \theta) \, f(y \mid x; \theta)$$
and so
$$\int_{\mathcal{Y}} \nabla_\theta \log f(y \mid x; \theta) \, f(y \mid x; \theta) \, \nu(dy) = 0$$

Evaluating this expression at $\theta_0$ and transposing it, we have
$$\int_{\mathcal{Y}} s(y, x_i, \theta_0) \, f(y \mid x; \theta_0) \, \nu(dy) = 0$$
that is, $E[s_i(\theta_0) \mid x_i] = 0$, and by iterated expectations $E[s_i(\theta_0)] = 0$.
Example: Let's get the score for the first example:
$$\nabla_\beta \ell_i(\beta) = \nabla_\beta \left(y_i \log \Phi(x_i \beta) + (1 - y_i) \log[1 - \Phi(x_i \beta)]\right)$$

In this case,
$$\nabla_\beta \ell_i(\beta) = y_i \nabla_\beta \log \Phi(x_i \beta) + (1 - y_i) \nabla_\beta \log[1 - \Phi(x_i \beta)]$$

Notice that
$$\nabla_\beta \log \Phi(x_i \beta) = \frac{\phi(x_i \beta) \, x_i'}{\Phi(x_i \beta)}, \quad \nabla_\beta \log[1 - \Phi(x_i \beta)] = -\frac{\phi(x_i \beta) \, x_i'}{1 - \Phi(x_i \beta)}$$

At the end,
$$\nabla_\beta \ell_i(\beta) = \frac{\phi(x_i \beta) \, x_i' \, (y_i - \Phi(x_i \beta))}{\Phi(x_i \beta) \, (1 - \Phi(x_i \beta))}$$
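A quick way to sanity-check this analytic score is to compare it with a central-difference numerical derivative of $\ell_i$ at an arbitrary point (the values of $y$, $x$, and $\beta$ below are illustrative):

```python
# Analytic probit score vs. a numerical gradient of the log-likelihood.
from math import erf, sqrt, log, exp, pi

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
phi = lambda z: exp(-z ** 2 / 2.0) / sqrt(2.0 * pi)

def loglik(y, x, b):
    xb = sum(u * v for u, v in zip(x, b))
    return y * log(Phi(xb)) + (1 - y) * log(1.0 - Phi(xb))

def score(y, x, b):
    # phi(x*b) * x' * (y - Phi(x*b)) / [Phi(x*b) * (1 - Phi(x*b))]
    xb = sum(u * v for u, v in zip(x, b))
    c = phi(xb) * (y - Phi(xb)) / (Phi(xb) * (1.0 - Phi(xb)))
    return [c * xk for xk in x]

y, x, b, h = 1, [1.0, -0.7], [0.3, 0.6], 1e-6
num = [(loglik(y, x, [b[0] + h, b[1]]) - loglik(y, x, [b[0] - h, b[1]])) / (2 * h),
       (loglik(y, x, [b[0], b[1] + h]) - loglik(y, x, [b[0], b[1] - h])) / (2 * h)]
gap = max(abs(a - n) for a, n in zip(score(y, x, b), num))
print(gap)  # analytic and numerical gradients agree up to discretization error
```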
Let's show that $E[\nabla_\beta \ell_i(\beta_0)] = 0$.

Define $u_i = y_i - \Phi(x_i \beta_0) = y_i - E[y_i \mid x_i]$, so that
$$s_i(\beta_0) = \frac{\phi(x_i \beta_0) \, x_i' \, u_i}{\Phi(x_i \beta_0) \, (1 - \Phi(x_i \beta_0))}$$

Notice that $E[u_i \mid x_i] = 0$, so
$$E[s_i(\beta_0) \mid x_i] = \frac{\phi(x_i \beta_0) \, x_i' \, E[u_i \mid x_i]}{\Phi(x_i \beta_0) \, (1 - \Phi(x_i \beta_0))} = 0$$
which implies that
$$E[s_i(\beta_0)] = 0$$
The Hessian for each observation $i$ is a $P \times P$ matrix of second partial derivatives of $\ell_i(\theta)$:
$$H_i(\theta) = \nabla_\theta s_i(\theta) = \nabla^2_{\theta\theta'} \ell_i(\theta)$$

Let's try to get the asymptotic linear representation of the MLE.

First, we take a mean value expansion of the first-order condition around $\theta_0$:
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \nabla_\theta \ell_i(\hat{\theta}) = \frac{1}{\sqrt{N}} \sum_{i=1}^N \nabla_\theta \ell_i(\theta_0) + \left[\frac{1}{N} \sum_{i=1}^N \nabla^2_{\theta\theta'} \ell_i(\bar{\theta})\right] \sqrt{N} \left(\hat{\theta} - \theta_0\right)$$
where $\bar{\theta}$ lies between $\hat{\theta}$ and $\theta_0$, and the left-hand side is zero at the MLE.
Under some conditions,
$$\frac{1}{N} \sum_{i=1}^N \nabla^2_{\theta\theta'} \ell_i(\bar{\theta}) \stackrel{p}{\to} E[H_i(\theta_0)] = H_0$$

We can write
$$\sqrt{N} \left(\hat{\theta} - \theta_0\right) = -H_0^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^N s_i(\theta_0) + o_p(1)$$

We know that $E[s_i(\theta_0)] = 0$ and
$$\mathrm{Var}[s_i(\theta_0)] = E\left[s_i(\theta_0) \, s_i(\theta_0)'\right] = I(\theta_0) < \infty$$

$I_0 = I(\theta_0)$: the information matrix.

Under standard regularity conditions,
$$\sqrt{N} \left(\hat{\theta} - \theta_0\right) \stackrel{d}{\to} N\left(0, \, H_0^{-1} I_0 H_0^{-1}\right)$$
Note that
$$\nabla_\theta \left[\int_{\mathcal{Y}} s_i(\theta_0) \, f(y \mid x_i; \theta_0) \, \nu(dy)\right] = 0$$

Assuming that differentiation under the integral is allowed,
$$\begin{aligned}
\nabla_\theta \left[\int_{\mathcal{Y}} s_i(\theta) \, f(y \mid x_i; \theta) \, \nu(dy)\right]
&= \int_{\mathcal{Y}} \nabla_\theta \left(s_i(\theta) \, f(y \mid x_i; \theta)\right) \nu(dy) \\
&= \int_{\mathcal{Y}} \nabla_\theta s_i(\theta) \, f(y \mid x_i; \theta) \, \nu(dy) + \int_{\mathcal{Y}} s_i(\theta) \, \nabla_\theta f(y \mid x_i; \theta)' \, \nu(dy) \\
&= \int_{\mathcal{Y}} \nabla_\theta s_i(\theta) \, f(y \mid x_i; \theta) \, \nu(dy) + \int_{\mathcal{Y}} s_i(\theta) \left[\nabla_\theta \log f(y \mid x_i; \theta)\right]' f(y \mid x_i; \theta) \, \nu(dy) \\
&= \int_{\mathcal{Y}} \nabla_\theta s_i(\theta) \, f(y \mid x_i; \theta) \, \nu(dy) + \int_{\mathcal{Y}} s_i(\theta) \, s_i(\theta)' \, f(y \mid x_i; \theta) \, \nu(dy)
\end{aligned}$$
At the end,
$$E[\nabla_\theta s_i(\theta) \mid x_i] = -E\left[s_i(\theta) \, s_i(\theta)' \mid x_i\right]$$

Conditional Information Equality:
$$E[H_i(\theta_0) \mid x_i] = -E\left[s_i(\theta_0) \, s_i(\theta_0)' \mid x_i\right]$$

Using the law of iterated expectations, we have the (unconditional) Information Equality:
$$E[H_i(\theta_0)] = -E\left[s_i(\theta_0) \, s_i(\theta_0)'\right]$$

In other words,
$$H_0 = -I(\theta_0)$$
Asymptotic Normality of the MLE: Suppose we have a random sample $\{(x_i, y_i)\}_{i=1}^N$ and the hypotheses of the consistency theorem are satisfied. If

(a) $\theta_0 \in \mathrm{interior}(\Theta)$;

(b) $f(y \mid x; \theta)$ is twice continuously differentiable and $f(y \mid x; \theta) > 0$ in a neighborhood $\mathcal{N}$ of $\theta_0$;

(c) $\int \sup_{\theta \in \mathcal{N}} \|\nabla_\theta f(y \mid x; \theta)\| \, \nu(dy) < \infty$ and $\int \sup_{\theta \in \mathcal{N}} \|\nabla^2_{\theta\theta'} f(y \mid x; \theta)\| \, \nu(dy) < \infty$;

(d) $I = E\left[\nabla_\theta \log f(y \mid x; \theta_0) \left(\nabla_\theta \log f(y \mid x; \theta_0)\right)'\right]$ exists and is nonsingular;

(e) $E\left[\sup_{\theta \in \mathcal{N}} \|\nabla^2_{\theta\theta'} \log f(y \mid x; \theta)\|\right] < \infty$;

then
$$\sqrt{N} \left(\hat{\theta} - \theta_0\right) \stackrel{d}{\to} N\left(0, \, I(\theta_0)^{-1}\right)$$
To estimate the asymptotic variance, we need to estimate $I_0$. There are several ways to estimate this matrix.

Using the sample analogs of the moments:
$$\hat{I}_1 = \frac{1}{N} \sum_{i=1}^N s\left(y_i, x_i, \hat{\theta}\right) s\left(y_i, x_i, \hat{\theta}\right)'$$
$$\hat{I}_2 = -\frac{1}{N} \sum_{i=1}^N \nabla^2_{\theta\theta'} \log f\left(y_i \mid x_i; \hat{\theta}\right)$$

Another possible estimator is the sample average of the conditional information matrix. Let $I(x; \theta) = E\left[s_i(\theta) \, s_i(\theta)' \mid x\right]$. Using the law of iterated expectations and the sample analog,
$$\hat{I}_3 = \frac{1}{N} \sum_{i=1}^N I\left(x_i; \hat{\theta}\right)$$
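For the probit example, $\hat{I}_1$ and $\hat{I}_3$ have closed forms and can be compared directly. A sketch on simulated data, evaluated at the data-generating $\beta$ rather than an estimate to keep it short (the sample size and coefficients are illustrative assumptions):

```python
# Outer-product (I_hat_1) vs. conditional-information (I_hat_3) estimators
# for the probit model, evaluated at the true beta on simulated data.
import numpy as np
from math import erf, sqrt, pi

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

rng = np.random.default_rng(1)
N = 4000
beta = np.array([0.2, 0.8])
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = (X @ beta + rng.normal(size=N) > 0).astype(float)

xb = X @ beta
p = Phi(xb)
pdf = np.exp(-xb ** 2 / 2.0) / sqrt(2.0 * pi)   # standard normal pdf at xb

S = (pdf * (y - p) / (p * (1.0 - p)))[:, None] * X  # rows are the scores s_i'
I1 = S.T @ S / N                                    # outer product of the score
w = pdf ** 2 / (p * (1.0 - p))
I3 = (X * w[:, None]).T @ X / N                     # average of E[s s' | x_i]

print(np.abs(I1 - I3).max())  # small: the two estimates agree in large samples
```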
The regularity conditions for consistency of each of these estimators are weak, and in general they will be consistent when the likelihood function is twice differentiable.

There are some properties of these estimators that help us decide which one to use:

$\hat{I}_1$ is easier to compute than $\hat{I}_2$, which is easier to compute than $\hat{I}_3$.

$\hat{I}_1$ is always positive semidefinite, but it can behave poorly in finite samples.

$\hat{I}_2$ is not guaranteed to be positive definite.

$\hat{I}_3$ is positive definite if it exists and has better small-sample properties than $\hat{I}_1$.

None of these estimators is consistent if the conditional density of $y$ given $x$ is misspecified. In that case, we need to use the general extremum estimator formula.
Example:
$$\nabla_\beta \ell_i(\beta) = \frac{\phi(x_i \beta) \, x_i' \, (y_i - \Phi(x_i \beta))}{\Phi(x_i \beta) \, (1 - \Phi(x_i \beta))}$$

The MLE is the solution of the system of equations
$$\frac{1}{N} \sum_{i=1}^N \frac{\phi(x_i \hat{\beta}) \, x_i' \, \left(y_i - \Phi(x_i \hat{\beta})\right)}{\Phi(x_i \hat{\beta}) \left(1 - \Phi(x_i \hat{\beta})\right)} = 0$$

For each observation $i$, the second derivative is
$$H_i(\beta) = -\frac{\phi(x_i \beta)^2 \, x_i' x_i}{\Phi(x_i \beta) \left(1 - \Phi(x_i \beta)\right)} - \left(y_i - \Phi(x_i \beta)\right) \frac{\left[x_i \beta \, \Phi(x_i \beta) \left(1 - \Phi(x_i \beta)\right) + \phi(x_i \beta) \left(1 - 2\Phi(x_i \beta)\right)\right] \phi(x_i \beta) \, x_i' x_i}{\left[\Phi(x_i \beta)\right]^2 \left[1 - \Phi(x_i \beta)\right]^2}$$
This expression is very long; however, when we take the conditional expectation evaluated at $\beta_0$, the term multiplying $(y_i - \Phi(x_i \beta_0))$ drops out:
$$E[H_i(\beta_0) \mid x_i] = -\frac{\phi(x_i \beta_0)^2 \, x_i' x_i}{\Phi(x_i \beta_0) \left(1 - \Phi(x_i \beta_0)\right)}$$
and the asymptotic variance of the MLE in this example can be estimated by
$$\left[\frac{1}{N} \sum_{i=1}^N \frac{\phi(x_i \hat{\beta})^2 \, x_i' x_i}{\Phi(x_i \hat{\beta}) \left(1 - \Phi(x_i \hat{\beta})\right)}\right]^{-1}$$
which is always positive definite when the inverse exists.
We can use Wald, LM, or QLR tests in this case.

In the MLE set-up, if the information equality holds, these tests have the same limiting distribution.

We will come back to the properties of these tests and the efficiency of MLE when we talk about GMM.

However, since MLE is based on distributional assumptions, it is important to have a specification test that can be used in this context.

One way to think about these specification tests is to test moment conditions implied by the conditional density specification. Let $w_i = (x_i, y_i)$, and suppose that when $f(\cdot \mid x; \theta)$ is correctly specified,
$$H_0 : E[g(w_i, \theta_0)] = 0$$
where $g(w_i, \theta_0)$ is a $Q \times 1$ vector.
Note that $g(w_i, \theta_0)$ cannot contain elements of the score.

One test is based on how far the sample average of $g(w_i, \hat{\theta})$ is from zero.

The test statistic will be based on the equality
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N g\left(w_i, \hat{\theta}\right) = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g\left(w_i, \hat{\theta}\right) - \Pi_0' \, s_i\left(\hat{\theta}\right)\right]$$
where $\sum_{i=1}^N s_i(\hat{\theta}) = 0$ and
$$\Pi_0 = \left[E\left(s_i(\theta_0) \, s_i(\theta_0)'\right)\right]^{-1} \left[E\left(s_i(\theta_0) \, g_i(\theta_0)'\right)\right]$$

$\Pi_0$ is a $P \times Q$ matrix of population coefficients from a regression of $g_i(\theta_0)'$ on $s_i(\theta_0)'$.
Doing a mean value expansion around $\theta_0$,
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g\left(w_i, \hat{\theta}\right) - \Pi_0' \, s_i\left(\hat{\theta}\right)\right] = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g(w_i, \theta_0) - \Pi_0' \, s_i(\theta_0)\right] + E\left[\nabla_\theta g(w_i, \theta_0) - \Pi_0' \nabla_\theta s_i(\theta_0)\right] \sqrt{N} \left(\hat{\theta} - \theta_0\right) + o_p(1)$$

If the density is correctly specified, $E[\nabla_\theta g(w_i, \theta_0) - \Pi_0' \nabla_\theta s_i(\theta_0)] = 0$, since
$$E\left[\Pi_0' \nabla_\theta s_i(\theta_0)\right] = -\Pi_0' \, E\left[s_i(\theta_0) \, s_i(\theta_0)'\right] = -E\left[g_i(\theta_0) \, s_i(\theta_0)'\right]$$
and, using the same argument as in the conditional information equality,
$$E[\nabla_\theta g(w_i, \theta_0) \mid x_i] = -E\left[g_i(\theta_0) \, s_i(\theta_0)' \mid x_i\right]$$
Using the results above,
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g\left(w_i, \hat{\theta}\right) - \Pi_0' \, s_i\left(\hat{\theta}\right)\right] = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g(w_i, \theta_0) - \Pi_0' \, s_i(\theta_0)\right] + o_p(1)$$

We can get a consistent estimator for $\Pi_0$:
$$\hat{\Pi} = \left[\frac{1}{N} \sum_{i=1}^N s_i\left(\hat{\theta}\right) s_i\left(\hat{\theta}\right)'\right]^{-1} \left[\frac{1}{N} \sum_{i=1}^N s_i\left(\hat{\theta}\right) g_i\left(\hat{\theta}\right)'\right]$$
and the asymptotic variance of $\frac{1}{\sqrt{N}} \sum_{i=1}^N \left[g(w_i, \hat{\theta}) - \hat{\Pi}' s_i(\hat{\theta})\right]$ can be estimated by
$$\frac{1}{N} \sum_{i=1}^N \left[g\left(w_i, \hat{\theta}\right) - \hat{\Pi}' s_i\left(\hat{\theta}\right)\right] \left[g\left(w_i, \hat{\theta}\right) - \hat{\Pi}' s_i\left(\hat{\theta}\right)\right]'$$
The Newey-Tauchen-White (NTW) statistic is
$$NTW = \left[\sum_{i=1}^N g\left(w_i, \hat{\theta}\right)\right]' \left\{\sum_{i=1}^N \left[g\left(w_i, \hat{\theta}\right) - \hat{\Pi}' s_i\left(\hat{\theta}\right)\right] \left[g\left(w_i, \hat{\theta}\right) - \hat{\Pi}' s_i\left(\hat{\theta}\right)\right]'\right\}^{-1} \left[\sum_{i=1}^N g\left(w_i, \hat{\theta}\right)\right]$$

Under the null that the density is correctly specified,
$$NTW \stackrel{d}{\to} \chi^2_Q$$
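The mechanics of assembling this quadratic form can be sketched numerically. The arrays below are artificial stand-ins for the score rows $s_i'$ and moment rows $g_i'$ (not scores from an actual estimated likelihood); the score columns are demeaned to mimic the first-order condition $\sum_i s_i(\hat{\theta}) = 0$:

```python
# Assembling the NTW statistic (Q = 1) from stand-in score and moment rows.
import numpy as np

rng = np.random.default_rng(3)
N, P, Q = 500, 2, 1
S = rng.normal(size=(N, P))
S = S - S.mean(axis=0)                        # mimic sum_i s_i(theta_hat) = 0
G = 0.5 * S[:, :Q] + rng.normal(size=(N, Q))  # moments correlated with the score

Pi_hat = np.linalg.solve(S.T @ S, S.T @ G)    # regression of g_i' on s_i' (P x Q)
R = G - S @ Pi_hat                            # residual moments g_i - Pi' s_i
g_sum = G.sum(axis=0)
V = R.T @ R                                   # middle matrix of the quadratic form
NTW = float(g_sum @ np.linalg.solve(V, g_sum))
print(NTW)  # a nonnegative statistic, roughly a chi-squared_1 draw here
```

With scores and moments computed from an actual estimated model, $NTW$ would be compared against $\chi^2_Q$ critical values.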
Quasi-MLE

In general, we do not know the true (conditional) distribution function.

In QMLE, we use a normal density function to approximate the distribution when we do not know that distribution.

In this case, since the model is not correctly specified, $I(\theta_0) \neq -E[H(\theta_0)]$, and the asymptotic variance of the QMLE is
$$\frac{H_0^{-1} I_0 H_0^{-1}}{N}$$

In this case, there is no true value at which the model coincides with $p_0$; the QMLE converges to the value that solves the following maximization problem:
$$\max_{\theta \in \Theta} E_0[\ell(\theta)] = \max_{\theta \in \Theta} \int \log f(y \mid x; \theta) \, p_0(y \mid x) \, \nu(dy)$$
Using the Kullback-Leibler Information Criterion, we are minimizing the distance between the true density and the parametric family:
$$\mathcal{K}(f; x) = \int_{\mathcal{Y}} \log\left[\frac{p_0(y \mid x)}{f(y \mid x)}\right] p_0(y \mid x) \, \nu(dy)$$
and we try to find the pseudo-true parameter value that makes the parametric density as close as possible to the true density.

This estimator is consistent for the parameters of the conditional mean if the assumed density belongs to the linear exponential family and the conditional mean is correctly specified.
References

Amemiya, chapter 4.
Wooldridge, chapter 13.
Ruud, chapters 14 and 15.
Newey, W. and D. McFadden (1994). "Large Sample Estimation and Hypothesis Testing", Handbook of Econometrics, Volume IV, chapter 36.
