Discrete Choice Models

Outline: Binary Response Models, Multinomial Response Models, Truncated and Censored Models

Cristine Campos de Xavier Pinto
University of Michigan
Winter 2010
For some economic behavior, a continuous approximation of the dependent variable is not a good one.
Examples: modeling individual decisions such as whether to go to college, the number of children to have, or what brand of automobile to purchase.
In qualitative response models, $y$ takes a finite number of outcomes.
In the simplest case, $y$ is a binary variable: $y = 1$ (success); $y = 0$ (failure).
In binary response models, we are interested in the response probability:
$$p(x) = \Pr[y = 1 \mid x] = \Pr[y = 1 \mid x_1, \ldots, x_K]$$
For a continuous variable $x_j$, the partial effect of $x_j$ on the response probability is
$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j}$$
For a binary variable (say $x_K$), we compare the response probabilities at its two values:
$$\Pr[y = 1 \mid x_1, x_2, \ldots, x_{K-1}, 1] - \Pr[y = 1 \mid x_1, x_2, \ldots, x_{K-1}, 0]$$
Univariate binary response model:
$$\Pr[y_i = 1 \mid x_i] = F(x_i'\beta_0), \quad i = 1, \ldots, N$$
$y_i$ is a sequence of independent binary random variables taking values 0 or 1
$x_i$ is a $K \times 1$ vector of explanatory variables
$\beta_0$ is a $K \times 1$ vector of parameters
$F$ is a known function
The most commonly used functional forms of $F$ are:
Linear Probability Model
$$F(x) = x$$
Probit Model
$$F(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$
Logit Model
$$F(x) = \Lambda(x) = \frac{e^x}{1 + e^x}$$
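As a minimal sketch, the three functional forms can be written with Python's standard library alone. The function names `lpm`, `probit` and `logit` and the evaluation points are illustrative choices, not part of the notes; the probit CDF is computed from the error function.

```python
import math

def lpm(z):
    """Linear probability model: F(z) = z (not restricted to [0, 1])."""
    return z

def probit(z):
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit(z):
    """Logistic CDF, Lambda(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-3.0, 0.0, 3.0):  # illustrative index values
    print(z, lpm(z), round(probit(z), 4), round(logit(z), 4))
```

At an index of 3 the linear form already returns a value above one, while the probit and logit values stay inside the unit interval.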
Linear Probability Model:
1. $F$ is not constrained to lie between 0 and 1.
2. In this model,
$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j} = \beta_j$$
3. In this model, heteroskedasticity is present, since
$$\operatorname{Var}[y \mid x] = x'\beta\,(1 - x'\beta)$$
4. In this model, a ceteris paribus unit increase in $x_j$ always changes $\Pr[y = 1 \mid x]$ by the same amount, regardless of the value of $x_j$. If we keep increasing $x_j$, eventually $\Pr[y = 1 \mid x]$ will fall outside the interval $[0, 1]$.
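Probit and Logit Models:
1. If $x_j$ is continuous,
$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j} = f(x'\beta)\,\beta_j$$
where $f(z) = \dfrac{dF(z)}{dz}$.
2. $F(\cdot)$ is a strictly increasing function, so $f(z) > 0$ for all $z$. The sign of the effect is given by the sign of $\beta_j$.
3. We can calculate the relative effects
$$\frac{\partial \Pr[y = 1 \mid x] / \partial x_j}{\partial \Pr[y = 1 \mid x] / \partial x_h} = \frac{\beta_j}{\beta_h}$$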
4. The index model can be derived from a latent variable model:
$$y^* = x'\beta + e, \qquad y = 1[y^* > 0]$$
where $e$ is a continuously distributed variable, independent of $x$, with cdf $F(\cdot)$, and its distribution is symmetric about zero.
Since the distribution is symmetric about zero,
$$1 - F(-z) = F(z)$$
and
$$\Pr[y = 1 \mid x] = \Pr[y^* > 0 \mid x] = \Pr[e > -x'\beta \mid x] = 1 - F(-x'\beta) = F(x'\beta)$$
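The partial effect of a continuous regressor in the probit model, $f(x'\beta)\beta_j$ with $f = \phi$, can be checked numerically. The sketch below compares the analytic formula with a finite difference and prints the ratio of two partial effects; the coefficient vector, the evaluation point and the helper names are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def index(x, beta):
    return sum(xk * bk for xk, bk in zip(x, beta))

def probit_partial_effect(x, beta, j):
    """Analytic partial effect of x_j: f(x'beta) * beta_j with f = phi."""
    return norm_pdf(index(x, beta)) * beta[j]

# Hypothetical values: an intercept plus two continuous regressors.
beta = [0.2, 0.5, -1.0]
x = [1.0, 0.3, 0.8]

# Finite-difference check of the analytic formula for j = 1.
h = 1e-6
x_up = [x[0], x[1] + h, x[2]]
numeric = (norm_cdf(index(x_up, beta)) - norm_cdf(index(x, beta))) / h
analytic = probit_partial_effect(x, beta, 1)
ratio = probit_partial_effect(x, beta, 1) / probit_partial_effect(x, beta, 2)
print(analytic, numeric, ratio)  # the ratio equals beta[1] / beta[2]
```

The ratio of partial effects does not depend on the evaluation point, because the density term cancels.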
Let's model a person's decision of whether to drive a car or take a bus to work. We assume that the utility associated with each mode of transportation is a function of:
the mode characteristics $z$
the individual socioeconomic characteristics $w$
an unobservable term $\varepsilon$
We define:
$U_{1i}$: person $i$'s indirect utility associated with driving a car
$U_{0i}$: person $i$'s indirect utility associated with taking a bus
$$U_{0i} = \alpha_0 + z_{0i}'\beta + w_i'\gamma_0 + \varepsilon_{0i}$$
$$U_{1i} = \alpha_1 + z_{1i}'\beta + w_i'\gamma_1 + \varepsilon_{1i}$$
Basic assumption: the person drives a car if $U_{1i} > U_{0i}$, and takes the bus if $U_{1i} < U_{0i}$.
Assuming that $\varepsilon_{0i}$ and $\varepsilon_{1i}$ are continuous random variables, indifference ($U_{1i} = U_{0i}$) happens with zero probability.
Let $y_i = 1$ if the person drives a car, and $x = (z, w)$. Then
$$\Pr[y_i = 1 \mid x] = \Pr[U_{1i} > U_{0i} \mid x]$$
$$= \Pr\left[\varepsilon_{0i} - \varepsilon_{1i} < (\alpha_1 - \alpha_0) + (z_{1i} - z_{0i})'\beta + w_i'(\gamma_1 - \gamma_0)\right]$$
$$= F\left((\alpha_1 - \alpha_0) + (z_{1i} - z_{0i})'\beta + w_i'(\gamma_1 - \gamma_0)\right)$$
where $F$ is the distribution function of $\varepsilon_{0i} - \varepsilon_{1i}$.
How can we estimate the parameters in this model?
Assuming that we have a random sample of size $N$, the log-likelihood function is
$$\log L = \sum_{i=1}^{N} y_i \log F(x_i'\beta) + \sum_{i=1}^{N} (1 - y_i) \log\left[1 - F(x_i'\beta)\right]$$
The MLE is a solution (if it exists) of:
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\, f(x_i'\beta)\, x_i = 0$$
We need $F$ to be twice differentiable, and we assume that the parameter space is compact, that $x_i$ is uniformly bounded in $i$, and that $E[x_i x_i']$ is a finite nonsingular matrix.
To show consistency, we need to verify that the assumptions required for consistency of the MLE hold.
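The binary-response log-likelihood, $\sum_i y_i \log F(x_i'\beta) + (1 - y_i)\log[1 - F(x_i'\beta)]$, and its score can be coded directly. The sketch below uses the logistic $F$ as one possible choice; the data, the starting coefficients and the function names are made up for illustration.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)

def log_likelihood(beta, X, y, F=logistic_cdf):
    """Sum of y_i log F(x_i'b) + (1 - y_i) log(1 - F(x_i'b))."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = F(sum(a * b for a, b in zip(xi, beta)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def score(beta, X, y, F=logistic_cdf, f=logistic_pdf):
    """Gradient: sum of (y_i - F_i) / (F_i (1 - F_i)) * f_i * x_i."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, beta))
        weight = (yi - F(z)) / (F(z) * (1.0 - F(z))) * f(z)
        for k, xk in enumerate(xi):
            grad[k] += weight * xk
    return grad

# Made-up data: a constant and one regressor.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
y = [0, 1, 0, 1]
beta = [0.1, 0.4]
print(log_likelihood(beta, X, y), score(beta, X, y))
```

Passing a different pair `F`, `f` (for example the normal CDF and density) gives the probit version of the same two functions.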
The second derivative is
$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{N} \left[\frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\right]^2 f(x_i'\beta)^2\, x_i x_i' + \sum_{i=1}^{N} \left[\frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\right] f'(x_i'\beta)\, x_i x_i'$$
In this case,
$$-E[H_i(\beta) \mid x_i] = \frac{f(x_i'\beta)^2\, x_i x_i'}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)} \equiv A(x_i, \beta)$$
which is a positive semidefinite matrix. In the case of the logit and probit, and assuming that $E[x_i x_i']$ is a finite nonsingular matrix, $E[A(x_i, \beta)]$ is positive definite.
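Under the general conditions of MLE,
$$\sqrt{N}\left(\hat\beta - \beta_0\right) \xrightarrow{d} N\left(0, J^{-1}\right)$$
where $J = E[A(x_i, \beta_0)]$.
Since the log-likelihood is globally concave in the logit and probit cases, computing the MLE by iterative procedures is very simple. We can use the Newton-Raphson algorithm, and get
$$\hat\beta_2 = \hat\beta_1 - \left[\left.\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'}\right|_{\hat\beta_1}\right]^{-1} \left[\left.\frac{\partial \log L}{\partial \beta}\right|_{\hat\beta_1}\right]$$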
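At the end, replacing the Hessian by its expectation (the method of scoring), the iteration becomes
$$\hat\beta_2 = \left[\sum_{i=1}^{N} \frac{f(x_i'\hat\beta_1)^2\, x_i x_i'}{F(x_i'\hat\beta_1)\left(1 - F(x_i'\hat\beta_1)\right)}\right]^{-1} \left[\sum_{i=1}^{N} \frac{f(x_i'\hat\beta_1)\, x_i \left[y_i - F(x_i'\hat\beta_1) + f(x_i'\hat\beta_1)\, x_i'\hat\beta_1\right]}{F(x_i'\hat\beta_1)\left(1 - F(x_i'\hat\beta_1)\right)}\right]$$
Interpretation: this is a weighted least squares estimator, from the regression of $y_i - F(x_i'\hat\beta_1) + f(x_i'\hat\beta_1)\, x_i'\hat\beta_1$ on $f(x_i'\hat\beta_1)\, x_i$, with weights equal to
$$\frac{1}{F(x_i'\hat\beta_1)\left(1 - F(x_i'\hat\beta_1)\right)}.$$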
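The iteration can be sketched for a logit with an intercept and one regressor, where the $2 \times 2$ inverse is written out by hand. For the logit, $f = F(1 - F)$, so the expected and actual Hessians coincide and this is both a Newton-Raphson and a scoring step. The data and the function name `scoring_step` are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def scoring_step(beta, X, y):
    """One update for a logit with an intercept and one regressor.

    For the logit, f = F (1 - F), so the weight f^2 / (F (1 - F)) is F (1 - F)
    and the score term (y - F) f / (F (1 - F)) reduces to (y - F).
    Returns the updated beta and the score evaluated at the input beta.
    """
    a11 = a12 = a22 = g1 = g2 = 0.0
    for (x1, x2), yi in zip(X, y):
        p = logistic(beta[0] * x1 + beta[1] * x2)
        w = p * (1.0 - p)
        a11 += w * x1 * x1
        a12 += w * x1 * x2
        a22 += w * x2 * x2
        g1 += (yi - p) * x1
        g2 += (yi - p) * x2
    det = a11 * a22 - a12 * a12
    # beta_new = beta + A^{-1} * score, with the 2x2 inverse written out.
    step1 = (a22 * g1 - a12 * g2) / det
    step2 = (a11 * g2 - a12 * g1) / det
    return [beta[0] + step1, beta[1] + step2], (g1, g2)

# Made-up data (no perfect separation, so the MLE exists).
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 1]

beta = [0.0, 0.0]
for _ in range(25):
    beta, grad = scoring_step(beta, X, y)
print(beta, grad)
```

Because the log-likelihood is globally concave, starting from zero is enough here; the printed score is numerically zero once the iteration has converged.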
Neglected Heterogeneity
Now we deal with endogenous variables and neglected heterogeneity in qualitative response models.
Suppose that the structural model of interest is
$$\Pr[y = 1 \mid x, c] = \Phi(x'\beta + \gamma c)$$
where $x$ is a $K \times 1$ vector with $x_1 = 1$ and $c$ is a scalar.
Object of interest: the partial effects of $x_j$ on the response probability, holding $c$ constant.
The latent model has the form
$$y^* = x'\beta + \gamma c + e, \qquad y = 1[y^* > 0], \qquad e \mid x, c \sim N(0, 1)$$
Suppose that $c$ is independent of $x$, and
$$c \sim N(0, \tau^2)$$
Under these two assumptions, $\gamma c + e$ is independent of $x$, and
$$\gamma c + e \sim N(0, \gamma^2\tau^2 + 1)$$
In this case,
$$\Pr[y = 1 \mid x] = \Pr[\gamma c + e > -x'\beta \mid x] = \Phi\left(\frac{x'\beta}{\sigma}\right)$$
where $\sigma^2 = \gamma^2\tau^2 + 1$.
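Even when the omitted heterogeneity is independent of $x$, the probit coefficients are inconsistent:
$$\operatorname{plim} \hat\beta_j = \frac{\beta_j}{\sigma}$$
However, if we are interested in the direction of the partial effects, $\beta_j/\sigma$ gives the right sign, since $\sigma > 0$.
For continuous $x_j$,
$$\frac{\partial \Pr[y = 1 \mid x, c]}{\partial x_j} = \beta_j\, \phi(x'\beta + \gamma c)$$
for various values of $c$ and $x$.
Because $c$ is not observed, we cannot estimate $\gamma$.
If $c$ is normalized so that $E[c] = 0$, we may be interested in the partial effects evaluated at $c = 0$, namely $\beta_j\,\phi(x'\beta)$.
However, what is consistently estimated from the probit of $y$ on $x$ is
$$\frac{\beta_j}{\sigma}\, \phi\left(\frac{x'\beta}{\sigma}\right)$$
which is different from the object of interest.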
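Another parameter of interest: the Average Partial Effect (APE).
APE: for given $x$, we average the partial effect across the distribution of $c$ in the population. Let $x^0$ be a specific value of the vector of explanatory variables. Then
$$E_c\left[\Phi\left(x^{0\prime}\beta + \gamma c\right)\right] = \Phi\left(\frac{x^{0\prime}\beta}{\sigma}\right)$$
and the APEs are the derivatives of this expression with respect to the elements of $x^0$.
The probit of $y$ on $x$ therefore consistently estimates the average partial effects.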
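The averaging result $E_c[\Phi(x^{0\prime}\beta + \gamma c)] = \Phi(x^{0\prime}\beta/\sigma)$, with $c \sim N(0, \tau^2)$ and $\sigma^2 = \gamma^2\tau^2 + 1$, can be verified numerically. The sketch below integrates over $c$ with a simple midpoint rule; the values of the index, $\gamma$ and $\tau$ are hypothetical, and the grid settings are arbitrary choices.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def averaged_response(index, gamma, tau, n=20000, width=10.0):
    """Midpoint-rule approximation of E_c[Phi(index + gamma*c)], c ~ N(0, tau^2)."""
    step = 2.0 * width / n
    total = 0.0
    for k in range(n):
        s = -width + (k + 0.5) * step      # s = c / tau is standard normal
        total += norm_cdf(index + gamma * tau * s) * norm_pdf(s) * step
    return total

# Hypothetical values for x0'beta, gamma and tau.
index, gamma, tau = 0.7, 1.5, 0.8
sigma = math.sqrt(gamma**2 * tau**2 + 1.0)
lhs = averaged_response(index, gamma, tau)
rhs = norm_cdf(index / sigma)
print(lhs, rhs)
```

The two printed numbers agree up to the integration error, which illustrates why the scaled coefficients $\beta/\sigma$ are the relevant ones for average partial effects.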
Endogenous Explanatory Variable
Let's assume that a continuous explanatory variable is correlated with the unobserved error.
Consider the following model:
$$y_1^* = z_1\delta_1 + \alpha_1 y_2 + u_1$$
$$y_2 = z_1\delta_{21} + z_2\delta_{22} + v_2 = z\delta_2 + v_2$$
$$y_1 = 1[y_1^* > 0]$$
where $(u_1, v_2)$ has a zero-mean, bivariate normal distribution and is independent of $z = (z_1, z_2)$.
$y_2$ is endogenous if $u_1$ and $v_2$ are correlated.
In this example, $y_2$ is a continuous random variable (Why?)
We need a normalization to interpret the parameters in this equation as average partial effects:
$$\operatorname{Var}[u_1] = 1$$
Let's try to understand why the normalization is necessary.
Consider the outcome $y_1$ at two different values of $y_2$ ($y_2$ and $y_2 + 1$). Holding all other factors constant, the difference in the response functions is:
$$1[z_1\delta_1 + \alpha_1(y_2 + 1) + u_1 \geq 0] - 1[z_1\delta_1 + \alpha_1 y_2 + u_1 \geq 0]$$
Because $u_1$ is not observed, we cannot estimate the difference in response for a given population unit. However, $u_1 \sim N(0, 1)$ and we can average across the distribution of $u_1$:
$$\Phi\left(z_1\delta_1 + \alpha_1(y_2 + 1)\right) - \Phi\left(z_1\delta_1 + \alpha_1 y_2\right)$$
In this case, the parameters in the APE are $\delta_1$ and $\alpha_1$. However, if we do not normalize, so that $\operatorname{Var}[u_1] = \sigma_1^2$, the APE will depend on $\delta_1/\sigma_1$ and $\alpha_1/\sigma_1$.
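Under joint normality of $(u_1, v_2)$ with $\operatorname{Var}[u_1] = 1$, we can write
$$u_1 = \theta_1 v_2 + e_1$$
where
$$\theta_1 = \frac{\operatorname{Cov}(v_2, u_1)}{\operatorname{Var}[v_2]} = \frac{\eta_1}{\tau_2^2}$$
with $\eta_1 = \operatorname{Cov}(v_2, u_1)$ and $\tau_2^2 = \operatorname{Var}[v_2]$.
$e_1$ is independent of $z$ and $v_2$, and is normally distributed with mean 0 and variance $1 - \rho_1^2$, where $\rho_1 = \operatorname{Corr}(v_2, u_1)$.
We can write the model as
$$y_1^* = z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 + e_1$$
$$e_1 \mid z, y_2, v_2 \sim N\left(0, 1 - \rho_1^2\right)$$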
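In this case,
$$\Pr[y_1 = 1 \mid z, y_2, v_2] = \Phi\left(\frac{z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2}{\sqrt{1 - \rho_1^2}}\right)$$
The probit of $y_1$ on $z_1$, $y_2$ and $v_2$ consistently estimates
$$\frac{\delta_1}{\sqrt{1 - \rho_1^2}}, \quad \frac{\alpha_1}{\sqrt{1 - \rho_1^2}} \quad \text{and} \quad \frac{\theta_1}{\sqrt{1 - \rho_1^2}}.$$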
However, we do not observe $v_2$, so we need to estimate it in a first step.
We can think about a two-step procedure:
STEP 1: Run the OLS regression of $y_2$ on $z$, and save the residuals $\hat v_2$.
STEP 2: Run the probit of $y_1$ on $z_1$, $y_2$ and $\hat v_2$, and get consistent estimators of
$$\frac{\delta_1}{\sqrt{1 - \rho_1^2}}, \quad \frac{\alpha_1}{\sqrt{1 - \rho_1^2}} \quad \text{and} \quad \frac{\theta_1}{\sqrt{1 - \rho_1^2}}.$$
To derive the asymptotic variance of this two-step estimator, we need to use the derivation of the variance of a two-step procedure for an extremum estimator.
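Using this procedure, we can consistently estimate the APE. The APE is obtained by taking derivatives of
$$E_{v_2}\left[\Phi\left(\frac{z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2}{\sqrt{1 - \rho_1^2}}\right)\right] = \Phi\left(z_1\delta_{\theta 1} + \alpha_{\theta 1} y_2\right)$$
where
$$\delta_{\theta 1} = \frac{\delta_1 / \sqrt{1 - \rho_1^2}}{\sqrt{\dfrac{\theta_1^2}{1 - \rho_1^2}\,\tau_2^2 + 1}}, \qquad \alpha_{\theta 1} = \frac{\alpha_1 / \sqrt{1 - \rho_1^2}}{\sqrt{\dfrac{\theta_1^2}{1 - \rho_1^2}\,\tau_2^2 + 1}}, \qquad \tau_2^2 = \operatorname{Var}[v_2]$$
After the two-step procedure, we just divide each coefficient by
$$\sqrt{\frac{\theta_1^2}{1 - \rho_1^2}\,\tau_2^2 + 1}.$$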
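Another way to estimate this latent model is to use conditional MLE. Note that
$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z)$$
Using the assumptions above, $y_2 \mid z \sim N(z\delta_2, \tau_2^2)$.
Since $v_2 = y_2 - z\delta_2$,
$$\Pr[y_1 = 1 \mid y_2, z] = \Phi(w), \qquad w \equiv \frac{z_1\delta_1 + \alpha_1 y_2 + \left(\dfrac{\rho_1}{\tau_2}\right)(y_2 - z\delta_2)}{\sqrt{1 - \rho_1^2}}$$
Using the derivation above,
$$f(y_1, y_2 \mid z) = \Phi(w)^{y_1} \left[1 - \Phi(w)\right]^{1 - y_1} \frac{1}{\tau_2}\, \phi\left(\frac{y_2 - z\delta_2}{\tau_2}\right)$$
and the log-likelihood function is, up to an additive constant,
$$\sum_{i=1}^{N} \left\{ y_{1i} \log \Phi(w_i) + (1 - y_{1i}) \log\left[1 - \Phi(w_i)\right] - \frac{1}{2}\log\left(\tau_2^2\right) - \frac{1}{2}\,\frac{(y_{2i} - z_i\delta_2)^2}{\tau_2^2} \right\}$$
MLE is more efficient than the two-step procedure. (Why?)
We get estimates of $\delta_1$ and $\alpha_1$ directly.
However, the iterative algorithm does not work well when $\rho_1$ tends to 1 or $-1$.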
Let's assume that the dependent variable $y_i$ takes $m_i + 1$ values $0, 1, 2, \ldots, m_i$. The multinomial response model is defined as
$$\Pr[y_i = j \mid x] = F_{ij}(x, \theta)$$
Note that $\Pr[y_i = 0 \mid x] = F_{i0}(x, \theta)$ does not need to be specified, since it is equal to one minus the sum of the $m_i$ other probabilities.
To define the MLE of $\theta$, we need to define $\sum_{i=1}^{N} (m_i + 1)$ binary random variables
$$y_{ij} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{if } y_i \neq j \end{cases}$$
for $i = 1, 2, \ldots, N$ and $j = 0, 1, \ldots, m_i$.
The log-likelihood is
$$\log L = \sum_{i=1}^{N} \sum_{j=0}^{m_i} y_{ij} \log F_{ij}$$
Multinomial Logit Model
In this case, the order of the responses does not matter.
Let's assume that $y_i$ is a random variable that can take the values $0, 1, \ldots, J$, for $J$ a positive integer.
We have a random sample of $(x_i, y_i)$ from a certain population.
In the multinomial logit model (MNL), the response probabilities are
$$p_j(x, \beta) = \Pr[y = j \mid x] = \frac{\exp(x'\beta_j)}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)}, \quad j = 1, \ldots, J$$
Since the probabilities sum to one,
$$\Pr[y = 0 \mid x] = \frac{1}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)}$$
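The partial effects for a continuous $x_k$ are
$$\frac{\partial \Pr[y = j \mid x]}{\partial x_k} = \Pr[y = j \mid x] \left\{ \beta_{jk} - \frac{\sum_{h=1}^{J} \beta_{hk} \exp(x'\beta_h)}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)} \right\}$$
where $\beta_{hk}$ is the $k$th element of $\beta_h$.
Note that $\beta_{jk}$ does not determine the sign of the effect.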
Multinomial logit model: the explanatory variables do not change across alternatives. For each $i$, $x_i$ is specific to the individual, but not to the alternatives. These variables can have different effects on the relative probabilities between any two choices.
Interpretation: think about the relative effects (log-odds)
$$\log\left[\frac{p_j(x, \beta)}{p_h(x, \beta)}\right] = x'(\beta_j - \beta_h)$$
Since we have fully specified the density function,
$$\log L = \sum_{i=1}^{N} \sum_{j=0}^{J} 1[y_i = j] \log\left[p_j(x_i, \beta)\right]$$
We obtain the MLE by maximizing this function; in this case, the log-likelihood is globally concave.
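The MNL response probabilities and the partial effect $p_j\,(\beta_{jk} - \sum_h \beta_{hk}\,p_h)$ can be computed in a few lines. This is only a sketch: the coefficient values, the evaluation point and the function names are hypothetical, and outcome 0 is treated as the base category with $\beta_0 = 0$.

```python
import math

def mnl_probs(x, betas):
    """Response probabilities for outcomes 0..J; betas[h-1] is beta_h, beta_0 = 0."""
    expo = [math.exp(sum(a * b for a, b in zip(x, bh))) for bh in betas]
    denom = 1.0 + sum(expo)
    return [1.0 / denom] + [e / denom for e in expo]

def mnl_partial_effect(x, betas, j, k):
    """d Pr[y=j|x] / d x_k = p_j * (beta_jk - sum_h beta_hk * p_h)."""
    p = mnl_probs(x, betas)
    avg = sum(betas[h - 1][k] * p[h] for h in range(1, len(p)))
    beta_jk = 0.0 if j == 0 else betas[j - 1][k]
    return p[j] * (beta_jk - avg)

# Hypothetical coefficients for J = 2 alternatives besides the base outcome.
betas = [[0.5, 1.0], [-0.2, 2.0]]
x = [1.0, 0.4]

p = mnl_probs(x, betas)
effects = [mnl_partial_effect(x, betas, j, 1) for j in range(3)]
print(p, effects)
```

With these made-up numbers the effect on outcome 1 is negative even though its own coefficient on that regressor is positive, which illustrates why $\beta_{jk}$ alone does not determine the sign. The effects also sum to zero across outcomes, as they must when the probabilities sum to one.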
McFadden (1974) showed that a model close to the multinomial logit model can be derived from utility maximization.
Suppose that for an individual $i$ from the population, the utility from choosing alternative $j$ is:
$$y_{ij}^* = x_{ij}\beta + a_{ij}, \quad j = 0, \ldots, J$$
where $a_{ij}$, $j = 0, \ldots, J$, are unobservables affecting tastes.
$x_{ij}$ cannot contain elements that vary only across $i$, and not across $j$.
We assume that the vector $a_i$ is independent of $x_i = [x_{ij},\ j = 0, \ldots, J]$.
Let $y_i$ denote the choice of individual $i$ that maximizes utility:
$$y_i = \arg\max\,(y_{i0}^*, \ldots, y_{iJ}^*)$$
Notice that $y_i$ takes values in $\{0, 1, \ldots, J\}$.
If $a_{ij}$, $j = 0, \ldots, J$, are independently distributed with the type I extreme value distribution,
$$F(a) = \exp[-\exp(-a)]$$
then
$$p_j(x) = \Pr[y_i = j \mid x] = \frac{\exp(x_{ij}\beta)}{\sum_{h=0}^{J} \exp(x_{ih}\beta)}$$
This is called the conditional logit model.
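To derive this formula, suppose that $J = 2$. Then
$$\Pr[y_i = 2 \mid x] = \Pr[y_{i2}^* > y_{i1}^*,\ y_{i2}^* > y_{i0}^* \mid x]$$
$$= \Pr[x_{i2}\beta - x_{i1}\beta + a_{i2} > a_{i1},\ x_{i2}\beta - x_{i0}\beta + a_{i2} > a_{i0}]$$
$$= \int_{-\infty}^{\infty} f(a_{i2}) \left[\int_{-\infty}^{x_{i2}\beta - x_{i1}\beta + a_{i2}} f(a_{i1})\, da_{i1}\right] \left[\int_{-\infty}^{x_{i2}\beta - x_{i0}\beta + a_{i2}} f(a_{i0})\, da_{i0}\right] da_{i2}$$
$$= \int_{-\infty}^{\infty} \exp[-a_{i2}] \exp\left[-\exp[-a_{i2}]\right] \exp\left[-\exp[-x_{i2}\beta + x_{i1}\beta - a_{i2}]\right] \exp\left[-\exp[-x_{i2}\beta + x_{i0}\beta - a_{i2}]\right] da_{i2}$$
$$= \frac{\exp[x_{i2}\beta]}{\exp[x_{i2}\beta] + \exp[x_{i1}\beta] + \exp[x_{i0}\beta]}$$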
The marginal effects are given by
$$\frac{\partial p_j(x)}{\partial x_{jk}} = p_j(x)\left[1 - p_j(x)\right]\beta_k, \quad j = 0, \ldots, J,\ k = 1, \ldots, K$$
$$\frac{\partial p_j(x)}{\partial x_{hk}} = -p_j(x)\, p_h(x)\, \beta_k, \quad j \neq h,\ k = 1, \ldots, K$$
Conditional logit model: the explanatory variables can change from choice to choice, but the effect of each variable is the same for all the alternatives. The parameter $\beta$ is common to all the choices.
One important restriction is that
$$\frac{p_j(x_j)}{p_h(x_h)} = \frac{\exp(x_j'\beta)}{\exp(x_h'\beta)} = \exp\left[(x_j - x_h)'\beta\right]$$
The relative probabilities depend only on the attributes of those two alternatives (Independence from Irrelevant Alternatives, IIA).
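The IIA property can be seen directly by computing conditional logit probabilities with and without a third alternative. The sketch below assumes two made-up attributes per alternative (for instance cost and time) and a hypothetical coefficient vector; the function name is illustrative.

```python
import math

def conditional_logit_probs(X, beta):
    """Choice probabilities exp(x_j'b) / sum_h exp(x_h'b); X[j] holds alternative j's attributes."""
    v = [sum(a * b for a, b in zip(xj, beta)) for xj in X]
    m = max(v)                                   # subtract the max for numerical stability
    expo = [math.exp(vj - m) for vj in v]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical attributes for three alternatives and a common coefficient vector.
beta = [-0.8, -0.3]
X3 = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
p3 = conditional_logit_probs(X3, beta)

# Dropping the third alternative leaves the odds between the first two unchanged (IIA).
p2 = conditional_logit_probs(X3[:2], beta)
print(p3[0] / p3[1], p2[0] / p2[1])
```

Both printed ratios equal $\exp[(x_0 - x_1)'\beta]$, regardless of what the third alternative looks like.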
Many models relax this assumption:
1. Multinomial Probit Model: $a_i$ has a multivariate normal distribution with arbitrary correlations between $a_{ij}$ and $a_{ih}$, for $j \neq h$.
Disadvantage: the response probability involves a $(J + 1)$-dimensional integral, and computation is a problem.
2. Hierarchical model (Nested logit model): aggregate the alternatives into $S$ groups of similar alternatives. The first level gives the probability of $y$ being in a group. In the second level, we pick the actual alternative within each group.
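Ordered Response Models
The values that $y$ takes correspond to a partition of the real line.
Suppose we have a latent variable $y^*$. In this case,
$$y = j \ \text{ if and only if } \ \alpha_j < y^* \leq \alpha_{j+1}, \quad j = 0, 1, \ldots, J$$
with $\alpha_0 = -\infty$ and $\alpha_{J+1} = \infty$, and
$$\Pr[y = j \mid x] = F\left(\alpha_{j+1} - x'\beta\right) - F\left(\alpha_j - x'\beta\right)$$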
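In the ordered probit,
$$y^* = x'\beta + e, \qquad e \mid x \sim N(0, 1)$$
In this case,
$$\Pr[y = 0 \mid x] = \Pr[y^* \leq \alpha_1 \mid x] = \Phi(\alpha_1 - x'\beta)$$
$$\Pr[y = 1 \mid x] = \Pr[\alpha_1 < y^* \leq \alpha_2 \mid x] = \Phi(\alpha_2 - x'\beta) - \Phi(\alpha_1 - x'\beta)$$
until we get
$$\Pr[y = J \mid x] = \Pr[y^* > \alpha_J \mid x] = 1 - \Phi(\alpha_J - x'\beta)$$
The parameters $\alpha$ and $\beta$ can be estimated by MLE. The log-likelihood function is
$$\log L = \sum_{i=1}^{N} \Big\{ 1[y_i = 0] \log \Phi(\alpha_1 - x_i'\beta) + 1[y_i = 1] \log\left[\Phi(\alpha_2 - x_i'\beta) - \Phi(\alpha_1 - x_i'\beta)\right] + \cdots + 1[y_i = J] \log\left[1 - \Phi(\alpha_J - x_i'\beta)\right] \Big\}$$
If we use a logistic distribution for $e$ instead, we have the ordered logit model.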
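For the ordered probit model, the marginal effects are
$$\frac{\partial p_0(x)}{\partial x_k} = -\beta_k\, \phi(\alpha_1 - x'\beta)$$
$$\frac{\partial p_J(x)}{\partial x_k} = \beta_k\, \phi(\alpha_J - x'\beta)$$
$$\frac{\partial p_j(x)}{\partial x_k} = \beta_k \left[\phi(\alpha_j - x'\beta) - \phi(\alpha_{j+1} - x'\beta)\right], \quad 0 < j < J$$
The sign of $\beta_k$ does not always determine the direction of the effect; it does so only for the extreme outcomes ($j = 0$ and $j = J$).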
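Ordered probit probabilities and marginal effects follow the same pattern for every outcome once the conventions $\Phi(-\infty) = 0$, $\Phi(\infty) = 1$ and $\phi(\pm\infty) = 0$ are used at the two ends. The sketch below assumes hypothetical cut points, index value and coefficient; the function names are illustrative.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ordered_probit_probs(xb, cuts):
    """Pr[y=j|x] for j = 0..J given the index x'beta and cut points alpha_1 < ... < alpha_J."""
    cdf = [0.0] + [norm_cdf(a - xb) for a in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]

def ordered_probit_effects(xb, cuts, beta_k):
    """d Pr[y=j|x] / d x_k = beta_k * [phi(alpha_j - x'b) - phi(alpha_{j+1} - x'b)]."""
    pdf = [0.0] + [norm_pdf(a - xb) for a in cuts] + [0.0]   # phi vanishes at +/- infinity
    return [beta_k * (pdf[j] - pdf[j + 1]) for j in range(len(cuts) + 1)]

# Hypothetical index, cut points (J = 3) and coefficient.
xb, cuts, beta_k = 0.3, [-0.5, 0.4, 1.2], 0.7
probs = ordered_probit_probs(xb, cuts)
effects = ordered_probit_effects(xb, cuts, beta_k)
print(probs, effects)
```

With a positive coefficient the effect on the lowest outcome is negative and the effect on the highest outcome is positive; the signs for the intermediate outcomes depend on where the index sits relative to the cut points.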
Limited Dependent Variable Models: the dependent variable is constrained in some way.
Truncated models: observations outside a specific range are totally lost.
Censored models: we can observe at least the exogenous variables.
Examples: data censoring, corner solution outcomes (firm expenditures, insurance plans, etc.), and survival and duration models.
Example: a household is assumed to maximize utility subject to a budget constraint
$$y + z \leq R$$
and the boundary constraint $y \geq y_0$ or $y = 0$.
Suppose that $y^*$ is the solution of the maximization subject to the budget constraint only, and we assume that
$$y^* = \beta_1 + \beta_2 x + u$$
The solution to this problem is
$$y = \begin{cases} y^* & \text{if } y^* > y_0 \\ 0 \text{ or } y_0 & \text{if } y^* \leq y_0 \end{cases}$$
To solve this example, we assume that $u$ is a random variable and $y_0$ is known.
Given a random sample of size $N$, we obtain the likelihood
$$L = \prod_{0} F_i(y_{0i}) \prod_{1} f_i(y_i)$$
where $F_i$ and $f_i$ are the distribution and density functions of $y_i^*$, and
$\prod_0$: product over those $i$ for which $y_i^* \leq y_{0i}$
$\prod_1$: product over those $i$ for which $y_i^* > y_{0i}$
Standard Tobit Model (or Type I model):
$$y_i^* = x_i'\beta + u_i, \qquad u_i \mid x_i \sim N(0, \sigma^2)$$
$$y_i = \max(0, y_i^*)$$
where $x$ includes a column of ones.
Objects of interest:
Censored models: $E[y^* \mid x] = x'\beta$
Corner solutions: $E[y \mid x]$ or $E[y \mid x, y > 0]$
What do we know about a bound for $E[y \mid x]$?
Using Jensen's inequality,
$$E[y \mid x] \geq \max(0, E[y^* \mid x])$$
since $g(z) = \max(0, z)$ is a convex function.
In addition, we can write
$$E[y \mid x] = \Pr[y = 0 \mid x] \cdot 0 + \Pr[y > 0 \mid x] \cdot E[y \mid x, y > 0] = \Pr[y > 0 \mid x] \cdot E[y \mid x, y > 0]$$
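Let's define $w = 1$ if $y^* > 0$, and $w = 0$ if $y^* \leq 0$. Then
$$\Pr[y > 0 \mid x] = \Pr[w = 1 \mid x] = \Pr[y^* > 0 \mid x] = \Pr[u > -x'\beta \mid x] = \Pr\left[\frac{u}{\sigma} > -\frac{x'\beta}{\sigma} \,\Big|\, x\right] = \Phi\left(\frac{x'\beta}{\sigma}\right)$$
A probit of $w$ on $x$ consistently estimates $\alpha = \beta/\sigma$.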
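Recall that if $z \sim N(0, 1)$, then for a constant $c$
$$E[z \mid z > c] = \frac{\phi(c)}{1 - \Phi(c)}$$
Note that
$$E[y \mid x, y > 0] = x'\beta + E[u \mid u > -x'\beta] = x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{1 - \Phi(-x'\beta/\sigma)}\right] = x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}\right]$$
Inverse Mills Ratio: $\lambda(c) = \dfrac{\phi(c)}{\Phi(c)}$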
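If $x_j$ is a continuous explanatory variable, then, since $d\lambda(c)/dc = -\lambda(c)\left[c + \lambda(c)\right]$,
$$\frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \beta_j + \beta_j \left[\frac{d\lambda}{dc}\left(\frac{x'\beta}{\sigma}\right)\right] = \beta_j \left\{1 - \lambda\left(\frac{x'\beta}{\sigma}\right)\left[\frac{x'\beta}{\sigma} + \lambda\left(\frac{x'\beta}{\sigma}\right)\right]\right\}$$
Using the properties of the normal distribution, we can show that
$$1 - \lambda\left(\frac{x'\beta}{\sigma}\right)\left[\frac{x'\beta}{\sigma} + \lambda\left(\frac{x'\beta}{\sigma}\right)\right] > 0,$$
so the sign of $\beta_j$ gives the direction of the impact.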
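The inverse Mills ratio, the conditional mean $E[y \mid x, y > 0] = x'\beta + \sigma\lambda(x'\beta/\sigma)$ and the adjustment factor $1 - \lambda(c)[c + \lambda(c)]$ are easy to tabulate. The sketch below uses an assumed $\sigma$ and a few illustrative index values; the function names are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def cond_mean_positive(xb, sigma):
    """E[y | x, y > 0] = x'beta + sigma * lambda(x'beta / sigma)."""
    return xb + sigma * inverse_mills(xb / sigma)

def adjustment_factor(c):
    """1 - lambda(c) * (c + lambda(c)), the factor multiplying beta_j."""
    lam = inverse_mills(c)
    return 1.0 - lam * (c + lam)

sigma = 2.0  # hypothetical error standard deviation
for xb in (-2.0, 0.0, 1.0, 4.0):  # illustrative index values
    print(xb, cond_mean_positive(xb, sigma), adjustment_factor(xb / sigma))
```

For each index value the factor lies strictly between zero and one, so the partial effect on $E[y \mid x, y > 0]$ has the sign of $\beta_j$ but is smaller in magnitude.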
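Using the above results,
$$E[y \mid x] = \Phi\left(\frac{x'\beta}{\sigma}\right)\left[x'\beta + \sigma\,\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}\right]$$
The marginal effect of $x_j$ is
$$\frac{\partial E[y \mid x]}{\partial x_j} = \frac{\partial \Pr[y > 0 \mid x]}{\partial x_j}\, E[y \mid x, y > 0] + \Pr[y > 0 \mid x]\, \frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \Phi\left(\frac{x'\beta}{\sigma}\right)\beta_j$$
What is the interpretation of the adjustment factor?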
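The result $\partial E[y \mid x]/\partial x_j = \Phi(x'\beta/\sigma)\,\beta_j$ can be checked against a finite difference of the unconditional mean. The parameter values and the evaluation point below are hypothetical, and the function names are illustrative.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_mean(xb, sigma):
    """E[y|x] = Phi(c) * (x'beta + sigma * phi(c) / Phi(c)), with c = x'beta / sigma."""
    c = xb / sigma
    return norm_cdf(c) * xb + sigma * norm_pdf(c)

def tobit_marginal_effect(xb, sigma, beta_j):
    """dE[y|x]/dx_j = Phi(x'beta / sigma) * beta_j."""
    return norm_cdf(xb / sigma) * beta_j

# Hypothetical parameter values and evaluation point.
beta, sigma = [0.5, 1.2], 1.5
x = [1.0, 0.8]
xb = sum(a * b for a, b in zip(x, beta))

h = 1e-6
xb_up = xb + beta[1] * h            # index after raising x[1] by h
numeric = (tobit_mean(xb_up, sigma) - tobit_mean(xb, sigma)) / h
analytic = tobit_marginal_effect(xb, sigma, beta[1])
print(analytic, numeric)
```

The adjustment factor here is $\Phi(x'\beta/\sigma)$, the probability of a positive outcome, so the marginal effect is the coefficient scaled by that probability.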
Consider four estimators:
1. Probit Maximum Likelihood
2. Least Squares
3. Heckman two-step least squares
4. Tobit Maximum Likelihood
We have a random sample of $(y_i, x_i)$ of size $N$. However, $y_i^*$ is unobserved if $y_i^* \leq 0$.
Assumptions: $x_i$ are uniformly bounded and $\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i x_i'$ is positive definite. The parameter space of $\beta$ and $\sigma^2$ is compact.
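We need to derive the density of $y_i$ conditional on $x_i$.
From above, we know that
$$\Pr[y_i = 0 \mid x_i] = 1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)$$
For $c > 0$,
$$\Pr[y_i \leq c \mid x_i] = \Pr[y_i^* \leq c \mid x_i]$$
so
$$f(c \mid x_i) = f^*(c \mid x_i)$$
By assumption, $y^* \mid x \sim N(x'\beta, \sigma^2)$, and
$$f^*(c \mid x_i) = \frac{1}{\sigma}\, \phi\left(\frac{c - x_i'\beta}{\sigma}\right)$$
The density of $y_i$ conditional on $x_i$ is
$$f(y_i \mid x_i) = \left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]^{1[y_i = 0]} \left[\frac{1}{\sigma}\, \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{1[y_i > 0]}$$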
Probit Maximum Likelihood
The likelihood for the censored model can be written
$$L(\theta) = \prod_{i=1}^{N} \left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]^{1[y_i = 0]} \Phi\left(\frac{x_i'\beta}{\sigma}\right)^{1[y_i > 0]} \times \prod_{i=1}^{N} \left[\Phi\left(\frac{x_i'\beta}{\sigma}\right)^{-1}\right]^{1[y_i > 0]} \left[\frac{1}{\sigma}\, \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{1[y_i > 0]}$$
The first part is the likelihood function of a probit model, and the last part is the likelihood of a normal model truncated at zero.
The probit MLE of $\alpha = \beta/\sigma$ is obtained by maximizing only the logarithm of the first part.
This method cannot be efficient, since it uses only the sign of $y_i^*$ and not the value of $y_i$ when we observe it.
The estimator is consistent, but inefficient.
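Using the same derivation as we did for MLE, we can show that, asymptotically,
$$\hat\alpha - \alpha \simeq \left(X'D_1X\right)^{-1} X'D_1D_0^{-1}\,(w - E[w])$$
where
$D_0$ is an $N \times N$ diagonal matrix with elements $\phi(x_i'\alpha)$
$D_1$ is an $N \times N$ diagonal matrix with elements $\Phi(x_i'\alpha)^{-1}\left[1 - \Phi(x_i'\alpha)\right]^{-1} \phi(x_i'\alpha)^2$
$w$ is the vector with elements $w_i$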
Least Squares Estimator
We can use OLS on the entire sample (including the observations with zero) or on the sample for which $y_i > 0$.
Both estimators are inconsistent.
Using the results above,
$$E[y \mid x, y > 0] = x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}\right] = x'\beta + \sigma\,\lambda(x'\alpha)$$
where $\alpha = \beta/\sigma$, so for the observations with $y_i > 0$ we can write
$$y_i = x_i'\beta + \sigma\,\lambda(x_i'\alpha) + e_i, \qquad E[e_i \mid x_i, y_i > 0] = 0$$
If we run OLS of $y_i$ on $x_i$ using the sample with $y_i > 0$, we omit $\lambda(x_i'\alpha)$, which makes the OLS estimator inconsistent.
If we run OLS of $y$ on $x$ using the full sample, OLS is also inconsistent: $E[y \mid x]$ is a NONLINEAR function of $x$, $\beta$ and $\sigma$.
Heckman's Two-Step Estimator
Let's go back to the model
$$y_i = x_i'\beta + \sigma\,\lambda(x_i'\alpha) + e_i$$
The variance of $e_i$ is
$$\operatorname{Var}[e_i \mid x_i, y_i > 0] = \sigma^2 - \sigma^2\, x_i'\alpha\, \lambda(x_i'\alpha) - \sigma^2\, \lambda(x_i'\alpha)^2$$
We have a heteroskedastic nonlinear regression model.
The estimation proposed by Heckman has 2 steps:
STEP 1: Estimate $\alpha$ by the probit MLE.
STEP 2: Regress $y_i$ on $x_i$ and $\lambda(x_i'\hat\alpha)$ by least squares, using only the sample in which $y_i > 0$.
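The two steps can be sketched end to end on simulated data using only the standard library. Everything specific here is an assumption made for illustration: the simulated design (a constant and one uniform regressor), the true parameter values, the scoring iteration used for the probit step, and the helper names. The sketch returns point estimates only; the usual OLS standard errors from the second step are not valid, because of the heteroskedasticity and the estimated regressor.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def solve(A, b):
    """Solve the small linear system A t = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    t = [0.0] * n
    for r in reversed(range(n)):
        t[r] = (M[r][n] - sum(M[r][c] * t[c] for c in range(r + 1, n))) / M[r][r]
    return t

def probit_mle(X, w, iters=25):
    """Step 1: probit of w = 1[y > 0] on x by the method of scoring."""
    k = len(X[0])
    alpha = [0.0] * k
    for _ in range(iters):
        A = [[0.0] * k for _ in range(k)]
        g = [0.0] * k
        for xi, wi in zip(X, w):
            c = sum(a * b for a, b in zip(xi, alpha))
            P, p = norm_cdf(c), norm_pdf(c)
            s = (wi - P) * p / (P * (1.0 - P))
            h = p * p / (P * (1.0 - P))
            for a in range(k):
                g[a] += s * xi[a]
                for b in range(k):
                    A[a][b] += h * xi[a] * xi[b]
        alpha = [a + d for a, d in zip(alpha, solve(A, g))]
    return alpha

def heckman_two_step(X, y):
    w = [1 if yi > 0 else 0 for yi in y]
    alpha = probit_mle(X, w)
    # Step 2: OLS of y on (x, lambda(x'alpha_hat)) using only observations with y > 0.
    Z, yp = [], []
    for xi, yi in zip(X, y):
        if yi > 0:
            c = sum(a * b for a, b in zip(xi, alpha))
            Z.append(list(xi) + [norm_pdf(c) / norm_cdf(c)])
            yp.append(yi)
    m = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(m)] for a in range(m)]
    Zty = [sum(z[a] * v for z, v in zip(Z, yp)) for a in range(m)]
    return alpha, solve(ZtZ, Zty)   # second element stacks (beta_hat, sigma_hat)

# Simulated Tobit data with made-up parameters beta = (0.5, 1.0) and sigma = 1.
random.seed(0)
X = [[1.0, random.uniform(-2.0, 2.0)] for _ in range(5000)]
y = [max(0.0, 0.5 + 1.0 * xi[1] + random.gauss(0.0, 1.0)) for xi in X]
alpha_hat, gamma_hat = heckman_two_step(X, y)
print(alpha_hat, gamma_hat)
```

The coefficient on the inverse Mills ratio in the second step is the estimate of $\sigma$; with $\sigma = 1$ in this simulated design the first-step probit coefficients should be close to $\beta$ as well.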
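To derive the properties of Heckman's estimator, let's rewrite the model as
$$y_i = x_i'\beta + \sigma\,\lambda(x_i'\hat\alpha) + e_i + \eta_i$$
where
$$\eta_i = \sigma\left[\lambda(x_i'\alpha) - \lambda(x_i'\hat\alpha)\right]$$
Let's define $\hat Z_i = \left(x_i', \lambda(x_i'\hat\alpha)\right)$ and $\gamma = (\beta', \sigma)'$. In addition, let $N_1$ be the size of the sample in which $y_i > 0$. In this case,
$$\hat\gamma = \left(\sum_{i=1}^{N_1} \hat Z_i'\hat Z_i\right)^{-1} \left(\sum_{i=1}^{N_1} \hat Z_i' y_i\right)$$
Is $\hat\gamma$ consistent?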
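Let's try to derive the asymptotic distribution of $\hat\gamma$. We can write
$$\sqrt{N_1}\,(\hat\gamma - \gamma) = \left(\frac{1}{N_1} \sum_{i=1}^{N_1} \hat Z_i'\hat Z_i\right)^{-1} \left(\frac{1}{\sqrt{N_1}} \sum_{i=1}^{N_1} \hat Z_i' e_i + \frac{1}{\sqrt{N_1}} \sum_{i=1}^{N_1} \hat Z_i' \eta_i\right)$$
From before, we know the probit estimator is consistent, so
$$\operatorname*{plim}_{N_1 \to \infty} \frac{1}{N_1} \sum_{i=1}^{N_1} \hat Z_i'\hat Z_i = \operatorname*{plim}_{N_1 \to \infty} \frac{1}{N_1} \sum_{i=1}^{N_1} Z_i'Z_i = E\left[Z_i'Z_i\right]$$
where $Z_i = \left(x_i', \lambda(x_i'\alpha)\right)$.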
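Under the assumption that $u_i$ is i.i.d. $N(0, \sigma^2)$, we can show that
$$\frac{1}{\sqrt{N_1}} \sum_{i=1}^{N_1} Z_i' e_i \xrightarrow{d} N\left(0, \sigma^2 E\left[\Sigma_i Z_i'Z_i\right]\right)$$
where $E[e_i^2 \mid x_i, y_i > 0] = \sigma^2\Sigma_i$ with $\Sigma_i = 1 - x_i'\alpha\,\lambda(x_i'\alpha) - \lambda(x_i'\alpha)^2$, and $\Sigma$ denotes the diagonal matrix with elements $\Sigma_i$.
Doing a mean value expansion of $\lambda(x_i'\hat\alpha)$ around $\lambda(x_i'\alpha)$,
$$\lambda(x_i'\hat\alpha) = \lambda(x_i'\alpha) + \frac{\partial \lambda(x_i'\tilde\alpha)}{\partial \alpha'}\,(\hat\alpha - \alpha)$$
where $\tilde\alpha$ lies between $\hat\alpha$ and $\alpha$, so
$$\eta_i = -\sigma\, \frac{\partial \lambda(x_i'\tilde\alpha)}{\partial \alpha'}\,(\hat\alpha - \alpha)$$
Under the assumptions above,
$$\frac{1}{\sqrt{N_1}} \sum_{i=1}^{N_1} Z_i' \eta_i \xrightarrow{d} N(0, V)$$
where
$$V = \sigma^2 E\left[Z'(I - \Sigma)X \left(X'D_1X\right)^{-1} X'(I - \Sigma)Z\right]$$
At the end, $\sqrt{N_1}\,(\hat\gamma - \gamma)$ converges to a normal distribution with mean zero and a finite variance.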
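Tobit Maximum Likelihood Estimator
The Tobit MLE maximizes the log-likelihood function, with $\theta = (\beta', \sigma^2)'$:
$$l(\theta) = \sum_{i=1}^{N} \left\{ 1[y_i = 0] \log\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + 1[y_i > 0] \left[\log \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \frac{\log(\sigma^2)}{2}\right] \right\}$$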
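The first derivatives of this problem are
$$\frac{\partial l(\theta)}{\partial \beta} = \sum_{i=1}^{N} \left\{ -1[y_i = 0]\, \frac{\phi(x_i'\beta/\sigma)\, x_i}{\sigma\left[1 - \Phi(x_i'\beta/\sigma)\right]} + 1[y_i > 0]\, \frac{(y_i - x_i'\beta)\, x_i}{\sigma^2} \right\}$$
$$\frac{\partial l(\theta)}{\partial \sigma^2} = \sum_{i=1}^{N} \left\{ 1[y_i = 0]\, \frac{\phi(x_i'\beta/\sigma)\,(x_i'\beta)}{2\sigma^3\left[1 - \Phi(x_i'\beta/\sigma)\right]} + 1[y_i > 0] \left[\frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right] \right\}$$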
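The Tobit log-likelihood, in which censored observations contribute $\log[1 - \Phi(x_i'\beta/\sigma)]$ and positive ones contribute $\log\phi((y_i - x_i'\beta)/\sigma) - \log(\sigma^2)/2$, can be coded together with its derivatives with respect to $\beta$ and $\sigma^2$. The sketch below is a check of the algebra rather than an estimator; the data, the parameter values and the function names are made up.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tobit_loglik(beta, sigma2, X, y):
    """l(theta): censored observations use log(1 - Phi), positive ones the normal density."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi > 0:
            total += math.log(norm_pdf((yi - xb) / sigma)) - 0.5 * math.log(sigma2)
        else:
            total += math.log(1.0 - norm_cdf(xb / sigma))
    return total

def tobit_score(beta, sigma2, X, y):
    """Analytic derivatives of l(theta) with respect to beta and sigma^2."""
    sigma = math.sqrt(sigma2)
    g_beta = [0.0] * len(beta)
    g_s2 = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi > 0:
            coef = (yi - xb) / sigma2
            g_s2 += (yi - xb) ** 2 / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma2)
        else:
            ratio = norm_pdf(xb / sigma) / (1.0 - norm_cdf(xb / sigma))
            coef = -ratio / sigma
            g_s2 += ratio * xb / (2.0 * sigma ** 3)
        for k, xk in enumerate(xi):
            g_beta[k] += coef * xk
    return g_beta, g_s2

# Made-up data: a constant and one regressor, with zeros marking censored observations.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 2.0], [1.0, 1.0]]
y = [0.0, 0.0, 0.7, 2.5, 0.0]
beta, sigma2 = [0.2, 0.9], 1.3
print(tobit_loglik(beta, sigma2, X, y), tobit_score(beta, sigma2, X, y))
```

Comparing `tobit_score` with finite differences of `tobit_loglik` is a quick way to catch sign or scaling slips in the derivatives before handing them to an optimizer.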
To derive the asymptotic distribution, we need the Hessian. Let
$$A(x_i, \theta) = -E[H_i(\theta) \mid x_i]$$
$$A(x_i, \theta) = \begin{pmatrix} a_i\, x_i x_i' & b_i\, x_i \\ b_i\, x_i' & c_i \end{pmatrix}$$
where, with $\alpha = \beta/\sigma$,
$$a_i = -\sigma^{-2} \left\{ x_i'\alpha\, \phi(x_i'\alpha) - \frac{\phi(x_i'\alpha)^2}{1 - \Phi(x_i'\alpha)} - \Phi(x_i'\alpha) \right\}$$
$$b_i = \frac{1}{2}\,\sigma^{-3} \left\{ (x_i'\alpha)^2 \phi(x_i'\alpha) + \phi(x_i'\alpha) - \frac{(x_i'\alpha)\, \phi(x_i'\alpha)^2}{1 - \Phi(x_i'\alpha)} \right\}$$
$$c_i = -\frac{1}{4}\,\sigma^{-4} \left\{ (x_i'\alpha)^3 \phi(x_i'\alpha) + (x_i'\alpha)\, \phi(x_i'\alpha) - \frac{(x_i'\alpha)^2 \phi(x_i'\alpha)^2}{1 - \Phi(x_i'\alpha)} - 2\,\Phi(x_i'\alpha) \right\}$$
Since this is an MLE, it is consistent and asymptotically normal, with asymptotic variance-covariance matrix equal to $E[A(x_i, \theta)]^{-1}$.
Computation: convergence is assured by global concavity; however, the choice of a good estimator for the Mills ratio improves speed.
To compute the Tobit model, we can also use the EM algorithm.
References
Amemiya: Chapters 9 and 10
Ruud: Chapter 27
Wooldridge: Chapters 15 and 16
