Linear Models: Omitted Variable and Measurement Error

Cristine Campos de Xavier Pinto
CEDEPLAR/UFMG
May 2010
The presence of endogenous variables is one of the possible violations of assumption A3. In this case, the error term ($\varepsilon$) is correlated with the explanatory variables, and OLS is inconsistent.
In applied work, endogeneity generally arises in three different situations:
1. Omitted variables: we want to control for one or more additional variables, but we do not observe them. Suppose we want to estimate the conditional expectation
$$E[Y \mid X, Q]$$
but we do not observe $Q$. This is only a problem if $Q$ is correlated with $X$.
2. Measurement error: we observe only an imperfect measure of the variable of interest. We observe $X_k$, but we are interested in $X_k^*$, where
$$X_k = f(X_k^*) + u$$
and $u$ is an unobservable component.
3. Simultaneity: at least one of the explanatory variables is determined simultaneously with the dependent variable ($Y$). Typically, one of the explanatory variables ($X_k$) is determined partly as a function of $Y$.
When is an omitted variable a problem?
Suppose that the conditional expectation function of interest is
$$m(x, q) = E[Y \mid X = x, Q = q]$$
where the factor $Q$ is unobservable.
Example: we are interested in the average earnings ($Y$) of someone with $X = x$ years of schooling, holding unobserved ability fixed at $Q = q$.
Since $Q$ is not observed, we cannot identify $m(x, q)$ in general.
But we may be able to identify the average of $m(x, Q)$ over the marginal distribution of $Q$:
$$m(x) = E_Q[m(x, Q)] = E_Q\big[E[Y \mid X = x, Q]\big]$$
Example: averaging over the marginal distribution of $Q$, we obtain the population average earnings if everyone had exactly $X = x$ years of education.
We can also obtain the population average effect of a marginal change in $X_k$ at $X = x$:
$$\frac{\partial m(x)}{\partial x_k} = E_Q\left[\frac{\partial m(x, Q)}{\partial x_k}\right]$$
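A minimal numerical sketch of these two objects. The structural function $m(x, q) = 1 + 0.5x + 0.3xq + q$ and the three-point distribution of $Q$ are hypothetical choices, not taken from the slides. Because $Q$ is discrete, $E_Q[\cdot]$ is an exact weighted sum, so the derivative of $m(x)$ can be checked against the average of the partial derivatives.

```python
# Hypothetical structural function m(x, q); the x*q interaction makes the
# partial effect of x depend on the unobserved factor q.
def m(x, q):
    return 1.0 + 0.5 * x + 0.3 * x * q + q


# Hypothetical marginal distribution f_Q of the unobserved factor Q.
f_Q = {0.0: 0.5, 1.0: 0.3, 2.0: 0.2}


def asf(x):
    """Average structural function m(x) = E_Q[m(x, Q)]."""
    return sum(m(x, q) * p for q, p in f_Q.items())


def ape(x, h=1e-6):
    """Average partial effect E_Q[dm(x, Q)/dx], via central differences inside the expectation."""
    return sum((m(x + h, q) - m(x - h, q)) / (2 * h) * p for q, p in f_Q.items())


x = 12.0
print(asf(x))                                  # analytic value: 7 + 3.6 * E[Q] + E[Q] = 10.22, with E[Q] = 0.7
print(ape(x))                                  # analytic value: 0.5 + 0.3 * E[Q] = 0.71
print((asf(x + 1e-6) - asf(x - 1e-6)) / 2e-6)  # derivative of m(x): also about 0.71
```

Averaging first and differentiating afterwards gives the same number as averaging the partial derivatives, which is the interchange used in the formula above.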
Note that $m(x)$ is very different from $E[Y \mid X = x]$. $E[Y \mid X = x]$ is the average earnings for the subpopulation that does in fact have $X = x$ years of schooling:
$$E[Y \mid X = x] = \int m(x, q)\, f_{Q\mid X}(q \mid x)\, dq = E_{Q\mid X = x}[m(x, Q)]$$
$$m(x) = \int m(x, q)\, f_Q(q)\, dq = E_Q[m(x, Q)]$$
Comparing the two expressions, a sufficient condition for $E[Y \mid X = x]$ to identify $m(x)$ is
$$f_{Q\mid X}(q \mid x) = f_Q(q),$$
that is, $Q$ is independent of $X$.
In the absence of independence between $X$ and $Q$, subpopulations (defined by specific values of $X$) have different distributions of unobserved heterogeneity: there is selection. The resulting selection bias is
$$E[Y \mid X = x] - m(x) = \int m(x, q)\,\big[f_{Q\mid X}(q \mid x) - f_Q(q)\big]\, dq$$
Example: individuals with 12 years of education may have a different ability distribution from individuals with 9 years of schooling. In this case, the average earnings of individuals with 12 years of schooling does not identify the counterfactual average earnings that would be observed if everyone in the population had 12 years of education.
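The schooling example can be made concrete with a small discrete sketch. Everything here is hypothetical: log-earnings $m(x, q) = 1 + 0.1x + 0.5q$, schooling $x \in \{9, 12\}$, ability $q \in \{0, 1\}$, and two joint distributions with identical marginals, one where high ability goes with more schooling and one where $X$ and $Q$ are independent. The code computes $E[Y \mid X = 12]$, $m(12)$, and the selection-bias integral (here a sum).

```python
from collections import defaultdict


def m(x, q):
    # Hypothetical structural log-earnings: schooling x, ability q (0 = low, 1 = high).
    return 1.0 + 0.1 * x + 0.5 * q


def marginals(joint):
    """Marginal pmfs f_X and f_Q from a joint pmf {(x, q): prob}."""
    f_x, f_q = defaultdict(float), defaultdict(float)
    for (x, q), p in joint.items():
        f_x[x] += p
        f_q[q] += p
    return f_x, f_q


def cond_mean(joint, x):
    """E[Y | X = x] = sum_q m(x, q) f_{Q|X}(q | x)."""
    f_x, _ = marginals(joint)
    return sum(m(x, q) * p / f_x[x] for (xx, q), p in joint.items() if xx == x)


def asf(joint, x):
    """m(x) = sum_q m(x, q) f_Q(q)."""
    _, f_q = marginals(joint)
    return sum(m(x, q) * p for q, p in f_q.items())


def selection_bias(joint, x):
    """sum_q m(x, q) [f_{Q|X}(q | x) - f_Q(q)]."""
    f_x, f_q = marginals(joint)
    return sum(m(x, q) * (joint.get((x, q), 0.0) / f_x[x] - f_q[q]) for q in f_q)


# Hypothetical joint pmfs of (years of schooling, ability); both have f_X = {9: 0.4, 12: 0.6}
# and f_Q = {0: 0.5, 1: 0.5}.
selected = {(9, 0): 0.3, (9, 1): 0.1, (12, 0): 0.2, (12, 1): 0.4}     # high ability -> more schooling
independent = {(9, 0): 0.2, (9, 1): 0.2, (12, 0): 0.3, (12, 1): 0.3}  # f_{X,Q} = f_X * f_Q

for name, joint in [("selected", selected), ("independent", independent)]:
    print(name, cond_mean(joint, 12), asf(joint, 12), selection_bias(joint, 12))
```

Under selection, $E[Y \mid X = 12] \approx 2.533$ exceeds $m(12) = 2.45$ by $1/12$; under independence the bias is zero, as the sufficient condition predicts.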
Back to our linear model:
$$E[Y \mid X_1 = x_1, \ldots, X_k = x_k, Q = q] = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + q$$
We can write this model as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$$
where $u = Q + v$ and $E[v \mid X_1, \ldots, X_k, Q] = 0$.
Object of interest: $\beta_k$. We want a causal interpretation of the returns to education: the expected proportional change in wages if someone from the working population is EXOGENOUSLY given one more year of education.
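Omitted Variable Problem
We can write the linear projection of $Q$ onto the space spanned by the columns of $X$:
$$Q = \delta_0 + \delta_1 X_1 + \cdots + \delta_k X_k + r$$
where, by definition of the linear projection, $E[r] = 0$ and $E[X_j r] = 0$ for $j = 1, \ldots, k$.
Plugging the projection into our linear model:
$$Y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1) X_1 + \cdots + (\beta_k + \delta_k) X_k + v + r$$
where $v + r$ has mean zero and is uncorrelated with each $X_j$. Hence OLS of $Y$ on $X$ consistently estimates $\beta_j + \delta_j$ rather than $\beta_j$:
$$\operatorname{plim} \hat\beta_j = \beta_j + \delta_j$$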
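Suppose that all the coefficients in the linear projection of $Q$ are zero except the intercept and the coefficient on $X_k$. Then the projection coincides with the simple projection of $Q$ on $X_k$, and
$$\operatorname{plim} \hat\beta_k = \beta_k + \frac{\operatorname{Cov}(X_k, Q)}{\operatorname{Var}[X_k]}, \qquad \operatorname{plim} \hat\beta_j = \beta_j \quad (j = 1, \ldots, k-1)$$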
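A simulation sketch of this case, with two regressors ($k = 2$) and hypothetical parameters: $X_1$ and $X_2$ are correlated, and $Q = 0.8 X_2 + r$ with $r$ independent of both, so the projection of $Q$ loads only on $X_2$. Regressing $Y$ on $(1, X_1, X_2)$ without $Q$ should give $\hat\beta_1 \approx \beta_1$ and $\hat\beta_2 \approx \beta_2 + \operatorname{Cov}(X_2, Q)/\operatorname{Var}[X_2] = 2.8$. The two-regressor OLS helper `ols2` is written out by hand (stdlib only); it is an illustration, not a library routine.

```python
import random


def ols2(x1, x2, y):
    """Slopes from an OLS regression of y on (1, x1, x2), via the 2x2 normal equations on demeaned data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(42)
n = 100_000
beta0, beta1, beta2 = 1.0, 1.0, 2.0   # hypothetical structural coefficients
delta2 = 0.8                          # hypothetical projection coefficient of Q on X2
x1, x2, y = [], [], []
for _ in range(n):
    x1_i = rng.gauss(0.0, 1.0)
    x2_i = 0.5 * x1_i + rng.gauss(0.0, 1.0)    # X1 and X2 are correlated
    q_i = delta2 * x2_i + rng.gauss(0.0, 1.0)  # projection of Q loads only on X2 (delta_1 = 0)
    x1.append(x1_i)
    x2.append(x2_i)
    y.append(beta0 + beta1 * x1_i + beta2 * x2_i + q_i + rng.gauss(0.0, 1.0))

b1_hat, b2_hat = ols2(x1, x2, y)   # Q is omitted from this regression
print(b1_hat)   # about beta1 = 1.0
print(b2_hat)   # about beta2 + Cov(X2, Q) / Var(X2) = 2.8
```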
With this formula, we can sign the bias: it has the same sign as $\operatorname{Cov}(X_k, Q)$.
In the absence of independence between $X$ and $Q$, one approach to identification is to assume that there is a $J \times 1$ vector $W$ of proxy variables for $Q$.
OV1: the first condition is that $W$ is redundant for $Y$ once we condition on $X$ and $Q$:
$$E[Y \mid X, W, Q] = E[Y \mid X, Q]$$
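Here $W$ denotes the proxy random vector, with realizations $w$.
OV2: the second condition is that, conditional on any specific realization of $W$, $X$ and $Q$ are independent:
$$Q \perp X \mid W = w, \quad \text{for all } w \in \mathcal{W}$$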
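Under assumptions OV1 and OV2, and using the law of iterated expectations (the second equality uses OV1, the last one uses OV2):
$$\begin{aligned}
E[Y \mid X = x, W = w] &= E\big[E[Y \mid X = x, W = w, Q] \mid X = x, W = w\big] \\
&= E\big[E[Y \mid X = x, Q] \mid X = x, W = w\big] \\
&= E[m(x, Q) \mid X = x, W = w] \\
&= E[m(x, Q) \mid W = w]
\end{aligned}$$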
Hence
$$\begin{aligned}
m(x) = E[m(x, Q)] &= E\big[E[m(x, Q) \mid W]\big] \\
&= E_W\big[E[Y \mid X = x, W]\big]
\end{aligned}$$
We can identify $m(x)$ by the partial average of $E[Y \mid X = x, W]$ over the marginal distribution of the proxy variable.
And the marginal derivatives can be identified as
$$\frac{\partial m(x)}{\partial x_k} = E_W\left[\frac{\partial E[Y \mid X = x, W]}{\partial x_k}\right]$$
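A simulation sketch of the partial-average identification result, under a hypothetical data-generating process built to satisfy OV1 and OV2: a proxy $W \in \{0, 1, 2\}$, a binary $X$ whose distribution depends only on $W$, $Q = W + \text{noise}$, and $Y = m(X, Q) + \text{noise}$ with $m(x, q) = 1 + x + 0.5xq + q$. Since $X$ is binary, the marginal effect becomes the difference $m(1) - m(0)$, whose true value here is $3.5 - 2.0 = 1.5$. Cell means stand in for $E[Y \mid X = x, W = w]$; this is one simple estimator, not the only one.

```python
import random


def m(x, q):
    # Hypothetical structural function with an x*q interaction.
    return 1.0 + x + 0.5 * x * q + q


def simulate(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.randrange(3)                          # proxy W in {0, 1, 2}
        x = 1 if rng.random() < 0.2 + 0.3 * w else 0  # X depends on W only
        q = w + rng.gauss(0.0, 1.0)                   # Q depends on W only, so Q and X are independent given W (OV2)
        y = m(x, q) + rng.gauss(0.0, 1.0)             # W affects Y only through Q (OV1)
        data.append((y, x, w))
    return data


def naive_mean(data, x):
    """Sample analogue of E[Y | X = x]."""
    ys = [y for y, xx, _ in data if xx == x]
    return sum(ys) / len(ys)


def proxy_average(data, x):
    """Partial average sum_w P(W = w) * E[Y | X = x, W = w], using cell means."""
    n = len(data)
    total = 0.0
    for w in sorted({ww for _, _, ww in data}):
        cell = [y for y, xx, ww in data if xx == x and ww == w]
        p_w = sum(1 for _, _, ww in data if ww == w) / n
        total += p_w * sum(cell) / len(cell)
    return total


data = simulate(150_000)
# True values: m(1) = 2 + 1.5 * E[Q] = 3.5 and m(0) = 1 + E[Q] = 2.0, since E[Q] = E[W] = 1.
print("naive E[Y|X=1] - E[Y|X=0]:", naive_mean(data, 1) - naive_mean(data, 0))        # about 2.5
print("proxy m(1) - m(0):        ", proxy_average(data, 1) - proxy_average(data, 0))  # about 1.5
```

The naive contrast is inflated because $X = 1$ is more common when $W$, and hence $Q$, is high; averaging the conditional means over the marginal of $W$ removes that selection.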
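Back to our linear model: under assumptions OV1 and OV2, and assuming in addition that $E[Q \mid W]$ is linear in $W$, we can write $Q$ as a linear function of $W$ plus an error:
$$Q = \theta_0 + \theta_1' W + e$$
where $E[e \mid W] = 0$.
Substituting into the equation:
$$Y = (\beta_0 + \theta_0) + \beta_1 X_1 + \cdots + \beta_k X_k + \theta_1' W + (v + e)$$
Under assumptions OV1 and OV2,
$$E[v + e \mid W, X] = 0$$
so OLS of $Y$ on $X$ and $W$ consistently estimates $\beta_1, \ldots, \beta_k$.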
Example: Short and Long Linear Regressions
Consider the log-linear model of earnings
$$\ln Y = \alpha + \beta S + A + \varepsilon, \qquad E[\varepsilon \mid S, A] = 0$$
where $Y$ is earnings, $S$ is years of schooling, $A$ is ability, and $\varepsilon$ is the unobservable component.
In this case, the function of interest is
$$m(s) = E_A\big[E[\ln Y \mid S = s, A]\big] = \alpha + \beta s + E_A[A]$$
The partial effect is given by the long regression coefficient:
$$m'(s) = E_A\left[\frac{\partial E[\ln Y \mid S = s, A]}{\partial s}\right] = \beta$$
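The short linear regression is
$$E[\ln Y \mid S] = (\alpha + \lambda_0) + (\beta + \lambda_{AS}) S$$
where
$$E[A \mid S] = \lambda_0 + \lambda_{AS} S$$
If we do not deal with the omitted variable problem, the selection bias is
$$\begin{aligned}
E[\ln Y \mid S = s] - m(s) &= (\alpha + \lambda_0) + (\beta + \lambda_{AS}) s - \alpha - \beta s - E[A] \\
&= \lambda_0 + \lambda_{AS} s - E[A] \\
&= E[A \mid S = s] - E[A]
\end{aligned}$$
If we estimate the partial effect of schooling by the coefficient of the short regression, we get
$$\beta_{\text{short}} = \beta + \underbrace{\lambda_{AS}}_{\text{ability bias}} \neq \beta$$
whenever $\lambda_{AS} \neq 0$.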
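A simulation sketch of short versus long regressions, with hypothetical values $\alpha = 1$, $\beta = 0.08$, $A \sim N(0, 0.25^2)$, and schooling $S = 12 + 4A + N(0, 2^2)$, so that $\lambda_{AS} = \operatorname{Cov}(A, S)/\operatorname{Var}[S] = 0.25/5 = 0.05$. The long regression coefficient is computed by partialling $A$ out of $S$ (Frisch-Waugh-Lovell), which avoids writing a general multiple-regression solver.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope from a regression of y on (1, x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(x, z):
    """Residuals from an OLS regression of x on (1, z)."""
    g, mx, mz = slope(z, x), mean(x), mean(z)
    return [xi - mx - g * (zi - mz) for xi, zi in zip(x, z)]


rng = random.Random(7)
n = 100_000
alpha, beta = 1.0, 0.08                                     # hypothetical structural parameters
ability = [rng.gauss(0.0, 0.25) for _ in range(n)]
school = [12 + 4 * a + rng.gauss(0.0, 2.0) for a in ability]  # schooling rises with ability
log_y = [alpha + beta * s + a + rng.gauss(0.0, 0.3) for s, a in zip(school, ability)]

b_short = slope(school, log_y)                        # ln Y on (1, S): short regression
b_long = slope(residualize(school, ability), log_y)   # coefficient on S in ln Y on (1, S, A), by FWL
lam_as = slope(school, ability)                       # lambda_AS in E[A | S] = lambda_0 + lambda_AS S
print(b_short, b_long, lam_as)   # about 0.13, 0.08, 0.05
```

The gap between the short and long coefficients is close to the estimated $\lambda_{AS}$, the ability bias.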
Assume that IQ ($Q$) is a proxy measure of ability. In this case, we need to assume that
$$E[\ln Y \mid S, A, Q] = E[\ln Y \mid S, A]$$
and, more strongly,
$$E[A \mid S, Q] = E[A \mid Q]$$
Under these conditions, the regression is
$$E[\ln Y \mid S, Q] = \alpha + \beta S + E[A \mid S, Q] = \alpha + \beta S + E[A \mid Q]$$
where $E[A \mid Q] = \kappa_0 + \lambda_{AQ} Q$.
If we average this function over the marginal distribution of IQ, we get the function of interest:
$$m(s) = E_Q\big[E[\ln Y \mid S = s, Q]\big] = \alpha + \beta s + E\big[E[A \mid Q]\big] = \alpha + \beta s + E[A]$$
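A simulation sketch of the IQ proxy, under a hypothetical design chosen so that both conditions hold: IQ is a noisy measure of ability on a standardized scale, $Q = A + N(0, 0.25^2)$ with $A \sim N(0, 0.25^2)$, and schooling depends on ability only through IQ, $S = 12 + 4Q + N(0, 2^2)$, so $E[A \mid S, Q] = E[A \mid Q]$ with $\lambda_{AQ} = 0.5$. Regressing $\ln Y$ on $(1, S, Q)$ should recover $\beta = 0.08$, and averaging the fitted function over the sample distribution of $Q$ should recover $m(12) = \alpha + 12\beta + E[A] = 1.96$.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope from a regression of y on (1, x)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)


def residualize(x, z):
    """Residuals from an OLS regression of x on (1, z)."""
    g, mx, mz = slope(z, x), mean(x), mean(z)
    return [xi - mx - g * (zi - mz) for xi, zi in zip(x, z)]


rng = random.Random(11)
n = 100_000
alpha, beta = 1.0, 0.08                                       # hypothetical structural parameters
ability = [rng.gauss(0.0, 0.25) for _ in range(n)]
iq = [a + rng.gauss(0.0, 0.25) for a in ability]               # noisy proxy for ability
school = [12 + 4 * q + rng.gauss(0.0, 2.0) for q in iq]        # S depends on A only through IQ
log_y = [alpha + beta * s + a + rng.gauss(0.0, 0.3) for s, a in zip(school, ability)]

b_short = slope(school, log_y)                  # short regression, biased upward
b_s = slope(residualize(school, iq), log_y)     # coefficient on S in ln Y on (1, S, Q)
b_q = slope(residualize(iq, school), log_y)     # coefficient on Q, estimates lambda_AQ
a0 = mean(log_y) - b_s * mean(school) - b_q * mean(iq)


def m_hat(s0):
    """Average the fitted E[ln Y | S = s0, Q] over the sample distribution of Q."""
    return mean([a0 + b_s * s0 + b_q * q for q in iq])


print(b_short, b_s, b_q)   # about 0.122, 0.08, 0.5
print(m_hat(12))           # about alpha + 12 * beta + E[A] = 1.96
```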
Dependent Variable
Assume that the dependent variable is measured with error. Consider the population model
$$Y^* = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon$$
We assume that assumptions A1, A2**, and A3** hold, so
$$E[X \varepsilon] = 0 \quad \text{and} \quad E[\varepsilon] = 0$$
We do not observe $Y^*$. Instead, we observe the measure
$$Y = Y^* + e_0$$
where $e_0$ is the measurement error. The estimated model is
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon + e_0$$
When does OLS produce consistent estimators of $\beta$?
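For consistency, we need
$$E[e_{0i} X_i] = 0$$
that is, the measurement error must be uncorrelated with the explanatory variables. We usually also impose the normalization $E[e_0] = 0$ (a nonzero mean would only shift the intercept).
Under homoskedasticity, and if $\varepsilon$ and $e_0$ are uncorrelated,
$$\operatorname{Var}[\varepsilon + e_0] = \sigma^2_\varepsilon + \sigma^2_{e_0} > \sigma^2_\varepsilon$$
With measurement error, the error variance is larger, so the OLS estimates are less precise.
If the measurement error is systematically related to $X$, then OLS is inconsistent.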
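A simulation sketch of both cases, with hypothetical values $Y^* = 1 + 2X + \varepsilon$, $\sigma^2_\varepsilon = 1$. In the first case $e_0 \sim N(0, 1.5^2)$ is independent of $X$; in the second, $e_0 = 0.5X + N(0, 1)$ is systematically related to $X$. The helper `ols_fit` is a hand-written simple regression, not a library function.

```python
import random


def ols_fit(x, y):
    """Intercept, slope and mean squared residual of an OLS regression of y on (1, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    mse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / n
    return intercept, slope, mse


rng = random.Random(1)
n = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_star = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]              # true outcome
y_ok = [ys + rng.gauss(0.0, 1.5) for ys in y_star]                       # e0 independent of X
y_bad = [ys + 0.5 * xi + rng.gauss(0.0, 1.0) for ys, xi in zip(y_star, x)]  # e0 related to X

_, b_star, mse_star = ols_fit(x, y_star)
_, b_ok, mse_ok = ols_fit(x, y_ok)
_, b_bad, _ = ols_fit(x, y_bad)
print(b_star, mse_star)   # about 2.0 and 1.0
print(b_ok, mse_ok)       # about 2.0 and 1 + 2.25 = 3.25: consistent, but noisier
print(b_bad)              # about 2.5: inconsistent
```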
Explanatory Variable
Consider the population model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k^* + \varepsilon$$
under assumptions A1, A2, and A3.
We do not observe $X_k^*$. Instead, we observe
$$X_k = X_k^* + e_k, \qquad E[e_k] = 0$$
We assume that
$$E[\varepsilon \mid X_k] = 0$$
together with a redundancy condition:
$$E[Y \mid X_1, \ldots, X_{k-1}, X_k^*, X_k] = E[Y \mid X_1, \ldots, X_{k-1}, X_k^*]$$
All the properties of OLS depend on the relationship between the measurement error and the explanatory variables.
The first assumption (ME1) is
$$E[e_k X_j] = 0 \quad \text{for } j = 1, \ldots, k-1$$
We also need to assume that $e_k$ is uncorrelated with the observed measure $X_k$ (ME2):
$$E[e_k X_k] = 0$$
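Substituting $X_k^* = X_k - e_k$, we can write a model that includes the measurement error:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + (\varepsilon - \beta_k e_k)$$
Under the assumptions above,
$$E[(\varepsilon - \beta_k e_k) X_j] = 0 \ \text{for all } j, \qquad E[\varepsilon - \beta_k e_k] = 0$$
so OLS is consistent. Since $\varepsilon$ is uncorrelated with $e_k$,
$$\operatorname{Var}[\varepsilon - \beta_k e_k] = \operatorname{Var}[\varepsilon] + \beta_k^2 \operatorname{Var}[e_k] > \operatorname{Var}[\varepsilon]$$
whenever $\beta_k \neq 0$.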
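A simulation sketch of the ME2 case, with hypothetical values: the observed $X_k \sim N(0, 1)$ is drawn first, $e_k \sim N(0, 0.5^2)$ is drawn independently of it, and the true regressor is $X_k^* = X_k - e_k$. OLS on the observed measure should stay close to $\beta_k = 2$, while the error variance rises to about $1 + 4 \times 0.25 = 2$.

```python
import random


def ols_fit(x, y):
    """Intercept, slope and mean squared residual of an OLS regression of y on (1, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    mse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / n
    return intercept, slope, mse


rng = random.Random(2)
n = 100_000
beta = 2.0                                         # hypothetical structural slope
x_obs = [rng.gauss(0.0, 1.0) for _ in range(n)]     # observed measure X_k
e_k = [rng.gauss(0.0, 0.5) for _ in range(n)]       # drawn independently of X_k, so ME2 holds
x_true = [xo - e for xo, e in zip(x_obs, e_k)]      # X_k* = X_k - e_k
y = [1.0 + beta * xt + rng.gauss(0.0, 1.0) for xt in x_true]

_, b_hat, mse = ols_fit(x_obs, y)
print(b_hat)   # about beta = 2.0
print(mse)     # about Var[eps] + beta^2 Var[e_k] = 1 + 4 * 0.25 = 2.0
```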
Under the classical errors-in-variables (CEV) assumption, we replace ME2 by
$$E[e_k X_k^*] = 0$$
In this case, $X_k$ and $e_k$ are correlated:
$$E[e_k X_k] = E[e_k X_k^*] + E[e_k^2] = \sigma^2_{e_k}$$
so the composite error $\varepsilon - \beta_k e_k$ is correlated with $X_k$, and the OLS estimator is inconsistent.
In some situations the measurement error is not classical, and we need to deal with other types of measurement error.
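A simulation sketch of the CEV case with a single mismeasured regressor and hypothetical values $\beta = 2$, $\operatorname{Var}[X^*] = 1$, $\sigma^2_e = 0.25$. For this one-regressor case the standard consequence is attenuation bias, $\operatorname{plim} \hat\beta = \beta \operatorname{Var}[X^*]/(\operatorname{Var}[X^*] + \sigma^2_e) = 1.6$; the code checks that figure together with $E[e_k X_k] \approx \sigma^2_e$.

```python
import random


def slope(x, y):
    """OLS slope from a regression of y on (1, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(3)
n = 100_000
beta, var_xstar, var_e = 2.0, 1.0, 0.25                    # hypothetical values
x_star = [rng.gauss(0.0, var_xstar ** 0.5) for _ in range(n)]
e_k = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]      # independent of X_k*: classical errors-in-variables
x_obs = [xs + e for xs, e in zip(x_star, e_k)]
y = [1.0 + beta * xs + rng.gauss(0.0, 1.0) for xs in x_star]

cov_xe = sum(xo * e for xo, e in zip(x_obs, e_k)) / n       # sample E[e_k X_k]; e_k has mean zero
b_hat = slope(x_obs, y)
print(cov_xe)                                               # about var_e = 0.25
print(b_hat, beta * var_xstar / (var_xstar + var_e))        # both about 1.6: biased toward zero
```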