
Linear Models
Omitted Variable and Measurement Error
Cristine Campos de Xavier Pinto
CEDEPLAR/UFMG
May 2010
The presence of endogenous variables is one of the possible violations of assumption A3. In this case, the error term ($\varepsilon$) is correlated with the explanatory variables, and OLS is inconsistent.
In applied work, endogeneity generally arises in three different situations:
1 Omitted Variables: We want to control for one or more additional variables, but we do not observe them. Suppose we want to estimate the conditional expectation
$$E[Y \mid X, Q]$$
but we do not observe $Q$. This is only a problem if $Q$ is correlated with $X$.
2 Measurement Error: We observe only an imperfect measure of the variable of interest. We observe $X_k$, but we are interested in $X_k^*$, where
$$X_k = f(X_k^*) + u$$
and $u$ is an unobservable component.
3 Simultaneity: At least one of the explanatory variables is determined simultaneously with the dependent variable ($Y$). In general, one of the explanatory variables ($X_k$) is determined partly as a function of $Y$.
When is an omitted variable a problem?
Suppose that the conditional expectation function of interest is
$$m(x, q) = E[Y \mid X = x, Q = q]$$
where the factor $Q$ is unobservable.
Example: We are interested in the average earnings ($Y$) of someone with $X = x$ years of schooling, holding unobserved ability fixed at $Q = q$.
Since $Q$ is not observable, we cannot identify $m(x, q)$ in general.
But maybe we can identify the average of $m(x, q)$ over the marginal distribution of $Q$:
$$m(x) = E_Q\big[E[Y \mid X = x, Q]\big] = E_Q[m(x, Q)]$$
Example: Averaging over the marginal distribution of $Q$, we get the population average earnings if everyone had exactly $X = x$ years of education.
We can also get the population average effect of a marginal change in $X_k$ at $X = x$:
$$\frac{\partial m(x)}{\partial x_k} = E_Q\left[\frac{\partial m(x, Q)}{\partial x_k}\right]$$
Note that $m(x)$ is very different from $E[Y \mid X = x]$.
$E[Y \mid X = x]$ is the average earnings of the subpopulation that does in fact have $X = x$ years of schooling.
$$E[Y \mid X = x] = \int m(x, q)\, f_{Q \mid X}(q \mid x)\, dq = E_{Q \mid X = x}[m(x, Q)]$$
$$m(x) = \int m(x, q)\, f_Q(q)\, dq = E_Q[m(x, Q)]$$
Comparing the two expressions, a sufficient condition for $E[Y \mid X = x]$ to identify $m(x)$ is
$$f_{Q \mid X}(q \mid x) = f_Q(q)$$
that is, $Q$ is independent of $X$.
In the absence of independence between $X$ and $Q$, subpopulations (defined by specific values of $X$) have different underlying distributions of unobserved heterogeneity. We have selection.
In this case, the selection bias is given by
$$E[Y \mid X = x] - m(x) = \int m(x, q)\left[f_{Q \mid X}(q \mid x) - f_Q(q)\right] dq$$
Example: Individuals with 12 years of education may have a different underlying ability distribution from individuals with 9 years of schooling. In this case, the average earnings of individuals with 12 years of schooling do not identify the counterfactual average level of earnings that would be observed if everyone in the population had 12 years of education.
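The selection-bias formula can be checked by hand in a small discrete case. The sketch below is purely illustrative: the function `m`, the two ability types, and all probabilities are hypothetical values chosen for the example, not numbers from the lecture. It computes $E[Y \mid X = x]$ and $m(x)$ as weighted sums and takes their difference.

```python
# Hypothetical two-type example: ability Q is 0 or 1, schooling X is 9 or 12.
# Assumed: P(X = 9) = P(X = 12) = 0.5, so the conditional distributions below
# average back to the marginal distribution of Q.

def m(x, q):
    """Assumed structural function m(x, q) = E[Y | X = x, Q = q]."""
    return 1.0 + 0.1 * x + 0.5 * q

# Marginal distribution of Q.
f_q = {0: 0.5, 1: 0.5}

# Conditional distribution of Q given X: high ability is more common at X = 12.
f_q_given_x = {
    9: {0: 0.7, 1: 0.3},
    12: {0: 0.3, 1: 0.7},
}

def average_structural(x):
    """m(x): average of m(x, q) over the marginal distribution of Q."""
    return sum(m(x, q) * p for q, p in f_q.items())

def conditional_mean(x):
    """E[Y | X = x]: average of m(x, q) over the distribution of Q given X = x."""
    return sum(m(x, q) * p for q, p in f_q_given_x[x].items())

def selection_bias(x):
    """E[Y | X = x] - m(x)."""
    return conditional_mean(x) - average_structural(x)

for x in (9, 12):
    print(x, round(conditional_mean(x), 3), round(average_structural(x), 3),
          round(selection_bias(x), 3))
```

With these assumed numbers the bias is positive at 12 years of schooling and negative at 9 years, because the high-ability type is over-represented among the more educated.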
Back to our linear model
$$E[Y \mid X_1 = x_1, \ldots, X_k = x_k, Q = q] = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \gamma q$$
We can write this model as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + u$$
where $u = \gamma Q + v$ and $E[v \mid X_1, \ldots, X_k, Q] = 0$.
Object of interest: $\beta_k$. We want a causal interpretation of the returns to education: the expected proportional change in wage if someone from the working population is EXOGENOUSLY given one more year of education.
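Omitted Variable Problem
We can write the linear projection of $Q$ onto the space spanned by the columns of $X$:
$$Q = \delta_0 + \delta_1 X_1 + \ldots + \delta_k X_k + \eta$$
where, by definition of the linear projection, $E[\eta] = 0$ and $E[X_j \eta] = 0$ for every $j$.
Plugging the projection into our linear model (with $u = \gamma Q + v$):
$$Y = (\beta_0 + \gamma\delta_0) + (\beta_1 + \gamma\delta_1) X_1 + \ldots + (\beta_k + \gamma\delta_k) X_k + v + \gamma\eta$$
where the composite error $v + \gamma\eta$ has mean zero and is uncorrelated with every $X_j$.
In this case,
$$\operatorname{plim} \hat\beta_j = \beta_j + \gamma\delta_j$$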
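The probability limit above can be illustrated with a small simulation. This is a minimal sketch with a single regressor, using only the Python standard library; the parameter values, the normal distributions, and the function names are assumptions made for the example. The regression of $Y$ on $X$ alone should give a slope near $\beta_1 + \gamma\delta_1$ rather than $\beta_1$.

```python
import random

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate_short_regression(n=50_000, beta1=1.0, gamma=2.0, delta1=0.5, seed=0):
    """Regress Y on X alone when Q is omitted (illustrative parameter values)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        x_i = rng.gauss(0.0, 1.0)
        # Q = delta0 + delta1 * X + eta, with eta independent of X.
        q_i = 0.3 + delta1 * x_i + rng.gauss(0.0, 1.0)
        # Y = beta0 + beta1 * X + gamma * Q + v.
        y_i = 1.0 + beta1 * x_i + gamma * q_i + rng.gauss(0.0, 1.0)
        x.append(x_i)
        y.append(y_i)
    return ols_slope(x, y)

short_slope = simulate_short_regression()
print(f"short-regression slope: {short_slope:.3f}")
print("beta1 + gamma * delta1 =", 1.0 + 2.0 * 0.5)
```

With these assumed values the target of the short regression is 2.0, twice the structural coefficient of 1.0.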
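Suppose that all the coefficients in the linear projection of $Q$ are zero, except the intercept and the coefficient on $X_k$. In this case, $\delta_k = \operatorname{Cov}(X_k, Q) / \operatorname{Var}[X_k]$, so
$$\operatorname{plim} \hat\beta_k = \beta_k + \gamma\, \frac{\operatorname{Cov}(X_k, Q)}{\operatorname{Var}[X_k]}$$
and $\operatorname{plim} \hat\beta_j = \beta_j$ for $j \neq k$.
With this formula, we can get an idea of the sign of the bias.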
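Since $\operatorname{Var}[X_k] > 0$, the sign of the bias depends only on the sign of $\gamma$ and the sign of the covariance. The four cases can be laid out as follows (a summary sketch derived from the formula above, not a table from the original slides):

```latex
\[
\operatorname{sign}\big(\operatorname{plim}\hat\beta_k - \beta_k\big)
  = \operatorname{sign}(\gamma)\cdot\operatorname{sign}\big(\operatorname{Cov}(X_k, Q)\big)
\]
\[
\begin{array}{c|cc}
 & \operatorname{Cov}(X_k, Q) > 0 & \operatorname{Cov}(X_k, Q) < 0 \\ \hline
\gamma > 0 & \text{upward bias} & \text{downward bias} \\
\gamma < 0 & \text{downward bias} & \text{upward bias}
\end{array}
\]
```

For instance, if ability raises wages ($\gamma > 0$) and is positively correlated with schooling, the return to schooling is overestimated.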
In the absence of independence between $X$ and $Q$, one approach to identification is to assume that there is a $J \times 1$ vector $W$ of proxy variables for $Q$.
OV1: The first condition we need is that $W$ is redundant for $Y$ once we condition on $X$ and $Q$:
$$E[Y \mid X, W, Q] = E[Y \mid X, Q]$$
($W$ is a proxy random variable.)
OV2: The second condition is that, conditional on any specific realization of $W$, $X$ and $Q$ are independent:
$$Q \perp X \mid W = w, \quad \forall\, w \in \mathcal{W}$$
Under assumptions OV1 and OV2, and using the law of iterated expectations,
$$\begin{aligned}
E[Y \mid X = x, W = w] &= E\big[E[Y \mid X = x, W = w, Q] \mid X = x, W = w\big] \\
&= E\big[E[Y \mid X = x, Q] \mid X = x, W = w\big] \\
&= E[m(x, Q) \mid X = x, W = w] \\
&= E[m(x, Q) \mid W = w]
\end{aligned}$$
Hence
$$\begin{aligned}
m(x) = E[m(x, Q)] &= E\big[E[m(x, Q) \mid W]\big] \\
&= E\big[E[Y \mid X = x, W]\big]
\end{aligned}$$
We can identify $m(x)$ by the partial average of $E[Y \mid X = x, W]$ over the marginal distribution of the proxy variable.
The marginal derivatives can be identified as
$$\frac{\partial m(x)}{\partial x_k} = E_W\left[\frac{\partial E[Y \mid X = x, W]}{\partial x_k}\right]$$
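Back to our linear model: Under assumptions OV1 and OV2, and assuming that $E[Q \mid W]$ is linear in $W$, we can write $Q$ as a linear function of $W$ plus an error:
$$Q = \theta_0 + \theta_1' W + \xi$$
where $E[\xi \mid W] = 0$.
Substituting this into the equation $Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k + \gamma Q + v$:
$$Y = (\beta_0 + \gamma\theta_0) + \beta_1 X_1 + \ldots + \beta_k X_k + \gamma\theta_1' W + (v + \gamma\xi)$$
Under assumptions OV1 and OV2,
$$E[v + \gamma\xi \mid W, X] = 0$$
so OLS of $Y$ on $X$ and $W$ is consistent for $\beta_1, \ldots, \beta_k$.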
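A simulation can make the role of the proxy concrete. The sketch below uses only the Python standard library and a hypothetical data-generating process built to satisfy OV1 and OV2: $X$ is related to $Q$ only through a scalar $W$, and $W$ does not enter the equation for $Y$ directly. All parameter values and function names are assumptions for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from the simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_two_regressors(x, w, y):
    """Coefficients on x and w from OLS of y on an intercept, x and w."""
    n = len(y)
    mx, mw, my = sum(x) / n, sum(w) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sww = sum((b - mw) ** 2 for b in w)
    sxw = sum((a - mx) * (b - mw) for a, b in zip(x, w))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    swy = sum((b - mw) * (c - my) for b, c in zip(w, y))
    # Solve the 2x2 normal equations for the demeaned regressors.
    det = sxx * sww - sxw ** 2
    return (sww * sxy - sxw * swy) / det, (sxx * swy - sxw * sxy) / det

def simulate_proxy(n=50_000, beta=1.0, gamma=2.0, theta1=0.8, seed=1):
    """Return (short slope, coefficient on X with proxy, coefficient on W)."""
    rng = random.Random(seed)
    x, w, y = [], [], []
    for _ in range(n):
        w_i = rng.gauss(0.0, 1.0)
        q_i = theta1 * w_i + rng.gauss(0.0, 1.0)  # Q = theta1 * W + xi
        x_i = w_i + rng.gauss(0.0, 1.0)           # X related to Q only via W (OV2)
        # W does not enter directly once X and Q are included (OV1).
        y_i = 0.5 + beta * x_i + gamma * q_i + rng.gauss(0.0, 1.0)
        x.append(x_i)
        w.append(w_i)
        y.append(y_i)
    short = ols_slope(x, y)
    coef_x, coef_w = ols_two_regressors(x, w, y)
    return short, coef_x, coef_w

short_slope, coef_x, coef_w = simulate_proxy()
print(f"Y on X only:   slope on X = {short_slope:.3f}")
print(f"Y on X and W:  slope on X = {coef_x:.3f}, slope on W = {coef_w:.3f}")
```

In this design the regression without the proxy converges to $\beta + \gamma\theta_1/2 = 1.8$, while including $W$ brings the coefficient on $X$ back to $\beta = 1$ and the coefficient on $W$ estimates $\gamma\theta_1 = 1.6$.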
Example: Short and Long Linear Regressions
Consider the log-linear model of earnings
$$\ln Y = \alpha + \beta S + \gamma A + \varepsilon$$
$$E[\varepsilon \mid S, A] = 0$$
where $Y$ is earnings, $S$ is years of schooling, $A$ is ability, and $\varepsilon$ is the unobservable component.
In this case, the function of interest is given by
$$m(s) = E_A\big[E[\ln Y \mid S = s, A]\big] = \alpha + \beta s + \gamma E_A[A]$$
The partial effect is given by the long regression coefficient:
$$m'(s) = E_A\left[\frac{\partial E[\ln Y \mid S = s, A]}{\partial s}\right] = \beta$$
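The short linear regression is given by
$$E[\ln Y \mid S] = (\alpha + \gamma\delta_0) + (\beta + \gamma\delta_{AS})\, S$$
where
$$E[A \mid S] = \delta_0 + \delta_{AS} S$$
If we do not take care of the omitted variable problem, the selection bias is
$$\begin{aligned}
E[\ln Y \mid S = s] - m(s) &= (\alpha + \gamma\delta_0) + (\beta + \gamma\delta_{AS})\, s - \big(\alpha + \beta s + \gamma E[A]\big) \\
&= \gamma\delta_0 + \gamma\delta_{AS}\, s - \gamma E[A] \\
&= \gamma\big[E[A \mid S = s] - E[A]\big]
\end{aligned}$$
If we estimate the partial effect of schooling by the coefficient of the short regression, we get
$$\beta_{short} = \beta + \underbrace{\gamma\delta_{AS}}_{\text{ability bias}} \neq \beta$$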
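To fix ideas, the short-regression formulas can be evaluated at some made-up numbers. Every value below is hypothetical and chosen only to make the arithmetic easy; none comes from the lecture or from data.

```latex
% Hypothetical values: beta = 0.08, gamma = 0.5, delta_0 = -1.2, delta_{AS} = 0.1, E[S] = 12
\[
\beta_{short} = \beta + \gamma\delta_{AS} = 0.08 + 0.5 \times 0.1 = 0.13
\]
\[
E[A] = \delta_0 + \delta_{AS}\, E[S] = -1.2 + 0.1 \times 12 = 0,
\qquad
E[A \mid S = 16] = -1.2 + 0.1 \times 16 = 0.4
\]
\[
E[\ln Y \mid S = 16] - m(16) = \gamma\big[E[A \mid S = 16] - E[A]\big] = 0.5 \times 0.4 = 0.2
\]
```

Under these assumed values, the short regression would report a return to schooling of 13 percent when the structural return is 8 percent, and average log earnings of people with 16 years of schooling would overstate $m(16)$ by 0.2.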
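Assume that IQ ($Q$) is a proxy measure of ability. In this case, we need to assume that
$$E[\ln Y \mid S, A, Q] = E[\ln Y \mid S, A]$$
and, more strongly, we need
$$E[A \mid S, Q] = E[A \mid Q]$$
Under these conditions, the linear regression is
$$\begin{aligned}
E[\ln Y \mid S, Q] &= \alpha + \beta S + \gamma E[A \mid S, Q] \\
&= \alpha + \beta S + \gamma E[A \mid Q]
\end{aligned}$$
where $E[A \mid Q] = \theta_0 + \theta_{AQ} Q$.
If we average this function over the marginal distribution of IQ, we get the function of interest:
$$\begin{aligned}
m(s) &= E_Q\big[E[\ln Y \mid S = s, Q]\big] \\
&= \alpha + \beta s + \gamma E\big[E[A \mid Q]\big] \\
&= \alpha + \beta s + \gamma E[A]
\end{aligned}$$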
Dependent Variable
Assume that the dependent variable is measured with error.
We consider the following population model:
$$Y^* = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K + \varepsilon$$
We assume that assumptions A1, A2**, and A3** hold, so
$$E[\varepsilon X] = 0 \quad \text{and} \quad E[\varepsilon] = 0$$
We do not observe $Y^*$. We observe the following measure:
$$Y = Y^* + \nu$$
where $\nu$ is the measurement error.
Our estimated model is
$$Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K + \varepsilon + \nu$$
When does OLS produce consistent estimators of $\beta$?
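For consistency, we need
$$E[\nu_i X_i] = 0$$
In general, we also impose the normalization $E[\nu] = 0$.
If we assume homoskedasticity and $\varepsilon$ and $\nu$ are uncorrelated, we have
$$\operatorname{Var}[\varepsilon + \nu] = \sigma^2_\varepsilon + \sigma^2_\nu > \sigma^2_\varepsilon$$
With measurement error, the error variance is higher, so the OLS estimators are less precise.
If the measurement error is systematically related to $X$, then we have a problem: OLS is inconsistent.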
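The benign case, where the measurement error in $Y$ is unrelated to $X$, can be sketched in a short simulation. The code below uses only the Python standard library; the single-regressor design, the parameter values, and the function names are assumptions for illustration. It compares the regression of the true $Y^*$ on $X$ with the regression of the mismeasured $Y$ on $X$.

```python
import random

def fit_simple_ols(x, y):
    """Return (slope, residual variance) from the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, rss / (n - 2)

def simulate_y_error(n=50_000, beta=1.5, sd_eps=1.0, sd_nu=2.0, seed=2):
    """Fit Y* on X and the mismeasured Y = Y* + nu on X (illustrative values)."""
    rng = random.Random(seed)
    x, y_star, y_obs = [], [], []
    for _ in range(n):
        x_i = rng.gauss(0.0, 1.0)
        y_star_i = 0.5 + beta * x_i + rng.gauss(0.0, sd_eps)
        nu_i = rng.gauss(0.0, sd_nu)  # measurement error, independent of X
        x.append(x_i)
        y_star.append(y_star_i)
        y_obs.append(y_star_i + nu_i)
    return fit_simple_ols(x, y_star), fit_simple_ols(x, y_obs)

(slope_star, var_star), (slope_obs, var_obs) = simulate_y_error()
print(f"true Y*:    slope = {slope_star:.3f}, residual variance = {var_star:.3f}")
print(f"observed Y: slope = {slope_obs:.3f}, residual variance = {var_obs:.3f}")
```

Both slopes should be close to the assumed $\beta = 1.5$, while the residual variance rises from about $\sigma^2_\varepsilon = 1$ to about $\sigma^2_\varepsilon + \sigma^2_\nu = 5$.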
Explanatory Variable
We consider the following population model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k^* + \varepsilon$$
under assumptions A1, A2, and A3.
We do not observe $X_k^*$. We observe
$$X_k = X_k^* + \nu_k$$
with $E[\nu_k] = 0$.
We assume that
$$E[\varepsilon \mid X_k] = 0$$
and a redundancy condition
$$E[Y \mid X_1, \ldots, X_k^*, X_k] = E[Y \mid X_1, \ldots, X_k^*]$$
All the properties of OLS depend on the relationship between the measurement error and the explanatory variables.
The first assumption (ME1) is that
$$E[\nu_k X_j] = 0 \quad \text{for } j = 1, \ldots, k - 1$$
We also need to assume that $\nu_k$ is uncorrelated with the observed $X_k$ (ME2):
$$E[\nu_k X_k] = 0$$
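We can write a model that includes the measurement error by substituting $X_k^* = X_k - \nu_k$:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k + (\varepsilon - \beta_k \nu_k)$$
Under the assumptions above,
$$E[(\varepsilon - \beta_k \nu_k) X_j] = 0 \quad \text{for all } j$$
$$E[\varepsilon - \beta_k \nu_k] = 0$$
so OLS is consistent. Since $\varepsilon$ is uncorrelated with $\nu_k$, we have
$$\operatorname{Var}[\varepsilon - \beta_k \nu_k] = \operatorname{Var}[\varepsilon] + \beta_k^2 \operatorname{Var}[\nu_k] > \operatorname{Var}[\varepsilon]$$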
Under the classical errors-in-variables assumption, we replace ME2 by
$$E[\nu_k X_k^*] = 0$$
In this case, $X_k$ and $\nu_k$ are correlated:
$$E[\nu_k X_k] = E[\nu_k X_k^*] + E[\nu_k^2] = \sigma^2_{\nu_k}$$
In this case, the OLS estimator is inconsistent.
In some situations, the measurement error is not classical, and we need to deal with other types of measurement error.
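The contrast between ME2 and the classical errors-in-variables case can be sketched in a simulation with a single regressor. The code uses only the Python standard library, and the parameter values and function names are assumptions for illustration. The benchmark used for the classical case, $\operatorname{plim}\hat\beta = \beta \operatorname{Var}[X^*] / (\operatorname{Var}[X^*] + \operatorname{Var}[\nu])$, is the standard attenuation result for one regressor; it is not stated in the slides above.

```python
import random

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_measurement_error(classical, n=50_000, beta=2.0, seed=3):
    """Regress Y on the mismeasured X under ME2 or classical measurement error."""
    rng = random.Random(seed)
    x_obs, y = [], []
    for _ in range(n):
        nu = rng.gauss(0.0, 1.0)
        if classical:
            # Classical case: nu is independent of the true X*.
            x_star = rng.gauss(0.0, 1.0)
            x = x_star + nu
        else:
            # ME2 case: nu is independent of the observed X.
            x = rng.gauss(0.0, 1.0)
            x_star = x - nu
        y.append(1.0 + beta * x_star + rng.gauss(0.0, 1.0))
        x_obs.append(x)
    return ols_slope(x_obs, y)

slope_me2 = simulate_measurement_error(classical=False)
slope_cev = simulate_measurement_error(classical=True)
print(f"ME2 (error uncorrelated with observed X): slope = {slope_me2:.3f}")
print(f"classical errors-in-variables:            slope = {slope_cev:.3f}")
```

With the assumed $\beta = 2$ and equal variances for $X^*$ and $\nu$, the slope under ME2 stays near 2, while the classical case is attenuated toward $2 \times 1/2 = 1$.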
References
Goldberger: 17
Hayashi: 3
Ruud: 20
Wooldridge: 4