Proxy Variables
William Matcham
February 29, 2016
Reference: Slide Set 1, 30–33 and Wooldridge 298–303 (Wooldridge adds an extra regressor in the
wage example, but the slides and the text below omit this term in order to make the discussion clearer.)
Introduction
A very common problem in econometrics is not observing (i.e. not having data on) covariates that
are important to the analysis.
Proxy variables provide one way to mitigate the problems that arise when we cannot include in
our regression a covariate that we would like to.
Motivating example: suppose we wish to understand how education influences an individual's wage.
A factor that affects the wage, and is correlated with education level, is inherent natural ability.
Ability is therefore a confounding factor in the model.
The regression we may consider is

    log(wage) = β₀ + β₁educ + β₂abil + u    (1)

Inherent ability is very difficult, if not impossible, to measure, so without any better options we
may just leave out ability and run

    log(wage) = β₀ + β₁educ + w    (2)

where

    w = β₂abil + u

A better option is to find a proxy for ability: an observable variable correlated with it. A natural
candidate is IQ score, giving the regression

    log(wage) = α₀ + β₁educ + α₂IQ + e    (3)
This seems logical, but how do we know that it works? The rest of this handout explains the
conditions that a proxy needs to satisfy in order for an OLS regression on (3) to provide
a consistent estimator of β₁.
To state the requirements generally, suppose the true model is

    y = β₀ + β₁x₁ + β₂x₂ + u    (4)

where MLR.1–5 hold on this model. We have data on y and x₁. The variable x₂ is unobserved,
but we have data on a proxy for x₂, denoted x₂*.
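To preview what a successful proxy does, here is a small numpy simulation (a sketch with illustrative numbers, not taken from the handout) in which x₂ is generated from a proxy x₂* in the way the requirements below demand: leaving x₂ out of the regression biases the OLS estimate of β₁, while plugging in the proxy recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative true parameters (hypothetical, chosen for this sketch)
beta0, beta1, beta2 = 2.0, 1.0, 0.5   # main model: y = b0 + b1*x1 + b2*x2 + u
delta0, delta2 = 1.0, 2.0             # proxy equation: x2 = d0 + d2*x2star + v

x1 = rng.normal(size=n)
x2star = 0.8 * x1 + 0.6 * rng.normal(size=n)  # proxy, correlated with x1
v = 0.5 * rng.normal(size=n)                  # v independent of (x1, x2star)
x2 = delta0 + delta2 * x2star + v             # unobserved regressor
u = 0.5 * rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + u

def ols(y, *cols):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_omit = ols(y, x1)           # x2 left out entirely
b_proxy = ols(y, x1, x2star)  # proxy plugged in for x2

print("beta1 with x2 omitted:", b_omit[1])   # approx 1.8: omitted-variable bias
print("beta1 with proxy     :", b_proxy[1])  # approx beta1 = 1.0
print("coef on proxy        :", b_proxy[2])  # approx beta2*delta2 = 1.0
```

With x₂ omitted, the education-style coefficient absorbs β₂·cov(x₁, x₂)/var(x₁); with the proxy included, the error becomes β₂v + u, which is unrelated to both regressors here, so the estimate of β₁ is clean.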
2.0 Requirement 0
The first requirement is that x₂* should have a relationship (correlation) with x₂. In other words,
in the regression

    x₂ = δ₀ + δ₂x₂* + v    (5)

we should have δ₂ ≠ 0.
The reason why the δ₀ term exists in (5) is that the proxy x₂* and x₂ may have different units,
and v exists to represent the notion that the proxy and the unobserved variable are not exactly
the same.
So far, the long and short is that if δ₂ = 0, then x₂* cannot be a proxy for x₂. This is a somewhat
obvious, but necessary, condition.
2.1 Requirement 1
The next requirement is that x₂* should not appear in the main regression (4), given that x₁ and x₂
are already in the regression. In other words, x₂ is the factor that directly affects y, not x₂*.
This is a bit like the instrumental-variable exclusion restriction: the only channel from the proxy
x₂* into y is through x₂.
In the wage regression example, we have to believe that a higher IQ score will not in itself lead
to a higher wage (people don't tend to put their IQ score on their CV anyway). We do require,
however, that a higher IQ is associated with higher innate ability, and that higher ability is in
turn associated with a higher wage.
The mathematical way of stating the above is that we require

    E(y | x₁, x₂) = E(y | x₁, x₂, x₂*)    (6)

which in words says that the explanatory power of x₁ and x₂ in explaining the mean of y is exactly
the same as the explanatory power of x₁, x₂ and x₂* in explaining the mean value of y.
That is, once we control for x₁ and x₂, x₂* cannot improve our prediction of the mean value of y.
NOTE: by substituting y from (4) into both sides of (6), we obtain

    E(β₀ + β₁x₁ + β₂x₂ + u | x₁, x₂) = E(β₀ + β₁x₁ + β₂x₂ + u | x₁, x₂, x₂*)
    ⇕
    β₀ + β₁x₁ + β₂x₂ + E(u | x₁, x₂) = β₀ + β₁x₁ + β₂x₂ + E(u | x₁, x₂, x₂*)
    ⇕
    E(u | x₁, x₂) = E(u | x₁, x₂, x₂*)

Note that since MLR.1–5 hold on (4), E(u | x₁, x₂) = 0, and therefore the derivation above shows
us that

    E(u | x₁, x₂, x₂*) = 0

In other words, Requirement 1 ensures that the error term u is mean-independent not only of x₁ and
x₂, but also of x₂*. By the tower property of conditional expectations, this result also implies
that E(u | x₁, x₂*) = 0.
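In a simulation (though not in real data, where x₂ is unobserved) we can run the infeasible regression that includes both x₂ and the proxy, and check that the proxy's coefficient is zero, which is exactly what condition (6) asserts. A minimal sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative data-generating process: the proxy affects y only through x2
x1 = rng.normal(size=n)
x2star = 0.8 * x1 + 0.6 * rng.normal(size=n)       # the proxy
x2 = 1.0 + 2.0 * x2star + 0.5 * rng.normal(size=n)  # unobserved regressor
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 0.5 * rng.normal(size=n)

# Infeasible regression of y on x1, x2 AND the proxy
X = np.column_stack([np.ones(n), x1, x2, x2star])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

print("coef on x1    :", coefs[1])  # approx 1.0
print("coef on x2    :", coefs[2])  # approx 0.5
print("coef on proxy :", coefs[3])  # approx 0: proxy adds nothing given x2
```

The near-zero coefficient on x₂* is the sample counterpart of E(y | x₁, x₂) = E(y | x₁, x₂, x₂*).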
2.2 Requirement 2
Now go back and consider the proxy regression (5). The second requirement related to the proxy
is that once x₂* is controlled for, the mean value of x₂ shouldn't depend upon x₁.
In other words, x₂ should have no correlation with x₁ once x₂* is partialled out. Another way of
seeing this is that if we considered

    x₂ = δ₀ + δ₁x₁ + δ₂x₂* + v

then δ₁ = 0 should hold, taking us back to (5).
In mathematics, this requirement is given by

    E(x₂ | x₁, x₂*) = E(x₂ | x₂*)    (7)

Similar to above, substituting (5) into (7) gives

    E(δ₀ + δ₂x₂* + v | x₁, x₂*) = E(δ₀ + δ₂x₂* + v | x₂*)
    ⇕
    δ₀ + δ₂x₂* + E(v | x₁, x₂*) = δ₀ + δ₂x₂* + E(v | x₂*)
    ⇕
    E(v | x₁, x₂*) = E(v | x₂*)

Since E(v | x₂*) = 0, we thus obtain E(v | x₁, x₂*) = 0: the error v is mean-independent not just
of x₂*, but also of x₁.
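To see what Requirement 2 buys us, here is a sketch (again with illustrative numbers) of what happens when it fails: x₂ is generated with δ₁ ≠ 0, so x₁ still predicts x₂ after the proxy is partialled out, and the plug-in OLS estimate of β₁ absorbs the leftover channel β₂δ₁.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

beta1, beta2, delta1 = 1.0, 0.5, 0.8  # delta1 != 0 violates Requirement 2

x1 = rng.normal(size=n)
x2star = rng.normal(size=n)
# Violation: x1 still predicts x2 once the proxy is controlled for
x2 = 1.0 + delta1 * x1 + 2.0 * x2star + 0.5 * rng.normal(size=n)
y = 2.0 + beta1 * x1 + beta2 * x2 + 0.5 * rng.normal(size=n)

# Plug-in regression: y on a constant, x1 and the proxy
X = np.column_stack([np.ones(n), x1, x2star])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

print("estimate of beta1:", coefs[1])  # approx beta1 + beta2*delta1 = 1.4
```

The coefficient on x₁ converges to β₁ + β₂δ₁ rather than β₁: the proxy removes only the part of x₂ it can explain, and the part driven by x₁ is loaded onto x₁'s coefficient.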
To cast this requirement in the wage regression example: once IQ is controlled for, the average
level of innate ability should not vary with education, i.e. E(abil | educ, IQ) = E(abil | IQ).
Putting the two pieces together, we have

    y = β₀ + β₁x₁ + β₂x₂ + u    (8)

and

    x₂ = δ₀ + δ₂x₂* + v    (9)

Substituting (9) into (8), we obtain

    y = β₀ + β₁x₁ + β₂δ₀ + β₂δ₂x₂* + β₂v + u

which leaves

    y = α₀ + β₁x₁ + α₂x₂* + e    (10)

where

1. α₀ = β₀ + β₂δ₀
2. α₂ = β₂δ₂
3. e = β₂v + u
Note that now we cannot identify β₂, the marginal effect of x₂ on y, but in many settings identifying
α₂ will be more interesting anyway: in the wage regression example, we may be more interested
in the marginal effect of one more IQ point than in the marginal effect of one more unit of
innate ability, which is a vague notion at best.
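The decomposition above can be checked numerically. In the following sketch (hypothetical parameter values), the OLS coefficients from the plug-in regression line up with α₀ = β₀ + β₂δ₀ and α₂ = β₂δ₂, while the coefficient on x₁ still estimates the structural β₁.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Illustrative parameter values (not from the handout)
beta0, beta1, beta2 = 2.0, 1.0, 0.5
delta0, delta2 = 1.0, 2.0

x1 = rng.normal(size=n)
x2star = 0.8 * x1 + 0.6 * rng.normal(size=n)
x2 = delta0 + delta2 * x2star + 0.5 * rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + 0.5 * rng.normal(size=n)

# Plug-in regression (10): y on a constant, x1 and the proxy
X = np.column_stack([np.ones(n), x1, x2star])
a0_hat, b1_hat, a2_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(a0_hat, beta0 + beta2 * delta0)  # both approx 2.5: alpha0 = b0 + b2*d0
print(b1_hat, beta1)                   # both approx 1.0: beta1 survives
print(a2_hat, beta2 * delta2)          # both approx 1.0: alpha2 = b2*d2
```

Note that β₂ and δ₂ enter the regression only through their product, which is why β₂ alone is not identified from (10).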
Consider running OLS on (10). For unbiased estimation of β₁ and α₂, we need E(e | x₁, x₂*) = 0.
Observe, noting the two results derived above, E(u | x₁, x₂*) = 0 and E(v | x₁, x₂*) = 0, that