Professional Documents
Culture Documents
Econometrics II Lecture 5: Instrumental Variables Part II: Måns Söderbom
Econometrics II Lecture 5: Instrumental Variables Part II: Måns Söderbom
Econometrics II Lecture 5: Instrumental Variables Part II: Måns Söderbom
13 April 2011
1
i
,
^
1
i
Co
^
1
i
,
^
1
i
=
Co
1
i
,
^
1
i
Co
1
i
,
^
1
i
,
hence
j
2SLS
=
11
Co (1
i
, .
1i
)
Co
^
1
i
,
^
1
i
+
12
Co (1
i
, .
2i
)
Co
^
1
i
,
^
1
i
=
11
Co (1
i
, .
1i
)
Co
^
1
i
,
^
1
i
Co (1
i
, .
1i
)
Co (1
i
, .
1i
)
+
12
Co (1
i
, .
2i
)
Co
^
1
i
,
^
1
i
Co (1
i
, .
2i
)
Co (1
i
, .
2i
)
j
2SLS
= cj
1
+ (1 c) j
2
,
where
c =
11
Co (1
i
, .
1i
)
11
Co (1
i
, .
1i
) +
12
Co (1
i
, .
2i
)
is a weight bounded between zero and one.
Note that c depends on the relative strength of each instrument in the rst stage. In other words,
2SLS is a weighted average of the causal eects for instrument-specic compliant subpopulations (the
two LATEs).
3.2. Covariates in the heterogeneous eects model
The main reason covariates are included in a regression is that the conditional independence and exclusion
restrictions underlying IV estimation may be more likely valid after conditioning on covariates.
For example, in the case of draft eligibility, older cohorts were more likely to be eligible be design,
11
and because earnings covary with age (experience), draft eligibility is a valid instrument only conditional
on year of birth.
IV estimation with covariates in the present framework may be justied by a conditional independence
assumption of the following type:
1
i0
, 1
i1
, 1
i0
, 1
i1
independent of .
i
jA
i
,
i.e. think of the IV as being as good as randomly assigned, conditional on A
i
.
If we maintain the assumption that the causal eect of interest is constant, we are of course in very
familiar territory: just include A
i
as a control vector in the rst and second stage of your 2SLS
estimator.
To relax the constant eects assumption, we might be prepared to specify the causal eect as
1
i1
1
i0
= j (A) ,
i.e. as dependent on observable characteristics. This model can be estimated by adding interactions
between 1 and A to the second stage and interactions between . and A in the rst stage. You
would then have more than one rst-stage regression:
1
i
= A
0
i
00
+
01
.
i
+.
i
A
0
i
02
+
0i
1
i
A
1
i
= A
0
i
00
+
01
.
i
+.
i
A
0
i
02
+
0i
(...)
1
i
A
K
i
= A
0
i
00
+
01
.
i
+.
i
A
0
i
02
+
0i
12
assuming there are 1 covariates. The second stage would in this case be specied as
1
i
= c
0
A
i
+j
0
1
i
+1
i
A
0
i
j
1
+j
i
,
which implies
j (A) = j
0
+A
0
i
j
1
The heterogeneous eects model underlying the LATE theorem can also be extended to allow for
covariates. Unfortunately, interpretation becomes less straightforward. We now have covariate-
specic LATEs of the form
`(A
i
) 1 [1
1i
1
0i
jA, 1
1i
1
0i
] .
If we work with a rst-stage in which dummies for each value taken by the X-variable are used as
predictors, the 2SLS estimator will produce a weighted average of these covariate-specic LATEs,
where the weights will be higher for covariate values where the instrument creates more variation
in tted values.
This estimator is not very attractive. Almost certainly, the rst stage will contain too many
instrument resulting in bias towards the OLS estimator. Moreover, interpretation of the 2SLS
estimate of the causal eect is not straightforward (a weighted average of.... what exactly?).
If we modify the rst stage, perhaps using just A rather than dummy variables for each value of
A, this may be an acceptable approximation. See Theorem 4.5.2 in AP for details.
4. Variable Treatment Intensity
Now consider a treatment variable that can take more than two values, e.g. schooling. Dene potential
outcomes as
1
si
)
i
(o) .
13
Suppose o
i
takes on values in the set
0, 1, ...,
o
,
implying there are
o causal eects, 1
si
1
s1;i
. A linear causal model assumes these are the same for
all : and i, "...obviously unrealistic assumptions" according to AP.
However, you can still justify using 2SLS and a linear specication of the form
1
i
= c +jo
i
+j
i
.
It can be shown that j
2SLS
in this case is a weighted average of unit causal eects, where the weights
are determined by how the compliers are distributed over the range of o
i
.
For example, returns to schooling estimated using quarter of birth come from shifts in the distribution
of grades completed in high school (why?). Other instruments, such as distance to school, act elsewhere
on the schooling distribution and therefore capture a dierent sort of return.
So, you see, its all about interpretation!
14
PhD Programme: Econometrics II
Department of Economics, University of Gothenburg
Appendix: Lecture 5
Mns Sderbom
The Local Average Treatment Effect
1. Simulating LATE in Stata
Stata code:
cl ear
set seed 54687
set obs 20000
/ * f i r st , r andoml y assi gn t he i nst r ument - say hal f - hal f */
ge z = uni f or m( ) >. 5
/ * t hen, gener at e never - t aker s ( d00) , al ways- t aker s ( d11) and compl i er s
( d01) , i ndependent of z */
ge d00=( _n<=5000)
ge d11=( _n>5000 & _n<=10000)
ge d01=( _n>10000)
/ * obser ved out comes: al ways zer o f or never - t aker s, al ways one f or
al ways- t aker s, depends on t he I V f or compl i er s */
ge D=d11+z*d01
/ * now gi ve t he t hr ee gr oups di f f er ent LATE. Wi t hout l oss of
gener al i t y, assume wi t hi n gr oup homogenei t y. */
ge l at e=- 1 i f d00==1
r epl ace l at e=0 i f d11==1
r epl ace l at e=1 i f d01==1
/ * next gener at e pot ent i al out comes y0, y1 */
ge y0=0. 25*i nvnor m( uni f or m( ) )
ge y1=y0+l at e
/ * act ual out come depends on t r eat ment st at us */
ge y = D*y1+( 1- D) *y0
/ * t he aver age t r eat ment ef f ect i s si mpl y t he sampl e mean of l at e */
suml at e
/ * OLS doesn' t gi ve you ATE or LATE */
r eg y D
/ * I V gi ves you t he LATE f or t he compl i er s */
i vr eg y ( D=z)
exi t
1
2
Results:
. / * t he aver age t r eat ment ef f ect i s si mpl y t he sampl e mean of l at e */
. suml at e
Var i abl e | Obs Mean St d. Dev. Mi n Max
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
l at e | 20000 . 25 . 8291769 - 1 1
.
. / * OLS doesn' t gi ve you ATE or LATE */
. r eg y D
Sour ce | SS df MS Number of obs = 20000
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - F( 1, 19998) = 6631. 34
Model | 1246. 68658 1 1246. 68658 Pr ob > F = 0. 0000
Resi dual | 3759. 60651 19998 . 187999125 R- squar ed = 0. 2490
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Adj R- squar ed = 0. 2490
Tot al | 5006. 29309 19999 . 250327171 Root MSE = . 43359
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
y | Coef . St d. Er r . t P>| t | [ 95%Conf . I nt er val ]
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
D | . 4993491 . 006132 81. 43 0. 000 . 4873298 . 5113684
_cons | . 0005316 . 0043511 0. 12 0. 903 - . 007997 . 0090602
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.
. / * I V gi ves you t he LATE f or t he compl i er s */
. i vr eg y ( D=z)
I nst r ument al var i abl es ( 2SLS) r egr essi on
Sour ce | SS df MS Number of obs = 20000
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - F( 1, 19998) = 5048. 40
Model | - 86. 6843722 1 - 86. 6843722 Pr ob > F = 0. 0000
Resi dual | 5092. 97746 19998 . 25467434 R- squar ed = .
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Adj R- squar ed = .
Tot al | 5006. 29309 19999 . 250327171 Root MSE = . 50465
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
y | Coef . St d. Er r . t P>| t | [ 95%Conf . I nt er val ]
- - - - - - - - - - - - - +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
D | 1. 015767 . 0142961 71. 05 0. 000 . 9877453 1. 043788
_cons | - . 2594847 . 0080341 - 32. 30 0. 000 - . 2752322 - . 2437373
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I nst r ument ed: D
I nst r ument s: z
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Recall: Treatment effect is 1.0 for the compliers, 0.0 for the always-takers and -1.0 for
the never-takers.