Instrumental Variable Homogeneous Effect

Instrumental Variable

Gumilang Aryo Sahadewo

Universitas Gadjah Mada

October 18, 2020

1 Instrumental Variable Estimation

2 Instrumental Variable
Simulation with Stata
3 IV in the multivariate case
4 Local Average Treatment Effects
Formal Setup
Assumptions of LATE
Compliant subpopulation
5 Cash Transfer for Poor Students Program
Anindita and Sahadewo (Forthcoming, 2019)
Motivation 1

We often use nonexperimental data to conduct empirical research.

The simple regression model would be:

$$\log(wage_i) = \beta_0 + \beta_1 educ_i + u_i$$

What are the main assumptions needed to obtain an unbiased estimate of $\beta_1$?
What would be the problem with this model?

Suppose that we extend the model to include control variables:

$$\log(wage_i) = \beta_0 + \beta_1 educ_i + \gamma_1 women_i + \gamma_2 age_i + \dots + \gamma_k X_{ik} + u_i$$

Would that solve the previous problem?

Motivation 1

Unfortunately, not necessarily.

The problem of omitted variable bias (OVB) can be quite severe.
Oftentimes, important personal variables cannot be observed, e.g. ability, motivation.
The unobservables are correlated with the explanatory variable of interest, educ, so that $E[u \mid educ] \neq 0$.
Thus, educ is endogenous.
Motivation 2

Consider the following regression model:

$$depression_i = \alpha_0 + \alpha_1 alcohol_i + \Gamma X_i + u_i$$

What would be the issue in the estimation of the model?

Motivation 3

Suppose that the true model is:

$$Y_i = \beta_0 + \beta_1 X_i^* + u_i$$

In reality, we can only observe some noisy measurement of $X^*$, $X$:

$$X_i = X_i^* + e_i$$

This is a measurement error problem.
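Motivation 3

If the measurement error e is uncorrelated with $X^*$ and u, then regressing Y on the noisy X gives:

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 \frac{\sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_e^2}$$

and

$$0 \leq \frac{\sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_e^2} \leq 1$$

The estimated $\beta_1$ is biased toward 0.
This is referred to as attenuation bias.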
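A minimal simulation sketch of attenuation bias, written in the same style as the Stata simulation later in these notes. The variable names, the seed, and the variances (both set to 1) are assumptions chosen for illustration. With these choices the attenuation factor is 1/(1 + 1) = 0.5, so the slope from the regression on the noisy X should be roughly half the true slope of 2.

```stata
* Run as a do-file. All names and numbers below are illustrative assumptions.
clear
set seed 101
set obs 10000

gen Xstar = rnormal()          // true regressor, variance 1
gen e     = rnormal()          // measurement error, variance 1
gen X     = Xstar + e          // the noisy measurement we actually observe
gen u     = rnormal()
gen Y     = 1 + 2*Xstar + u    // true slope is 2

reg Y Xstar                    // infeasible regression: slope should be near 2
reg Y X                        // feasible regression: slope should be near 1
```

Raising the variance of e (for example `gen e = 3*rnormal()`) should push the second slope closer to zero.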
Correlation between unobservables and explanatory variable (Bollen, 2012)

Military service and outcome (Angrist, 1990)
Police force size and crime rates (Levitt, 1997)
Institutional quality and GDP per capita (Acemoglu et al.)
Schooling and income (Duflo, 2001)
Number of children and mothers’ labor supply (Angrist, 1998)
The consequence of an endogenous T

Recall the key assumption of zero conditional mean:

$$E[u \mid T] = 0$$

If T is endogenous, then:

$$\operatorname{Cov}(T, u) \neq 0 \implies E[u \mid T] \neq 0$$

Thus, the estimated coefficient is biased:

$$E[\hat{\beta}_1] \neq \beta_1$$
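To see where the bias comes from, here is a short worked step. It assumes the simple model $Y_i = \beta_0 + \beta_1 T_i + u_i$ and looks at the probability limit of the OLS slope rather than its finite-sample expectation, which is a slightly different (large-sample) statement than the one above.

```latex
\operatorname{plim} \hat{\beta}_{1,OLS}
  = \frac{\operatorname{Cov}(T, Y)}{\operatorname{Var}(T)}
  = \frac{\operatorname{Cov}(T,\; \beta_0 + \beta_1 T + u)}{\operatorname{Var}(T)}
  = \beta_1 + \frac{\operatorname{Cov}(T, u)}{\operatorname{Var}(T)}
```

The second term vanishes only if $\operatorname{Cov}(T, u) = 0$. Its sign tells us the direction of the bias.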
Solution to endogeneity

What would be the ideal solution?

An Instrumental Variable

A solution to the endogeneity problem is to find an instrumental variable (IV).
“IV methods solve the problem of missing or unknown control
variables, much as a randomized trial obviates extensive
controls in a regression.” (Angrist and Pischke 2009, 115).
Consider a simple model:

$$Y_i = \beta_0 + \beta_1 T_i + u_i$$

and $\operatorname{Cov}(T, u) \neq 0$, or T is . . .
We need to find an instrument Z for T such that:
Z is relevant for explaining T: $\operatorname{Cov}(Z, T) \neq 0$
Z is not correlated with the unobservables: $\operatorname{Cov}(Z, u) = 0$
This suggests that Z can affect Y only through T.
Valid and invalid instruments

Sky is the limit

Military service and income (Angrist, 1990)
IV: draft lottery
Police force size and crime rates (Levitt, 1997)
IV: year to election
Institutional quality and GDP per capita (Acemoglu et al.)
IV: settlers’ mortality rate
Schooling and income (Duflo, 2001)
IV: INPRES program
Having two or more children and mothers’ labor supply (Angrist, 1998)
IV: first two children are of the same sex
Two important aspects of IV estimation

Relevance and exogeneity assumptions and IV estimates

Standard errors of IV estimate
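If we satisfy the assumptions, then:

$$\beta_1 = \frac{\operatorname{Cov}(Y, Z)}{\operatorname{Cov}(T, Z)}$$

The IV estimator of $\beta_1$ is:

$$\hat{\beta}_{1,IV} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (T_i - \bar{T})(Z_i - \bar{Z})}$$

If $\operatorname{Cov}(Z, u) \neq 0$, then $\hat{\beta}_{1,IV}$ is biased.
The bias can be large if the correlation between T and Z is “small” (Z is weak).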
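The covariance ratio follows in two lines from the simple model $Y_i = \beta_0 + \beta_1 T_i + u_i$. The derivation below is a sketch using only that model and the two IV assumptions. It also shows why a weak instrument magnifies any failure of exogeneity.

```latex
\begin{aligned}
\operatorname{Cov}(Y, Z)
  &= \operatorname{Cov}(\beta_0 + \beta_1 T + u,\; Z) \\
  &= \beta_1 \operatorname{Cov}(T, Z) + \operatorname{Cov}(u, Z) \\[4pt]
\Rightarrow\quad
\frac{\operatorname{Cov}(Y, Z)}{\operatorname{Cov}(T, Z)}
  &= \beta_1 + \frac{\operatorname{Cov}(u, Z)}{\operatorname{Cov}(T, Z)}
\end{aligned}
```

Relevance guarantees that the denominator is not zero, and exogeneity removes the last term. If exogeneity fails even slightly, a small $\operatorname{Cov}(T, Z)$ in the denominator makes the remaining term large.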
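Standard errors
The estimated variance of the OLS estimator is:

$$\widehat{\operatorname{Var}}(\hat{\beta}_{OLS}) = \frac{\hat{\sigma}^2}{SST_T}$$

The estimated variance of the IV estimator is:

$$\widehat{\operatorname{Var}}(\hat{\beta}_{IV}) = \frac{\hat{\sigma}^2}{SST_T \cdot R^2_{T,Z}}$$

where $SST_T$ is the total sum of squares of T and $R^2_{T,Z}$ is the R-squared from regressing T on Z. Since $0 \leq R^2_{T,Z} \leq 1$:

$$\widehat{\operatorname{Var}}(\hat{\beta}_{OLS}) = \frac{\hat{\sigma}^2}{SST_T} \leq \frac{\hat{\sigma}^2}{SST_T \cdot R^2_{T,Z}} = \widehat{\operatorname{Var}}(\hat{\beta}_{IV})$$

If Z is not relevant or if the correlation is small, then $R^2_{T,Z}$ is small and the IV estimator has a quite large standard error!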
Simulation with Stata

Set the number of observations:

set obs 1000
Generate the instrument:
gen Z = rnormal()
Generate the error term:
gen U = rnormal()
This implies that Cov(Z, U) = 0. Thus, we fulfill the exogeneity assumption.
The variable of interest T is correlated with both the instrument and the error term:
gen T = 5*rnormal() + D*Z + 10*U
Here D is a placeholder for a number that sets the strength of the instrument. Play around with D to obtain weak or strong instruments:
5*Z, 10*Z, 0.2*Z
Simulate the outcome of interest:
gen Y = 5 + 8*T + 5*U
Compare the results from OLS regression:

reg Y T
and those from IV regression:
ivregress 2sls Y (T = Z), first
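The simulation steps can be collected into one do-file. This is a sketch: the seed, the local macro D, and the choice D = 5 are assumptions added here for reproducibility, and the covariance ratio computed by hand is added only to connect the output to the IV estimator formula.

```stata
* Run as a do-file. The seed and the value of D are illustrative assumptions.
clear
set seed 12345
set obs 1000

gen Z = rnormal()                     // instrument
gen U = rnormal()                     // error term, independent of Z by construction

local D = 5                           // instrument strength: also try 10 and 0.2
gen T = 5*rnormal() + `D'*Z + 10*U    // T depends on both Z and U, so T is endogenous
gen Y = 5 + 8*T + 5*U                 // true coefficient on T is 8

reg Y T                               // OLS: picks up the correlation between T and U
ivregress 2sls Y (T = Z), first       // IV: uses only the variation in T driven by Z

* IV estimate by hand: sample Cov(Y,Z) / Cov(T,Z)
quietly correlate Y Z, covariance
scalar cov_YZ = r(cov_12)
quietly correlate T Z, covariance
scalar cov_TZ = r(cov_12)
display "IV by hand: " cov_YZ/cov_TZ
```

With one instrument and one endogenous variable, the hand-computed ratio should match the `ivregress` coefficient on T. Rerunning with `local D = 0.2` should show a much larger IV standard error.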
Test Batteries

Test of endogeneity:
H0 : T is exogenous
Stata: estat endog
Reject H0 if p-value < 0.05
Test of relevance:
H0 : Z is a weak instrument
Stata: estat firststage
Reject H0 if p-value < 0.05
Estimation in the multivariate case

Consider a more general case:

$$Y_i = \beta_0 + \beta_1 T_i + \gamma_1 X_{i1} + \dots + \gamma_k X_{ik} + u_i$$

Suppose that T is endogenous, so the number of endogenous variables is $k^* = 1$.

We can also extend to the case of more than one endogenous variable: T and $X_1$, $k^* = 2$.
Conditions for identification: we need $l \geq k^*$ instruments (the order condition), and each endogenous variable must have its own instrument (the rank condition).
If these conditions are met, we can use two-stage least squares (2SLS).
2SLS approach

First stage: Regress each endogenous variable on all of the instruments and all of the exogenous variables.
Note: Check to make sure that the first stage predicts the endogenous variable reasonably well.
Second stage: Regress Y on the fitted values of the endogenous variables as well as all of the exogenous variables.
Think about the following model:

$$wage = \beta_0 + \beta_1 educ + \gamma_1 exp + \gamma_2 exp^2 + u$$

educ is endogenous, i.e. $\operatorname{Cov}(educ, u) \neq 0$

To estimate $\beta_1$, we introduce two IVs: father's education (feduc) and mother's education (meduc).
First stage: regress educ on feduc, meduc, exp, and $exp^2$.
Calculate the predicted value $\widehat{educ}$.
Second stage: regress wage on $\widehat{educ}$, exp, and $exp^2$; the coefficient on $\widehat{educ}$ is the desired $\hat{\beta}_1$.
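The two stages can be sketched in Stata as follows. Everything here is a hypothetical illustration: the data are simulated, and the variable names (ability, feduc, meduc, exper, expersq) and all coefficients are assumptions, with the true return to education set to 0.8. The point is only to compare the by-hand procedure with `ivregress`.

```stata
* Run as a do-file. Simulated data; all names and coefficients are assumptions.
clear
set seed 42
set obs 2000

gen ability = rnormal()                   // unobserved in practice, causes endogeneity
gen feduc   = 10 + 3*rnormal()            // father's education (instrument)
gen meduc   = 10 + 3*rnormal()            // mother's education (instrument)
gen exper   = floor(30*runiform())
gen expersq = exper^2
gen educ    = 4 + 0.3*feduc + 0.3*meduc + 2*ability + rnormal()
gen wage    = 1 + 0.8*educ + 0.3*exper - 0.005*expersq + 3*ability + rnormal()

* First stage: endogenous variable on all instruments and exogenous variables
reg educ feduc meduc exper expersq
predict educ_hat, xb

* Second stage by hand: the coefficient on educ_hat is the 2SLS estimate,
* but the reported standard errors are not valid
reg wage educ_hat exper expersq

* One-step 2SLS: same coefficient, with correct standard errors
ivregress 2sls wage exper expersq (educ = feduc meduc), first
```

In practice the one-step command is preferable, because the manual second stage ignores that educ_hat is itself estimated.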
Test Batteries

Test of endogeneity:
H0 : T is exogenous
Stata: estat endog
Reject H0 if p-value < 0.05
Test of relevance:
H0 : Z is a weak instrument
Stata: estat firststage
Reject H0 if p-value < 0.05
Test of overidentifying restrictions:
H0 : overidentifying restrictions are satisfied
Stata: estat overid
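A short sketch of running the three tests after 2SLS. The data are simulated with two hypothetical instruments, Z1 and Z2, because the overidentification test needs more instruments than endogenous variables. The seed and coefficients are assumptions.

```stata
* Run as a do-file. Simulated data; Z1 and Z2 are hypothetical instruments.
clear
set seed 7
set obs 1000

gen Z1 = rnormal()
gen Z2 = rnormal()
gen U  = rnormal()
gen T  = 5*rnormal() + 5*Z1 + 5*Z2 + 10*U
gen Y  = 5 + 8*T + 5*U

ivregress 2sls Y (T = Z1 Z2)
estat endogenous      // H0: T is exogenous
estat firststage      // first-stage F statistic and weak-instrument diagnostics
estat overid          // H0: overidentifying restrictions are valid
```

Because both instruments are valid by construction here, the overidentification test should usually fail to reject, while the endogeneity test should reject.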
Results in the literature

Classic IV estimates population average causal effects under very strong assumptions, namely constant treatment effect assumptions.
Without them, we can only identify average effects for the subpopulation induced by the instrument to change the value of the endogenous variable.
Think of a randomly selected subpopulation assigned to receive a treatment.
This average effect is referred to as the local average treatment effect (LATE).
We must be cautious in the interpretation of LATE.

The data are informative for the complier subpopulation, but not for the other subpopulations.
Formal Setup

Potential outcomes framework

We use the notation of AIR (1996), which differs slightly from the notation used in the previous meeting.
The potential outcome for unit i is $Y_i(D_i)$:
$Y_i(0)$: outcome if i did not receive the program, intervention, or status
$Y_i(1)$: outcome if i received the program, intervention, or status
The causal effect of the program is:

$$Y_i(1) - Y_i(0)$$

Counterfactual problem → missing data problem: we never observe both potential outcomes for the same unit.

Extending the potential outcome framework

Let $Y_i(d, z)$ be the potential outcome of individual i with:

treatment status $D_i = d$
instrument value $Z_i = z$.
Example: Angrist (1990). Treatment status is veteran status and the instrument is draft eligibility status.
The causal effect of veteran status given i’s draft eligibility status ($Z_i$) is:
$$Y_i(1, Z_i) - Y_i(0, Z_i)$$
The causal effect of draft eligibility status given i’s veteran status ($D_i$) is:
$$Y_i(D_i, 1) - Y_i(D_i, 0)$$
AIR use double indexing on the outcome $Y_i$.

From here on, we will discuss how IV initiates a causal chain:
the instrument $Z_i$ affects treatment status $D_i$
treatment status $D_i$ then affects the outcome $Y_i$
Potential treatment status

Define potential treatment status:

$D_{1i}$ is i's treatment status when $Z_i = 1$
$D_{0i}$ is i's treatment status when $Z_i = 0$
Given this notation, the observed treatment status is:

$$D_i = D_{0i} + (D_{1i} - D_{0i}) Z_i$$

What is the causal effect of $Z_i$ on $D_i$?

Assumptions of LATE

A1: Independence Assumption

The instrument is independent of the vector of potential outcomes
and potential treatment assignments:

$$[\{Y_i(d, z);\ \forall d, z\},\ D_{1i},\ D_{0i}] \perp Z_i$$

Can random assignment guarantee this?

Independence implies that in the first stage:

$$\begin{aligned} E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] &= E[D_{1i} \mid Z_i = 1] - E[D_{0i} \mid Z_i = 0] \\ &= E[D_{1i} - D_{0i}] \end{aligned}$$

which is the average causal effect of $Z_i$ on $D_i$.

Furthermore, independence implies that in the reduced form:

$$\begin{aligned} E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] &= E[Y_i(D_{1i}, 1) \mid Z_i = 1] - E[Y_i(D_{0i}, 0) \mid Z_i = 0] \\ &= E[Y_i(D_{1i}, 1) - Y_i(D_{0i}, 0)] \end{aligned}$$

which is the causal effect of $Z_i$ on $Y_i$.

Does this mean we have identified the treatment effect?
A2: Exclusion Restriction

Exclusion restriction implies that:

$$Y_i(d, 0) = Y_i(d, 1)$$

for d = 0, 1.
We can also write the assumption as:

$$Y_i(1, 1) = Y_i(1, 0) \equiv Y_{1i}$$

$$Y_i(0, 1) = Y_i(0, 0) \equiv Y_{0i}$$

The instrument $Z_i$ affects $Y_i$ through a single causal channel, $D_i$.
Discuss the potential problem.
Example: educational deferments in draft eligibility (Angrist & Krueger, 1992; Card & Lemieux, 2001)
This restriction means that the observed outcome $Y_i$ can be written as:

$$\begin{aligned} Y_i &= Y_i(0, Z_i) + [Y_i(1, Z_i) - Y_i(0, Z_i)] D_i \\ &= Y_{0i} + (Y_{1i} - Y_{0i}) D_i \end{aligned}$$
A3: First Stage

Recall that

$$E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] = E[D_{1i} - D_{0i}]$$

The assumption says that:

$$E[D_{1i} - D_{0i}] \neq 0$$
Compliance types

Note that there are four potential compliance types:

                  Di(0) = 0       Di(0) = 1
Di(1) = 0         never-taker     defier
Di(1) = 1         complier        always-taker
Compliance type by treatment and instrument

Each observed combination of instrument and treatment is consistent with two compliance types:

              Zi = 0                    Zi = 1
Di = 0        complier/never-taker      never-taker/defier
Di = 1        always-taker/defier       complier/always-taker
A4: Monotonicity

For all z, w, either
$$D_i(z) \geq D_i(w)$$
for all i, or
$$D_i(z) \leq D_i(w)$$
for all i.
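Monotonicity implies, writing $\pi_{1i} \equiv D_{1i} - D_{0i}$ for the effect of the instrument on i's treatment status:

$$\pi_{1i} \geq 0 \text{ for all } i \quad \text{or} \quad \pi_{1i} \leq 0 \text{ for all } i$$

Those affected by the instrument are affected in the same way.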
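Using a latent-index model:

$$D_i = \begin{cases} 1 & \text{if } \gamma_0 + \gamma_1 Z_i > \varepsilon_i \\ 0 & \text{otherwise} \end{cases}$$

the potential treatment statuses are

$$D_{0i} = 1[\gamma_0 > \varepsilon_i]$$

$$D_{1i} = 1[\gamma_0 + \gamma_1 > \varepsilon_i]$$

which satisfy monotonicity: because $\gamma_1$ is the same for everyone, $D_{1i} \geq D_{0i}$ for all i if $\gamma_1 > 0$ (and $D_{1i} \leq D_{0i}$ for all i if $\gamma_1 < 0$).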

Note that these assumptions partition the population:

Compliers ($D_{1i} > D_{0i}$)
Always-takers ($D_{1i} = D_{0i} = 1$)
Never-takers ($D_{1i} = D_{0i} = 0$)
Note that IV is informative on the compliers only. Why?
IV Estimand

By A2:

$$Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i$$

and taking the conditional averages:

$$E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] = E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{1i} \mid Z_i = 1] - E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{0i} \mid Z_i = 0]$$

By A1:

$$E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] = E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})]$$

For each individual, the causal effect of Z on Y is the product of the causal effect of D on Y and the causal effect of Z on D.
Then, by the law of iterated expectations:

$$\begin{aligned} E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})] &= E[Y_{1i} - Y_{0i} \mid D_{1i} - D_{0i} = 1]\, P(D_{1i} - D_{0i} = 1) \\ &\quad - E[Y_{1i} - Y_{0i} \mid D_{1i} - D_{0i} = -1]\, P(D_{1i} - D_{0i} = -1) \end{aligned}$$

Monotonicity rules out defiers:

$$\begin{aligned} E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})] &= E[Y_{1i} - Y_{0i} \mid D_{1i} - D_{0i} = 1]\, P(D_{1i} - D_{0i} = 1) \\ &= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]\, P(D_{1i} > D_{0i}) \end{aligned}$$

Technically: the average causal effect of Z on Y is the product of the average causal effect of D on Y for individuals with $D_i(0) = 0$ and $D_i(1) = 1$ (the compliers) and their proportion in the population.
Local Average Treatment Effects Theorem

The LATE theorem. Suppose (A1, independence), (A2, exclusion), (A3, first stage), and (A4, monotonicity) hold. Then:

$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]$$
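The theorem can be checked with a small simulation. This is a sketch under stated assumptions: the population shares (50% compliers, 20% always-takers, 30% never-takers), the treatment effects (2, 5, and 1 respectively), the seed, and all variable names are hypothetical choices. There are no defiers by construction, Z is randomly assigned, and Z affects Y only through D.

```stata
* Run as a do-file. All shares, effects, and names are illustrative assumptions.
clear
set seed 2020
set obs 100000

* Compliance types
gen double r = runiform()
gen byte complier = r < 0.5
gen byte always   = r >= 0.5 & r < 0.7      // remaining 30% are never-takers

gen byte Z  = runiform() < 0.5              // randomly assigned binary instrument (A1)
gen byte D0 = always                        // treatment status if Z = 0
gen byte D1 = always | complier             // treatment status if Z = 1, no defiers (A4)
gen byte D  = D0 + (D1 - D0)*Z              // observed treatment status

* Heterogeneous effects: 2 for compliers, 5 for always-takers, 1 for never-takers
gen Y0 = rnormal()
gen Y1 = Y0 + 2*complier + 5*always + 1*(1 - complier - always)
gen Y  = Y0 + (Y1 - Y0)*D                   // Z enters only through D (A2)

* Wald estimator: reduced form divided by first stage
quietly reg Y Z
scalar rf = _b[Z]
quietly reg D Z
scalar fs = _b[Z]
display "Wald estimate: " rf/fs

ivregress 2sls Y (D = Z)
```

The Wald ratio and the `ivregress` coefficient coincide, and both should be close to 2, the effect for compliers. Neither recovers the effects of 5 or 1 assigned to always-takers and never-takers.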
LATE theorem

In the veteran status example, the IV estimate captures the effect of military service on men who served because of draft eligibility (who would not otherwise have served).
In other words, the IV estimate captures the effect of a treatment on those whose treatment status is changed by the instrument.
Note that IV is informative about the average effects among the compliers.
IV is not informative about the average effects among never-takers and always-takers.
Compliant subpopulation

LATE theorem

Note that we can also consider one-sided compliance: $D_i(0) = 0$.
Units assigned to the control group never receive the treatment.
Units assigned to the treatment group can decline to take it.
Who remains?
LATE theorem

Note that:

$$\{D_i = 1\} = \{D_{0i} = D_{1i} = 1\} \cup \{\{D_{1i} - D_{0i} = 1\} \cap \{Z_i = 1\}\}$$

Treated individuals = always-takers + compliers assigned $Z_i = 1$.
Then, treatment on the treated (TOT) is a weighted average of effects on always-takers and compliers.
Under one-sided compliance ($D_{0i} = 0$) there are no always-takers and $E[D_i \mid Z_i = 0] = 0$, so:

$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1]} = \frac{\text{ITT effect}}{\text{compliance rate}} = E[Y_{1i} - Y_{0i} \mid D_i = 1]$$
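A short derivation of why the ratio equals the effect on the treated in the one-sided case. It is a sketch that assumes $D_{0i} = 0$ for all i together with A1 to A4, so that LATE equals the ratio of the ITT effect to the compliance rate.

```latex
\begin{aligned}
E[Y_{1i} - Y_{0i} \mid D_i = 1]
  &= E[Y_{1i} - Y_{0i} \mid D_{1i} = 1,\; Z_i = 1]
     && \text{(with } D_{0i} = 0 \text{, only units with } Z_i = 1 \text{ are treated)} \\
  &= E[Y_{1i} - Y_{0i} \mid D_{1i} = 1]
     && \text{(independence, A1)} \\
  &= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]
     && \text{(} D_{0i} = 0 \text{, so } D_{1i} = 1 \text{ identifies compliers)}
\end{aligned}
```

So with one-sided compliance the treated are exactly the compliers who were assigned $Z_i = 1$, and LATE coincides with the effect of treatment on the treated.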
Anindita and Sahadewo (Forthcoming, 2019)

Motivation I

Disadvantaged children and youth continue to face difficulties accessing and performing in schools.
High out-of-pocket and opportunity costs of education hinder students from completing school.
76% of students dropped out due to economic reasons
8.7% did so because they had to work
A social welfare program that relaxes the financial constraints faced by these households is crucial.
Motivation II

Mixed evidence on the impact of conditional cash transfers on household education spending:
Gitter and Barham (2008), Soares, Ribas, and Osorio (2010)
Handa, Peterman, Davis, and Stampini (2009)
In Indonesia, there is no evidence on the effect of the cash transfers for poor students program (Bantuan Siswa Miskin, BSM) on household education spending.
Students and households decide on how to spend their BSM cash.
No formal monitoring or incentive by the government
BSM affects students’ outcomes if it relaxes their households’ budget constraint.
Research Questions

Does BSM affect household education spending?

If so, what are the spending categories affected the most by
the BSM?
Bantuan Siswa Miskin → Kartu Indonesia Pintar

Bantuan Siswa Miskin Program

The objectives of this program are:

Provide support for disadvantaged youth
Support students completing the required learning activities
Motivate students to work harder at school
Reduce the burden of poor parents in meeting the full costs of education
The program provides cash for:
Books, uniforms, sport equipment, shoes, teaching materials for practice subjects, transportation
Bantuan Siswa Miskin Program

The amount of BSM cash varies by schooling level:

Elementary school: Rp450,000 ($43)
Junior high school: Rp750,000 ($72)
Senior high school: Rp1,000,000 ($96)
Households decide themselves on how to spend the BSM cash
Targeting of BSM Beneficiaries

Targeting was a combination of:
Household-based targeting: Social Protection Card (KPS)
School-based targeting: schools nominate students to Regional Education Offices if:
Students received BSM-beneficiary candidate cards
Students met eligibility criteria: whether students were orphans, students’ heads of households were laid off, students’ households were beneficiaries of the PKH program, or students’ households were victims of natural disasters
Students targeted by schools received BSM if there was quota left in their region.
Our study exploits the randomness in this process to identify the effect of the BSM program.
Data and methodology

We compile a longitudinal dataset using:

4th wave of the Indonesian Family Life Survey (Strauss, Witoelar, Sikoki, and Wattie, 2009)
5th wave of the Indonesian Family Life Survey (Strauss, Witoelar, and Sikoki, 2016)
The data provide:
detailed education spending: uniform, school tuition, transportation, stipend, and boarding
variables of interest: whether a household received BSM and eligibility criteria
households’ and students’ characteristics
We use a first-difference instrumental variable (FDIV) model to estimate the effects of the BSM program on household education spending.
Empirical Strategy

The basic unobserved effects model is:

$$share_{it} = \alpha + \beta_1 BSM_{it} + \beta_2 KPS_{it} + \Gamma X_{it} + c_i + u_{it}$$

Why would $\beta_1$ be biased if we use OLS?

First-differencing removes the unobserved effect $c_i$:

$$\Delta share_i = \alpha + \beta_1 \Delta BSM_i + \beta_2 \Delta KPS_i + \Gamma \Delta X_i + \Delta u_i$$

Can we be confident that $\beta_1$ is now unbiased?

The first-difference model is:

$$\Delta share_i = \alpha + \beta_1 \Delta BSM_i + \beta_2 \Delta KPS_i + \Gamma \Delta X_i + \Delta u_i$$

We can exploit the BSM selection process as instrumental variables. The first stage is:

$$\Delta BSM_i = \theta_1 PKH_i + \theta_2 DISASTER_i + \theta_3 LAIDOFF_i + \theta_4 ORPHAN_i + \theta_5 DISABILITY_i + \rho \Delta KPS_i + \Theta \Delta X_i + \Delta \varepsilon_i$$

We find that the conditional cash transfer for education has a significant effect on household education spending.
This is evidence that poor households faced budget constraints and the transfer relaxed these constraints.
Households spent the BSM cash for the intended purpose.

