
Difference-in-difference

Rus’an Nasrudin

Nov 2, 2021

Rus’an Nasrudin Difference-in-difference Nov 2, 2021 1 / 38


1 Introduction

2 Example

3 Theory

4 Standard Error

5 Extension



Introduction


The terms and models in DiD

Canonical DiD

$$y_i = \alpha + \beta_1 \mathit{AFTER}_i + \beta_2 \mathit{TREATED}_i + \beta_3 (\mathit{AFTER}_i \cdot \mathit{TREATED}_i) + \varepsilon_i$$

General TWFE (Two-way fixed effects)

$$y_{it} = \beta D_{it} + X_{it}\gamma + \alpha_i + \delta_t + \varepsilon_{it}$$


History

The first application of DiD was by John Snow (1855).

He used this method to discover how residents of London were being infected with cholera.
He compared the death rates from cholera in districts served by two water companies.
In 1849, both companies obtained water from the dirty Thames.
In 1852, one of them moved its water works upriver to an area free of sewage.
The death rates fell sharply in the districts served by this water company compared with those served by the other company.



Example


Example 1: The effect of monetary policy

Richardson and Troost (2009) studied the impact of monetary policy during the Great Depression of the 1930s in the US [1].
An easy-money policy is rarely a random event. Fortunately, they could observe two districts, one with (Atlanta) and one without (St. Louis) an easy-money policy.
This makes it possible to evaluate the policy's impact on banking and economic performance.
They found that central bank intervention influenced bank health, credit availability and business activity.

[1] Richardson, Gary, and William Troost. “Monetary intervention mitigated banking panics during the Great Depression: quasi-experimental evidence from a Federal Reserve district border, 1929–1933.” Journal of Political Economy 117.6 (2009): 1031–1073.

Graphical illustration


The algebra
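Let $Y_{dt}$ be the number of banks open in district $d$ at time $t$, where $d \in \{\text{Atlanta}, \text{St. Louis}\}$ and $t \in \{1930, 1931\}$.

$$
\begin{aligned}
\delta_{DD} &= (Y_{A,1931} - Y_{A,1930}) - (Y_{S,1931} - Y_{S,1930}) \\
            &= (121 - 135) - (132 - 165) \\
            &= -14 - (-33) = 19
\end{aligned}
$$

Or

$$
\begin{aligned}
\delta_{DD} &= (Y_{A,1931} - Y_{S,1931}) - (Y_{A,1930} - Y_{S,1930}) \\
            &= (121 - 132) - (135 - 165) \\
            &= -11 - (-30) = 19
\end{aligned}
$$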
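
The two orderings of the double difference can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the dictionary layout and the function names `did_by_changes` and `did_by_gaps` are illustrative choices, not part of the original study.

```python
# Number of banks open, as given in the algebra above.
banks = {
    ("Atlanta", 1930): 135,
    ("Atlanta", 1931): 121,
    ("St. Louis", 1930): 165,
    ("St. Louis", 1931): 132,
}


def did_by_changes(y):
    """Change over time in Atlanta minus change over time in St. Louis."""
    change_atlanta = y[("Atlanta", 1931)] - y[("Atlanta", 1930)]
    change_st_louis = y[("St. Louis", 1931)] - y[("St. Louis", 1930)]
    return change_atlanta - change_st_louis


def did_by_gaps(y):
    """Atlanta-St. Louis gap in 1931 minus the same gap in 1930."""
    gap_1931 = y[("Atlanta", 1931)] - y[("St. Louis", 1931)]
    gap_1930 = y[("Atlanta", 1930)] - y[("St. Louis", 1930)]
    return gap_1931 - gap_1930


print(did_by_changes(banks), did_by_gaps(banks))  # both equal 19
```

Both functions rearrange the same four numbers, which is why the two orderings must agree.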


Counterfactual


Example 2: The impact of Askeskin program

In 2005, Indonesia took a first step towards universal health insurance.
A nationwide health insurance scheme for the poor, Askeskin, was introduced.
Sparrow et al. (2013) use the mini Susenas to evaluate the programme's impact on the health care access and health spending of beneficiaries relative to a comparison group [2].
They found that Askeskin increased health care access, measured by outpatient utilisation, with the side effect of higher out-of-pocket spending for those who received it.

[2] Sparrow, Robert, Asep Suryahadi, and Wenefrida Widyanti. “Social health insurance for the poor: Targeting and impact of Indonesia’s Askeskin programme.” Social Science & Medicine 96 (2013): 264–271.

Example 3: Long-term impact of SD Inpres

Duflo (2001) estimates the schooling and labour market consequences of the SD Inpres programme in Indonesia in the 1970s.
Exploiting regional variation in school construction, she found that:
Each primary school constructed per 1,000 children led to an average increase of 0.12 to 0.19 years of education.
The programme also corresponds to a 1.5 to 2.7 percent increase in wages.


Impact of SD Inpres



Theory


Observed and potential outcome


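Let $Y_{ti}^{T}$ denote the outcome, where the subscript $t \in \{1, 2\}$ indexes time and the superscript $T \in \{0, 1\}$ indicates treatment status: $T = 1$ is the treatment group and $T = 0$ is the control group. $T_{2i}$ is the treatment status of unit $i$ in period 2.
Assume a heterogeneous treatment effect model:

$$\beta_i = \beta + \eta_i$$

The conditional expectations of the observed outcomes are:

$$
\begin{aligned}
E[Y_{2i}^{1} \mid T_{2i} = 1] &= \alpha_2 + \beta + E[\eta_i \mid T_{2i} = 1] + \theta_i^{1} + E[\mu_{2i} \mid T_{2i} = 1] \\
E[Y_{1i}^{1} \mid T_{2i} = 1] &= \alpha_1 + \theta_i^{1} + E[\mu_{1i} \mid T_{2i} = 1] \\
E[Y_{2i}^{0} \mid T_{2i} = 0] &= \alpha_2 + \theta_i^{0} + E[\mu_{2i} \mid T_{2i} = 0] \\
E[Y_{1i}^{0} \mid T_{2i} = 0] &= \alpha_1 + \theta_i^{0} + E[\mu_{1i} \mid T_{2i} = 0]
\end{aligned}
$$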


OLS specification of DiD

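The OLS estimate is equivalent to the difference between the treatment group (T = 1) and the control group (T = 0) in the observed change over time in average outcomes.
The difference-in-difference estimator takes the form:

$$
\begin{aligned}
\hat{\beta}_{DiD} &= E[Y_{2i}^{1} - Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} - Y_{1i}^{0} \mid T_{2i} = 0] \\
&= E[Y_{2i}^{1} \mid T_{2i} = 1] - E[Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} \mid T_{2i} = 0] + E[Y_{1i}^{0} \mid T_{2i} = 0] \\
&= \beta + E[\eta_i \mid T_{2i} = 1] + E[\mu_{2i} - \mu_{1i} \mid T_{2i} = 1] - E[\mu_{2i} - \mu_{1i} \mid T_{2i} = 0]
\end{aligned}
$$

The time effects $\alpha_1, \alpha_2$ and the group effects $\theta_i^{1}, \theta_i^{0}$ cancel in the double difference.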


Unbiased DiD Estimator

Theorem 3.1 (Properties)


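The DiD estimator is unbiased under the parallel trend assumption, $(\mu_{2i} - \mu_{1i}) \perp T_{2i}$;
It identifies $ATT = \beta + E[\eta_i \mid T_{2i} = 1]$;
It identifies $ATT = ATE = \beta$ under the assumption $\eta_i \perp T_{2i}$, so that $E[\eta_i \mid T_{2i} = 1] = E[\eta_i] = 0$.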
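
A small numerical sketch may help to see what the parallel trend assumption buys. The function below builds the four population cell means from a homogeneous effect `beta` and group-specific trends, then takes the double difference. All numbers, the function name `did_estimate` and its parameters are hypothetical and chosen only for illustration.

```python
def did_estimate(beta, trend_treated, trend_control,
                 level_treated=5.0, level_control=1.0):
    """Double difference computed from four population cell means.

    The treated group's period-2 mean contains the effect `beta` plus its
    own trend. The fixed group levels cancel in the double difference.
    """
    y1_treated = level_treated
    y2_treated = level_treated + trend_treated + beta
    y1_control = level_control
    y2_control = level_control + trend_control
    return (y2_treated - y1_treated) - (y2_control - y1_control)


# Parallel trends: the double difference recovers beta.
parallel = did_estimate(beta=2.0, trend_treated=3.0, trend_control=3.0)
# Diverging trends: the trend gap (4 - 3 = 1) is added to the estimate.
diverging = did_estimate(beta=2.0, trend_treated=4.0, trend_control=3.0)
print(parallel, diverging)  # 2.0 3.0
```

The different group levels (5 versus 1) do no harm. Only the difference in trends biases the estimate.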


Visualisation of double difference


Implementation

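The DiD estimator is $\beta_3$ in the regression:

$$y_{it} = \alpha + \beta_1 t + \beta_2 T_{2i} + \beta_3 (t \cdot T_{2i}) + \varepsilon_{it}$$

where $t$ is a dummy equal to 1 in the post-treatment period and $T_{2i}$ is the treatment group dummy.

Proof:

$$
\begin{aligned}
& E[Y_{2i}^{1} - Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} - Y_{1i}^{0} \mid T_{2i} = 0] \\
&= [(\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2)] - [(\alpha + \beta_1) - \alpha] \\
&= \beta_3
\end{aligned}
$$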
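
The equivalence between the interaction coefficient and the double difference of cell means can be verified numerically. The sketch below assumes NumPy is available and uses a made-up 2x2 data set with two observations per cell. The data and the helper `cell_mean` are hypothetical.

```python
import numpy as np

# Hypothetical data: (post dummy t, treated dummy T, outcome y).
rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, before
    (1, 0, 13.0), (1, 0, 15.0),   # control, after
    (0, 1, 20.0), (0, 1, 22.0),   # treated, before
    (1, 1, 28.0), (1, 1, 30.0),   # treated, after
]
post = np.array([r[0] for r in rows], dtype=float)
treated = np.array([r[1] for r in rows], dtype=float)
y = np.array([r[2] for r in rows])

# Design matrix: constant, t, T, and the interaction t * T.
X = np.column_stack([np.ones_like(y), post, treated, post * treated])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, b1, b2, b3 = coef


def cell_mean(t, group):
    """Average outcome in the (period, group) cell."""
    return y[(post == t) & (treated == group)].mean()


did_means = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print(b3, did_means)  # both are 5.0 up to floating-point error
```

Because the regression is saturated in the two dummies, the fitted values reproduce the four cell means exactly, so $\beta_3$ and the double difference coincide.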


Three ingredients

To sum up, the DiD regression has three ingredients:


A dummy for the treatment group (T). It varies across observations and controls for fixed differences between the units being compared.
A dummy for the post-treatment period (t). It varies over time and controls for the fact that conditions change over time for all observations.
The interaction term T · t, generated by multiplying the two dummies. The coefficient on this term is the DiD causal effect.


Back to Example 2


Parallel trend assumption

An important check for defending the robustness of an estimate in the DiD set-up is a test of the parallel trend assumption.
What to report? The two lines of observed outcomes, for the treatment and control groups, in the pre-treatment period.
Yet such pre-treatment data are often hard to get. Thus, a qualitative argument is important.



Standard Error


Repeated outcome and serial correlation

One important note about the DiD estimate is that it is a special case of estimation with panel data.
The repeated nature of the dataset gives rise to serial correlation, meaning that the values of a variable in nearby periods are likely to be similar.
When the dependent variable is serially correlated, the residuals are often serially correlated as well.
The combination of these serial correlations changes the formula used to calculate the standard errors.
Often, the simple standard error formula will exaggerate the precision of the estimates.


Clustered standard error

The robust standard error discussed in the CEF topic corrects the standard error for heteroskedasticity.
For serial correlation, the appropriate formula is known as the clustered standard error.
The rule of thumb is that the unit whose outcomes are serially correlated, i.e. the unit of observation that is followed over time, is the unit of clustering.
A formal discussion of non-standard issues for standard errors will be covered in the 14th meeting.
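
To make the clustering idea concrete, here is a minimal sketch of the cluster-robust sandwich formula, assuming NumPy is available. It applies no small-sample correction, whereas statistical packages typically do, so its numbers need not match packaged output. The simulated panel and the function names `ols` and `cluster_se` are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta


def cluster_se(X, resid, cluster_ids):
    """Cluster-robust standard errors (sandwich form, no finite-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        in_g = cluster_ids == g
        score = X[in_g].T @ resid[in_g]   # sum of x_it * u_it within cluster g
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))


# Hypothetical panel: 6 units over 4 periods; units 0-2 are treated from period 2.
rng = np.random.default_rng(0)
n_units, n_periods = 6, 4
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < 3).astype(float)
post = (period >= 2).astype(float)

# A unit-level shock that persists over time makes the errors serially correlated.
unit_shock = rng.normal(size=n_units)[unit]
noise = 0.1 * rng.normal(size=unit.size)
y = 1.0 + 0.5 * post + 1.0 * treated + 2.0 * post * treated + unit_shock + noise

X = np.column_stack([np.ones_like(y), post, treated, post * treated])
beta, resid = ols(X, y)

se_cluster = cluster_se(X, resid, unit)                  # cluster on the repeated unit
se_hc0 = cluster_se(X, resid, np.arange(unit.size))      # one cluster per row
print(se_cluster, se_hc0)
```

Putting every row in its own cluster reduces the formula to the heteroskedasticity-robust (HC0) standard error, so clustering on the unit generalises the robust formula by allowing residuals within a unit to be correlated.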



Extension


1-A test for causality in the spirit of Granger (1969)

There are several tests of the robustness of the treatment effect in DiD estimates.
One of them postulates that past treatment predicts the current outcome while future treatment does not. We therefore expect statistically insignificant estimates of the lead coefficients $\beta_{+\tau}$ in an equation like:

$$y_{i,t} = \gamma_i + \lambda_t + \sum_{\tau=0}^{m} \beta_{-\tau} T_{i,t-\tau} + \sum_{\tau=1}^{q} \beta_{+\tau} T_{i,t+\tau} + X_{it}\delta + \varepsilon_{it}$$

This follows the spirit of Granger (1969): it checks whether causes happen before consequences and not vice versa.
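
The lag and lead regressors in this equation are shifted copies of the treatment series. As a sketch of how they could be built for a single unit, the helper below (a hypothetical `shifted` function, with a made-up treatment series) marks periods where the shifted value is unavailable with `None`, since those rows would be dropped from the estimation sample.

```python
def shifted(series, k):
    """Shift a time-ordered series so that position t holds series[t - k].

    k > 0 gives a lag (T_{t-k}); k < 0 gives a lead (T_{t+|k|}).
    Positions with no available value are set to None.
    """
    n = len(series)
    return [series[t - k] if 0 <= t - k < n else None for t in range(n)]


# Hypothetical unit whose policy change occurs at t = 2.
treatment = [0, 0, 1, 0, 0]

lag1 = shifted(treatment, 1)     # T_{t-1}: [None, 0, 0, 1, 0]
lead1 = shifted(treatment, -1)   # T_{t+1}: [0, 1, 0, 0, None]
print(lag1, lead1)
```

In a panel, the shift has to be applied within each unit separately so that values from one unit never spill into another.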


The pre-treatment placebo testing

With only two periods of observation, the test collapses into a pre-treatment placebo test such as:

$$y_{i,t-1} = \alpha + \beta T_{i,t} + X_{i,t-1}\delta + \varepsilon_{i,t-1}$$

As above, we expect $\beta$ to be statistically insignificant, which shows that the lead of treatment does not affect the outcome.


2-Relaxing the parallel trend assumption

One robustness test used in the DiD set-up is the inclusion of subject-specific time trends.
This allows treatment and control subjects to follow different trends. The rule is that if the estimated effects of interest are unchanged by the inclusion of these trends, the result is encouraging; otherwise it is discouraging.
Note, however, that we can only run this test if we have at least 3 periods in our data.
Let’s see the example from Section 5.2 of Mastering Metrics.


Ideal trend


Spurious DiD estimate


Stable coefficient


Implementation

Consider individual $i$ located in district $d$, who is exposed to treatment $T$ on the basis of that location. The specification that captures district-specific time trends is:

$$y_{idt} = \gamma_{0d} + \gamma_{1d}\, t + \lambda_t + \beta T_{dt} + X_{idt}\delta + \varepsilon_{idt}$$

where $\gamma_{0d}$ is a district fixed effect and $\gamma_{1d}$ is the district-specific trend coefficient, which multiplies the time trend variable $t$. The specification allows treatment and control districts to follow different trends (Autor 2003).
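
The following sketch, assuming NumPy is available, illustrates the rule that a stable coefficient is encouraging. It uses noiseless, made-up district-level data in which the treated district has a steeper trend and the true effect is 5, and estimates the treatment coefficient with and without the district-specific trend. All numbers and names are hypothetical.

```python
import numpy as np

n_periods = 6
districts = np.repeat([0, 1], n_periods)        # 0 = control, 1 = treated
time = np.tile(np.arange(n_periods), 2)
treat = ((districts == 1) & (time >= 3)).astype(float)

# Noiseless outcome: district levels, common time shocks,
# a steeper trend in district 1, and a true treatment effect of 5.
common_shock = np.array([0.0, 0.3, -0.2, 0.5, 0.1, 0.4])[time]
y = 2.0 * districts + common_shock + 1.0 * districts * time + 5.0 * treat

district_dummies = np.column_stack([(districts == d).astype(float) for d in (0, 1)])
time_dummies = np.column_stack([(time == t).astype(float) for t in range(1, n_periods)])
# Only the trend of district 1 relative to district 0 is identified,
# because the time dummies already absorb any common trend.
trend_d1 = (districts == 1) * time


def last_coef(X, y):
    """OLS coefficient on the last column of X (the treatment dummy)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]


beta_no_trend = last_coef(np.column_stack([district_dummies, time_dummies, treat]), y)
beta_with_trend = last_coef(
    np.column_stack([district_dummies, time_dummies, trend_d1, treat]), y
)
print(beta_no_trend, beta_with_trend)  # about 8.0 and 5.0
```

Without the trend term the diverging trend is attributed to the treatment, so the estimate moves from 8 to 5 once the trend is included. In this example the instability signals that the plain DiD estimate was spurious.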


3-Check if treatment changes two groups composition

The next important question is how to pick the control group.

The intuition from the conditional independence assumption (CIA) leads us to ask whether treatment changes the composition of subgroups within the treatment and control groups, which would act as a confounding factor.
If so, we need to incorporate covariates so that comparisons between these subgroups do not contaminate the DiD estimates.
Examples include Kugler, Jimeno and Hernanz (2005), who look at the effect of age-specific employment policies in Spain, and Meyer and Rosenbaum (2001), who look at an income maintenance policy under which the composition of individuals changes through migration.


Implementation

Consider individual $i$ located in district $d$, who belongs to subgroup $s$ and is exposed to treatment $T$ on the basis of that location. The treatment changes the composition of $s$, so the specification that captures different group time trends is:

$$y_{isdt} = \gamma_{dt} + \lambda_{st} + \theta_{sd} + \beta T_{sdt} + X_{isdt}\delta + \varepsilon_{isdt}$$

where $\gamma_{dt}$ is the district-specific time effect that is common across subgroups, $\lambda_{st}$ is the time-varying subgroup effect, and $\theta_{sd}$ is the district-specific subgroup effect. This triple-differences approach may generate a more convincing set of results in the presence of composition changes.
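
Without covariates, the triple difference can be read as the DiD for the subgroup targeted by the policy minus the DiD for a subgroup that is not targeted. The sketch below computes it from eight cell means. The means, the labels and the function names are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical cell means keyed by (subgroup, district, period).
means = {
    ("targeted", "treated", "pre"): 10.0,
    ("targeted", "treated", "post"): 18.0,
    ("targeted", "control", "pre"): 9.0,
    ("targeted", "control", "post"): 12.0,
    ("untargeted", "treated", "pre"): 20.0,
    ("untargeted", "treated", "post"): 24.0,
    ("untargeted", "control", "pre"): 19.0,
    ("untargeted", "control", "post"): 21.0,
}


def did(subgroup):
    """Double difference across districts and periods within one subgroup."""
    change_treated = means[(subgroup, "treated", "post")] - means[(subgroup, "treated", "pre")]
    change_control = means[(subgroup, "control", "post")] - means[(subgroup, "control", "pre")]
    return change_treated - change_control


def triple_difference():
    """DiD of the targeted subgroup minus DiD of the untargeted subgroup."""
    return did("targeted") - did("untargeted")


print(did("targeted"), did("untargeted"), triple_difference())  # 5.0 2.0 3.0
```

The untargeted subgroup's DiD of 2 captures district-specific changes over time that are unrelated to the policy. Subtracting it leaves 3 as the triple-difference estimate.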


4-Avoid picking bad control

The last issue is bad controls in DiD.

As in the CIA approach, we need to avoid bad controls.
Specifically, the covariates cannot include variables that are themselves altered by the treatment.
One illustration is taken from Card (1992), who investigates the effect of the federal minimum wage on teen wages and employment. If he were to use adult employment to control for location-specific time trends, he would need to argue that this variable is unaltered by the minimum wage policy.


Recent Development in DiD Estimates

The canonical 2 x 2 DiD is no longer a valid estimator when treatment timing varies across units:

Goodman-Bacon, Andrew. “Difference-in-differences with variation in treatment timing.” Journal of Econometrics (2021).
Callaway, Brantly, and Pedro HC Sant’Anna. “Difference-in-differences with multiple time periods.” Journal of Econometrics (2020).
