Difference-In-Difference: Rus'an Nasrudin

Difference-in-difference
Rus’an Nasrudin
Nov 2, 2021
Rus’an Nasrudin Difference-in-difference Nov 2, 2021 1 / 38

1 Introduction
2 Example
3 Theory
4 Standard Error
5 Extension

Introduction
The terms and models in DiD
Canonical DiD
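$$y_i = \alpha + \beta_1 \mathrm{AFTER}_i + \beta_2 \mathrm{TREATED}_i + \beta_3 (\mathrm{AFTER}_i \cdot \mathrm{TREATED}_i) + \varepsilon_i$$
General TWFE (Two-way fixed effects)
$$y_{it} = \beta D_{it} + X_{it}\gamma + \alpha_i + \delta_t + \varepsilon_{it}$$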
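The link between the two models can be sketched numerically. The example below is a minimal illustration on simulated data, not part of the original slides: the sample size, effect sizes and the helper `double_demean` are all assumptions. In a balanced two-period panel without covariates, the TWFE coefficient obtained by removing unit and time means should coincide with the canonical 2 x 2 estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units, n_periods = 200, 2
treated = np.arange(n_units) < n_units // 2      # first half of units is treated
post = np.arange(n_periods) == 1                 # second period is "after"

# D_it = 1 only for treated units in the post period.
D = (treated[:, None] & post[None, :]).astype(float)

alpha_i = rng.normal(size=(n_units, 1))          # unit fixed effects
delta_t = np.array([[0.0, 0.5]])                 # time fixed effects
beta_true = 2.0                                  # assumed treatment effect
y = alpha_i + delta_t + beta_true * D + rng.normal(scale=0.1, size=D.shape)


def double_demean(a):
    """Remove unit and time means from a balanced (units x periods) array."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


# TWFE estimate: regress the demeaned outcome on the demeaned treatment.
y_w, D_w = double_demean(y), double_demean(D)
beta_twfe = (D_w * y_w).sum() / (D_w ** 2).sum()

# Canonical 2x2 estimate: change for treated units minus change for controls.
change = y[:, 1] - y[:, 0]
beta_2x2 = change[treated].mean() - change[~treated].mean()

print(f"TWFE: {beta_twfe:.4f}  2x2 DiD: {beta_2x2:.4f}")
```

The double-demeaning shortcut is only valid for balanced panels; with unbalanced data or covariates, explicit unit and time dummies would be needed.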

History
The first application of DiD was by John Snow (1855).

He used the method to discover how residents of London were being infected with cholera.
He compared death rates from cholera in districts served by two water companies.
In 1849, both companies drew their water from the dirty Thames.
In 1852, one of them moved its water works upriver to an area free of sewage.
Death rates then fell sharply in the districts served by this company compared with the districts served by the other.

Example
Example 1: The effect of monetary policy
Richardson and Troost (2009) studied the impact of monetary policy during the Great Depression of the 1930s in the US [1].
Easy-money policy is rarely a random event. Fortunately, they could observe two districts, one with an easy-money policy (Atlanta) and one without (St. Louis).
This contrast makes it possible to evaluate the policy's impact on banking and economic performance.
They found that central bank intervention influenced bank health, credit availability and business activity.
[1] Richardson, Gary, and William Troost. "Monetary intervention mitigated banking panics during the Great Depression: quasi-experimental evidence from a Federal Reserve district border, 1929–1933." Journal of Political Economy 117.6 (2009): 1031–1073.

Graphical illustration

The algebra
Let $Y_{dt}$ be the number of banks open in district $d$ at time $t$, with $d \in \{\text{Atlanta } (A), \text{St. Louis } (S)\}$ and $t \in \{1930, 1931\}$.
$$\delta_{DD} = (Y_{A,1931} - Y_{A,1930}) - (Y_{S,1931} - Y_{S,1930}) = (121 - 135) - (132 - 165) = -14 - (-33) = 19$$
Or
$$\delta_{DD} = (Y_{A,1931} - Y_{S,1931}) - (Y_{A,1930} - Y_{S,1930}) = (121 - 132) - (135 - 165) = -11 - (-30) = 19$$
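The arithmetic above can be checked in a few lines. This is only a sketch: the bank counts are the ones quoted on the slide, while the dictionary layout and function names are illustrative choices.

```python
# Number of banks open, taken from the worked figures above.
banks_open = {
    ("Atlanta", 1930): 135,
    ("Atlanta", 1931): 121,
    ("St. Louis", 1930): 165,
    ("St. Louis", 1931): 132,
}


def did_changes_over_time(y):
    """Change in Atlanta minus change in St. Louis."""
    return (y["Atlanta", 1931] - y["Atlanta", 1930]) - (
        y["St. Louis", 1931] - y["St. Louis", 1930]
    )


def did_gaps_between_districts(y):
    """Atlanta minus St. Louis gap in 1931, minus the same gap in 1930."""
    return (y["Atlanta", 1931] - y["St. Louis", 1931]) - (
        y["Atlanta", 1930] - y["St. Louis", 1930]
    )


print(did_changes_over_time(banks_open))       # 19
print(did_gaps_between_districts(banks_open))  # 19
```

Both orderings of the double difference give the same number because they are rearrangements of the same four terms.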

Counterfactual

Example 2: The impact of Askeskin program
In 2005, Indonesia took a first step towards universal health insurance.
A nationwide health insurance scheme for the poor, Askeskin, was introduced.
Sparrow et al. (2013) use the mini Susenas to evaluate the programme's impact on health care access and health spending, comparing beneficiaries with a comparison group [2].
They found that Askeskin increased health care access, measured by outpatient utilisation, with the side effect of higher out-of-pocket spending for those who received it.
[2] Sparrow, Robert, Asep Suryahadi, and Wenefrida Widyanti. "Social health insurance for the poor: Targeting and impact of Indonesia's Askeskin programme." Social Science & Medicine 96 (2013): 264–271.

Example 3: Long-term impact of SD Inpres
Duflo (2001) estimates the schooling and labour market consequences of the SD Inpres programme in Indonesia in the 1970s.
Exploiting regional variation in school construction, she found that:
Each primary school constructed per 1,000 children led to an average increase of 0.12 to 0.19 years of education.
The programme also corresponds to a 1.5 to 2.7 percent increase in wages.

Impact of SD Inpres

Theory
Observed and potential outcome

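Let $Y_{ti}^{T}$ be the outcome, where the subscript $t \in \{1, 2\}$ denotes time and the superscript $T \in \{0, 1\}$ denotes treatment status: $T = 1$ is the treatment group and $T = 0$ is the control group. $T_{2i}$ indicates whether unit $i$ is treated in period 2.
Assume a model with heterogeneous treatment effects:
$$\beta_i = \beta + \eta_i$$
The expected observed outcomes, or conditional expectations, are:
$$E[Y_{2i}^{1} \mid T_{2i} = 1] = \alpha_2 + \beta + E[\eta_i \mid T_{2i} = 1] + \theta_i^{1} + E[\mu_{2i} \mid T_{2i} = 1]$$
$$E[Y_{1i}^{1} \mid T_{2i} = 1] = \alpha_1 + \theta_i^{1} + E[\mu_{1i} \mid T_{2i} = 1]$$
$$E[Y_{2i}^{0} \mid T_{2i} = 0] = \alpha_2 + \theta_i^{0} + E[\mu_{2i} \mid T_{2i} = 0]$$
$$E[Y_{1i}^{0} \mid T_{2i} = 0] = \alpha_1 + \theta_i^{0} + E[\mu_{1i} \mid T_{2i} = 0]$$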

OLS specification of DiD
OLS amounts to taking the difference between the treatment group ($T = 1$) and the control group ($T = 0$) in the observed change over time of the average outcome.
The difference-in-difference estimator takes the form:
$$\hat{\beta}_{DiD} = E[Y_{2i}^{1} - Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} - Y_{1i}^{0} \mid T_{2i} = 0]$$
$$= E[Y_{2i}^{1} \mid T_{2i} = 1] - E[Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} \mid T_{2i} = 0] + E[Y_{1i}^{0} \mid T_{2i} = 0]$$
$$= \beta + E[\eta_i \mid T_{2i} = 1] + E[\mu_{2i} - \mu_{1i} \mid T_{2i} = 1] - E[\mu_{2i} - \mu_{1i} \mid T_{2i} = 0]$$

Unbiased DiD Estimator
Theorem 3.1 (Properties)

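The DiD estimator is unbiased under the parallel trend assumption, $(\mu_{2i} - \mu_{1i}) \perp T_{2i}$;
It identifies $ATT = \beta + E[\eta_i \mid T_{2i} = 1]$;
It identifies $ATT = ATE = \beta$ under the additional assumption $\eta_i \perp T_{2i}$, so that $E[\eta_i \mid T_{2i} = 1] = E[\eta_i] = 0$.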
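A small simulation can make the ATT versus ATE distinction concrete. This is a sketch under assumed numbers, not the author's example: the effect sizes, group sizes and noise levels are all made up. Heterogeneity $\eta_i$ is deliberately correlated with treatment status, while the shocks $\mu_{ti}$ are drawn independently of it, so parallel trends hold and DiD should land near the ATT rather than the ATE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000                                    # units per group (assumed)
T = np.repeat([1, 0], n)                      # treatment-group indicator

beta = 1.0                                    # average effect in the population
# Heterogeneity correlated with treatment: treated units gain more on average.
eta = np.where(T == 1, 0.5, -0.5) + rng.normal(scale=0.2, size=2 * n)
theta = np.where(T == 1, 3.0, 1.0)            # fixed group-level differences
alpha = np.array([0.0, 0.7])                  # common period effects
mu = rng.normal(scale=0.5, size=(2 * n, 2))   # shocks, independent of T

y1 = alpha[0] + theta + mu[:, 0]                       # period 1: nobody treated
y2 = alpha[1] + theta + (beta + eta) * T + mu[:, 1]    # period 2: T = 1 treated

did = (y2 - y1)[T == 1].mean() - (y2 - y1)[T == 0].mean()
att = (beta + eta)[T == 1].mean()             # average effect among the treated
ate = (beta + eta).mean()                     # average effect in the population

print(f"DiD: {did:.3f}  ATT: {att:.3f}  ATE: {ate:.3f}")
```

If `eta` were drawn independently of `T`, the three numbers would be expected to agree up to sampling noise, which is the second part of the theorem.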

Visualisation of double difference

Implementation
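The DiD estimator is $\beta_3$ in the regression:
$$y_{it} = \alpha + \beta_1 t + \beta_2 T_{2i} + \beta_3 (t \cdot T_{2i}) + \varepsilon_{it}$$
where $t$ is coded as a dummy ($t = 0$ in period 1 and $t = 1$ in period 2).
Proof:
$$E[Y_{2i}^{1} - Y_{1i}^{1} \mid T_{2i} = 1] - E[Y_{2i}^{0} - Y_{1i}^{0} \mid T_{2i} = 0]$$
$$= [(\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2)] - [(\alpha + \beta_1) - \alpha]$$
$$= \beta_3$$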
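The proof can be checked numerically: because the regression is saturated in the two dummies, the OLS coefficient on the interaction should equal the double difference of the four cell means. The sketch below uses simulated repeated cross-section data with made-up coefficients; `cell_mean` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
treated = rng.integers(0, 2, size=n)          # T: treatment-group dummy
after = rng.integers(0, 2, size=n)            # t: post-period dummy

# Assumed data-generating process with a true interaction effect of 1.5.
y = 1.0 + 0.5 * after + 2.0 * treated + 1.5 * after * treated + rng.normal(size=n)

# OLS with intercept, both dummies and their interaction.
X = np.column_stack([np.ones(n), after, treated, after * treated])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta3 = coef[3]


def cell_mean(group, period):
    """Mean outcome for one treatment-group / period cell."""
    return y[(treated == group) & (after == period)].mean()


double_diff = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"beta3: {beta3:.4f}  double difference of means: {double_diff:.4f}")
```

The equality is exact here only because there are no extra covariates; once $X$ variables enter, the regression coefficient is a covariate-adjusted version of the same comparison.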

Three ingredients
To sum up, the DiD regression has three ingredients:

A dummy for the treatment group ($T$). It varies across observations and controls for fixed differences between the units being compared.
A dummy for the post-treatment period ($t$). It varies over time and controls for the fact that conditions change over time for all observations.
The interaction term $T \cdot t$, generated by multiplying the two dummies. The coefficient on this term is the DiD causal effect.

Back to Example 2

Parallel trend assumption
The credibility of a DiD estimate rests on the parallel trend assumption, so evidence supporting it is an important robustness check.
What to report? The two lines of observed outcomes, for the treatment and control groups, over the pre-treatment period.
Such pre-treatment data are often hard to obtain, so a qualitative argument for parallel trends is also important.
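When several pre-treatment periods are available, one simple way to summarise the two lines is to track the gap between them. The numbers below are purely hypothetical and chosen to show the ideal case; each change in the gap is effectively a DiD estimate computed before any treatment took place, so it should be close to zero if trends are parallel.

```python
import numpy as np

# Hypothetical group means for three pre-treatment periods (illustrative only).
periods = np.array([1, 2, 3])
treated_mean = np.array([10.0, 11.0, 12.5])
control_mean = np.array([7.0, 8.0, 9.5])

gap = treated_mean - control_mean             # level difference in each period
gap_change = np.diff(gap)                     # pre-period "placebo" DiD estimates

for t, g in zip(periods, gap):
    print(f"period {t}: treated - control = {g:.2f}")
print("change in gap between consecutive periods:", gap_change)
```

With real data the changes will not be exactly zero, and judging whether they are small enough requires standard errors.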

Standard Error
Repeated outcome and serial correlation
One important note about the DiD estimate is that it is a special case of estimation with panel data.
Repeated observation of the same units gives rise to serial correlation: the values of a variable in nearby periods are likely to be similar.
When the dependent variable is serially correlated, the residuals are often serially correlated as well.
Together, these serial correlations change the formula needed to calculate the standard errors.
The simple standard error formula will often exaggerate the precision of the estimates.

Clustered standard error
The robust standard error discussed in the CEF topic is correct under heteroskedasticity.
For serial correlation, the appropriate formula is the clustered standard error.
The rule of thumb is to cluster at the level of the unit that is serially correlated, i.e. the unit that is observed repeatedly.
A formal discussion of non-standard issues for standard errors will be covered in the 14th meeting.
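To illustrate the point, the sketch below compares the simple OLS standard error with a cluster-robust one on a simulated ten-period panel with AR(1) errors. Everything here is an assumption made for illustration: the panel dimensions, the autocorrelation of 0.8, and the plain sandwich formula without any small-sample correction (often labelled CR0), which is one of several variants in use.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients, residuals and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    return beta, y - X @ beta, XtX_inv


def classical_se(X, resid, XtX_inv):
    """Simple standard errors assuming i.i.d. errors."""
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(sigma2 * np.diag(XtX_inv))


def cluster_se(X, resid, XtX_inv, clusters):
    """Cluster-robust sandwich standard errors (no small-sample correction)."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]       # sum of x_i * e_i within the cluster
        meat += np.outer(score, score)
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))


rng = np.random.default_rng(4)
n_units, n_periods, rho = 400, 10, 0.8        # assumed panel shape and AR(1) parameter
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(float)
post = (time >= n_periods // 2).astype(float)

# Serially correlated errors with unit marginal variance in every period.
e = np.empty((n_units, n_periods))
e[:, 0] = rng.normal(size=n_units)
for t in range(1, n_periods):
    e[:, t] = rho * e[:, t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=n_units)

y = 1.0 + 0.5 * post + 1.0 * treated + 2.0 * post * treated + e.ravel()
X = np.column_stack([np.ones(unit.size), post, treated, post * treated])

beta, resid, XtX_inv = ols(X, y)
se_simple = classical_se(X, resid, XtX_inv)
se_clustered = cluster_se(X, resid, XtX_inv, unit)

print(f"DiD coefficient: {beta[3]:.3f}")
print(f"simple SE: {se_simple[3]:.4f}  clustered SE: {se_clustered[3]:.4f}")
```

In this design the clustered standard error on the interaction term should come out larger than the simple one, which is the sense in which the simple formula exaggerates precision.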

Extension
1-A test for causality in the spirit of Granger (1969)
There are several tests of the robustness of a treatment effect estimated by DiD.
One test postulates that past treatment predicts the current outcome while future treatment does not. We therefore expect statistically insignificant estimates of the lead coefficients $\beta_{+\tau}$ in an equation like:
$$y_{i,t} = \gamma_i + \lambda_t + \sum_{\tau=0}^{m} \beta_{-\tau} T_{i,t-\tau} + \sum_{\tau=1}^{q} \beta_{+\tau} T_{i,t+\tau} + X_{it}\delta + \varepsilon_{it}$$
This follows the spirit of Granger (1969): it checks whether causes happen before consequences and not vice versa.
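A stripped-down version of this test, with $m = 0$ lags, $q = 1$ lead and no covariates, can be sketched as follows. The simulated panel, the single adoption date and the effect size are assumptions for illustration only. Since the outcome is generated so that only current treatment matters, the coefficient on the lead should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods, start = 400, 6, 3         # treatment switches on at t = 3 (assumed)
treated = np.arange(n_units) < n_units // 2

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
T_now = (treated[unit] & (time >= start)).astype(float)        # T_{i,t}
T_lead = (treated[unit] & (time + 1 >= start)).astype(float)   # T_{i,t+1}

unit_fe = rng.normal(size=n_units)[unit]
time_fe = 0.3 * time
y = unit_fe + time_fe + 2.0 * T_now + rng.normal(scale=0.1, size=unit.size)

# The lead is not observed in the final period, so that period is dropped.
keep = time < n_periods - 1
X = np.column_stack([
    np.eye(n_units)[unit],                    # unit dummies (gamma_i)
    np.eye(n_periods)[time][:, 1:-1],         # time dummies; period 0 is the reference
    T_now,
    T_lead,
])[keep]

coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
beta_now, beta_lead = coef[-2], coef[-1]

print(f"effect of current treatment: {beta_now:.3f}")
print(f"effect of future treatment (lead): {beta_lead:.3f}")
```

In applied work the lead coefficient would be judged against a clustered standard error rather than by eye.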

The pre-treatment placebo testing
With only two periods of observations, the test collapses into a pre-treatment placebo test:
$$y_{i,t-1} = \alpha + \beta T_{i,t} + X_{i,t-1}\delta + \varepsilon_{i,t-1}$$
As above, we expect $\beta$ to be statistically insignificant, showing that the lead of treatment does not affect the outcome.

2-Relaxing the parallel trend assumption
One robustness test used in the DiD set-up is the inclusion of subject-specific time trends.
This allows treatment and control subjects to follow different trends. The rule is: if the estimated effects of interest are unchanged by the inclusion of these trends, that is encouraging; otherwise it is discouraging.
Note, however, that this test requires at least 3 periods of data.
Let's see the example from section 5.2 of Mastering Metrics.

Ideal trend

Spurious DiD estimate

Stable coefficient

Implementation
Denote by $i$ an individual located in district $d$, who is exposed to treatment $T$ based on that location. The specification that captures district-specific time trends is:
$$y_{idt} = \gamma_{0d} + \gamma_{1d} t + \lambda_t + \beta T_{dt} + X_{idt}\delta + \varepsilon_{idt}$$
where $\gamma_{0d}$ is a district effect and $\gamma_{1d}$ is a district-specific trend coefficient multiplying the time trend variable $t$. The specification allows treatment and control districts to follow different trends (Autor 2003).
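The following sketch shows what the trend terms buy in a case where parallel trends fail by construction. It uses simulated district-level data, one observation per district and period, with no covariates; the slopes, adoption date and effect size are assumptions. Treated districts are given a steeper linear trend, so the plain two-way fixed effects estimate should be biased upward, while adding district-specific trends should bring it back near the true effect of 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n_districts, n_periods, start = 100, 8, 5     # assumed panel shape and adoption date
district = np.repeat(np.arange(n_districts), n_periods)
time = np.tile(np.arange(n_periods), n_districts)
treated = district < n_districts // 2
T = (treated & (time >= start)).astype(float)

# Treated districts follow a steeper trend, violating parallel trends.
slope = np.where(treated, 0.5, 0.2)
y = (rng.normal(size=n_districts)[district] + slope * time + 2.0 * T
     + rng.normal(scale=0.1, size=district.size))

d_dummies = np.eye(n_districts)[district]     # district effects (gamma_0d)
t_dummies = np.eye(n_periods)[time][:, 1:]    # time effects (lambda_t), period 0 omitted
# District-specific linear trends (gamma_1d * t); one is dropped because the
# time dummies already absorb a common trend.
trends = d_dummies[:, 1:] * time[:, None]


def last_coef(X):
    """OLS coefficient on the last column of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]


beta_no_trend = last_coef(np.column_stack([d_dummies, t_dummies, T]))
beta_trend = last_coef(np.column_stack([d_dummies, t_dummies, trends, T]))

print(f"without district trends: {beta_no_trend:.3f}")
print(f"with district trends:    {beta_trend:.3f}")
```

The correction works here because the differential trend is exactly linear; it offers no protection against other departures from parallel trends.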

3-Check if treatment changes two groups composition
The next important question is how to pick the control group.

The conditional independence assumption (CIA) leads us to ask whether treatment changes the composition of subgroups within the treatment and control groups, which would act as a confounding factor.
If so, we need to include covariates so that comparisons between these subgroups do not contaminate the DiD estimates.
Examples are Kugler, Jimeno and Hernanz (2005), who look at the effect of age-specific employment policies in Spain, and Meyer and Rosenbaum (2001), who look at an income maintenance policy under which the composition of individuals changes through migration.

Implementation
Denote by $i$ an individual who is located in district $d$, belongs to subgroup $s$, and is exposed to treatment $T$ based on that location. Treatment changes the composition of $s$, so the specification that captures group-specific time trends is:
$$y_{isdt} = \gamma_{dt} + \lambda_{st} + \theta_{sd} + \beta T_{sdt} + X_{isdt}\delta + \varepsilon_{isdt}$$
where $\gamma_{dt}$ is a district-specific time effect that is common across subgroups, $\lambda_{st}$ is the time-varying subgroup effect, and $\theta_{sd}$ is the district-specific subgroup effect. This triple-differences approach may generate a more convincing set of results in the presence of composition changes.
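In the simplest case of two subgroups, two districts and two periods, with no covariates or noise, the triple difference is the DiD for the affected subgroup minus the DiD for the unaffected one. The sketch below builds hypothetical cell means from the three sets of pairwise effects in the specification and checks that this difference of DiDs recovers $\beta$; all values are randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
beta = 1.5                                    # assumed treatment effect

# Arbitrary pairwise effects, each indexed by 2 x 2 cells.
gamma_dt = rng.normal(size=(2, 2))            # district-by-time effects
lambda_st = rng.normal(size=(2, 2))           # subgroup-by-time effects
theta_sd = rng.normal(size=(2, 2))            # subgroup-by-district effects

# Only subgroup 1 in district 1 is treated, and only in period 1.
T = np.zeros((2, 2, 2))
T[1, 1, 1] = 1.0

# Cell means indexed as mean[s, d, t].
mean = (gamma_dt[None, :, :] + lambda_st[:, None, :] + theta_sd[:, :, None]
        + beta * T)


def did(cell):
    """2x2 difference-in-difference for an array indexed as cell[d, t]."""
    return (cell[1, 1] - cell[1, 0]) - (cell[0, 1] - cell[0, 0])


ddd = did(mean[1]) - did(mean[0])             # affected minus unaffected subgroup

print(f"DiD, subgroup 1: {did(mean[1]):.3f}")
print(f"DiD, subgroup 0: {did(mean[0]):.3f}")
print(f"triple difference: {ddd:.3f}")
```

Each subgroup's own DiD is contaminated by the district-by-time effects, which are shared across subgroups and therefore cancel in the third difference.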

4-Avoid picking bad control
The last issue to discuss is bad control in DiD.

As in the CIA approach, we need to avoid bad controls.
Specifically, the covariates cannot include variables that are themselves altered by the treatment.
One illustration is taken from Card (1992), who investigates the effect of the federal minimum wage on teen wages and employment. To use adult employment as a control for location-specific time trends, he would need to argue that this variable is unaltered by the minimum wage policy.

Recent Development in DiD Estimates
The canonical 2 x 2 DiD may no longer be a valid estimator when treatment timing varies across units:

Goodman-Bacon, Andrew. "Difference-in-differences with variation in treatment timing." Journal of Econometrics (2021).
Callaway, Brantly, and Pedro H. C. Sant'Anna. "Difference-in-differences with multiple time periods." Journal of Econometrics (2020).
