diff: Simplifying the estimation of difference-in-differences treatment effects
Abstract. In this article, I present the features of the user-written command diff,
which estimates difference-in-differences (DID) treatment effects. diff simplifies
the DID analysis by allowing the conventional DID setting to be combined with
other nonexperimental evaluation methods. The command is equipped with an
attractive set of options: the single DID with covariates, the kernel propensity-score
matching DID, and the quantile DID. Specific options are included to obtain DID
estimation on a repeated cross-section setting and to test the general balancing
properties of the model. I illustrate the features of diff using a sample of the
dataset from the pioneering implementation of DID by Card and Krueger (1994,
American Economic Review 84: 772–793).
Keywords: st0424, diff, difference-in-differences, causal inference, kernel propensity
score, quantile treatment effects, nonexperimental methods, DID, QDID
1 Introduction
There is a growing body of literature using difference-in-differences (DID) treatment
effects as a reliable nonexperimental evaluation method.¹ DID estimation has been
widely used when panel data or repeated cross-sections are available for intervention
impact assessments. A key aspect of DID is that it facilitates the causal inference anal-
ysis of an intervention when time-invariant unobserved heterogeneity might confound a
causal-effect analysis (Abadie 2005; Angrist and Pischke 2009). Different specifications
of the DID model can also account for observed heterogeneity and can incorporate other
nonexperimental evaluation methods into the analysis.
Despite the availability of other plausible methods based on the existence of obser-
vational data for nonexperimental causal inference (that is, matching methods, instru-
mental variables, regression discontinuity, etc.), DID estimation offers an alternative by
reaching unbiased results while accounting for time-invariant unobserved heterogeneity.
Four elements are specific to the DID setting (see figure 1): the first one is the availability
of a treated group and control group; the second is the existence of parallel paths in the
pretreatment trends; the third is the clear time cutoff identifying when the treatment
starts, so there is a before and after period; and the fourth is the assumption that,
without the treatment, the treated group would show a trend similar to that observed for
the control group. Thus the DID treatment effects are obtained when panel or repeated
cross-section data are available and a treatment has been administered.
1. According to https://scholar.google.com (accessed in April 2015), while the number of academic
documents using DID was 136 in 2000, it had reached 2,990 in 2014.
© 2016 StataCorp LP st0424
J. M. Villa
Although the latest version of Stata is equipped with the command teffects, which
estimates the treatment effects on a cross-sectional basis, DID is based on the assessment
of an intervention’s impact on a given outcome variable in a before-and-after setting.
While DID treatment effects are focused on comparing treated and control groups sharing
common pretreatment trends, the options of the teffects command entail estimating
the average treatment effects, with special focus on the nearest-neighbor matching ap-
proach. Therefore, although existing nonexperimental evaluation methods can reach
different levels of internal and external validity (Dehejia 2013), the best method for
evaluating a given intervention depends on the characteristics of the available data.
In this article, I present the user-written command diff, which estimates DID treat-
ment effects. diff runs several types of DID estimation beyond basic single DID. diff is
attractive because it combines the single DID with control covariates, advanced match-
ing methods, and balancing-test analysis. By employing two-period panel data or re-
peated cross-sections, diff joins the DID treatment-effects estimation with the kernel
propensity-score matching following Heckman, Ichimura, and Todd (1997, 1998), and
Blundell and Dias (2009). This kernel propensity-score matching in diff follows the
algorithm of psmatch2 developed under a cross-sectional setting by Leuven and Sianesi
(2003). diff also allows estimation of the DID treatment effects at different quantiles for
the kernel matching and repeated cross-sections options (Meyer, Viscusi, and Durbin
1995). In this article, I provide details on implementing diff using a sample of the
dataset from Card and Krueger’s (1994) pioneering article on the effects of a natural
experiment consisting of a minimum-wage increase in the United States. Finally, I
explain how the balancing properties can be tested when information on covariates is
provided.
This single DID can be combined with other nonexperimental evaluation methods.
Additional control covariates are important when observed heterogeneity may confound
the identification strategy. Given the features of DID estimation, observed covariates
should be exempt from the effects of the treatment. Thus, if observable covariates ($X_i$)
are available, they can be added to the analysis.
$$p_i = E(Z_i = 1 \mid X_i)$$
According to Heckman, Ichimura, and Todd (1997), the kernel matching is given by
the propensity score, given the covariates, which leads to the calculation of the kernel
weights,
$$w_i = \frac{K\left(\frac{p_i - p_k}{h_n}\right)}{\sum K\left(\frac{p_i - p_k}{h_n}\right)} \tag{3}$$
in which $K(\cdot)$ is the kernel function and $h_n$ is the selected bandwidth. The kernel
weights are then introduced into (1) to obtain a kernel propensity-score matching DID
treatment effect as follows:
Now, to increase the internal validity of the DID estimand, one can restrict (4) to the
common support of the propensity score for treated and control groups. The common
support is the region in which the propensity scores of the treated and control groups
overlap. The sample of i units can be restricted to the region defined as
$$\left\{ i : p_i \in \left[ \max\{\min(p_i \mid Z_i = 1), \min(p_i \mid Z_i = 0)\},\ \min\{\max(p_i \mid Z_i = 1), \max(p_i \mid Z_i = 0)\} \right] \right\}$$
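The common-support region can be computed by hand as a check. The following is a minimal sketch on simulated data, not the routine that diff itself runs; the variable names x, z, p, and onsupport, the scalar names, and all numeric values are hypothetical.

```stata
* Simulated baseline data (hypothetical): one covariate and a treatment indicator
clear
set seed 12345
set obs 500
generate x = rnormal()
generate byte z = (0.5*x + rnormal() > 0)

* Propensity score p_i = Pr(Z_i = 1 | X_i), estimated here by probit
quietly probit z x
predict double p, pr

* Smallest and largest score within each group
quietly summarize p if z == 1
scalar s_min1 = r(min)
scalar s_max1 = r(max)
quietly summarize p if z == 0
scalar s_min0 = r(min)
scalar s_max0 = r(max)

* Overlap region: larger of the two minima up to smaller of the two maxima
scalar s_lo = max(s_min1, s_min0)
scalar s_hi = min(s_max1, s_max0)
generate byte onsupport = inrange(p, s_lo, s_hi)
tabulate z onsupport
```

Units with onsupport equal to 0 lie outside the overlap and would be dropped from the restricted sample.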
Complementarily, when treated and control units cannot be followed over the base-
line and follow-up periods, the DID treatment effects can be estimated with repeated
cross-sections. This is very common when a treatment has been administered to cer-
tain regional or demographic groups over several cross-sections. The kernel propensity-
score matching with repeated cross-section DID treatment effects is specified following
Blundell and Dias (2009).
Here $w^{c}_{i,t=0}$ and $w^{c}_{i,t=1}$ are the kernel weights for the control group in the baseline and
follow-up periods, respectively, while $w^{t}_{i,t=0}$ is the kernel weight for the treated group
in the baseline period. The three sets of kernel weights are calculated independently
according to the estimated propensity score and do not require the panel structure of
the units in the sample.
Finally, the balancing property of the treated and the control groups can be tested. Given
the availability of observable covariates, it can be shown that in the absence of the treatment,
the outcome variable is orthogonal to the treatment indicator given the set of covariates.
In other words, the balancing property can be tested in the baseline as
$$Y_{i,t=0} \perp Z_i \mid X_i \tag{5}$$
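One simple check in this spirit, and the one the test option is described as performing, is a t test of covariate means between the two groups at baseline. The following is a minimal unweighted sketch on simulated data; the names treated, period, and x are hypothetical, and no kernel weights are applied.

```stata
* Simulated baseline sample (hypothetical): 200 controls and 200 treated units
clear
set seed 5
set obs 400
generate byte treated = (_n > 200)
generate byte period  = 0
generate x = rnormal()                 // covariate drawn identically in both groups

* Two-sample t test of the covariate mean at baseline, by treatment status
ttest x if period == 0, by(treated)
display "p-value, H0 of equal means: " %6.4f r(p)
```

A large p-value gives no evidence against equal baseline means for that covariate; it does not by itself establish (5).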
Note that the balancing property is optional in the DID setting. The most important
assumption, which is not tested in this approach, is that the outcome follows parallel
paths in the treated and the control groups. Because only two periods are available in
this analysis, this assumption cannot be tested here. For an extension of this
test, see Mora and Reggio (2012).
2.1 Estimation
To estimate the expected values in (1), we rely on linear regression for the single DID
analysis. The subsequent complementary introduction of control variables or kernel
propensity-score matching weights is similarly specified by linear regression. In the
basic framework, the estimation can be shown as follows:
$$\texttt{outcome\_var}_i = \beta_0 + \beta_1 \times \texttt{period()}_i + \beta_2 \times \texttt{treated()}_i + \beta_3 \times \texttt{period()}_i \times \texttt{treated()}_i + e_i$$
Here $\texttt{outcome\_var}_i$ is the outcome variable for each unit; $\texttt{period()}_i$ is a binary
variable taking the value of 0 in the baseline and 1 in the follow-up periods; and $\texttt{treated()}_i$
is a binary variable indicating the treatment status for each unit, similar to $Z_i = 1$.
The expected values in (1) are obtained from the interaction of the estimated coef-
ficients. The estimated coefficients have the following interpretation:
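In the saturated model above, β0 is the control-group mean at baseline, β0 + β1 the control-group mean at follow-up, β0 + β2 the treated-group mean at baseline, and β0 + β1 + β2 + β3 the treated-group mean at follow-up. It follows that β2 is the single difference at baseline, β2 + β3 the single difference at follow-up, and β3 the DID effect. The following minimal sketch checks that identity with regress on simulated data; it does not call diff, and the variable names and numeric values are hypothetical.

```stata
* Simulated two-period data (hypothetical values)
clear
set seed 1994
set obs 400
generate byte treated = (_n > 200)          // treatment indicator
generate byte period  = mod(_n, 2)          // 0 = baseline, 1 = follow-up
generate y = 20 - 3*treated - 2*period + 3*treated*period + rnormal()

* Single DID by linear regression; the interaction coefficient is beta_3
regress y i.period##i.treated
scalar b3 = _b[1.period#1.treated]

* The same quantity from the four cell means (first index: treated, second: period)
forvalues z = 0/1 {
    forvalues t = 0/1 {
        quietly summarize y if treated == `z' & period == `t'
        scalar m`z'`t' = r(mean)
    }
}
scalar did = (m11 - m10) - (m01 - m00)
display "beta_3 = " b3 "   DID from cell means = " did
```

Because the model has one parameter per cell, the two numbers agree up to floating-point error.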
DID estimation by providing the output table with the estimated coefficients and their
interactions.
3.1 Syntax
diff outcome_var [if] [in] [weight], period(varname) treated(varname)
    [cov(varlist) kernel id(varname) bw(#) ktype(kernel) rcs qdid(quantile)
    pscore(varname) logit support addcov(varlist) cluster(varname) robust
    bs reps(int) test report nostar export(filename)]
The command requires the specification of the outcome variable (outcome_var) and
allows the use of sampling weights.
diff also simplifies the analysis by arranging the regression coefficients in the
output table. The number of observations, R-squared, the
standard errors, the t statistic (or the z statistic when standard errors are bootstrapped),
and the p-value are also presented.
Baseline
Control β0
Follow-up
Diff-in-Diff β3
R-square: #.##
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
3.2 Options
period(varname) specifies the binary period variable (0: baseline; 1: follow-up). Op-
tion period() is required.
treated(varname) specifies the binary treatment variable (0: controls; 1: treated).
Option treated() is required.
cov(varlist) allows the user to include control covariates in the model [Xi in (2)]. The
coefficients of the variables in cov(varlist) are not displayed in the output table.
They can be seen when option report is specified.
kernel performs the kernel propensity-score matching DID. This option generates the
variable weights, which contains the weights² derived from the kernel propensity-score
matching, as well as generates ps when the propensity score is not supplied
in pscore(varname), following Leuven and Sianesi (2003). This option requires
specification of the id(varname) option except when the rcs option is also specified
(under the repeated cross-section setting).³ Under a panel or cross-sectional setting,
you can specify the support option with kernel to allow the estimation of the DID
on the common support.
id(varname) specifies the identification variable for each unit or individual when the
dataset is composed of a panel of treated and control groups. Option kernel requires
id().
bw(#) specifies the supplied bandwidth of the kernel function. The default is bw(0.06).
ktype(kernel) specifies the type of the kernel function. The types are epanechnikov
(the default), gaussian, biweight, uniform, and tricube.
rcs indicates that the kernel is set for repeated cross-section. This option does
not require option id(varname). Option rcs strongly assumes that covariates in
cov(varlist) do not vary over time.
qdid(quantile) performs the quantile difference-in-differences (QDID) estimation at the
specified quantile from 0.1 to 0.9 (quantile 0.5 performs the QDID at the median).
This option may be combined with kernel and cov(). qdid() does not support
weights or robust standard errors. This option uses the Stata commands qreg for
quantile nonlinear regressions and bsqreg for complementary bootstrapped standard
errors. See Angrist and Pischke (2009) for detailed information on quantile treat-
ment effects and Meyer, Viscusi, and Durbin (1995) for an illustrative example.
pscore(varname) specifies the supplied propensity score.
logit specifies logit estimation of the propensity score. The default is probit estimation.
The results of the probit estimation are used to predict the probability of being
treated, known as the propensity score, and then to calculate the kernel matching,
as in (3).
support performs diff on the common support of the propensity score given the option
kernel.
addcov(varlist) specifies additional covariates with those specified in the estimation of
the propensity score.
cluster(varname) estimates clustered standard errors by the specified category in
varname.
robust estimates robust standard errors following Stata’s sandwich-type estimation.
bs executes a bootstrapped estimation of standard errors.
reps(int) specifies the number of replications when the bs option is also specified. The
default is reps(50).
test performs a balancing t test of the difference in the means of the covariates between
the control and the treated groups in period() = 0. The option test combined
with kernel performs the balancing t test with the weighted covariates; see [R] ttest.
This option is one way to test (5).
report displays the inference of the included covariates or the estimation of the propen-
sity score when option kernel is specified.
nostar removes the inference stars from the p-values.
export(filename) exports the output table into the working directory in a .csv file.
See [D] cd for details.
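To make the roles of bw() and ktype() concrete, the following sketch evaluates the weight formula in (3) for a single treated unit and three control units, using the documented default bandwidth of 0.06 and an Epanechnikov kernel. The propensity scores are invented, the names p_i, h, pk, u, K, and w are hypothetical, and the exact kernel scaling used inside diff or psmatch2 is an assumption here (any constant factor cancels in the ratio).

```stata
* Three control units with made-up propensity scores
clear
input double pk
0.48
0.52
0.60
end

scalar p_i = 0.50                      // propensity score of the treated unit
scalar h   = 0.06                      // bandwidth, the bw() default

* Scaled distance and Epanechnikov kernel, zero outside |u| < 1
generate double u = (p_i - pk) / h
generate double K = cond(abs(u) < 1, 0.75 * (1 - u^2), 0)

* Normalize so that the weights sum to 1 across the control units
egen double sumK = total(K)
generate double w = K / sumK
list pk u K w
```

The two controls that are one-third of a bandwidth away each receive a weight of 0.5, while the control that is more than one bandwidth away receives a weight of 0. A smaller bw() therefore concentrates the weight on the closest controls.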
4 Example
To illustrate the use of diff, we use a downloadable dataset (included with the command)
with a sample of the data used by Card and Krueger (1994).⁴ The data are from
a study by the authors on the impact of the increase in minimum wage in New Jersey
(the treated group) on the employment level in the fast-food industry. This interven-
tion took place in April 1992. They compare the changes in the number of employees
at fast-food restaurants in the treated group with those located in a neighboring state,
Pennsylvania (the control or untreated group). They conducted a baseline survey in
February 1992 and a follow-up in November.
4. This dataset is provided for illustration only. It might not be suitable for all diff options.
. use cardkrueger1994.dta
(Sample dataset from Card and Krueger (1994))
. describe
Contains data from cardkrueger1994.dta
  obs:           780                          Sample dataset from Card and Krueger (1994)
 vars:             8                          12 Mar 2014 14:03
 size:        11,700
With 780 observations, the number of units (or restaurants) is 314 and 76 in the
treated and the control groups (or states), respectively. The outcome variable is
full-time employment (fte). Some covariates are defined as binary variables, indicating
whether the observation belongs to a given fast-food chain. The basic statistics
are shown as follows:
Baseline
Control 20.013
Treated 17.069
Diff (T-C) -2.944 1.160 -2.54 0.011**
Follow-up
Control 17.523
Treated 17.518
Diff (T-C) -0.005 1.160 -0.00 0.997
R-square: 0.01
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
The baseline rows contain information on the mean outcome for each group as well
as the single difference between the groups (−2.944 in this case). These estimates are
presented along with their standard errors, t statistics, and p-values. The same information is
displayed for the follow-up period. The last row is the DID treatment-effects estimate,
implying an increase in the number of employees by 2.939. The p-value is accompanied
by stars, which indicate statistical significance at different levels, as
shown below the table (*** p < 0.01; ** p < 0.05; * p < 0.1). In this case, the DID
estimate is significant at the 10% level.
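The DID figure of 2.939 can be recovered by hand from the four group means reported in the table. The notation below, with superscripts T and C for the treated and control groups, is introduced here only for this check.

```latex
\begin{aligned}
\widehat{\mathrm{DID}}
  &= \left(\bar{Y}^{T}_{t=1} - \bar{Y}^{C}_{t=1}\right) - \left(\bar{Y}^{T}_{t=0} - \bar{Y}^{C}_{t=0}\right) \\
  &= (17.518 - 17.523) - (17.069 - 20.013) \\
  &= -0.005 - (-2.944) \\
  &= 2.939
\end{aligned}
```

The follow-up gap between the groups is essentially zero, so nearly all of the estimate comes from the closing of the baseline gap of −2.944.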
Baseline
Control 20.013
Treated 17.069
Diff (T-C) -2.944 1.468 -2.01 0.045**
Follow-up
Control 17.523
Treated 17.518
Diff (T-C) -0.005 1.216 -0.00 0.997
R-square: 0.01
- Bootstrapped Standard Errors
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
Baseline
Control 21.342
Treated 19.003
Diff (T-C) -2.339 1.052 -2.22 0.026**
Follow-up
Control 18.852
Treated 19.452
Diff (T-C) 0.600 1.052 0.57 0.569
R-square: 0.19
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
Option report displays the output table of the coefficients and statistics from the
cov(varlist).
Baseline
Control 21.342
Treated 19.003
Diff (T-C) -2.339 1.052 -2.22 0.026**
Follow-up
Control 18.852
Treated 19.452
Diff (T-C) 0.600 1.052 0.57 0.569
R-square: 0.19
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
To view the first stage of the estimation of the propensity score, the user should
supply the report option.
. diff fte, treated(treated) period(t) cov(bk kfc roys) kernel id(id) report
KERNEL PROPENSITY SCORE MATCHING DIFFERENCE-IN-DIFFERENCES
Report - Propensity score estimation with probit command
Atention: _pscore is estimated at baseline
Iteration 0: log likelihood = -192.3521
Iteration 1: log likelihood = -191.15937
Iteration 2: log likelihood = -191.15777
Probit regression Number of obs = 390
LR chi2(3) = 2.39
Prob > chi2 = 0.4957
Log likelihood = -191.15777 Pseudo R2 = 0.0062
Matching iterations...
................................................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 780
Baseline Follow-up
Control: 76 76 152
Treated: 314 314 628
390 390
Baseline
Control 20.006
Treated 17.069
Diff (T-C) -2.937 0.959 -3.06 0.002***
Follow-up
Control 17.367
Treated 17.518
Diff (T-C) 0.151 0.959 0.16 0.875
R-square: 0.02
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
Baseline
Control 20.006
Treated 17.497
Diff (T-C) -2.508 0.961 -2.61 0.009***
Follow-up
Control 17.367
Treated 17.518
Diff (T-C) 0.151 0.961 0.16 0.875
R-square: 0.01
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
4.4 QDID
It is sometimes useful to assess the effects of the intervention over the distribution of the
outcome variable. diff provides this option in the DID setting. Here one would like to
know whether the effect of the increase in minimum wage was stronger for restaurants
with a low or high number of full-time employees. The QDID is obtained when the
option qdid(quantile) is specified. For example, estimating the treatment effects on the
median of the number of full-time employees requires the following syntax:
Baseline
Control 17.250
Treated 17.250
Diff (T-C) 0.000 0.996 0.00 1.000
Follow-up
Control 17.750
Treated 17.750
Diff (T-C) 0.000 1.003 0.00 1.000
R-square: 0.15
- Values are estimated at the .5 quantile
**Inference: *** p<0.01; ** p<0.05; * p<0.1
By chance, when one accounts for covariates, the value of the full-time employment
variable (fte) at the 0.5 quantile, that is, at the median of the dependent variable,
is the same to three decimal places for control and treated units in both the baseline
and follow-up periods. Therefore, the result above indicates a DID effect of −1.407e-15
(numerically 0), which is displayed as −0.000 because of the number format of the table.
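The description of qdid() states that it relies on qreg. The following is a rough sketch of the median analogue of the DID regression, run directly with qreg on simulated data. It is not the diff implementation, it omits covariates and kernel weights, and the variable names and numeric values are hypothetical.

```stata
* Simulated two-period data (hypothetical values)
clear
set seed 424
set obs 400
generate byte treated = (_n > 200)
generate byte period  = mod(_n, 2)          // 0 = baseline, 1 = follow-up
generate y = 17 + 1.5*treated*period + rnormal()

* Median regression with the same right-hand side as the single DID
qreg y i.period##i.treated, quantile(0.5)
display "median DID estimate: " _b[1.period#1.treated]
```

Changing quantile(0.5) to another value between 0.1 and 0.9 gives the corresponding contrast at that quantile of the outcome.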
As with the single DID, the QDID can be combined with the kernel option (also in
repeated cross-sections).
. diff fte, treated(treated) period(t) qdid(0.50) cov(bk kfc roys) kernel
> id(id) report
KERNEL PROPENSITY SCORE MATCHING QUANTILE DIFFERENCE-IN-DIFFERENCES
Report - Propensity score estimation with probit command
Atention: _pscore is estimated at baseline
Iteration 0: log likelihood = -192.3521
Iteration 1: log likelihood = -191.15937
Iteration 2: log likelihood = -191.15777
Probit regression Number of obs = 390
LR chi2(3) = 2.39
Prob > chi2 = 0.4957
Log likelihood = -191.15777 Pseudo R2 = 0.0062
Matching iterations...
................................................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 780
Baseline Follow-up
Control: 76 76 152
Treated: 314 314 628
390 390
Baseline
Control 17.000
Treated 15.750
Diff (T-C) -1.250 1.202 -1.04 0.299
Follow-up
Control 16.000
Treated 17.000
Diff (T-C) 1.000 1.195 0.84 0.403
R-square: 0.00
- Values are estimated at the .5 quantile
**Inference: *** p<0.01; ** p<0.05; * p<0.1
When combined with the kernel option, the covariates are weighted, and the dif-
ferences are obtained by linear regression (this test is also suitable with repeated cross-
sections).
. diff fte, treated(treated) period(t) cov(bk kfc roys) test id(id) kernel
Matching iterations...
................................................................................
> ..............................................................................
> ..............................................................................
> ..............................................................................
TWO-SAMPLE T TEST
Number of observations (baseline): 390
Baseline Follow-up
Control: 76 - 76
Treated: 314 - 314
390 -
t-test at period = 0:
5 Acknowledgments
I thank Kit Baum from Boston College for his valuable suggestions. I also thank at-
tendees at the 2012 Stata Users Meeting Group in London, UK, for providing feedback
on a previous version of this command. David Card from the University of Califor-
nia, Berkeley, as well as Vincenzo di Maro from The World Bank and Pablo Ibarraran
from the Inter-American Development Bank, provided important suggestions in an early
stage of the development of the code. Monica Oviedo from Universitat Autònoma de
Barcelona contributed with a review of some options of the diff code. I am grateful
to the Global Development Institute (formerly Brooks World Poverty Institute) and
United Nations University World Institute for Development Economics Research for
their research support. All the errors and omissions in the article are my own.
6 References
Abadie, A. 2005. Semiparametric difference-in-differences estimators. Review of Eco-
nomic Studies 72: 1–19.
Abadie, A., J. L. Herr, G. Imbens, and D. M. Drukker. 2004. nnmatch: Stata module to
compute nearest-neighbor bias-corrected estimators. Statistical Software Components
S439701, Department of Economics, Boston College.
http://econpapers.repec.org/software/bocbocode/s439701.htm.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s
Companion. Princeton, NJ: Princeton University Press.
Becker, S. O., and A. Ichino. 2002. Estimation of average treatment effects based on
propensity scores. Stata Journal 2: 358–377.
Card, D., and A. B. Krueger. 1994. Minimum wages and employment: A case study of
the fast-food industry in New Jersey and Pennsylvania. American Economic Review
84: 772–793.
Cerulli, G. 2014. ivtreatreg: A command for fitting binary treatment models with
heterogeneous response to treatment and unobservable selection. Stata Journal 14:
453–480.
Leuven, E., and B. Sianesi. 2003. psmatch2: Stata module to perform full Mahalanobis
and propensity score matching, common support graphing, and covariate imbalance
testing. Statistical Software Components S432001, Department of Economics, Boston
College. https://ideas.repec.org/c/boc/bocode/s432001.html.
Mora, R., and I. Reggio. 2012. Treatment effect identification using alternative parallel
assumptions. Working Paper 12-33, Universidad Carlos III de Madrid.
Nichols, A. 2007. rd: Stata module for regression discontinuity estimation. Statistical
Software Components S456888, Department of Economics, Boston College.
https://ideas.repec.org/c/boc/bocode/s456888.html.