
Impact Evaluation

Universidad del Rosario

Alexandra Contreras Cristancho PROBLEM SET 3

REGRESSION DISCONTINUITY DESIGNS


For the next questions you should use the dataset RDD_Melguizoetal2015.dta that is in the folder of
the problem set in e-aulas. Submit a do-file where appropriate to answer the questions as well.

1. Familiarize yourself with the data. How many beneficiaries (=recipients of the ACCES credit) and
non-beneficiaries are in the dataset? How many beneficiaries take up the grant? Are there any
beneficiaries who do not meet the eligibility criteria (i.e., exam score below the cut-off)? Are the
groups of beneficiaries and non-beneficiaries comparable in terms of age, gender, mother’s
education level and household income level?

Answer: The database referred to in point 1 of PS3 contains 104,103 observations. Of these, 91,260
are beneficiaries and 12,843 are non-beneficiaries. 71,109 beneficiaries accept the benefit, and
3,169 beneficiaries do not meet the eligibility criteria.
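
The counts above, and the comparability check that the question also asks for, can be obtained with a few built-in commands. The following is a minimal sketch on simulated data: the variable names (score, eligible, beneficiary, age) and all numbers are hypothetical and do not come from RDD_Melguizoetal2015.dta. With the real data, the simulation block would be replaced by a use command and the dataset's own variables.

```stata
* Simulated stand-in for the ACCES data; all names and values are hypothetical
clear
set seed 2015
set obs 2000
gen score       = rnormal()                         // standardized exam score
gen eligible    = score >= 0                        // at or above the cut-off
gen beneficiary = runiform() < 0.15 + 0.70*eligible // credit recipient indicator
gen age         = 17 + floor(5*runiform())

* Number of beneficiaries and non-beneficiaries
tabulate beneficiary

* Difference in mean age between the two groups
ttest age, by(beneficiary)

* Beneficiaries who do not meet the eligibility criterion
count if beneficiary == 1 & eligible == 0
```

The same ttest call can be repeated for gender, mother's education and income to judge whether the two groups are comparable.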

2. The fuzzy RD design relies on a discontinuity in the probability of treatment assignment. Check
that the probability of participating in the ACCES program jumps at the cut-off exam score. Use the
variable dist_ajust, which is the standardized exam score.
Answer: Summarizing dist_ajust gives the following result: the standardized grade is 5.0, the mean is
0.78, the standard deviation is 0.79, the minimum value is -5.714 and the maximum value is 4.478.
The jump in the probability of participation at the cut-off is checked graphically with cmogram.

STATA:
sort dist_ajust
summarize dist_ajust
cmogram estbenef_acces dist_ajust if dist_ajust >= -2 & dist_ajust <= 2, ///
    cut(0) scatter line(0) ///
    graphopts(xtitle("Exam Score Standardized") ytitle("Prob. of Partic. in ACCES")) qfitci
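
The graphical check can be complemented with a regression estimate of the size of the jump. The sketch below uses simulated data in which participation jumps by 0.6 at the cut-off. The variable names, the bandwidth of one unit and the jump size are assumptions made for illustration, not values from the ACCES data.

```stata
* Simulated fuzzy RD first stage; names and parameters are hypothetical
clear
set seed 2015
set obs 5000
gen score    = rnormal()
gen eligible = score >= 0
* Participation probability jumps by 0.60 at the cut-off (simulation assumption)
gen treated  = runiform() < 0.10 + 0.60*eligible + 0.05*score

* Local linear regression within one unit of the cut-off, separate slopes by side
regress treated i.eligible##c.score if abs(score) <= 1, vce(robust)

* Estimated discontinuity in the participation probability
display "Jump at cut-off: " _b[1.eligible]
```

A coefficient on eligible that is clearly different from zero is the numerical counterpart of the visible jump in the cmogram plot.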

3. Check that the observable characteristics relevant to the outcome are continuous around the cut-off
exam score. Use variables age (age_student), gender (dmen), mother’s education level (mother_edu)
and household income level (income).

STATA:
xtile Xdist_ajust = dist_ajust, nq(100)

*Graph age_student
preserve
collapse (mean) age_student dist_ajust, by(Xdist_ajust)
tw (scatter age_student dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
   (scatter age_student dist_ajust if dist_ajust>0 & dist_ajust<1) ///
   (lpolyci age_student dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) ///
       lcolor(red) ciplot(rline)) ///
   (lpolyci age_student dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) ///
       lcolor(red) ciplot(rline)), ///
   xline(0, lcolor(green)) legend(off)
restore

*Graph dmen
preserve
collapse (mean) dmen dist_ajust, by(Xdist_ajust)
tw (scatter dmen dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
   (scatter dmen dist_ajust if dist_ajust>0 & dist_ajust<1) ///
   (lpolyci dmen dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) ///
       lcolor(red) ciplot(rline)) ///
   (lpolyci dmen dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) ///
       lcolor(red) ciplot(rline)), ///
   xline(0, lcolor(green)) legend(off)
restore

*Graph mother_edu
preserve
collapse (mean) mother_edu dist_ajust, by(Xdist_ajust)
tw (scatter mother_edu dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
   (scatter mother_edu dist_ajust if dist_ajust>0 & dist_ajust<1) ///
   (lpolyci mother_edu dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) ///
       lcolor(red) ciplot(rline)) ///
   (lpolyci mother_edu dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) ///
       lcolor(red) ciplot(rline)), ///
   xline(0, lcolor(green)) legend(off)
restore

*Graph income
preserve
collapse (mean) income dist_ajust, by(Xdist_ajust)
tw (scatter income dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
   (scatter income dist_ajust if dist_ajust>0 & dist_ajust<1) ///
   (lpolyci income dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) ///
       lcolor(red) ciplot(rline)) ///
   (lpolyci income dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) ///
       lcolor(red) ciplot(rline)), ///
   xline(0, lcolor(green)) legend(off)
restore


Answer: According to the local polynomial fits carried out in Stata and the attached graphs, it is inferred
that the variables relevant to the outcome are continuous at the exam cut-off score.
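
Continuity of the covariates can also be tested numerically by estimating the discontinuity in each covariate at the cut-off. The following is a minimal sketch on simulated data in which the covariates vary smoothly with the score by construction. The names age and male and the bandwidth of one unit are assumptions for illustration.

```stata
* Simulated covariates that are smooth at the cut-off; names are hypothetical
clear
set seed 2015
set obs 5000
gen score    = rnormal()
gen eligible = score >= 0
gen age      = 18 + 0.3*score + rnormal()
gen male     = runiform() < 0.5

* Estimated jump in each covariate at the cut-off (should be close to zero)
foreach v of varlist age male {
    quietly regress `v' i.eligible##c.score if abs(score) <= 1, vce(robust)
    display "`v': jump = " %6.3f _b[1.eligible] "  (se = " %6.3f _se[1.eligible] ")"
}
```

A jump that is small relative to its standard error supports the continuity assumption for that covariate.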

4. Do a graphical analysis of the probability of enrolment (enrollment) by distance to the exam
cut-off (intention-to-treat effect). Are there any differences by socio-economic status?

STATA:
* Percentile bins of the running variable used in the collapse commands below
xtile xtile_dist_ajust = dist_ajust, nq(100)

preserve
collapse (mean) enrollment dist_ajust, by(xtile_dist_ajust)
tw (scatter enrollment dist_ajust if dist_ajust<1 & dist_ajust>-1), ///
   xline(0, lcolor(gray)) ytitle("Prob. of enrollment") xtitle("Dist. Ajust")
restore

**Differences by socio-economic status

* Stratum 1
gen enrollment_estr1 = enrollment if icfes_estrato_imp == 1
gen dist_ajust_estr1 = dist_ajust if icfes_estrato_imp == 1

preserve
collapse (mean) enrollment_estr1 dist_ajust_estr1, by(xtile_dist_ajust)
tw (scatter enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1>=0, color(ebblue)) ///
   (scatter enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1<0, color(ebblue)) ///
   (lpolyci enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1>=0, kernel(triangle) ///
       lcolor(dknavy) ciplot(rline)) ///
   (lpolyci enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1<0, kernel(triangle) ///
       lcolor(dknavy) ciplot(rline)), ///
   xline(0, lcolor(gray)) ytitle("Prob. of Enrollment") xtitle("Dist. Ajust")
restore

* Stratum 2
gen enrollment_str2 = enrollment if icfes_estrato_imp == 2
gen dist_ajust_str2 = dist_ajust if icfes_estrato_imp == 2

preserve
collapse (mean) enrollment_str2 dist_ajust_str2, by(xtile_dist_ajust)
tw (scatter enrollment_str2 dist_ajust_str2 if dist_ajust_str2>=0, color(ebblue)) ///
   (scatter enrollment_str2 dist_ajust_str2 if dist_ajust_str2<0, color(ebblue)) ///
   (lpolyci enrollment_str2 dist_ajust_str2 if dist_ajust_str2>=0, kernel(triangle) ///
       lcolor(dknavy) ciplot(rline)) ///
   (lpolyci enrollment_str2 dist_ajust_str2 if dist_ajust_str2<0, kernel(triangle) ///
       lcolor(dknavy) ciplot(rline)), ///
   xline(0, lcolor(gray)) ytitle("Prob. of Enrollment") xtitle("Dist. Ajust")
restore

Answer: According to the commands and the graphs obtained, the socio-economic characterization of the
student influences the probability of enrollment near the exam cut-off. Strata 1 and 2 were
included. The requirements for accessing the ACCES credit line include belonging to stratum 1 or 2
and having a high score on the state exam. Therefore, students in the graph who meet the score
requirement but are not chosen are those who are not within the permitted socio-economic strata.

5. Run a regression to estimate the ITT effect of eligibility on enrolment. Do the results change if
you add socio-economic control variables?

***ESTIMATE THE EFFECT

** OLS estimation without controls

* Squared running variable (name matches the regressions below)
gen dist_ajust2 = dist_ajust*dist_ajust

preserve
keep if dist_ajust<100 & dist_ajust>-100
eststo rdd1: reg enrollment elegible_icetex2 dist_ajust, cl(departamento)
eststo rdd2: reg enrollment elegible_icetex2 dist_ajust dist_ajust2, cl(departamento)
restore
esttab rdd1 rdd2, se r2 star(* .1 ** .05 *** .01) label

** OLS estimation with controls

preserve
keep if dist_ajust<100 & dist_ajust>-100
eststo rdd3: reg enrollment elegible_icetex2 dist_ajust ///
    i.dmen i.mother_edu i.income age_student, cl(departamento)
eststo rdd4: reg enrollment elegible_icetex2 dist_ajust dist_ajust2 ///
    i.dmen i.mother_edu i.income age_student, cl(departamento)
restore
esttab rdd3 rdd4, se r2 star(* .1 ** .05 *** .01) label


Answer: Yes, adding socio-economic control variables alters the results. Access to the credit is
conditional a priori on socio-economic level, so students outside the permitted strata have no possibility
of being part of this educational credit offer. From the attached tables it can be deduced that including
a control variable of this type shows a negative correlation with the probability of accessing this line
of credit.
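
The regressions in this question can be written as a single equation. This is a sketch in notation introduced here for illustration (the symbols $\alpha$, $\tau$, $\beta_1$, $\beta_2$, $\gamma$ and $X_i$ are not the problem set's notation), where $X_i$ collects the controls dmen, mother_edu, income and age:

```latex
\text{enrollment}_i = \alpha
  + \tau\,\text{elegible\_icetex2}_i
  + \beta_1\,\text{dist\_ajust}_i
  + \beta_2\,\text{dist\_ajust}_i^{2}
  + X_i'\gamma + \varepsilon_i
```

Here $\tau$ is the ITT effect of eligibility on enrollment. Models rdd1 and rdd3 set $\beta_2 = 0$, and models rdd1 and rdd2 set $\gamma = 0$, so the question amounts to comparing $\hat{\tau}$ across the four columns.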

INSTRUMENTAL VARIABLES
For the following questions you should use the dataset IV_Attanasio2011EJ. The corresponding paper
is called “Community Nurseries and the Nutritional Status of Poor Children. Evidence from
Colombia” and you can find it in the e-aulas folder for this problem set.

Suppose that the production function of child’s nutritional status can be approximated by a linear
function of A, F, L and z, where H is the child’s nutritional status, A is a measure of participation in
Hogares Comunitarios (HC) (i.e., whether or not the child attends an HC nursery, attendance; or the
number of days that the child attends an HC nursery, exposure), F is food fed to the child, L is female
labor supply, and z is a vector of observable variables.

6. Why would a regression of child’s nutritional status on either measure of participation in HC yield
biased estimate of the treatment effect?

Answer: With either participation measure, a regression of nutritional status on participation in
community nurseries is likely to be biased, for two reasons. First, there is self-selection: the parents
decide whether or not the child attends these care centers, and this determines participation. Participation
is therefore not random, which generates endogeneity. Second, there are omitted variables that are very
important for the model because they may have a causal relationship with the outcome variable, which
measures the nutritional status of the child. When such variables are excluded, either because the
information is not accessible or because they are not measurable and quantifiable, the estimate suffers
from omitted variable bias that must be corrected. Another vital variable that should not be forgotten is
the height of the children.
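
The omitted variable argument can be made explicit with the standard bias formula. This is a sketch in notation introduced here for illustration: $w_i$ stands for an unobserved determinant of nutrition (for example parental preferences), variables are taken in deviations from their means, and $u_i$ is assumed uncorrelated with $A_i$.

```latex
\begin{aligned}
\text{True model:}\quad      & H_i = \alpha A_i + \delta w_i + u_i \\
\text{Estimated model:}\quad & H_i = a A_i + e_i \\
\operatorname{plim}\,\hat{a} &= \alpha + \delta\,\frac{\operatorname{Cov}(A_i, w_i)}{\operatorname{Var}(A_i)}
\end{aligned}
```

The estimate is consistent only if $\delta = 0$ or $\operatorname{Cov}(A_i, w_i) = 0$. Self-selection by parents makes the second condition unlikely to hold.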

7. Suggested instruments are:


a. Two measures of distance from the household to the nearest HC nursery: one distance
measured in the most current survey and one measured in the baseline survey.
Answer: For the two distance measures to be valid instrumental variables, three conditions are needed.
First, they must have no relationship with the relevant omitted variables, since these variables determine
whether the children actually receive adequate nutrition. Second, they may affect the outcome only through
the variable of interest, which here is participation in community nurseries. Third, the distance from each
household to the community nursery must determine the participation of a child: the longer the distance,
the longer the travel time and therefore the higher the cost for the parents of getting there. It would be
a mistake, however, to assume that the distance between the children's homes and the community nurseries
has a causal relationship with the quality of the food provided to the children other than through
participation.

b. Median fee paid by children to attend an HC nursery in the town (as indicators of cost of
participation both in terms of time and money).

Under which assumptions are these variables valid instruments. Is this plausible?

Answer: The fee paid for children to attend community nurseries can be considered an instrumental
variable. It is valid if it has no direct relationship with the quality of nutrition provided to the children,
so that one does not depend on the other. It is also correlated with attendance or participation in these
nurseries: the higher the cost of attending these public centers, the lower the participation, since fewer
households can pay for the service and have the child attend. The instrument therefore affects the outcome
variable only through the participation variable. In this case, the proposed variables are valid if the
instrument is correlated with participation and has no relationship with other determinants of the outcome,
such as the omitted variables. If the first-stage regression shows a correlation between the chosen
instrument and the participation variable, the relevance assumption is met.
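
The two assumptions discussed in this question can be stated compactly. The following is a sketch for a single instrument $Z_i$ (distance or fee), where $u_i$ is the error term of the nutrition equation and $\alpha$ is the effect of participation $A_i$ on $H_i$. These symbols are introduced here for illustration.

```latex
\begin{aligned}
\text{Relevance:}\quad & \operatorname{Cov}(Z_i, A_i) \neq 0 \\
\text{Exclusion:}\quad & \operatorname{Cov}(Z_i, u_i) = 0 \\
\hat{\alpha}_{IV} &= \frac{\widehat{\operatorname{Cov}}(Z_i, H_i)}{\widehat{\operatorname{Cov}}(Z_i, A_i)}
\end{aligned}
```

Relevance can be checked in the first-stage regression. The exclusion restriction cannot be tested directly and has to be argued, for example by asking whether households choose where to live in response to the location of the nurseries.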

8. Evaluate empirically the validity of the two sets of instruments (i.e., first stage regressions) using a
linear model (OLS) controlling for relevant covariates. Consider using quadratic terms of each
instrument if appropriate. Perform an F-test to analyse if the instruments (and their squared terms)
are weak and interpret the results.


                      (1)            (1)
VARIABLES             Exposure       Attendance

traveltimhogcom       -0.159***      -0.281***
                      (0.0287)       (0.0533)
traveltimhogcom2      0.0509***      0.0791***
                      (0.0134)       (0.0250)
time_hc_b             -0.185***      -0.184***
                      (0.0269)       (0.0501)
time_hc_b2            0.0656***      0.0770***
                      (0.0123)       (0.0228)
hc_fee_med            -0.0294***     -0.0335***
                      (0.00210)      (0.00389)
hc_fee_med2           0.00108***     0.00123***
                      (0.000138)     (0.000255)
Constant              0.324***       0.421***
                      (0.00618)      (0.0115)

Observations          6,264          6,359
R-squared             0.136          0.068

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

STATA:
reg exposure traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2
outreg2 using 8.doc, replace ctitle(Exposure)
reg asis_hc traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2
outreg2 using 8.doc, append ctitle(Attendance)

Answer: Exposure and Attendance are the two measures of treatment, and each is the dependent variable
of its own first-stage regression. The linear terms of all instruments have negative coefficients, so there
is a negative relationship between the treatment variable and the instruments. Both the original
instruments and their squared terms are significant at the 1% level, which indicates a non-linear
relationship between the instruments and both treatment measures. Since the tests on the instrument
coefficients are strongly significant, the instruments are considered relevant for the regression.
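
The joint F-test that the question asks for can be run with test after each first-stage regression. The sketch below uses simulated data, and the variable names (dist, fee) and coefficients are hypothetical. With the real data, the same test command would list traveltimhogcom, traveltimhogcom2, time_hc_b, time_hc_b2, hc_fee_med and hc_fee_med2.

```stata
* Simulated first stage; names and coefficients are hypothetical
clear
set seed 2011
set obs 3000
gen dist  = 3*runiform()          // travel time to the nearest nursery
gen dist2 = dist^2
gen fee   = 10*runiform()         // median fee in the town
gen fee2  = fee^2
gen exposure = 0.6 - 0.15*dist + 0.02*dist2 - 0.03*fee + 0.001*fee2 + rnormal(0, 0.2)

regress exposure dist dist2 fee fee2

* Joint F-test that all instruments (levels and squares) have zero coefficients
test dist dist2 fee fee2
display "First-stage F = " r(F) ",  p-value = " r(p)
```

A common rule of thumb treats a first-stage F statistic below 10 as a sign of weak instruments.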

9. Do you reach the same results if you use a non-linear model as appropriate? [hint: tobit model for
exposure, probit model for attendance]. Predict the fitted values of tobit and probit models.


                      (1)              (2)
VARIABLES             Tobit Exposure   Tobit Exposure

traveltimhogcom       -0.159***
                      (0.0286)
traveltimhogcom2      0.0509***
                      (0.0134)
time_hc_b             -0.185***
                      (0.0269)
time_hc_b2            0.0656***
                      (0.0123)
hc_fee_med            -0.0294***
                      (0.00210)
hc_fee_med2           0.00108***
                      (0.000138)
var(e.exposure)                        0.0492***
                                       (0.000879)
Constant              0.324***
                      (0.00618)

Observations          6,264            6,264

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

                      (1)
VARIABLES             Probit Attendance

traveltimhogcom       -1.215***
                      (0.226)
traveltimhogcom2      0.186
                      (0.136)
time_hc_b             -0.757***
                      (0.191)
time_hc_b2            0.308***
                      (0.0890)
hc_fee_med            -0.105***
                      (0.0134)
hc_fee_med2           0.00349***
                      (0.000946)
Constant              -0.0999***
                      (0.0384)

Observations          6,359

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

STATA:
tobit exposure traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2
outreg2 using TobitExposure.doc, replace ctitle(Tobit Exposure)
predict TExposure
probit asis_hc traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2
outreg2 using ProbitAttendance.doc, replace ctitle(Probit Attendance)
predict PAttendance

Answer: The first stages were re-estimated with non-linear models: Tobit for Exposure and Probit for
Attendance. The instruments again have a negative relationship with participation and remain significant.
The Tobit coefficients coincide with the OLS coefficients of question 8, which is consistent with tobit
being run without censoring limits (no ll() or ul() option) in the code above. In the Probit model not all
the variables are significant: the squared travel time (traveltimhogcom2) has a p-value of 0.171 and is
not significant even at the 10% level, being the exception to the rule.
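
The type of fitted value returned by predict depends on the model and on the option used, so it is worth stating it explicitly. The following is a minimal sketch on simulated data, assuming that exposure is censored from below at zero. The variable names and the censoring point are assumptions for illustration and are not taken from IV_Attanasio2011EJ.

```stata
* Simulated censored exposure and binary attendance; names are hypothetical
clear
set seed 2011
set obs 3000
gen dist     = 3*runiform()
gen ystar    = 0.5 - 0.3*dist + rnormal(0, 0.3)   // latent exposure
gen exposure = max(0, ystar)                      // observed exposure, censored at 0
gen attend   = ystar > 0

* Tobit with an explicit lower censoring limit
tobit exposure dist, ll(0)
predict exp_hat, ystar(0, .)     // expected observed exposure, given censoring at 0

* Probit: predicted probability of attendance
probit attend dist
predict att_hat, pr
```

Without an option, predict after tobit returns the linear prediction, while predict after probit returns the predicted probability.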

10. Compute the IV estimates of the treatment effect of the (instrumented) programme participation
on child’s nutrition (height-for-age z scores) using the 2SLS estimator with linear and quadratic
terms of the instrumental variables. Include relevant covariates [hint: use command ivreg and
restrict the sample to those observations with no missing values.]. Compare these estimates with
the instrumented OLS regression of height-for-age on HC participation. What do these estimates
suggest in terms of selection bias?


                      (1)            (1)
VARIABLES             2SLS           IVOLS

exposure              0.506***
                      (0.161)
exposureHat                          0.511***
                                     (0.158)
Constant              -1.343***      -1.344***
                      (0.0319)       (0.0314)

Observations          6,264          6,361
R-squared             -0.019         0.002

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

STATA:
ivreg2 haz (exposure = traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2)
outreg2 using 2sls.doc, replace ctitle(2SLS)

reg exposure traveltimhogcom traveltimhogcom2 time_hc_b time_hc_b2 hc_fee_med hc_fee_med2
predict exposureHat
reg haz exposureHat
outreg2 using IVOLS.doc, replace ctitle(IVOLS)

Answer: There is selection bias, which the instrumental variable method is meant to mitigate. The
regression is estimated by 2SLS and by a manual two-step procedure (IV OLS), in which exposure is first
regressed on the instruments and haz is then regressed on the fitted values. Omitted variables that may
be important in the original model are another possible source of bias. The manual procedure gives a
coefficient of 0.511 with a standard error of 0.158, and 2SLS gives 0.506 with a standard error of 0.161,
so the two estimates are very close. The manual second stage uses 6,361 observations instead of 6,264,
presumably because predict also computes fitted values for observations where exposure is missing, and
its standard errors do not account for the first-stage estimation, so the 2SLS standard errors are the
ones to report. Both methods are intended to yield a less biased coefficient, since the OLS estimate is
considered to be underestimated. Although the standard errors differ between the two methods, both
address the same endogeneity problem.
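
The relationship between naive OLS, 2SLS and the manual two-step procedure can be illustrated on simulated data where the true effect is known. In the sketch below all variable names and parameters are hypothetical: u is an unobserved confounder, z is a valid instrument and the true effect of exposure on haz is 0.5. The built-in ivregress command is used in place of the user-written ivreg2.

```stata
* Simulated IV example; names and parameters are hypothetical
clear
set seed 2011
set obs 5000
gen z        = rnormal()                         // instrument
gen u        = rnormal()                         // unobserved confounder
gen exposure = 0.5*z + 0.5*u + rnormal()
gen haz      = 0.5*exposure - 0.8*u + rnormal()  // true effect = 0.5

* Naive OLS: biased because u drives both exposure and haz
regress haz exposure
scalar b_ols = _b[exposure]

* 2SLS in one step, with correct standard errors
ivregress 2sls haz (exposure = z)
scalar b_iv = _b[exposure]

* Manual two-step: same point estimate, second-stage standard errors not valid
regress exposure z
predict exposure_hat, xb
regress haz exposure_hat
scalar b_manual = _b[exposure_hat]

display "OLS = " b_ols "   2SLS = " b_iv "   manual = " b_manual
```

In this simulation the confounder pushes the OLS estimate below the true effect, while the two IV estimates coincide because both steps use the same sample.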
