Impact Evaluation Universidad Del Rosario: Problem Set 3
1. Familiarize yourself with the data. How many beneficiaries (= recipients of the ACCES credit) and
non-beneficiaries are in the dataset? How many beneficiaries take up the grant? Are there any
beneficiaries who do not meet the eligibility criteria (i.e., exam score below the cut-off)? Are the
groups of beneficiaries and non-beneficiaries comparable in terms of age, gender, mother’s
education level and household income level?
Answer: 71,109 beneficiaries take up the benefit, and 3,169 beneficiaries do not meet the eligibility
criteria (their exam score is below the cut-off).
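The checks in question 1 can be sketched in Stata as follows. This is a minimal example on simulated data, so it runs on its own.

- The names estbenef_acces, dist_ajust, age_student, dmen, mother_edu and income mirror the problem set, but every value is generated.
- takeup and elig are hypothetical helper variables, not variables from the ACCES dataset.
- On the real data, the simulation lines would be replaced by loading the dataset.

```stata
* Sketch on SIMULATED data; takeup and elig are hypothetical helpers.
clear
set seed 12345
set obs 4000
generate dist_ajust = rnormal()
generate elig = dist_ajust >= 0
* fuzzy design: mostly eligible students receive the credit, a few do not qualify
generate estbenef_acces = runiform() < cond(elig, 0.8, 0.05)
generate takeup = estbenef_acces & runiform() < 0.9
generate age_student = 17 + rpoisson(1)
generate dmen = runiform() < 0.5
generate mother_edu = 1 + floor(4*runiform())
generate income = 1 + floor(5*runiform())

* (1) number of beneficiaries and non-beneficiaries
tabulate estbenef_acces
* (2) beneficiaries who take up the credit
count if estbenef_acces == 1 & takeup == 1
* (3) beneficiaries below the cut-off (eligibility violations)
count if estbenef_acces == 1 & dist_ajust < 0
scalar n_below = r(N)
* (4) balance: difference-in-means tests by beneficiary status
foreach v of varlist age_student dmen mother_edu income {
    ttest `v', by(estbenef_acces)
}
```

The count of beneficiaries below the cut-off is stored in a scalar because the later ttest calls overwrite r().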
2. The fuzzy RD design relies on a discontinuity in the probability of treatment assignment. Check
that the probability of participating in the ACCES program jumps at the cut-off exam score. Use
the variable distr_ajust, which is the standardized exam score.
Answer: Summarizing the standardized exam score (dist_ajust) gives the following results: the standardized
grade is 5.0, the mean is 0.78, the standard deviation is 0.79, the minimum value is -5.714 and the
maximum value is 4.478. The probability of participation is then plotted against the score with cmogram,
a user-written command available from SSC.
STATA:
sort dist_ajust
summarize dist_ajust
cmogram estbenef_acces dist_ajust if dist_ajust >= -2 & dist_ajust <= 2, cut(0) scatter line(0) graphopts(xtitle("Exam Score Standardized") ytitle("Prob. of Partic. in ACCES")) qfitci
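The cmogram graph shows the jump visually. The same jump can be estimated as a local linear first stage: regress participation on an eligibility dummy within a bandwidth, with separate slopes on each side of the cut-off. The coefficient on the dummy is then the estimated discontinuity.

This is a sketch on simulated data where the true jump is set to 0.6 by construction. Three choices are assumptions for illustration only:

- the bandwidth h = 1;
- the helper variables elig and p;
- robust standard errors.

It needs Stata 14 or later for runiform(a, b).

```stata
* Sketch on SIMULATED data (not the ACCES sample); true jump = 0.6.
clear
set seed 2024
set obs 20000
generate dist_ajust = runiform(-2, 2)
generate elig = dist_ajust >= 0
* participation probability: smooth in the score plus a 0.6 jump at 0
generate p = 0.1 + 0.05*dist_ajust + 0.6*elig
generate estbenef_acces = runiform() < p

* local linear first stage within bandwidth h, separate slopes by side
local h = 1
regress estbenef_acces i.elig##c.dist_ajust if abs(dist_ajust) <= `h', vce(robust)
* the coefficient on 1.elig is the estimated jump at the cut-off
display "Estimated first-stage jump: " %5.3f _b[1.elig]
```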
3. Check that the observable characteristics relevant to the outcome are continuous around the cut-off
exam score. Use variables age (age_student), gender (dmen), mother’s education level (mother_edu)
and household income level (income).
*Graph dmen
preserve
collapse (mean) dmen dist_ajust, by(Xdist_ajust)
tw (scatter dmen dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
(scatter dmen dist_ajust if dist_ajust>0 & dist_ajust<1) ///
(lpolyci dmen dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) lcolor(red) ciplot(rline)) ///
(lpolyci dmen dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) lcolor(red) ciplot(rline)), ///
xline(0, lcolor(green)) legend(off)
restore
*Graph mother_edu
preserve
collapse (mean) mother_edu dist_ajust, by(Xdist_ajust)
tw (scatter mother_edu dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
(scatter mother_edu dist_ajust if dist_ajust>0 & dist_ajust<1) ///
(lpolyci mother_edu dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) lcolor(red) ciplot(rline)) ///
(lpolyci mother_edu dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) lcolor(red) ciplot(rline)), ///
xline(0, lcolor(green)) legend(off)
restore
*Graph income
preserve
collapse (mean) income dist_ajust, by(Xdist_ajust)
tw (scatter income dist_ajust if dist_ajust<0 & dist_ajust>-1) ///
(scatter income dist_ajust if dist_ajust>0 & dist_ajust<1) ///
(lpolyci income dist_ajust if dist_ajust<0 & dist_ajust>-1, kernel(triangle) lcolor(red) ciplot(rline)) ///
(lpolyci income dist_ajust if dist_ajust>0 & dist_ajust<1, kernel(triangle) lcolor(red) ciplot(rline)), ///
xline(0, lcolor(green)) legend(off)
restore
Answer: According to the local polynomial fits estimated in Stata and shown in the attached graphs, the
covariates relevant to the outcome appear to be continuous at the cut-off exam score.
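The visual check can be backed by a formal test. Run the same local linear RD regression with a pre-determined covariate as the outcome: under continuity, the coefficient on the eligibility dummy should be small and statistically insignificant.

This is a minimal sketch on simulated data in which age_student is smooth through the cut-off by construction. The helper variable elig and the bandwidth of 1 are assumptions. On the real data, the same regression would be repeated for dmen, mother_edu and income.

```stata
* Sketch on SIMULATED data: age_student is continuous at the cut-off.
clear
set seed 777
set obs 20000
generate dist_ajust = runiform(-2, 2)
generate elig = dist_ajust >= 0
generate age_student = 17 + 0.3*dist_ajust + rnormal(0, 1)

* RD regression with the covariate as the outcome, bandwidth 1
regress age_student i.elig##c.dist_ajust if abs(dist_ajust) <= 1, vce(robust)
* H0: no discontinuity in the covariate at the cut-off
test 1.elig
```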
4. Do a graphical analysis of the probability of enrolment (enrollment) by distance to the exam cut-off
(intention-to-treat effect). Are there any differences by socio-economic status?
STATA:
preserve
collapse (mean) enrollment dist_ajust, by(xtile_dist_ajust)
tw (scatter enrollment dist_ajust if dist_ajust<1 & dist_ajust>-1), ///
xline(0, lcolor(gray)) ytitle("Prob. of enrollment") xtitle("Dist. Ajust")
restore
preserve
collapse (mean) enrollment_estr1 dist_ajust_estr1, by(xtile_dist_ajust)
tw (scatter enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1>=0, color(ebblue)) ///
(scatter enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1<0, color(ebblue)) ///
(lpolyci enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1>=0, kernel(triangle) lcolor(dknavy) ciplot(rline)) ///
(lpolyci enrollment_estr1 dist_ajust_estr1 if dist_ajust_estr1<0, kernel(triangle) lcolor(dknavy) ciplot(rline)), ///
xline(0, lcolor(gray)) ytitle("Prob. of Enrollment") xtitle("Dist. Ajust")
restore
preserve
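collapse (mean) enrollment_str2 dist_ajust_str2, by(xtile_dist_ajust)
tw (scatter enrollment_str2 dist_ajust_str2 if dist_ajust_str2>=0, color(ebblue)) ///
(scatter enrollment_str2 dist_ajust_str2 if dist_ajust_str2<0, color(ebblue)) ///
(lpolyci enrollment_str2 dist_ajust_str2 if dist_ajust_str2>=0, kernel(triangle) lcolor(dknavy) ciplot(rline)) ///
(lpolyci enrollment_str2 dist_ajust_str2 if dist_ajust_str2<0, kernel(triangle) lcolor(dknavy) ciplot(rline)), ///
xline(0, lcolor(gray)) ytitle("Prob. of Enrollment") xtitle("Dist. Ajust")
restore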
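The stratum-by-stratum graphs can be complemented with one interacted RD regression, which estimates the ITT jump for each socio-economic group and lets the two jumps be compared directly.

This is a sketch on simulated data:

- estrato is a hypothetical two-level stratum indicator, standing in for whatever socio-economic variable the ACCES data provide.
- The true jumps (0.30 for stratum 1 and 0.10 for stratum 2) are set by construction.

```stata
* Sketch on SIMULATED data; estrato is a hypothetical SES indicator.
clear
set seed 99
set obs 30000
generate dist_ajust = runiform(-1, 1)
generate elig = dist_ajust >= 0
generate estrato = 1 + (runiform() < 0.5)
generate p = 0.3 + 0.1*dist_ajust + cond(estrato == 1, 0.30, 0.10)*elig
generate enrollment = runiform() < p

* separate intercepts and slopes on each side of the cut-off, by stratum
regress enrollment i.estrato##i.elig##c.dist_ajust, vce(robust)
* ITT jump at the cut-off for stratum 1 and for stratum 2
lincom 1.elig
lincom 1.elig + 2.estrato#1.elig
```

The interaction coefficient 2.estrato#1.elig directly tests whether the ITT differs between the two groups.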
5. Run a regression to estimate the ITT effect of eligibility on enrolment. Do the results change if
you add socio-economic control variables?
preserve
keep if dist_ajust<100 & dist_ajust>-100
eststo rdd1: reg enrollment elegible_icetex2 dist_ajust, cl(departamento)
eststo rdd2: reg enrollment elegible_icetex2 dist_ajust dist_ajust2, cl(departamento)
restore
esttab rdd1 rdd2, se r2 star(* .1 ** .05 *** .01) label
preserve
keep if dist_ajust<100 & dist_ajust>-100
eststo rdd3: reg enrollment elegible_icetex2 dist_ajust i.dmen i.mother_edu i.income age, cl(departamento)
eststo rdd4: reg enrollment elegible_icetex2 dist_ajust dist_ajust2 i.dmen i.mother_edu i.income age, cl(departamento)
restore
esttab rdd3 rdd4, se r2 star(* .1 ** .05 *** .01) label
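A useful benchmark for this question: when the design is valid and the controls are balanced at the cut-off, adding them should leave the ITT point estimate roughly unchanged and mainly affect precision.

The sketch below shows this on simulated data where the true ITT is 0.25 by construction. It differs from the regressions above in two ways, both assumptions for illustration:

- It lets the slope of dist_ajust differ on each side of the cut-off (an interaction instead of a common slope).
- It uses robust instead of department-clustered standard errors, because the simulated data have no departments.

```stata
* Sketch on SIMULATED data; true ITT = 0.25, controls balanced at the cut-off.
clear
set seed 31
set obs 20000
generate dist_ajust = runiform(-2, 2)
generate elegible_icetex2 = dist_ajust >= 0
generate dmen = runiform() < 0.5
generate income = 1 + floor(5*runiform())
generate enrollment = runiform() < 0.2 + 0.05*dist_ajust + 0.25*elegible_icetex2 + 0.03*income

* ITT without controls (separate slopes on each side of the cut-off)
regress enrollment i.elegible_icetex2##c.dist_ajust, vce(robust)
scalar itt_noctrl = _b[1.elegible_icetex2]
* ITT with socio-economic controls
regress enrollment i.elegible_icetex2##c.dist_ajust i.dmen i.income, vce(robust)
scalar itt_ctrl = _b[1.elegible_icetex2]
display "ITT without controls: " %5.3f itt_noctrl "   with controls: " %5.3f itt_ctrl
```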
Answer: Adding socio-economic control variables does change the results. Enrollment depends on a prior
condition: students who do not have the required socio-economic level have no possibility of accessing
this educational credit. The attached tables therefore suggest that including a control variable of this
type generates a negative correlation with the probability of choosing to access this line of credit.
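The ITT regressions in question 5 estimate only the numerator of the fuzzy RD estimand. The first-stage jump checked in question 2 is the denominator.

Under the standard fuzzy RD assumptions (continuity of potential outcomes at the cut-off and monotonicity), their ratio is the local average treatment effect of participation for compliers at the cut-off. The notation is ours:

- Y is enrollment;
- D is participation (estbenef_acces);
- X is dist_ajust, with eligibility at X ≥ 0.

```latex
\tau_{\text{FRD}}
  = \frac{\lim_{x \downarrow 0} E[Y \mid X = x] - \lim_{x \uparrow 0} E[Y \mid X = x]}
         {\lim_{x \downarrow 0} E[D \mid X = x] - \lim_{x \uparrow 0} E[D \mid X = x]}
  = \frac{\text{ITT jump in enrollment}}{\text{first-stage jump in participation}}
```

With purely hypothetical values, an ITT of 0.25 and a first-stage jump of 0.6 would imply an effect on compliers of 0.25 / 0.6 ≈ 0.42. This ratio can also be estimated in one step, with correct standard errors, by 2SLS in a window around the cut-off, instrumenting participation with the eligibility dummy.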
INSTRUMENTAL VARIABLES
For the following questions you should use the dataset IV_Attanasio2011EJ. The corresponding paper
is called “Community Nurseries and the Nutritional Status of Poor Children. Evidence from
Colombia” and you can find it in the e-aulas folder for this problem set.
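Suppose that the production function of child’s nutritional status can be approximated by a linear
function such as the following, where the β's and γ are coefficients and ε is an unobserved error term:
$$H = \beta_0 + \beta_A A + \beta_F F + \beta_L L + z'\gamma + \varepsilon$$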
H is the child’s nutritional status, A is a measure of participation in Hogares Comunitarios (HC) (i.e.,
whether or not the child attends an HC nursery – attendance; or the number of days that the child
attends an HC nursery - exposure). F is food fed to the child, L is female labor supply, and z is a vector
of observable variables.
6. Why would a regression of child’s nutritional status on either measure of participation in HC yield
a biased estimate of the treatment effect?
Answer: For the Hogares Comunitarios, a regression on either participation indicator (attendance or
exposure) is likely to give a biased estimate, for two reasons. First, there is self-selection: parents decide
whether or not the child attends these care centers, and this decision determines participation. Because
participation is chosen rather than randomly assigned, it is endogenous. Second, there are omitted
variables that matter for the model because they may have a causal relationship with the outcome
variable, which measures the nutritional status of the child. For instance, food intake (F) and female labor
supply (L) are themselves household choices that may change when the child attends an HC nursery.
These variables should be included; if they are excluded, either because the information cannot be
accessed or because they cannot be measured and quantified, they generate omitted variable bias that
must be corrected. Another important variable, the height of the children, should not be forgotten either.
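The two sources of bias can be made explicit with standard omitted-variable algebra.

Write the production function as H = β0 + β_A A + β_F F + β_L L + z'γ + ε; the coefficient names are our own notation. Suppose H is regressed on A and z only. Define:

- Ã: A after partialling out z;
- δ_F and δ_L: the coefficients from regressing F and L on Ã.

Then:

```latex
\operatorname{plim} \hat{\beta}_A^{\text{OLS}}
  = \beta_A
  + \underbrace{\beta_F \, \delta_F + \beta_L \, \delta_L}_{\text{omitted inputs } F,\, L}
  + \underbrace{\frac{\operatorname{Cov}(\tilde{A}, \varepsilon)}{\operatorname{Var}(\tilde{A})}}_{\text{self-selection}}
```

The two bias terms have natural interpretations:

- Selection: if parents who care more about nutrition are both more likely to send the child to an HC nursery and feed the child better in unmeasured ways, then Cov(Ã, ε) > 0 and OLS overstates β_A.
- Omitted inputs: if attendance shifts female labor supply (δ_L ≠ 0) and L affects H, the middle term is non-zero even without selection on unobservables.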
b. Median fee paid by children to attend an HC nursery in the town (as indicators of cost of
participation both in terms of time and money).
Under which assumptions are these variables valid instruments? Is this plausible?
Answer: The fee paid for children to attend the community homes can be considered a valid instrumental
variable if it has no direct relationship with the quality of the nutrition the children receive; that is, the fee
and the child's nutritional status should not be correlated except through participation. Second, the fee
must be correlated with attendance or participation in these homes. The higher the opportunity cost of
attending these public centers, the fewer families can pay for the service and send the child to the
community homes, so this correlation is negative. In short, the instrument may affect the outcome
variable only through the treatment (participation) variable. The instruments are therefore valid if the
premise holds that they are unrelated to the omitted variables (the error term) while being correlated with
the participation variable. The first-stage regression should show a significant correlation between the
chosen instrument and participation.
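Relevance and exclusion can be written compactly. Take a single instrument Z, for example the median fee, with covariates partialled out. Write H = β0 + β_A A + u, where u collects everything in H other than β_A A (our notation). Then the IV estimator converges to:

```latex
\operatorname{plim} \hat{\beta}_A^{\text{IV}}
  = \frac{\operatorname{Cov}(Z, H)}{\operatorname{Cov}(Z, A)}
  = \beta_A + \frac{\operatorname{Cov}(Z, u)}{\operatorname{Cov}(Z, A)},
\qquad
\text{relevance: } \operatorname{Cov}(Z, A) \neq 0,
\qquad
\text{exclusion: } \operatorname{Cov}(Z, u) = 0
```

A hypothetical scenario shows why plausibility matters. Suppose fees are higher in richer towns, and town wealth improves child nutrition directly. Then:

- Cov(Z, u) > 0, because the fee picks up town wealth;
- Cov(Z, A) < 0, because higher fees lower attendance.

The bias term is then negative, so the IV estimate would understate β_A unless town wealth is controlled for.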
8. Evaluate empirically the validity of the two sets of instruments (i.e., first stage regressions) using a
linear model (OLS) controlling for relevant covariates. Consider using quadratic terms of each
instrument if appropriate. Perform an F-test to analyse if the instruments (and their squared terms)
are weak and interpret the results.
[First-stage regression tables: (1) Exposure; (1) Attendance]
Answer: Exposure and Attendance are the endogenous treatment variables, which are regressed on the
instruments in the first stage. The results show a negative relationship between the treatment variables
and the instrumental variables. Both the original instruments and their squared terms are significant,
which suggests that the relationship between the two measures of treatment and the instruments is
non-linear. The F-statistic from the joint test is large, so the instruments are jointly significant and do not
appear to be weak. This test establishes only the relevance condition; it cannot verify the exclusion
restriction.
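A first-stage regression with a linear and a quadratic instrument term, followed by a joint F-test, can be sketched as follows.

The data are simulated, and the names are hypothetical stand-ins, not variables from IV_Attanasio2011EJ:

- fee: a town-level median fee;
- attend: a 0/1 attendance indicator;
- x1: a covariate.

The same steps apply to exposure. The common rule of thumb treats a first-stage F below about 10 as a sign of weak instruments.

```stata
* Sketch on SIMULATED data; fee, attend and x1 are hypothetical names.
clear
set seed 4321
set obs 5000
generate x1 = rnormal()
generate fee = runiform(0, 10)
generate fee2 = fee^2
* attendance falls with the fee (negative, non-linear first stage)
generate attend = runiform() < 0.8 - 0.08*fee + 0.003*fee2 + 0.05*x1

* linear probability first stage with linear and quadratic instrument terms
regress attend fee fee2 x1, vce(robust)
* joint test that the excluded instruments have no effect on attendance
test fee fee2
scalar F_first = r(F)
display "First-stage F (fee, fee^2): " %8.2f F_first
```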
9. Do you reach the same results if you use a non-linear model as appropriate? [hint: tobit model for
exposure, probit model for attendance]. Predict the fitted values of tobit and probit models.
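The non-linear first stages in the hint can be sketched with probit for attendance and tobit for exposure, followed by predict to obtain the fitted values.

This uses simulated data: attend, exposure (left-censored at zero, like days of attendance), fee and x1 are hypothetical stand-ins. For the tobit, predict with ystar(0,.) returns the expected value of the outcome censored at zero, which is one reasonable choice of fitted value for a non-negative variable.

```stata
* Sketch on SIMULATED data; attend, exposure, fee and x1 are hypothetical.
clear
set seed 555
set obs 5000
generate fee = runiform(0, 10)
generate fee2 = fee^2
generate x1 = rnormal()
generate latent = 1 - 0.25*fee + 0.3*x1 + rnormal()
generate attend = latent > 0
generate exposure = max(0, 100*latent)

* probit for attendance and fitted probabilities
probit attend fee fee2 x1
predict p_attend, pr
test fee fee2
* tobit for exposure, left-censored at 0, and fitted E[y] censored at 0
tobit exposure fee fee2 x1, ll(0)
predict exp_hat, ystar(0,.)
summarize p_attend exp_hat
```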
10. Compute the IV estimates of the treatment effect of the (instrumented) programme participation
on child’s nutrition (height-for-age z scores) using the 2SLS estimator with linear and quadratic
terms of the instrumental variables. Include relevant covariates [hint: use command ivreg and
restrict the sample to those observations with no missing values.]. Compare these estimates with
the instrumented OLS regression of height-for-age on HC participation. What do these estimates
suggest in terms of selection bias?
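The logic of the comparison can be sketched with simulated data. An unobserved parental trait (care) raises both attendance and height-for-age, so OLS is biased upward. The fee shifts attendance but, by construction, has no direct effect on the outcome.

All names (care, fee, attend, haz, x1) and the true effect of 0.20 are hypothetical. The block uses ivregress 2sls, the current Stata command for 2SLS, and estat firststage for first-stage diagnostics. A gap between OLS and 2SLS like the one produced here is what one would read as evidence of selection bias.

```stata
* Sketch on SIMULATED data; true effect of attendance on haz = 0.20.
clear
set seed 8642
set obs 20000
generate care = rnormal()
generate x1 = rnormal()
generate fee = runiform(0, 10)
generate fee2 = fee^2
* attendance depends on the fee and on unobserved care
generate attend = (1.5 - 0.3*fee + 0.8*care + 0.2*x1 + rnormal()) > 0
* care also raises height-for-age directly (omitted from OLS)
generate haz = -1 + 0.20*attend + 0.5*care + 0.1*x1 + rnormal()

* naive OLS (care omitted)
regress haz attend x1, vce(robust)
scalar b_ols = _b[attend]
* 2SLS with linear and quadratic instrument terms
ivregress 2sls haz x1 (attend = fee fee2), vce(robust)
scalar b_iv = _b[attend]
estat firststage
display "OLS: " %6.3f b_ols "   2SLS: " %6.3f b_iv
```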
[IV regression tables: (1) 2SLS; (1) IVOLS]