Problem Set 1: Randomized Control Trials: Exercise 1
Exercise 1
You have survey data on 200 individuals: their mental health scores (Mi, from 1 to 10, with 10
being the best) and whether they took a 1-month vacation in the past year (Vi).
The group who took the vacation (Vi = 1; number of individuals: n1 = 77) has an
average mental health score of 6.3 (standard deviation: s1 = 2.4), whereas the group who did
not take a vacation (Vi = 0; number of individuals: n0 = 123) has an average mental health
score of 8.2 (standard deviation: s0 = 1.2).
1. You compare the group means. Show how the difference in means relates to the average
treatment effect (ATE) of taking a 1-month vacation. Explain why the difference in group
means is not a good estimator of the ATE.
2. Show that under random assignment of vacation status Vi, one can identify the ATE using
the difference in group means.
3. Suppose that the assignment of Vi was randomized. Calculate an estimate for the ATE.
5. Is the ATE estimate significantly different from zero at the 5% significance level?
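As a rough check on Questions 3 and 5, the reported summary statistics can be plugged into a difference in means with an unequal-variance standard error. This is a minimal sketch, assuming the two groups are independent samples and that a normal approximation is acceptable at these sample sizes; the variable names are illustrative and not part of the problem set.

```r
# Summary statistics as reported in Exercise 1
n1 <- 77;  mean1 <- 6.3; s1 <- 2.4   # vacation group (Vi = 1)
n0 <- 123; mean0 <- 8.2; s0 <- 1.2   # no-vacation group (Vi = 0)

# Difference in group means: the ATE estimate if Vi was randomized
ate_hat <- mean1 - mean0

# Standard error allowing for unequal variances
# (assumption: the two groups are independent samples)
se_hat <- sqrt(s1^2 / n1 + s0^2 / n0)

# Test statistic and two-sided p-value from the normal approximation
t_stat  <- ate_hat / se_hat
p_value <- 2 * pnorm(-abs(t_stat))

# Reject H0: ATE = 0 at the 5% level if |t| exceeds the critical value
reject_5pct <- abs(t_stat) > qnorm(0.975)

print(c(ate_hat = ate_hat, se_hat = se_hat, t_stat = t_stat, p_value = p_value))
print(reject_5pct)
```

The sign of the estimate only has a causal interpretation under the randomization assumption of Question 3; without it, the same number mixes the treatment effect with selection into vacation.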
Exercise 2
LMU wants to assess the efficacy of student-led tutorials for an intermediate
econometrics course (Econometrics 2). Students are randomly assigned to either the treatment
or control group: students in the treatment group receive additional econometrics tutorials
(treatment = 1) on top of the lectures and classes that every student receives.
The following R code generates a dataset for 1000 students with the described variables.
# Number of students
n_experiment <- 1000
# Generate variables
experiment1 <- data.frame(
id = 1:n_experiment,
Applied Econometrics, Summer Term 2024
1. Calculate the averages for the pre-score and the observed covariates for the treatment
and control groups. Generally speaking, does balance in observables imply that the
randomization was done correctly?
2. Calculate the average post-score for the treatment and control groups.
3. Without estimating a regression, calculate an estimate for the Average Treatment Effect
(ATE) of receiving tutorials. Calculate the standard error of your estimate for the ATE.
4. Using a regression, estimate the ATE. Compare your results to those above.
5. Add controls to your regression. Explain how and why your results change.
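The steps in Questions 1 to 5 can be sketched on simulated data. The data frame `toy` below is hypothetical and is not the problem set's experiment1: the column names `treatment`, `pre_score` and `post_score` mirror names used later in the problem set, while the sample means, the noise level and the true effect of 10 are assumptions made purely for illustration.

```r
set.seed(1)

# Hypothetical toy data; NOT the problem set's experiment1
n_toy <- 1000
toy <- data.frame(
  treatment = rbinom(n_toy, 1, 0.5),            # random assignment
  pre_score = rnorm(n_toy, mean = 60, sd = 10)  # pre-treatment score
)
# Assumed outcome equation with a true treatment effect of 10
toy$post_score <- toy$pre_score + 10 * toy$treatment + rnorm(n_toy, sd = 10)

# Balance check: group means of the pre-treatment variable
balance <- aggregate(pre_score ~ treatment, data = toy, FUN = mean)

# Difference in means and its standard error, computed by hand
y1 <- toy$post_score[toy$treatment == 1]
y0 <- toy$post_score[toy$treatment == 0]
ate_dm <- mean(y1) - mean(y0)
se_dm  <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))

# The bivariate regression reproduces the same point estimate
fit_simple <- lm(post_score ~ treatment, data = toy)
ate_lm <- unname(coef(fit_simple)["treatment"])

# Adding a pre-treatment control that predicts the outcome
fit_ctrl <- lm(post_score ~ treatment + pre_score, data = toy)
ate_ctrl <- unname(coef(fit_ctrl)["treatment"])
se_ctrl  <- summary(fit_ctrl)$coefficients["treatment", "Std. Error"]

print(balance)
print(c(ate_dm = ate_dm, se_dm = se_dm, ate_lm = ate_lm,
        ate_ctrl = ate_ctrl, se_ctrl = se_ctrl))
```

In this toy setup the control does not shift the estimate systematically, since assignment is random, but it absorbs outcome variance and so tends to shrink the standard error.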
Exercise 3
Suppose that not all students complete their econometrics course.
1. Consider the following R code describing the process by which students drop out of the
course:
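set.seed(123)
# Start from a copy of the original experiment
experiment2 <- experiment1
# Nobody drops out by default
experiment2$dropout <- 0
# Only treated students flagged by dropout_helper drop out
experiment2$dropout[experiment2$dropout_helper == 1 &
                      experiment2$treatment == 1] <- 1
# The post-score is unobserved for dropouts
experiment2$post_score[experiment2$dropout == 1] <- NA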
Describe how the dataset experiment2 differs from experiment1. Then compute the
same statistics as above (Exercise 2, Questions 1-5).
2. Alternatively, consider the following R code describing the process by which students
drop out of the course:
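set.seed(123)
# Start from a copy of the original experiment
experiment3 <- experiment1
# Nobody drops out by default
experiment3$dropout <- 0
# Students whose post-score would be below 65 drop out
experiment3$dropout[experiment3$post_score < 65] <- 1
# The post-score is unobserved for dropouts
experiment3$post_score[experiment3$dropout == 1] <- NA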
Describe how the dataset experiment3 differs from experiment2. Then compute the
same statistics as above (Exercise 2, Questions 1-5).
3. Compare your answers in Questions 1 and 2. Explain how and why they differ.
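To build intuition for why the dropout mechanism matters, the sketch below contrasts two kinds of attrition on simulated data. Everything here is an assumption for illustration: `toy` is not experiment1, the true effect of 10 and the score distribution are made up, and the "random 20% dropout" case is a hypothetical benchmark that does not appear in the problem set. Only the cutoff rule `post_score < 65` is taken from the code above.

```r
set.seed(1)

# Hypothetical toy data with an assumed true effect of 10
n_toy <- 1000
toy <- data.frame(treatment = rbinom(n_toy, 1, 0.5))
toy$post_score <- 60 + 10 * toy$treatment + rnorm(n_toy, sd = 10)

# Difference in group means among students with an observed post-score
diff_means <- function(d) {
  mean(d$post_score[d$treatment == 1], na.rm = TRUE) -
    mean(d$post_score[d$treatment == 0], na.rm = TRUE)
}

# Benchmark: no attrition
ate_full <- diff_means(toy)

# Attrition that depends on the outcome itself (rule used for experiment3)
toy_sel <- toy
toy_sel$post_score[toy$post_score < 65] <- NA
ate_selected <- diff_means(toy_sel)

# Hypothetical benchmark: 20% of students drop out purely at random
toy_rand <- toy
toy_rand$post_score[runif(n_toy) < 0.2] <- NA
ate_random <- diff_means(toy_rand)

print(c(ate_full = ate_full, ate_selected = ate_selected, ate_random = ate_random))
```

In this simulation, dropping low scorers removes more control than treated students, which compresses the gap between the group means, whereas purely random dropout only costs precision.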
Exercise 4
Suppose now that not all students who are eligible to attend the tutorials (i.e., those in the
treatment group) actually attend them. Consider the following R code:
set.seed(123)
experiment4 <- experiment1
3. Use the treatment assignment as an instrument for taking the treatment. What does this
estimate?
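Using assignment as an instrument can be sketched with the Wald estimator, which divides the effect of assignment on the outcome by the effect of assignment on attendance. The simulation below is a minimal sketch under stated assumptions, not the problem set's experiment4: the names `z`, `d`, `y` and `complier`, the 60% attendance rate, the constant effect of 10, and the rule that nobody in the control group attends are all hypothetical.

```r
set.seed(1)

# Hypothetical toy data; NOT the problem set's experiment4
n_toy <- 2000
z <- rbinom(n_toy, 1, 0.5)          # random assignment (the instrument)
complier <- rbinom(n_toy, 1, 0.6)   # assumed: 60% would attend if assigned
d <- z * complier                   # attendance; control students never attend
y <- 60 + 10 * d + rnorm(n_toy, sd = 10)  # assumed effect of attending: 10

# Intention-to-treat effect: effect of being assigned
itt <- mean(y[z == 1]) - mean(y[z == 0])

# First stage: effect of assignment on attendance
first_stage <- mean(d[z == 1]) - mean(d[z == 0])

# Wald / IV estimate: ITT scaled by the share induced to attend
wald <- itt / first_stage

# Equivalent covariance form of the IV estimator with one binary instrument
iv_cov <- cov(y, z) / cov(d, z)

print(c(itt = itt, first_stage = first_stage, wald = wald, iv_cov = iv_cov))
```

Under the usual instrument assumptions, the ratio is interpreted as the average effect among students whose attendance is changed by assignment, which is generally not the ATE for all students.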
Exercise 5
Suppose that at LMU there is a buddy system, where at the beginning of their studies each
student is assigned to another student, their ‘buddy’.
Suppose that the true data-generating process has a direct effect of student-led tutorials, but
there is also a spillover effect from a student’s buddy’s treatment status.
Consider the following R code:
library(dplyr)
set.seed(123)
experiment5 <- experiment1
# Data-Generating Process
experiment5$post_score <- with(experiment5, round(
  pre_score + 10 * treatment + 5 * treat_buddy + 2 * covariate1
  + rnorm(n_experiment, mean = 0, sd = 10)
))
2. Calculate the spillover effect of a student’s buddy being treated. Is this estimate biased?
3. Suppose you do not know which students are in a buddy group together. Can you still
estimate spillover effects? If not, suggest how the underlying dataset would have to be
different for you to be able to estimate spillover effects.
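When the buddy pairs are known, one way to approach the spillover question is to regress the outcome on both the student's own treatment and the buddy's treatment. The sketch below is hypothetical: the pairing procedure, the `buddy_id` vector, the sample size and the covariate distributions are assumptions, since the problem set's own pairing code is not shown here. Only the outcome equation copies the data-generating process stated above.

```r
set.seed(1)

# Hypothetical toy data; NOT the problem set's experiment5
n_toy <- 4000

# Assumed pairing: shuffle students and match consecutive pairs
perm   <- sample(n_toy)
first  <- perm[seq(1, n_toy, by = 2)]
second <- perm[seq(2, n_toy, by = 2)]
buddy_id <- integer(n_toy)
buddy_id[first]  <- second
buddy_id[second] <- first

toy <- data.frame(
  id         = 1:n_toy,
  treatment  = rbinom(n_toy, 1, 0.5),
  pre_score  = rnorm(n_toy, mean = 60, sd = 10),
  covariate1 = rnorm(n_toy)
)

# Buddy's treatment status, looked up through the pairing
toy$treat_buddy <- toy$treatment[buddy_id]

# Outcome equation as in the problem set's data-generating process
toy$post_score <- with(toy, round(
  pre_score + 10 * treatment + 5 * treat_buddy + 2 * covariate1
  + rnorm(n_toy, mean = 0, sd = 10)
))

# Direct and spillover effects estimated jointly
fit <- lm(post_score ~ treatment + treat_buddy + pre_score + covariate1,
          data = toy)
direct_hat <- unname(coef(fit)["treatment"])
spill_hat  <- unname(coef(fit)["treat_buddy"])

print(c(direct_hat = direct_hat, spill_hat = spill_hat))
```

Because own and buddy treatment are both randomly assigned and independent of each other in this toy setup, both coefficients can be read off the same regression; without the pairing information, `treat_buddy` cannot be constructed at all.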