Sebastian Hager Applied Econometrics – LMU Munich

Summer Term 2024

Problem Set 1: Randomized Control Trials

Exercise 1
You have survey data on the mental health scores of 200 individuals (M_i, on a scale from 1 to 10,
with 10 being the best) and on whether each individual took a 1-month vacation in the past year (V_i).
The group that took the vacation (V_i = 1; number of individuals: n_1 = 77) has an
average mental health score of 6.3 (standard deviation: s_1 = 2.4), whereas the group that did
not take a vacation (V_i = 0; number of individuals: n_0 = 123) has an average mental health
score of 8.2 (standard deviation: s_0 = 1.2).

1. You compare the group means. Show how the difference in means relates to the average
treatment effect (ATE) of taking a 1-month vacation. Explain why the difference in group
means is not a good estimator of the ATE.

2. Show that under random assignment of vacation status V_i, one can identify the ATE using
the difference in group means.

3. Suppose that the assignment of V_i was randomized. Calculate an estimate for the ATE.

4. Calculate the standard error of the ATE estimate.

5. Is the ATE estimate significantly different from zero at the 5% significance level?
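
The arithmetic behind Questions 3 to 5 can be sketched in a few lines of R. This is only one possible approach: it assumes the unpooled (unequal-variance) standard error for a difference in means and a large-sample normal critical value, neither of which is prescribed by the exercise. The variable names (mean1, ate_hat, se_hat, and so on) are illustrative.

```r
# Summary statistics as given in Exercise 1
n1 <- 77;  mean1 <- 6.3; s1 <- 2.4   # vacation group (V_i = 1)
n0 <- 123; mean0 <- 8.2; s0 <- 1.2   # no-vacation group (V_i = 0)

# Difference in group means: the ATE estimate under random assignment
ate_hat <- mean1 - mean0

# Assumption: unpooled (unequal-variance) standard error of the difference
se_hat <- sqrt(s1^2 / n1 + s0^2 / n0)

# Assumption: large-sample normal approximation, two-sided test at the 5% level
t_stat <- ate_hat / se_hat
reject_null <- abs(t_stat) > qnorm(0.975)

print(round(c(ate_hat = ate_hat, se_hat = se_hat, t_stat = t_stat), 3))
print(reject_null)
```

With the numbers above, the estimate is -1.9 with a standard error of roughly 0.29. A pooled-variance standard error would be an alternative choice if one is willing to assume equal variances in both groups.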

Exercise 2
LMU wants to assess the efficacy of student-led tutorials for an intermediate econometrics
course (Econometrics 2). Students are randomly assigned to either the treatment
or the control group: students in the treatment group (treatment = 1) receive econometrics tutorials
in addition to the lectures and classes that every student receives.

For each student, we observe their grade in Econometrics 1, a beginner-level econometrics
course (pre_score), as well as their score in Econometrics 2 (post_score). We also observe
the additional variable covariate1, which captures students’ answers to a survey in their first
year about how much they like statistics.

The following R code generates a dataset for 1000 students with the described variables.

# Set seed for reproducibility
set.seed(789)

# Number of students
n_experiment <- 1000

# Generate variables
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)

# Data Generating Process
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

1. Calculate the averages for the pre-score and the observed covariates for the treatment
and control groups. Generally speaking, does balance in observables imply that the
randomization was done correctly?

2. Calculate the average post-score for the treatment and control groups.

3. Without estimating a regression, calculate an estimate for the Average Treatment Effect
(ATE) of receiving tutorials. Calculate the standard error of your estimate for the ATE.

4. Using a regression, estimate the ATE. Compare your results to those above.

5. Add controls to your regression. Explain how and why your results change.
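
One way to organize the computations for Questions 1 to 5 is sketched below in base R. The block first rebuilds experiment1 exactly as in the code above so that it runs on its own. The object names balance, ate_hat, se_hat, fit_simple and fit_controls are illustrative, and the choice of controls (pre_score and covariate1) is an assumption rather than something the exercise fixes.

```r
# Rebuild the dataset from the exercise
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Question 1: group means of pre-treatment variables (balance check)
balance <- aggregate(cbind(pre_score, covariate1, covariate2) ~ treatment,
                     data = experiment1, FUN = mean)
print(balance)

# Questions 2 and 3: group means of the outcome and their difference
y1 <- experiment1$post_score[experiment1$treatment == 1]
y0 <- experiment1$post_score[experiment1$treatment == 0]
ate_hat <- mean(y1) - mean(y0)
# Assumption: unpooled standard error of a difference in means
se_hat <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
print(c(mean_treated = mean(y1), mean_control = mean(y0),
        ate_hat = ate_hat, se_hat = se_hat))

# Question 4: regression of the outcome on the treatment dummy
fit_simple <- lm(post_score ~ treatment, data = experiment1)

# Question 5: same regression with controls (assumed choice of controls)
fit_controls <- lm(post_score ~ treatment + pre_score + covariate1,
                   data = experiment1)

print(summary(fit_simple)$coefficients)
print(summary(fit_controls)$coefficients)
```

The coefficient on treatment in fit_simple equals ate_hat exactly, because OLS on a single binary regressor reproduces the difference in group means. Its default standard error assumes equal error variances across groups, so it can differ slightly from se_hat.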

Exercise 3
Suppose that not all students complete their econometrics course.

1. Consider the following R code describing the process by which students drop out of the
course:

set.seed(123)
experiment2 <- experiment1

experiment2$dropout_helper <- rbinom(n_experiment, 1, 0.2)

experiment2$dropout <- 0
experiment2$dropout[experiment2$dropout_helper == 1 &
                    experiment2$treatment == 1] <- 1

experiment2$post_score[experiment2$dropout == 1] <- NA

Describe how the dataset experiment2 differs from experiment1. Then compute the
same statistics as above (Exercise 2, Questions 1-5).

2. Alternatively, consider the following R code describing the process by which students
drop out of the course:

set.seed(123)
experiment3 <- experiment1

experiment3$dropout <- 0
experiment3$dropout[experiment3$post_score < 65] <- 1

experiment3$post_score[experiment3$dropout == 1] <- NA

Describe how the dataset experiment3 differs from experiment2. Then compute the
same statistics as above (Exercise 2, Questions 1-5).

3. Compare your answers in Questions 1 and 2. Explain how and why they differ.
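
To compare the two attrition scenarios side by side, a sketch along the following lines may help. It rebuilds experiment1, experiment2 and experiment3 from the code above and then applies a small helper, diff_in_means(), which is a hypothetical function introduced here for illustration only. The helper drops students with missing post_score and uses the unpooled standard error, which is an assumption.

```r
# Rebuild the baseline dataset from Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Scenario 1: dropout depends on treatment and a random draw only
set.seed(123)
experiment2 <- experiment1
experiment2$dropout_helper <- rbinom(n_experiment, 1, 0.2)
experiment2$dropout <- 0
experiment2$dropout[experiment2$dropout_helper == 1 &
                    experiment2$treatment == 1] <- 1
experiment2$post_score[experiment2$dropout == 1] <- NA

# Scenario 2: dropout depends on the outcome itself
set.seed(123)
experiment3 <- experiment1
experiment3$dropout <- 0
experiment3$dropout[experiment3$post_score < 65] <- 1
experiment3$post_score[experiment3$dropout == 1] <- NA

# Hypothetical helper: difference in means among students with an observed outcome
diff_in_means <- function(df) {
  observed <- !is.na(df$post_score)
  y1 <- df$post_score[observed & df$treatment == 1]
  y0 <- df$post_score[observed & df$treatment == 0]
  c(estimate = mean(y1) - mean(y0),
    std_error = sqrt(var(y1) / length(y1) + var(y0) / length(y0)),
    n_obs = length(y1) + length(y0))
}

results <- rbind(experiment1 = diff_in_means(experiment1),
                 experiment2 = diff_in_means(experiment2),
                 experiment3 = diff_in_means(experiment3))
print(results)

# Dropout rates by treatment arm (columns "0" and "1")
dropout_rates <- rbind(
  experiment2 = tapply(experiment2$dropout, experiment2$treatment, mean),
  experiment3 = tapply(experiment3$dropout, experiment3$treatment, mean)
)
print(dropout_rates)
```

Reading results together with dropout_rates shows which students are missing in each scenario, which is the starting point for explaining any difference between the estimates.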

Exercise 4
Suppose now that not all students who are eligible to attend the tutorials (i.e., those in the
treatment group) actually attend them. Consider the following R code:

set.seed(123)
experiment4 <- experiment1

# Compliance with treatment
experiment4$treat_comply <- 1
experiment4$treat_comply[experiment4$pre_score > 50] <- 0
experiment4$takeup <- experiment4$treat_comply * experiment4$treatment

# Data Generating Process
experiment4$post_score <- with(experiment4,
  round(pre_score + 10*takeup + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

1. Calculate an estimate of the ATE. Is it biased?

2. Calculate an estimate of the intention-to-treat (ITT) effect. Is it biased?

3. Use the treatment assignment as an instrument for taking the treatment. What does this
estimate?
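
The three quantities in this exercise can be sketched in base R as follows. The block rebuilds experiment1 and experiment4 from the code above. The names naive, itt, first_stage, wald and tsls are illustrative. The instrumental-variable estimate is computed by hand as a Wald ratio and cross-checked with a manual two-stage regression; a dedicated IV routine would be the usual choice in practice, and the standard errors of the manual second stage are not valid.

```r
# Rebuild the baseline dataset from Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Imperfect compliance, as in the exercise
set.seed(123)
experiment4 <- experiment1
experiment4$treat_comply <- 1
experiment4$treat_comply[experiment4$pre_score > 50] <- 0
experiment4$takeup <- experiment4$treat_comply * experiment4$treatment
experiment4$post_score <- with(experiment4,
  round(pre_score + 10*takeup + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Naive comparison by actual take-up (attendees vs. everyone else)
naive <- unname(coef(lm(post_score ~ takeup, data = experiment4))["takeup"])

# ITT: effect of being assigned to treatment, regardless of take-up
itt <- unname(coef(lm(post_score ~ treatment, data = experiment4))["treatment"])

# First stage: effect of assignment on take-up
first_stage <- unname(coef(lm(takeup ~ treatment, data = experiment4))["treatment"])

# Wald / IV estimate: ITT scaled by the first stage
wald <- itt / first_stage

# Cross-check with a manual two-stage regression (point estimate only;
# the second-stage standard errors from lm() are not valid 2SLS errors)
experiment4$takeup_hat <- fitted(lm(takeup ~ treatment, data = experiment4))
tsls <- unname(coef(lm(post_score ~ takeup_hat, data = experiment4))["takeup_hat"])

print(c(naive = naive, itt = itt, first_stage = first_stage,
        wald = wald, tsls = tsls))
```

With a single binary instrument and no covariates, wald and tsls coincide. Note that take-up in this code depends on pre_score, which also enters the outcome equation.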

Exercise 5
Suppose that LMU has a buddy system in which, at the beginning of their studies, each
student is assigned to another student, their ‘buddy’.
Suppose that the true data generating process includes a direct effect of student-led tutorials as well as
a spillover effect from a student’s buddy’s treatment status.
Consider the following R code:

library(dplyr)
set.seed(123)
experiment5 <- experiment1

# Create groups of two
experiment5$group_buddy <- ceiling(experiment5$id / 2)

# Create a spillover indicator variable
experiment5 <- experiment5 %>%
  group_by(group_buddy) %>%
  mutate(treat_buddy = sum(treatment) - treatment)

# Data-Generating Process
experiment5$post_score <- with(experiment5, round(
  pre_score + 10*treatment + 5*treat_buddy + 2*covariate1
  + rnorm(n_experiment, mean = 0, sd = 10)
))

1. Calculate the direct effect of being treated. Is this estimate biased?

2. Calculate the spillover effect of a student’s buddy being treated. Is this estimate biased?

3. Suppose you do not know which students are in a buddy group together. Can you still
estimate spillover effects? If not, suggest how the underlying dataset would have to be
different for you to be able to estimate spillover effects.
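
For Questions 1 and 2, a possible starting point is sketched below. To keep the block free of package dependencies, it builds treat_buddy with base R's ave() instead of the dplyr pipeline used above; this is a substitution made here, and it yields the same variable. The model names fit_direct and fit_spill are illustrative, and the specifications are one choice among several.

```r
# Rebuild the baseline dataset from Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

set.seed(123)
experiment5 <- experiment1

# Buddy pairs: ids (1, 2), (3, 4), ...
experiment5$group_buddy <- ceiling(experiment5$id / 2)

# Buddy's treatment status = pair total minus own status
# (base-R stand-in for the group_by/mutate step in the exercise)
experiment5$treat_buddy <- ave(experiment5$treatment,
                               experiment5$group_buddy,
                               FUN = sum) - experiment5$treatment

# Data generating process with direct and spillover effects
experiment5$post_score <- with(experiment5, round(
  pre_score + 10*treatment + 5*treat_buddy + 2*covariate1
  + rnorm(n_experiment, mean = 0, sd = 10)
))

# Direct effect only, ignoring the buddy's status
fit_direct <- lm(post_score ~ treatment, data = experiment5)

# Direct and spillover effects estimated jointly
fit_spill <- lm(post_score ~ treatment + treat_buddy, data = experiment5)

print(summary(fit_direct)$coefficients)
print(summary(fit_spill)$coefficients)
```

Because treatment is assigned independently for each student, a student's own status and their buddy's status are independent by design, which is worth keeping in mind when comparing the two regressions.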
