Problem Set 1: Randomized Control Trials: Exercise 1

Sebastian Hager Applied Econometrics – LMU Munich
Summer Term 2024
Problem Set 1: Randomized Control Trials
Exercise 1
You have survey data on 200 individuals: their mental health scores ($M_i$, from 1 to 10, with 10
being the best) and whether they took a 1-month vacation in the past year ($V_i$).
The group of people who took the vacation ($V_i = 1$; number of individuals: $n_1 = 77$) has an
average mental health score of 6.3 (standard deviation: $s_1 = 2.4$), whereas the group who did
not take a vacation ($V_i = 0$; number of individuals: $n_0 = 123$) has an average mental health
score of 8.2 (standard deviation: $s_0 = 1.2$).
1. You compare the group means. Show how the difference in means relates to the average
treatment effect (ATE) of taking a 1-month vacation. Explain why the difference in group
means is not a good estimator of the ATE.
2. Show that under random assignment of vacation status $V_i$, one can identify the ATE using
the difference in group means.
3. Suppose that the assignment of $V_i$ was randomized. Calculate an estimate for the ATE.
4. Calculate the standard error of the ATE estimate.
5. Is the ATE estimate significantly different from zero at the 5% significance level?
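For Questions 3 to 5, the calculation needs only the summary statistics given above. The following is a minimal R sketch, assuming random assignment (as in Question 3) and using the unequal-variance standard error sqrt(s1^2/n1 + s0^2/n0) with a normal critical value; the variable names are illustrative.

```r
# Summary statistics from the survey (treated: V = 1, control: V = 0)
n1 <- 77;  mean1 <- 6.3; s1 <- 2.4
n0 <- 123; mean0 <- 8.2; s0 <- 1.2

# Difference in means: estimate of the ATE under random assignment
ate_hat <- mean1 - mean0

# Standard error allowing for different variances in the two groups
se_ate <- sqrt(s1^2 / n1 + s0^2 / n0)

# t-statistic and two-sided test at the 5% level (normal approximation)
t_stat <- ate_hat / se_ate
reject_5pct <- abs(t_stat) > qnorm(0.975)

cat(sprintf("ATE estimate: %.2f, SE: %.3f, t: %.2f, reject H0: %s\n",
            ate_hat, se_ate, t_stat, reject_5pct))
```

Run as is, it prints an ATE estimate of -1.90 with a standard error of about 0.294 and a t-statistic of about -6.46, so the null of a zero effect would be rejected at the 5% level.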
Exercise 2
The LMU wants to assess the efficacy of student-led tutorials for an intermediate econometrics
course (Econometrics 2). Students are randomly assigned to either a treatment or a control
group: students in the treatment group (treatment = 1) attend extra econometrics tutorials on
top of the lectures and classes that every student receives.
For each student, we observe their grade in Econometrics 1, a beginner-level econometrics
course (pre_score), as well as their score in Econometrics 2 (post_score). We also observe
the additional variable covariate1, which captures students' answers to a survey in their first
year about how much they like statistics. The dataset also contains a second covariate, covariate2.
The following R code generates a dataset for 1000 students with the described variables.
# Set seed for reproducibility
set.seed(789)
# Number of students
n_experiment <- 1000
# Generate variables
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min=0, max=30)),
  covariate2 = round(runif(n_experiment, min=20, max=28))
)
# Data Generating Process
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))
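For reference, the data-generating process above can be written as follows, where $Y_i$ is post_score, $P_i$ is pre_score, $T_i$ is treatment, $X_{1i}$ is covariate1, and the final rounding step is ignored:

$$
Y_i = P_i + 10\,T_i + 2\,X_{1i} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\,10^2).
$$

Under this specification every student's treatment effect is 10, so the true ATE is 10, and covariate2 does not enter the outcome at all.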
1. Calculate the averages for the pre-score and the observed covariates for the treatment
and control groups. Generally speaking, does balance in observables imply that the
randomization was done correctly?
2. Calculate the average post-score for the treatment and control groups.
3. Without estimating a regression, calculate an estimate for the Average Treatment Effect
(ATE) of receiving tutorials. Calculate the standard error of your estimate for the ATE.
4. Using a regression, estimate the ATE. Compare your results to those above.
5. Add controls to your regression. Explain how and why your results change.
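A minimal sketch of Questions 1 to 5 in base R, regenerating the data with the code above so that the block runs on its own. Two assumptions to keep in mind: the standard error in Question 3 uses the unequal-variance formula sqrt(var1/n1 + var0/n0), whereas lm() reports the classical homoskedastic standard error, so the two will differ slightly; and the control set in Question 5 (pre_score, covariate1, covariate2) is one reasonable choice, not the only one.

```r
# Regenerate the data exactly as in the problem set
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Q1-Q2: group means of the baseline variables and of the outcome
group_means <- aggregate(
  cbind(pre_score, covariate1, covariate2, post_score) ~ treatment,
  data = experiment1, FUN = mean)
print(group_means)

# Q3: difference in means with an unequal-variance standard error
y1 <- experiment1$post_score[experiment1$treatment == 1]
y0 <- experiment1$post_score[experiment1$treatment == 0]
ate_dim <- mean(y1) - mean(y0)
se_dim <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
print(c(estimate = ate_dim, std_error = se_dim))

# Q4: bivariate regression; the slope reproduces the difference in means
fit_simple <- lm(post_score ~ treatment, data = experiment1)

# Q5: add baseline variables as controls
fit_controls <- lm(post_score ~ treatment + pre_score + covariate1 + covariate2,
                   data = experiment1)

print(summary(fit_simple)$coefficients["treatment", ])
print(summary(fit_controls)$coefficients["treatment", ])
```

One would expect the controls to leave the point estimate roughly unchanged, since randomization already makes treatment independent of them, but to shrink its standard error noticeably, because pre_score and covariate1 explain much of the variation in post_score.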
Exercise 3
Suppose that not all students complete their econometrics course.
1. Consider the following R code describing the process by which students drop out of the
course:
set.seed(123)
experiment2 <- experiment1
experiment2$dropout_helper <- rbinom(n_experiment, 1, 0.2)
experiment2$dropout <- 0
experiment2$dropout[experiment2$dropout_helper == 1 &
                    experiment2$treatment == 1] <- 1
experiment2$post_score[experiment2$dropout == 1] <- NA
Describe how the dataset experiment2 differs from experiment1. Then compute the
same statistics as above (Exercise 2, Questions 1-5).
2. Alternatively, consider the following R code describing the process by which students
drop out of the course:
set.seed(123)
# Start from a copy of the original data
experiment3 <- experiment1
experiment3$dropout <- 0
experiment3$dropout[experiment3$post_score < 65] <- 1
experiment3$post_score[experiment3$dropout == 1] <- NA
Describe how the dataset experiment3 differs from experiment2. Then compute the
same statistics as above (Exercise 2, Questions 1-5).
3. Compare your answers in Questions 1 and 2. Explain how and why they differ.
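A minimal sketch comparing the two attrition processes. It regenerates experiment1 so that it runs on its own and assumes, as the word "Alternatively" suggests, that experiment3 starts from a copy of experiment1 rather than of experiment2. The estimates are complete-case comparisons, because lm() drops rows with a missing outcome by default.

```r
# Regenerate experiment1 exactly as in Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Process 1: a random 20% of treated students drop out
set.seed(123)
experiment2 <- experiment1
experiment2$dropout_helper <- rbinom(n_experiment, 1, 0.2)
experiment2$dropout <- as.integer(experiment2$dropout_helper == 1 &
                                  experiment2$treatment == 1)
experiment2$post_score[experiment2$dropout == 1] <- NA

# Process 2 (assumed to start from experiment1): everyone scoring below 65 drops out
experiment3 <- experiment1
experiment3$dropout <- as.integer(experiment3$post_score < 65)
experiment3$post_score[experiment3$dropout == 1] <- NA

# Dropout rates by treatment arm (0 = control, 1 = treatment)
print(tapply(experiment2$dropout, experiment2$treatment, mean))
print(tapply(experiment3$dropout, experiment3$treatment, mean))

# Complete-case difference in means; lm() drops rows with a missing outcome
est2 <- coef(lm(post_score ~ treatment, data = experiment2))[["treatment"]]
est3 <- coef(lm(post_score ~ treatment, data = experiment3))[["treatment"]]
print(c(random_attrition = est2, outcome_attrition = est3))
```

In experiment2, dropout is unrelated to outcomes, so the remaining treated students are still a random subsample; one would expect the estimate to stay close to the true effect of 10, with a somewhat larger standard error. In experiment3, dropout depends on the outcome itself, and since treated students are less likely to fall below 65, the surviving groups are no longer comparable; the complete-case estimate would typically be biased toward zero.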
Exercise 4
Suppose now that not all students who are eligible to attend the tutorials (i.e., those in the
treatment group) actually attend them. Consider the following R code:
set.seed(123)
# Start from a copy of the original data
experiment4 <- experiment1
# Compliance with treatment
experiment4$treat_comply <- 1
experiment4$treat_comply[experiment4$pre_score > 50] <- 0
experiment4$takeup <- experiment4$treat_comply * experiment4$treatment
# Data Generating Process
experiment4$post_score <- with(experiment4,
  round(pre_score + 10*takeup + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))
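The compliance rule and outcome equation above can be summarized as follows, with $Z_i$ the random assignment (treatment), $D_i$ actual attendance (takeup), $P_i$ the pre-score, $X_{1i}$ covariate1, and rounding ignored:

$$
D_i = Z_i \cdot \mathbf{1}\{P_i \le 50\}, \qquad
Y_i = P_i + 10\,D_i + 2\,X_{1i} + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0,\,10^2).
$$

Because $D_i = 0$ whenever $Z_i = 0$, there are no always-takers: the compliers are the students with $P_i \le 50$ and the never-takers are those with $P_i > 50$. Since take-up depends on the pre-score, which also enters $Y_i$, comparing attendees with non-attendees mixes the tutorial effect with pre-score differences.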
1. Calculate an estimate of the ATE. Is it biased?
2. Calculate an estimate of the Intent to Treat Effect (ITT). Is it biased?
3. Use the treatment assignment as an instrument for taking the treatment. What does this
estimate?
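A minimal sketch for Exercise 4, assuming (the snippet itself does not say) that experiment4 starts as a copy of experiment1 from Exercise 2. It computes the naive comparison by take-up, the ITT by assignment, and the Wald (IV) ratio by hand in base R rather than with a dedicated IV package; with one binary instrument and no controls, this ratio coincides with the two-stage least squares estimate.

```r
# Regenerate experiment1 exactly as in Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Exercise 4 setup (assumed to start from a copy of experiment1)
set.seed(123)
experiment4 <- experiment1
experiment4$treat_comply <- as.integer(experiment4$pre_score <= 50)
experiment4$takeup <- experiment4$treat_comply * experiment4$treatment
experiment4$post_score <- with(experiment4,
  round(pre_score + 10*takeup + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

# Q1: naive comparison of attendees vs. non-attendees
naive <- coef(lm(post_score ~ takeup, data = experiment4))[["takeup"]]
# Q2: intent-to-treat effect, comparing by random assignment
itt <- coef(lm(post_score ~ treatment, data = experiment4))[["treatment"]]
# First stage: effect of assignment on take-up (share of compliers)
first_stage <- coef(lm(takeup ~ treatment, data = experiment4))[["treatment"]]
# Q3: Wald estimator, i.e. the ITT scaled by the first stage
wald <- itt / first_stage
# Same quantity written as a covariance ratio (just-identified IV with a constant)
iv_cov <- with(experiment4, cov(post_score, treatment) / cov(takeup, treatment))

print(c(naive = naive, itt = itt, first_stage = first_stage, wald = wald))
```

The IV ratio estimates the effect for compliers (a LATE). Because the tutorial effect is the same for every student in this data-generating process, one would expect it to land near 10, whereas the ITT is diluted by the never-takers and the naive comparison is distorted by the lower pre-scores of attendees.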
Exercise 5
Suppose that at LMU there is a buddy system, where at the beginning of their studies each
student is assigned to another student, their ‘buddy’.
Suppose that the true data-generating process has a direct effect of student-led tutorials, but
there is also a spillover effect from a student’s buddy’s treatment status.
Consider the following R code:
library(dplyr)
set.seed(123)
# Start from a copy of the original data
experiment5 <- experiment1
# Create groups of two
experiment5$group_buddy <- ceiling(experiment5$id / 2)
# Create a spillover indicator variable: the buddy's treatment status
experiment5 <- experiment5 %>%
  group_by(group_buddy) %>%
  mutate(treat_buddy = sum(treatment) - treatment) %>%
  ungroup()
# Data-Generating Process
experiment5$post_score <- with(experiment5, round(
  pre_score + 10*treatment + 5*treat_buddy + 2*covariate1
  + rnorm(n_experiment, mean = 0, sd = 10)
))
1. Calculate the direct effect of being treated. Is this estimate biased?
2. Calculate the spillover effect of a student’s buddy being treated. Is this estimate biased?
3. Suppose you do not know which students are in a buddy group together. Can you still
estimate spillover effects? If not, suggest how the underlying dataset would have to be
different for you to be able to estimate spillover effects.
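A minimal sketch for Exercise 5, again assuming that experiment5 starts as a copy of experiment1. To avoid an extra package dependency it swaps the dplyr pipeline for base R's ave(), which computes the same buddy indicator (the pair's total treatment minus the student's own); because 1000 is even, every student has exactly one buddy.

```r
# Regenerate experiment1 exactly as in Exercise 2
set.seed(789)
n_experiment <- 1000
experiment1 <- data.frame(
  id = 1:n_experiment,
  treatment = rbinom(n_experiment, 1, 0.5),
  pre_score = round(rnorm(n_experiment, mean = 50, sd = 10)),
  covariate1 = round(runif(n_experiment, min = 0, max = 30)),
  covariate2 = round(runif(n_experiment, min = 20, max = 28))
)
experiment1$post_score <- with(experiment1,
  round(pre_score + 10*treatment + 2*covariate1
        + rnorm(n_experiment, mean = 0, sd = 10)))

set.seed(123)
experiment5 <- experiment1
# Pair consecutive ids into buddy groups of two
experiment5$group_buddy <- ceiling(experiment5$id / 2)
# Buddy's treatment = group total minus own treatment (base-R version of the dplyr step)
experiment5$treat_buddy <- with(experiment5,
  ave(treatment, group_buddy, FUN = sum) - treatment)
# Data-generating process with a spillover term
experiment5$post_score <- with(experiment5, round(
  pre_score + 10*treatment + 5*treat_buddy + 2*covariate1
  + rnorm(n_experiment, mean = 0, sd = 10)))

# Direct and spillover effects estimated jointly
fit_spill <- lm(post_score ~ treatment + treat_buddy + pre_score + covariate1,
                data = experiment5)
# Direct effect only: treat_buddy is left out of this model
fit_direct <- lm(post_score ~ treatment + pre_score + covariate1,
                 data = experiment5)

print(summary(fit_spill)$coefficients[c("treatment", "treat_buddy"), ])
print(summary(fit_direct)$coefficients["treatment", ])
```

Since each student's treatment is drawn independently, own and buddy treatment are independent, so leaving treat_buddy out should not bias the direct-effect coefficient in expectation; it only adds residual noise. Without the group_buddy link, treat_buddy cannot be constructed at all.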
