1 Using Pre-Experiment Data On Primary Metric For Variance Reduction
and
\[ \bar{Y}_{\text{post},i} = \frac{1}{7} \sum_{t=8}^{14} Y_{i,t}, \tag{2} \]
where i = 1, ..., 909.
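As a minimal sketch of the averaging in equation (2), the snippet below computes per-unit period means from a matrix of daily values. It assumes, hypothetically, that the data are stored as an array with one row per unit and one column per day, and that the pre period is days 1 to 7 (the source only shows the post-period definition, days 8 to 14). The function name `pre_post_means` and the random data are illustrative and are not taken from the original analysis.

```python
import numpy as np

def pre_post_means(daily, split=7):
    """Return per-unit means over the pre period and the post period.

    daily: array of shape (n_units, n_days); column 0 is day t = 1.
    split: number of pre-period days (assumed to be 7 here).
    """
    daily = np.asarray(daily, dtype=float)
    y_pre = daily[:, :split].mean(axis=1)   # days 1..7 (assumption)
    y_post = daily[:, split:].mean(axis=1)  # days 8..14, as in equation (2)
    return y_pre, y_post

# Illustrative random data only: 909 units observed for 14 days.
rng = np.random.default_rng(0)
daily = rng.normal(size=(909, 14))
y_pre, y_post = pre_post_means(daily)
print(y_pre.shape, y_post.shape)
```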
The two treatment effect estimators considered here are (1) the raw difference
in means of Ȳpost,i between the treated and the control group in the post
period, and (2) the regression estimator, obtained by regressing Ȳpost,i on the
treatment assignment indicator with Ȳpre,i as a control variable (covariate).
Let Di be the treatment assignment indicator, taking the value one if unit i
was assigned to treatment and zero otherwise.
\[ \widehat{ATE}_{\text{raw}} = \frac{1}{n_1} \sum_{i:D_i=1} Y_{\text{post},i} - \frac{1}{n_0} \sum_{i:D_i=0} Y_{\text{post},i}, \tag{3} \]
where n1 and n0 are the number of users in the treatment and control groups,
respectively. For the regression estimator we find the least squares estimate of
β2 from
\[ Y_{\text{post},i} = \beta_0 + \beta_1 Y_{\text{pre},i} + \beta_2 D_i + \varepsilon_i. \tag{4} \]
That is, $\hat{\beta}_2 = \widehat{ATE}_{\text{regression}}$. Since, assuming that
treatment assignment is randomized, Di is independent of the outcome (Y), the
relative difference in variance of $\widehat{ATE}_{\text{raw}}$ and
$\widehat{ATE}_{\text{regression}}$ is exactly the squared correlation between
Ypost,i and Ypre,i.
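The two estimators can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the original analysis: the function names `ate_raw` and `ate_regression` are hypothetical, the data are synthetic, and the regression is fitted by ordinary least squares with `numpy.linalg.lstsq`.

```python
import numpy as np

def ate_raw(y_post, d):
    """Raw estimator: difference in post-period means, treated minus control."""
    y_post = np.asarray(y_post, dtype=float)
    d = np.asarray(d)
    return y_post[d == 1].mean() - y_post[d == 0].mean()

def ate_regression(y_post, y_pre, d):
    """Regression estimator: OLS coefficient on d, adjusting for y_pre."""
    y_post = np.asarray(y_post, dtype=float)
    y_pre = np.asarray(y_pre, dtype=float)
    d = np.asarray(d, dtype=float)
    # Design matrix columns: intercept (beta0), pre-period outcome (beta1),
    # treatment indicator (beta2).
    X = np.column_stack([np.ones(len(d)), y_pre, d])
    beta, *_ = np.linalg.lstsq(X, y_post, rcond=None)
    return beta[2]

# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 909
d = rng.permutation(np.arange(n) % 2)      # roughly half the units treated
y_pre = rng.normal(size=n)
y_post = 0.7 * y_pre + rng.normal(size=n) + 0.5 * d   # true effect 0.5

print("raw:       ", ate_raw(y_post, d))
print("regression:", ate_regression(y_post, y_pre, d))
```

Both functions target the same treatment effect; they differ only in how much of the noise in the post-period outcome is absorbed by the pre-period covariate.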
In this data set the squared correlation between Ypost,i and Ypre,i is
approximately 0.46, which implies that the expected relative variance reduction
from the regression adjustment is 46%, i.e.,
\[ \frac{Var(\widehat{ATE}_{\text{regression}}) - Var(\widehat{ATE}_{\text{raw}})}{Var(\widehat{ATE}_{\text{raw}})} \approx -0.46. \]
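The link between the squared correlation and the variance reduction can be checked with a small Monte Carlo experiment. The sketch below is not the resampling procedure used on the actual data. It assumes synthetic, normally distributed outcomes constructed so that the squared correlation between the pre- and post-period outcomes is 0.46, with no treatment effect and a balanced random assignment. The function name `simulate_relative_variance_change` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_relative_variance_change(rho2=0.46, n=400, reps=2000, seed=0):
    """Monte Carlo estimate of (Var(reg) - Var(raw)) / Var(raw)."""
    rng = np.random.default_rng(seed)
    base = np.zeros(n)
    base[: n // 2] = 1.0                      # half of the units treated
    raw = np.empty(reps)
    reg = np.empty(reps)
    for r in range(reps):
        d = rng.permutation(base)             # fresh random assignment
        y_pre = rng.normal(size=n)
        # Unit-variance post outcome with squared correlation rho2 to y_pre.
        y_post = np.sqrt(rho2) * y_pre + np.sqrt(1 - rho2) * rng.normal(size=n)
        raw[r] = y_post[d == 1].mean() - y_post[d == 0].mean()
        X = np.column_stack([np.ones(n), y_pre, d])
        reg[r] = np.linalg.lstsq(X, y_post, rcond=None)[0][2]
    v_raw = raw.var(ddof=1)
    v_reg = reg.var(ddof=1)
    return (v_reg - v_raw) / v_raw

rel_change = simulate_relative_variance_change()
print(f"relative change in variance: {rel_change:.2f}")
```

Because the result is a simulation estimate, it only approximates the theoretical value of minus the squared correlation; more replications tighten the approximation.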
[Figure 1 omitted. Vertical axis: Estimate. Horizontal axis: Estimator (Raw, Regression).]
Figure 1: Distribution of estimates over random samples for the regression and
raw estimators.
[Figure 2 omitted. Vertical axis: Power. Horizontal axis: Hypothetical treatment effect. Legend: Estimator (Regression, Raw).]
Figure 2: Power for the regression and raw estimators for various hypothetical
treatment effects.
expected from the variance reduction results, that the power is higher for the
regression estimator for all hypothetical treatment effects until the power is 1
for both estimators.
To display the gain in power more clearly in terms of sample size, Figure 3
shows the power curves for a fixed treatment effect of 0.5 over different
sample sizes. The desired power (usually 0.8) is reached at a much smaller
sample size with the regression estimator than with the raw estimator.
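The sample-size gain can be illustrated with a textbook normal-approximation power calculation. This is a sketch under assumptions that are not taken from the source: a two-sided z-test at level 0.05, equal group sizes, a hypothetical outcome standard deviation `sigma = 1.0`, and a regression adjustment that scales the estimator's variance by one minus the squared correlation. The function names `power_two_sided` and `required_n` are illustrative, and the numbers produced are not meant to reproduce Figure 3.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sided(effect, sigma, n_total, alpha=0.05, r2=0.0):
    """Approximate power of a two-sided z-test with equal group sizes.

    r2 is the squared pre/post correlation; r2 = 0 gives the raw estimator.
    """
    se = sqrt(4 * sigma**2 * (1 - r2) / n_total)   # s.e. of the ATE estimate
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)

def required_n(effect, sigma, power=0.8, alpha=0.05, r2=0.0):
    """Total sample size needed to reach the target power (approximation)."""
    k = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return ceil(4 * sigma**2 * (1 - r2) * k**2 / effect**2)

# Hypothetical inputs: effect 0.5 as in the text, sigma = 1.0 is an assumption.
n_raw = required_n(effect=0.5, sigma=1.0)
n_reg = required_n(effect=0.5, sigma=1.0, r2=0.46)
print("raw estimator, total n:       ", n_raw)
print("regression estimator, total n:", n_reg)
```

Under this approximation the required sample size scales with the estimator's variance, so a 46% variance reduction translates into roughly 46% fewer units for the same power.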
[Figure 3 omitted. Vertical axis: Power. Horizontal axis: Sample size. Legend: Estimator (Regression, Raw).]
Figure 3: Power for the regression and raw estimators for various sample
sizes, at a fixed treatment effect of 0.5.