The Benefits of Probability Proportional To Size Sampling in Cluster Randomized Experiments

Michael J. Higgins
Postdoctoral Fellow, Department of Politics
Princeton University
Introduction

Cluster-randomized experiments (CREs) are performed when treatment is randomized across clusters (groups) of units of interest instead of across the individual units. The number of clusters in the experiment and the number of observations obtained within each cluster are typically restricted by a budget constraint. The assignment of treatment to clusters makes analysis difficult; under the Neyman-Rubin Causal Model, no estimator of an average treatment effect currently exists that is both unbiased and invariant to location shifts in potential outcomes. We show that, when the quantity of interest is the population average treatment effect (PATE), such estimators can be obtained by initially sampling clusters with probability proportional to size (PPS).

Quantity of interest

We are interested in inference on:

$$\mathrm{PATE} = \frac{1}{n}\sum_{c=1}^{b}\sum_{k=1}^{n_c} y_{kc1} - \frac{1}{n}\sum_{c=1}^{b}\sum_{k=1}^{n_c} y_{kc0}$$

A location shift involves adding a constant $\alpha$ to all potential outcomes:

$$y^{*}_{kci} = y_{kci} + \alpha$$

An estimator is location-invariant if its value does not change when outcomes are shifted.

$y_{kci}$: Potential outcome of unit $k$ in cluster $c$ under treatment $i$.
$n$ / $b$ / $n_c$: Number of units / clusters / units in cluster $c$.

Current practice

- Sampled clusters: Either simple random sample or non-random selection.
- Blocking: Matched pair on size of cluster.

Under simple random sampling of clusters and complete randomization of treatment, the unbiased Horvitz-Thompson estimator is not generally invariant under location shifts, which can artificially inflate variances. Blocking does not fix this problem.

Current practice is for researchers to use matched pair blocking and use either difference-in-means (DIM) or Des-Raj (Middleton and Aronow 2014) estimators for the PATE. DIM will generally be biased if outcomes are correlated with cluster sizes. Des-Raj requires the inclusion of an additional tuning parameter; estimation of this parameter will bias the estimator.
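
To make the location-shift problem concrete, the sketch below computes a Horvitz-Thompson-style PATE estimate and a difference-in-means estimate on a toy population, then shifts every potential outcome by α = 100. It assumes simple random sampling of clusters, complete randomization, and that every unit in a sampled cluster is observed. The function names and the toy data are illustrative and do not come from the poster.

```python
def ht_pate(y1, y0, treated, control, b, n):
    """Horvitz-Thompson-style PATE estimate (illustrative sketch).

    y1[c], y0[c]: potential outcomes for the units of cluster c.
    treated, control: indices of sampled clusters in each arm.
    b: clusters in the population; n: units in the population.
    Assumption: SRS of clusters plus complete randomization, so
    P(cluster lands in an arm) = (arm size) / b, and all units in a
    sampled cluster are observed.
    """
    total1 = sum(sum(y1[c]) for c in treated) * b / len(treated)
    total0 = sum(sum(y0[c]) for c in control) * b / len(control)
    return (total1 - total0) / n


def dim_pate(y1, y0, treated, control):
    """Difference in means over observed units in each arm."""
    t = [y for c in treated for y in y1[c]]
    k = [y for c in control for y in y0[c]]
    return sum(t) / len(t) - sum(k) / len(k)


def shift(outcomes, alpha):
    """Add the constant alpha to every potential outcome."""
    return [[y + alpha for y in cluster] for cluster in outcomes]


# Hypothetical population: 4 clusters of sizes 1, 2, 3, 4 (n = 10),
# with a constant unit-level treatment effect of 1.
sizes = [1, 2, 3, 4]
y0 = [[0.0] * m for m in sizes]
y1 = [[1.0] * m for m in sizes]
b, n = len(sizes), sum(sizes)

# One possible draw: the largest cluster is treated, the smallest is control.
treated, control = [3], [0]

ht_est = ht_pate(y1, y0, treated, control, b, n)
ht_shifted = ht_pate(shift(y1, 100.0), shift(y0, 100.0), treated, control, b, n)
dim_est = dim_pate(y1, y0, treated, control)
dim_shifted = dim_pate(shift(y1, 100.0), shift(y0, 100.0), treated, control)

print(ht_est, ht_shifted)    # the HT estimate moves with the shift
print(dim_est, dim_shifted)  # DIM does not move
```

For this draw the Horvitz-Thompson estimate changes by α(b/n) times the gap between the treated and control cluster sizes, so an arbitrary relabeling of the outcome scale changes the estimate whenever the two arms differ in size. The difference in means is unaffected by the shift, but as noted above it is generally biased when outcomes are correlated with cluster sizes.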
Cluster randomized experiments procedure

[Figure: four panels illustrating the procedure on a population of 16 clusters, 56 units in total. 1) Sample clusters from a population. 2) Form blocks of sampled clusters. 3) Assign treatment to clusters. 4) Sample units from clusters.]

Common settings for CREs

CREs are used when assigning treatment to units is infeasible or when necessary to avoid interference between treatment groups.

- Developing countries: Villages receive treatment.
- Education: Classrooms receive treatment.
- Medical trials: Treatment given to clinics/medical practices.

PPS sampling without replacement

A probability-proportional-to-size sample of size $s$ drawn without replacement (PPSWOR) is any such sample satisfying:

$$P(\text{Sample cluster } c) = \frac{s\,n_c}{n}$$

However, this condition does not uniquely define a sampling scheme. Generally, joint probabilities of being sampled must also be specified:

$$\pi_{cc'} \equiv P(\text{Sample clusters } c, c')$$

Sunter (1986) provides an efficient method for drawing a PPSWOR sample, which is implemented in the R package SunterSampling.

Estimators

Under PPSWOR sampling, the quantity:

$$\hat{\mu}_i = \sum_{c=1}^{b} \frac{S_c T_{ci}}{\#T_i} \sum_{k=1}^{n_c} \frac{y_{kci} S_{kc}}{s_c}$$

is an unbiased estimator of the population mean of units under treatment $i$, $\mu_i$, that is a LINEAR function of potential outcomes. Hence,

$$\widehat{\mathrm{PATE}} = \hat{\mu}_1 - \hat{\mu}_0$$

is an unbiased and location-invariant estimator of the PATE.

$n_{si}$: Number of units that receive treatment $i$.
$S_c$ / $S_{kc}$: Cluster / unit sampling indicators.
$T_{ci}$ / $\#T_i$: Treatment indicator / number of clusters receiving treatment $i$.
$s_c$: Number of units sampled from cluster $c$.

The variance and covariance of this estimator are:

$$\mathrm{Var}(\hat{\mu}_i) = E\!\left(\frac{1}{\#T_i}\right)\left[\sigma^2_{i,\mathrm{bet}} + \sum_{c=1}^{b} \frac{n_c}{n}\,\sigma^2_{c,i,\mathrm{with}}\right] + E\!\left(1 - \frac{1}{\#T_i}\right)\left[\frac{1}{s(s-1)}\sum_{c=1}^{b}\sum_{c' \neq c} \pi_{cc'}\,\mu_{ci}\,\mu_{c'i} - \mu_i^2\right],$$

$$\mathrm{cov}(\hat{\mu}_1, \hat{\mu}_0) = \frac{1}{s(s-1)}\sum_{c=1}^{b}\sum_{c' \neq c} \pi_{cc'}\,\mu_{c1}\,\mu_{c'0} - \mu_1\mu_0,$$

where $\sigma^2_{i,\mathrm{bet}}$ denotes the across-cluster variance for treatment $i$, and $\sigma^2_{c,i,\mathrm{with}}$ denotes the finite sample variance within cluster $c$.

Simulation results

We simulate potential outcomes under a model with several cluster-level covariates and unit-level covariates, where treatment effects depend on cluster sizes. We consider three designs:

1. Clusters sampled using PPSWOR, block on all cluster-level covariates.
2. Clusters sampled using SRS, block on all cluster-level covariates.
3. Clusters sampled using SRS, block only on size.

Our simulation includes 16 distinct clusters with sizes varying between 30 and 600 units. 15 units are sampled from each cluster. Under each design, we perform 10,000 CREs.

[Figure: three histograms of the PATE estimates, one per design; horizontal axis "Estimate", ranging from -3000 to 3000. Panel titles and summaries as printed:]

PPSWOR: Mean = 447, SD = 1339, Truth = 443
SRS with all cluster covariates: Mean = 336, SD = 1118.9
SRS with all cluster covariates: Mean = 333, SD = 1204

PPSWOR sampling eliminates the bias seen when using a SRS of clusters. In this case, standard errors are slightly higher for PPSWOR sampling.
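
The PPSWOR design and the estimator described above can be sketched in a few lines, under stated assumptions. Sunter's (1986) procedure is not reproduced here. Systematic PPS sampling is swapped in as a simpler scheme that also gives each cluster inclusion probability s·n_c/n, although its joint probabilities π_cc' differ from Sunter's. All function names and toy values are hypothetical.

```python
import bisect
from itertools import accumulate


def systematic_pps(sizes, s, start):
    """Choose s distinct clusters with P(cluster c) = s * sizes[c] / n.

    Stand-in scheme (NOT Sunter's method): lay the clusters end to end on
    [0, n) and select the clusters hit by start, start + n/s, ..., where
    start should be drawn uniformly from [0, n/s).
    Requires sizes[c] <= n/s so that no cluster can be hit twice.
    """
    n = sum(sizes)
    step = n / s
    if max(sizes) > step:
        raise ValueError("a cluster is too large for PPS without replacement")
    bounds = list(accumulate(sizes))  # right endpoint of each cluster's interval
    return [bisect.bisect_right(bounds, start + j * step) for j in range(s)]


def mu_hat(arm_samples):
    """Mean of within-cluster sample means over the clusters in one arm.

    arm_samples: one list of observed outcomes per sampled cluster in the arm,
    which matches (1/#T_i) * sum_c (1/s_c) * sum_k y_kci over sampled units.
    """
    return sum(sum(ys) / len(ys) for ys in arm_samples) / len(arm_samples)


def pate_hat(treated_samples, control_samples):
    """Difference of the two arm-level estimates."""
    return mu_hat(treated_samples) - mu_hat(control_samples)


# Hypothetical population of 4 clusters with sizes 1, 2, 3, 4 (n = 10), s = 2.
sizes = [1, 2, 3, 4]
n, s = sum(sizes), 2

# Check inclusion frequencies over an even grid of start values in [0, n/s).
starts = [(j + 0.5) * (n / s) / 100 for j in range(100)]
freq = [sum(c in systematic_pps(sizes, s, u) for u in starts) / 100
        for c in range(len(sizes))]
target = [s * m / n for m in sizes]

# One draw and made-up observed outcomes for the two sampled clusters.
chosen = systematic_pps(sizes, s, start=0.5)
treated_obs = [[3.0]]            # outcomes observed in the treated cluster
control_obs = [[1.0, 2.0, 3.0]]  # outcomes observed in the control cluster

estimate = pate_hat(treated_obs, control_obs)
shifted_estimate = pate_hat(
    [[y + 100.0 for y in ys] for ys in treated_obs],
    [[y + 100.0 for y in ys] for ys in control_obs],
)
print(freq, target)
print(chosen, estimate, shifted_estimate)
```

Because each arm's estimate is an average of cluster-level sample means, adding a constant to every outcome adds the same constant to both arms and cancels in the difference. This is the location invariance claimed for the estimator, and it does not depend on which valid PPSWOR scheme is used to draw the clusters.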
www.princeton.edu/~mjh5 mjh5@princeton.edu
