
Permutation Test

Kosuke Imai

Harvard University

STAT 186/GOV 2002 Causal Inference

Fall 2019

Kosuke Imai (Harvard) Permutation Test Stat186/Gov2002 Fall 2019 1 / 15


Announcements

TF Section begins this week

First problem set will be posted on Friday

Gov 2002 students: find a project partner!



Randomized Controlled Trials (RCTs)

Why randomize treatment assignment in experiments?


1 makes the treatment and control groups “identical”
Joint distribution of all observed X and unobserved U pretreatment
confounders is identical:

P(X, U | T = 1) = P(X, U | T = 0)

U includes potential outcomes {Y (1), Y (0)}


Treatment assignment is statistically independent of X and U

{X, U} ⊥⊥ T and in particular {Y(1), Y(0)} ⊥⊥ T

Removes selection problem stochastically ⇝ controlled experiments


2 enables us to formally quantify the degree of uncertainty

Potential problems of RCTs: sample selection, placebo effects, noncompliance, missing data, spillover/carryover effects ⇝ roles of statistical methods



Randomization Inference vs. Model-based Inference

Randomization as the “reasoned basis for inference” (Fisher)


Randomness comes from the physical act of randomization, which
then can be used to make statistical inference
Also called design-based inference
Advantage: design justifies analysis

Contrast this with model-based inference, which assumes a distribution for potential outcomes
Advantage of model-based inference: flexibility



Lady Tasting Tea (Fisher 1935. The Design of Experiments. Oliver and Boyd)

Does tea taste different depending on whether the tea was poured
into the milk or whether the milk was poured into the tea?
8 cups; n = 8
Randomly choose 4 cups into which the tea is poured first (Ti = 1)
Null hypothesis: the lady cannot tell the difference
Sharp null – H0 : Yi (1) = Yi (0) for all i = 1, . . . , 8
Statistic: the number of correctly classified cups
The lady classified all 8 cups correctly!
Did this happen by chance?



Permutation Test
M = milk first, T = tea first

cups                guess   actual   scenarios . . .
1                   M       M        T     T
2                   T       T        T     T
3                   T       T        T     T
4                   M       M        T     M
5                   M       M        M     M
6                   T       T        M     M
7                   T       T        M     T
8                   M       M        M     M
correctly guessed           8        4     6

[Figure: Frequency Plot (frequency vs. number of correctly guessed cups) and Probability Distribution (probability vs. number of correctly guessed cups)]

$\binom{8}{4} = 70$ ways to do this and each arrangement is equally likely


What is the p-value?
No assumption, but the sharp null may be of little interest
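
The p-value question can be answered by brute-force enumeration. The following is a minimal sketch, assuming the lady labels cups 2, 3, 6 and 7 as tea first (under the sharp null her labels are fixed, so the particular choice does not affect the reference distribution). The helper `n_correct` is a name introduced here for illustration.

```python
from collections import Counter
from itertools import combinations

n_cups, n_tea_first = 8, 4
# Cups the lady labels "tea first" (assumed fixed under the sharp null).
guess = {2, 3, 6, 7}

def n_correct(truth, guess, n_cups):
    """Number of cups on which the guess agrees with the true order."""
    return sum((c in truth) == (c in guess) for c in range(1, n_cups + 1))

# Enumerate every way of choosing which 4 cups truly had tea poured first.
counts = Counter(
    n_correct(set(truth), guess, n_cups)
    for truth in combinations(range(1, n_cups + 1), n_tea_first)
)
total = sum(counts.values())

# One-sided p-value: share of assignments with at least 8 correct cups.
observed = 8
p_value = sum(v for k, v in counts.items() if k >= observed) / total

print(total)                         # 70
print(dict(sorted(counts.items())))  # {0: 1, 2: 16, 4: 36, 6: 16, 8: 1}
print(round(p_value, 4))             # 0.0143
```

Only one of the 70 equally likely assignments yields 8 correct cups, so the p-value is 1/70.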



5% as the Threshold

“It is usual and convenient for experimenters to take 5 per cent. as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard.”
R. A. Fisher (1935). The Design of Experiments. Oliver & Boyd

Publication bias:
p-hacking
file drawer bias

Potential solutions:
pre-registration
statistical control of multiple testing

[Figure: histogram of z-statistics, APSR & AJPS (two-tailed); x-axis: z-statistic, y-axis: frequency; dotted line marks the critical z-statistic (1.96) associated with the p = 0.05 significance level; bar width (0.20) approximately represents a 10% caliper] (Gerber and Malhotra. 2008. Q. J. Political Sci.)

Basic Setup

Units: i = 1, . . . , n
Treatment: Ti ∈ {0, 1}
Outcome: Yi = Yi (Ti )

Complete randomization of the treatment assignment


Exactly n1 units receive the treatment
n0 = n − n1 units are assigned to the control group
This differs from Bernoulli randomization, in which each unit is assigned independently and the number of treated units is random

We know the distribution of Ti :


$$\Pr(T_i = 1 \mid Y_i(1), Y_i(0)) = \frac{n_1}{n}$$
for all $i = 1, \dots, n$ and $\sum_{i=1}^n T_i = n_1$
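
To make the contrast between the two designs concrete, here is a small simulation sketch. The function names, the sample sizes, and the seed are illustrative assumptions, not part of the slides.

```python
import random

def complete_randomization(n, n1, rng):
    """Treat exactly n1 of n units; every subset of size n1 is equally likely."""
    treated = set(rng.sample(range(n), n1))
    return [1 if i in treated else 0 for i in range(n)]

def bernoulli_randomization(n, p, rng):
    """Treat each unit independently with probability p; the total is random."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n, n1, draws = 10, 4, 20000     # illustrative sizes
complete = [complete_randomization(n, n1, rng) for _ in range(draws)]
bernoulli = [bernoulli_randomization(n, n1 / n, rng) for _ in range(draws)]

# The marginal probability for a single unit is close to n1/n in both designs.
share_complete = sum(t[0] for t in complete) / draws
share_bernoulli = sum(t[0] for t in bernoulli) / draws
print(share_complete, share_bernoulli)

# Only complete randomization fixes the number of treated units at n1.
print({sum(t) for t in complete})
print(sorted({sum(t) for t in bernoulli}))
```

Both designs give each unit the same marginal treatment probability; they differ in whether the constraint on the total number of treated units holds in every draw.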



Fisher’s Exact Test
2 × 2 table:

                   Treated (T = 1)                    Control (T = 0)
Success (Y = 1)    $\sum_{i=1}^n T_i Y_i(1)$          $\sum_{i=1}^n (1 - T_i) Y_i(0)$
Failure (Y = 0)    $\sum_{i=1}^n T_i (1 - Y_i(1))$    $\sum_{i=1}^n (1 - T_i)(1 - Y_i(0))$
Total              $n_1$                              $n_0$

Test statistic: $S = \sum_{i=1}^n T_i Y_i(1)$

Under complete randomization and the sharp null of no treatment effect, the test statistic follows the hypergeometric distribution:

$$\Pr(S = s \mid O_n) = \frac{\binom{m}{s}\binom{n-m}{n_1-s}}{\binom{n}{n_1}}$$

where $m = \sum_{i=1}^n Y_i$ and $O_n = \{Y_i(0), Y_i(1)\}_{i=1}^n$.

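
The hypergeometric formula can be evaluated directly. The sketch below assumes a one-sided p-value of the form Pr(S ≥ s_obs | O_n); the function names `fisher_pmf` and `one_sided_p_value` are introduced here for illustration. It is checked against the tea-tasting design (n = 8, n1 = 4, m = 4, S = 4), treating "tea first" cups as treated and "lady says tea first" as success.

```python
from math import comb

def fisher_pmf(s, n, n1, m):
    """Pr(S = s | O_n) under complete randomization and the sharp null."""
    if s < max(0, n1 - (n - m)) or s > min(n1, m):
        return 0.0  # outside the support of the hypergeometric distribution
    return comb(m, s) * comb(n - m, n1 - s) / comb(n, n1)

def one_sided_p_value(s_obs, n, n1, m):
    """Pr(S >= s_obs | O_n): upper-tail probability of the reference distribution."""
    return sum(fisher_pmf(s, n, n1, m) for s in range(s_obs, min(n1, m) + 1))

# Tea-tasting design: 8 cups, 4 treated, 4 "successes", all 4 treated cups correct.
n, n1, m, s_obs = 8, 4, 4, 4
pmf_total = sum(fisher_pmf(s, n, n1, m) for s in range(0, n1 + 1))
p_value = one_sided_p_value(s_obs, n, n1, m)

print(pmf_total)           # probabilities over the support sum to 1
print(round(p_value, 4))   # 0.0143
```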
Computation

Exact computation is difficult when n is large

Monte Carlo approximation:


1 Fill in missing potential outcomes under the sharp null
2 Sample Ti according to complete randomization
3 Compute the test statistic
Can be made arbitrarily accurate by increasing number of draws
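
The three Monte Carlo steps can be sketched as follows for a binary outcome and the statistic S = Σ T_i Y_i. This is a minimal illustration: the function name `monte_carlo_p_value`, the number of draws, the seed, and the one-sided definition of the p-value are assumptions made here. The example data encode the tea-tasting result (all eight cups classified correctly).

```python
import random

def monte_carlo_p_value(y, t, draws=20000, seed=0):
    """Approximate Pr(S >= s_obs) under the sharp null by re-randomizing T."""
    rng = random.Random(seed)
    n, n1 = len(y), sum(t)
    s_obs = sum(ti * yi for ti, yi in zip(t, y))
    # Step 1: under the sharp null Y_i(1) = Y_i(0) = Y_i, so the observed
    # outcomes y stay fixed no matter which units are treated.
    exceed = 0
    for _ in range(draws):
        # Step 2: draw a new assignment with exactly n1 treated units.
        treated = rng.sample(range(n), n1)
        # Step 3: recompute the test statistic for this assignment.
        s = sum(y[i] for i in treated)
        exceed += s >= s_obs
    return exceed / draws

# y: 1 if the lady says "tea first"; t: 1 if tea was actually poured first.
y = [0, 1, 1, 0, 0, 1, 1, 0]
t = [0, 1, 1, 0, 0, 1, 1, 0]
p_hat = monte_carlo_p_value(y, t)
print(p_hat)  # should be close to the exact value 1/70
```

Increasing `draws` shrinks the simulation error at the usual 1/sqrt(draws) rate.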

Analytical approximations:
$$E(S \mid O_n) = \frac{n_1 m}{n}, \quad \text{and} \quad V(S \mid O_n) = \frac{m n_0 n_1}{n(n-1)}\left(1 - \frac{m}{n}\right)$$
1 Normal: $\{S - E(S \mid O_n)\}/\sqrt{V(S \mid O_n)} \sim N(0, 1)$
2 Binomial($n_1$, $m/n$)
Becomes accurate as n grows
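
As a sanity check on the moment formulas, the sketch below compares them with moments computed directly from the exact hypergeometric distribution. The function names and the small example (n = 8, n1 = 4, m = 4) are illustrative choices.

```python
from math import comb

def exact_moments(n, n1, m):
    """Mean and variance of S computed from the exact hypergeometric pmf."""
    support = range(max(0, n1 - (n - m)), min(n1, m) + 1)
    pmf = {s: comb(m, s) * comb(n - m, n1 - s) / comb(n, n1) for s in support}
    mean = sum(s * p for s, p in pmf.items())
    var = sum((s - mean) ** 2 * p for s, p in pmf.items())
    return mean, var

def formula_moments(n, n1, m):
    """Closed-form mean and variance of S given O_n."""
    n0 = n - n1
    mean = n1 * m / n
    var = m * n0 * n1 / (n * (n - 1)) * (1 - m / n)
    return mean, var

n, n1, m = 8, 4, 4
exact_mean, exact_var = exact_moments(n, n1, m)
formula_mean, formula_var = formula_moments(n, n1, m)
print(exact_mean, formula_mean)   # both 2.0
print(exact_var, formula_var)     # both approximately 0.571

# The Binomial(n1, m/n) approximation has a larger variance in small samples.
binomial_var = n1 * (m / n) * (1 - m / n)
print(binomial_var)               # 1.0
```

The ratio of the two variances is n0/(n − 1), which approaches 1 only when n1 is a small fraction of a large n; this is one way to see when the binomial approximation is reasonable.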



The Project STAR (Mosteller. 1997. Bull. Am. Acad. Arts Sci.)

The Student-Teacher Achievement Ratio Project (1985–1989)


More than 10,000 students involved, at a cost of $12 million
Effects of class size in early grade levels
3 arms: Small class, Regular-sized class, Regular class with aide

Long-term impact of class size:


               Small class   Regular-sized class
Graduate       754           892
Not graduate   148           189
Total          902           1081


Exact p-value: 0.28 (one-sided), 0.55 (two-sided)
Asymptotic p-value: 0.26 (one-sided), 0.53 (two-sided)
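
The one-sided p-values can be recomputed from the counts in the table. This sketch assumes the one-sided alternative is an upper tail (more graduates in the small-class arm) and that the asymptotic value uses the normal approximation without a continuity correction; both are assumptions about how the slide's numbers were obtained.

```python
from math import comb, erf, sqrt

# Counts from the 2 x 2 table: small class is treated, graduation is success.
s_obs, n1, n0 = 754, 902, 1081
m = 754 + 892          # total number of graduates
n = n1 + n0            # total number of students

# Exact one-sided p-value: Pr(S >= s_obs) under the hypergeometric distribution.
numerator = sum(
    comb(m, s) * comb(n - m, n1 - s) for s in range(s_obs, min(n1, m) + 1)
)
exact_one_sided = numerator / comb(n, n1)

# Normal approximation based on the mean and variance of S given O_n.
mean = n1 * m / n
var = m * n0 * n1 / (n * (n - 1)) * (1 - m / n)
z = (s_obs - mean) / sqrt(var)
asymptotic_one_sided = 0.5 * (1 - erf(z / sqrt(2)))

print(round(exact_one_sided, 2))
print(round(asymptotic_one_sided, 2))
```

Under these assumptions the two results should land close to the one-sided values reported on the slide.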



Rank-sum Tests
Fisher’s exact test assumes a binary outcome
Rank-sum tests are often used for continuous outcomes
Rank of the outcome for unit i: Ri (Y)
Wilcoxon’s rank-sum statistic:
$$S = \sum_{i=1}^n T_i \cdot R_i(\mathbf{Y})$$
1 symmetric
2 moments (assume no tie):
$$E(S \mid O_n) = \frac{n_1(n+1)}{2}, \quad V(S \mid O_n) = \frac{n_0 n_1 (n+1)}{12}$$
3 reference distribution does not depend on index
Mann-Whitney U test statistic:
$$U = S - \frac{n_1(n+1)}{2}$$
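
The rank-sum statistic and its moments can be checked by enumerating all assignments in a tiny example. The outcome values and treatment vector below are made up purely for illustration (six units, no ties), and the helper `ranks` is a name introduced here.

```python
from itertools import combinations

def ranks(y):
    """Ranks 1..n of the outcomes, assuming no ties."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    r = [0] * len(y)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Hypothetical data for illustration only.
y = [3.1, 0.4, 2.2, 5.0, 1.7, 4.3]
t = [1, 0, 1, 1, 0, 0]
n, n1 = len(y), sum(t)
n0 = n - n1

r = ranks(y)
s_obs = sum(ti * ri for ti, ri in zip(t, r))

# Reference distribution: rank sum for every way of choosing n1 treated units.
ref = [sum(r[i] for i in treated) for treated in combinations(range(n), n1)]
mean = sum(ref) / len(ref)
var = sum((s - mean) ** 2 for s in ref) / len(ref)
p_value = sum(s >= s_obs for s in ref) / len(ref)

print(s_obs)                                  # 13
print(mean, n1 * (n + 1) / 2)                 # 10.5 10.5
print(var, n0 * n1 * (n + 1) / 12)            # 5.25 5.25
print(p_value)                                # 0.2
# Symmetry around the mean: reflecting the distribution leaves it unchanged.
print(sorted(ref) == sorted(2 * mean - s for s in ref))  # True
```

Because only the ranks enter the statistic, the reference distribution depends on n and n1 alone when there are no ties.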
The Project STAR Revisited

Effect of kindergarten class size on 8th grade reading score:


[Figure: density of 8th grade reading scores (x-axis from 400 to 900), shown in two panels: small class and regular class]

Wilcoxon’s rank-sum test (there are some ties):


p-value < 0.001



General Procedure for Permutation Tests

1 Specify sharp null hypothesis


Typically, $H_0: Y_i(1) - Y_i(0) = \tau_{0i}$, where we set $\tau_{0i} = 0$ for all $i$
No effect implies no heterogeneous effect, no spillover effect, etc.

2 Choose a test statistic $S = f(\{Y_i, T_i, \tau_{0i}\}_{i=1}^n)$

Fisher’s exact test statistic: $S = \sum_{i=1}^n T_i (Y_i - \tau_{0i})$
Other commonly used test statistics include rank sum and
difference-in-means

3 Compute the reference distribution and p-value based on the randomization distribution of treatment assignment
Exact distribution in small samples
Large-sample approximation
Monte Carlo approximation as a general strategy
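
The three-step procedure can be sketched as a single function. This is an illustration under stated assumptions: a constant hypothesized effect `tau0` for every unit, the difference-in-means statistic, a two-sided p-value defined through |S − tau0|, and a Monte Carlo reference distribution. The data are made up, and the names `diff_in_means` and `permutation_test` are introduced here.

```python
import random

def diff_in_means(y, t):
    """Mean outcome among treated units minus mean outcome among controls."""
    n1 = sum(t)
    n0 = len(t) - n1
    treated_sum = sum(yi for yi, ti in zip(y, t) if ti)
    control_sum = sum(yi for yi, ti in zip(y, t) if not ti)
    return treated_sum / n1 - control_sum / n0

def permutation_test(y, t, tau0=0.0, statistic=diff_in_means, draws=5000, seed=0):
    rng = random.Random(seed)
    n, n1 = len(y), sum(t)
    # Step 1: the sharp null Y_i(1) - Y_i(0) = tau0 fills in both potential outcomes.
    y0 = [yi - ti * tau0 for yi, ti in zip(y, t)]
    y1 = [v + tau0 for v in y0]
    # Step 2: test statistic evaluated on the observed data.
    s_obs = statistic(y, t)
    # Step 3: Monte Carlo reference distribution under complete randomization.
    extreme = 0
    for _ in range(draws):
        treated = set(rng.sample(range(n), n1))
        t_new = [1 if i in treated else 0 for i in range(n)]
        y_new = [y1[i] if t_new[i] else y0[i] for i in range(n)]
        s = statistic(y_new, t_new)
        extreme += abs(s - tau0) >= abs(s_obs - tau0)
    return s_obs, extreme / draws

# Hypothetical outcomes for 8 units, 4 treated (illustration only).
y = [12.1, 9.8, 11.4, 13.0, 8.7, 9.1, 10.2, 8.9]
t = [1, 1, 1, 1, 0, 0, 0, 0]
s_obs, p_value = permutation_test(y, t)
print(round(s_obs, 2), p_value)
```

Swapping in another statistic, such as a rank sum, only requires passing a different `statistic` function; the imputation and re-randomization steps stay the same.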



Summary

Randomization of treatment assignment as a basis for inference


design-based, assumption-free inference
Inference over repeated (hypothetical) randomization
sample inference rather than population inference

Sharp null hypothesis:


implies no effect for every unit
may not be of interest but serves as a starting point of analysis

Reading: Imbens and Rubin, Chapter 5
