Lecture BDS 9-23-24 Print
testing, part 2
6 March 2024
Weak control
■ Testing our d hypotheses with a testing procedure such that (4) holds is known in
the literature as weak control. The definition below essentially restates this.
■ Definition (weak control) We say that a testing procedure for testing the family of
hypotheses H_1, ..., H_d provides weak control at level α if

P_{H_1,...,H_d}(reject at least one hypothesis) ≤ α.
FWER
■ Here comes the definition that introduces this notation.
■ Definition (FWER) For an arbitrary family of hypotheses H_j, j ∈ J, the
probability

P_{H_j, j∈J}(reject at least one true hypothesis)

is called the family-wise error rate (FWER).
■ With this definition we can write the question on the previous slide as

P_{H_1,...,H_d}(reject at least one true hypothesis) = ?

■ Note also that with this definition we can write the corresponding question for the
other examples on the previous slide, e.g.

P_{H_1,...,H_{d−1}}(reject at least one true hypothesis) = ?
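To make the FWER question concrete, here is a small simulation (my own illustration, not from the lecture): with d true hypotheses each tested individually at level α, the probability of at least one false rejection is 1 − (1 − α)^d, far above α.

```python
# Simulation sketch (assumed setup, not from the lecture): d true null
# hypotheses, each tested at level alpha with an independent Uniform(0, 1)
# p-value; estimate the FWER without any multiplicity correction.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, n_sim = 10, 0.05, 20_000

p = rng.uniform(size=(n_sim, d))          # p-values under the d true nulls
fwer = np.mean((p < alpha).any(axis=1))   # fraction of runs with >= 1 false rejection

print(round(fwer, 3))
print(round(1 - (1 - alpha) ** d, 3))     # theoretical FWER: 0.401
```

With d = 10 and α = 0.05 roughly 40% of runs falsely reject something, which is exactly the problem that weak and strong control address.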
Strong control
■ Definition (strong control) We say that a testing procedure for the family of
hypotheses H_1, ..., H_d provides strong control at level α if for all I ⊂ {1, ..., d}

P_{H_i, i∈I}(reject at least one H_i, i ∈ I) ≤ α.
p-value
■ We denote the random variable p-value for the test statistic T by p̂(T ).
■ Assume that, for a normal distribution with known standard deviation σ, we test
the (single) hypothesis H_0: µ ≤ 0 against H_1: µ > 0.
■ As test statistic T based on n observations X_1, ..., X_n we use

T = (Σ_{i=1}^n X_i) / (√n · σ).
α > 1 − Φ(T ).
■ The smallest α2 for which this holds is the p-value, denoted here by p̂(T ) and
taken here to be
p̂(T ) = 1 − Φ(T ).
■ Remark: This equation confirms once more that p-values are random as the test
statistic T is random.
■ On the next slide we look at the distribution of p̂(T ) under the (null) hypothesis.
² For those interested in mathematical details: it is not entirely correct to define α in (5) as the
smallest α; what we are actually looking for is the infimum.
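As a numerical companion to (5) and the definition of p̂(T), here is a sketch with made-up data (variable names are mine):

```python
# Sketch (assumed example data): the one-sided z-test p-value
# p_hat(T) = 1 - Phi(T) for a sample with known sigma.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, sigma = 25, 2.0
x = rng.normal(loc=0.8, scale=sigma, size=n)  # data with true mu = 0.8 > 0

T = x.sum() / (np.sqrt(n) * sigma)            # test statistic from the slide
p_hat = 1 - norm.cdf(T)                       # p-value p_hat(T) = 1 - Phi(T)

# Rejecting H0: mu <= 0 at level alpha is equivalent to alpha > 1 - Phi(T),
# i.e. (up to the boundary case) to p_hat <= alpha.
alpha = 0.05
print(p_hat, p_hat <= alpha)
```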
p-value (cont’d)
■ We note that p̂(T ) takes only values between 0 and 1 (as all p-values do).
■ Under the (null) hypothesis with µ = 0 the test statistic T has a standard normal
distribution.
■ Then we find for any u ∈ (0, 1)

P(p̂(T) ≤ u) = P(1 − Φ(T) ≤ u) = P(T ≥ Φ^{−1}(1 − u)) = 1 − Φ(Φ^{−1}(1 − u)) = u.
■ Hence under the null hypothesis p̂(T ) has a uniform distribution on (0, 1).
■ This result holds in general.
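This can also be checked by simulation (my own sketch, reusing the z-test from the previous slide): under µ = 0 the empirical distribution of p̂(T) is close to Uniform(0, 1).

```python
# Sketch (assumed setup): simulate the z-test under H0 (mu = 0) and check
# that P(p_hat <= u) is approximately u for several u, i.e. that the
# p-value is uniformly distributed on (0, 1) under the null.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, sigma, n_sim = 30, 1.0, 50_000

x = rng.normal(loc=0.0, scale=sigma, size=(n_sim, n))  # data under H0
T = x.sum(axis=1) / (np.sqrt(n) * sigma)
p_hat = 1 - norm.cdf(T)

for u in (0.1, 0.5, 0.9):
    print(u, round(float(np.mean(p_hat <= u)), 3))     # close to u
```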
p-value (cont’d)
The general result is as follows:
■ Theorem: Let T be a test statistic for testing a single hypothesis and p̂(T) be the
corresponding p-value.
a) For θ ∈ Θ_{H_0} we have P_θ(p̂(T) ≤ u) ≤ u for all u ∈ (0, 1).
b) If moreover T has a continuous distribution, then P_θ(p̂(T) ≤ u) = u for all
u ∈ (0, 1) and θ ∈ Θ_{H_0}.
■ Remark: (i) Part b) just says that for test statistics with a continuous distribution
the p-value has a uniform distribution under the (null) hypothesis.
(ii) For a test statistic that does not have a continuous distribution part a) says that
under the (null) hypothesis the probability that the p-value is less than or equal to
u is less than or equal to the probability for this event under a uniform distribution.
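Part a) for a non-continuous test statistic can be illustrated with a binomial example (my own, not from the lecture): for X ~ Binomial(n, 1/2), testing H_0: p ≤ 1/2 against H_1: p > 1/2 with p-value p̂(x) = P(X ≥ x), the p-value distribution is stochastically larger than Uniform(0, 1).

```python
# Sketch (assumed example): exact check that P(p_hat <= u) <= u under H0
# for the discrete one-sided binomial test with n = 12 and p = 1/2.
from scipy.stats import binom

n = 12
xs = range(n + 1)
pvals = {x: float(binom.sf(x - 1, n, 0.5)) for x in xs}  # P(X >= x) under H0

results = []
for u in (0.05, 0.2, 0.5, 0.8):
    # probability, under H0, that the p-value is at most u
    prob = sum(float(binom.pmf(x, n, 0.5)) for x in xs if pvals[x] <= u)
    results.append((u, prob))
    print(u, round(prob, 4))   # always <= u; equality only at attainable p-values
```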
Bonferroni
■ The following testing procedure is known as the Bonferroni procedure; it gives
strong control for multiple testing, subject to the conditions in the Theorem
(Bonferroni procedure) below.
■ Bonferroni procedure
◆ Before taking the sample decide on the level α.
◆ For test statistic Ti of hypothesis Hi , 1 ≤ i ≤ d, calculate the p-value:
p̂i = p̂i (Ti ).
◆ Now, given the data (t_1, ..., t_d): for each i = 1, ..., d, reject H_i if the
observed p-value satisfies p̂_i(t_i) ≤ α/d.
■ Remark (p-value notation) It is common in the literature, by a slight abuse of
notation (recall we use X for a rv and x for a particular value of X), to denote
both the p-value which is a random variable and its realization by p̂i .
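The procedure translates directly into code (a minimal sketch; the function name is my own):

```python
# Minimal sketch of the Bonferroni procedure: reject H_i exactly when the
# observed p-value satisfies p_i <= alpha / d.

def bonferroni(pvals, alpha=0.05):
    """Return a list of booleans, True where H_i is rejected."""
    d = len(pvals)
    return [p <= alpha / d for p in pvals]

# With d = 4 hypotheses the per-test cutoff is 0.05 / 4 = 0.0125.
print(bonferroni([0.001, 0.02, 0.012, 0.9]))  # [True, False, True, False]
```

Only the p-values 0.001 and 0.012 fall below the cutoff 0.0125, so only those two hypotheses are rejected.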
Bonferroni (cont’d)
■ Theorem (Bonferroni procedure): For i = 1, ..., d assume that for any u ∈ (0, 1)
and any J ⊆ {1, ..., d} we have

P_{H_j, j∈J}(p̂_i ≤ u) ≤ u whenever i ∈ J.

Then the Bonferroni procedure provides strong control of the FWER at level α.
Holm
Another method that controls the FWER strongly and that is based on p-values is
Holm’s procedure.
Holm procedure
■ Denote the increasingly ordered observed p-values (committing the above
described abuse of notation) by p̂(1) , . . . , p̂(d) and the associated hypotheses by
H(1) , . . . , H(d) .
■ Step 1: If p̂_(1) ≥ α/d, accept H_(1), ..., H_(d) and stop. If p̂_(1) < α/d, reject
H_(1) and test the remaining d − 1 hypotheses at level α/(d − 1).
■ Step 2: If p̂_(1) < α/d and p̂_(2) ≥ α/(d − 1), accept the remaining hypotheses
H_(2), ..., H_(d) and stop. If p̂_(2) < α/(d − 1), reject H_(2) and test the
remaining d − 2 hypotheses at level α/(d − 2).
■ Continue like this.
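The step-down recursion above can be sketched as follows (function name mine; the strict inequality is handled as on the slide):

```python
# Minimal sketch of Holm's step-down procedure: sort the p-values, compare
# the i-th smallest to alpha / (d - i + 1), and stop at the first
# non-rejection, accepting everything from there on.

def holm(pvals, alpha=0.05):
    """Return a list of booleans in the original order, True = reject."""
    d = len(pvals)
    order = sorted(range(d), key=lambda i: pvals[i])  # indices by increasing p
    reject = [False] * d
    for step, i in enumerate(order):        # step = 0, 1, ..., d - 1
        if pvals[i] < alpha / (d - step):   # current level alpha / (d - step)
            reject[i] = True
        else:
            break                           # accept all remaining hypotheses
    return reject

print(holm([0.001, 0.02, 0.012, 0.9]))  # [True, True, True, False]
```

On the same p-values, Bonferroni (cutoff α/4 = 0.0125) would not reject the hypothesis with p = 0.02, while Holm does.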
Holm (cont’d)
■ Theorem: Assume the assumptions of the Theorem (Bonferroni procedure) are
satisfied. Then the Holm procedure strongly controls the FWER.
■ Proof: Let the true hypotheses be H_k, k ∈ K. Define p̂_min = min{p̂_k, k ∈ K}.
Let R be the rank of p̂_min when ranking all p-values.³ Then, Holm's procedure
rejects at least one true hypothesis only if
p̂_(1) < α/d, ..., p̂_(R−1) < α/(d − R + 2), p̂_min < α/(d − R + 1).
We note that R is at most d − |K| + 1, cf. Exercise sheet 5. For the probability of
at least one false rejection we find
P_{H_k, k∈K}(reject any H_k) ≤ P_{H_k, k∈K}(p̂_min < α/(d − R + 1)).
This is a bit unpleasant, because both p̂_min and α/(d − R + 1) are random, as R is
random. Yet, R is at most the non-random d − |K| + 1, so that α/(d − R + 1) ≤ α/|K|.
A union bound over k ∈ K together with the assumption of the Theorem then gives

P_{H_k, k∈K}(p̂_min < α/|K|) ≤ Σ_{k∈K} P_{H_k, k∈K}(p̂_k < α/|K|) ≤ |K| · (α/|K|) = α.
³ Exercise sheet 5 illustrates p̂_min and R.
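A quick simulation of the theorem's claim (my own setup, not from the lecture: 5 true and 5 false nulls, one-sided z-tests) shows the empirical FWER of Holm's procedure staying below α:

```python
# Sketch (assumed setup): strong control of Holm's procedure with a mix of
# true nulls (mu = 0) and false nulls (mu = 3); we estimate the probability
# that at least one of the 5 true hypotheses is falsely rejected.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
alpha, n_sim = 0.05, 5_000
mu = np.array([0.0] * 5 + [3.0] * 5)   # first 5 nulls are true
d = len(mu)

def holm_reject(pvals):
    order = np.argsort(pvals)
    reject = np.zeros(d, dtype=bool)
    for step, i in enumerate(order):
        if pvals[i] < alpha / (d - step):
            reject[i] = True
        else:
            break
    return reject

false_rej = 0
for _ in range(n_sim):
    z = rng.normal(loc=mu)             # one z-statistic per hypothesis
    p = 1 - norm.cdf(z)                # one-sided p-values
    if holm_reject(p)[:5].any():       # any true null rejected?
        false_rej += 1

fwer = false_rej / n_sim
print(fwer)                            # at most about alpha = 0.05
```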
Single & step down, p-values
Remarks:
■ The two procedures above can equivalently be written by adjusting the p-values
and comparing them to α:
◆ For Bonferroni the adjusted p-values are d · p̂i (ti );
◆ For Holm the adjusted p-values are (d − i + 1) · p̂(i) .
■ The Bonferroni procedure is a single step method in the sense that there is only one
cutoff point and any hypothesis with p-value less than this cutoff point is rejected.
■ Holm’s procedure is an example of a step down procedure. One starts with the
’most significant’ p-value and then proceeds to the second ’most significant’
p-value and so on. Step-down, because from the most significant p-value we move
in the direction of the least significant p-value.
■ Holm’s procedure has at least the power of Bonferroni’s procedure: every
hypothesis Bonferroni rejects is also rejected by Holm, and typically Holm rejects more.
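The adjusted p-values can be computed as follows (a sketch; note that, beyond what the slide states, the Holm adjustment is usually also made monotone with a running maximum so that comparing the adjusted p-values to α reproduces the step-down decisions exactly):

```python
# Sketch (function name mine): Bonferroni-adjusted p-values d * p_i and
# Holm-adjusted p-values (d - i + 1) * p_(i), with the Holm values made
# monotone via a running maximum and both capped at 1.

def adjusted(pvals):
    d = len(pvals)
    bonf = [min(1.0, d * p) for p in pvals]
    order = sorted(range(d), key=lambda i: pvals[i])
    holm = [0.0] * d
    running_max = 0.0
    for rank, i in enumerate(order):                 # rank = 0, ..., d - 1
        running_max = max(running_max, (d - rank) * pvals[i])
        holm[i] = min(1.0, running_max)
    return bonf, holm

bonf, holm = adjusted([0.001, 0.02, 0.012, 0.9])
print(bonf)   # [0.004, 0.08, 0.048, 1.0]
print(holm)   # never larger than the Bonferroni adjustments
```

That the Holm adjustments are never larger than the Bonferroni adjustments is the adjusted-p-value version of the power statement above.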