An Improved Bonferroni Inequality and Applications
Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.
An improved Bonferroni inequality and applications
BY K. J. WORSLEY
Department of Mathematics, McGill University,
Montreal, Quebec, Canada
SUMMARY
We present an improved Bonferroni inequality which gives an upper bound for the
probability of the union of an arbitrary sequence of events. The bound is constructed in
terms of the joint probability of pairs of events, which are represented by edges on a
graph. Examples of applications to periodicity, location shift detection, Kolmogorov-Smirnov
tests and outlier detection are given.
Some key words: Bonferroni inequality; Periodicity; Location shift detection; Kolmogorov-Smirnov test;
Outlier.
1. INTRODUCTION
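Many authors have used the Bonferroni inequality to provide upper and lower bounds
for the probability of the union of a sequence of events $A_1, \dots, A_n$:
$$\sum_{i=1}^{n} \operatorname{pr}(A_i) - \sum_{i<j} \operatorname{pr}(A_i \cap A_j) \le \operatorname{pr}\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \sum_{i=1}^{n} \operatorname{pr}(A_i). \qquad (1)$$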
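As a minimal sketch of how the two sides of (1) are evaluated, the following Python function takes the marginal probabilities and a symmetric matrix of pairwise joint probabilities. The function name and the die-roll events are illustrative assumptions and do not come from the paper.

```python
def bonferroni_bounds(p, pair):
    """Return (lower, upper) from inequality (1).

    p[i] is pr(A_i); pair[i][j] is pr(A_i and A_j) for i != j (symmetric).
    """
    n = len(p)
    upper = sum(p)
    # Sum the joint probabilities over all unordered pairs i < j.
    joint = sum(pair[i][j] for i in range(n) for j in range(i + 1, n))
    return upper - joint, upper


# Hypothetical events on one roll of a fair die:
# A_1 = {1, 2}, A_2 = {2, 3}, A_3 = {3, 4}, whose union {1, 2, 3, 4} has probability 2/3.
p = [2 / 6, 2 / 6, 2 / 6]
pair = [[0, 1 / 6, 0],
        [1 / 6, 0, 1 / 6],
        [0, 1 / 6, 0]]
lower, upper = bonferroni_bounds(p, pair)
print(lower, upper)
```

In this toy case the lower bound happens to equal the exact probability, while the upper bound is the trivial value of one.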
2. AN IMPROVED BOUND
Usually the upper bound (1) is more important because it provides a conservative
test, yet it is not so accurate as the lower bound. The upper bound is improved by the
following result.
i= 1
UAi = Ap1 u (AP2\A2) . (APn\Apn) (3)
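COROLLARY 1. We have
$$\operatorname{pr}\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \sum_{i=1}^{n} \operatorname{pr}(A_i) - \sum_{i=1}^{n-1} \operatorname{pr}(A_i \cap A_{i+1}). \qquad (4)$$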
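The bound (4) needs only the marginal probabilities and the joint probabilities of consecutive events. A small sketch follows; the function name and the example events are hypothetical and chosen so that the result can be checked by hand.

```python
def improved_upper_bound(p, adjacent):
    """Upper bound (4): sum of pr(A_i) minus sum of pr(A_i and A_{i+1}).

    p has n entries; adjacent[i] is the joint probability of events i and i + 1,
    so it must have n - 1 entries.
    """
    if len(adjacent) != len(p) - 1:
        raise ValueError("adjacent must have exactly len(p) - 1 entries")
    return sum(p) - sum(adjacent)


# Hypothetical events on one roll of a fair die:
# A_1 = {1, 2}, A_2 = {2, 3}, A_3 = {3, 4}, A_4 = {4, 5}; the union has probability 5/6.
p = [2 / 6] * 4
adjacent = [1 / 6] * 3
bound = improved_upper_bound(p, adjacent)
print(bound, sum(p))
```

Here only consecutive events overlap, so (4) reproduces the exact probability 5/6, whereas the Bonferroni upper bound in (1) gives 4/3.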
Proof. Let
$$T = \bigcup_{k=1}^{m} \{e_{i j_k} : i \in J_k \setminus j_k\},$$
The least upper bound is found by maximizing the second term in (2) over all possible
trees. An algorithm for finding maximum spanning trees is given by Kruskal (1956). It
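The search over trees can be sketched with Kruskal's algorithm applied to the heaviest edges first. The sketch assumes that (2) has the form "sum of $\operatorname{pr}(A_i)$ minus the sum of $\operatorname{pr}(A_i \cap A_j)$ over the edges $e_{ij}$ of a tree", of which (4) is the special case of a chain. The function name, the union-find helper and the example events are illustrative, not the paper's.

```python
from itertools import combinations


def max_spanning_tree(n, weight):
    """Kruskal's algorithm, taking the heaviest edges first.

    weight[(i, j)] with i < j is pr(A_i and A_j) for vertices 0..n-1.
    Returns the list of edges kept.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root of x's component, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for (i, j), _ in sorted(weight.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so it closes no cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical events on a uniform sample space of 12 points.
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}, {0, 7, 8}]
size = 12
p = [len(a) / size for a in events]
weight = {(i, j): len(events[i] & events[j]) / size
          for i, j in combinations(range(len(events)), 2)}

tree = max_spanning_tree(len(events), weight)
bound = sum(p) - sum(weight[e] for e in tree)   # assumed form of the bound (2)
exact = len(set().union(*events)) / size        # true probability of the union
print(tree, bound, exact)
```

Edges of weight zero (disjoint events) may enter the tree without changing the bound, so the sketch does not treat them specially.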
3. EXAMPLES
3.1. Peak periods of a disease
Let $N_i$ be the total occurrences of a disease in the six month period starting with
month $i$ $(i = 1, \dots, 12)$. David & Newell suggest a simple test for periodicity based on
$V = \max(|U_i|)$, where $U_i = (N_i - N_{i+6})/N^{1/2}$ and $N = N_i + N_{i+6}$ $(i = 1, \dots, 6)$, the total
number of occurrences throughout the year. If the occurrences are independent and
equally likely in any month then $U_1, \dots, U_6$ are asymptotically normal conditional on $N$,
with covariance $\rho_{i,i+h} = 1 - h/3$ between $U_i$ and $U_{i+h}$ $(i = 1, \dots, 6;\ h = 0, \dots, 6-i)$. Let $A_i$
be the event that $|U_i| > c$ $(i = 1, \dots, 6)$ so that $\operatorname{pr}(V > c) = \operatorname{pr}(\cup A_i)$. David & Newell
find bounds for the level $\alpha$ critical points of $V$ by equating the upper and lower bounds (1)
to $\alpha$ and solving for $c$. An alternative upper bound for normal random variables, given by
Šidák (1968), is slightly lower than the Bonferroni upper bound (1):
$$\operatorname{pr}\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le 1 - \prod_{i=1}^{n} \{1 - \operatorname{pr}(A_i)\}. \qquad (5)$$
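For the six events of this example, the upper bound in (1) and the bound (5) depend only on the two-sided normal tail probability at $c$. The sketch below evaluates both at the critical points quoted in Table 1(a), assuming each $U_i$ is treated as exactly standard normal; the helper names are illustrative.

```python
import math


def two_sided_tail(c):
    """pr(|U| > c) for a standard normal U."""
    return math.erfc(c / math.sqrt(2.0))


def bonferroni_upper(p):
    """Upper bound in (1): the sum of the marginal probabilities."""
    return sum(p)


def sidak_upper(p):
    """Upper bound (5): one minus the product of the complements."""
    complement = 1.0
    for p_i in p:
        complement *= 1.0 - p_i
    return 1.0 - complement


# Critical points c quoted in Table 1(a); six events A_1, ..., A_6.
bounds = {}
for c in (2.247, 2.537, 3.095):
    p = [two_sided_tail(c)] * 6
    bounds[c] = (bonferroni_upper(p), sidak_upper(p))
    print(f"c = {c}: Bonferroni {bounds[c][0]:.4f}, Sidak {bounds[c][1]:.4f}")
```

The printed values can be compared with the first two rows of Table 1(a).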
Table 1. Bounds (a) for pr(V > c) at level $\alpha$ points of V; (b) for pr(t > c) at level $\alpha$ points of t

                              (a) $\alpha$ = pr(V > c)                 (b) $\alpha$ = pr(t > c)
                              0.10        0.05        0.01         0.10       0.05       0.01
                              c = 2.247   c = 2.537   c = 3.095    c = 3.14   c = 3.66   c = 4.93
Bonferroni upper bound (1)    0.1478      0.0671      0.0118       0.1242     0.0576     0.0104
Šidák upper bound (5)         0.1390      0.0652      0.0117
Improved upper bound (4)      0.1170      0.0556      0.0105       0.1028     0.0508     0.0100
Bonferroni lower bound (1)    0.0981      0.0498      0.0100       0.0937     0.0492     0.0100
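The improved bound in part (a) of Table 1 can be approximated numerically. The sketch below is not the paper's computation: it assumes that the tree is the chain of Corollary 1, that adjacent $U_i$ are exactly bivariate standard normal with correlation 2/3 (the value of $1 - h/3$ at $h = 1$), and it evaluates the joint tail probability with Simpson's rule, which is a choice made here for illustration.

```python
import math


def norm_sf(x):
    """Upper tail probability of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def pr_both_exceed(c, rho, upper=10.0, steps=4000):
    """pr(|X| > c, |Y| > c) for standard bivariate normal (X, Y) with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # Given X = x, Y is normal with mean rho * x and standard deviation s.
        cond = norm_sf((c - rho * x) / s) + norm_sf((c + rho * x) / s)
        return norm_pdf(x) * cond

    # Simpson's rule on [c, upper]; steps must be even.
    h = (upper - c) / steps
    total = integrand(c) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(c + k * h)
    # The integrand is even in x, so the region x < -c contributes the same amount.
    return 2.0 * total * h / 3.0


improved = {}
for c in (2.247, 2.537, 3.095):
    marginal = 2.0 * norm_sf(c)                 # pr(A_i)
    joint = pr_both_exceed(c, 2.0 / 3.0)        # pr(A_i and A_{i+1})
    improved[c] = 6 * marginal - 5 * joint      # bound (4) with n = 6
    print(f"c = {c}: improved upper bound {improved[c]:.4f}")
```

Under these assumptions the output should land close to the "Improved upper bound (4)" row of part (a).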
3.2. Location shift detection
If $Y_1, \dots, Y_n$ is a sequence of independent normal observations, then the likelihood
ratio test for a shift in location after observation $i$ is based on $t = \max(|t_i|)$, where $t_i$ is
the usual $t$ statistic for a difference in mean between the first $i$ observations and the last
$n - i$ observations for $i = 1, \dots, n-1$ (Sen & Srivastava, 1975). Let $A_i$ be the event that
$|t_i| > c$ $(i = 1, \dots, n-1)$, so that $\operatorname{pr}(t > c) = \operatorname{pr}(\cup A_i)$. Hawkins (1977) finds upper
bounds for level $\alpha$ critical points of $t$ using the upper bound (1). Some exact critical points
are obtained by Worsley (1979) using extensions of (1).
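The statistic $t$ can be computed directly from its definition. The sketch below assumes that "the usual $t$ statistic" means the pooled-variance two-sample statistic on $n - 2$ degrees of freedom; the function name and the data are made up for illustration.

```python
import math


def shift_statistic(y):
    """Return (t, i_hat), where t = max |t_i| over i = 1, ..., n - 1.

    t_i is the pooled two-sample t statistic comparing y[:i] with y[i:].
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    best, best_i = -1.0, None
    for i in range(1, n):
        first, last = y[:i], y[i:]
        mean_first = sum(first) / i
        mean_last = sum(last) / (n - i)
        # Pooled within-group sum of squares on n - 2 degrees of freedom.
        ss = (sum((v - mean_first) ** 2 for v in first)
              + sum((v - mean_last) ** 2 for v in last))
        s2 = ss / (n - 2)
        t_i = (mean_first - mean_last) / math.sqrt(s2 * (1.0 / i + 1.0 / (n - i)))
        if abs(t_i) > best:
            best, best_i = abs(t_i), i
    return best, best_i


# Hypothetical data with a shift in mean after the fourth observation.
y = [0.1, -0.3, 0.2, 0.0, 2.1, 1.8, 2.4, 1.9]
t, i_hat = shift_statistic(y)
print(t, i_hat)
```

The value of `t` would then be compared with a critical point $c$ obtained from one of the bounds on $\operatorname{pr}(t > c)$.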
prove that the maximum spanning tree is $T = \{e_{12}, e_{23}, \dots, e_{n-2,n-1}\}$. For if we add any
edge $e_{ij}$ $(i + 1 < j)$ that is not in $T$ to $T$ and remove an edge $e_{k,k+1}$ for some $i \le k < j$ then
the resulting tree has less weight since $\rho_{ij} < \rho_{k,k+1}$. Hence (4) is the least upper bound.
These bounds are compared at some exact critical points for $n = 10$ observations in
Table 1(b). It can be seen that the upper bound (4) is closer to the true level $\alpha$ than all
other upper or lower bounds.
3.3. Kolmogorov-Smirnov tests
Let $D$ denote the Kolmogorov-Smirnov type one-sample statistic to test goodness of
fit of a random variable $X$ in the presence of unknown nuisance parameters $\theta$. Durbin
(1975) obtained the distribution of $D$ in terms of a Fourier transform, whereas Margolin
& Maurer (1976) used (1) and generalizations of inequalities given by Kounias. Let
$F(X_{(i)}, \hat\theta)$ be the distribution function of $X$ evaluated at the $i$th order statistic $X_{(i)}$ of a
sample of size $n$, with $\theta$ estimated by the maximum likelihood estimator $\hat\theta$, and let
The maximum spanning tree is $T = \{e_{12}, e_{23}, \dots, e_{67}\}$, so that the least upper bound,
given by (4), is $\operatorname{pr}(D > 0.309) \le 0.2057$. The true value lies between 0.2030 and 0(202(15, and
the upper and lower bounds given by (1) are 0.2562 and 0.1963. The bound (4) is more
accurate than all other upper and lower bounds calculated from probabilities
$\operatorname{pr}(A_i \cap A_j)$ by Margolin & Maurer.
3.4. Outlier detection
The statistic commonly used to detect an outlier from a linear regression with normal
errors is the maximum absolute studentized least-squares residual. If the residuals
$e_1, \dots, e_n$ all have the same variance then this is equivalent to the statistic
$Z = \max(|Z_i|)$ where $Z_i = e_i/(\sum e_j^2)^{1/2}$ is the $i$th normed residual. Stefansky uses $Z$ to test
for an outlier from a two-way factorial design. Let $A_i$ be the event that
$|Z_i| > z$ $(i = 1, \dots, n)$ so that $\operatorname{pr}(Z > z) = \operatorname{pr}(\cup A_i)$. Stefansky finds bounds for level $\alpha$
critical points of $Z$ by equating the upper and lower bounds (1) to $\alpha$ and solving for $z$.
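A short sketch of the statistic $Z$, taking the normed residual to be each residual divided by the square root of the sum of squares of all residuals. The function names and the residual values are hypothetical.

```python
import math


def normed_residuals(e):
    """Scale the residuals so that their squares sum to one."""
    norm = math.sqrt(sum(v * v for v in e))
    return [v / norm for v in e]


def outlier_statistic(e):
    """Z: the largest absolute normed residual."""
    return max(abs(z_i) for z_i in normed_residuals(e))


# Hypothetical least-squares residuals (they sum to zero).
e = [0.5, -1.0, 0.25, 3.0, -2.75]
z_values = normed_residuals(e)
Z = outlier_statistic(e)
print(Z)
```

Because the squared normed residuals sum to one, $Z$ always lies between $n^{-1/2}$ and 1.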
Let $\rho_{ij}$ be the correlation between $e_i$ and $e_j$. Then it can be shown that $\operatorname{pr}(A_i \cap A_j)$ is a
4. CONCLUSION
The inequality (2) is sharpest when the events $A_i$ and $A_j$ have high positive
dependence. If $A_1, \dots, A_n$ are independent then the lower bound (1) is always closer to the
true value than the upper bound (2) whenever (2) is less than unity. For outlier
detection, (2) does not perform well because residuals are almost independent. However
if the sequence $A_1, \dots, A_n$ is such that $\operatorname{pr}(A_i \cap A_j) \le \operatorname{pr}(A_k \cap A_{k+1})$ whenever $i \le k < j$,
then the least upper bound is always (4), and this bound appears to be very close to the
true value.
REFERENCES
CHEW, V. (1968). Simultaneous prediction intervals. Technometrics 10, 323-31.
DAVID, H. A. (1956). On the application to statistics of an elementary theorem in probability. Biometrika 43, 85-91.
DAVID, H. A. & NEWELL, D. J. (1965). The identification of peak periods for a disease. Biometrics 21, 645-50.
DURBIN, J. (1975). Kolmogorov-Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika 62, 5-22.
ELLENBERG, J. H. (1976). Testing for a single outlier from a general linear regression. Biometrics 32, 637-45.
GALPIN, J. S. & HAWKINS, D. M. (1981). Rejection of a single outlier in two- or three-way layouts. Technometrics 23, 65-70.
HAWKINS, D. M. (1977). Testing a sequence of observations for a shift in location. J. Am. Statist. Assoc. 72, 180-6.
JOSHI, P. C. (1972). Some slippage tests of mean for a single outlier in linear regression. Biometrika 59, 109-20.
KOUNIAS, E. G. (1968). Bounds for the probability of a union, with applications. Ann. Math. Statist. 39, 2154-8.
KRUSKAL, J. B. (1956). On the shortest spanning subtree of a graph and the travelling salesman problem. Proc. Am. Math. Soc. 7:1, 48-50.
KWEREL, S. M. (1975). Most stringent bounds on aggregated probabilities of partially specified dependent probability systems. J. Am. Statist. Assoc. 70, 472-9.
McDONALD, B. J. & THOMPSON, W. A. (1967). Rank sum multiple comparisons in one- and two-way classifications. Biometrika 54, 487-97.
MARGOLIN, B. J. & MAURER, W. (1976). Tests of the Kolmogorov-Smirnov type for exponential data with unknown scale, and related problems. Biometrika 63, 149-60.
SEN, A. & SRIVASTAVA, M. S. (1975). On tests for detecting change in mean. Ann. Statist. 3, 98-108.
ŠIDÁK, Z. (1968). On multivariate normal probabilities of rectangles: Their dependence on correlations. Ann. Math. Statist. 39, 1425-34.