Bootstrapping the general linear hypothesis test

Pedro Delicado

Departamento de Estadística y Econometría, Universidad Carlos III de Madrid, 28903 Getafe (Madrid), España.
Introduction.
The use of bootstrap methodology in hypothesis testing, essentially based on approximating the critical values by bootstrap distributions, has received less attention than its application in other problems such as, for instance, the construction of confidence regions. Here are some recent references. Beran (1988) studies the asymptotic error in level of bootstrap tests and the improvement given by prepivoting the test statistic. In Hinkley (1988) some general ideas about bootstrap testing are briefly discussed. Romano (1988) uses the bootstrap to approximate critical values of nonparametric tests based on measures defined on the empirical distribution. In Boos and Brownie (1989)
and Zhang and Boos (1992) the use of bootstrap methods for testing homogeneity of variances and of covariance matrices, respectively, is considered; good (asymptotic and simulated) results in level and power are obtained. The bootstrap approximation, resampling U- and V-statistics, is also used in Boos, Janssen and Veraverbeke (1989) for testing homogeneity of scale in the two-sample problem. Hall and Wilson (1991) and Hall (1992) insist on two guidelines that we analyze in Section 2: use of pivotal functions and resampling reflecting the null hypothesis. In Mammen (1992) the convergence of the bootstrapped F-test in linear models is proved.
This paper is concerned with the use of the bootstrap idea in hypothesis testing. Section 3 presents our work for testing a general linear hypothesis in a linear model. Following the second guideline cited above, we propose a natural modification of the classical F-statistic that permits resampling under the null hypothesis, even though the data fail to comply with it. In Section 4, this proposal and the basic idea of resampling taking into account the model under consideration are analyzed in some detail in the framework of the one-way analysis of variance. In Section 5, the corresponding behaviour is illustrated by reporting the results of a simulation study.
Naturally, the duality between hypothesis testing and confidence regions is maintained under bootstrap methodology. This leads us to consider the first guideline cited above: the use of (asymptotically) pivotal quantities, which could be extended to the use of methods with good behaviour in the problem of confidence interval construction. In general, acting in this way will improve the test level.
However, there is an important conceptual difference when both problems are considered under the bootstrap approach. In hypothesis testing accurate estimates of the critical values are needed, even if the data had their origin in the alternative hypothesis. This demand could invalidate some direct approximations to the distribution of interest. Consider, as in Hall and Wilson (1991), a one-dimensional parameter situation and the simple testing problem H0 : θ = θ0 against H1 : θ ≠ θ0. Given that T = T(X1, . . . , Xn) is a good estimator of the unknown θ, a reasonable test could be based on the difference T − θ0. If our goal is in relation to its level or p-value, our interest lies in the distribution of T − θ0 under H0. A direct and naïve application of the bootstrap method would lead to estimating this distribution using the statistic T* − θ0, where T* is the value of T in the resampling, that is, under the empirical distribution of (X1, . . . , Xn). Hall (1992, Sec. 3.12) shows the bad behaviour of the power function of the corresponding test. The trouble here is that T* − θ0 does not approximate the null distribution when the sample comes from a parameter far away from θ0.
Hall and Wilson (1991) propose to estimate the distribution of T − θ0 by means of the statistic T* − T. Note that this corresponds to resampling the quantity T0 = T − θ instead of T − θ0. In the next section we extend this idea to testing a general linear hypothesis in a linear model.
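As a concrete illustration of this guideline, here is a minimal sketch in Python (the function and variable names are ours, not the paper's) of a one-sample bootstrap test in which the null distribution of T − θ0 is approximated by resampling T* − T rather than T* − θ0:

```python
import numpy as np

def bootstrap_pvalue(x, theta0, B=2000, seed=0):
    """Two-sided bootstrap test of H0: theta = theta0, with T the sample mean.

    Following the Hall-Wilson guideline, the null distribution of T - theta0
    is approximated by the bootstrap distribution of T* - T, which remains
    centred correctly even when the data come from the alternative."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    T = x.mean()
    # Bootstrap replicates of T* - T (NOT of T* - theta0).
    boot = np.array([rng.choice(x, x.size, replace=True).mean() - T
                     for _ in range(B)])
    observed = T - theta0
    # Proportion of bootstrap replicates at least as extreme as the observation.
    return float(np.mean(np.abs(boot) >= np.abs(observed)))
```

Had we resampled T* − θ0 instead, samples drawn from a parameter far from θ0 would push the bootstrap distribution away from the null one, with the loss of power discussed above.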
It will be convenient to consider the coordinate-free version of the linear model. Let Y = μ + e, where Y denotes the n-dimensional vector of observations, μ is the vector of means belonging to the subspace V ⊂ IR^n with dimension p < n, and e is the vector of independent errors ei ∼ F such that E(e) = 0. We want to test H0 : μ ∈ V0 versus H1 : μ ∉ V0, V0 ⊂ V being a subspace with dimension p0 < p. The usual F-statistic can be written as

    T = T(Y) = [‖PV|V0 Y‖² / (p − p0)] / [‖PV⊥ Y‖² / (n − p)],      (3.1)

where PW denotes the orthogonal projection onto a subspace W, V⊥ is the orthogonal complement of V, and V|V0 is the orthogonal complement of V0 within V. The modified statistic we propose is

    T0 = T0(Y, μ) = [‖PV|V0 (Y − μ)‖² / (p − p0)] / [‖PV⊥ (Y − μ)‖² / (n − p)].      (3.2)
Comments:
i) Since it is based on Y − μ, T0 does not rely on the hypothesis under which the data are obtained; in other words, it is invariant with respect to the hypothesis generating the data. Besides, T0 agrees with T under H0.
ii) A direct reasoning leading to T0 is the following. Since we attempt to approximate the null distribution of T, let us take an arbitrary μ0 ∈ V0 and transform Y into Y0 = Y − μ + μ0. The vector Y0 verifies H0 and we are naturally led to

    T(Y0) = [‖PV|V0 Y0‖² / (p − p0)] / [‖PV⊥ Y0‖² / (n − p)] = [‖PV|V0 (Y − μ)‖² / (p − p0)] / [‖PV⊥ (Y − μ)‖² / (n − p)] = T0(Y, μ).

Note that the above expression is independent of the initial vector μ0 ∈ V0.
Therefore, the bootstrap distribution of T0(Y*, μ̂) is the null bootstrap distribution of T; consequently, the critical value t*α should be chosen to verify

    P*( T0* = T0(Y*, μ̂) ≤ t*α | Y ) = 1 − α.      (3.3)
We see that the modification of the F-statistic arises in a natural and direct way. In some sense, this presentation completes the one given in Mammen (1992). Moreover, his results ensure the convergence of our proposal.
We now consider the previous proposal in the particular case of testing equality of
means in the one-way model.
4.1
We want to test H0 : μ1 = · · · = μp in the one-way model Yij = μi + eij, j = 1, . . . , ni, i = 1, . . . , p, with n = Σ_{i=1}^p ni. Here V0 and V are the spaces spanned, respectively, by the vector 1n = (1, . . . , 1)′ and by the p vectors in IR^n: (1′n1, 0′)′, (0′, 1′n2, 0′)′, . . . , (0′, 1′np)′. It is well known that

    T(Y) = [Σi ni (Ȳi − Ȳ)² / (p − 1)] / [Σi Σj (Yij − Ȳi)² / (n − p)].      (4.1)
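For readers who wish to experiment numerically, the statistic (4.1) can be computed with a few lines of Python (a sketch with names of our own choosing):

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F-statistic T(Y) of (4.1): between-group mean square
    over within-group mean square, for p groups and n total observations."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_i = np.array([g.size for g in groups])
    p, n = len(groups), int(n_i.sum())
    means = np.array([g.mean() for g in groups])
    grand = np.concatenate(groups).mean()   # weighted grand mean Ybar
    between = np.sum(n_i * (means - grand) ** 2) / (p - 1)
    within = sum(((g - m) ** 2).sum() for g, m in zip(groups, means)) / (n - p)
    return between / within
```

For two groups (1, 2, 3) and (2, 3, 4) the between mean square is 1.5 and the within mean square is 1.0, so the statistic equals 1.5; shifting all observations by a common constant leaves it unchanged.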
The modified statistic (3.2) becomes

    T0(Y, μ) = [Σi ni (X̄i − X̄)² / (p − 1)] / [Σi Σj (Xij − X̄i)² / (n − p)] = [Σi ni (ēi − ē)² / (p − 1)] / [Σi Σj (eij − ēi)² / (n − p)],      (4.2)

where Xij = Yij − μi = eij, X̄i = Σj Xij / ni and X̄ = Σi Σj Xij / n.

4.2 Resampling schemes.
The bootstrap version of (4.2) is

    T0* = [Σi ni (X̄i* − X̄*)² / (p − 1)] / [Σi Σj (Xij* − X̄i*)² / (n − p)] = [Σi ni (ēi* − ē*)² / (p − 1)] / [Σi Σj (eij* − ēi*)² / (n − p)],      (4.3)

where Xij* = Yij* − Ȳi = eij* + Ȳi − Ȳi = eij* ∼ F̂n.
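This resampling scheme can be sketched in Python as follows (the names are ours; the helper recomputes (4.1) on each bootstrap sample of centred residuals, yielding the critical value of (3.3)):

```python
import numpy as np

def _f_stat(groups):
    """F-statistic (4.1) on a list of one-dimensional samples."""
    n_i = np.array([g.size for g in groups])
    p, n = len(groups), int(n_i.sum())
    means = np.array([g.mean() for g in groups])
    grand = np.concatenate(groups).mean()
    between = np.sum(n_i * (means - grand) ** 2) / (p - 1)
    within = sum(((g - m) ** 2).sum() for g, m in zip(groups, means)) / (n - p)
    return between / within

def bootstrap_critical_value(groups, alpha=0.05, B=1000, seed=0):
    """Bootstrap critical value t*_alpha of (3.3): resample the pooled centred
    residuals e_ij = Y_ij - Ybar_i and evaluate T0* of (4.3) B times."""
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    resid = np.concatenate([g - g.mean() for g in groups])
    cuts = np.cumsum([g.size for g in groups])[:-1]
    stats = []
    for _ in range(B):
        e_star = rng.choice(resid, resid.size, replace=True)
        # T0* is T evaluated on the resampled residual groups X*_ij = e*_ij.
        stats.append(_f_stat(np.split(e_star, cuts)))
    return float(np.quantile(stats, 1 - alpha))
```

For three normal samples of size 10 the bootstrap critical value should concentrate around the exact value F2,27,.05 ≈ 3.35, in line with the simulations reported below.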
In practice, the simulation of the T0* distribution will be done by generating B independent samples from the empirical distribution F̂n of the n residuals êij = Yij − Ȳi, that is, considering
4.3
The idea of reflecting the model hypothesis in the bootstrap scheme leads us to consider the resampling in a population of residuals where the empirical variances in each subpopulation are equal; without loss of generality we take this common value equal to 1. This leads us to replace (4.3) by

    TN* = [Σi ni (ēNi* − ēN*)² / (p − 1)] / [Σi Σj (eNij* − ēNi*)² / (n − p)],      (4.4)

where the resampling is now from the normalized residuals êNij = (Yij − Ȳi)/σ̂i, with σ̂i² = Σj (Yij − Ȳi)² / ni.
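A short Python sketch of this normalization step (names ours): each subpopulation of residuals is scaled to empirical variance 1 before pooling.

```python
import numpy as np

def normalized_residuals(groups):
    """Pooled normalized residuals e_Nij = (Y_ij - Ybar_i) / sigma_hat_i,
    with sigma_hat_i^2 = sum_j (Y_ij - Ybar_i)^2 / n_i, so that every
    subpopulation contributes residuals with empirical variance 1."""
    out = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        e = g - g.mean()                    # centred residuals of group i
        sigma_i = np.sqrt(np.mean(e ** 2))  # sigma_hat_i (divisor n_i)
        out.append(e / sigma_i)
    return np.concatenate(out)
```

The resampling schemes then proceed exactly as before, but drawing from this pooled normalized population.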
Note that the statistic TN* can also be obtained in the following way. Assume initially heteroscedasticity and consider, as an alternative to (4.2),

    T0N(Y, μ, σ1, . . . , σp) = [Σi ni (W̄i − W̄)² / (p − 1)] / [Σi Σj (Wij − W̄i)² / (n − p)],

where Wij = (Yij − μi)/σi, and σi² is the population variance of Pi. Under homoscedasticity, T0N becomes T0 and its distribution is the same as the distribution of T under H0. It follows immediately that the bootstrap distribution of T0N* coincides with the distribution of TN*. Heuristically, we should expect that this statistic will improve the approximation to the null distribution.
The simulation of TN* will be done as in Subsection 4.2, but starting with the normalized residuals êNij. Both schemes cited above remain valid using the normalized residuals. In the next section, we will denote them resamplings R3 and R4.
Remark: The analysis and comparison of different residual resampling schemes adapted to the model assumptions is frequent in the bootstrap literature on linear models. The first suggestion already appears in Efron (1979); in fact, our scheme R1 adapts his suggestion to the one-way model. Close to the resamplings presented above are those considered in Boos and Brownie (1989) and Boos et al. (1989). An extensive discussion of residual resampling in linear regression can be found in Wu (1986) (see also Liu (1988)). Likewise, residual resampling for bootstrapping generalized linear models is considered in Moulton and Zeger (1991).
A simulation study.
(Table 1 about here: empirical levels for nominal α = .05 and α = .1, error distributions G1: N(0, 1) and G2: E(1), resamplings R1, R2, R3, R4, and population sizes (10, 10, 10), (5, 10, 15), (20, 20, 20), (10, 20, 30).)
5.1
The levels obtained with resampling R2 are slightly better than those with R1, and both are dominated by the results using R4.
5.2
We have also verified the good approximation to the true critical values and the high empirical power provided by the bootstrap test when the data come from populations with unequal means. Our first illustration uses some of the combinations considered in Section 5.1, now with μ1 = −1, μ2 = 0, μ3 = 1.
(Insert Figure 1 about here)
Figure 1 depicts box-plots (with whiskers ending at the extreme values) of 1000 bootstrap estimated critical values, t*i(α), obtained from (3.3). The errors belong to group G1 with N(0, 1) and to G2 with shifted exponential, λ = 1. (The true value is F2,27,.05 = 3.35 in G1; the true values in G2 were obtained by simulation using 5000 trials.) Note the concentration around the true critical values, and the similar results given by R1 and R3.
We also consider the general alternative HA : μ1 = −λ, μ2 = 0, μ3 = λ, λ ∈ (0, 1], for the above 28 combinations. We obtained similar results for α = .05 and α = .1, and we will report those corresponding to α = .05.
Table 2 gives the proportion of trials rejecting the equality of means, that is, the power, under the above alternatives with λ = .1, .4, .7, 1 for those combinations. In parentheses are estimates of the power obtained using the correct critical values. From now on the true values will be estimated by simulation using 5000 trials. It is worth noting the high values always attained for λ = 1, and for λ = .7 when n = 60.
For the general alternative, we have the following comments. In groups G1 and G2, the approximation to the true power provided by resamplings R1 and R3 is good for both sizes. In G3 with equal population sizes ((10, 10, 10), (20, 20, 20)), resamplings R1 and R4 gave analogous and excellent approximations to the true power; resampling R2 behaved worse, especially when n = 30.
(Table 2 about here: empirical power at α = .05 under HA for λ = .1, .4, .7, 1, error distributions G1: N(0, 1) and G2: E(1), resamplings R1, R2, R3, R4, and the population sizes of Table 1; simulated true powers (S) in parentheses.)
The situation with G3 and different population sizes is illustrated in Figure 2, which shows, as a function of λ ∈ [0, 1], λ = 0(.1)1, the true (simulated) power of the test with level α = .05 and the power curves for resamplings R2 and R4. R1 was not included for the sake of clarity; its curve was always above the R4 curve, its approximation was in general worse than the one corresponding to R4, and it practically agreed with R4 for λ ≈ 1. With respect to Figure 2 we note that the true curve lies nearly always between the R2 and R4 curves. With n = 30, the true power is better adjusted by R2 for alternatives close to H0 (approximately λ ≤ .4) and by R4 in the opposite situation. With n = 60, extremely good approximations are obtained and the differences between R2 and R4 vanish.
(Insert Figure 2 about here)
5.3
Although our development does not deal with heteroscedastic situations, we planned to check how our proposal would perform under unequal variances. With this aim, we considered the groups G1 and G3. In G1 the normal distributions had standard deviations σi = i, i = 1, 2, 3; in G3 we included standard normal, shifted exponential with λ = 2, and t3 (σ² = 3) distributions. The total sizes (n = 30, 60) were assigned to the populations in three ways: balanced, higher sizes to higher variances, and vice versa. For each assignation, resamplings R1 and R2 were considered. Both groups and sizes provided similar results.
As a summary of the empirical conclusions about the level approximation we have: resampling R1 was quite inappropriate with unbalanced sizes, and R1 was beaten by R2 in all the situations; therefore, if we suspect possible heteroscedasticity, resampling from different populations will provide more accurate levels. The best results occur when there is correspondence between sizes and variability in the populations.
With respect to the power, the above behaviour is maintained and we had good approximations to the true (simulated) power even with total size n = 30. This is illustrated in Figure 3, which shows, for G1 and n = 30, the power curves for R1 and R2 (and two population sizes: (5, 10, 15) and (15, 10, 5)) as a function of λ ∈ [0, 1], λ = 0(.1)1. Note that λ = 0 corresponds to the level approximation. The low true power is a
5.4
Finally, we will show the inaccuracy of the naïve bootstrap directly based on the F-statistic of (4.1), that is, when the resampling uses

    TD* = [Σi ni (Ȳi* − Ȳ*)² / (p − 1)] / [Σi Σj (Yij* − Ȳi*)² / (n − p)],

where Yij* = eij* + Ȳi, and the eij* follow one of the empirical distributions presented in Subsections 4.2 and 4.3. We will refer to this procedure as the direct bootstrap.
We began by conducting 1000 trials for the situations and combinations considered in Section 5.1. The estimated levels fell far below the nominal levels; in fact, only 5 of the 56 cases considered were different from zero, with a maximum value of .004. We also studied the distribution of the direct critical values, tDi(α), i = 1, . . . , 1000, α = .05, .1, corresponding to 4 combinations: groups G1 and G2 with resampling R1, and G3 with R1 and R3. The population sizes were ni = 10. This study was done with data from H0 and from the alternative HA : μ1 = −1, μ2 = 0, μ3 = 1.
The distributions of the direct critical values were very similar in the four combinations. Figure 4 shows box-plots summarizing the distributions under H0 and HA, for α = .05 and n = 30, in the combination G1 and R1. For comparison, we have also included box-plots for the distributions of the critical values, t*i(α), derived from our proposal.
(Insert Figure 4 about here)
Two points clearly stand out. Whether the data came from H0 or from HA, the distribution of the critical values tDi(α) is useless for approximating the true critical value (F2,27,.05 = 3.35). On the contrary, both from H0 and HA, the distributions of the proposed critical values t*i(α) are extremely similar and very concentrated around the true critical value.
Acknowledgement
We thank the Co-Editor and two referees for their helpful comments and suggestions.
References
Beran, R., Prepivoting test statistics: A bootstrap view of asymptotic refinements, J. Amer. Statist. Assoc., 83 (1988) 687–697.
Boos, D. D. and Brownie, C., Bootstrap methods for testing homogeneity of variances, Technometrics, 31 (1989) 69–82.
Boos, D. D., Janssen, P. and Veraverbeke, N., Resampling from centered data in the two-sample problem, J. Statist. Plann. Inference, 21 (1989) 327–345.
Efron, B., Bootstrap methods: another look at the jackknife, Ann. Statist., 7 (1979) 1–26.
Hall, P., The Bootstrap and Edgeworth Expansion (Springer-Verlag, New York, 1992).
Hall, P. and Wilson, S. R., Two guidelines for bootstrap hypothesis testing, Biometrics, 47 (1991) 757–762.
Hinkley, D. V., Bootstrap methods, J. Roy. Statist. Soc. Ser. B, 50 (1988) 321–337.
Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 16 (1988) 1696–1708.
Mammen, E., When Does Bootstrap Work? (Lecture Notes in Statistics, Vol. 77, Springer-Verlag, New York, 1992).
Moulton, L. H. and Zeger, S. L., Bootstrapping generalized linear models, Comput. Statist. Data Anal., 11 (1991) 53–63.
Romano, J. P., A bootstrap revival of some nonparametric distance tests, J. Amer. Statist. Assoc., 83 (1988) 698–708.
Wu, C. F. J., Jackknife, bootstrap and other resampling methods in regression analysis, Ann. Statist., 14 (1986) 1261–1295.
Zhang, J. and Boos, D. D., Bootstrap critical values for testing homogeneity of covariance matrices, J. Amer. Statist. Assoc., 87 (1992) 425–429.
(Figure 4 panels: data from H0 and data from HA; modified statistic and direct statistic.)