
Better Bootstrap Confidence Intervals

Author(s): Bradley Efron


Source: Journal of the American Statistical Association, Mar. 1987, Vol. 82, No. 397, pp. 171-185
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association

Stable URL: https://www.jstor.org/stable/2289144


Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to
digitize, preserve and extend access to Journal of the American Statistical Association

This content downloaded from


80.140.122.82 on Wed, 29 Mar 2023 15:06:18 UTC
All use subject to https://about.jstor.org/terms
Better Bootstrap Confidence Intervals
BRADLEY EFRON*

We consider the problem of setting approximate confidence intervals for a single parameter θ in a multiparameter family. The standard approximate intervals based on maximum likelihood theory, θ̂ ± σ̂z^(α), can be quite misleading. In practice, tricks based on transformations, bias corrections, and so forth, are often used to improve their accuracy. The bootstrap confidence intervals discussed in this article automatically incorporate such tricks without requiring the statistician to think them through for each new application, at the price of a considerable increase in computational effort. The new intervals incorporate an improvement over previously suggested methods, which results in second-order correctness in a wide variety of problems. In addition to parametric families, bootstrap intervals are also developed for nonparametric situations.

KEY WORDS: Resampling methods; Approximate confidence intervals; Transformations; Nonparametric intervals; Second-order theory; Skewness corrections.

1. INTRODUCTION

This article concerns setting approximate confidence intervals for a real-valued parameter θ in a multiparameter family. The nonparametric case, where the number of nuisance parameters is infinite, is also considered. The word "approximate" is important, because in only a few special situations can exact confidence intervals be constructed. Table 1 shows one such situation: the data (y₁, y₂) are bivariate normal with unknown mean vector (η₁, η₂) and covariance matrix the identity; the parameters of interest are θ = η₂/η₁ and, in addition, φ = 1/θ. Fieller's construction (1954) gives a central 90% interval (5% error in each tail) of [.29, .76] for θ, having observed y = (8, 4). The corresponding interval for φ = 1/θ is the obvious mapping φ ∈ [1/.76, 1/.29].

Table 1 also shows the standard approximate intervals

θ ∈ [θ̂ + σ̂z^(α), θ̂ + σ̂z^(1−α)],  (1.1)

where θ̂ is the maximum likelihood estimate (MLE) of θ, σ̂ is an estimate of its standard deviation, often based on derivatives of the log-likelihood function, and z^(α) is the 100·α percentile point of a standard normal variate. In Table 1, α = .05 and z^(α) = −z^(1−α) = −1.645.

The standard intervals (1.1) are extremely useful in statistical practice because they can be applied in an automatic way to almost any parametric situation. However, they can be far from perfect, as the results for φ show. Not only is the standard interval for φ quite different from the exact interval, it is not even the obvious transformation [1/.73, 1/.27] of the standard interval for θ.

Approximate confidence intervals based on bootstrap computations were introduced by Efron (1981, 1982a). Like the standard intervals, these can be applied automatically to almost any situation, though at greater computational expense than (1.1). Unlike (1.1), the bootstrap intervals transform correctly, so the interval for φ = 1/θ in the Fieller example is obtained by inverting the endpoints of the interval for θ. They also tend to be more accurate than the standard intervals. In the situation of Table 1 the bootstrap intervals agree with the exact intervals to three decimal places. Efron (1985) showed that this is no accident; there is a wide class of problems for which the bootstrap intervals are an order of magnitude more accurate than the standard intervals.

In those problems where exact confidence limits exist, the endpoints are typically of the form

θ̂ + σ̂(z^(α) + A^(α)/√n + B^(α)/n + ···),  (1.2)

where n is the sample size (see Efron 1985). The standard intervals (1.1) are first-order correct in the sense that the term θ̂ + σ̂z^(α) asymptotically dominates (1.2). However, the second-order term σ̂A^(α)/√n can have a major effect in small-sample situations. It is this term that causes the asymmetry of the exact intervals about the MLE as illustrated in Table 1. As a point of comparison, the Student-t effect is of third-order magnitude, comparable with σ̂B^(α)/n in (1.2). The bootstrap method described in Efron (1985) was shown to be second-order correct in a certain class of problems, automatically producing intervals of the correct second-order asymptotic form θ̂ + σ̂(z^(α) + A^(α)/√n + ···).

This article describes an improved bootstrap method that is second-order correct in a wider class of problems. This wider class includes all of the familiar parametric examples where there are no nuisance parameters and where the data have been reduced to a one-dimensional summary statistic, with asymptotic properties of the usual MLE form (see Sec. 5).

Several authors have developed higher-order correct approximate confidence intervals based on Edgeworth expansions (Abramovitch and Singh 1985; Beran 1984a,b; Hall 1983; Withers 1983), sometimes using bootstrap methods to reduce the theoretical computations. There is a close theoretical relationship between this line of work and the current article (see, e.g., Remark G, Sec. 11). However, the details of the various methods are considerably different, and they can give quite different numerical results. An important point, which will probably have to be settled by extensive simulations, is which method, if any, handles best the practical problems of day-to-day applied statistics.

2. OVERVIEW

The standard interval (1.1) is based on taking literally the asymptotic normal approximation

(θ̂ − θ)/σ̂ ∼ N(0, 1),  (2.1)

* Bradley Efron is Professor of Statistics and Biostatistics, Department of Statistics, Stanford University, Stanford, CA 94305. The author is grateful to Robert Tibshirani, Timothy Hesterberg, and John Tukey for several useful discussions, suggestions, and references.

© 1987 American Statistical Association
Journal of the American Statistical Association
March 1987, Vol. 82, No. 397, Theory and Methods
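The failure of (1.1) to transform correctly is easy to check numerically. The sketch below (Python) reproduces the standard intervals of Table 1 for the Fieller data; the delta-method standard errors are an assumption of ours, since the paper does not spell out how σ̂ was computed, but they recover the tabled values.

```python
import math

# Standard intervals (1.1) for the Fieller example of Table 1.
# Data y ~ N2(eta, I), observed y = (8, 4); theta = eta2/eta1, phi = 1/theta.
y1, y2 = 8.0, 4.0
z = 1.645                                     # z^(0.95), alpha = .05

# theta-hat = y2/y1, delta-method standard error (gradient of y2/y1, unit covariance)
theta_hat = y2 / y1
se_theta = math.hypot(-y2 / y1**2, 1.0 / y1)
theta_int = (theta_hat - z * se_theta, theta_hat + z * se_theta)

# phi-hat = y1/y2, with its own delta-method standard error
phi_hat = y1 / y2
se_phi = math.hypot(1.0 / y2, -y1 / y2**2)
phi_int = (phi_hat - z * se_phi, phi_hat + z * se_phi)

print([round(e, 2) for e in theta_int])       # [0.27, 0.73], as in Table 1
print([round(e, 2) for e in phi_int])         # [1.08, 2.92], as in Table 1
# Mapping theta's interval instead gives [1/.73, 1/.27] = [1.37, 3.70],
# quite different from [1.08, 2.92]: the standard intervals do not transform.
```

The mismatch between [1.08, 2.92] and [1.37, 3.70] is exactly the non-invariance discussed in Section 1.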



Table 1. Central 90% Confidence Intervals for θ = η₂/η₁ and φ = 1/θ, Having Observed (y₁, y₂) = (8, 4) From a Bivariate Normal Distribution y ∼ N₂(η, I)

                                     For θ (R/L)          For φ (R/L)
Exact interval (also bootstrap)      [.29, .76] (1.21)    [1.32, 3.50] (2.20)
Standard approximation (1.1)         [.27, .73] (1.00)    [1.08, 2.92] (1.00)
MLE                                  θ̂ = .5               φ̂ = 2

NOTE: The exact intervals are based on Fieller's construction. R/L is the ratio of the right side of the interval, measured from the MLE, to the left side. The exact intervals are markedly asymmetric. The approximate bootstrap intervals of Efron (1982a) agree with the exact intervals to three decimal places in this case.

with the estimated standard error σ̂ considered to be a fixed constant. In certain cases it is well known that both convergence to normality and constancy of σ can be dramatically improved by considering, instead of θ and θ̂, a monotone transformation φ = g(θ) and φ̂ = g(θ̂). The classic example is that of the correlation coefficient from a bivariate normal sample, for which Fisher's inverse hyperbolic tangent transformation works beautifully (see Efron 1982b).

The bias-corrected bootstrap intervals previously introduced by Efron (1981, 1982a), called the BC intervals, assume that normality and constant standard error can be achieved by some transformation φ = g(θ), φ̂ = g(θ̂), say

(φ̂ − φ)/τ ∼ N(−z₀, 1),  (2.2)

τ being the constant standard error of φ̂. Allowing the bias constant z₀ in (2.2) considerably improves the approximation in many cases, including that of the normal correlation coefficient. Taking (2.2) literally gives the obvious confidence interval (φ̂ + τz₀) ± τz^(α) for φ, which can then be converted back to a confidence interval for θ by the inverse transformation θ = g⁻¹(φ). The advantage of the BC method is that all of this is done automatically from bootstrap calculations, without requiring the statistician to know the correct transformation g.

The improved bootstrap method introduced in this article, called BCₐ, makes one further generalization on (2.2): it is assumed that for some monotone transformation g, some bias constant z₀, and some "acceleration constant" a, the transformation φ = g(θ), φ̂ = g(θ̂) results in

(φ̂ − φ) ∼ N(−z₀σ_φ, σ_φ²),  σ_φ = 1 + aφ.  (2.3)

Notice that (2.2) is the special case of (2.3), with a = 0. Given (2.3), it is not difficult to find the correct confidence interval for φ and then convert it back to an interval for θ by θ = g⁻¹(φ). The BCₐ method produces this interval for θ automatically, without requiring any knowledge of the transformation to form (2.3). This is the gist of Lemma 1 in Section 3.

The difference between (2.2) and (2.3) is greater than it seems. The hypothesized ideal transformation g leading to (2.2) must be both normalizing and variance stabilizing, whereas in (2.3) g need be only normalizing. Efron (1982b) shows that normalization and stabilization are partially antagonistic goals in familiar families such as the Poisson and the binomial. Schenker's counterexample to the BC method (1985), which helped motivate this article, is based on a family (discussed in Sec. 3) for which (2.2) fails. The main purpose of this article, to produce automatically intervals that are second-order correct, generally requires assumption (2.3) rather than (2.2).

It is not surprising that a theory based on (2.3) is usually more accurate than a theory based on (2.1). In fact, applied statisticians make frequent use of devices like those in (2.3), transformations, bias corrections, and even acceleration adjustments, to improve the performance of the standard intervals. The advantage of the BCₐ method is that it automates these improvements, so the statistician does not have to think them through anew for each new application.

The bootstrap was originally introduced as a nonparametric Monte Carlo device for estimating standard errors. The basic idea, however, can be applied to any statistical problem, including parametric ones, and does not necessarily require Monte Carlo simulations. We will begin our discussion of the BCₐ method by considering the simplest type of parametric problem: where the data consist only of a single real-valued statistic θ̂ in a one-parameter family of densities f_θ(θ̂), say θ̂ ∼ f_θ, and where we want a confidence interval for θ based on θ̂.

Sections 3, 4, and 5 describe the BCₐ method in this simple setting, show how to calculate it from bootstrap computations, and demonstrate that it gives second-order correct intervals for θ under reasonable conditions.

Of course there is no need for the bootstrap in the simple situation θ̂ ∼ f_θ, since then it is usually quite easy to calculate exact confidence intervals for θ. There are three reasons for beginning the discussion with the simple situation: (a) it makes clear the logic of the BCₐ method; (b) it makes possible the comparison of BCₐ intervals with exact intervals, exact intervals usually not existing in complicated problems; (c) it then turns out to be quite easy to extend the BCₐ method to complicated situations, where it is more likely to be needed.

The simple situation θ̂ ∼ f_θ can be made more complicated, and more realistic, in two ways: the data can consist of a vector y instead of a single summary statistic θ̂, and the parameter can be a vector η instead of a single unknown scalar θ. Section 6 considers multiparameter families f_η(y), where we wish to set an approximate confidence interval for a real-valued function θ = t(η).

Our approach is to reduce the problem back to the simple situation. The data vector y is replaced by an efficient estimator θ̂ of θ, perhaps the MLE, and the multiparameter family f_η is replaced by a least favorable one-parameter family. All of the calculations are handled automatically by the BCₐ algorithm. For a class of examples, including the Fieller problem of Table 1, the BCₐ method automatically produces second-order correct intervals, but a proof of general second-order correctness does not yet exist for multiparameter situations.

Section 7 returns to the original nonparametric setting of the bootstrap: the data y is assumed to be a random sample x₁, x₂, . . . , xₙ from a completely unknown probability distribution F. We wish to set an approximate confidence interval for θ = t(F), some real-valued function of F. The BCₐ method extends in a natural way to the


nonparametric setting. In the case where θ is the expectation, theoretical analysis shows the BCₐ intervals performing reasonably. Except for the case of the expectation, not much is proved about nonparametric BCₐ intervals, though the empirical results look promising. Section 8 develops a heuristic justification for the nonparametric BCₐ method in terms of the geometry of multinomial sampling.

In the simple situation θ̂ ∼ f_θ the parametric bootstrap distribution θ̂* ∼ f_θ̂ can often be written down explicitly, or at least approximated by standard parametric devices such as Edgeworth expansions. The number of bootstrap replications of θ̂*, to use the terminology of previous papers, is then, effectively, infinity. For more complicated situations like the nonparametric confidence interval problem, Monte Carlo sampling is usually needed to calculate the BCₐ intervals. How many bootstrap replications are necessary? The answer, on the order of 1,000, is derived in Section 9. This compares with only about 100 bootstrap replications necessary to adequately calculate a bootstrap standard error. Bootstrap confidence intervals require a lot more computation than bootstrap standard errors, if second-order accuracy is desired.

To get the main ideas across, some important technical points are deferred until Sections 10-12.

3. BOOTSTRAP CONFIDENCE INTERVALS FOR SIMPLE PARAMETRIC SITUATIONS

We first consider the simple situation θ̂ ∼ f_θ, where we have a one-parameter family of densities f_θ(θ̂) for the real-valued statistic θ̂. We wish to set a confidence interval for θ having observed only θ̂. The statistic θ̂ estimates θ. Later we will make specific assumptions about the properties of θ̂ as an estimator of θ, essentially that θ̂ behaves like the MLE asymptotically, though θ̂ may be some first-order efficient estimator other than the MLE.

By definition, the parametric bootstrap distribution in this simple situation is

θ̂* ∼ f_θ̂.  (3.1)

In other words, it is the distribution of the statistic of interest when the unknown parameter θ is set equal to the observed point estimate θ̂. We also need to define the cdf of the bootstrap distribution,

G(s) = ∫₋∞ˢ f_θ̂(θ̂*) dθ̂* = Pr_θ̂{θ̂* ≤ s}.  (3.2)

The integral is replaced by a summation in discrete families. The goal of bootstrap theory is to make inferential statements on the basis of the bootstrap distribution. In this article the inferences are approximate confidence intervals for θ.

Example (chi-squared scale family). Suppose that

θ̂ ∼ θ(χ²₁₉/19),  (3.3)

the example considered in Schenker (1985). Then

f_θ(θ̂) = c(θ̂/θ)^8.5 e^(−9.5 θ̂/θ)/θ for θ̂ > 0 [c = 9.5^9.5/Γ(9.5)].  (3.4)

Having observed θ̂, the bootstrap distribution θ̂* ∼ θ̂(χ²₁₉/19) has density c(θ̂*/θ̂)^8.5 e^(−9.5 θ̂*/θ̂)/θ̂ for θ̂* > 0. The bootstrap cdf is G(s) = I₉.₅(9.5s/θ̂), where I₉.₅ indicates the incomplete gamma function of degree 9.5.

Now suppose that for a family θ̂ ∼ f_θ there exists a monotone-increasing transformation g and constants z₀ and a such that

φ̂ = g(θ̂), φ = g(θ)  (3.5)

satisfy

φ̂ = φ + σ_φ(Z − z₀), Z ∼ N(0, 1),  (3.6)

with

σ_φ = 1 + aφ.  (3.7)

This is of form (2.3), with σ₀ = 1. [Eq. (2.3) can always be reduced to the case σ₀ = 1; see Remark A, Sec. 11.] We will assume that φ > −1/a if a > 0 in (3.7), so σ_φ > 0, and likewise φ < −1/a if a < 0. The constant a will typically be in the range |a| < .2, as will z₀.

Let Φ denote the standard normal cdf, and let G⁻¹(α) denote the 100·α percentile of the bootstrap cdf (3.2).

Lemma 1. Under conditions (3.5)-(3.7), the correct central confidence interval of level 1 − 2α for θ is

θ ∈ [G⁻¹(Φ(z[α])), G⁻¹(Φ(z[1 − α]))],  (3.8)

where

z[α] = z₀ + (z₀ + z^(α))/(1 − a(z₀ + z^(α)))  (3.9)

and likewise for z[1 − α].

The proof of Lemma 1, at the end of this section, makes clear that interval (3.8) is correct in a strong sense: it is equivalent, under assumptions (3.5)-(3.7), to the usual obvious interval for a simple translation problem. Given the bootstrap cdf G(s) and values of z₀ and a derived from bootstrap calculations as in the following sections, we can form interval (3.8), (3.9) for θ whether or not assumptions (3.5)-(3.7) apply. This by definition is the BCₐ interval for θ.

If z₀ and a equal 0, then z[α] = z^(α) and (3.8) becomes θ ∈ [G⁻¹(α), G⁻¹(1 − α)]. In this case we just use the obvious percentiles of the bootstrap distribution to form an approximate confidence interval for θ, an approach called the percentile method in Efron (1981, 1982a). In general z₀ and a do not equal zero, and formulas (3.8), (3.9) make adjustments to the percentile method that are necessary to achieve second-order correctness.

Continuing example (3.3), the theory of Efron (1982b) shows that for the chi-squared scale family we can find a transformation g very nearly satisfying (3.5)-(3.7). Schenker (1985) proved the same result by a different method. The constants

z₀ = .1082, a = .1077  (3.10)

and the transformation g appropriate to family (3.3) are derived in Section 10 and Remark E of Section 11. Simple ways of approximating z₀ and a for general families θ̂ ∼ f_θ are given in Section 4.


Line 2 of Table 2 shows the central 90% BCₐ interval, α = .05, for family (3.3), with G(s) = I₉.₅(9.5s/θ̂) and z₀ and a as in (3.10). The exact confidence interval is θ ∈ [19θ̂/χ²₁₉^(1−α), 19θ̂/χ²₁₉^(α)], where χ²₁₉^(α) is the 100·α percentile point of a χ²₁₉ distribution. Notice how closely the BCₐ endpoints match those of the exact interval (see line 1). The standard interval (1.1) is quite inaccurate in this case.

Table 2. Central 90% Confidence Intervals for θ Having Observed θ̂ ∼ θ(χ²₁₉/19)

                                            R/L
1. Exact                  [.631θ̂, 1.88θ̂]   2.38
2. BCₐ (a = .1077)        [.630θ̂, 1.88θ̂]   2.37
3. BC (a = 0)             [.580θ̂, 1.69θ̂]   1.64
4. Standard (1.1)         [.466θ̂, 1.53θ̂]   1.00
5. Nonparametric BCₐ      [.640θ̂, 1.68θ̂]   1.88

NOTE: The BCₐ interval, with a = .1077, the correct value, is nearly identical to the exact interval. The BC interval, a = 0, is only a partial improvement over the standard interval. The nonparametric BCₐ interval is discussed in Section 7.

Suppose that we set a = 0 in (3.9), so z[α] = 2z₀ + z^(α). Interval (3.8) with this definition of z[α] and z[1 − α] is called the BC interval, short for bias-corrected bootstrap interval, in Efron (1981, 1982a). In other words, BC = BCₐ with a = 0. The constant z₀ is easier to obtain than the constant a, as discussed in the next section, which is why the BC interval might be used. Line 3 of Table 2 shows that for family (3.3) the BC interval is a definite improvement over the standard interval but goes only about half as far as it should toward achieving the asymmetry of the exact interval.

The Fieller situation of Table 1 is an example of a class of multiparameter problems for which a = 0, so the BC and BCₐ intervals coincide. Efron (1985) showed that the BC intervals are second-order correct for this class, as discussed in Section 6. In general problems the full BCₐ method is necessary to get second-order correctness, as shown in Section 5.

Bartlett (1953) and Schenker (1985) discussed problem (3.3). The BCₐ method can be thought of as a computer-based way to carry out Bartlett's program of improved approximate confidence intervals without having to do his theoretical calculations.

Proof of Lemma 1. We begin by showing that the BCₐ interval for φ based on φ̂ is correct in a certain obvious sense. Notice that (3.6), (3.7) give

{1 + aφ̂} = {1 + aφ}{1 + a(Z − z₀)}.  (3.11)

Taking logarithms puts the problem into standard translation form,

ζ̂ = ζ + W,  (3.12)

ζ̂ = log{1 + aφ̂}, ζ = log{1 + aφ}, and W = log{1 + a(Z − z₀)}. This example was discussed more carefully in Sections 4 and 8 of Efron (1982b), where the possibility of the bracketed terms in (3.11) being negative was dealt with. Here it will cause no trouble to assume them positive, so that it is permissible to take logarithms. In fact the transformation to (3.12) is only for motivational purposes. A quicker but less informative proof of Lemma 1 is possible, working directly on the θ scale.

The translation problem (3.12) gives a natural central 1 − 2α interval for ζ having observed ζ̂,

ζ ∈ [ζ̂ − w^(1−α), ζ̂ − w^(α)],  (3.13)

where w^(α) is the 100·α percentile point for W, Pr{W ≤ w^(α)} = α.

We will use the notation θ[α] for the α-level endpoint of a confidence interval for a parameter θ. For instance, (3.13) says that ζ[α] = ζ̂ − w^(1−α), ζ[1 − α] = ζ̂ − w^(α). The interval (3.13) can be transformed back to the φ scale by the inverse mappings φ = (e^ζ − 1)/a, φ̂ = (e^ζ̂ − 1)/a, (Z − z₀) = (e^W − 1)/a. A little algebraic manipulation shows that the resulting interval for φ has α-level endpoint

φ[α] = φ̂ + σ_φ̂ (z₀ + z^(α))/(1 − a(z₀ + z^(α))).  (3.14)

The cdf of φ̂ according to (3.6) is Φ((s − φ)/σ_φ + z₀), so the bootstrap cdf of φ̂*, say H, is H(s) = Φ((s − φ̂)/σ_φ̂ + z₀). This has inverse H⁻¹(α) = φ̂ + σ_φ̂{Φ⁻¹(α) − z₀}, which shows that H⁻¹(Φ(z[α])) equals (3.14) [see definition (3.9)]. In other words, the BCₐ interval for φ, based on φ̂, coincides with the correct interval (3.14), "correct" meaning in agreement with the translation interval (3.13).

The BCₐ intervals transform in the obvious way: if φ = g(θ), φ̂ = g(θ̂), then the BCₐ interval endpoints satisfy φ[α] = g(θ[α]). This follows directly from (3.8), (3.9) and the relationship H(g(s)) = G(s), equivalently H⁻¹(α) = g(G⁻¹(α)), between the two bootstrap cdf's. Lemma 1 has now been verified: the transformations θ → φ → ζ and θ̂ → φ̂ → ζ̂ reduce the problem to translation form (3.12); the inverse transformations of the natural interval (3.13) for ζ produce the BCₐ interval (3.8), (3.9).

4. THE TWO CONSTANTS

The BCₐ intervals require the statistician to calculate the bootstrap distribution G and also the two constants z₀ and a. The bootstrap distribution is obtained directly from (3.2). This calculation does not require knowledge of the normalizing transformation g occurring in (3.5). The two constants z₀ and a can also be obtained, or at least approximated, directly from the bootstrap distribution f_θ̂(θ̂*). These calculations, which are the subject of this section, assume that a transformation g to form (3.6), (3.7) exists, but do not require g to be known.

In fact the bias-correction constant z₀ is

z₀ = Φ⁻¹(G(θ̂))  (4.1)

under assumptions (3.5)-(3.7), and so can be computed directly from G. To verify (4.1), notice that

Pr_φ{φ̂ < φ} = Pr{Z < z₀} = Φ(z₀)  (4.2)

according to (3.6). However, (3.5) gives

Pr_θ{θ̂ < θ} = Pr_φ{φ̂ < φ} = Φ(z₀)  (4.3)

for every value of θ. Substituting θ = θ̂ gives G(θ̂) = Pr_θ̂{θ̂* < θ̂} = Φ(z₀), which is (4.1).

What about the acceleration constant a? We will show that a good approximation for a is

a ≈ SKEW_(θ=θ̂)(ℓ̇_θ)/6,  (4.4)
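Both constants can be approximated numerically for family (3.3). In the sketch below (Python, ours), z₀ is estimated from (4.1) as the normal quantile of the proportion of bootstrap replications below θ̂ = 1, and a from (4.4) as one sixth the sample skewness of the score; the closed form 9.5(θ̂ − 1) for the score at θ = 1 follows by differentiating the log of density (3.4).

```python
import random
from statistics import NormalDist

# The two constants of Section 4 for theta-hat ~ theta * chi2_19/19, theta-hat = 1.
rng = random.Random(1)
B = 100_000
boot = [rng.gammavariate(9.5, 1 / 9.5) for _ in range(B)]

# (4.1): z0 = Phi^{-1}(G(theta-hat)), with G(1) estimated by the bootstrap
z0 = NormalDist().inv_cdf(sum(t < 1.0 for t in boot) / B)

# (4.4): a ~ SKEW(score)/6; score at theta = 1 is 9.5*(theta-hat - 1) from (3.4)
scores = [9.5 * (t - 1.0) for t in boot]
m = sum(scores) / B
mu2 = sum((s - m) ** 2 for s in scores) / B
mu3 = sum((s - m) ** 3 for s in scores) / B
a = mu3 / mu2 ** 1.5 / 6.0

print(round(z0, 3), round(a, 3))   # both near .108, cf. (3.10) and the example below (4.4)
```

Both estimates land near the values z₀ = .1082 and SKEW(ℓ̇_θ)/6 = .1081 quoted in the text, up to Monte Carlo error.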


where SKEW_(θ=θ̂)(X) indicates the skewness of a random variable X, μ₃(X)/μ₂(X)^(3/2), evaluated at parameter point θ equal to θ̂, and ℓ̇_θ is the score function of the family f_θ(θ̂),

ℓ̇_θ(θ̂) = (∂/∂θ) log f_θ(θ̂).  (4.5)

Formula (4.4) allows us to calculate a from the form of the given density f_θ near θ = θ̂, without knowing g. Sections 6 and 7 discuss the computation of a in families with nuisance parameters. Section 10 gives a deeper discussion of a and its relationship to other quantities of interest. See also Remark F, Section 11.

Example. For θ̂ ∼ θ(χ²₁₉/19), as in Table 2, standard χ² calculations give SKEW(ℓ̇_θ)/6 = [8/(19 · 36)]^(1/2) = .1081, which is quite close to the actual value a = .1077 derived in Section 10.

Here is a simple heuristic argument that indicates the role of the constant a in setting approximate confidence intervals. Suppose that z₀ = 0 and a > 0 in (3.6), (3.7). Having observed φ̂ = 0, and noticing σ_φ̂ = 1, the naive interval for φ [which is almost the same as the standard interval (1.1)] is φ ∈ [z^(α), z^(1−α)]. If, however, the statistician checks the situation at the right endpoint z^(1−α), he finds that the hypothesized standard deviation of φ̂ has increased from 1 to 1 + az^(1−α). This suggests increasing the right endpoint to z^(1−α)(1 + az^(1−α)). Now the hypothesized standard deviation has further increased to 1 + az^(1−α)(1 + az^(1−α)), suggesting a still larger right endpoint, and so forth. Continuing on in this way results in formula (3.14), leading to Lemma 1. [Improving the standard interval (1.1) by recomputing σ̂ at its endpoints is a useful idea. It was brought to my attention by John Tukey, who pointed out its use by Bartlett (1953); see, e.g., Bartlett's eq. (17). Tukey's (1949) unpublished talk anticipated many of the same points.]

We call a the acceleration constant because of its effect of constantly changing the natural units of measurement as we move along the φ (or θ) axis. Notice that we can write (3.7) as

σ_φ = σ_φ₀[1 + a(φ − φ₀)/σ_φ₀],  (4.6)

so

a = (dσ_φ/σ_φ₀)/(d(φ − φ₀)/σ_φ₀)  (4.7)

for any fixed value of φ₀. This shows that a is the relative change in σ_φ, per unit standard deviation change in φ, no matter what value φ has.

The point φ₀ = 0 is favored in definition (3.7), since σ₀ has been set equal to the convenient value 1. There is no harm in thinking of 0 as the true value of φ, the value actually governing the distribution of φ̂ in (3.6), because in theory we can always choose the transformation g so that this is the case and, in addition, so that σ₀ = 1 (see Remark A, Sec. 11). The restriction 1 + aφ > 0 in (3.7) causes no practical trouble for |a| ≤ .2, since it is then at least 5 standard deviations to the boundary of the permissible φ region.

The remainder of this section is devoted to verifying (4.4). The discussion is fairly technical and can be deferred until Section 10 at the reader's preference.

If we make smooth one-to-one transformations φ = g(θ), φ̂ = h(θ̂), then ℓ̇_φ(φ̂) = ℓ̇_θ(θ̂)/g′(θ) and SKEW(ℓ̇_φ) = SKEW(ℓ̇_θ). In other words, the right side of (4.4) is invariant under all mappings of this type. Suppose that for some choice of g and h, we can represent the family of distributions of φ̂ as

φ̂ = φ + σ_φ q(Z), Z ∼ N(0, 1),  (4.8)

where σ_φ and q(Z) are functions of φ and Z, having at least one and two derivatives, respectively, q′(Z) > 0. Situation (4.8), with the added conditions q(0) = 0, q′(0) = 1, is called a general scaled transformation family (GSTF) in Efron (1982b). [Please note the corrigenda to Efron (1982b).]

Lemma 2. The family (4.8) has score function ℓ̇_φ(φ̂) satisfying

σ_φ ℓ̇_φ(φ̂) = [Z + q″(Z)/q′(Z)][1 + σ̇_φ q(Z)]/q′(Z) − σ̇_φ,  Z ∼ N(0, 1).  (4.9)

Here σ̇_φ = dσ_φ/dφ, and q′ and q″ are the first two derivatives of q.

Before presenting the proof of Lemma 2, we note that it verifies (4.4): in situation (3.6), (3.7), where σ̇_φ = a, q′(Z) = 1, q″(Z) = 0, the distributional relationship (4.9) becomes

σ_φ ℓ̇_φ(φ̂) = (1 − az₀)[Z + (a/(1 − az₀))(Z² − 1)].  (4.10)

Let

ε = a/(1 − az₀),  (4.11)

a quantity discussed in Section 10. From the moments of Z ∼ N(0, 1), (4.10) gives

SKEW(ℓ̇_φ)/6 = ε(1 + 4ε²/3)/(1 + 2ε²)^(3/2).  (4.12)

We will see in Section 10 that for the usual repeated sampling situation both a and z₀ are of order of magnitude O(n^(−1/2)) in the sample size n. This means that ε = a[1 + O(n⁻¹)], (4.11), and that SKEW(ℓ̇_φ)/6 = SKEW(ℓ̇_θ)/6 = a[1 + O(n⁻¹)], (4.12), justifying approximation (4.4). The "constant" a actually depends on θ, but substituting θ = θ̂ in (4.4) causes errors only at the third-order level, like σ̂B^(α)/n in (1.2), and so does not affect the second-order properties of the BCₐ intervals.

Proof of Lemma 2. Starting from (4.8), the cdf of φ̂ is Φ(q⁻¹((φ̂ − φ)/σ_φ)), so φ̂ has density f_φ(φ̂) = exp(−Z_φ̂²/2)/[√(2π) q′(Z_φ̂) σ_φ], where Z_φ̂ = q⁻¹((φ̂ − φ)/σ_φ). This gives log-likelihood function

ℓ_φ(φ̂) = −Z_φ̂²/2 − log(q′(Z_φ̂)) − log(σ_φ).  (4.13)


Lemma 2 follows by differentiating (4.13) with respect to φ and noting that Z_φ̂ ∼ N(0, 1) when sampling from (4.8).

5. SECOND-ORDER CORRECTNESS OF THE BCₐ INTERVALS

The standard intervals are based on approximation (2.1). The BCₐ intervals, which improved considerably on the standard intervals in Tables 1 and 2, are based on the more general approximation (2.3). Is it possible to go beyond (2.3), to find still further improvements over the standard intervals? The answer is no, at least not in terms of second-order asymptotics. The theorem of this section states that for simple one-parameter problems the BCₐ intervals coincide through second order with the exact intervals. In terms of (1.2), the BCₐ intervals have the correct second-order asymptotic form θ̂ + σ̂(z^(α) + A^(α)/√n + ···).

We continue to consider the simple one-parameter problem θ̂ ∼ f_θ. Suppose that the 100·α percentile of θ̂ as a function of θ, say θ̂_θ^(α), is a continuously increasing function of θ for any fixed α. In this case the usual confidence interval construction gives an exact 1 − 2α central interval for θ having observed θ̂, say [θ_EX[α], θ_EX[1 − α]], where θ_EX[α] is the value of θ satisfying θ̂_θ^(1−α) = θ̂. The exact interval in Table 2 is an example of this construction.

It is not necessary that θ̂ be the MLE of θ. In (3.6), for instance, φ̂ is not the MLE of φ. The BCₐ method is quite insensitive to small changes in the form of the estimator (see Remark B, Sec. 11). It will be assumed, however, that θ̂ behaves asymptotically like the MLE in terms of the orders of magnitude of its bias, standard deviation, skewness, and kurtosis,

θ̂ − θ ∼ (B_θ/n, C_θ/√n, D_θ/√n, E_θ/n).  (5.1)

Here n is the sample size upon which the summary statistic θ̂ is based; B_θ, C_θ, D_θ, and E_θ are bounded functions of θ (and of n, which is suppressed in the notation). Then (5.1) says that the bias of θ̂, B_θ/n, is O(n⁻¹), the standard deviation C_θ/√n is O(n^(−1/2)), the skewness is O(n^(−1/2)), and the kurtosis is O(n⁻¹). Higher cumulants, which are typically of order smaller than O(n⁻¹), will be assumed negligible in proving the results that follow (see DiCiccio 1984; Hougaard 1982).

In the simple situation θ̂ ∼ f_θ, θ̂ is a sufficient statistic for θ. Later, when we consider more complicated problems, we will take θ̂ to be the MLE of θ. This guarantees that θ̂ is first-order efficient and asymptotically sufficient (Efron 1975).

Let β_θ, σ_θ, γ_θ, and δ_θ denote the bias, standard error, skewness, and kurtosis of θ̂, considered as functions of both θ and n. Then, as in (5.1), the following orders of magnitude apply:

σ_θ, γ_θ = O(n^(−1/2)); β_θ, δ_θ = O(n⁻¹).  (5.3)

Theorem 1. If θ̂ has bias β_θ, standard error σ_θ, skewness γ_θ, and kurtosis δ_θ satisfying (5.3), then the BCₐ intervals are second-order correct.

The theorem states that θ_BCₐ[α], the α endpoint of the BCₐ interval, is asymptotically close to the exact endpoint,

(θ_BCₐ[α] − θ_EX[α])/σ̂ = O(n⁻¹).  (5.4)

This is not true for the standard intervals (1.1) or the BC intervals, a = 0. The proof of Theorem 1, which appears in Section 12, makes it clear that all three of the elements in (2.3), the transformation g, the bias-correction constant z₀, and the acceleration constant a, make necessary corrections of O_p(n^(−1/2)) to the standard intervals.

6. NUISANCE PARAMETERS

The discussion so far has centered on the simple case θ̂ ∼ f_θ, where we have only a real-valued parameter θ and a real-valued summary statistic θ̂ from which we are trying to construct a confidence interval for θ. We have been able to show favorable properties of the BCₐ intervals for the simple case, but of course the simple case is where we least need a general method like the bootstrap.

This section discusses the more difficult situation where there are nuisance parameters besides the parameter of interest θ. Section 7 discusses the nonparametric situation, where the number of nuisance parameters is effectively infinite. Because of the inherently simple nature of the bootstrap it will be easy to extend the BCₐ method to cover these cases, though we will not be able to provide as strong a justification for the correctness of the resulting intervals.

Suppose then that the data y come from a parametric family F of density functions f_η, say y ∼ f_η, where η is an unknown vector of parameters, and we want a confidence interval for the real-valued parameter θ = t(η). In Efron (1985), the multivariate normal case y ∼ N_k(η, I) is examined in detail.

From y we obtain η̂, the MLE of η, and θ̂ = t(η̂), the MLE of θ. The parametric bootstrap distribution of y is defined to be

y* ∼ f_η̂,  (6.1)

the distribution of the data when η equals η̂. From y* we obtain η̂*, the bootstrap MLE of η, and then θ̂* = t(η̂*).
The asymptotics of this article are stated relative to the
t( A)
size of the estimated standard error a of 0, as in (1.2). It
The distribution of 0* under model (6.1) is the para-
is often convenient in what follows to have a be OP(l). metric boostrap distribution of 0, generalizing (3.1). This
This is easy to accomplish by transforming to 4 =
gives the bootstrap cdf
0 S, Vn-- , so (5.1) becomes
G(s) = Pr,{0* < s}, (6.2)
' - 4' (/1, o7, y+, ~b), (5.2)
as in (3.2). The bias-correction constant zo equals
where ,B, = B/,n1I2n112, oT, = Cn,2, y.f = D/,n1I2n112, and A>1 (),as in (4.1)
,= E,n,,/n. Notice that,B, = O(n-112), /Jg,-d/3,g/d4' = To compute the BCa intervals (3.8), (3.9), we also need
O(n~1), and so forth. We can just assume to begin with to know the appropriate value of the acceleration constant
that 09 and 0 are the rescaled quantities previously called
a. We will find a by following Stein's (1956) construction,


which replaces the multiparameter family ℱ = {f_η} by a least favorable one-parameter family ℱ̂.

Let l̇_η be the vector with ith coordinate (∂/∂η_i) log f_η(y), so l̇_η̂(y) = 0 by definition of the MLE η̂, and let l̈_η̂ be the k × k matrix with ijth entry (∂²/∂η_i∂η_j) log f_η(y)|_{η=η̂}. In addition, let ∇̂ be the gradient vector of θ = t(η) evaluated at the MLE, ∇̂_i = (∂/∂η_i)t(η)|_{η=η̂}. The least favorable direction at η = η̂ is defined to be

μ̂ = (−l̈_η̂)⁻¹∇̂.  (6.3)

Then the least favorable family ℱ̂ is the one-parameter subfamily of ℱ passing through η̂ in the direction μ̂,

ℱ̂: {f̂_λ(y*) ≡ f_{η̂+λμ̂}(y*)}.  (6.4)

Using y* to denote a hypothetical data vector from f̂_λ is intended to avoid confusion with the actual data vector y that gave η̂; η̂ and μ̂ are fixed in (6.4), only λ being unknown.

Consider the problem of estimating θ(λ) ≡ t(η̂ + λμ̂) having observed y* ~ f̂_λ. The Fisher information bound for an unbiased estimate of θ in this one-parameter family evaluated at λ = 0 is ∇̂′(−l̈_η̂)⁻¹∇̂, which is the same as the corresponding bound for estimating θ = t(η), at η = η̂, in the multiparameter family ℱ. This is Stein's reason for calling ℱ̂ least favorable.

We will use ℱ̂ to calculate an approximate value for the acceleration constant a,

a = SKEW_{λ=0}[∂ log f_{η̂+λμ̂}(y*)/∂λ]/6.  (6.5)

This is formula (4.4) applied to ℱ̂, assuming that λ̂ = 0 (which is the MLE of λ in ℱ̂ when y* = y, the actual data vector). See Remark F, Section 11.

Formula (6.5) is especially simple in the exponential family case, where the densities f_η(y) are of the form

f_η(y) = e^{n[η′y − ψ(η)]} f₀(y).  (6.6)

The factor n in the exponent of (6.6) is not necessary, but it is included to agree with the situation where the data consist of iid observations x₁, x₂, . . . , x_n, each with density exp[η′x − u(η)], and y is the sufficient vector y = Σ x_i/n.

Lemma 3. For the exponential family (6.6), formula (6.5) gives

a = ψ̂⁽³⁾(0)/[6 n^{1/2} ψ̂⁽²⁾(0)^{3/2}],  (6.7)

where

ψ̂(λ) ≡ ψ(η̂ + λμ̂).  (6.8)

Proof. We have

∂ log f_{η̂+λμ̂}(y*)/∂λ = n[μ̂′y* − ψ̂⁽¹⁾(λ)],  (6.9)

so SKEW_{λ=0}[∂ log f_{η̂+λμ̂}(y*)/∂λ] equals the skewness of μ̂′y* for y* ~ f_η̂. The fact that SKEW(μ̂′y*) equals [ψ̂⁽³⁾(0)/ψ̂⁽²⁾(0)^{3/2}]/√n is a standard exercise in exponential family theory. Note that Lemma 3 applies to y ~ N_k(ζ, I), the case considered in Efron (1985), and gives a = 0, which is why the unaccelerated BC intervals worked well there.

Table 3 relates to the following example:

y ~ N₄(η, σ_η² I),  [σ_η = 1 + a(‖η‖ − 8)],  (6.10)

where we observe y = (8, 0, 0, 0) and wish to set confidence intervals for the parameter θ = t(η) = ‖η‖. The case a = 0 amounts to finding a confidence interval for the noncentrality parameter of a noncentral χ² distribution and can be solved exactly. The theory of Efron (1985) applies to the a = 0 case, and we see that the BC₀ interval, that is, the BC interval, well matches the exact interval.

Table 3 shows the result of varying the constant a from .10 to −.10. This example has a particularly simple geometry: the sphere C_θ̂ = {η: ‖η‖ = θ̂} is the set of vectors having t(η) equal to the MLE value θ̂ = t(η̂); the least favorable direction μ̂ is orthogonal to C_θ̂ at η̂; the distribution of θ̂ is nearly normal (see Efron 1985, Table 2), with standard deviation changing in the least favorable direction at a rate nearly equal to a, as in (4.7). The BCa intervals alter predictably with a. For instance, comparing the upper endpoint at a = .10 with a = 0, notice that (9.70 − 8.00)/(9.44 − 8.00) = 1.18, closely matching the expansion factor due to acceleration, 1 + .10 · 1.645 = 1.16.

We could disguise problem (6.10) by making nonlinear transformations

ỹ = g(y),  η̃ = h(η),  (6.11)

in which case the geometry of the BCa intervals might not be obvious from the form of the parameter θ = t(h⁻¹(η̃)) = ‖h⁻¹(η̃)‖ and the transformed densities f̃_η̃(ỹ). However, the BCa method is invariant under such transformations (see Remark C, Sec. 11), so the statistician would automatically get the same intervals as if he knew the normalizing transformations y = g⁻¹(ỹ), η = h⁻¹(η̃).

Currently we cannot justify the BCa method as being second-order correct in the multiparameter context of this section, though it seems a likely conjecture that this is so. We know that it is so in the one-parameter case (see Sec. 5) and in the restricted multiparameter case of Efron (1985), where the BCa and BC methods coincide, and that the BCa method makes a rather obvious correction to the BC interval in the general multiparameter case.

Table 3. Central 90% Confidence Intervals for θ = ‖η‖, Having Observed ‖η̂‖ = 8, From the Parametric Family y ~ N₄(η, σ_η² I), With σ_η = 1 + a(‖η‖ − 8)

             Exact (R/L)          BCa (R/L)           (6.5)
a = .10     [6.46, 9.69] (.96)   [6.47, 9.70] (.97)    .0984
a = .05     [6.32, 9.57] (.85)   [6.34, 9.56] (.84)    .0498
a = 0       [6.14, 9.47] (.74)   [6.19, 9.44] (.75)    0
a = −.05    [5.92, 9.38] (.65)   [6.03, 9.35] (.66)   −.0498
a = −.10    [5.62, 9.30] (.56)   [5.89, 9.27] (.60)   −.0984

NOTE: The standard interval (1.1) is [6.36, 9.64] for all values of a. Note that (6.5) nearly equals the constant a in this case. The exact intervals are based on the noncentral χ² distribution.
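The parametric bootstrap recipe (6.1)–(6.2) is easy to carry out by Monte Carlo. The sketch below is our illustration, not an example from the paper: the normal-mean family, B = 2,000, and all function names are our choices. It draws y* ~ f_η̂, forms the replications θ̂* = t(η̂*), and reads off z₀ = Φ⁻¹(Ĝ(θ̂)):

```python
import math
import random

def phi_inv(p):
    """Standard normal inverse cdf, by bisection on math.erf."""
    lo, hi = -8.0, 8.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def parametric_z0(y, draw_star, t_hat, B=2000, seed=0):
    """Monte Carlo version of (6.1)-(6.2): draw y* from the fitted model,
    compute theta* = t(eta_hat*), and return z0 = Phi^{-1}(G_hat(theta_hat))."""
    rng = random.Random(seed)
    theta_hat = t_hat(y)
    stars = [t_hat(draw_star(y, rng)) for _ in range(B)]
    return phi_inv(sum(s < theta_hat for s in stars) / B)

# Toy family (ours): y_1..y_n ~ N(mu, 1), theta = mu, MLE = mean(y);
# the parametric bootstrap resamples from N(theta_hat, 1).
rng0 = random.Random(1)
y = [rng0.gauss(0.0, 1.0) for _ in range(20)]
mean = lambda data: sum(data) / len(data)
draw = lambda data, rng: [rng.gauss(mean(data), 1.0) for _ in data]
z0 = parametric_z0(y, draw, mean)  # near 0: this problem is symmetric
```

For this symmetric normal-mean problem Ĝ(θ̂) ≈ .5, so z₀ ≈ 0; skewed families give nonzero z₀, as in the law school example of Section 7.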


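Lemma 3 can also be checked numerically, since the derivatives ψ̂⁽²⁾(0) and ψ̂⁽³⁾(0) in (6.7) are well approximated by central differences. The example below is ours, not the paper's: a one-parameter Poisson family with natural parameter η = log(mean) and ψ(η) = exp(η), for which the least favorable direction works out to μ̂ = 1/n and (6.7) has the closed form a = 1/(6√(n·x̄)):

```python
import math

def accel_lemma3(psi_hat, n, h=0.1):
    """Formula (6.7): a = psi_hat'''(0) / (6 sqrt(n) psi_hat''(0)^{3/2}),
    with both derivatives taken by central differences."""
    d2 = (psi_hat(h) - 2.0 * psi_hat(0.0) + psi_hat(-h)) / h**2
    d3 = (psi_hat(2 * h) - 2.0 * psi_hat(h)
          + 2.0 * psi_hat(-h) - psi_hat(-2 * h)) / (2.0 * h**3)
    return d3 / (6.0 * math.sqrt(n) * d2**1.5)

# Our Poisson illustration: with xbar = 4.0 and n = 25,
# psi_hat(lam) = exp(eta_hat + lam/n), and the closed form gives
# a = 1/(6 sqrt(n * xbar)) = 1/60.
n, xbar = 25, 4.0
eta_hat = math.log(xbar)
a = accel_lemma3(lambda lam: math.exp(eta_hat + lam / n), n)
```

The same central-difference idea applies to the multinomial ψ̂(λ) of Section 8, which is how formula (7.3) arises.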
7. THE NONPARAMETRIC CASE

This section concerns the nonparametric case, where the data y = (x₁, x₂, . . . , x_n) consist of n iid observations x_i that may have come from any probability distribution F on their common sample space 𝒳. There is a real-valued parameter θ = t(F) for which we desire an approximate confidence interval. We will show how the BCa method can be used to provide such an interval based on the obvious nonparametric estimate θ̂ = t(F̂). Here F̂ is the empirical probability distribution of the sample, putting mass 1/n on each observed value x_i.

A bootstrap sample y* ~ F̂ consists in this case of an iid sample of size n from F̂, say y* = (x₁*, x₂*, . . . , x_n*). In other words, y* is a random sample of size n drawn with replacement from {x₁, x₂, . . . , x_n}. The bootstrap sample y* gives a bootstrap replication of θ̂, θ̂* = t(F̂*), where F̂* puts mass 1/n on each x_i*. The bootstrap cdf Ĝ(s) is the probability that a bootstrap replication is less than s,

Ĝ(s) = Pr_F̂{θ̂* < s},  (7.1)

as in (6.2) and (3.2). The bias-correction constant z₀ equals Φ⁻¹(Ĝ(θ̂)), as in (4.1).

For most nonparametric problems the bootstrap cdf Ĝ has to be determined by Monte Carlo sampling. Section 9 discusses how many Monte Carlo replications of θ̂* are necessary. Here we will continue to assume that Ĝ has been computed exactly—in effect, that we have taken an infinite number of bootstrap replications θ̂*.

At this point we could use Ĝ to form the BC interval for θ, but to obtain the BCa interval (3.8), (3.9) we also need the value of the acceleration constant a. We will derive a simple approximation for a, based on Lemma 3. It depends on

U_i = lim_{ε→0} [t((1 − ε)F̂ + εδ_i) − t(F̂)]/ε,  i = 1, 2, . . . , n,  (7.2)

the empirical influence function of θ̂ = t(F̂). Here δ_i is a point mass at x_i, so U_i is the derivative of the estimate θ̂ with respect to the mass on point x_i. [Jaeckel's infinitesimal jackknife estimate of standard error for θ̂ is (Σ U_i²)^{1/2}/n.] Definition (7.2) assumes that t(F) is smoothly defined for choices of F near F̂ [see Efron 1982a, (6.3), or Efron 1979, sec. 5]. Note that Σ U_i = 0.

The next section shows that Lemma 3, applied to a family appropriate to the nonparametric situation, gives the following approximation for the constant a,

a = (1/6) Σᵢ₌₁ⁿ U_i³ / (Σᵢ₌₁ⁿ U_i²)^{3/2}.  (7.3)

This is a convenient formula, since the U_i can be evaluated easily by using finite differences in definition (7.2).

Example 1: The Law School Data. Table 4 shows two indexes of student excellence, LSAT and GPA, for each of 15 American law schools (see Efron 1982a, sec. 2.2). The Pearson correlation coefficient ρ̂ between LSAT and GPA equals .776; we want a confidence interval for the true correlation ρ.

Table 4. The Law School Data and Values of the Empirical Influence Function for the Correlation Coefficient ρ̂

 i  (LSAT, GPA)     U_i        i  (LSAT, GPA)     U_i
 1  (576, 3.39)  −1.507        9  (651, 3.36)    .310
 2  (635, 3.30)    .168       10  (605, 3.13)    .004
 3  (558, 2.81)    .273       11  (653, 3.12)   −.526
 4  (578, 3.03)    .004       12  (575, 2.74)   −.091
 5  (666, 3.44)    .525       13  (545, 2.76)    .434
 6  (580, 3.07)   −.049       14  (572, 2.88)    .125
 7  (555, 3.00)   −.100       15  (594, 2.96)   −.048
 8  (661, 3.43)    .477

Table 4 also shows the values U_i for the statistic ρ̂, from which formula (7.3) produces a = −.0817. B = 100,000 bootstrap replications (about 100 times more than actually needed; see Sec. 9) gave Ĝ(ρ̂) = .463, and so z₀ = −.0927. Using these values of a and z₀ in (3.8), (3.9) resulted in the central 90% nonparametric BCa interval [.43, .92] for ρ. The usual bivariate normal interval, based on Fisher's tanh⁻¹ transformation, is [.49, .90]. This is also the parametric BCa interval based on the simple family ρ̂ ~ f_ρ, where f_ρ(ρ̂) is Fisher's density function for the correlation coefficient from bivariate normal data. The standard interval (1.1), ρ̂ ± 1.645σ̂, using the bootstrap estimate σ̂ = .133, is [.56, .99].

Formula (7.3) is invariant under monotone changes of the parameter of interest. This results in the BCa intervals having correct transformation properties. Suppose, for example, that we change parameters from ρ to φ = g(ρ) = tanh⁻¹(ρ), with corresponding nonparametric estimate φ̂ = g(ρ̂). The central 90% BCa interval for φ based on φ̂ is then the obvious transformation of the interval for ρ based on ρ̂, [g(.43), g(.92)] = [.46, 1.59]. This compares with Fisher's tanh⁻¹ interval [g(.49), g(.90)] = [.54, 1.47] and the standard interval φ̂ ± 1.645σ̂ = [.49, 1.59]. The standard interval is much more reasonable-looking on the tanh⁻¹ scale, as we might expect from Fisher's transformation theory. As commented before, a major advantage of the BCa method is that the statistician need not know the correct scale on which to work. In effect the method selects the best (most normal) scale and then transforms the interval back to the scale of interest.

Example 2: The Mean. Suppose that F is a distribution on the real line, and θ = t(F) equals the expectation E_F X. The empirical influence function is U_i = x_i − x̄, so (7.3) gives

a = (1/6) Σ(x_i − x̄)³/[Σ(x_i − x̄)²]^{3/2} = (1/(6√n))(μ̂₃/μ̂₂^{3/2}) = γ̂/(6√n).  (7.4)

Here μ̂_h = Σ(x_i − x̄)ʰ/n, the hth sample central moment, and γ̂ = μ̂₃/μ̂₂^{3/2}, the sample skewness. It turns out also that z₀ ≐ γ̂/(6√n) in this case, by standard Edgeworth arguments. Both a and z₀ are typically of order n^{-1/2}.

Because the sample mean is such a simple statistic, we can use Edgeworth methods to get asymptotic expressions for the α-level endpoint of the BCa interval:

θ̂_BCa[α] = x̄ + σ̂{z^(α) + (γ̂/(6√n))(2z^(α)² + 1) + O_p(n⁻¹)},  (7.5)


σ̂ = (μ̂₂/n)^{1/2}. This compares with

θ̂_BC[α] = x̄ + σ̂{z^(α) + (γ̂/(6√n))(z^(α)² + 1) + O_p(n⁻¹)}  (7.6)

for the BC interval, so the BCa intervals are shifted approximately (γ̂/(6√n))z^(α)² further right.

Johnson (1978) suggested modifying the usual t statistic T = (x̄ − θ)/σ̂ to T_J = T + (γ̂/(6√n))(2T² + 1) and then considering T_J to have a standard t_{n−1} distribution in order to obtain confidence intervals for θ = E_F X. Efron (1981, sec. 10) showed that this is much like using the bootstrap distribution of T* = (x̄* − x̄)/σ̂* as a pivotal quantity. Interestingly enough, the Edgeworth expansion of θ̂_J[α], the α endpoint of Johnson's interval, coincides with (7.5). The BCa method makes a "t correction" in the case of θ = E_F X, but it is not the familiar Student-t correction, which operates at third order in (1.2), but rather a second-order correction, coming from the correlation between x̄ and σ̂ in nonnormal populations (see Remark D, Sec. 11).

I conjecture that the nonparametric BCa intervals will be second-order correct for any parameter θ. There is no proof of this, a major difficulty being the definition of second-order correctness in the nonparametric situation. Whether or not it is true, small-sample nonparametric confidence intervals are far from well understood and, as emphasized in Schenker (1985), should be interpreted with some caution.

Example 3: The Variance. Suppose that 𝒳 is the real line and θ = var_F X, the variance. Line 5 of Table 2 shows the result of applying the nonparametric BCa method to data sets x₁, x₂, . . . , x₂₀, which were actually iid samples from an N(0, 1) distribution. The number .640, for example, is the average of θ̂_BCa[.05]/θ̂ over 40 such data sets, B = 4,000 bootstrap replications per data set. The upper limit, 1.68, is noticeably small. The reason is simple: the nonparametric bootstrap distribution of θ̂* has a short upper tail compared with the parametric bootstrap distribution, which is a scaled χ²₁₉ random variable. The results of Beran (1984a), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that is not a guarantee of good small-sample behavior. Bootstrapping from a smoothed version of F̂, as in Efron (1982a, sec. 5.3), alleviates the problem in this particular example.

8. GEOMETRY OF THE NONPARAMETRIC CASE

Formula (7.3), which allows us to apply the BCa method nonparametrically, is based on a simple heuristic argument: instead of the actual sample space 𝒳 of the data points x_i, consider only distributions F supported on 𝒳̂ = {x₁, x₂, . . . , x_n}, the observed data set. This is an n-category multinomial family, to which the results of Section 6 can be applied. Because the multinomial is an exponential family, Lemma 3 directly gives (7.3).

We will now examine this argument more carefully, with the help of a simple geometric representation. See Efron (1981, sec. 11) for further discussion of this approach to nonparametric confidence intervals.

A typical distribution supported on 𝒳̂ is

F(w): mass w_i on x_i,  (8.1)

where w = (w₁, w₂, . . . , w_n) can be any point in the simplex 𝒮_n = {w: w_i ≥ 0 ∀i, Σ w_i = 1}. The parameter θ = t(F) is defined on 𝒮_n by θ(w) ≡ t(F(w)). The central point of the simplex,

w⁰ = 1/n = (1/n, 1/n, . . . , 1/n),  (8.2)

corresponds to F(w⁰) = F̂, the usual empirical distribution; θ(w⁰) = θ̂ = t(F̂), the nonparametric MLE of θ. The curved surface

C_θ̂ = {w: θ(w) = θ(w⁰) = θ̂}  (8.3)

comprises those distributions F(w) having θ(w) = θ̂. The vector U = (U₁, . . . , U_n) is orthogonal to C_θ̂ at w⁰, as shown in Figure 1, which follows from definition (7.2) of the empirical influence function. U is essentially the gradient of θ(w) at w⁰ (see Efron 1982a, sec. 6.3).

With w unknown, but 𝒳̂ = {x₁, . . . , x_n} considered fixed, one can imagine setting a confidence interval for θ(w) on the basis of a hypothetical sample x₁*, x₂*, . . . , x_n* ~ F(w). A sufficient statistic is the vector of proportions P̂_i = #{x_j* = x_i}/n, say P̂ = (P̂₁, P̂₂, . . . , P̂_n), with distribution

P̂ ~ mult_n(n, w)/n,  w ∈ 𝒮_n.  (8.4)

The notation here indicates n draws from an n-category multinomial, having probability w_i for category i. We suppose that we have observed P̂ = w⁰ in (8.4), that is, that the hypothetical sample x₁*, . . . , x_n* equals the actual sample x₁, . . . , x_n.

Distributions (8.4) form an n-parameter exponential family, (6.6) with y = P̂, η_i = log(nw_i) + c, and ψ(η) = log(Σ exp(η_i)/n). Here c can be any constant, since all vectors η + c1 correspond to the same probability vector w, namely w_i = exp(η_i)/Σ_j exp(η_j).

If one accepts the reduction of the original nonparametric problem to (8.4), with observed value P̂ = w⁰, then it is easy to carry through the least favorable family calculations (6.3)–(6.5): (i) η̂ = 0; (ii) μ̂ = U; (iii) f̂_λ is the member of (8.4) corresponding to η = η̂ + λU, namely

P̂* ~ mult_n(n, w^λ)/n,  w_i^λ = exp(λU_i) / Σⱼ₌₁ⁿ exp(λU_j);  (8.5)

(iv) finally, formula (7.3) follows directly from Lemma 3, by differentiating ψ̂(λ) = log(Σᵢ₌₁ⁿ exp(λU_i)/n) (and remembering that Σ U_i = 0).

Only step (ii) is not immediate, but it is a straightforward consequence of definition (6.3) and standard properties of the multinomial. It has already been noted that U is orthogonal to C_θ̂, so U is proportional to ∇̂ in (6.3). However, −l̈_η̂ = I − 11′/n, which is its own pseudo-inverse and acts as the identity on vectors orthogonal to 1. Thus μ̂ is proportional to U. Since (6.7), (6.8) produce the same value of a if μ̂ is multiplied by any constant, this in effect gives μ̂ = U.

An interesting case that provides some support for the


nonparametric BCa method is that where the sample space 𝒳 is finite to begin with, say 𝒳 = {1, 2, . . . , L}. A typical distribution on 𝒳 is f = (f₁, . . . , f_L), where f_l = Pr{x_i = l}. The observed sample proportions f̂ = (f̂₁, f̂₂, . . . , f̂_L), f̂_l = #{x_i = l}/n, are sufficient, with distribution f̂ ~ mult_L(n, f)/n. This is an L-parameter exponential family, so the theory of Section 6 applies. It turns out that Lemma 3 agrees with formula (7.3) in this case: nonparametric BCa intervals are the same as parametric BCa intervals when 𝒳 is finite. See Remarks G and H of Efron (1979) for the first-order bootstrap asymptotics of finite sample spaces.

Family (8.4) was used in Section 11 of Efron (1981) to motivate a method called nonparametric tilting, a nonparametric analog of the standard hypothesis-testing approach to confidence interval construction. The one-parameter tilting family, (11.12) of Efron (1981), is closely related to the least favorable family ℱ̂ in Figure 1. Efron (1981, table 5) considered samples of size n = 15 from the one-sided exponential density f(x) = exp[−(x + 1)] (x > −1). Central 90% tilting intervals for θ = E_F X were constructed for each of 10 such samples, averaging [−.34, .50]. The corresponding nonparametric BCa intervals averaged [−.34, .52] and were quite similar to the tilting intervals on a sample-by-sample comparison. The nonparametric BCa method is computationally simpler than nonparametric tilting and seems likely to give similar results in most problems.

We end this section with a useful approximation formula for the bias-correction constant z₀, developed jointly with Timothy Hesterberg. In addition to (7.2) we need the second-order empirical influence function

V_ij = lim_{ε→0} {[t((1 − 2ε)F̂ + εδ_i + εδ_j) − t((1 − ε)F̂ + εδ_i) − t((1 − ε)F̂ + εδ_j) + t(F̂)]/ε²}.  (8.6)

Define z₀₁ ≡ (1/6) Σ U_i³/(Σ U_i²)^{3/2} [approximation (7.3) for a] and

z₀₂ ≐ [U′VU/‖U‖² − tr V]/(2n‖U‖),  (8.7)

where V is the n × n matrix (V_ij).

Lemma 4. The bias-correction constant z₀ approximately equals

z₀ ≐ Φ⁻¹{2Φ(z₀₁)Φ(z₀₂)}.  (8.8)

For the law school data, Example 1 of Section 7, z₀₁ = −.0817 and z₀₂ = −.0067, giving z₀ = −.0869 from (8.8), compared with z₀ = −.0927 ± .0039 from B = 100,000 bootstrap replications.

The term z₀₁ relates to skewness in ℱ̂, and z₀₂ is a geometric term arising from the curvature of C_θ̂ at w⁰. It is analogous to formula (A15) of Efron (1985). Lemma 4 will not be proved here but is important in the sample size considerations of Section 9.

Figure 1. All probability distributions supported on {x₁, x₂, . . . , x_n} are represented as the simplex 𝒮_n. The central point w⁰ corresponds to the empirical distribution F̂. The curves indicate level surfaces of constant value of the parameter θ. In particular C_θ̂ comprises those probability distributions having θ equal to θ(w⁰) = θ̂, the MLE. The least favorable family ℱ̂ passes through w⁰ in the direction U, orthogonal to C_θ̂.

9. BOOTSTRAP SAMPLE SIZES

How many bootstrap replications of θ̂* need we take? So far we have pretended that the number of replications B = ∞, but if Monte Carlo methods are necessary to obtain the bootstrap cdf Ĝ, then B must be finite, usually the smaller the better. This section gives rough estimates of how small B may be taken in practice. The results are presented without proof, all being standard exercises in error estimation (see, e.g., Kendall and Stuart 1958, chap. 10). They apply to any situation, parametric or nonparametric, where Ĝ is obtained by Monte Carlo sampling.

First consider the easy problem of estimating the standard error of θ̂ via the bootstrap. The bootstrap estimate based on B replications, σ̂_B = [Σᵦ₌₁ᴮ (θ̂_b* − θ̂_·*)²/(B − 1)]^{1/2}, where θ̂_·* is the average of the B replications, has conditional coefficient of variation (standard deviation divided by expectation)

CV{σ̂_B | y} ≈ [(δ̂ + 2)/4B]^{1/2},  (9.1)

where δ̂ is the kurtosis of the bootstrap distribution of θ̂*. The notation indicates that the observed data y is fixed in this calculation. As B → ∞, (9.1) → 0 and σ̂_B → σ̂, the ideal bootstrap estimate of standard error.

Of course σ̂ itself will usually not estimate the true standard error σ ≡ SD_θ{θ̂} perfectly. Let CV(σ̂) be the coefficient of variation of σ̂, unconditional now, averaging over the possible realizations of y [e.g., if n = 20, θ̂ = x̄, x_i ~ iid N(0, 1), then CV(σ̂) ≈ (1/40)^{1/2} = .16]. The unconditional CV of σ̂_B is then approximated by

CV(σ̂_B) ≈ [CV²(σ̂) + (Eδ̂ + 2)/4B]^{1/2}.  (9.2)

Table 5 displays CV(σ̂_B) for various choices of B and CV(σ̂), assuming that Eδ̂ = 0.
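Formula (9.2) takes one line to evaluate; the short sketch below (the helper name is ours) reproduces the CV(σ̂) = .10 row of Table 5:

```python
import math

def cv_sigma_B(cv_limit, B, expected_kurtosis=0.0):
    """Formula (9.2): unconditional CV of the B-replication bootstrap
    standard-error estimate; cv_limit is CV(sigma_hat), the B -> infinity value."""
    return math.sqrt(cv_limit**2 + (expected_kurtosis + 2.0) / (4.0 * B))

# The CV(sigma_hat) = .10 line of Table 5, with E delta_hat = 0:
row = [round(cv_sigma_B(0.10, B), 2) for B in (25, 50, 100, 200)]
# row == [0.17, 0.14, 0.12, 0.11]
```

Note how quickly the Monte Carlo term 1/(2B) becomes negligible next to CV²(σ̂), which is the point of the "B as small as 25" remark below.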


Table 5. Coefficient of Variation of σ̂_B, the Bootstrap Estimate of Standard Error, as a Function of B, the Number of Bootstrap Replications, and CV(σ̂), the Limiting CV as B → ∞

              B = 25    50    100    200    ∞
CV(σ̂) = .25    .29    .27    .26    .25    .25
        .20    .24    .22    .21    .21    .20
        .15    .21    .18    .17    .16    .15
        .10    .17    .14    .12    .11    .10
        .05    .15    .11    .09    .07    .05
         0     .14    .10    .07    .05    0

NOTE: These entries are based on (9.2), assuming Eδ̂ = 0.

For values of CV(σ̂) ≥ .10, typical in practice, there is little improvement past B = 100. In fact, B as small as 25 gives reasonable results.

Now we return to bootstrap confidence intervals. In the Monte Carlo situation the bootstrap cdf Ĝ must be estimated from bootstrap replications θ̂₁*, θ̂₂*, . . . , θ̂_B*, say by

Ĝ_B(s) = #{θ̂_b* < s}/B.  (9.3)

As B → ∞, Ĝ_B → Ĝ, the ideal bootstrap cdf we have been using in the previous sections. Let θ̂_B[α] be the level α endpoint of either the BC or BCa interval obtained from Ĝ_B(s) by substitution in (3.8), (3.9).

The following formula for the conditional CV of θ̂_B[α] − θ̂ assumes that Ĝ is roughly normal and that z₀ and a are known, for example, from (8.8) and (6.5) or (7.3):

CV{θ̂_B[α] − θ̂ | y} ≈ (1/B^{1/2}) [α(1 − α)]^{1/2}/[ϕ(z^(α)) z^(α)],  (9.4)

where ϕ(z) ≡ exp(−z²/2)/√(2π), the standard normal density. Notice that since we condition on y, the only random quantity on the left side of (9.4) is θ̂_B[α]. Formula (9.4) measures the variability in θ̂_B[α] − θ̂ due to taking only B bootstrap replications, rather than an infinite number.

Here is a brief tabulation of (9.4) × B^{1/2}:

α:               .75    .90    .95    .975
(9.4) × B^{1/2}: 2.02   1.33   1.28   1.36  (9.5)

If B = 1,000, for instance, then CV{θ̂_B[.95] − θ̂ | y} = 1.28/1,000^{1/2} = .040. Reducing B to 200 increases the conditional CV to .091. This last figure may be too big. The whole purpose of developing a theory better than (1.1) is to capture second-order effects. As the examples have indicated, these become interesting when the asymmetry ratio R/L is larger than, say, 1.25, or smaller than .80. In such borderline situations, an extra 9% error in each tail due to inadequate bootstrap sampling may be unacceptable.

If the bias-correction constant z₀ is estimated by Monte Carlo directly from ẑ₀ = Φ⁻¹(Ĝ_B(θ̂)), rather than from (8.8), then

CV{θ̂_B[α] − θ̂ | y} ≈ (1/(B^{1/2} z^(α))) [1/ϕ(0)² − 2(1 − α)/(ϕ(0)ϕ(z^(α))) + α(1 − α)/ϕ(z^(α))²]^{1/2}  (9.6)

for α > .50. This gives larger CV's than (9.4):

α:               .75    .90    .95    .975
(9.6) × B^{1/2}: 3.04   1.97   1.75   1.71  (9.7)

Comparing (9.7) with (9.5) shows that we need B to be about twice as large to get the same CV if z₀ is estimated rather than calculated. Formula (8.8) can be very helpful!

Both (9.4) and (9.6) assume that the bootstrap cdf is estimated by straightforward Monte Carlo sampling, as in (9.3). M. V. Johns (personal communication) has developed importance sampling methods that greatly accelerate the estimation of Ĝ in some situations.

10. ONE-PARAMETER FAMILIES

We return to the simple situation θ̂ ~ f_θ, where there are no nuisance parameters and where we want a confidence interval for the real-valued parameter θ based on a real-valued summary statistic θ̂. This section gives a more extensive discussion of the acceleration constant a, which has played a basic role in our considerations. Three familiar types of one-parameter families will be investigated: exponential families, translation families, and transformation families.

Efron (1982b) considered the following question: for a given family θ̂ ~ f_θ, do there exist mappings φ̂ = g(θ̂), φ = h(θ) such that φ̂ = φ + σ_φ q(Z), Z ~ N(0, 1), as in (4.8)? This last form, a General Scaled Transformation Family (GSTF), generalizes the concept of the ideal normalization φ̂ = φ + Z. [We now add the conditions q(0) = 0, q′(0) = 1, as in Efron (1982b).]

The question is answered in terms of the diagnostic function D(z, θ) ≡ [ϕ(0)/ϕ(z)][Ḟ_θ(θ̂_θ^(α))/Ḟ_θ(μ_θ)]. Here ϕ(z) is the standard normal density (2π)^{-1/2} exp(−z²/2); F_θ is the cdf F_θ(s) = Pr_θ{θ̂ < s}; Ḟ_θ(s) = (∂/∂θ)F_θ(s); α = Φ(z); θ̂_θ^(α) is the 100·α percentile of θ̂ given θ, θ̂_θ^(α) = F_θ⁻¹(α); and μ_θ is the median of θ̂ given θ, μ_θ = θ̂_θ^(.5) = F_θ⁻¹(.5). It is shown that the form of g, h, and q(z) in (4.8) can be inferred from D(z, θ), the main advantage being that D(z, θ) is computed without knowledge of the normalizing transformations g, h.

The connection of transformation family theory with the acceleration constant a is the following: define

ε_θ ≡ (∂/∂z) D(z, θ)|_{z=0}.  (10.1)

If q(z) in (4.8) is symmetrically distributed about zero, a situation called a symmetric scaled transformation family (SSTF), then

ε_θ = dσ_φ/dφ  (10.2)

(see Efron 1982b, eq. 4.11). A more complicated relationship holds for the GSTF case.

Notice that (10.2) is quite close to our original description of "a" as the rate of change of standard deviation on the normalized scale. As a matter of fact, we can transform (3.6), (3.7) into an SSTF by considering the statistic

ξ̂ = [1/(1 − az₀)](1/a) log(1 + aφ̂)  (10.3)


instead of 0 itself. Then it is easy to show that lations being quite straightforward. Thus eg = y0/6[1 +
0(n-1)]. Since 1o(y) = n[y - A0], we have SKEWo(lo(y))
3= 0 + (1 + 80O)Z, 80 = al(1 - azo), (10.4) = SKEWo(y) = yy, verifying (10.5) for one-parameter
an SSTF with c,o = 1 + e0o, &,O = go for all exponential
0. [The families.
quantity ,c has the same definition in (10.4) as in (4.11).]
Example. If Y - Poisson(O), 0 = 15, then SKEW0
Example. For 0 0 OX19/19 as in Table 2, 80 = .1090
(1)/6 = 11(6 * 01/2) = .0430. For the continued version of
for all 0 [using Eq. (10.6)]. In addition, zo = ID-1 Pr{Xl9
the Poisson family used in Efron (1982b; note Corrigenda,
< 19} = .1082. The relationship a = ε_0/(1 + ε_0 z_0), obtained by solving for a in (10.4), gives a = .1077, the value used in Table 2. This family is nearly in SSTF (see Remark E, Sec. 11).

We show below that under reasonable asymptotic conditions,

    SKEW_0(ℓ̇_0)/6 ≐ ε_0,    (10.5)

where ε_0 = (∂/∂z)D(z, θ)|_{z=0}, as in (10.1). This last definition of ε_0 can be evaluated for any family θ̂ ~ f_θ, (3.13), assuming only that the necessary derivatives exist. The main point here is that SKEW_0(ℓ̇_0)/6 always approximates ε_0 (10.1), and in SST families ε_0 has the acceleration interpretation (10.2).

Now to show (10.5). It is possible to reexpress (10.1) as

    ε_0 = −φ(0) ℓ̇_0(μ̃_0) / [μ̃′_0 f_0(μ̃_0)],    (10.6)

where μ̃′_0 = (d/dθ)μ̃_θ, the rate of change of the median μ̃_θ with respect to θ. For notational convenience suppose that θ = 0. Instead of θ̂, consider the statistic X = ℓ̇_0(θ̂)/i_0, where i_0 equals the Fisher information E_0 ℓ̇_0(θ̂)². The parameter ε_0 is invariant under one-to-one transformations of θ̂, so we can evaluate the right side of (10.6) in terms of X, ε_0 = −φ(0) ℓ̇_X(μ̃_0^X) / [μ̃′_0^X f_X(μ̃_0^X)].

For θ = 0, X has expectation E_0 X = 0 and standard deviation σ_X = i_0^{−1/2}; in addition, ℓ̇_X(0) = 0, since X = 0 implies that θ̂ = 0 is a solution of the MLE equation. Assuming the usual asymptotic convergence properties, as in (5.1), (5.3), we have the following approximations: μ̃′_0^X ≐ 1; μ̃_0^X ≐ −γ_0 i_0^{−1/2}/6; f_X(μ̃_0^X) ≐ φ(0) i_0^{1/2}; ℓ̇_X(μ̃_0^X) ≐ −√i_0 γ_0/6. These are derived from standard Edgeworth and Taylor series arguments, which will not be presented here. Taken together they give ε_0 ≐ SKEW_0(ℓ̇_X)/6 = SKEW_0(ℓ̇_0)/6, which is (10.5). The quantity SKEW_0(ℓ̇_0)/6 is O(n^{−1/2}), and the error of approximation in (10.5) is quite small,

    ε_0 = [SKEW_0(ℓ̇_0)/6][1 + O(n^{−1})].    (10.7)

Approximation (10.5) is particularly easy to understand in one-parameter exponential families. Suppose that x_1, x_2, . . . , x_n are iid observations from such a family, with sufficient statistic y = x̄ having density f_θ(y) = exp{n[θy − ψ(θ)]} f_0(y). In this case formula (10.6) becomes

    ε_0 = [φ(0) μ̇_0 / (μ̃′_0 σ_0 f_0(μ̃_0))] · [(μ_0 − μ̃_0)/σ_0],    (10.8)

where μ_θ = E_θ{y}, μ̃_θ = median_θ{y}, μ̇_θ = ∂μ_θ/∂θ, and so forth. The term [(μ_0 − μ̃_0)/σ_0] = (γ_0/6)[1 + O(n^{−1})], and φ(0)μ̇_0/[μ̃′_0 σ_0 f_0(μ̃_0)] = 1 + O(n^{−1}), both of the calculations being made in Efron (1982b; see the corrigenda, p. 1032), (∂/∂z)D(z, θ)|_{z=0} = .0425 for θ = 15.

Translation Families. Suppose that we observe a translation family θ̂ = θ + W, as in (3.12). Express W as a function q(Z) of Z ~ N(0, 1), for simplicity assuming that q(0) = 0 and q′(0) = 1, as in Efron (1982b). Then z_0 = Φ^{−1} Pr{θ̂ < θ} = 0. In this case it looks like methods based on the percentiles of the bootstrap distribution must give wrong answers, since if W is long-tailed to the right then the correct θ interval is long-tailed to the left, and vice versa. However, the BC_a method produces at least roughly correct intervals, as we saw in the proof of Lemma 1.

What happens is the following: for any constant A the transformation g_A(t) = (exp(At) − 1)/A gives φ = g_A(θ), φ̂ = g_A(θ̂), and Z_A = g_A(W) satisfying

    φ̂ = φ + σ_φ Z_A,  σ_φ = 1 + Aφ.    (10.9)

The Taylor series for W = q(Z) begins W = Z + (γ_W/6)Z² + · · ·, where γ_W = SKEW(W). Then Z_A = Z + (γ_W/6)Z² + (A/2)Z² + · · ·. The choice A = −γ_W/3 results in Z_A = Z + cZ³ + · · ·, the quadratic term canceling out; Z_A is then approximately normal, so (10.9) is approximately situation (3.6), (3.7), with z_0 = 0, a = −γ_W/3. But we know that the BC_a intervals are correct if we can transform to situation (3.6), (3.7). An application of Lemma 2, assuming that Z_A ~ N(0, 1), shows that a = −γ_W/3 ≐ SKEW(ℓ̇_0)/6 for the translation family θ̂ = θ + W, reverifying (4.4). [If Z_A ~ N(0, 1) in (10.9), then a must equal ε, the constant value of ε_θ, (10.1), for the translation family θ̂ = θ + W; one can show directly that ε ≐ −γ_W/3 for such a family.]

In the example θ̂ ~ θχ²_19/19, the two constants ẑ_0 and â are nearly equal. This is no fluke.

Theorem 2. If θ̂ is the MLE of θ in a one-parameter problem having standard asymptotic properties (5.1), (5.3), then z_0 ≐ a,

    z_0 = Φ^{−1} Pr_θ{θ̂ < θ} = [SKEW_θ(ℓ̇_θ)/6][1 + O(n^{−1})].    (10.10)

Proof. We follow the notation and results of DiCiccio (1984): thus k_1, k_2, k_3 equal the first three cumulants of ℓ̇_0 under θ; k_{01}, k_{02}, k_{03} the first three cumulants of ℓ̈_0; k_{001} the first cumulant of ℓ⃛_0; and k_{11} = cov_θ(ℓ̇_0, ℓ̈_0). (So k_2 = i_0, the Fisher information.) All cumulants are assumed to be O(n). Then the relative bias of θ̂ is

    b ≡ E_0(θ̂) k_2^{1/2} = (k_{001} − 2k_3)/(6 k_2^{3/2}) + O(n^{−3/2}),    (10.11)
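The normalization g_A(t) = (exp(At) − 1)/A is easy to check by simulation. The sketch below is my illustration, not from the paper: it builds a skewed translation variable W = Z + (γ_W/6)Z², applies g_A with A = −γ_W/3, and confirms that most of the skewness is removed.

```python
import math
import random

def skew(xs):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(1)
gamma_w = 0.6  # illustrative target skewness for W
# W = q(Z) with q(0) = 0, q'(0) = 1, and quadratic coefficient gamma_w/6
z = [random.gauss(0.0, 1.0) for _ in range(200_000)]
w = [zi + (gamma_w / 6.0) * zi ** 2 for zi in z]

A = -skew(w) / 3.0  # the suggested choice A = -gamma_W / 3
za = [(math.exp(A * wi) - 1.0) / A for wi in w]  # Z_A = g_A(W)

print(round(skew(w), 2), round(skew(za), 2))  # skew(Z_A) should be near 0
```

The quadratic term of W cancels in Z_A, which is what makes the transformed variable approximately normal.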
Efron: Better Bootstrap Confidence Intervals 183

and θ̂ has skewness

    γ_0 = (k_{001} − k_3)/k_2^{3/2} + O(n^{−3/2}).    (10.12)

Both b and γ_0 are O(n^{−1/2}). Standard Edgeworth theory now gives

    Pr_0{θ̂ < 0} = Φ(−b) − (γ_0/6) φ(b)(b² − 1) + O(n^{−3/2})
                 = .5 + φ(0)[(2k_3 − k_{001}) + (k_{001} − k_3)]/(6 k_2^{3/2}) + O(n^{−3/2})
                 = .5 + φ(0) k_3/(6 k_2^{3/2}) + O(n^{−3/2}).

Since SKEW_0(ℓ̇_0) = k_3/k_2^{3/2}, this verifies (10.10).

In multiparameter problems it is no longer true that z_0 ≐ a. The geometry of the level surface Ĉ_θ̂ adds another term to z_0, as in (8.8).

11. REMARKS

Remark A. Suppose that instead of (3.6), (3.7) we have σ_φ = τ(1 + Aφ), so σ_0 = τ (τ ≠ 1). The transformations φ′ = φ/τ, φ̂′ = φ̂/τ give φ̂′ = φ′ + σ_{φ′}(Z − z_0), where σ_{φ′} = 1 + aφ′ and a = Aτ, so we are back in form (3.6), (3.7). Notice that the derivative d(σ_φ/σ_0)/d(φ/σ_0) = a, as in (4.7). In a similar way we can transform (3.6), (3.7) so that σ_{φ_0} = 1 at any point φ_0; the resulting value of a satisfies (4.7).

Remark B. Instead of using φ̂ to estimate φ in (3.6), (3.7) we might change to the estimator φ̂^(c) = φ̂ − cσ_{φ̂}, for some constant c. It turns out that we are still in situation (3.6), (3.7), φ̂^(c) = φ + σ_φ^(c)(Z − z^(c)), where

    σ_φ^(c) = 1 + a^(c)(φ − φ^(c)),  φ^(c) = c/(1 − ac),    (11.1)

and a^(c) = a(1 − ac), z^(c) ≐ z_0 + c(1 − az_0). The choice c = −z_0/(1 − az_0) gives z^(c) = 0, as in (10.3), (10.4). The choice c = a gives approximately the MLE of φ. Interestingly enough, the BC_a interval for φ based on φ̂^(c) is the same for all choices of c. Minor changes in the choice of estimator seem to have little effect on the BC_a intervals in general, though for computational reasons it is best not to use very biased estimators having large values of z_0.

Remark C. Section 6 uses the MLE θ̂ = t(η̂). This has one major advantage: the BC_a interval for θ, based on θ̂, stays the same under all multivariate transformations (6.11). Stein (1956) noted that the least favorable direction μ̂ transforms in the obvious way under (6.11), μ̂ → Dμ̂, where D is the matrix with ijth element ∂η̃_j/∂η_i evaluated at η̂, from which it is easy to check that formula (6.5) is invariant: the constant a is assigned the same value no matter what transformations (6.11) are applied. The bootstrap distribution Ĝ is similarly invariant, as shown in Efron (1985), and so is ẑ_0. This implies that the BC_a intervals are invariant under transformations (6.11).

Remark D. The multiparameter theory of Section 5 gives an interesting result when applied to location-scale families; y = (x, s), η = (θ, σ), and family of densities f_η(y) of the form

    f_{θ,σ}(x, s) = (1/σ²) f_{0,1}((x − θ)/σ, s/σ),    (11.2)

f_{0,1}(x, s) being a known bivariate density function.

Suppose that we wish to set a confidence interval for the location parameter θ on the basis of its MLE θ̂. Parametric bootstrap intervals are based on the distribution of θ̂* when sampling from f_{θ̂,σ̂}(x*, s*). The BC interval essentially amounts to pretending that σ is known (and equal to σ̂) in (11.2) and that we have only a location problem to deal with, rather than a location-scale problem. In contrast, the BC_a interval takes account of the fact that σ is unknown. In particular the least favorable direction μ̂, plotted in the (θ, σ) plane, is not parallel to the θ axis. It has a component in the σ direction, whose magnitude is determined by the correlation between x and s. This means that Stein's least favorable family (6.4) does not treat σ as a constant.

Table 6 relates to the following choice of f_{0,1}(x, s):

    x ~ χ²_30/30 − 1,  s | x ~ (1 + x)(χ²_14/14)^{1/2},    (11.3)

the two χ² variates being independent. This is a computationally more tractable version of the problem discussed in Efron (1982a, tables 4 and 5). Approximate central 90% intervals are given for θ, having observed (x, s) = (0, 1). For any other observed (x, s) the intervals transform in the obvious way, θ_{x,s}[α] = x + s·θ_{0,1}[α]. Line 3 of Table 6 shows the exact interval, based on inverting the distribution of the pivotal quantity T = (θ̂ − θ)/σ̂ for situations (11.2), (11.3).

In this case the BC_a method makes a large "second-order" correction, as in Example 2 of Section 6, shifting the BC interval a considerable way rightward and achieving the correct R/L ratio. The length of the BC_a interval is 90% the length of the T interval. This deficiency is a third-order effect, in the spirit of the familiar Student-t correction. It arises from the variability of σ̂ as an estimate of σ, rather than the second-order effect due to the correlation of σ̂ with θ̂.

Table 6. Central 90% Intervals for θ, Having Observed (x, s) = (0, 1) From the Location-Scale Family (11.2), (11.3), so θ̂ = 0 and σ̂ = .966

                       Interval         R/L    Length
    1. BC interval     [-.336, .501]    1.49    .837
    2. BC_a interval   [-.303, .603]    1.99    .906
    3. T interval      [-.336, .670]    1.99   1.006

NOTE: Line 3 is based on the actual distribution of the pivotal quantity T = (θ̂ − θ)/σ̂.

Remark E. Section 3 says that the family θ̂ ~ θχ²_19/19 can be mapped into form (3.6), (3.7). What are the appropriate mappings? It simplifies the problem to consider the equivalent family θ̂ ~ θ(χ²_19/c_0), where c_0 = 18.3337 = median(χ²_19). Then φ̂ = g_1(θ̂), φ = g_1(θ), and W = g_1(χ²_19/c_0) give a translation family (3.12), with median(W)

184 Journal of the American Statistical Association, March 1987

= 0, for any mapping g_1(t) = (log t)/c_1. Choosing c_1 = .3292 results in W = q(Z) having q(0) = 0, q′(0) = 1, as in the discussion of translation families in Section 10.

Section 10 suggests normalizing a translation family by g_A(t) = (exp(At) − 1)/A, a good choice for A being the constant ε_0, (10.1), which equals .1090 for all θ in the family θ̂ ~ θ(χ²_19/c_0). The combined transformation g(t) = g_A(g_1(t)) is g(t) = 9.1746[t^{.3311} − 1]. The transformed family φ̂ = g(θ̂), φ = g(θ) is of form (3.6), (3.7),

    φ̂ = φ + (1 + .1090φ)Z,  Z = 9.1746[(χ²_19/c_0)^{.3311} − 1].    (11.4)

Numerical calculations verify that Z as defined in (11.4) is very close to a standard normal variate. In fact we have automatically recovered, nearly, the Wilson-Hilferty cube root transformation (Johnson and Kotz 1970). Using (11.4), it is not difficult to show that g(t), as defined previously, gives approximately (3.6), (3.7) when applied to the family θ̂ ~ θ(χ²_19/19) considered in Section 3, with constants z_0 and a as stated. Schenker (1985) gave almost the same result.

Remark F. Suppose that y = (x_1, x_2, . . . , x_n), where the x_i are an iid sample from a regular one-parameter family f_θ(x_i), and that θ̂(y) is a first-order efficient estimator of θ, like the MLE. The score function ℓ̇_θ appearing in (4.4) is that based just on θ̂, rather than the score function based on the entire data set y. However, it is easy to show from considerations like those in Efron (1975) that the two score functions are asymptotically identical. Their skewnesses differ only by amount O_p(n^{−1}). It is often more convenient to calculate a from the score function for y rather than for θ̂, as was done, for example, in (6.5).

Remark G. McCullagh (1984) and Cox (1980) gave an interesting approximate confidence interval for θ, which for the simple case θ̂ ~ f_θ has endpoint

    θ_APP[α] ≐ θ̂ + σ̂[z^(α) + (k_3/6k_2^{3/2})(z^(α)² + 1) + · · ·],    (11.5)

for θ̂ = O(1) [i.e., for θ̂ a bounded function of n, in the asymptotic sequence of situations referred to in (5.3)] (see also Barndorff-Nielsen 1984).

It can be shown, as indicated in Section 12, that θ_BCa[α] also closely matches (11.5), (θ_BCa[α] − θ_APP[α])/σ̂ = O_p(n^{−1}). We see again that the BC_a method offers a way to avoid theoretical effort, at the expense of increased numerical computations.

12. PROOF OF THEOREM 1

A monotonic mapping φ = g(θ), φ̂ = g(θ̂) transforms the exact confidence interval in the obvious way, φ_EX[α] = g(θ_EX[α]); likewise for the BC_a interval. By using such a mapping we can always make θ̂ = 0 and the distribution of θ̂ given θ = 0 perfectly normal. Because of (5.3), which says that the distributions of θ̂ are approaching normality at the usual O(n^{−1/2}) rate, the normalizing transformation g is asymptotically linear, g(θ) = θ + c_2θ² + c_3θ³ + · · ·, c_2 = O(n^{−1/2}), c_3 = O(n^{−1}).

We will assume that the problem is already in the form θ̂ = 0, with the cdf of θ̂ for θ = 0 normal, say

    G_0 ~ N(−z_0, 1).    (12.1)

Here z_0 = Φ^{−1} Pr_0{θ̂ < 0} must be included because it is not affected by any monotonic transformations; z_0 ≐ γ_0/6 is O(n^{−1/2}) by (5.3). A simple exercise, using the mean value theorem of calculus, shows that if (5.4) is true in the transformed problem (12.1), then it is true in the original problem.

Assuming (5.3), θ̂ = 0, and (12.1), we will show that

    θ_EX[α] ≐ (z_0 + z^(α))/(1 − ε̇_0(z_0 + z^(α))) + β̇_0 + (γ̇_0/6)(z^(α)² − 1) + (ä_0/2)(z_0 + z^(α))³,    (12.2)

compared with

    θ_BCa[α] ≐ (ẑ_0 + z^(α))/(1 − â(ẑ_0 + z^(α)))    (12.3)

for the BC_a interval. In this section the symbol "≐" indicates accuracy through O(n^{−1}) or O_p(n^{−1}), with errors O(n^{−3/2}) or O_p(n^{−3/2}). Then

    θ_BCa[α] − θ_EX[α] ≐ θ_BCa[α]{γ̇_0 z_0 + β̇_0 + (γ̇_0/6)(z^(α)² − 1)} − (ä_0/2)(z_0 + z^(α))³,    (12.4)

which is O_p(n^{−1}), as claimed in Theorem 1.

The proof of (12.2) begins by noting that (12.1) implies that μ_0 = −z_0, σ_0 = 1, γ_0 = 0, β_0 = 0. Then (5.3) gives

    μ_θ ≐ (1 + β̇_0)θ − z_0,  σ_θ ≐ 1 + ε̇_0θ + (ä_0/2)θ²,
    γ_θ ≐ γ̇_0θ,  β_θ ≐ 0,    (12.5)

for θ = O(1) [i.e., for θ a bounded function of n, in the asymptotic sequence of situations referred to in (5.3)], the dotted quantities ε̇_0, β̇_0, γ̇_0, ä_0 being derivatives evaluated at θ = 0. The 100·α percentile of θ̂ given θ is

    θ̂^(α)(θ) ≐ μ_θ + σ_θ{z^(α) + (γ_θ/6)(z^(α)² − 1)}
             ≐ [(1 + β̇_0)θ − z_0] + [1 + ε̇_0θ + (ä_0/2)θ²][z^(α) + (γ̇_0θ/6)(z^(α)² − 1)],    (12.6)

using a Cornish-Fisher expansion and (12.5). The θ, however, that has θ̂^(α)(θ) = 0 is by definition θ_EX[1 − α]. Solving the lower expression in (12.6) for θ and substituting 1 − α for α gives (12.2).

The proof of (12.3) follows from (3.8), (3.9), and (12.1) [which says that Ĝ ~ N(−ẑ_0, 1)], if we can establish that
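The near-normality claimed in Remark E can be verified directly. The following simulation is mine; the constants 18.3337, 9.1746, and .3311 are those quoted in the text:

```python
import random
from statistics import fmean, stdev

random.seed(3)
c0 = 18.3337  # approximately median(chi-squared on 19 df)

# chi^2_19 is Gamma(shape 19/2, scale 2); apply Z = 9.1746 * [(chi2/c0)^0.3311 - 1]
draws = [random.gammavariate(19 / 2, 2.0) for _ in range(200_000)]
z = [9.1746 * ((x / c0) ** 0.3311 - 1.0) for x in draws]

print(round(fmean(z), 3), round(stdev(z), 3))  # should be close to (0, 1)
```

The exponent .3311 is nearly 1/3, which is why the construction essentially reproduces the Wilson-Hilferty cube root approximation.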
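A hedged sketch of that recipe (the helper name and the exponential example are mine): estimate the acceleration from the empirical skewness of the per-observation score contributions, â = Σ(s_i − s̄)³ / {6[Σ(s_i − s̄)²]^{3/2}}, the same form as Efron's jackknife estimate in the nonparametric case:

```python
import random

def a_hat_from_scores(scores):
    """Acceleration estimate: skewness of the score contributions over 6.
    The 1/n normalizations of the central moments cancel in this ratio,
    leaving the factor n^{-1/2} that makes a_hat = SKEW/(6 sqrt(n))."""
    n = len(scores)
    m = sum(scores) / n
    num = sum((s - m) ** 3 for s in scores)
    den = sum((s - m) ** 2 for s in scores) ** 1.5
    return num / (6.0 * den)

random.seed(4)
n = 20_000
x = [random.expovariate(1.0) for _ in range(n)]
xbar = sum(x) / n
# Per-observation score for the exponential mean, up to a constant factor
# that cancels in the ratio above.
scores = [xi - xbar for xi in x]
a_hat = a_hat_from_scores(scores)
print(a_hat)  # theory for this family: 1/(3*sqrt(n)), about 0.0024 here
```

For iid data this is usually much cheaper than working with the distribution of θ̂ itself, which is the computational point of Remark F.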


a ≐ â. In fact, we show below that

    ε̂_0 ≐ ε_0  for θ̂ = O(n^{−1/2}),    (12.7)

which combines with a = ε_0/(1 + ε_0z_0) ≐ ε_0 to give the required result.

Formula (12.7) follows from (12.5), which gives the simpler expressions

    μ_θ ≐ θ − z_0,  σ_θ ≐ 1 + ε̇_0θ,  γ_θ ≐ γ̇_0θ,  β_θ ≐ 0,    (12.8)

for θ = O(n^{−1/2}). The cdf of θ̂ given θ is calculated to be

    G_θ(θ̂) ≐ Φ(z_θ) − (γ_θ/6) φ(z_θ)(z_θ² − 1),  z_θ ≡ (θ̂ − μ_θ)/σ_θ.    (12.9)

Straightforward expansions give

    D(z^(α), θ) ≐ [1 + ε̇_0z^(α) + β̇_0 + (γ̇_0/6)(z^(α)² − 1)] / (1 + β̇_0 − γ̇_0/6),    (12.10)

from which ε_0 = (∂/∂z)D(z, θ)|_{z=0} ≐ ε̇_0/(1 + β̇_0 − γ̇_0/6), verifying (12.7), (12.3), and the main result (12.4).

The proof that θ_BCa[α] also matches the Cox-McCullagh formula (11.5) is similar to the proof of Theorem 1 and will not be presented here. The main step is an expression for θ_BCa[α] involving Lemma 5,

    θ_BCa[α] ≐ z^(α) + (k_3/6k_2^{3/2}){z^(α)² + 1} + (k_3/6k_2^{3/2})²{2z^(α) + z^(α)³}.    (12.11)

[Received November 1984. Revised December 1985.]

REFERENCES

Abramovitch, L., and Singh, K. (1985), "Edgeworth Corrected Pivotal Statistics and the Bootstrap," The Annals of Statistics, 13, 116-132.
Barndorff-Nielsen, O. E. (1984), "Confidence Limits From c|ĵ|^{1/2}L̄," Report 104, University of Aarhus, Dept. of Theoretical Statistics.
Bartlett, M. S. (1953), "Approximate Confidence Intervals," Biometrika, 40, 12-19.
Beran, R. (1984a), "Bootstrap Methods in Statistics," Jber. d. Dt. Math.-Verein., 86, 14-30.
(1984b), "Jackknife Approximations to Bootstrap Estimates," The Annals of Statistics, 12, 101-118.
Bickel, P. J., and Freedman, D. A. (1981), "Some Asymptotic Theory for the Bootstrap," The Annals of Statistics, 9, 1196-1217.
Cox, D. R. (1980), "Local Ancillarity," Biometrika, 67, 279-286.
DiCiccio, T. J. (1984), "On Parameter Transformations and Interval Estimation," technical report, McMaster University, Dept. of Mathematical Science.
Efron, B. (1975), "Defining the Curvature of a Statistical Problem (With Applications to Second Order Efficiency)" (with discussion), The Annals of Statistics, 3, 1189-1242.
(1979), "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.
(1981), "Nonparametric Standard Errors and Confidence Intervals" (with discussion), Canadian Journal of Statistics, 9, 139-172.
(1982a), "The Jackknife, the Bootstrap, and Other Resampling Plans," CBMS 38, SIAM-NSF.
(1982b), "Transformation Theory: How Normal Is a Family of Distributions?," The Annals of Statistics, 10, 323-339. (NOTE: Corrigenda, The Annals of Statistics, 10, 1032.)
(1984), "Comparing Non-nested Linear Models," Journal of the American Statistical Association, 79, 791-803.
(1985), "Bootstrap Confidence Intervals for a Class of Parametric Problems," Biometrika, 72, 45-58.
Fieller, E. C. (1954), "Some Problems in Interval Estimation," Journal of the Royal Statistical Society, Ser. B, 16, 175-183.
Hall, P. (1983), "Inverting an Edgeworth Expansion," The Annals of Statistics, 11, 569-576.
Hougaard, P. (1982), "Parameterizations of Non-linear Models," Journal of the Royal Statistical Society, Ser. B, 44, 244-252.
Johnson, N. J. (1978), "Modified t Tests and Confidence Intervals for Asymmetrical Populations," Journal of the American Statistical Association, 73, 536-544.
Johnson, N. L., and Kotz, S. (1970), Continuous Univariate Distributions-2, Boston: Houghton-Mifflin.
Kendall, M., and Stuart, A. (1958), The Advanced Theory of Statistics, London: Charles W. Griffin.
McCullagh, P. (1984), "Local Sufficiency," Biometrika, 71, 233-244.
Schenker, N. (1985), "Qualms About Bootstrap Confidence Intervals," Journal of the American Statistical Association, 80, 360-361.
Singh, K. (1981), "On the Asymptotic Accuracy of Efron's Bootstrap," The Annals of Statistics, 9, 1187-1195.
Stein, C. (1956), "Efficient Nonparametric Testing and Estimation," in Proceedings of the Third Berkeley Symposium, Berkeley: University of California Press, pp. 187-196.
Tukey, J. (1949), "Standard Confidence Points," Memorandum Report 26, unpublished address presented to the Institute of Mathematical Statistics.
Withers, C. S. (1983), "Expansions for the Distribution and Quantiles of a Regular Functional of the Empirical Distribution With Applications to Nonparametric Confidence Intervals," The Annals of Statistics, 11, 577-587.
