This Content Downloaded From 140.213.190.131 On Tue, 13 Apr 2021 09:26:31 UTC
Statistical Accuracy
Author(s): B. Efron and R. Tibshirani
Source: Statistical Science, Vol. 1, No. 1 (Feb., 1986), pp. 54-75
Published by: Institute of Mathematical Statistics
Let F̂ indicate the empirical probability distribution,

(1.4)  F̂: probability mass 1/n on x1, x2, ..., xn.

Then we can simply replace F by F̂ in (1.2), obtaining

(1.5)  σ̂ = σ(F̂) = [μ2(F̂)/n]^(1/2)

as the estimated standard error for x̄. This is the bootstrap estimate. The reason for the name "bootstrap" will be apparent in Section 2, when we evaluate σ(F) for statistics more complicated than x̄. Since

(1.6)  μ2(F̂) = Σi (xi − x̄)²/n,

σ̂ is not quite the same as σ̄, but the difference is too small to be important in most applications.

Of course we do not really need an alternative formula to (1.3) in this case. The trouble begins when we want a standard error for estimators more complicated than x̄, for example, a median or a correlation or a slope coefficient from a robust regression. In most cases there is no equivalent to formula (1.2), which expresses the standard error σ(F) as a simple function of the sampling distribution F. As a result, formulas like (1.3) do not exist for most statistics.

This is where the computer comes in. It turns out that we can always numerically evaluate the bootstrap estimate σ̂ = σ(F̂), without knowing a simple expression for σ(F). The evaluation of σ̂ is a straightforward Monte Carlo exercise described in the next section. In a good computing environment, as described in the remarks in Section 2, the bootstrap effectively gives the statistician a simple formula like (1.3) for any statistic, no matter how complicated.

Standard errors are crude but useful measures of statistical accuracy. They are frequently used to give approximate confidence intervals for an unknown parameter θ,

(1.7)  θ ∈ θ̂ ± σ̂ z^(α).

[...]

We will see that bootstrap confidence intervals can automatically incorporate tricks like this, without requiring the data analyst to produce special techniques, like the tanh⁻¹ transformation, for each new situation. An important theme of what follows is the substitution of raw computing power for theoretical analysis. This is not an argument against theory, of course, only against unnecessary theory. Most common statistical methods were developed in the 1920s and 1930s, when computation was slow and expensive. Now that computation is fast and cheap we can hope for and expect changes in statistical methodology. This paper discusses one such potential change; Efron (1979b) discusses several others.

2. THE BOOTSTRAP ESTIMATE OF STANDARD ERROR

This section presents a more careful description of the bootstrap estimate of standard error. For now we will assume that the observed data y = (x1, x2, ..., xn) consists of independent and identically distributed (iid) observations X1, X2, ..., Xn ~iid F, as in (1.1). Here F represents an unknown probability distribution on 𝒳, the common sample space of the observations. We have a statistic of interest, say θ̂(y), to which we wish to assign an estimated standard error.

Fig. 1 shows an example. The sample space 𝒳 is ℛ²₊, the positive quadrant of the plane. We have observed n = 15 bivariate data points, each corresponding to an American law school. Each point xi consists of two summary statistics for the 1973 entering class at law school i,

(2.1)  xi = (LSATi, GPAi);

LSATi is the class' average score on a nationwide exam called "LSAT"; GPAi is the class' average undergraduate grades. The observed Pearson correlation coefficient for these 15 points is θ̂ = .776. We wish to assign a standard error to this estimate.
Let σ(F) indicate the standard error of θ̂, as a function of the unknown sampling distribution F,

(2.2)  σ(F) = [Var_F{θ̂(y)}]^(1/2).

Of course σ(F) is also a function of the sample size n and the form of the statistic θ̂(y), but since both of these are known they need not be indicated in the notation. The bootstrap estimate of standard error is

(2.3)  σ̂ = σ(F̂),

where F̂ is the empirical distribution (1.4), putting probability 1/n on each observed data point xi. In the law school example, F̂ is the distribution putting mass 1/15 on each point in Fig. 1, and σ̂ is the standard deviation of the correlation coefficient for 15 iid points drawn from F̂.

In most cases, including that of the correlation coefficient, there is no simple expression for the function σ(F) in (2.2). Nevertheless, it is easy to numerically evaluate σ̂ = σ(F̂) by means of a Monte Carlo algorithm, which depends on the following notation: y* = (x1*, x2*, ..., xn*) indicates n independent draws from F̂, called a bootstrap sample. Because F̂ is the empirical distribution of the data, a bootstrap sample turns out to be the same as a random sample of size n drawn with replacement from the actual sample {x1, x2, ..., xn}.

The Monte Carlo algorithm proceeds in three steps: (i) using a random number generator, independently draw a large number of bootstrap samples, say y*(1), y*(2), ..., y*(B); (ii) for each bootstrap sample y*(b), evaluate the statistic of interest, say θ̂*(b) = θ̂(y*(b)), b = 1, 2, ..., B; and (iii) calculate the sample standard deviation of the θ̂*(b) values,

(2.4)  σ̂_B = [Σ_{b=1}^B {θ̂*(b) − θ̂*(·)}² / (B − 1)]^(1/2),  where θ̂*(·) = Σ_{b=1}^B θ̂*(b)/B.

It is easy to see that as B → ∞, σ̂_B will approach σ̂ = σ(F̂), the bootstrap estimate of standard error. All we are doing is evaluating a standard deviation by Monte Carlo sampling. Later, in Section 9, we [...]

[...] the Monte Carlo algorithm will not converge to σ̂ if the bootstrap sample size differs from the true n. Bickel and Freedman (1981) show how to correct the algorithm to give σ̂ if in fact the bootstrap sample size is taken different than n, but so far there does not seem to be any practical advantage to be gained in this way.

Fig. 2 shows the histogram of B = 1000 bootstrap replications of the correlation coefficient from the law school data. For convenient reference the abscissa is plotted in terms of θ̂* − θ̂ = θ̂* − .776. Formula (2.4) gives σ̂ = .127 as the bootstrap estimate of standard error. This can be compared with the usual normal theory estimate of standard error for θ̂,

(2.5)  σ̂_NORM = (1 − θ̂²)/(n − 3)^(1/2) = .115

[Johnson and Kotz (1970, p. 229)].

REMARK. The Monte Carlo algorithm leading to σ̂_B (2.4) is simple to program. On the Stanford version of the statistical computing language S, Professor Arthur Owen has introduced a single command which bootstraps any statistic in the S catalog. For instance the bootstrap results in Fig. 2 are obtained simply by typing

    tboot(lawdata, correlation, B = 1000).

The execution time is about a factor of B greater than that for the original computation.

There is another way to describe the bootstrap standard error: F̂ is the nonparametric maximum likelihood estimate (MLE) of the unknown distribution F (Kiefer and Wolfowitz, 1956). This means that the bootstrap estimate σ̂ = σ(F̂) is the nonparametric MLE of σ(F), the true standard error.

In fact there is nothing which says that the bootstrap must be carried out nonparametrically. Suppose for instance that in the law school example we believe the true sampling distribution F must be bivariate normal. Then we could estimate F with its parametric MLE F̂_NORM, the bivariate normal distribution having the same mean vector and covariance matrix as the data. The bootstrap samples at step (i) of the algorithm could then be drawn from F̂_NORM instead of F̂, and steps (ii) and (iii) carried out as before.
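The parametric recipe just described is easy to sketch in code. The following Python fragment is an illustration under assumed inputs: simulated points stand in for the law school sample of Fig. 1, and NumPy's generator performs step (i) by drawing from the fitted bivariate normal F̂_NORM rather than from F̂.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical bivariate sample standing in for the n = 15 law school points.
n = 15
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)

# Parametric MLE F-hat-NORM: bivariate normal with the sample's mean vector
# and maximum likelihood (divisor n) covariance matrix.
mu_hat = x.mean(axis=0)
cov_hat = np.cov(x, rowvar=False, bias=True)

B = 1000
theta_star = np.empty(B)
for b in range(B):
    xb = rng.multivariate_normal(mu_hat, cov_hat, size=n)  # step (i), from F-hat-NORM
    theta_star[b] = np.corrcoef(xb[:, 0], xb[:, 1])[0, 1]  # step (ii)

se_norm_boot = theta_star.std(ddof=1)                      # step (iii), formula (2.4)
```

Only the sampling step changes; steps (ii) and (iii) are identical to the nonparametric algorithm.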
The smooth curve in Fig. 2 shows the results of carrying out this "normal theory bootstrap" on the law school data. Actually there is no need to do the bootstrap sampling in this case, because of Fisher's formula for the sampling density of a correlation coefficient in the bivariate normal situation (see Chapter 32 of Johnson and Kotz, 1970). This density can be thought of as the bootstrap distribution for B = ∞. Expression (2.5) is a close approximation to σ̂_NORM = σ(F̂_NORM), the parametric bootstrap estimate of standard error.

In considering the merits or demerits of the bootstrap, it is worth remembering that all of the usual formulas for estimating standard errors, like ℐ^(−1/2) where ℐ is the observed Fisher information, are essentially bootstrap estimates carried out in a parametric framework. This point is carefully explained in Section 5 of Efron (1982c). The straightforward nonparametric algorithm (i)-(iii) has the virtues of avoiding all parametric assumptions, all approximations (such as those involved with the Fisher information expression for the standard error of an MLE), and in fact all analytic difficulties of any kind. The data analyst is free to obtain standard errors for enormously complicated estimators, subject only to the constraints of computer time. Sections 3 and 6 discuss some interesting applied problems which are far too complicated for standard analyses.

How well does the bootstrap work? Table 1 shows the answer in one situation. Here 𝒳 is the real line, n = 15, and the statistic θ̂ of interest is the 25% trimmed mean. If the true sampling distribution F is N(0, 1), then the true standard error is σ(F) = .286. The bootstrap estimate σ̂ is nearly unbiased, averaging .287 in a large sampling experiment. The standard deviation of the bootstrap estimate σ̂ is itself .071 in this case, with coefficient of variation .071/.287 = .25. (Notice that there are two levels of Monte Carlo involved in Table 1: first drawing the actual samples y = (x1, x2, ..., x15) from F, and then drawing bootstrap samples (x1*, x2*, ..., x15*) with y held fixed. The bootstrap samples evaluate σ̂ for a fixed value of y. The standard deviation .071 refers to the variability of σ̂ due to the random choice of y.)

The jackknife, another common method of assigning nonparametric standard errors, is discussed in Section 10. The jackknife estimate σ̂_J is also nearly unbiased for σ(F), but has higher coefficient of variation (CV). The minimum possible CV for a scale-invariant estimate of σ(F), assuming full knowledge of the parametric model, is shown in brackets. The nonparametric bootstrap is seen to be moderately efficient in both cases considered in Table 1.

Table 2 returns to the case of θ̂ the correlation coefficient. Instead of real data we have a sampling experiment in which the true F is bivariate normal, true correlation θ = .50, sample size n = 14. Table 2 is abstracted from a larger table in Efron (1981b), in which some of the methods for estimating a standard error required the sample size to be even.

TABLE 1
A sampling experiment comparing the bootstrap and jackknife estimates of standard error for the 25% trimmed mean, sample size n = 15

                          F standard normal        F negative exponential
                          Ave     SD     CV        Ave     SD     CV
Bootstrap σ̂ (B = 200)     .287    .071   .25       .242    .078   .32
Jackknife σ̂_J             .280    .084   .30       .224    .085   .38
True (minimum CV)         .286          (.19)      .232          (.27)
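For one simulated sample of the Table 1 setting, steps (i)-(iii) amount to only a few lines. This Python sketch (illustrative, not the authors' code) produces a single realization of σ̂_B; the table's entries average many such realizations over repeated draws of y.

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(1)

# One sample of n = 15 from the true F = N(0, 1), as in Table 1.
n = 15
x = rng.normal(size=n)

# Steps (i)-(iii): B bootstrap samples drawn with replacement, the 25%
# trimmed mean evaluated on each, then the sample standard deviation (2.4).
B = 200
theta_star = np.array([trim_mean(x[rng.integers(0, n, size=n)], 0.25)
                       for _ in range(B)])
se_boot = theta_star.std(ddof=1)   # one realization of sigma-hat-B
```

Repeating the whole computation for many samples y, and taking the average and standard deviation of the resulting se_boot values, reproduces the two-level experiment summarized in Table 1.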
TABLE 2
Estimates of standard error for the correlation coefficient θ̂ and for φ̂ = tanh⁻¹ θ̂; sample size n = 14, distribution bivariate normal with true correlation ρ = .5 (from a larger table in Efron, 1981b)

                                        θ̂                          φ̂
                                        Ave   SD   CV   √MSE       Ave   SD   CV   √MSE
1. Bootstrap B = 128                    .206  .066 .32  .067       .301  .065 .22  .065
2. Bootstrap B = 512                    .206  .063 .31  .064       .301  .062 .21  .062
3. Normal smoothed bootstrap B = 128    .200  .060 .30  .063       .296  .041 .14  .041
4. Uniform smoothed bootstrap B = 128   .205  .061 .30  .062       .298  .058 .19  .058
5. Uniform smoothed bootstrap B = 512   .205  .059 .29  .060       .296  .052 .18  .052
The left side of Table 2 refers to θ̂, while the right side refers to φ̂ = tanh⁻¹(θ̂) = .5 log[(1 + θ̂)/(1 − θ̂)]. For each estimator of standard error, the root mean squared error of estimation [E(σ̂ − σ)²]^(1/2) is given in the column headed √MSE.

The bootstrap was run with B = 128 and also with B = 512, the latter value yielding only slightly better estimates, in accordance with the results of Section 9. Further increasing B would be pointless. It can be shown that B = ∞ gives √MSE = .063 for θ̂, only .001 less than B = 512. The normal theory estimate (2.5), which we know to be ideal for this sampling experiment, has √MSE = .056.

We can compromise between the totally nonparametric bootstrap estimate σ̂ and the totally parametric bootstrap estimate σ̂_NORM. This is done in lines 3, 4, and 5 of Table 2. Let Σ̂ = Σ_{i=1}^n (xi − x̄)(xi − x̄)′/n be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample from F̂ ⊕ N2(0, .25Σ̂), ⊕ indicating convolution. This amounts to estimating F by an equal mixture of the n distributions N2(xi, .25Σ̂), that is, by a normal window estimate. Each point xi* in a smoothed bootstrap sample is the sum of a randomly selected original data point xj plus an independent bivariate normal point zj ~ N2(0, .25Σ̂). Smoothing makes little difference on the left side of the table, but is spectacularly effective in the φ̂ case. The latter result is suspect, since the true sampling distribution is bivariate normal, and the function φ = tanh⁻¹ θ is specifically chosen to have nearly constant standard error in the bivariate normal family. The uniform smoothed bootstrap samples from F̂ ⊕ 𝒰(0, .25Σ̂), where 𝒰(0, .25Σ̂) is the uniform distribution on a rhombus selected so 𝒰 has mean vector 0 and covariance matrix .25Σ̂. It yields moderate reductions in √MSE for both sides of the table.

Line 6 of Table 2 refers to the delta method, which is the most common method of assigning nonparametric standard errors. Surprisingly enough, it is badly biased downward on both sides of the table. The delta method, also known as the method of statistical differentials, the Taylor series method, and the infinitesimal jackknife, is discussed in Section 10.

3. EXAMPLES

In this section we apply bootstrap standard error estimation to some complicated statistics.

Example 1. Cox's Proportional Hazards Model

The data for this example come from a study of leukemia remission times in mice, taken from Cox (1972). They consist of measurements of remission time (y) in weeks for two groups, treatment (x = 0) and control (x = 1), and a 0-1 variable (δi) indicating whether or not the remission time is censored (0) or complete (1). There are 21 mice in each group.

The standard regression model for censored data is Cox's proportional hazards model (Cox, 1972). It assumes that the hazard function h(t | x), the probability of going into remission in the next instant given no remission up to time t for a mouse with covariate x, is of the form

(3.1)  h(t | x) = h0(t) e^(βx).

Here h0(t) is an arbitrary unspecified function. Since x here is a group indicator, this means simply that the hazard for the control group is e^β times the hazard for the treatment group. The regression parameter β is estimated independently of h0(t) through maximization of the so called "partial likelihood"

(3.2)  PL = Π_{i∈D} [e^(βxi) / Σ_{j∈Ri} e^(βxj)],

where D is the set of indices of the failure times and Ri is the set of indices of those at risk at time yi. This maximization requires an iterative computer search.

The estimate β̂ for these data turns out to be 1.51. Taken literally, this says that the hazard rate is e^1.51 = 4.33 times higher in the control group than in the treatment group, so the treatment is very effective. What is the standard error of β̂? The usual asymptotic maximum likelihood theory, one over the square root of the observed Fisher information, gives an estimate of .41. Despite the complicated nature of the estimation procedure, we can also estimate the standard error using the bootstrap. We sample with replacement from the triples {(y1, x1, δ1), ..., (y42, x42, δ42)}. For each bootstrap sample {(y1*, x1*, δ1*), ..., (y42*, x42*, δ42*)} we form the partial likelihood and numerically maximize it to produce the bootstrap estimate β̂*. A histogram of 1000 bootstrap values is shown in Fig. 3. The bootstrap estimate of the standard error of β̂ based on these 1000 numbers is .42. Although the bootstrap and standard estimates agree, it is interesting to note that the bootstrap distribution is skewed to the right. This leads us to ask: is there other information that we can extract from the bootstrap distribution other than a standard error estimate? The answer is yes: in particular, the bootstrap distribution can be used to form a confidence interval for β, as we will see in Section 9. The shape of the bootstrap distribution will help determine the shape of the confidence interval.

In this example our resampling unit was the triple (yi, xi, δi), and we ignored the unique elements of the problem, i.e., the censoring, and the particular model being used. In fact, there are other ways to bootstrap [...]
[...]

(3.4)  E(Yi) = Σ_{j=1}^m sj(aj − xi).

[...]

TABLE 3
BHCG blood serum levels for 54 patients having metastasized breast cancer, in ascending order

0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3, 1.4, 1.5, 1.6,
1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.4,
2.4, 2.4, 2.4, 2.5, 2.5, 2.5, 2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1,
3.2, 3.2, 3.3, 3.3, 3.5, 4.4, 4.5, 6.4, 9.4
So far we have discussed statistical error, or accuracy, in terms of the standard error. It is easy to assess other measures of statistical error, such as bias or prediction error, using the bootstrap.

Consider the estimation of bias. For a given statistic θ̂(y) and a given parameter μ(F), let

(4.1)  R(y, F) = θ̂(y) − μ(F).

(It will help keep our notation clear to call the parameter of interest μ rather than θ.) For example, μ might be the mean of the distribution F, assuming the sample space 𝒳 is the real line, and θ̂ the 25% trimmed mean. The bias of θ̂ for estimating μ is

(4.2)  β(F) = E_F R(y, F) = E_F{θ̂(y)} − μ(F).

The notation E_F indicates expectation with respect to the probability mechanism appropriate to F, in this case y = (x1, x2, ..., xn) a random sample from F. The bootstrap estimate of bias is β̂ = β(F̂).

[...]

(4.5)  β̂ = 2.29 − 2.32 = −0.03,

according to (4.4). (The estimated standard deviation of β̂_B − β̂, due to the limitations of having B = 1000 bootstraps, is only 0.005 in this case, so we can ignore the difference between β̂_B and β̂.) Whether or not a bias of magnitude −0.03 is too large depends on the context of the problem. If we attempt to remove the bias by subtraction, we get θ̃ = θ̂ − β̂ = 2.24 − (−0.03) = 2.27. Removing bias in this way is frequently a bad idea (see Hinkley, 1978), but at least the bootstrap analysis has given us a reasonable picture of the bias and standard error of θ̂.
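The bootstrap bias computation can be sketched as follows (Python; an arbitrary simulated sample stands in for the serum data of Table 3, so the numbers will not match (4.5)).

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(4)
x = rng.exponential(size=54)        # hypothetical stand-in for the 54 serum values
n = len(x)

# Parameter of interest: mu(F-hat), the mean of the empirical distribution;
# estimator: the 25% trimmed mean, as in the text's example.
mu_hat = x.mean()

# Average of R(y*, F-hat) = theta-hat(y*) - mu(F-hat) over B bootstrap samples,
# estimating the bias beta(F) of (4.2).
B = 1000
R = np.empty(B)
for b in range(B):
    xb = x[rng.integers(0, n, size=n)]
    R[b] = trim_mean(xb, 0.25) - mu_hat

bias_boot = R.mean()
```

Subtracting bias_boot from the observed trimmed mean gives the bias-corrected estimate discussed (and cautioned against) above.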
Here is another measure of statistical accuracy, different from either bias or standard error. Let θ̂(y) be the 25% trimmed mean and μ(F) be the mean of F, as in the serum example, and also let î(y) be the interquartile range, the distance between the 25th and 75th percentiles of the sample y = (x1, x2, ..., xn). Define [...]

By keeping track of the empirical distribution of R(y*(b), F̂), we can pick off the values of ρ which make (4.9) equal .05 and .95. These approach ρ^(.05)(F̂) and ρ^(.95)(F̂) as B → ∞.

For the serum data, B = 1000 bootstrap replications gave ρ^(.05)(F̂) = −.303 and ρ^(.95)(F̂) = .078. Substituting these values into (4.9), and using the observed estimates θ̂ = 2.24, î = 1.40, gives

(4.10)  μ ∈ [2.13, 2.66]

as a central 90% "bootstrap t interval" for the true mean μ(F). This is considerably shorter than the standard t interval for μ based on 53 degrees of freedom, x̄ ± 1.67σ̄ = [1.97, 2.67]. Here σ̄ = .21 is the usual estimate of standard error (1.3).

Bootstrap confidence intervals are discussed further in Sections 7 and 8. They require more bootstrap replications than do bootstrap standard errors, on the order of B = 1000 rather than B = 50 or 100. This point is discussed briefly in Section 9.

By now it should be clear that we can use any random variable R(y, F) to measure accuracy, not just (4.1) or (4.6), and then estimate E_F{R(y, F)} by its bootstrap value E_F̂{R(y*, F̂)} ≈ Σ_{b=1}^B R(y*(b), F̂)/B. Similarly we can estimate E_F R(y, F)² by E_F̂ R(y*, F̂)², etc. Efron (1983) considers the prediction problem, in which a training set of data is used to construct a prediction rule. A naive estimate of the prediction rule's accuracy is the proportion of correct guesses it makes on its own training set, but this can be greatly overoptimistic, since the prediction rule is explicitly constructed to minimize errors on the training set. In this case, a natural choice of R(y, F) is the overoptimism, the difference between the naive estimate and the actual success rate of the prediction rule for new data. Efron (1983) gives the bootstrap estimate of overoptimism, and shows that it is closely related to cross-validation, the usual method of estimating overoptimism. The paper goes on to show that some modifications of the bootstrap estimate greatly outperform both cross-validation and the bootstrap.

[...]

(5.2)  θ̂ = median{Vj − Ui, i = 1, 2, ..., m, j = 1, 2, ..., n}.

Having observed U1 = u1, U2 = u2, ..., Vn = vn, we desire an estimate for σ(F, G), the standard error of θ̂.

The bootstrap estimate of σ(F, G) is σ̂ = σ(F̂, Ĝ), where F̂ is the empirical distribution of u1, u2, ..., um, and Ĝ is the empirical distribution of v1, v2, ..., vn. It is easy to modify the Monte Carlo algorithm of Section 2 to numerically evaluate σ̂. Let y = (u1, u2, ..., vn) be the observed data vector. A bootstrap sample y* = (u1*, ..., um*, v1*, v2*, ..., vn*) consists of a random sample U1*, ..., Um* from F̂ and an independent random sample V1*, ..., Vn* from Ĝ. With only this modification, steps (i) through (iii) of the Monte Carlo algorithm produce σ̂_B, (2.4), approaching σ̂ as B → ∞.

Table 4 reports on a simulation experiment investigating how well the bootstrap works on this problem. 100 trials of situation (5.1) were run, with m = 6, n = 9, F and G both Uniform [0, 1]. For each trial, both B = 100 and B = 200 bootstrap replications were generated. The bootstrap estimate σ̂_B was nearly unbiased for the true standard error σ(F, G) = .167 for either B = 100 or B = 200, with a quite small standard deviation from trial to trial. The improvement in going from B = 100 to B = 200 is too small to show up in this experiment.

TABLE 4
Bootstrap estimate of standard error for the Hodges-Lehmann two-sample shift estimate; 100 trials

Summary statistics for σ̂_B:
            Ave     SD     CV
B = 100     .165    .030   .18
B = 200     .166    .031   .19
True σ      .167

Note: m = 6, n = 9; true distributions F and G both Uniform [0, 1].
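The modified two-sample algorithm can be sketched in a few lines of Python; the setup below copies the Table 4 configuration (m = 6, n = 9, both distributions Uniform[0, 1]), though a single simulated trial will of course not reproduce the table's averages exactly.

```python
import numpy as np

rng = np.random.default_rng(5)

m, n = 6, 9
u = rng.uniform(size=m)
v = rng.uniform(size=n)

def hodges_lehmann(u, v):
    """Shift estimate (5.2): median of all pairwise differences V_j - U_i."""
    return np.median(np.subtract.outer(v, u))

theta_hat = hodges_lehmann(u, v)

# Two-sample bootstrap: resample U* from F-hat and V* from G-hat independently,
# then apply steps (ii) and (iii) of the Section 2 algorithm unchanged.
B = 200
theta_star = np.empty(B)
for b in range(B):
    us = u[rng.integers(0, m, size=m)]
    vs = v[rng.integers(0, n, size=n)]
    theta_star[b] = hodges_lehmann(us, vs)

se_boot = theta_star.std(ddof=1)
```

The only change from the one-sample algorithm is the resampling rule; the statistic and the standard deviation formula (2.4) are untouched.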
In practice, statisticians must often consider quite complicated data structures: time series models, multifactor layouts, sequential sampling, censored and missing data, etc. Fig. 8 illustrates how the bootstrap estimation process proceeds in a general situation. The actual probability mechanism P which generates the observed data y belongs to some family 𝒫 of possible probability mechanisms. In the Hodges-Lehmann example, P = (F, G), a pair of distributions on the real line, 𝒫 equals the family of all such pairs, and y = (u1, u2, ..., um, v1, v2, ..., vn) is generated by random sampling m times from F and n times from G.

We have a random variable of interest R(y, P), which depends on both y and the unknown model P, and we wish to estimate some aspect of the distribution of R. In the Hodges-Lehmann example, R(y, P) = θ̂(y) − E_P{θ̂}, and we estimated σ(P) = {E_P R(y, P)²}^(1/2), the standard error of θ̂. As before, the notation E_P indicates expectation when y is generated according to mechanism P.

We assume that we have some way of estimating the entire probability model P from the data y, producing the estimate called P̂ in Fig. 8. (In the two-sample problem, P̂ = (F̂, Ĝ), the pair of empirical distributions.) This is the crucial step for the bootstrap. It can be carried out either parametrically or nonparametrically, by maximum likelihood or by some other estimation technique.

Once we have P̂, we can use Monte Carlo methods to generate bootstrap data sets y*, according to the same rules by which y is generated from P. The bootstrap random variable R(y*, P̂) is observable, since we know P̂ as well as y*, so the distribution of R(y*, P̂) can be found by Monte Carlo sampling. The bootstrap estimate of E_P R(y, P) is then E_P̂ R(y*, P̂), and likewise for estimating any other aspect of R(y, P)'s distribution.

A regression model is a familiar example of a complicated data structure. We observe y = (y1, y2, ..., yn), where

(5.3)  yi = g(β, ti) + εi,  i = 1, 2, ..., n.

Here β is a vector of unknown parameters we wish to estimate; for each i, ti is an observed vector of covariates; and g is a known function of β and ti, for instance e^(β′ti). The εi are an iid sample from some unknown distribution F on the real line,

(5.4)  ε1, ε2, ..., εn ~iid F,

where F is usually assumed to be centered at 0 in some sense, perhaps E{ε} = 0 or Prob{ε < 0} = .5. The probability model is P = (β, F); (5.3) and (5.4) describe the step P → y in Fig. 8. The covariates t1, t2, ..., tn, like the sample size n in the simple problem (1.1), are considered fixed at their observed values.

For every choice of β we have a vector g(β) = (g(β, t1), g(β, t2), ..., g(β, tn)) of predicted values for y. Having observed y, we estimate β by minimizing some measure of distance between g(β) and y,

(5.5)  β̂: min_β D(y, g(β)).

The most common choice of D is D(y, g) = Σ_{i=1}^n {yi − g(β, ti)}².

How accurate is β̂ as an estimate of β? Let R(y, P) equal the vector β̂ − β. A familiar measure of accuracy is the mean square error matrix

(5.6)  Σ(P) = E_P(β̂ − β)(β̂ − β)′ = E_P R(y, P)R(y, P)′.

The bootstrap estimate of accuracy Σ̂ = Σ(P̂) is obtained by following through Fig. 8.

There is an obvious choice for P̂ = (β̂, F̂) in this case. The estimate β̂ is obtained from (5.5). Then F̂ is the empirical distribution of the residuals,

(5.7)  F̂: mass 1/n on ε̂i = yi − g(β̂, ti),  i = 1, ..., n.

A bootstrap sample y* is obtained by following rules (5.3) and (5.4),

(5.8)  yi* = g(β̂, ti) + εi*,  i = 1, 2, ..., n,

where ε1*, ε2*, ..., εn* is an iid sample from F̂. Notice that the εi* are independent bootstrap variates, even though the ε̂i are not independent variates in the usual sense. Each bootstrap sample y*(b) gives a bootstrap value β̂*(b), as in (5.5).
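Here is a sketch of the residual bootstrap (5.7)-(5.8) for the simplest special case, a straight-line regression fit by least squares; all numbers are simulated, and the general g(β, t) is replaced by b0 + b1·t.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated data from y_i = 1 + 2*t_i + e_i, a linear instance of (5.3).
n = 30
t = np.linspace(0, 1, n)
y = 1.0 + 2.0 * t + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # (5.5) with squared-error D
resid = y - X @ beta_hat                           # (5.7): F-hat puts mass 1/n here

# (5.8): regenerate y* from the fitted model plus resampled residuals,
# keeping the covariates t fixed at their observed values.
B = 500
beta_star = np.empty((B, 2))
for b in range(B):
    e_star = resid[rng.integers(0, n, size=n)]
    y_star = X @ beta_hat + e_star
    beta_star[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)

cov_boot = np.cov(beta_star, rowvar=False)         # bootstrap estimate of (5.6)
```

Note that the residuals are resampled independently, mirroring the remark above that the εi* are independent even though the ε̂i are not.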
[Fig. 8: the bootstrap estimation process in a general situation: the unknown mechanism P, a member of the family 𝒫 of possible probability mechanisms, generates the data y; the estimated mechanism P̂ generates the bootstrap data y*.]

The estimate [...]

Section 7 of Efron (1979a) shows that the bootstrap estimate, B = ∞, can be calculated without Monte Carlo sampling, and is [...]

The statistician gets to observe

(5.14)  Xi = min{Xi⁰, Wi}

and [...]
[...] generalizes (6.3) to

(6.4)  s(yi) = xi′β + εi,

where s(·) is an unspecified smooth function. (In its most general form, ACE allows for transformations of the covariates as well.) The function s(·) and parameter β are estimated in an alternating fashion, utilizing a nonparametric smoother to estimate s(·).

In the following example, taken from Friedman and Tibshirani (1984), we compare the Box and Cox procedure to ACE and use the bootstrap to assess the variability of ACE. The data from Box and Cox (1964) consist of a 3 × 3 × 3 experiment on the strength of yarns, the re- [...] producing a value of −.06 for λ̂ with an estimated 95% confidence interval of (−.18, .06).

Fig. 13 shows the transformation selected by the ACE algorithm. For comparison, the log function is plotted (normalized) on the same figure. The similarity is truly remarkable! In order to assess the variability of the ACE curve, we can apply the bootstrap. Since the X matrix in this problem is fixed by design, we resampled from the residuals instead of from the (xi, yi) pairs. The bootstrap procedure was the following:

    Calculate residuals ε̂i = ŝ(yi) − xi′β̂, i = 1, 2, ..., n.
    Repeat B times:
        Choose a sample ε1*, ..., εn* with replacement from ε̂1, ..., ε̂n.
        Calculate yi* = ŝ⁻¹(xi′β̂ + εi*), i = 1, 2, ..., n.
        Compute s*(·) = result of the ACE algorithm applied to (x1, y1*), ..., (xn, yn*).
    End

[Fig. 11: bootstrap histogram for the Wolfer sunspot data, model (6.2).]

[Fig. 13: estimated transformation from ACE and the log function for the Box and Cox example.]
TABLE 5
Exact and approximate central 90% confidence intervals for θ, the true correlation coefficient, from the law school data of Fig. 1
TABLE 6
Four methods of setting approximate confidence intervals for a real valued parameter θ

[...]

mapping φ = g(θ), φ̂ = g(θ̂) transforms this family to φ̂ ~ N(φ, τ²), as in (7.6). If there were such a g, then Prob_θ{θ̂ < θ} = Prob_φ{φ̂ < φ} = .50, but for θ = .776 integrating the density function f_.776(θ̂) gives Prob_{θ=.776}{θ̂ < θ} = .431.

The bias-corrected percentile method (BC method), line 3 of Table 6, makes an adjustment for this type of bias. Let

(7.8)  z0 = Φ⁻¹{Ĝ(θ̂)},

where Φ⁻¹ is the inverse function of the standard normal cdf. The BC method has α-level endpoint

(7.9)  θ_BC[α] = Ĝ⁻¹(Φ{2z0 + z^(α)}).

Note: if Ĝ(θ̂) = .50, that is if half of the bootstrap distribution of θ̂* is less than the observed value θ̂, then z0 = 0 and θ_BC[α] = θ_P[α]. Otherwise definition (7.9) makes a bias correction.

Section 10.7 of Efron (1982a) shows that the BC interval for θ is exactly correct if

(7.10)  φ̂ ~ N(φ − z0·τ, τ²)

for some monotone transformation φ = g(θ), φ̂ = g(θ̂) and some constant z0. It does not look like (7.10) is much more general than (7.6), but in fact the bias correction is often important.

In the example of Table 5, the percentile method (7.2) gives central 90% interval [.536, .911], compared to the BC interval [.488, .900] and the exact interval [.496, .898]. By definition the endpoints of the exact interval satisfy

(7.11)  Prob_{θ=.496}{θ̂ > .776} = .05 = Prob_{θ=.898}{θ̂ < .776}.

The corresponding quantities for the BC endpoints are

(7.12)  Prob_{θ=.488}{θ̂ > .776} = .0465,  Prob_{θ=.900}{θ̂ < .776} = .0475,

compared to

Prob_{θ=.536}{θ̂ > .776} = .0725,  Prob_{θ=.911}{θ̂ < .776} = .0293.

[...] makes θ̂ unbiased for θ.) A confidence interval is desired for the scale parameter θ. In this case the BC interval based on θ̂ is a definite improvement over the standard interval (1.7), but goes only about half as far as it should toward achieving the asymmetry of the exact interval.

It turns out that the parametric family θ̂ ~ θ(χ²₁₉/19) cannot be transformed into (7.10), not even approximately. The results of Efron (1982b) show that there does exist a monotone transformation g such that φ̂ = g(θ̂), φ = g(θ) satisfy to a high degree of approximation

(7.14)  φ̂ ~ N(φ − z0·σ_φ, σ_φ²),  σ_φ = 1 + aφ.

The constants in (7.14) are z0 = .1082, a = .1077.

The BCa method (Efron, 1984), line 4 of Table 6, is a method of assigning bootstrap confidence intervals which are exactly right for problems which can be mapped into form (7.14). This method has α-level endpoint

(7.15)  θ_BCa[α] = Ĝ⁻¹(Φ{z0 + (z0 + z^(α)) / (1 − a(z0 + z^(α)))}).

If a = 0 then θ_BCa[α] = θ_BC[α], but otherwise the BCa intervals can be a substantial improvement over the BC method, as shown in Table 7.

The constant z0 in (7.15) is given by z0 = Φ⁻¹{Ĝ(θ̂)}, (7.8), and so can be computed directly from the bootstrap distribution. How do we know a? It turns out that in one-parameter families f_θ(θ̂), a good approximation is [...]

TABLE 7
Central 90% confidence intervals for θ having observed θ̂ ~ θ(χ²₁₉/19)

1. Exact               [.631 θ̂, 1.88 θ̂]    R/L = 2.38
2. Standard (1.7)      [.466 θ̂, 1.53 θ̂]    R/L = 1.00
3. BC (7.9)            [.580 θ̂, 1.69 θ̂]    R/L = 1.64
4. BCa (7.15)          [.630 θ̂, 1.88 θ̂]    R/L = 2.37
5. Nonparametric BCa   [.640 θ̂, 1.68 θ̂]    R/L = [...]

Note: The exact interval is sharply skewed to the right. The BC method is only a partial improvement over the standard interval. The BCa interval, a = .108, agrees almost perfectly with the exact interval.
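Both endpoints are one-liners once the bootstrap replications are in hand. The following Python sketch is illustrative only: a made-up χ²-type bootstrap distribution stands in for Ĝ, and bca_endpoint is a hypothetical helper, not code from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# A right-skewed bootstrap distribution standing in for G-hat: theta-hat times
# chi-square(19)/19 draws, loosely mimicking the Table 7 family.
theta_hat = 1.0
theta_star = theta_hat * rng.chisquare(19, size=4000) / 19

def bca_endpoint(theta_star, theta_hat, alpha, a=0.0):
    """Level-alpha endpoint (7.15); with a = 0 it reduces to the BC endpoint
    (7.9), since z0 + (z0 + z_alpha)/1 = 2*z0 + z_alpha."""
    z0 = norm.ppf(np.mean(theta_star < theta_hat))   # (7.8), read off G-hat
    z_alpha = norm.ppf(alpha)
    adj = z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))
    return np.quantile(theta_star, norm.cdf(adj))    # G-hat-inverse

# Central 90% intervals: BC (a = 0) and BCa with the constant a = .108 of Table 7.
bc = (bca_endpoint(theta_star, theta_hat, .05),
      bca_endpoint(theta_star, theta_hat, .95))
bca = (bca_endpoint(theta_star, theta_hat, .05, a=.108),
       bca_endpoint(theta_star, theta_hat, .95, a=.108))
```

As the text notes, z0 costs nothing beyond the bootstrap replications themselves; only the constant a requires an extra calculation.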
challenging problem of setting confidence intervals for Note: The BC intervals, line 2, are based on the parametric boot-
a parameter 0 in a multiparameter family, and also in strap distribution of 6 = Y2/Y1.
nonparametric situations where the number of nui-
sance parameters is effectively infinite.
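The endpoint definitions (7.2), (7.9), and (7.15) are all simple functions of the bootstrap cdf Ĝ. The sketch below (our own code and names, not the paper's; it assumes a vector of bootstrap replications is already in hand, and uses a plain empirical cdf) computes all three endpoints:

```python
# Percentile (7.2), BC (7.9), and BCa (7.15) endpoints computed from a
# vector of bootstrap replications theta_star.  Sketch only: function
# and variable names are ours, and G is the simple empirical cdf.
from statistics import NormalDist

def bootstrap_endpoints(theta_star, theta_hat, alpha, a=0.0):
    """Return the level-alpha percentile, BC, and BCa endpoints."""
    norm = NormalDist()
    theta_star = sorted(theta_star)
    B = len(theta_star)
    G = lambda t: sum(s < t for s in theta_star) / B        # bootstrap cdf
    G_inv = lambda q: theta_star[min(B - 1, max(0, int(q * B)))]
    z_alpha = norm.inv_cdf(alpha)
    z0 = norm.inv_cdf(G(theta_hat))                         # (7.8)
    percentile = G_inv(alpha)                               # (7.2)
    bc = G_inv(norm.cdf(2 * z0 + z_alpha))                  # (7.9)
    w = z0 + z_alpha
    bca = G_inv(norm.cdf(z0 + w / (1 - a * w)))             # (7.15)
    return percentile, bc, bca
```

When the bootstrap distribution is median unbiased (z₀ = 0) and a = 0, all three endpoints coincide, as the text notes.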
To summarize this section, the progression from the standard intervals to the BCa method is based on a series of increasingly less restrictive assumptions, as shown in Table 6. Each successive method in Table 6 requires the statistician to do a greater amount of computation: first the bootstrap distribution Ĝ, then the bias correction constant z₀, and finally the constant a. However, all of these computations are algorithmic in character, and can be carried out in an automatic fashion.

Chapter 10 of Efron (1982a) discusses several other ways of using the bootstrap to construct approximate confidence intervals, which will not be presented here. One of these methods, the "bootstrap t," was used in the blood serum example of Section 4.

8. NONPARAMETRIC AND MULTIPARAMETER CONFIDENCE INTERVALS

Section 7 focused on the simple case θ̂ ~ f_θ, where we have only a real valued parameter θ and a real valued summary statistic θ̂ from which we are trying to construct a confidence interval for θ. Various favorable properties of the bootstrap confidence intervals were demonstrated in the simple case, but of course the simple case is where we least need a general method like the bootstrap.

Now we will discuss the more common situation where there are nuisance parameters besides the parameter of interest θ, or, even more generally, the nonparametric case, where the number of nuisance parameters is effectively infinite. The discussion is limited to a few brief examples. Efron (1984 and 1985) develops the theoretical basis of bootstrap approximate confidence intervals for complicated situations, and gives many more examples. The word "approximate" is important here, since exact nonparametric confidence intervals do not exist for most parameters (see Bahadur and Savage, 1956).

Example 1. Ratio Estimation

The data consists of y = (y₁, y₂), assumed to come from a bivariate normal distribution with unknown mean vector η and covariance matrix the identity,

(8.1)  y ~ N₂(η, I).

The parameter of interest, for which we desire a confidence interval, is the ratio

(8.2)  θ = η₂/η₁.

Fieller (1954) provided well known exact intervals for θ in this case. The Fieller intervals are based on a clever trick, which seems very special to situation (8.1), (8.2).

Table 8 shows Fieller's central 90% interval for θ having observed y = (8, 4). Also shown is the Fieller interval for φ = 1/θ = η₁/η₂, which equals [.76⁻¹, .29⁻¹], the obvious transformation of the interval for θ. The standard interval (1.7) is satisfactory for θ, but not for φ. Notice that the standard interval does not transform correctly from θ to φ.

Line 2 shows the BC intervals based on applying definitions (7.8) and (7.9) to the parametric bootstrap distribution of θ̂ = y₂/y₁ (or φ̂ = y₁/y₂). This is the distribution of θ̂* = y₂*/y₁* when sampling y* = (y₁*, y₂*) from F̂_NORM = N₂((y₁, y₂), I). The bootstrap intervals transform correctly, and in this case they agree with the exact interval to three decimal places.

Example 2. Product of Normal Means

For most multiparameter situations, there do not exist exact confidence intervals for a single parameter of interest. Suppose for instance that (8.2) is changed to

(8.3)  θ = η₁η₂,

still assuming (8.1). Table 9 shows approximate intervals for θ, and also for φ = θ², having observed y = (2, 4). The "almost exact" intervals are based on an analog of Fieller's argument (Efron, 1985), which with suitable care can be carried through to a high degree of accuracy. Once again, the parametric BC intervals are a close match to line 1. The fact that the standard intervals do not transform correctly is particularly obvious here.
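The parametric bootstrap of Example 1 is easy to simulate. The minimal sketch below (our own illustration; B, the seed, and the use of simple percentile intervals instead of BC are arbitrary choices of ours) draws y* ~ N₂((8, 4), I), forms θ̂* = y₂*/y₁*, and checks that the bootstrap intervals for θ and φ = 1/θ are transforms of one another:

```python
# Parametric bootstrap for the ratio example (8.1)-(8.2): having observed
# y = (8, 4), draw y* ~ N2((8, 4), I) and form theta* = y2*/y1*.
# Sketch only; B and the random seed are arbitrary choices of ours.
import random

random.seed(1)
y1, y2 = 8.0, 4.0
B = 20000
theta_star = sorted((y2 + random.gauss(0, 1)) / (y1 + random.gauss(0, 1))
                    for _ in range(B))

def percentile_interval(sorted_vals, alpha=0.05):
    B = len(sorted_vals)
    return sorted_vals[int(alpha * B)], sorted_vals[int((1 - alpha) * B)]

lo, up = percentile_interval(theta_star)            # central 90% for theta
phi_star = sorted(1.0 / t for t in theta_star)
phi_lo, phi_up = percentile_interval(phi_star)      # central 90% for phi
# Since phi = 1/theta is monotone decreasing, the phi interval is the
# transformed theta interval [1/up, 1/lo], up to a few order statistics.
```

This transformation invariance is exactly the property the standard interval (1.7) lacks.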
TABLE 9
Central 90% confidence intervals for θ = η₁η₂ and φ = θ², having observed y = (2, 4), where y ~ N₂(η, I)

                             For θ            For φ
1. Almost exact              [1.77, 17.03]    [3.1, 290.0]
2. Parametric boot (BC)      [1.77, 17.12]    [3.1, 239.1]
3. Standard (1.7)            [0.64, 15.36]    [−53.7, 181.7]

The good performance of the parametric BC intervals is not accidental. The theory developed in Efron (1985) shows that the BC intervals, based on bootstrapping the MLE θ̂, agree to high order with the almost exact intervals in the following class of problems: the data y comes from a multiparameter family of densities f_η(y), both y and η k-dimensional vectors; the real valued parameter of interest θ is a smooth function of η, θ = t(η); and the family f_η(y) can be transformed to multivariate normality, say

(8.4)  g(y) ~ N_k(h(η), I),

by some one-to-one transformations g and h.

Just as in Section 7, it is not necessary for the statistician to know the normalizing transformations g and h, only that they exist. The BC intervals are obtained directly from the original densities f_η: we find η̂ = η̂(y), the MLE of η; sample y* ~ f_η̂; compute θ̂*, the bootstrap MLE of θ; calculate Ĝ, the bootstrap cdf of θ̂*, usually by Monte Carlo sampling; and finally apply definitions (7.8) and (7.9). This process gives the same interval for θ whether or not the transformation to form (8.4) has been made.

Not all problems can be transformed as in (8.4) to a normal distribution with constant covariance. The case considered in Table 7 is a one-dimensional counterexample. As a result the BC intervals do not always work as well as in Tables 8 and 9, although they usually improve on the standard method. However, in order to take advantage of the BCa method, which is based on more general assumptions, we need to be able to calculate the constant a.

Efron (1984) gives expressions for "a" generalizing (7.16) to multiparameter families, and also to nonparametric situations. If (8.4) holds, then "a" will have value zero, and the BCa method reduces to the BC case. Otherwise the two intervals differ.

Here we will discuss only the nonparametric situation: the observed data y = (x₁, x₂, ..., xₙ) consists of iid observations x₁, x₂, ..., xₙ ~ F, where F can be any distribution on the sample space 𝒳; we want a confidence interval for θ = t(F), some real valued functional of F; and the bootstrap intervals are based on bootstrapping θ̂ = t(F̂), which is the nonparametric MLE of θ. In this case a good approximation to the constant a is given in terms of the empirical influence function Uᵢ⁰, defined in Section 10 at (10.11),

(8.5)  â = (1/6) Σᵢ (Uᵢ⁰)³ / [Σᵢ (Uᵢ⁰)²]^{3/2}.

For θ̂ the correlation coefficient, the values of Uᵢ⁰ corresponding to the 15 data points shown in Fig. 1 are −1.507, .168, .273, .004, .525, −.049, −.100, .477, .310, .004, −.526, −.091, .434, .125, −.048. (Notice how influential law school 1 is.) Formula (8.5) gives â = −.0817. B = 100,000 bootstrap replications, about 100 times more than was actually necessary (see Section 10), gave z₀ = −.0927, and the central 90% interval θ ∈ [.43, .92] shown in Table 5. The nonparametric BCa interval is quite reasonable in this example, particularly considering that there is no guarantee that the true law school distribution F is anywhere near bivariate normal.

Example 4. Mouse Leukemia Data (the First Example in Section 3)

The standard central 90% interval for β in formula (3.1) is [.835, 2.18]. The bias correction constant z₀ = .0275 gives BC interval [1.00, 2.39]. This is shifted far right of the standard interval, reflecting the long right tail of the bootstrap histogram seen in Fig. 3. We can calculate "a" from (8.5), considering each of the n = 42 data points to be a triple (yᵢ, xᵢ, δᵢ): â = −.152. Because â is negative, the BCa interval is shifted back to the left, equaling [.788, 2.10]. This contrasts with the law school example, where a, z₀, and the skewness of the bootstrap distribution added to each other rather than cancelling out, resulting in a BCa interval much different from the standard interval.

Efron (1984) provides some theoretical support for the nonparametric BCa method. However, the problem of setting approximate nonparametric confidence intervals is still far from well understood, and all methods should be interpreted with some caution. We end this section with a cautionary example.

Example 5. The Variance

Suppose 𝒳 is the real line, and θ = Var_F X, the variance. Line 5 of Table 2 shows the result of applying the nonparametric BCa method to data sets x₁, x₂, ..., x₂₀ which were actually iid samples from a N(0, 1) distribution. The number .640, for example, is the
where δ̂ is the kurtosis of the bootstrap distribution of σ̂*, given the data y, and E{δ̂} its expected value averaged over y. For typical situations, CV(σ̂∞) lies between .10 and .30. For example, if θ̂ = x̄, n = 20, and Xᵢ ~iid N(0, 1), then CV(σ̂∞) ≈ .16.

Table 10 shows CV(σ̂_B) for various values of B and CV(σ̂∞), assuming E{δ̂} = 0 in (9.1). For values of CV(σ̂∞) > .10, there is little improvement past B = 100. In fact B as small as 25 gives reasonable results. Even smaller values of B can be quite informative, as we saw in the Stanford Heart Transplant Data (Fig. 7 of Section 3).

TABLE 10
Coefficient of variation of σ̂_B, the bootstrap estimate of standard error based on B Monte Carlo replications, as a function of B and CV(σ̂∞), the limiting CV as B → ∞

                          B
CV(σ̂∞)     25      50      100     200     ∞
  .25       .29     .27     .26     .25     .25
  .20       .24     .22     .21     .21     .20
  .15       .21     .18     .17     .16     .15
  .10       .17     .14     .12     .11     .10
  .05       .15     .11     .09     .07     .05
   0        .14     .10     .07     .05      0

Note: Based on (9.1), assuming E{δ̂} = 0.

... = (p₁*, p₂*, ..., pₙ*). The vector p* has a rescaled multinomial distribution

(10.2)  p* ~ Mult_n(n, p⁰)/n,  p⁰ = (1/n, 1/n, ..., 1/n),

where the notation indicates the proportions observed from n random draws on n categories, each with probability 1/n.

For n = 3 there are 10 possible bootstrap vectors p*. These are indicated in Fig. 15 along with their multinomial probabilities from (10.2). For example, p* = (1/3, 0, 2/3), corresponding to x* = (x₁, x₃, x₃) or any permutation of these values, has bootstrap probability 1/9.

To make our discussion easier, suppose that the statistic of interest θ̂ is of functional form: θ̂ = θ(F̂), where θ(F) is a functional assigning a real number to any distribution F on the sample space 𝒳. The mean, the correlation coefficient, and the trimmed mean are all of functional form. Statistics of functional form are given by the same function of F, no matter what the sample size n may be, which is convenient for discussing the jackknife and delta method.

For any vector p = (p₁, p₂, ..., pₙ) having nonnegative weights summing to 1, define the weighted empirical distribution

(10.3)  F̂(p): probability pᵢ on xᵢ, i = 1, 2, ..., n,

and the shortened notation

(10.4)  θ̂(p) = θ(F̂(p)).
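The counting behind (10.2) and Fig. 15 can be checked directly for n = 3. The short sketch below (our own illustration) enumerates every resampling vector p* = m/n with counts m summing to n, and attaches the multinomial probability n!/(m₁! m₂! m₃!) (1/n)ⁿ:

```python
# All bootstrap resampling vectors p* = m/n of (10.2) for n = 3: the
# counts m = (m1, m2, m3) are nonnegative integers summing to n, and
# p* = m/n has multinomial probability n!/(m1! m2! m3!) (1/n)^n.
from itertools import product
from math import factorial

n = 3
points = {}
for m in product(range(n + 1), repeat=n):
    if sum(m) == n:
        coef = factorial(n) / (factorial(m[0]) * factorial(m[1]) * factorial(m[2]))
        points[tuple(mi / n for mi in m)] = coef / n ** n

print(len(points))                    # 10 bootstrap points, as in Fig. 15
print(points[(1 / 3, 0.0, 2 / 3)])    # p* = (1/3, 0, 2/3): 3!/(1!0!2!)/27 = 1/9
```

The probabilities sum to 1, and for general n the number of such points is the binomial coefficient C(2n − 1, n), the figure cited in Section 10.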
FIG. 15. The bootstrap and jackknife sampling points in the case n = 3. The bootstrap points (•) are shown with their probabilities.

The shortened notation θ̂(p) assumes that the data (x₁, x₂, ..., xₙ) is considered fixed. Notice that θ̂(p⁰) = θ(F̂) is the observed value of the statistic of interest. The bootstrap estimate σ̂, (2.3), can then be written

(10.5)  σ̂ = [var* θ̂(p*)]^{1/2},

where var* indicates variance with respect to distribution (10.2). In terms of Fig. 15, σ̂ is the standard deviation of the ten possible bootstrap values θ̂(p*), weighted as shown.

It looks like we could always calculate σ̂ simply by doing a finite sum. Unfortunately, the number of bootstrap points is the binomial coefficient C(2n − 1, n), which is 77,558,760 for n = 15, so straightforward calculation of σ̂ is usually impractical. That is why we have emphasized Monte Carlo approximations to σ̂. Therneau (1983) considers the question of methods more efficient than pure Monte Carlo, but at present there is no generally better method available.

However, there is another approach to approximating (10.5). We can replace the usually complicated function θ̂(p) by an approximation linear in p, and then use the well known formula for the multinomial variance of a linear function. The jackknife approximation θ̂_J(p) is the linear function of p which matches θ̂(p), (10.4), at the n points corresponding to the deletion of a single xᵢ from the observed data set. This gives the jackknife estimate of standard error

(10.9)  σ̂_J = [((n − 1)/n) Σᵢ (θ̂₍ᵢ₎ − θ̂₍·₎)²]^{1/2},

which can also be written

(10.10)  σ̂_J = [n/(n − 1)]^{1/2} [var* θ̂_J(p*)]^{1/2}.

In other words, the jackknife estimate is itself almost a bootstrap estimate, applied to a linear approximation of θ̂. The factor [n/(n − 1)]^{1/2} in (10.10) makes σ̂_J² unbiased for var(x̄) in the case where θ̂ = x̄, the sample mean. We could multiply the bootstrap estimate σ̂ by this same factor, and achieve the same unbiasedness, but there does not seem to be any consistent advantage to doing so. The jackknife requires n, rather than B = 50 to 200, resamples, at the expense of adding a linear approximation to the standard error estimate. Tables 1 and 2 indicate that there is some estimating efficiency lost in making this approximation. For statistics like the sample median, which are difficult to approximate linearly, the jackknife is useless (see Section 3.4 of Efron, 1982a).

There is a more obvious linear approximation to θ̂(p) than θ̂_J(p). Why not use the first-order Taylor series expansion for θ̂(p) about the point p = p⁰? This is the idea of Jaeckel's (1972) infinitesimal jackknife. The Taylor series approximation turns out to be

θ̂_T(p) = θ̂(p⁰) + (p − p⁰)ᵗ U⁰,

where

(10.11)  Uᵢ⁰ = lim_{ε→0} [θ̂((1 − ε)p⁰ + εδᵢ) − θ̂(p⁰)]/ε,

δᵢ being the ith coordinate vector. This suggests the infinitesimal jackknife estimate of standard error

(10.12)  σ̂_IJ = [var* θ̂_T(p*)]^{1/2} = [Σᵢ (Uᵢ⁰)²/n²]^{1/2},

with var* still indicating variance under (10.2). The ordinary jackknife can be thought of as taking ε = −1/(n − 1) in the definition of Uᵢ⁰, while the infinitesimal jackknife lets ε → 0, thereby earning the name.
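Definition (10.11) can be implemented numerically by using a small ε in place of the limit. The sketch below (our own code; ε = 10⁻⁶ approximates the limit) checks it for θ̂ the mean, where Uᵢ⁰ = xᵢ − x̄ exactly and (10.12) reduces to the plug-in standard error of the mean; it then reuses the 15 law school influence values quoted in Section 8 to reproduce the acceleration constant of (8.5):

```python
# Finite-difference version of the empirical influence function (10.11)
# and the infinitesimal jackknife estimate (10.12), sketched for
# theta-hat the mean (our code; eps approximates the limit in (10.11)).

def theta(p, x):                    # weighted mean: theta(F-hat(p)) of (10.4)
    return sum(pi * xi for pi, xi in zip(p, x))

def influence_values(x, eps=1e-6):
    n = len(x)
    p0 = [1.0 / n] * n
    t0 = theta(p0, x)
    U = []
    for i in range(n):
        p = [(1 - eps) * pj for pj in p0]      # (1 - eps) p0 + eps delta_i
        p[i] += eps
        U.append((theta(p, x) - t0) / eps)     # (10.11)
    return U

x = [1.0, 4.0, 2.0, 7.0, 3.5]
n = len(x)
U = influence_values(x)
sigma_ij = (sum(u * u for u in U) / n ** 2) ** 0.5         # (10.12)
# For the mean, U_i = x_i - xbar, so sigma_ij equals the plug-in
# standard error [sum (x_i - xbar)^2 / n^2]^(1/2).

# The same U values drive the acceleration constant (8.5); with the 15
# law school values printed in Section 8:
U_law = [-1.507, .168, .273, .004, .525, -.049, -.100, .477,
         .310, .004, -.526, -.091, .434, .125, -.048]
a_hat = (sum(u ** 3 for u in U_law) / 6) / sum(u ** 2 for u in U_law) ** 1.5
print(round(a_hat, 4))    # -0.0817, matching the value quoted in the text
```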
The Uᵢ⁰ are values of what Mallows (1974) calls the empirical influence function. Their definition is a nonparametric estimate of the true influence function

IF(x) = lim_{ε→0} [θ((1 − ε)F + εδₓ) − θ(F)]/ε,

δₓ being the degenerate distribution putting mass 1 on x. The right side of (10.12) is then the obvious estimate of the influence function approximation to the standard error of θ̂ (Hampel, 1974), σ(F) ≈ [∫ IF²(x) dF(x)/n]^{1/2}. The empirical influence function method and the infinitesimal jackknife give identical estimates of standard error.

How have statisticians gotten along for so many years without methods like the jackknife and the bootstrap? The answer is the delta method, which is still the most commonly used device for approximating standard errors. The method applies to statistics of the form t(Q̄₁, Q̄₂, ..., Q̄_A), where t(·, ·, ..., ·) is a known function and each Q̄ₐ is an observed average, Q̄ₐ = Σᵢ₌₁ⁿ Qₐ(xᵢ)/n. For example, the correlation θ̂ is a function of A = 5 such averages: the average of the first coordinate values, the second coordinates, the first coordinates squared, the second coordinates squared, and the cross-products.

In its nonparametric formulation, the delta method works by (a) expanding t in a linear Taylor series about the expectations of the Q̄ₐ; (b) evaluating the standard error of the Taylor series using the usual expressions for variances and covariances of averages; and (c) substituting γ(F̂) for any unknown quantity γ(F) occurring in (b). For example, the nonparametric delta method estimates the standard error of the correlation θ̂ by

[ (θ̂²/4n) ( μ̂₄₀/μ̂₂₀² + μ̂₀₄/μ̂₀₂² + 2μ̂₂₂/(μ̂₂₀μ̂₀₂) + 4μ̂₂₂/μ̂₁₁² − 4μ̂₃₁/(μ̂₁₁μ̂₂₀) − 4μ̂₁₃/(μ̂₁₁μ̂₀₂) ) ]^{1/2},

where, in terms of xᵢ = (yᵢ, zᵢ),

μ̂_gh = Σᵢ (yᵢ − ȳ)^g (zᵢ − z̄)^h / n

(Cramér (1946), p. 359).

THEOREM. For statistics of the form θ̂ = t(Q̄₁, ..., Q̄_A), the nonparametric delta method and the infinitesimal jackknife give the same estimate of standard error (Efron, 1982c).

The infinitesimal jackknife, the delta method, and the empirical influence function approach are three names for the same method. Notice that the results reported in line 7 of Table 2 show a severe downward bias. Efron and Stein (1981) show that the ordinary jackknife is always biased upward, in a sense made precise in that paper. In the authors' opinion the ordinary jackknife is the method of choice if one does not want to do the bootstrap computations.

ACKNOWLEDGMENT

This paper is based on a previous review article appearing in Behaviormetrika. The authors and Editor are grateful to that journal for graciously allowing this revision to appear here.

REFERENCES

ANDERSON, O. D. (1975). Time Series Analysis and Forecasting: The Box-Jenkins Approach. Butterworth, London.
BAHADUR, R. and SAVAGE, L. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Ann. Math. Statist. 27 1115-1122.
BERAN, R. (1984). Bootstrap methods in statistics. Jahrb. Math. Ver. 86 14-30.
BICKEL, P. J. and FREEDMAN, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196-1217.
BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. Ser. B 26 211-252.
BREIMAN, L. and FRIEDMAN, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580-619.
COX, D. R. (1972). Regression models and life tables. J. R. Statist. Soc. Ser. B 34 187-202.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, New Jersey.
EFRON, B. (1979a). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1-26.
EFRON, B. (1979b). Computers and the theory of statistics: thinking the unthinkable. SIAM Rev. 21 460-480.
EFRON, B. (1981a). Censored data and the bootstrap. J. Amer. Statist. Assoc. 76 312-319.
EFRON, B. (1981b). Nonparametric estimates of standard error: the jackknife, the bootstrap, and other resampling methods. Biometrika 68 589-599.
EFRON, B. (1982a). The Jackknife, the Bootstrap, and Other Resampling Plans. Soc. Ind. Appl. Math. CBMS-Natl. Sci. Found. Monogr. 38.
EFRON, B. (1982b). Transformation theory: how normal is a one parameter family of distributions? Ann. Statist. 10 323-339.
EFRON, B. (1982c). Maximum likelihood and decision theory. Ann. Statist. 10 340-356.
EFRON, B. (1983). Estimating the error rate of a prediction rule: improvements in cross-validation. J. Amer. Statist. Assoc. 78 316-331.
EFRON, B. (1984). Better bootstrap confidence intervals. Tech. Rep., Stanford Univ. Dept. Statist.
EFRON, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45-58.
EFRON, B. and GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statistician 37 36-48.
EFRON, B. and STEIN, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586-596.
FIELLER, E. C. (1954). Some problems in interval estimation. J. R. Statist. Soc. Ser. B 16 175-183.
FRIEDMAN, J. H. and STUETZLE, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
FRIEDMAN, J. H. and TIBSHIRANI, R. J. (1984). The monotone smoothing of scatter-plots. Technometrics 26 243-250.
HAMPEL, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393.
HASTIE, T. J. and TIBSHIRANI, R. J. (1985). Discussion of Peter Huber's "Projection pursuit." Ann. Statist. 13 502-508.
HINKLEY, D. V. (1978). Improving the jackknife with special reference to correlation estimation. Biometrika 65 13-22.
HYDE, J. (1980). Survival analysis with incomplete observations. In Biostatistics Casebook. Wiley, New York.
JAECKEL, L. (1972). The infinitesimal jackknife. Memorandum MM 72-1215-11, Bell Laboratories, Murray Hill, New Jersey.
JOHNSON, N. and KOTZ, S. (1970). Continuous Univariate Distributions, Vol. 2. Houghton Mifflin, Boston.
KAPLAN, E. L. and MEIER, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53 457-481.
KIEFER, J. and WOLFOWITZ, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
MALLOWS, C. (1974). On some topics in robustness. Memorandum, Bell Laboratories, Murray Hill, New Jersey.
MILLER, R. G. (1974). The jackknife - a review. Biometrika 61 1-17.
MILLER, R. G. and HALPERN, J. (1982). Regression with censored data. Biometrika 69 521-531.
RAO, C. R. (1973). Linear Statistical Inference and Its Applications. Wiley, New York.
SCHENKER, N. (1985). Qualms about bootstrap confidence intervals. J. Amer. Statist. Assoc. 80 360-361.
SINGH, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9 1187-1195.
THERNEAU, T. (1983). Variance reduction techniques for the bootstrap. Ph.D. thesis, Department of Statistics, Stanford University.
TIBSHIRANI, R. J. and HASTIE, T. J. (1984). Local likelihood estimation. Tech. Rep. 97, Stanford Univ. Dept. Statist.
TUKEY, J. (1958). Bias and confidence in not quite large samples (abstract). Ann. Math. Statist. 29 614.
Comment
J. A. Hartigan

J. A. Hartigan is Eugene Higgins Professor of Statistics, Yale University, Box 2179 Yale Station, New Haven, CT 06520.

Efron and Tibshirani are to be congratulated on a wide-ranging, persuasive survey of the many uses of the bootstrap technology. They are a bit cagey on what is or is not a bootstrap, but the description at the end of Section 4 seems to cover all the cases: some data y comes from an unknown probability distribution F; it is desired to estimate the distribution of some function R(y, F) given F; and this is done by estimating the distribution of R(y*, F̂) given F̂, where F̂ is an estimate of F based on y, and y* is sampled from the known F̂.

There will be three problems in any application of the bootstrap: (1) how to choose the estimate F̂? (2) how much sampling of y* from F̂? and (3) how close is the distribution of R(y*, F̂) given F̂ to that of R(y, F) given F?

Efron and Tibshirani suggest a variety of estimates F̂ for simple random sampling, regression, and autoregression; their remarks about (3) are confined mainly to empirical demonstrations of the bootstrap in specific situations.

I have some general reservations about the bootstrap, based on my experiences with subsampling techniques (Hartigan, 1969, 1975). Let X₁, ..., Xₙ be a random sample from a distribution F, let Fₙ be the empirical distribution, and suppose that t(Fₙ) is an estimate of some population parameter t(F). The statistic t is computed for several random subsamples (each observation appearing in the subsample with probability ½), and the set of subsample values obtained is regarded as a sample from the posterior distribution of t(F). For example, the standard deviation of the subsample values is an estimate of the standard error of t(Fₙ) from t(F); however, the procedure is not restricted to real valued t.

The procedure seems to work not too badly in getting at the first- and second-order behaviors of t(Fₙ) when t(Fₙ) is near normal, but it is not effective in handling third-order behavior, bias, and skewness. Thus there is not much point in taking huge samples of subsample values, since the third-order behavior is not relevant; and if the procedure works only for t(Fₙ) near normal, there are less fancy procedures for estimating standard error, such as dividing the sample up into 10 subsamples of equal size and computing their standard deviation. (True, this introduces more bias than having random subsamples each containing about half the observations.) Indeed, even if t(Fₙ) is not normal, we can obtain exact confidence intervals for the median of t(F_{n/10}) using the 10 subsamples. Even five subsamples will give a respectable idea of the standard error.

Transferring back to the bootstrap: (A) is the boot-
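Hartigan's half-sampling scheme is easy to simulate. In the sketch below (our own illustration; the sample, the number of subsamples, and the seeds are arbitrary choices), each observation enters a subsample independently with probability ½, and the standard deviation of the subsample statistics serves as the standard error estimate; for t the mean and half samples of size m ≈ n/2, the spread of the subsample means around the full-sample mean is roughly σ(1/m − 1/n)^{1/2} ≈ σ/√n, so the result is comparable to the textbook estimate s/√n:

```python
# Half-sampling estimate of standard error, as described in the comment:
# evaluate t on random subsamples (each observation kept independently
# with probability 1/2) and take the standard deviation of the values.
# Illustrative sketch only; all numerical choices are ours.
import random

def half_sample_se(x, t, n_subsamples=200, seed=0):
    rng = random.Random(seed)
    values = []
    while len(values) < n_subsamples:
        sub = [xi for xi in x if rng.random() < 0.5]
        if sub:                              # skip a rare empty subsample
            values.append(t(sub))
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

mean = lambda s: sum(s) / len(s)
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(40)]
se_half = half_sample_se(x, mean)
# For t the mean this is comparable to the usual estimate s / sqrt(n).
```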