Bootstrap
Douglas Curran-Everett
Adv Physiol Educ 33: 286–292, 2009. doi:10.1152/advan.00062.2009
Advances in Physiology Education is dedicated to the improvement of teaching and learning physiology, both in specialized
courses and in the broader context of general biology education. It is published four times a year in March, June, September and
December by the American Physiological Society, 9650 Rockville Pike, Bethesda MD 20814-3991. Copyright © 2009 by the
American Physiological Society. ISSN: 1043-4046, ESSN: 1522-1229. Visit our website at http://www.the-aps.org/.
Staying Current
Curran-Everett D. Explorations in statistics: the bootstrap. Adv Physiol Educ 33: 286–292, 2009; doi:10.1152/advan.00062.2009.—Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fourth installment of Explorations in Statistics explores the bootstrap. The bootstrap gives us an empirical approach to estimate the theoretical variability among possible values of a sample statistic such as the sample mean. The appeal of the bootstrap is that we can use it to make an inference about some experimental result when the statistical theory is uncertain or even unknown. We can also use the bootstrap to assess whether an inference we make from a hypothesis test or confidence interval is justified.

Download Advances_Statistics_Code_Boot.R to your Advances folder and install the extra package boot. To install boot, open R and then click Packages | Install package(s) . . . Select a CRAN mirror close to your location and then click OK. Select boot and then click OK. When you have installed boot, you will see

package 'boot' successfully unpacked and MD5 sums checked

in the R Console.
THIS FOURTH PAPER in Explorations in Statistics (see Refs. 3–5) explores the bootstrap, a recent development in statistics (8, 11–16) that evolved from the jackknife (25, 28, 36). Despite its brief history, the bootstrap is discussed in textbooks of statistics (26) and used in manuscripts published by the American Physiological Society (2, 10, 17–24, 35, 37).

The bootstrap gives us an empirical approach to estimate the theoretical variability among possible values of a sample statistic such as the sample mean. In our previous exploration (4) we calculated a confidence interval for the population mean. This too was grounded in theory. The beauty of the bootstrap is that it can transcend theory: we can use the bootstrap to make an inference about some experimental result when the theory is uncertain or even unknown (14, 26). We can also use the bootstrap to assess how well the theory holds: that is, whether an inference we make from a hypothesis test or confidence interval is justified.

The Simulation: Data for the Bootstrap

In our early explorations (3–5) we drew random samples of 9 observations from a standard normal distribution with mean μ = 0 and standard deviation σ = 1. These were the observations–the data–for samples 1, 2, and 1000:

> # Sample Observations
1     0.422   1.103   1.006   1.034   0.285  −0.647   1.235   0.912   1.825
2     0.154  −0.654  −0.147   1.715   0.720   0.804   0.256   1.155   0.646

For each sample we also have summary statistics: the sample mean, standard deviation, standard error, t statistic, and 90% confidence limits:

        Mean     SD     SE      t   Lower  Upper
1      0.797  0.702  0.234  3.407  0.362  1.232
2      0.517  0.707  0.236  2.193  0.079  0.955
⋮
1000   0.233  0.975  0.325  0.718 −0.371  0.838

In contrast to our early explorations that used 1000 samples from our standard normal population, our exploration of the bootstrap uses just the 9 observations from sample 1:

0.422 1.103 1.006 1.034 0.285 −0.647 1.235 0.912 1.825
We use the notation y̅*j to represent the mean of bootstrap sample j (11, 16). Suppose we generate 10,000 bootstrap replications of the sample mean (Fig. 1). If we treat these 10,000 sample means as observations, we can calculate their average and standard deviation:

Ave{y̅*} = 0.798 and SD{y̅*} = 0.219.

The standard deviation SD{y̅*} describes the variability among the b bootstrap means and estimates the standard deviation of the theoretical distribution of the sample mean (16). The commands in lines 85–86 of Advances_Statistics_Code_Boot.R return these values. Your values will differ slightly.

Table 1. Illustration of bootstrap samples

                 Bootstrap Sample j
    y         1        2       …        b
  0.422    −0.647    0.285    …       1.034
  1.103     0.285    1.235            0.285
  1.006     0.912    1.825            1.103
  1.034     0.912    1.034            0.912
  0.285     1.825    1.006            1.103
 −0.647     1.825    1.103            0.422
  1.235     0.285    0.912           −0.647
  0.912     1.006    1.103            0.912
  1.825    −0.647    0.422            1.825
  0.797     0.640    0.992            0.772
   y̅         y̅*1      y̅*2              y̅*b

y, Actual observations from sample 1. Because the bootstrap procedure samples with replacement, an original observation can appear more than once in a bootstrap sample. The bootstrap sample means y̅*1, y̅*2, …, y̅*b cluster around 0.797, the observed sample mean y̅. The commands in lines 40–49 of Advances_Statistics_Code_Boot.R create bootstrap samples like these.

We can do more than just calculate Ave{y̅*} and SD{y̅*}: we can assess whether the distribution of these 10,000 bootstrap means is consistent with a normal distribution. How? By using a normal quantile plot (Fig. 2). It turns out that these 10,000 bootstrap means are not consistent with a normal distribution.

Normal-theory confidence interval. The allowance a* is

a* = z_α/2 · SD{y̅*}.

Suppose we want to calculate a 90% bootstrap confidence interval for the population mean. In this situation, α = 0.10 and z_α/2 = 1.645. Therefore, the allowance a* is

a* = 1.645 · 0.219 ≅ 0.360.
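The resampling scheme behind Table 1 and these calculations can be sketched in code. The article's own commands live in Advances_Statistics_Code_Boot.R (written in R); as an illustrative stand-in, this Python sketch draws b = 10,000 bootstrap samples from the 9 observations, computes Ave{y̅*} and SD{y̅*}, and forms the 90% normal-theory interval y̅ ± 1.645 · SD{y̅*}:

```python
import numpy as np

rng = np.random.default_rng(1)

# The 9 observations from sample 1 (Table 1, column y).
y = np.array([0.422, 1.103, 1.006, 1.034, 0.285, -0.647, 1.235, 0.912, 1.825])

b = 10_000  # number of bootstrap replications
# Resample with replacement, then take the mean of each bootstrap sample.
boot_means = np.array([rng.choice(y, size=y.size, replace=True).mean()
                       for _ in range(b)])

ave = boot_means.mean()      # clusters around the observed mean, 0.797
sd = boot_means.std(ddof=1)  # estimates SD of the sample-mean distribution

# Normal-theory 90% bootstrap confidence interval: mean +/- z * SD{ybar*}
z = 1.645
ci = (y.mean() - z * sd, y.mean() + z * sd)
print(ave, sd, ci)
```

Because every run resamples at random, the printed values wander slightly around 0.798 and 0.219, just as the text notes.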
Bias-corrected-and-accelerated confidence interval. If the number of observations in the actual sample is too small, if the average Ave{y̅*} of the bootstrap sample means differs from the sample mean y̅, or if the bootstrap distribution of the sample mean is skewed, then normal-theory and percentile confidence intervals are likely to be inaccurate (16, 26). A bias-corrected-and-accelerated confidence interval adjusts percentiles of the bootstrap distribution to account for bias and skewness (16, 26). In this exploration, the 90% bias-corrected-and-accelerated confidence interval is

[0.376, 1.113] ≅ [0.38, 1.11].

The commands in lines 196–199 of Advances_Statistics_Code_Boot.R return these values. This confidence interval is shifted slightly to the left of the percentile interval of [0.43, 1.14]. In many situations, a bias-corrected-and-accelerated confidence interval will provide a more accurate estimate of our uncertainty about the true value of a population parameter such as the mean (16, 26).

The bootstrap cannot, however, salvage the statistical analysis of a small sample. Why not? If the sample is too small, then it may be atypical of the underlying population. When this happens, the bootstrap distribution will not mirror the theoretical distribution of the sample statistic. The trouble is, it may not be obvious how small too small is. A statistician (see Ref. 6, guideline 1) and an estimate of power can help.

In our second and third explorations (4, 5) we concluded that the observations from sample 1 were consistent with having come from a population that had a mean other than 0. Only because we had defined the underlying population (see Ref. 3) could we have known that we had erred. When we do a single experiment, we can have enough unusual observations so that it just appears the observations came from a different population (5). A small sample size exacerbates the potential for this phenomenon. This is what happened when we bootstrapped the sample mean using observations from the first sample.

We know the theoretical distribution of the sample mean is exactly normal (3), but the distribution of those bootstrap means is clearly not normal (see Fig. 3). This happens because the observations from the first sample are atypical of the underlying population. The message? All bets are off if you bootstrap a sample statistic using observations from a sample that is too small.

The Bootstrap in Data Transformation

In our previous exploration (4) we delved into confidence intervals by drawing random samples from a normal distribution. That we chose a normal distribution for our population was no accident. For normal-theory confidence intervals to be meaningful, one thing must be approximately true: if the random variable Y represents the physiological thing we care about, then the theoretical distribution of the sample mean Y̅ with n observations must be distributed normally with mean μ and standard deviation σ/√n.⁹ In our simulations (3–5) this assumption was satisfied exactly (7). In a real experiment we never know if this assumption is satisfied, but we know it will be satisfied at least roughly–regardless of the population distribution from which the sample observations came–as long as the sample size n is big enough (27).

⁹ In our second exploration (5) we used the test statistic t to investigate hypothesis tests, test statistics, and P values. A t statistic shares this assumption.

What happens, however, if we doubt that the theoretical distribution of the sample mean is consistent with a normal distribution? Our suspicion would be natural: this distribution is, after all, a theoretical one. One approach is to transform the sample observations. Common transformations include the logarithm and the inverse. Box and Cox (1) described a family of power transformations in which an observed variable y is transformed into the variable w using the parameter λ:

w = (y^λ − 1)/λ  for λ ≠ 0, and
w = ln y         for λ = 0.

The logarithmic, λ = 0, and inverse, λ = −1, transformations belong to this family.

Draper and Smith (9) summarized the steps needed to estimate λ and its approximate 100(1 − α)% confidence interval. These steps include, for each trial value of λ, the calculation of the maximum likelihood ℓmax:

ℓmax = −(n/2) ln (SSresidual/n) + (λ − 1) Σ ln yᵢ ,  (1)

where the sum runs over the n observations and SSresidual is the residual sum of squares in the fit of a general linear model to the observations. The optimum estimate of λ maximizes ℓmax (Fig. 5).

Fig. 5. Maximum likelihood approach to data transformation. For each value of λ, the maximum likelihood ℓmax is calculated according to Eq. 1. For the C-reactive protein values (see Table 2), the maximum likelihood estimate of λ that maximizes ℓmax is −0.1, and an approximate 95% confidence interval for λ is [−0.4, +0.1]. Because this interval includes 0, a log transformation of the C-reactive protein values is reasonable.

Without question, transformation can be useful (1, 9). But really, who wants to identify the optimum transformation by solving Eq. 1? The bootstrap provides another way.

The textbook I use in my statistics course provides the data: measurements of C-reactive protein (Table 2). Problem 7.26 (Ref. 26, p. 442–443) asks students to study the distribution of these 40 observations and to calculate a 95% confidence interval for the true C-reactive protein mean. The real value of problem 7.26 is that it then asks students if a confidence interval is even appropriate for these data.

Cursory examinations of the histogram and normal quantile plot reveal that the C-reactive protein values are skewed and inconsistent with a normal distribution (Fig. 6, top). Still, if the theoretical distribution of the sample mean is roughly normal, then a normal-theory confidence interval can still be meaningful.
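Eq. 1 is straightforward to evaluate numerically. As a sketch (with hypothetical right-skewed data, and with the simplest general linear model, a lone mean, so that SSresidual = Σ(wᵢ − w̄)²), this Python fragment scans a grid of trial λ values and keeps the one that maximizes ℓmax:

```python
import numpy as np

def boxcox_w(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, ln(y) for lam == 0."""
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def l_max(y, lam):
    """Eq. 1 with the simplest linear model (a lone mean), so the
    residual sum of squares is sum((w - mean(w))**2)."""
    n = y.size
    w = boxcox_w(y, lam)
    ss_residual = np.sum((w - w.mean()) ** 2)
    return -(n / 2.0) * np.log(ss_residual / n) + (lam - 1.0) * np.sum(np.log(y))

# Hypothetical positive, right-skewed data: Box-Cox requires y > 0, which is
# why the C-reactive protein values (with their zeros) need ln(y + 1) instead.
rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.8, size=50)

lams = np.round(np.arange(-2.0, 2.001, 0.01), 2)  # grid of trial values
ll = np.array([l_max(y, lam) for lam in lams])
lam_hat = lams[ll.argmax()]  # the optimum estimate of lambda
print(lam_hat)
```

For lognormal data the optimum λ should land near 0, pointing to the log transformation, much as Fig. 5 points to λ ≈ −0.1 for the C-reactive protein values.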
Table 2. C-reactive protein data

Observations, n = 40

   y      w        y      w        y      w        y      w
  0      0      73.20  4.307     0      0      15.74  2.818
  3.90   1.589   0      0        0      0       0      0
  5.64   1.893  46.70  3.865     4.81   1.760   0      0
  8.22   2.221   0      0        9.57   2.358   0      0
  0      0       0      0        5.36   1.850   0      0
  5.62   1.890  26.41  3.311     0      0       9.37   2.339
  3.92   1.593  22.82  3.171     5.66   1.896  20.78   3.081
  6.81   2.055   0      0        0      0       7.10   2.092
  0      0      30.61  3.453    59.76   4.107   7.89   2.185
  0      0       3.49  1.502    12.38   2.594   5.53   1.876

y, Actual C-reactive protein observations (in mg/l) (26). The average, standard deviation, and standard error of these observations are Ave{y} = 10.03, SD{y} = 16.56, and SE{y} = 2.62. The transformed observations, w, result from ln (y + 1). The average, standard deviation, and standard error of the transformed observations are Ave{w} = 1.495, SD{w} = 1.391, and SE{w} = 0.220. For each set of observations, if we want to calculate a 95% confidence interval for the true mean, then α = 0.05 and the critical t value with 39 degrees of freedom is 2.023.

Problem 7.27 transforms the C-reactive protein observations by adding 1 to each observation and taking the logarithm of that number:

w = ln (y + 1).

Problem 7.27 then asks students to study the distribution of the 40 transformed observations and to calculate a 95% confidence interval for the true transformed C-reactive protein mean.

A histogram and normal quantile plot show that the transformed values are less skewed but still inconsistent with a normal distribution (Fig. 6, bottom). As my students tell me, "Better, but not great." The 95% confidence interval [1.05, 1.94] reverts to

[e^1.05 − 1, e^1.94 − 1] ≅ [1.86, 5.96] mg/l.

At this point, we have another problem: we have assumed the log transformation ln (y + 1) is useful–that it gives us a meaningful confidence interval–but what evidence do we have that it really is? You guessed it: the bootstrap.

Fig. 6. Distributions (left) and normal quantile plots (right) of the actual and transformed C-reactive protein observations. The transformation changed the distribution of the observations, but the transformed values remain inconsistent with a normal distribution. The commands in lines 232–252 of Advances_Statistics_Code_Boot.R create this data graphic. To generate this data graphic, highlight and submit the lines of code from Figure 6: first line to Figure 6: last line.

Fig. 7. Bootstrap distributions (left) and normal quantile plots (right) of 10,000 sample means, each with 40 observations. The bootstrap replications were drawn at random from the actual or transformed C-reactive protein observations in Table 2. The bootstrap sample means from the actual observations are inconsistent with a normal distribution. In contrast, the bootstrap sample means from the transformed observations are consistent with a normal distribution. The commands in lines 259–279 of Advances_Statistics_Code_Boot.R create this data graphic. To generate this data graphic, highlight and submit the lines of code from Figure 7: first line to Figure 7: last line.
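Figure 7's comparison can be mimicked numerically. This Python sketch (again an illustrative stand-in for the R commands in Advances_Statistics_Code_Boot.R) bootstraps the mean of the actual and of the ln(y + 1)-transformed C-reactive protein values, then summarizes each bootstrap distribution with its sample skewness, a rough numerical proxy for the normal quantile plots:

```python
import numpy as np

# The 40 C-reactive protein observations (mg/l) from Table 2, read row by row.
y = np.array([0, 73.20, 0, 15.74,      3.90, 0, 0, 0,
              5.64, 46.70, 4.81, 0,    8.22, 0, 9.57, 0,
              0, 0, 5.36, 0,           5.62, 26.41, 0, 9.37,
              3.92, 22.82, 5.66, 20.78, 6.81, 0, 0, 7.10,
              0, 30.61, 59.76, 7.89,   0, 3.49, 12.38, 5.53])
w = np.log(y + 1)  # the transformed observations

rng = np.random.default_rng(3)
b = 10_000

def boot_means(x):
    """Means of b bootstrap samples drawn with replacement from x."""
    return np.array([rng.choice(x, size=x.size, replace=True).mean()
                     for _ in range(b)])

def skewness(m):
    """Moment-based sample skewness: near 0 for a symmetric distribution."""
    d = m - m.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

sk_y = skewness(boot_means(y))  # bootstrap means of the actual values
sk_w = skewness(boot_means(w))  # bootstrap means of the transformed values
print(sk_y, sk_w)
```

The bootstrap means of the actual values stay clearly right-skewed, whereas those of the transformed values come out nearly symmetric, which is what the quantile plots in Fig. 7 show.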
    I was four or five miles from the earth at least, when it broke; I fell to the ground with such amazing violence, that I found myself stunned, and in a hole nine fathoms deep at least, made by the weight of my body falling from so great a height: I recovered, but knew not how to get out again; however, I dug slopes or steps with my [finger] nails (the Baron's nails were then of forty years' growth), and easily accomplished it.

    Refs. 31 (1786) and 32 (2001)

It was from this 9-fathom-deep hole that the Baron is rumored to have extricated himself by his bootstraps. Although the notion of pulling yourself up by your bootstraps is entirely consistent with Baron Munchausen's flair for the dramatic, I failed to find any evidence that the Baron availed himself of this life-saving technique.

Summary

As this exploration has demonstrated, the bootstrap gives us an approach we can use to assess whether an inference we make from a normal-theory hypothesis test or confidence interval is justified: if the bootstrap distribution of a statistic such as the sample mean is roughly normal, then the inference is justified. But the bootstrap gives us more than that. We can use a bootstrap confidence interval to make an inference about some experimental result when the statistical theory is uncertain or even unknown (14, 26). Although we have explored the bootstrap using the sample mean, we can use the bootstrap to make an inference about other sample statistics such as the standard deviation or correlation coefficient.
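The Summary's last point, that the bootstrap extends beyond the sample mean, is easy to demonstrate. This Python sketch (not part of the article's R code) bootstraps the sample standard deviation of the 9 observations and reads off a 90% percentile confidence interval; bootstrapping a different statistic changes only the function being applied:

```python
import numpy as np

rng = np.random.default_rng(4)

# The 9 observations from sample 1.
y = np.array([0.422, 1.103, 1.006, 1.034, 0.285, -0.647, 1.235, 0.912, 1.825])

# Bootstrap the sample standard deviation instead of the sample mean.
b = 10_000
boot_sds = np.array([np.std(rng.choice(y, size=y.size, replace=True), ddof=1)
                     for _ in range(b)])

# Percentile 90% bootstrap confidence interval: the 5th and 95th percentiles.
lo, hi = np.percentile(boot_sds, [5, 95])
print(lo, hi)
```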
In the next installment of this series, we will explore power, a concept we mentioned in our exploration of hypothesis tests (5). Power is the probability that we reject the null hypothesis when it is false.

ACKNOWLEDGMENTS

I thank John Ludbrook (Department of Surgery, The University of Melbourne, Victoria, Australia) and Matthew Strand (National Jewish Health,
20. Grenier AL, Abu-ihweij K, Zhang G, Ruppert SM, Boohaker R, Slepkov ER, Pridemore K, Ren JJ, Fliegel L, Khaled AR. Apoptosis-induced alkalinization by the Na+/H+ exchanger isoform 1 is mediated through phosphorylation of amino acids Ser726 and Ser729. Am J Physiol Cell Physiol 295: C883–C896, 2008.
21. Kelley GA, Kelley KS, Tran ZV. Exercise and bone mineral density in men: a meta-analysis. J Appl Physiol 88: 1730–1736, 2000.
22. Kristensen M, Hansen T. Statistical analyses of repeated measures in physiological research: a tutorial. Adv Physiol Educ 28: 2–14, 2004.
23. Lemley KV, Boothroyd DB, Blouch KL, Nelson RG, Jones LI, Olshen RA, Myers BD. Modeling GFR trajectories in diabetic nephropathy. Am J Physiol Renal Physiol 289: F863–F870, 2005.
24. Lott MEJ, Hogeman C, Herr M, Bhagat M, Sinoway LI. Sex differences in limb vasoconstriction responses to increases in transmural pressures. Am J Physiol Heart Circ Physiol 296: H186–H194, 2009.
25. Miller RG. The jackknife–a review. Biometrika 61: 1–15, 1974.
26. Moore DS, McCabe GP, Craig BA. Introduction to the Practice of Statistics (6th ed.). New York: Freeman, 2009.
27. Moses LE. Think and Explain with Statistics. Reading, MA: Addison-Wesley, 1986.
28. Quenouille M. Approximate tests of correlation in time-series. J R Stat Soc B 11: 68–84, 1949.
29. R Development Core Team. R: a Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2008; http://www.R-project.org.
30. Raspe RE. Baron Munchausen's Narrative of his Marvellous Travels and Campaigns in Russia. Oxford: Smith, 1785, p. 45.
31. Raspe RE. Gulliver Revived; or The Singular Travels, Campaigns, Voyages, and Adventures of Baron Munikhouson, commonly called Munchausen. London: Kearsley, 1786, p. 45.
32. Raspe RE. The Surprising Adventures of Baron Munchausen. Rockville, MD: Wildside, 2001, p. 58–59.
33. Raspe RE. Singular Travels, Campaigns and Adventures of Baron Munchausen, edited by Carswell JP, others. London: Cresset, 1948, p. 22.
34. Raspe RE. The Singular Adventures of Baron Munchausen, edited by Carswell JP, others. New York: Heritage, 1952, p. 18.
35. Rozengurt N, Wu SV, Chen MC, Huang C, Sternini C, Rozengurt E. Colocalization of the α-subunit of gustducin with PYY and GLP-1 in L cells of human colon. Am J Physiol Gastrointest Liver Physiol 291: G792–G802, 2006.
36. Tukey J. Bias and confidence in not-quite large samples (abstract). Ann Math Statistics 29: 614, 1958.
37. Yoshimura H, Jones KA, Perkins WJ, Kai T, Warner DO. Calcium sensitization produced by G protein activation in airway smooth muscle.