Bootstrap Sampling and Estimation - Stata

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 1

Products Purchase Learn Support Company    

   »  Home »  Products »  Features »  Bootstrap sampling & estimation

Bootstrap sampling and estimation Order Stata

Bootstrap of Stata commands


Boostrap of community-contributed programs
Standard errors and bias estimation

Stata’s programmability makes performing bootstrap sampling and estimation possible (see Efron 1979, 1982; Efron and Tibshirani 1993; Mooney and
Duval 1993). We provide two options to simplify bootstrap estimation. bsample draws a sample with replacement from a dataset. bsample may be used
in community-contributed programs.

It is easier, however, to perform bootstrap estimation using the bootstrap pre x. bootstrap allows the user to supply an expression that is a function of the
stored results of existing commands, or you can write a program to calculate the statistics of interest. bootstrap then can repeatedly draw a sample with
replacement, run the community-contributed program, collect the results into a new dataset, and present the results. The community-contributed
calculation program is easy to write because every Stata command saves the statistics it calculates.

For instance, assume that we wish to obtain the bootstrap estimate of the standard error of the median of a variable called mpg. Stata's feature calculates
and displays summary statistics with summarize; it calculates means, standard deviations, skewness, kurtosis, and various percentiles. Among those
percentiles is the 50th percentile—the median. In addition to displaying the calculated results, summarize stores them, and looking in the manual, we
discover that the median is stored in r(p50). To get a bootstrap estimate of its standard error, all we need to do is type

. bootstrap r(p50), reps(1000): summarize mpg, detail

and bootstrap will do all the work for us. We'll also specify a seed() option so that you can reproduce our results.

. webuse auto
(1978 Automobile Data)
. bootstrap r(p50), reps(1000) seed(1234): summarize mpg, detail
(running summarize on estimation sample)

Warning: Because summarize is not an estimation command or does not set


e(sample), bootstrap has no way to determine which observations are
used in calculating the statistics and so assumes that all
observations are used. This means that no observations will be
excluded from the resampling because of missing values or other
reasons.
If the assumption is not true, press Break, save the data, and drop
the observations that are to be excluded. Be sure that the dataset
in memory contains only the relevant data.
Bootstrap replications (1000)
(output omitted)

Bootstrap results Number of obs = 74


Replications = 1,000
command: summarize mpg, detail
_bs_1: r(p50)

Observed Bootstrap Normal-based


Coef. Std. Err. z P>|z| [95% Conf. Interval]

_bs_1 20 .9584585 20.87 0.000 18.12146 21.87854

Use estat bootstrap to report a table with alternative con dence intervals and an estimate of bias.

. estat bootstrap, all


Bootstrap results Number of obs = 74
Replications = 1000
command: summarize mpg, detail
_bs_1: r(p50)

Observed Bootstrap
Coef. Bias Std. Err. [95% Conf. Interval]
_bs_1 20 .174 .95845847 18.12146 21.87854 (N)
19 22 (P)
19 22 (BC)

(N) normal confidence interval


(P) percentile confidence interval
(BC) bias-corrected confidence interval

For an example of when we would need to write a program, consider the case of bootstrapping the ratio of two means.

We rst de ne the calculation routine, which we can name whatever we wish,

program myratio, rclass


version 15
summarize length
local length = r(mean)
summarize turn
local turn = r(mean)
return scalar ratio = `length'/`turn'
end

Our program calls summarize and stores the mean of the variable length in a local macro. The program then repeats this procedure for the second
variable turn. Finally, the ratio of the two means is computed and returned by our program in the stored result we call r(ratio).

With our program written, we can now obtain the bootstrap estimate by simply typing

. bootstrap r(ratio), reps(#): myratio

This means that we will execute bootstrap with our myratio program for # replications. Below we request 1,000 replications and specify a random-number
seed so you can reproduce our results:

. bootstrap r(ratio), reps(1000) seed(4567): myratio


(running myratio on estimation sample)

Warning: Because myratio is not an estimation command or does not set


e(sample), bootstrap has no way to determine which observations are
used in calculating the statistics and so assumes that all
observations are used. This means that no observations will be
excluded from the resampling because of missing values or other
reasons.
If the assumption is not true, press Break, save the data, and drop
the observations that are to be excluded. Be sure that the dataset
in memory contains only the relevant data.
Bootstrap replications (1000)
(output omitted)
Bootstrap results Number of obs = 74
Replications = 1,000
command: myratio
_bs_1: r(ratio)

Observed Bootstrap Normal-based


Coef. Std. Err. z P>|z| [95% Conf. Interval]

_bs_1 4.739945 .0330492 143.42 0.000 4.67517 4.804721

The ratio, calculated over the original sample, is 4.739945; the bootstrap estimate of the standard error of the ratio is 0.0344786. Had we wanted to keep
the 1,000-observation dataset of bootstrapped results for subsequent analysis, we would have typed

. bootstrap r(ratio), reps(1000) seed(4567) saving(mydata): myratio

bootstrap can be used with any Stata estimator or calculation command and even with community-contributed calculation commands.

We have found bootstrap particularly useful in obtaining estimates of the standard errors of quantile-regression coe cients. Stata performs quantile
regression and obtains the standard errors using the method suggested by Koenker and Bassett (1978, 1982). Rogers (1992) reports that these standard
errors are satisfactory in the homoskedastic case but that they appear to be understated in the presence of heteroskedastic errors. One alternative is to
bootstrap the estimated coe cients to obtain the standard errors. For instance, say that you wish to estimate a median regression of price on variables
weight, length, and foreign. Typing qreg price weight length foreign will produce the estimates along with Koenker–Bassett standard errors. To obtain
bootstrap standard errors, we could issue the command

. bootstrap, reps(#): qreg price weight length foreign

We recommend this procedure so highly that Gould (1992) wrote a new command in Stata’s programming language to further automate this procedure for
quantile regression. Typing bsqreg price weight length foreign will also produce the bootstrapped results.

References
Efron, B. 1979.
Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1–26.

------. 1982.
The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.

Efron, B. and R. J. Tibshirani. 1993.


An Introduction to the Bootstrap. New York: Chapman & Hall.

Gould, W. 1992.
sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21. Reprinted in Stata Technical Bulletic Reprints, vol.
2, pp. 137–139.

Koenker, R., and G. Bassett, Jr. 1978.


Asymptotic theory of least absolute error regression. Journal of the American Statistical Association 73: 618–622.

------. 1982.
Robust tests for heteroskedasticity based on regression quantiles. Econometrica 50: 43–61.

Mooney, C. Z., and R. D. Duval. 1993.


Bootstrapping: A Nonparametric Approach to Statistical Inference.
Inference Newbury Park, CA: Sage.

Rogers, W. H. 1992.
sg11: Quantile regression standard errors. Stata Technical Bulletin 9: 16–19. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 133–137.

Stata Shop Support Company

 New in Stata  Order Stata  Training  Contact us


 Why Stata?  Bookstore  Video tutorials  Customer service
 All features  Stata Press books  FAQs  Announcements
 Features by disciplines  Stata Journal  Statalist: The Stata Forum  Search
 Stata/MP  Gift Shop  Resources
 Which Stata is right for me?  Technical support
 Order Stata  Customer service

© Copyright 1996–2019 StataCorp LLC   •   Terms of use   •   Privacy   •   Contact us

You might also like