Bootstrap Confidence Interval
(Instructor : Nishant Panda)
Additional References
1. (IB) : An Introduction to the Bootstrap, Efron and Tibshirani, Chapman & Hall/CRC
Introduction
Last lecture we used the bootstrap technique to estimate the standard error of an estimator. In this lecture, we will go further and see how we can use the bootstrap to estimate bias and construct confidence intervals. The following notation from last lecture is written down for completeness.
Let $X \sim F$ be a random variable (could be multidimensional!) whose c.d.f is given by $F(x)$. Let us denote the expectation and variance of $X$ by $\mu_F$ and $\sigma_F^2$ to emphasize the distribution. These are all parameters of the distribution $F$. In general, we denote a parameter as a function of the distribution $F$, written $\theta = T(F)$ or $\theta_F$ for short. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ of $X$. An estimator $\hat{\theta}$ is a function of the sample, i.e. $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$. For example, if $\theta_F = \mu_F$, then $\hat{\theta} = \bar{X}$ is an estimator for the mean. Similarly, if $\theta_F = \sigma_F^2$, then $\hat{\theta} = S^2$ is an estimator for the variance. Let $\hat{F}$ be an estimated c.d.f; then $\hat{\theta}_{\hat{F}}$ is the plug-in estimator for $\theta$.
As usual, if $\theta_F = \sigma_F^2$, then $S^2$ is an estimator for the variance given by
$$\hat{\theta} = S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
Recall that the bias of an estimator $\hat{\theta}$ is defined as
$$\mathrm{bias}_F(\hat{\theta}, \theta) = E_F\left[\hat{\theta}\right] - \theta_F.$$
As you may have guessed, the bootstrap estimate of the bias is given by the plug-in estimator,
$$\mathrm{bias}_{\hat{F}}(\hat{\theta}, \theta) = E_{\hat{F}}\left[\hat{\theta}^*\right] - \hat{\theta}_{\hat{F}},$$
with
$$E_{\hat{F}}\left[\hat{\theta}^*\right] \approx \frac{1}{R} \sum_{j=1}^{R} \hat{\theta}^*(x_j^*),$$
where the notation is from last lecture. Thus, the bootstrap bias estimate of $\hat{\theta}$ is
$$\widehat{\mathrm{bias}}_{\hat{F}}(\hat{\theta}, \theta) \approx \frac{1}{R} \sum_{j=1}^{R} \hat{\theta}^*(x_j^*) - \hat{\theta}_{\hat{F}}.$$
}
b
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = obs.data, statistic = sample.var, R = R)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 5.976929 -0.1527511 1.195859
The bias seems different! The boot package seems to be computing the bias using
$$\mathrm{bias}_{\hat{F}}(\hat{\theta}, \theta) = E_{\hat{F}}\left[\hat{\theta}^*\right] - \hat{\theta}.$$
That is, it does not use the plug-in estimator in the last term but uses the original estimator. If $\hat{\theta}$ were a plug-in estimator, the boot package would have given the same answer. (There is no way for code to know what the plug-in estimator is for a general estimator; it can only take the estimator you supply as an argument.)
# Compute the bias as the difference between
# bootstrap expectation and original estimator
bias.boot.new = mu.Fhat - var(obs.data)
print(bias.boot.new)
## [1] -0.187678
This should not be a problem when the size of your observed data is large, but it is something to keep in mind.
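The two bias conventions can be compared side by side. Below is a minimal Python sketch (a stand-in for the course's R code; the function names `sample_var`, `plugin_var`, and `bootstrap_bias` are ours, not from the lecture). It computes both the boot-package-style bias, which subtracts the original estimate, and the lecture's plug-in-style bias, which subtracts the plug-in estimate:

```python
import random

def sample_var(xs):
    """Unbiased sample variance S^2 (divides by n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def plugin_var(xs):
    """Plug-in variance sigma^2_{Fhat} (divides by n)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def bootstrap_bias(data, stat, R=2000, seed=0):
    """Return (boot-style bias, plug-in-style bias) of `stat`."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(R):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(resample))
    mean_rep = sum(reps) / R
    # boot package convention: E_Fhat[theta*] - theta_hat(original data)
    bias_vs_original = mean_rep - stat(data)
    # lecture convention: E_Fhat[theta*] - plug-in estimate theta_Fhat
    bias_vs_plugin = mean_rep - plugin_var(data)
    return bias_vs_original, bias_vs_plugin

gen = random.Random(42)
data = [gen.gauss(0, 2) for _ in range(30)]
b_orig, b_plug = bootstrap_bias(data, sample_var)
```

Since $S^2 > \hat{\sigma}^2_{\hat{F}}$, the plug-in-style bias is always the larger of the two; the gap is exactly $S^2 - \hat{\sigma}^2_{\hat{F}}$, which shrinks as $n$ grows.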
(Home Assignment!) For the obs.data here, assume that your estimator is not $S^2$ but in fact the plug-in estimator
$$\hat{\sigma}_{\hat{F}}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2.$$
That is, you are estimating $\sigma_F^2$ using $\hat{\sigma}_{\hat{F}}^2$. Mimic the code above and compute the standard error and bias of $\hat{\sigma}_{\hat{F}}^2$. Also check with the boot package.
where $[\,\cdot\,]$ denotes the greatest integer function. Then, to compute the $(1-\alpha)$ confidence interval,

1. Order the bootstrap estimates $\{\hat{\theta}^*(x_j^*)\}$ in increasing order:
$$\hat{\theta}^*_{(1)} \le \hat{\theta}^*_{(2)} \le \cdots \le \hat{\theta}^*_{(R)}$$
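The ordering-and-indexing step above can be sketched in a few lines of Python (a stand-in for the course's R; `percentile_ci` is our hypothetical name, and the exact index convention for the order statistics varies slightly between texts):

```python
import random

def percentile_ci(data, stat, alpha=0.05, R=2000, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # bootstrap replicates, sorted in increasing order
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(R))
    # order statistics near the alpha/2 and 1 - alpha/2 quantiles
    lo = reps[int((alpha / 2) * R)]
    hi = reps[int((1 - alpha / 2) * R) - 1]
    return lo, hi

gen = random.Random(1)
data = [gen.gauss(5, 1) for _ in range(50)]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = percentile_ci(data, mean)
```

For this data, the interval straddles the sample mean, as expected of a percentile interval built from the bootstrap distribution of $\bar{X}$.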
Studentized Bootstrap
The percentile bootstrap can get narrow in practice (undercoverage). The studentized bootstrap (also known as bootstrap-t) is a technique that prevents this flaw in the percentile bootstrap. First, some theory. For some unbiased estimator $\hat{\theta}$ for $\theta$, if the C.L.T holds, then
$$Z = \frac{\hat{\theta} - \theta}{SE_{\hat{\theta}}} \sim AN(0, 1).$$
But $SE_{\hat{\theta}}$ is typically not known. Student found that if $\hat{\theta} = \bar{X}$ for data drawn from a normal distribution, then
$$T = \frac{\hat{\theta} - \theta}{\widehat{SE}} \sim t_{n-1}.$$
For any arbitrary estimator $\hat{\theta}$, this may not be true. The bootstrap-t method tries to get an approximation of $T$ directly from the data, using the fact that $\hat{\theta}^* - \hat{\theta}$ is distributionally close to $\hat{\theta} - \theta$, where $\hat{\theta}^*$ is the bootstrap distribution of $\hat{\theta}$, i.e.
$$T^* = \frac{\hat{\theta}^* - \hat{\theta}}{\widehat{SE}^*},$$
where $\widehat{SE}^*$ is the plug-in estimate of $SE_{\hat{\theta}^*}$. Here is an algorithm to get the distribution of $T^*$.
1. Get a bootstrap sample $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$. Suppose (and this is a big assumption) you can estimate the SE of $\hat{\theta}^*$ for this sample $x^*$; then
$$T^*(x^*) = \frac{\hat{\theta}^*(x^*) - \hat{\theta}}{\widehat{SE}(x^*)}.$$
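For $\hat{\theta} = \bar{X}$ the "big assumption" holds: each bootstrap sample has a closed-form plug-in SE, so no inner bootstrap is needed. Here is a hedged Python sketch of the resulting bootstrap-t interval (function names `plugin_se` and `studentized_ci` are ours, and the quantile indexing is one common convention, not the only one):

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def plugin_se(xs):
    """Plug-in SE of the mean: sqrt(sum (x - xbar)^2) / n."""
    n = len(xs)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs)) / n

def studentized_ci(data, alpha=0.05, R=2000, seed=0):
    """Bootstrap-t (1 - alpha) confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = mean(data)
    se_hat = plugin_se(data)
    t_stars = []
    for _ in range(R):
        xs = [data[rng.randrange(n)] for _ in range(n)]
        # T*(x*) = (theta*(x*) - theta_hat) / SE_hat(x*)
        t_stars.append((mean(xs) - theta_hat) / plugin_se(xs))
    t_stars.sort()
    t_lo = t_stars[int((alpha / 2) * R)]
    t_hi = t_stars[int((1 - alpha / 2) * R) - 1]
    # note the inversion: the upper T* quantile gives the lower endpoint
    return theta_hat - t_hi * se_hat, theta_hat - t_lo * se_hat

gen = random.Random(7)
data = [gen.gauss(0, 3) for _ in range(40)]
lo, hi = studentized_ci(data)
```

The endpoint inversion mirrors pivoting on $T$: from $t_{lo} \le T \le t_{hi}$ one solves for $\theta$, which flips the quantiles.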
Now,
$$\mathrm{var}_{\hat{F}^*}\left[\hat{\theta}\right] = E_{\hat{F}^*}\left[\hat{\theta}^2\right] - E_{\hat{F}^*}\left[\hat{\theta}\right]^2.$$
Note the $\hat{F}^*$! If $\hat{\theta}$ is complicated, we need to use the bootstrap to estimate $\mathrm{var}_{\hat{F}^*}\left[\hat{\theta}\right]$! This is the double bootstrap.
In order to get the distribution of a statistic we take bootstrap samples from the data. In order to
get the distribution of the bootstrap statistic we take bootstrap samples from the bootstrap data!
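Resampling from the resamples can be sketched as follows, again in Python rather than the course's R (names `boot_se` and `double_bootstrap_t` are hypothetical; the inner bootstrap estimates $\widehat{SE}(x^*)$ for each outer bootstrap sample):

```python
import math
import random

def boot_se(xs, stat, R, rng):
    """Bootstrap estimate of the SE of `stat` on data xs."""
    n = len(xs)
    reps = [stat([xs[rng.randrange(n)] for _ in range(n)]) for _ in range(R)]
    m = sum(reps) / R
    return math.sqrt(sum((r - m) ** 2 for r in reps) / R)

def double_bootstrap_t(data, stat, R_outer=200, R_inner=50, seed=0):
    """T* values via the double bootstrap: an inner bootstrap of each
    outer bootstrap sample estimates SE(x*)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    t_stars = []
    for _ in range(R_outer):
        xs = [data[rng.randrange(n)] for _ in range(n)]  # sample from Fhat
        se_star = boot_se(xs, stat, R_inner, rng)        # sample from Fhat*
        t_stars.append((stat(xs) - theta_hat) / se_star)
    return t_stars

gen = random.Random(11)
data = [gen.gauss(0, 1) for _ in range(30)]
t_stars = double_bootstrap_t(data, lambda xs: sum(xs) / len(xs))
```

The cost is $R_{\text{outer}} \times R_{\text{inner}}$ evaluations of the statistic, which is why the double bootstrap is reserved for estimators with no closed-form SE.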
Population: c.d.f is $F$, parameter is $\theta_F$
You sample from $F$, you get data!
Data: approx c.d.f is $\hat{F}$, estimator is $\hat{\theta}$
You sample from $\hat{F}$, you get the bootstrap sample
Bootstrap sample: approx c.d.f is $\hat{F}^*$, bootstrap estimator is $\hat{\theta}^*$
1. standard error $SE_{\bar{X}}$. By definition,
$$SE(\bar{X}) = \sqrt{\mathrm{var}_F\left[\bar{X}\right]}$$
Since $\mathrm{var}_F\left[\bar{X}\right] = \sigma_F^2 / n$, we get
$$SE(\bar{X}) = \frac{\sigma_F}{\sqrt{n}}.$$
2. the plug-in estimator $\widehat{SE}_{\hat{F}}$. By definition, the plug-in estimator is the parameter evaluated w.r.t. $\hat{F}$:
$$\widehat{SE}_{\hat{F}} = SE_{\hat{F}}(\bar{X}) = \frac{\sigma_{\hat{F}}}{\sqrt{n}}$$
Thus,
$$\mathrm{var}_{\hat{F}}\left[\bar{X}\right] = \frac{1}{n^2} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
$$\text{plug-in } \widehat{SE} = \frac{\sqrt{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}}{n}$$
4. Bootstrap estimator $\hat{\theta}^*$. Let $X_1^*, X_2^*, \ldots, X_n^*$ be a bootstrap sample and let $\hat{F}^*$ be the empirical c.d.f of the bootstrap sample. Then, if $\hat{\theta} = h(X_1, X_2, \ldots, X_n)$, then $\hat{\theta}^* = h(X_1^*, X_2^*, \ldots, X_n^*)$. Thus,
$$\hat{\theta}^* = \bar{X}^* = \frac{1}{n} \sum_{i=1}^{n} X_i^*$$
5. Convince yourself that
$$\text{plug-in } \widehat{SE}^* = \frac{\sqrt{\sum_{i=1}^{n} \left(X_i^* - \bar{X}^*\right)^2}}{n}$$
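One way to convince yourself of point 5 is numerically: $\sqrt{\tfrac{1}{n^2}\sum (X_i^* - \bar{X}^*)^2} = \tfrac{1}{n}\sqrt{\sum (X_i^* - \bar{X}^*)^2}$. A quick Python check (the data here is an arbitrary stand-in for a bootstrap sample):

```python
import math
import random

gen = random.Random(3)
xstar = [gen.gauss(0, 1) for _ in range(25)]  # stand-in bootstrap sample
n = len(xstar)
xbar = sum(xstar) / n
ss = sum((x - xbar) ** 2 for x in xstar)

# route 1: sigma_{Fhat*} / sqrt(n), with sigma^2 the plug-in variance ss/n
route1 = math.sqrt(ss / n) / math.sqrt(n)
# route 2: the formula from point 5
route2 = math.sqrt(ss) / n
```

The two routes agree to floating-point precision, since both equal $\sqrt{ss}/n$.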
(Home Assignment!): With the same observed data obs.data as in the examples, say you are now interested in the mean and your estimator is $\bar{X}$. Get the studentized bootstrap confidence interval for this estimator. You will need Example 3, i.e. you don't need to do a double bootstrap.
In the next lecture we will see how to implement the double bootstrap to construct a studentized bootstrap confidence interval for $S^2$.