03 06 Resampled Inference
Statistical Inference
The jackknife
The jackknife is a tool for estimating standard errors and the bias of estimators
As its name suggests, the jackknife is a small, handy tool; in contrast, the bootstrap is the moral equivalent of a giant workshop full of tools
Both the jackknife and the bootstrap involve resampling data; that is, repeatedly creating new data
sets from the original data
The jackknife
The jackknife deletes each observation and calculates an estimate based on the remaining $n - 1$ of them
It uses this collection of estimates to do things like estimate the bias and the standard error
Note that estimating the bias and having a standard error are not needed for things like sample
means, which we know are unbiased estimates of population means and what their standard
errors are
The jackknife
We'll consider the jackknife for univariate data

- Let $X_1, \ldots, X_n$ be a data set
- Let $\hat \theta$ be the estimate based on the full data set
- Let $\hat \theta_{i}$ be the estimate of $\theta$ obtained by deleting observation $i$
- Let $\bar \theta = \frac{1}{n}\sum_{i=1}^n \hat \theta_{i}$
Continued
Then, the jackknife estimate of the bias is

$$(n - 1) \left(\bar \theta - \hat \theta\right)$$

(how far the average delete-one estimate is from the actual estimate)

The jackknife estimate of the standard error is

$$\left[\frac{n-1}{n} \sum_{i=1}^n \left(\hat \theta_{i} - \bar \theta\right)^2\right]^{1/2}$$

(the deviance of the delete-one estimates from the average delete-one estimate)
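As a quick sanity check (my own illustration, not from the original slides), applying these formulas to the sample mean gives an estimated bias of essentially zero and recovers the familiar $s/\sqrt{n}$ standard error:

```r
# Jackknife applied to the sample mean: the bias estimate is 0 and the
# SE estimate reduces to the familiar sd(x) / sqrt(n)
set.seed(3)
x <- rnorm(50)                               # simulated data
n <- length(x)
jk <- sapply(1:n, function(i) mean(x[-i]))   # delete-one means
biasEst <- (n - 1) * (mean(jk) - mean(x))    # numerically 0
seEst <- sqrt((n - 1) * mean((jk - mean(jk))^2))
all.equal(seEst, sd(x) / sqrt(n))            # TRUE
```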
Example
We want to estimate the bias and standard error of the median
```r
library(UsingR)
data(father.son)
x <- father.son$sheight
n <- length(x)
theta <- median(x)
jk <- sapply(1:n,
             function(i) median(x[-i])
             )
thetaBar <- mean(jk)
biasEst <- (n - 1) * (thetaBar - theta)
seEst <- sqrt((n - 1) * mean((jk - thetaBar)^2))
```
Example
```r
c(biasEst, seEst)
```

```
[1] 0.0000 0.1014
```

```r
library(bootstrap)
temp <- jackknife(x, median)
c(temp$jack.bias, temp$jack.se)
```

```
[1] 0.0000 0.1014
```
Example
Both methods (of course) yield an estimated bias of 0 and an SE of 0.1014
Odd little fact: the jackknife estimate of the bias for the median is always 0 when the number of observations is even
It has been shown that the jackknife is a linear approximation to the bootstrap
Generally do not use the jackknife for sample quantiles like the median, as it has been shown to have some poor properties
Pseudo observations
Another interesting way to think about the jackknife uses pseudo observations
- Let Pseudo Obs $= n \hat \theta - (n - 1) \hat \theta_{i}$
- Note when $\hat \theta$ is the sample mean, the pseudo observations are the data themselves
Then the sample standard error of these observations is the previous jackknife estimated standard
error.
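This claim can be checked numerically. A minimal sketch with simulated data (variable names are illustrative, not from the slides): the standard error of the mean of the pseudo observations equals the jackknife SE estimate.

```r
# The standard error of the mean of the pseudo observations equals the
# jackknife standard error estimate (here, for the median)
set.seed(1)
x <- rnorm(101)                                   # odd n avoids the even-n quirk
n <- length(x)
theta <- median(x)                                # full-data estimate
jk <- sapply(1:n, function(i) median(x[-i]))      # delete-one estimates
pseudo <- n * theta - (n - 1) * jk                # pseudo observations
seJack <- sqrt((n - 1) * mean((jk - mean(jk))^2)) # jackknife SE
sePseudo <- sd(pseudo) / sqrt(n)                  # SE of the pseudo-obs mean
all.equal(seJack, sePseudo)                       # TRUE
```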
The bootstrap
The bootstrap is a tremendously useful tool for constructing confidence intervals and calculating
standard errors for difficult statistics
For example, how would one derive a confidence interval for the median?
The bootstrap procedure follows from the so-called bootstrap principle
Bootstrap procedure for calculating a confidence interval for the median from a data set of $n$ observations:

i. Sample $n$ observations with replacement from the observed data, resulting in one simulated complete data set
ii. Take the median of the simulated data set
iii. Repeat these two steps $B$ times, resulting in $B$ simulated medians
iv. These medians are approximately drawn from the sampling distribution of the median of $n$ observations; therefore we can:
   - Draw a histogram of them
   - Calculate their standard deviation to estimate the standard error of the median
   - Take the 2.5th and 97.5th percentiles as a confidence interval for the median
Example code
```r
B <- 1000
resamples <- matrix(sample(x,
                           n * B,
                           replace = TRUE),
                    B, n)
medians <- apply(resamples, 1, median)
sd(medians)
```

```
[1] 0.08546
```

```r
quantile(medians, c(.025, .975))
```

```
 2.5% 97.5% 
68.43 68.82 
```
Group comparisons
Consider comparing two independent groups.
Example, comparing sprays B and C
```r
data(InsectSprays)
boxplot(count ~ spray, data = InsectSprays)
```
Permutation tests
Consider the null hypothesis that the distribution of the observations from each group is the same
Then, the group labels are irrelevant
We then discard the group levels and permute the combined data

Then we split the permuted data into two groups, with $n_B$ observations in the first group and $n_C$ in the second (say by always treating the first $n_B$ observations as the first group)

Evaluate the probability of getting a statistic as large or larger than the one observed
An example statistic would be the difference in the averages between the two groups; one could
also use a t-statistic
| Data type | Statistic | Test name |
| --- | --- | --- |
| Ranks | rank sum | rank sum test |
| Binary | hypergeometric prob | Fisher's exact test |
| Raw data | | ordinary permutation test |
Also, so-called randomization tests are exactly permutation tests, with a different motivation.
For matched data, one can randomize the signs
- For ranks, this results in the signed rank test
Permutation strategies work for regression as well
- Permuting a regressor of interest
Permutation tests work very well in multivariate settings
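For the matched-data case mentioned above, sign randomization can be sketched as follows (simulated paired differences; variable names are my own, not from the slides):

```r
# Randomization test for matched data: under H0 the paired differences
# are symmetric about 0, so their signs can be flipped at random
set.seed(2)
diffs <- rnorm(20, mean = 1)       # simulated paired differences
observed <- mean(diffs)
flips <- sapply(1:10000, function(i)
  mean(diffs * sample(c(-1, 1), length(diffs), replace = TRUE)))
mean(abs(flips) >= abs(observed))  # two-sided p-value
```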
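The output below assumes set-up code along these lines, a sketch of the permutation procedure described earlier for sprays B and C (the names `observedStat` and `permutations` are taken from the output that follows):

```r
# Permutation test comparing sprays B and C from InsectSprays
subdata <- InsectSprays[InsectSprays$spray %in% c("B", "C"), ]
y <- subdata$count
group <- as.character(subdata$spray)
# statistic: difference in the group means
testStat <- function(w, g) mean(w[g == "B"]) - mean(w[g == "C"])
observedStat <- testStat(y, group)
# permute the group labels and recompute the statistic
permutations <- sapply(1:10000, function(i) testStat(y, sample(group)))
```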
```r
observedStat
```

```
[1] 13.25
```

```r
mean(permutations > observedStat)
```

```
[1] 0
```
Histogram of permutations