
Resampling Methods

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly
drawing samples from a training set and refitting a model of interest on each sample in order
to obtain additional information about the fitted model.

Resampling approaches can be computationally expensive, because they involve fitting the
same statistical method multiple times using different subsets of the training data.

Dr. A V Prajeesh August 21, 2022 1 / 23


Two of the most commonly used resampling methods are cross-validation and the bootstrap.

Cross-validation can be used to estimate the test error associated with a given statistical learning method in order to evaluate its performance.

The bootstrap is used in several contexts, most commonly to provide a measure of accuracy of a parameter estimate or of a given statistical learning method.


The Validation Set Approach

Suppose that we would like to estimate the test error associated with fitting a particular
statistical learning method on a set of samples. The validation set approach involves randomly
dividing the available set of samples into two parts, a training set and a validation set or
hold-out set. The model is fit on the training set, and the fitted model is used to predict the
responses for the observations in the validation set.
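As a minimal sketch of this split-fit-predict procedure (the simulated data set and variable names below are hypothetical, for illustration only; scikit-learn is assumed to be available):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated data: y = 2x + noise (hypothetical, not from the slides)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

# Randomly divide the observations into a training set and a hold-out set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit on the training set, then predict the hold-out responses
model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"validation-set estimate of test MSE: {val_mse:.3f}")
```

Rerunning this with a different `random_state` gives a different error estimate, which is exactly the variability discussed next.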


The validation set approach is conceptually simple and is easy to implement.

But it has two potential drawbacks.

1. The validation estimate of the test error rate can be highly variable, depending on precisely
which observations are included in the training set and which observations are included in the
validation set.

2. In the validation approach, only a subset of the observations - those that are included in the
training set rather than in the validation set - are used to fit the model. Since statistical
methods tend to perform worse when trained on fewer observations, this suggests that the
validation set error rate may tend to overestimate the test error rate for the model fit on the
entire data set.


To overcome these drawbacks, a refined technique is introduced.

Leave-One-Out Cross-Validation (LOOCV)

Like the validation set approach, LOOCV involves splitting the set of observations into two
parts. However, instead of creating two subsets of comparable size, a single observation (x1 , y1 )
is used for the validation set, and the remaining observations {(x2 , y2 ), ..., (xn , yn )} make up
the training set. The statistical learning method is fit on the n − 1 training observations, and a
prediction ŷ1 is made for the excluded observation, using its value x1. Since (x1, y1) was not used in the fitting process, MSE1 = (y1 − ŷ1)² provides an approximately unbiased estimate of the test error. But even though MSE1 is unbiased for the test error, it is a poor estimate because it is highly variable, since it is based upon a single observation (x1, y1).


We can repeat the procedure by selecting (x2, y2) for the validation data, training the statistical learning procedure on the n − 1 observations {(x1, y1), (x3, y3), . . . , (xn, yn)}, and computing MSE2 = (y2 − ŷ2)². Repeating this approach n times produces n squared errors, MSE1, . . . , MSEn. The LOOCV estimate for the test MSE is the average of these n test error estimates:
CV(n) = (1/n) Σ_{i=1}^{n} MSE_i.
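The n-fold loop above can be sketched with scikit-learn's `LeaveOneOut` splitter (simulated data; the names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=50)

# Each of the n fits leaves out one observation; fold i's score is -MSE_i
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
cv_n = -scores.mean()  # CV(n) = (1/n) * sum of the MSE_i
print(f"LOOCV estimate of test MSE: {cv_n:.3f}")
```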


LOOCV has a couple of major advantages over the validation set approach.

In LOOCV, we repeatedly fit the statistical learning method using training sets that contain
n − 1 observations, almost as many as are in the entire data set. This is in contrast to the
validation set approach, in which the training set is typically around half the size of the original
data set. Consequently, the LOOCV approach tends not to overestimate the test error rate as
much as the validation set approach does.

Second, in contrast to the validation approach which will yield different results when applied
repeatedly due to randomness in the training/validation set splits, performing LOOCV multiple
times will always yield the same results: there is no randomness in the training/validation set
splits.


k-Fold Cross-Validation

An alternative to LOOCV is k-fold CV. This approach involves randomly dividing the set of
observations into k groups, or folds, of approximately equal size. The first fold is treated as a
validation set, and the method is fit on the remaining k − 1 folds. The mean squared error,
MSE1 , is then computed on the observations in the held-out fold. This procedure is repeated
k times; each time, a different group of observations is treated as a validation set. This
process results in k estimates of the test error, MSE1 , MSE2 , . . . , MSEk . The k-fold CV
estimate is computed by averaging these values,
CV(k) = (1/k) Σ_{i=1}^{k} MSE_i.
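The same averaging can be sketched with scikit-learn's `KFold` splitter (simulated, hypothetical data again):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=100)

# Randomly divide the observations into k = 5 folds of roughly equal size
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")
cv_k = -scores.mean()  # CV(k) = (1/k) * sum of the MSE_i
print(f"5-fold CV estimate of test MSE: {cv_k:.3f}")
```

Only 5 model fits are required here, versus n = 100 fits for LOOCV on the same data.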


It is not hard to see that LOOCV is a special case of k-fold CV in which k is set to equal n. In
practice, one typically performs k-fold CV using k = 5 or k = 10. What is the advantage of
using k = 5 or k = 10 rather than k = n? The most obvious advantage is computational.
LOOCV requires fitting the statistical learning method n times. This has the potential to be
computationally expensive.


The Bias-Variance Trade-Off

The expected test MSE, for a given value x0, can always be decomposed into the sum of three fundamental quantities: the variance of f̂(x0), the squared bias of f̂(x0), and the variance of the error term ϵ. That is,

E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ϵ).

Here the notation E[(y0 − f̂(x0))²] defines the expected test MSE, and refers to the average test MSE that we would obtain if we repeatedly estimated f using a large number of training sets, and tested each at x0. The overall expected test MSE can be computed by averaging E[(y0 − f̂(x0))²] over all possible values of x0 in the test set.


What do we mean by the variance and bias of a statistical learning method? Variance refers to the amount by which f̂ would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different f̂. But ideally the estimate for f should not vary too much between training sets. However, if a method has high variance, then small changes in the training data can result in large changes in f̂.


On the other hand, bias refers to the error that is introduced by approximating a real-life
problem, which may be extremely complicated, by a much simpler model. For example, linear
regression assumes that there is a linear relationship between Y and X1 , X2 , . . . , Xp . It is
unlikely that any real-life problem truly has such a simple linear relationship, and so performing
linear regression will undoubtedly result in some bias in the estimate of f . Suppose the true f
is substantially non-linear, so no matter how many training observations we are given, it will
not be possible to produce an accurate estimate using linear regression. In other words, linear
regression results in high bias in this example.
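A small simulation (hypothetical data, not from the slides) makes the high-bias point concrete: even with many training observations, a linear fit to a quadratic f cannot push the test MSE down to Var(ϵ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

def true_f(x):
    return x ** 2  # substantially non-linear true relationship

# Plenty of training data, yet the model class is wrong
X_train = rng.uniform(-2, 2, size=(5000, 1))
y_train = true_f(X_train[:, 0]) + rng.normal(scale=0.5, size=5000)
X_test = rng.uniform(-2, 2, size=(2000, 1))
y_test = true_f(X_test[:, 0]) + rng.normal(scale=0.5, size=2000)

model = LinearRegression().fit(X_train, y_train)
test_mse = np.mean((y_test - model.predict(X_test)) ** 2)

# Var(eps) = 0.25, but the squared bias keeps the test MSE well above that
print(f"test MSE: {test_mse:.3f} (irreducible error is 0.25)")
```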

The relationship between bias, variance, and test set MSE is referred to as the bias-variance
trade-off.


Cross-Validation on Classification Problems

We have illustrated the use of cross-validation in the regression setting where the outcome Y
is quantitative, and so have used MSE to quantify test error. But cross-validation can also be
a very useful approach in the classification setting when Y is qualitative. In this setting,
cross-validation works just as described earlier, except that rather than using MSE to quantify
test error, we instead use the number of misclassified observations. For instance, in the
classification setting, the LOOCV error rate takes the form
CV(n) = (1/n) Σ_{i=1}^{n} Err_i,

where Err_i = I(y_i ≠ ŷ_i). The k-fold CV error rate and validation set error rates are defined analogously.
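A sketch of the classification analogue, using LOOCV with misclassification error on simulated two-class data (the data and names are hypothetical; scikit-learn is assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=60) > 0).astype(int)

# Accuracy on each left-out observation is 1 - Err_i
acc = cross_val_score(LogisticRegression(), X, y,
                      cv=LeaveOneOut(), scoring="accuracy")
loocv_error_rate = 1.0 - acc.mean()  # (1/n) * sum of I(y_i != yhat_i)
print(f"LOOCV misclassification rate: {loocv_error_rate:.3f}")
```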
Bootstrap

The bootstrap is a widely applicable and extremely powerful statistical tool that can
be used to quantify the uncertainty associated with a given estimator or statistical learning
method. As a simple example, the bootstrap can be used to estimate the standard errors of
the coefficients from a linear regression fit. However, the power of the bootstrap lies in the
fact that it can be easily applied to a wide range of statistical learning methods, including
some for which a measure of variability is otherwise difficult to obtain and is not automatically
output by statistical software.


Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns
of X and Y , respectively, where X and Y are random quantities. We will invest a fraction α of
our money in X , and will invest the remaining 1 − α in Y . Since there is variability associated
with the returns on these two assets, we wish to choose α to minimize the total risk, or
variance, of our investment. In other words, we want to minimize Var(αX + (1 − α)Y). Using some statistics, one can show that the risk is minimized by

α = (σY² − σXY) / (σX² + σY² − 2σXY),

where σX² = Var(X), σY² = Var(Y), and σXY = Cov(X, Y).


In reality, the quantities σX², σY², and σXY are unknown. We can compute estimates for these quantities, σ̂X², σ̂Y², and σ̂XY, using a data set that contains past measurements for X and Y. We can then estimate the value of α that minimizes the variance of our investment using

α̂ = (σ̂Y² − σ̂XY) / (σ̂X² + σ̂Y² − 2σ̂XY).
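Computing α̂ from sample estimates can be sketched as follows (the simulated past returns below are hypothetical; `np.cov` supplies the sample variances and covariance):

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical past returns for the two assets X and Y
n = 100
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 1.2, n)

def alpha_hat(x, y):
    # 2x2 sample covariance matrix: [[sx2, sxy], [sxy, sy2]]
    cov = np.cov(x, y)
    sx2, sy2, sxy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (sy2 - sxy) / (sx2 + sy2 - 2.0 * sxy)

a = alpha_hat(x, y)
print(f"estimated alpha: {a:.3f}")
```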

To assess the accuracy of α̂, we would also like its standard error, SE(α̂), the standard deviation of the estimate across data sets. Ideally we would generate many data sets, say 100 or 1000, compute α̂ on each, and take the standard deviation of the resulting estimates. In a real-world situation we usually have only a single data set, and the bootstrap allows us to generate such data sets from it.


This approach is illustrated in Figure A on a simple data set, which we call Z, that contains only n = 3 observations. We randomly select n observations from the data set in order to produce a bootstrap data set, Z∗1. The sampling is performed with replacement, which means that the same observation can occur more than once in the bootstrap data set. In this example, Z∗1 contains the third observation twice, the first observation once, and no instances of the second observation. Note that if an observation is contained in Z∗1, then both its X and Y values are included. We can use Z∗1 to produce a new bootstrap estimate for α, which we call α̂∗1. This procedure is repeated B times for some large value of B, in order to produce B different bootstrap data sets, Z∗1, Z∗2, . . . , Z∗B, and B corresponding estimates, α̂∗1, α̂∗2, . . . , α̂∗B. The standard deviation of these B estimates then serves as the bootstrap estimate of SE(α̂).
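The whole bootstrap procedure for α̂ can be sketched in a few lines (simulated hypothetical returns; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
# A single observed data set of hypothetical returns (x_i, y_i)
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + rng.normal(0.0, 1.2, n)

def alpha_hat(x, y):
    cov = np.cov(x, y)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1])

B = 1000
estimates = np.empty(B)
for b in range(B):
    # Draw n indices with replacement: one bootstrap data set Z*b
    idx = rng.integers(0, n, size=n)
    estimates[b] = alpha_hat(x[idx], y[idx])

se_alpha = estimates.std(ddof=1)  # bootstrap estimate of SE(alpha-hat)
print(f"bootstrap SE of alpha-hat: {se_alpha:.4f}")
```

Note that only the original n observations are ever used; the B "new" data sets come entirely from resampling with replacement.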

Python code for the validation set approach, LOOCV, and k-fold cross-validation in the case of fitting a linear regression model

Figure 1
Figure 2: Validation set approach


Figure 3: K-fold cross validation


Figure 4: LOOCV


Bootstrapping Technique used in k-fold cross validation for a linear regression problem

Figure 5: Importing required data


Figure 6: Bootstrapping technique used in K-fold CV for a Linear regression problem
