IML 8 - Grid Search and Cross Validation
UNIT # 8
The slides and code in this lecture are primarily taken from
Machine Learning with PyTorch and Scikit-Learn by Raschka et al.
Principles of Data Mining by Bramer
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Géron
The error rate on new cases is called the generalization error (or out-of-
sample error), and by evaluating your model on the test set, you get an
estimate of this error.
This value tells you how well your model will perform on instances it has
never seen before.
If the training error is low (i.e., your model makes few mistakes on the
training set) but the generalization error is high, it means that your model
is overfitting the training data.
It is common to use 70-80% of the data for training and hold out the
remaining 20-30% for testing.
However, this depends on the dataset size: if it contains 10 million
instances, then holding out 1% means your test set will contain 100,000
instances, probably more than enough to get a good estimate of the
generalization error.
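As a minimal sketch of this holdout procedure, assuming scikit-learn's train_test_split and the built-in breast cancer dataset purely as an illustrative stand-in, we might compare training and test accuracy like this:

```python
# Sketch of the holdout method: split the data, fit a model, and compare
# training accuracy with test (generalization) accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the data for testing (a 70/30 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# A low training error combined with a much higher test error is a sign
# that the model is overfitting the training data.
print(f"Training accuracy: {model.score(X_train, y_train):.3f}")
print(f"Test accuracy:     {model.score(X_test, y_test):.3f}")
```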
However, in typical machine learning applications, we are also interested in tuning and
comparing different parameter settings to further improve the performance for making
predictions on unseen data.
This process is called model selection, where the term refers to selecting the optimal values
of the tuning parameters (also called hyperparameters) for a given classification problem.
However, if we reuse the same test dataset over and over again during model selection, it
will become part of our training data and thus the model will be more likely to overfit.
Thus, using the test dataset for model selection is not a good machine learning practice.
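One common alternative, sketched below, is to carve a separate validation set out of the training data and use it, rather than the test set, to compare hyperparameter settings; the decision tree model and the candidate max_depth values are illustrative assumptions.

```python
# Sketch: model selection on a validation set so the test set stays
# untouched until the very end.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# First split off the test set, then split the rest into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=1)

best_depth, best_score = None, -1.0
for depth in [2, 4, 6, 8]:                       # illustrative candidates
    clf = DecisionTreeClassifier(max_depth=depth, random_state=1)
    clf.fit(X_train, y_train)
    score = clf.score(X_val, y_val)              # selection on the validation set
    if score > best_score:
        best_depth, best_score = depth, score

# Only the final, selected model is ever evaluated on the test set.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=1)
final.fit(X_trainval, y_trainval)
print("Best depth:", best_depth, "Test accuracy:", final.score(X_test, y_test))
```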
In k-fold cross-validation, we randomly split the training dataset into k folds without
replacement; in each iteration, k-1 folds are used for training and the remaining fold is used
for testing. We then calculate the average performance of the models based on the different,
independent test folds to obtain a performance estimate that is less sensitive to the
sub-partitioning of the training data compared to the holdout method.
Typically, we use k-fold cross-validation for model tuning.
Once we have found satisfactory hyperparameter values, we retrain the model on the
complete training dataset and obtain a performance estimate using the independent test
dataset.
The rationale behind fitting a model to the whole training dataset after k-fold cross-validation
is that first, we are typically interested in a single, final model (versus k individual models),
and second, providing more training examples to a learning algorithm usually results in a
more accurate and robust model.
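A rough sketch of this workflow using GridSearchCV, with the SVC model and parameter grid as illustrative assumptions: hyperparameters are tuned by cross-validation on the training set, the best model is refitted on the complete training set, and a single final evaluation is done on the independent test set.

```python
# Sketch: k-fold cross-validated grid search, refit on the full training set,
# then one final estimate on the held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

# Illustrative hyperparameter grid.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.01, 0.001]}

gs = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, scoring="accuracy")
gs.fit(X_train, y_train)   # refit=True (default) retrains the best model on all of X_train

print("Best params:   ", gs.best_params_)
print("CV accuracy:   ", gs.best_score_)
print("Test accuracy: ", gs.best_estimator_.score(X_test, y_test))
```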
Empirical evidence shows that a good standard value for k in k-fold cross-validation is 10.
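A plain 10-fold cross-validation estimate can be sketched with cross_val_score; the k-nearest-neighbors classifier used here is only an illustrative choice.

```python
# Sketch: estimate performance with 10-fold cross-validation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

scores = cross_val_score(clf, X, y, cv=10)     # k = 10 folds
print("Fold accuracies: ", np.round(scores, 3))
print(f"Mean CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```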
The make_pipeline function takes an arbitrary number of scikit-learn transformers (objects that
support the fit and transform methods) as input, followed by a scikit-learn estimator that
implements the fit and predict methods.
We can think of a scikit-learn Pipeline as a meta-estimator or wrapper around those individual
transformers and estimators. If we call the fit method of Pipeline, the data will be passed down a
series of transformers via fit and transform calls on these intermediate steps until it arrives at the
estimator object (the final element in a pipeline).
The estimator will then be fitted to the transformed training data.
We should note that there is no limit to the number of intermediate steps in a pipeline; however, if
we want to use the pipeline for prediction tasks, the last pipeline element has to be an estimator.
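A minimal sketch of such a pipeline, assuming StandardScaler and PCA as the intermediate transformers and logistic regression as the final estimator (a combination chosen here only for illustration):

```python
# Sketch: two transformers followed by an estimator, wrapped with make_pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

pipe = make_pipeline(StandardScaler(),       # transformer: fit/transform
                     PCA(n_components=2),    # transformer: fit/transform
                     LogisticRegression())   # estimator: fit/predict

# fit() pushes the training data through each transformer's fit/transform
# calls and then fits the final estimator on the transformed data.
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))
```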
The most important rule to remember is that both the validation set and
the test set must be as representative as possible of the data you expect to
use in production.
Suppose, for example, that your training pictures were downloaded from the
web, while your app's users will upload pictures taken with their mobile
phones. After training your model, if you observe that its performance on
the validation set is disappointing, you will not know whether this is
because your model has overfit the training set, or whether it is just due
to the mismatch between the web pictures and the mobile app pictures.