CPE EL3 Lab 4 Training and Test Sets
LABORATORY EXERCISES
I. OBJECTIVES
Internet connection
Google account
Google Colaboratory
III. CONCEPTS/THEORY/CONTENT
This exercise looks into the technique of splitting a data set into training and test sets to address
overfitting. Overfitting results from over-training a model so that it predicts well on the data it was
trained on but performs poorly on new, unseen data. This is expected when all the data have been used
for training, because the model has been conditioned to perform well on that particular set.
However, the ultimate goal of machine learning is a model that predicts well on new data drawn from the
true underlying probability distribution, not only on the examples it has already seen. This can be
approached by separating the data set into a training set, the part of the data set that is used to train
the model, and a test set, the part of the data set that attempts to represent new, unseen data for the model.
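The split described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The helper name train_test_split, the 20% test fraction, the seed, and the toy examples are assumptions made for this sketch and are not taken from the lab notebook.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists.

    Hypothetical helper for illustration; the notebook may split differently.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                   # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # shuffle so the split is not ordered
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

# Toy data set of (feature, label) pairs, assumed for this example only.
examples = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(examples, test_fraction=0.2)
print(len(train_set), len(test_set))
```

Shuffling before splitting matters when the original data is sorted; otherwise the test set may not resemble the training set. Fixing the seed keeps the split reproducible between runs.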
The model is evaluated on the test set and adjusted based on its prediction performance there. This
process cycle is shown in Figure 1.
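To make the gap between training and test performance concrete, the sketch below evaluates a deliberately over-fitted model on both sets. It is an illustration only: the synthetic data (y = 3x plus noise), the 160/40 split, and the 1-nearest-neighbour "memorizing" model are assumptions chosen for this example, not the model or data used in the notebook.

```python
import random

def make_data(n, rng):
    # Hypothetical data: y = 3x plus Gaussian noise with standard deviation 1.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

def predict_1nn(train, x):
    # Memorizing model: return the label of the closest training example.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def mse(train, data):
    # Mean squared error of the 1-NN predictions on data.
    return sum((predict_1nn(train, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(200, rng)                        # generated in random order already
train_set, test_set = data[:160], data[160:]      # 80% train, 20% test

train_mse = mse(train_set, train_set)   # each training point is its own neighbour
test_mse = mse(train_set, test_set)     # unseen points expose the memorized noise
print("train MSE:", train_mse)
print("test MSE:", test_mse)
```

Because the model simply memorizes the training examples, its training error is zero, while its error on the held-out test set is not. Evaluating only on the training data would hide this.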
1. Open the notebook: https://colab.research.google.com/drive/1EL5nRL0ezzs63JnB6sZWBgbb7kkj5_U4?usp=sharing
2. Save a personal copy of the notebook: File > Save a copy in Drive
3. Complete all the exercises and tasks. Starter code is provided, but you will need to add your own
code, especially in the last part of the exercise.
https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data