Unit 1b - Fundamentals of Machine Learning
Four branches of Machine Learning
Supervised learning
Unsupervised learning
Self-supervised learning
Reinforcement learning
Types of ML – Supervised
In supervised learning, the training set we
feed to the algorithm includes the desired
solutions, called labels.
Generally, almost all applications of deep
learning that are in the spotlight these days
belong in this category, such as optical
character recognition, speech recognition,
image classification, and language
translation.
Types of ML – Supervised
Classification
Regression
Sequence generation: Given a picture, predict a
caption describing it.
Syntax tree prediction: Given a sentence,
predict its decomposition into a syntax tree.
Object detection: Given a picture, draw a
bounding box around certain objects in it.
Image segmentation: Given a picture, draw a
pixel-level mask on a specific object.
Types of ML – Unsupervised
Finding interesting transformations of the input
data without the help of any targets, for the
purposes of data visualization, data compression,
or data denoising, or to better understand the
correlations present in the data at hand.
Bread and butter of data analytics.
As a preprocessing step before supervised
learning.
Dimensionality reduction and clustering are well-
known categories of unsupervised learning.
Types of ML – Self-supervised
Supervised learning without any humans in
the loop.
Autoencoders, where the generated targets
are the input, unmodified.
Trying to predict the next frame in a video,
given past frames, or the next word in a text,
given previous words (temporally supervised
learning - supervision comes from future input
data).
Not a standard definition…
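To make "supervision comes from the input data" concrete, here is a minimal sketch of generating next-word targets from raw text. The function name `next_word_pairs` and the `context_size` parameter are illustrative assumptions, not part of the original slides.

```python
def next_word_pairs(text, context_size=3):
    """Build (context, target) pairs where the target is simply the next word."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = words[i - context_size:i]  # previous words, used as input
        target = words[i]                    # next word, used as the generated label
        pairs.append((context, target))
    return pairs


pairs = next_word_pairs("the cat sat on the mat", context_size=3)
for context, target in pairs:
    print(context, "->", target)
```

No human wrote the labels here: every target is read off the text itself.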
Types of ML – Reinforcement learning
4.2 Evaluating ML models
In ML, the goal is to achieve models that
generalize - that perform well on never-
before-seen data.
Evaluating ML models involves measuring
generalization.
Ways to measure generalization
Simple hold-out validation
K-fold validation
Iterated K-fold validation with shuffling
Evaluating ML models
Evaluating a model always boils down to
splitting the available data into three sets:
training, validation, and test.
We train on the training data and evaluate
(and fine tune) our model on the validation
data.
After a few tuning iterations, we retrain the final
model on the training and validation data combined.
Once our model is ready for prime time, we
test it one final time on the test data.
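A minimal sketch of the three-way split, assuming NumPy arrays and 20% fractions for validation and test. The function name, the fractions, and the seed are illustrative assumptions.

```python
import numpy as np


def three_way_split(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle indices once, then carve out test, validation, and training sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    num_test = int(len(data) * test_fraction)
    num_val = int(len(data) * val_fraction)
    test_idx = indices[:num_test]
    val_idx = indices[num_test:num_test + num_val]
    train_idx = indices[num_test + num_val:]
    return data[train_idx], data[val_idx], data[test_idx]


data = np.arange(100)
train, val, test = three_way_split(data)

# Once tuning is finished, the final model can be fit on train + validation.
train_and_val = np.concatenate([train, val])
print(len(train), len(val), len(test), len(train_and_val))
```

The test set is set aside first and is touched only once, at the very end.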
Evaluating ML models
Developing a model always involves tuning its
configuration: for example, choosing the number
of layers or the size of the layers.
We do this tuning by using the performance of
the model on the validation data.
Tuning is a form of learning: a search for a
good configuration in some parameter space.
Evaluating ML models
We care about performance on completely new
data, not the validation data, so we need to use a
completely different, never-before-seen dataset
to evaluate the model: the test dataset.
Our model shouldn’t have had access to any
information about the test set, even indirectly.
Splitting data into training, validation, and test
sets becomes tricky when little data is available.
Three classic evaluation recipes:
Simple hold-out validation,
K-fold validation, and
Iterated K-fold validation with shuffling.
Evaluating ML models
SIMPLE HOLD-OUT VALIDATION
data[num_validation_samples:]
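The slicing fragment above is one step of a hold-out split. A minimal sketch of the whole procedure follows, assuming toy 1-D data and a trivial "model" that predicts the training mean. Both the data and the model are illustrative assumptions; a real workflow would train and evaluate an actual network here.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=1.0, size=1000)  # toy targets (assumption)
num_validation_samples = 200

# Shuffle first so the validation slice is a random sample.
rng.shuffle(data)
validation_data = data[:num_validation_samples]  # held out for evaluation
training_data = data[num_validation_samples:]    # everything else

# Stand-in for model training: estimate the mean on the training data only.
prediction = training_data.mean()

# Stand-in for model evaluation: mean squared error on the held-out data.
validation_score = float(np.mean((validation_data - prediction) ** 2))
print(validation_score)
```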
Evaluating ML models
SIMPLE HOLD-OUT VALIDATION
If little data is available, then our validation
and test sets may contain too few samples
to be statistically representative of the data
at hand.
Different random shuffling rounds of the data
before splitting end up yielding very different
measures of model performance.
K-fold validation and iterated K-fold
validation are two ways to address this.
Evaluating ML models
K-FOLD VALIDATION
We split our data into K partitions of equal size.
For each partition i, we train a model on the remaining
K – 1 partitions, and evaluate it on partition i.
The final score is the average of those K scores.
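The procedure can be sketched as follows, assuming NumPy arrays and a placeholder `train_and_score` callback that stands in for training and evaluating a real model. The helper names and the mean-predictor "model" are illustrative assumptions.

```python
import numpy as np


def k_fold_scores(data, k, train_and_score):
    """Return one validation score per fold."""
    folds = np.array_split(data, k)  # K partitions of (nearly) equal size
    scores = []
    for i in range(k):
        validation_data = folds[i]
        # Train on the remaining K - 1 partitions.
        training_data = np.concatenate(folds[:i] + folds[i + 1:])
        scores.append(train_and_score(training_data, validation_data))
    return scores


def mean_predictor_mse(training_data, validation_data):
    """Toy model: predict the training mean, score with mean squared error."""
    prediction = training_data.mean()
    return float(np.mean((validation_data - prediction) ** 2))


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=1.0, size=100)
scores = k_fold_scores(data, k=4, train_and_score=mean_predictor_mse)
final_score = float(np.mean(scores))  # average of the K scores
print(scores, final_score)
```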
Evaluating ML models
ITERATED K-FOLD VALIDATION WITH SHUFFLING
Particularly useful when we have relatively little data
available and we need to evaluate our model as
precisely as possible.
It has been found to be extremely helpful in Kaggle competitions.
It consists of applying K-fold validation multiple times,
shuffling the data every time before splitting.
The final score is the average of the scores obtained
in training and evaluating P × K models (where P is
the number of iterations we use).
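A minimal sketch of the iterated variant, again with a toy mean-predictor standing in for a real model. The function names and the values of P and K are illustrative assumptions.

```python
import numpy as np


def mean_predictor_mse(training_data, validation_data):
    """Toy model: predict the training mean, score with mean squared error."""
    prediction = training_data.mean()
    return float(np.mean((validation_data - prediction) ** 2))


def iterated_k_fold(data, k, p, train_and_score, seed=0):
    """Run K-fold validation P times, reshuffling before every split."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(p):
        shuffled = rng.permutation(data)  # fresh shuffle for this iteration
        folds = np.array_split(shuffled, k)
        for i in range(k):
            training_data = np.concatenate(folds[:i] + folds[i + 1:])
            scores.append(train_and_score(training_data, folds[i]))
    return scores


rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=1.0, size=100)
scores = iterated_k_fold(data, k=4, p=3, train_and_score=mean_predictor_mse)
final_score = float(np.mean(scores))  # average over all P * K models
print(len(scores), final_score)
```

The cost is P × K training runs, which is why this recipe suits small datasets and cheap models.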
Evaluating ML models
Things to keep in mind
Data representativeness - we want both our
training set and test set to be representative of
the data at hand.
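The arrow of time - If we’re trying to predict the
future given the past (e.g., stock movements), we
should not shuffle our data before splitting, and all
validation and test data should come after the training data.
Redundancy in our data - If some data points
appear twice (fairly common with real-world
data), then shuffling and splitting can put copies of
the same point in both the training and validation
sets, so we would be evaluating on data we trained on.
The training and validation sets should be disjoint.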
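Two of these precautions can be sketched in a few lines, assuming the samples are already sorted by time and that duplicates are exact matches. The helper names are hypothetical.

```python
import numpy as np


def chronological_split(samples, num_validation_samples):
    """Keep time order: validate only on the most recent samples."""
    training = samples[:-num_validation_samples]
    validation = samples[-num_validation_samples:]
    return training, validation


def drop_overlap(training, validation):
    """Remove validation points that also occur in the training set."""
    seen = set(training.tolist())
    keep = np.array([v not in seen for v in validation.tolist()], dtype=bool)
    return validation[keep]


# Arrow of time: no shuffling, the validation block comes strictly later.
series = np.arange(10)
train, val = chronological_split(series, num_validation_samples=3)

# Redundancy: 3 and 4 appear in both sets, so they are dropped from validation.
dup_train = np.array([1, 2, 3, 4])
dup_val = np.array([3, 5, 4, 6])
clean_val = drop_overlap(dup_train, dup_val)
print(train, val, clean_val)
```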
4.4 Overfitting and underfitting
After just a few epochs, all three models of chapter 3
began to overfit.
Learning how to deal with overfitting is essential to
mastering ML.
At the beginning of training, the loss on both training and
test data is high; the model is said to be underfit:
the network hasn’t yet modeled all relevant patterns.
Overfitting and underfitting
Reducing the network’s size
The simplest way to prevent overfitting is to
reduce the size of the model: the number of
learnable parameters (memorization capacity).
A model with 500,000 binary parameters could easily
be made to learn the class of every digit in the
MNIST training set: only 10 binary parameters are needed
for each of the 50,000 digits (10 × 50,000 = 500,000).
Such a model would probably be useless on unseen data.
A network with limited memorization resources is
forced to learn features that have predictive
power regarding the targets.
Overfitting and underfitting
Reducing the network’s size
At the same time, we should use models that
have enough parameters that they don’t underfit -
our model shouldn’t be starved for memorization
resources.
There is a compromise to be found between too much
capacity and not enough capacity.
Unfortunately, there is no magical formula.
Start with relatively few layers and parameters,
and keep increasing the size until we see
diminishing returns with regard to validation loss.
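To get a feel for how capacity grows with layer size, here is a small sketch that counts the learnable parameters of a stack of fully connected layers. It assumes the 10,000-feature input and single output unit used in this unit's Keras examples; the hidden sizes tried are illustrative assumptions.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


counts = {}
for hidden in (4, 16, 64):
    sizes = [10000, hidden, hidden, 1]  # input, two hidden layers, output
    counts[hidden] = dense_param_count(sizes)
    print(hidden, counts[hidden])
```

Starting from the smallest configuration and growing it while watching the validation loss follows the recipe above.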
Overfitting and underfitting
The model with L2 regularization has become much
more resistant to overfitting than the reference model,
even though both models have the same number of
parameters.
from keras import layers, models, regularizers

model = models.Sequential()
# l2(0.001) adds 0.001 * (sum of squared kernel weights) to the training loss
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, kernel_regularizer=regularizers.l2(0.001),
                       activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
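As a rough illustration of what the regularizer contributes, the sketch below computes an L2 penalty by hand with NumPy. The weight values are made up for the example, and `l2_penalty` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def l2_penalty(weights, factor=0.001):
    """L2 weight penalty: factor times the sum of squared weights."""
    return factor * float(np.sum(np.square(weights)))


weights = np.array([[1.0, -2.0],
                    [0.5, 0.0]])  # made-up kernel values (assumption)
penalty = l2_penalty(weights)     # 0.001 * (1 + 4 + 0.25 + 0)
print(penalty)
```

Because the penalty grows with the square of each weight, large weights are discouraged more strongly than small ones. The penalty is added only during training, so the training loss of a regularized model will be higher than its loss at test time.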
Overfitting and underfitting
Adding dropout
Dropout is one of the most effective and most
commonly used regularization techniques for
neural networks, developed by Geoff Hinton and his
students at the University of Toronto.
Dropout, applied to a layer, consists of randomly
dropping out (setting to zero) a number of output
features of the layer during training.
The dropout rate is the fraction of the features
that are zeroed out; it’s usually set between 0.2
and 0.5.
Overfitting and underfitting
Adding dropout
Either
At training time, we zero out at random a fraction of
the values in the matrix.
At test time, we scale down the output by the keep
probability (1 - dropout rate); with a rate of 0.5 this
is the same factor as the dropout rate.
Or
At training time, we zero out at random a fraction of
the values in the matrix.
Then, we scale up the output by dividing it by the keep
probability (1 - dropout rate).
At test time, we do not do anything.
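The second variant (scale up during training, leave test time untouched) can be sketched in NumPy as follows. The function name `dropout` and the all-ones activations are illustrative assumptions; this is not the Keras implementation.

```python
import numpy as np


def dropout(layer_output, rate, training, rng):
    """Zero a random fraction `rate` of values and rescale the survivors."""
    if not training:
        return layer_output  # test time: nothing to do
    keep_mask = rng.random(layer_output.shape) >= rate  # True = keep the value
    # Divide by the keep probability so the expected output stays the same.
    return layer_output * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((1000, 16))  # made-up layer output (assumption)
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)
print(train_out.mean(), test_out.mean())
```

With a rate of 0.5, surviving values are doubled, so the average activation seen by the next layer is roughly the same at training and test time.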
Overfitting and underfitting
This is a clear improvement over the reference model -
it also seems to be working much better than L2
regularization, since the lowest validation loss
reached has improved.
from keras import layers, models

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))  # zero out 50% of the features during training
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
Overfitting and underfitting
To recap, these are the most common
ways to prevent overfitting in neural
networks:
Get more training data.
Reduce the capacity of the network.
Add weight regularization.
Add dropout.
Weights have less chance of “collusion”.
Each weight “trains harder” to capture a feature,
since other weights may be dropped out during training.
Chapter summary
The purpose of a machine learning model is to
generalize: to perform accurately on never-
before-seen inputs.
Many model evaluation methods.
Holdout validation, K-fold cross-validation, etc.
The fundamental problem in machine learning is
the tension between optimization and
generalization.
First work on optimization: tuning hyperparameters.
Then work on generalization: model regularization.