COMPX310-19A Machine Learning Chapter 3: Classification
Machine Learning
Chapter 3: Classification
An introduction using Python, Scikit-Learn, Keras, and Tensorflow
Unless otherwise indicated, all images are from Hands-on Machine Learning with
Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, Copyright © 2019 O’Reilly Media
Housekeeping
Outline
03/08/2021 COMPX310 2
MNIST: the “hello world” of ML
Scikit-learn provides some benchmark datasets,
Preparing Y
Train/test, binary class
Yet another learner: SGD
Cross-validation
Cross-validation is an alternative to a single train + validation split
The training set is split into k equal-sized folds (k = 10 is a common choice; scikit-learn's cross_val_score defaults to 5 folds since version 0.22)
Use k-1 folds together as the new training set, validate on the remaining fold
Repeat this k times, holding out a different fold each time => k results
Compute the mean and standard deviation of the k results
[The whole procedure can also be repeated with different random seeds to reduce the variance of the result]
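The steps above can be sketched in plain Python. This is a minimal illustration, not the course's code: the helper names `k_fold_splits` and `majority_class`, the toy labels, and the majority-class "learner" are assumptions made for the example, and k = 4 is chosen only to keep it small.

```python
import statistics

def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop

def majority_class(labels):
    """Hypothetical stand-in for a learner: always predict the most frequent label."""
    return max(set(labels), key=labels.count)

# Toy binary labels (assumed data; real use would shuffle the samples first).
y = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

scores = []
for train_idx, val_idx in k_fold_splits(len(y), k=4):
    prediction = majority_class([y[i] for i in train_idx])   # "train" on k-1 folds
    correct = sum(1 for i in val_idx if y[i] == prediction)  # validate on the held-out fold
    scores.append(correct / len(val_idx))

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(scores, mean_score, std_score)
```

Each sample is used for validation exactly once, so the k accuracies together cover the whole training set.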
Cross-validation is a workhorse in ML, so scikit-learn supports it directly:
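A sketch of what that support can look like, assuming scikit-learn is installed. The small bundled digits dataset (`load_digits`) stands in for MNIST so that the example runs offline; the 5-vs-rest target, `cv=3`, and `random_state=42` are illustrative choices, not values taken from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# 8x8 digit images bundled with scikit-learn (assumption: used here instead of MNIST).
X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)  # binary target: "is this digit a 5?"

sgd_clf = SGDClassifier(random_state=42)

# One accuracy value per fold; cross_val_score clones and refits the model for each fold.
scores = cross_val_score(sgd_clf, X, y_is_5, cv=3, scoring="accuracy")
print(scores.mean(), scores.std())
```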
Are we really that good?
Getting predictions from CV
Compare to perfection
Precision and Recall
Precision: how many of the predicted 5s are really 5s
Recall: how many of the real 5s are predicted as 5s
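Written as formulas, with TP, FP and FN denoting the counts of true positives, false positives and false negatives for the positive class (here: the 5s). Recall is included as the companion measure named in the slide title.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```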
TN, TP, FN, FP and the confusion matrix
Confusion matrix [[5, 1], [2, 3]]: TN=5, FP=1, FN=2, TP=3
Rows: actual class (row 0 holds the info about class 0, …)
Columns: predicted class (col 0 holds the info about predicted-as-0, …)
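The layout can be checked by building the matrix by hand. A minimal sketch: the label lists are made up so that they reproduce the counts on the slide, and `confusion_matrix_2x2` is a hypothetical helper name. scikit-learn's `sklearn.metrics.confusion_matrix` uses the same convention (rows = actual class, columns = predicted class).

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes for a binary problem: matrix[actual][predicted]."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Assumed toy data: 6 actual negatives (class 0) followed by 5 actual positives (class 1).
y_true = [0] * 6 + [1] * 5
y_pred = [0] * 5 + [1] * 1 + [0] * 2 + [1] * 3

matrix = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = matrix  # row 0 = actual class 0, row 1 = actual class 1
print(matrix)                # [[5, 1], [2, 3]]
```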
F1: harmonic mean of recall & precision
Confusion matrix [[5, 1], [2, 3]]: TN=5, FP=1, FN=2, TP=3
Rows: actual class (row 0 holds the info about class 0, …)
Columns: predicted class (col 0 holds the info about predicted-as-0, …)
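For the counts on this slide, F1 can be worked out in a few lines. This is a sketch using the standard definitions of precision and recall; the arithmetic mean is computed only for contrast.

```python
tn, fp, fn, tp = 5, 1, 2, 3          # counts from the confusion matrix [[5, 1], [2, 3]]

precision = tp / (tp + fp)           # 3 / 4 = 0.75
recall = tp / (tp + fn)              # 3 / 5 = 0.6

# Harmonic mean: dominated by the smaller of the two values.
f1 = 2 * precision * recall / (precision + recall)
arithmetic_mean = (precision + recall) / 2

print(round(f1, 3), round(arithmetic_mean, 3))
```

The harmonic mean never exceeds the arithmetic mean, so a classifier needs both high precision and high recall to reach a high F1.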
Some results
Thresholds: precision/recall trade-off
Classifiers return numeric scores
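A small sketch of how a threshold turns scores into predictions, and how moving it trades recall for precision. The scores and labels are made-up values; in scikit-learn such scores typically come from a classifier's `decision_function` or `predict_proba`. The helper name `precision_recall_at` is hypothetical.

```python
# Assumed toy data: one score per sample, label 1 = positive class.
scores = [-3.0, -1.5, -0.5, 0.2, 0.8, 1.5, 2.5, 4.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

def precision_recall_at(threshold):
    """Predict positive when score > threshold, then compute (precision, recall).

    Assumes at least one sample is predicted positive at this threshold.
    """
    predicted = [s > threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    return tp / (tp + fp), tp / (tp + fn)

low = precision_recall_at(0.0)    # permissive threshold
high = precision_recall_at(2.0)   # strict threshold
print(low, high)
```

On this toy data the stricter threshold raises precision and lowers recall; in general recall can only fall as the threshold rises, while precision usually, but not always, rises.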
Precision recall curves
Recall @ precision == 0.9
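One way to read this slide title: pick the lowest threshold whose precision reaches the target, then report the recall obtained there. The sketch below does this by brute force on made-up scores and labels; `recall_at_precision` is a hypothetical helper, not a library function.

```python
# Assumed toy data: higher score = more confident that the sample is positive.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

def recall_at_precision(target_precision):
    """Return (threshold, recall) for the lowest threshold reaching the target precision."""
    total_positives = sum(labels)
    for threshold in sorted(scores):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        # tp + fp >= 1 because the sample that defines the threshold is predicted positive.
        if tp / (tp + fp) >= target_precision:
            return threshold, tp / total_positives
    return None  # target precision is not reachable on this data

threshold, recall = recall_at_precision(0.9)
print(threshold, recall)
```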
Precision-recall curve
Alternative: ROC curve
Plot the true positive rate (TPR) against the false positive rate (FPR) for all possible thresholds.
The best point is (0, 1): FPR = 0 and TPR = 1.
The diagonal corresponds to a random classifier.
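The points of such a curve can be computed by sweeping the threshold over the observed scores. This is a minimal sketch with made-up data; `roc_points` is a hypothetical helper (scikit-learn offers `sklearn.metrics.roc_curve` for real use).

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to most permissive."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Assumed toy data: an imperfect scorer and a perfectly separating one.
points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
perfect = roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(points)
print(perfect)
```

Only the perfectly separating scorer passes through the ideal corner (0, 1); the imperfect one stays below it.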
Compare to Random Forest
Multi-class classification
One-vs-One for Multiclass
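As a rough sketch of the idea: one-vs-one trains one binary classifier per pair of classes, N(N-1)/2 in total, and predicts the class that wins the most pairwise duels. The `pairwise_winner` function below is a hypothetical stand-in for trained binary classifiers, rigged so that class 7 should win.

```python
from collections import Counter
from itertools import combinations

classes = list(range(10))                 # the ten digits
pairs = list(combinations(classes, 2))    # one binary problem per unordered pair
print(len(pairs))                         # 10 * 9 / 2 = 45

def pairwise_winner(a, b):
    """Stand-in for a trained a-vs-b classifier: prefers the class closer to 7."""
    return a if abs(a - 7) < abs(b - 7) else b

# Each pairwise classifier casts one vote; the class with the most votes is predicted.
votes = Counter(pairwise_winner(a, b) for a, b in pairs)
predicted = votes.most_common(1)[0][0]
print(predicted)
```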
Random forest for multi-class
Error analysis: confusion matrix from CV
Multilabel: more than one binary target
Multilabel: cross-validation
MultiOutput: multiple multiclass targets
E.g.: reconstruct an image from a corrupted version
X: corrupted image, y: clean image
Adding noise, train & predict
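A possible sketch of this step, assuming scikit-learn and NumPy are installed. The bundled 8x8 digits dataset stands in for MNIST, and the noise range, the 1500/297 split, and the choice of `KNeighborsClassifier` are illustrative assumptions rather than the slide's actual settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

# 1797 images, 64 pixels each, integer intensities 0..16 (assumption: used instead of MNIST).
X, _ = load_digits(return_X_y=True)
X = X.astype(int)
rng = np.random.default_rng(42)

X_train_clean, X_test_clean = X[:1500], X[1500:]

# Corrupt the inputs with random integer noise; the clean pixels become the targets.
X_train_noisy = X_train_clean + rng.integers(0, 5, size=X_train_clean.shape)
X_test_noisy = X_test_clean + rng.integers(0, 5, size=X_test_clean.shape)

# 64 outputs per sample, each a multiclass label (the pixel intensity).
knn = KNeighborsClassifier()
knn.fit(X_train_noisy, X_train_clean)

restored = knn.predict(X_test_noisy[:1])  # shape (1, 64): one cleaned-up image
print(restored.reshape(8, 8))
```

Every pixel is treated as its own multiclass target, which is what makes this a multioutput problem rather than a multilabel one.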