
ML: Lecture 1

Introduction to Machine Learning

Panagiotis Papapetrou, PhD


Professor, Stockholm University
Course logistics
• Responsible teacher:
– Panagiotis Papapetrou - panagiotis@dsv.su.se

• Course Assistant:
– Zed Lee - zed.lee@dsv.su.se

• Schedule:
– 12 Lectures
– 5 Lab sessions
– Written Exam: Mar 14

• Guest lecturers:
– Ioanna Miliou - ioanna.miliou@dsv.su.se
– Korbinian Randl - korbinian.randl@dsv.su.se
– Alejandro Kuratomi - alejandro.kuratomi@dsv.su.se
– Sindri Magnusson - sindri.magnusson@dsv.su.se

• Office hours: by appointment only


Syllabus
Jan 15 Introduction to machine learning
Jan 17 Regression analysis
Jan 18 Laboratory session I: numpy and linear regression
Jan 22 Machine learning pipelines
Jan 25 Laboratory session II: ML pipelines and grid search
Jan 29 Training neural networks
Jan 31* Laboratory session III: training NNs and tensorflow basics
Feb 6 Convolutional neural networks
Feb 7 Recurrent neural networks and autoencoders
Feb 12 Time series classification
Feb 15 Laboratory session IV: CNNs and RNNs
Feb 19 Deep time series classification

*to be moved to Jan 31 from Feb 1.


Syllabus
Feb 20 Attention mechanisms
Feb 22 Laboratory session V: Time series classification
Feb 26 Transformers
Mar 4 Explainable machine learning
Mar 6 Reinforcement learning
Mar 14 Examination
Apr 29 Re-take Examination
Course workload

• Assignments 3hp
– Three programming assignments (python and
tensorflow)

• Written Exam 4.5hp


Assignments
• To be done individually
• Will involve python/tensorflow programming
• Five lab sessions
• Tutoring sessions will be offered
• Grading scheme: Pass – Fail
• To pass: you should pass each assignment with full
points
• You have two attempts per assignment + a re-take!
Exam
• Two parts:
– Part A: multiple-choice questions
– Part B: free-text questions

• It will examine both your knowledge of several concepts discussed in the
lectures, as well as your critical ability to apply what you have learned
• Will include some coding tasks
• To pass you need to obtain at least 60% of the total points
• Grading scheme: A – F
Learning Objectives
• Describe, implement and apply machine learning methods
and techniques for data exploration and data analysis
• Reason about the selected algorithms and solutions for
machine learning
• Implement and apply machine learning methods to large
and complex amounts of data
• Describe, implement, and apply ML solutions for a given
problem, as well as evaluate ML results
• Become fluent in python and tensorflow
Textbooks
• Elements of Statistical Learning
Edition: 2nd Publisher: Springer
ISBN: 0387848576

• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems
Edition: 2nd Publisher: O'Reilly
ISBN: 978-1492032649

• Python Machine Learning
Edition: 3rd
ISBN: 978-1-78995-575-0

Research papers (pointers will be provided)


Above all
• The goal of the course is to learn and enjoy

• The basic principle is to ask questions when you don’t understand

• Say when things are unclear; not everything can be clear from the beginning

• Participate in the class as much as possible


Today
• What is machine learning?

• What are the different types of machine learning methods?

• What are examples of machine learning tasks in connection to the methods?

• What is the bias-variance trade-off?

• How do we evaluate machine learning models?


Machine Learning
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon

“Learning is the study of algorithms that improve
their performance with experience given a task.”
- Tom Michael Mitchell
Real-world examples of Machine Learning

*https://www.aiiottalk.com/machine-learning-examples-in-real-life/
Types of Machine Learning

• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Active learning
• Reinforcement learning
• Federated learning
• …
Supervised Learning
• Setup: a model is trained using labeled data, where the input is
paired with the correct output labels, with the objective of
learning to predict the output label, given an input data example.
• Learning: by analyzing a training set of labeled examples and
adjusting the model parameters in order to minimize the error
between the model’s predictions and the correct output labels.

Drawbacks:
• Labelled data is required
• Susceptible to bias
• Generalization issues
• May require feature engineering
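
A minimal sketch of this setup with scikit-learn (the library used in the labs); the dataset and the choice of model here are illustrative assumptions:

```python
# Minimal supervised-learning sketch: labeled examples in, a predictive model out.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)          # inputs X paired with correct output labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000)            # parameters are adjusted to minimize the error
model.fit(X_train, y_train)                          # learn from the labeled training examples
print("accuracy on unseen examples:", model.score(X_test, y_test))
```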
Unsupervised Learning
• Setup: a model is trained using unlabeled data, with the objective
of finding underlying hidden structures and patterns in the data.
• Learning: by analyzing a training set of unlabeled examples and
adjusting the model parameters in order to maximize a learning
objective.

Drawbacks:
• Lack of ground truth
• Difficult to evaluate
• Curse of dimensionality
• Computationally intensive
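
A minimal unsupervised sketch, assuming k-means clustering on toy data as the example task:

```python
# Minimal unsupervised-learning sketch: find hidden structure (clusters) in unlabeled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # labels are ignored: data is "unlabeled"

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)                                    # parameters adjusted to minimize within-cluster variance
print(kmeans.cluster_centers_)                   # the discovered structure: three cluster centers
```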
Semi-supervised Learning
• Setup: a model is trained using both labelled and unlabeled data,
with the objective of learning to predict the output label given an
input data example.
• Learning: by iteratively training the model using the most
confident predictions, and augmenting the training dataset.

Drawbacks:
• Sensitive to data quality
• Difficult to evaluate
• Difficult to interpret
• Computationally intensive
Semi-supervised learning: self-learning

• Pick a small amount of labeled data and use this dataset to train a base model.
• Then apply the process known as pseudo-labeling: take the partially
trained model and use it to make predictions for the unlabeled data.
• Take the most confident predictions made by the model and add them to
the labeled dataset, creating a new, combined input to train an improved model.
• Repeat the process for several iterations (a code sketch follows below).
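
A minimal sketch of the self-training loop described above; the toy data, base model, and confidence threshold are assumptions:

```python
# Sketch of self-training / pseudo-labeling: retrain on the most confident predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:50] = True                              # only a small amount of labeled data to start from

model = LogisticRegression(max_iter=1000)
for _ in range(5):                               # repeat for several iterations
    model.fit(X[labeled], y[labeled])            # (re-)train on the current labeled pool
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95         # keep only the most confident predictions
    if not confident.any():
        break
    idx = np.where(~labeled)[0][confident]
    y[idx] = proba[confident].argmax(axis=1)     # pseudo-label them...
    labeled[idx] = True                          # ...and add them to the labeled dataset
```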
Semi-supervised learning: co-learning
• Train two individual classifiers
based on two views of data.
• Views: different sets of features
but same instances.
• The bigger pool of unlabeled
data receives pseudo-labels
from both classifiers.
• Classifiers co-train one another
using the pseudo-labels with
the highest confidence level.
• The predictions of the two
classifiers are combined.
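
A minimal co-training sketch, assuming the two views are simply two halves of the feature set; the models and confidence threshold are illustrative:

```python
# Sketch of co-training: two classifiers on two views exchange confident pseudo-labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
views = [X[:, :10], X[:, 10:]]                   # two views: different features, same instances
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True

clfs = [GaussianNB(), GaussianNB()]
for _ in range(5):
    for v, clf in enumerate(clfs):
        clf.fit(views[v][labeled], y[labeled])   # train each classifier on its own view
    pool = np.where(~labeled)[0]                 # the bigger pool of unlabeled data
    if len(pool) == 0:
        break
    for v, clf in enumerate(clfs):               # each classifier pseudo-labels its confident calls
        proba = clf.predict_proba(views[v][pool])
        confident = proba.max(axis=1) > 0.99
        y[pool[confident]] = proba[confident].argmax(axis=1)
        labeled[pool[confident]] = True

# combine the two classifiers' predictions, e.g. by averaging their probabilities
avg = sum(clf.predict_proba(views[v]) for v, clf in enumerate(clfs)) / 2
y_pred = avg.argmax(axis=1)
```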
Active Learning
• Setup: a model is trained using both labelled and unlabeled data,
with the objective of learning to predict the output label given an
input data example.
• Learning: by iteratively training the model using a human expert
annotator for the least confident predictions.

Drawbacks:
• Sensitive to data quality
• Dependence on human
annotator
• Computationally intensive
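
A minimal sketch of pool-based active learning with uncertainty sampling; the human expert annotator is simulated here by the held-back true labels, and the budget and threshold are assumptions:

```python
# Sketch of active learning: query the expert for the least confident predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:20] = True                               # small initial labeled set

model = LogisticRegression(max_iter=1000)
for _ in range(10):                               # annotation budget: 10 rounds of 5 queries
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X)
    uncertainty = 1 - proba.max(axis=1)           # least confident predictions score highest
    uncertainty[labeled] = -1                     # never re-query already labeled points
    query = np.argsort(uncertainty)[-5:]          # ask the "expert" for the 5 most uncertain examples
    labeled[query] = True                         # their labels (y[query]) become available
```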
Types of Machine Learning…
• Reinforcement learning: an agent learns to interpret its
environment and optimize a given objective by interacting with
the environment through actions that are either rewarded or
penalized.
• Federated learning: training data is locally stored in interconnected
nodes and cannot be shared. A central model is learned using the
distributed local data by sharing only model parameters and not the data
points themselves.
Examples of supervised learning
• Predict whether a patient, hospitalized due to
heart attack, will have a second heart attack
• Identify the strongest predictors for heart attack
• Prediction based on many data modalities:
demographics, diet, clinical measurements, clinical
history

Examples of supervised learning
• Predict the price of a stock in 6 months from now, based on
– the performance of the company (e.g., sales, future outlook)
– other economic metrics (e.g., market value, Price-to-Earnings ratio)
– other exogenous variables (e.g., social media)
Examples of supervised learning
• Identify the handwritten numbers in a digitized image
• Identify animals in an image
Examples of reinforcement learning
Define the measures that need to be taken by the public health authorities
under a pandemic situation
Diagram: the RL system receives input, produces output “actions”, and is
trained through “rewards” / “penalties”.
Examples of federated learning
Given data from distributed hospitals learn a central model that can propose
the optimal patient treatment without sharing any data

Distributed data sources feeding a global model: medical images, electronic
patient records, smart phone data, sensor readings, chemical compound data,
onboard maintenance tracking.

Example: chemical compound data

Name                Solubility  Frac. of rotatable bonds  Geom. diam.  Log P  Zagreb group index  Topol. diam.  No. C atoms  No. heavy bonds
methylpentane       good        0.40                      3.46         2.44   19                  4             6            5
methylcyclohexene   good        0                         3.00         2.51   31                  4             7            7
nonene              med.        0.75                      6.93         3.53   28                  8             9            8
hexadiene           good        0.60                      4.36         2.14   16                  5             6            5
butadiene           good        0.33                      2.65         1.36   8                   3             4            3
naphthalene         good        0                         3.61         2.84   57                  5             10           11
acenaphthylene      good        0                         3.58         3.32   83                  5             12           14
pyrene              poor        0                         5.00         4.58   117                 7             16           19
dimethylanthracene  poor        0                         5.29         4.61   108                 7             16           18
hexahydropyrene     med.        0                         5.00         3.82   117                 7             16           19
triphenylene        poor        0                         5.00         5.15   126                 7             18           21
benzo(e)pyrene      poor        0                         5.29         5.64   152                 7             20           24

The classical learning setup
Input: attributes (or variables, predictors, independent variables);
categorical, discrete, ordinal, or numerical; denoted as X

A machine learning function f maps the input X to the output

Output: responses (or dependent variables); denoted as Y

Classification is the task of learning a target function f that maps an
input (attribute set) X to a class label Y

Regression is the task of learning a target function f that maps an input
(attribute set) X to a real value Y in a given range of real values
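
A minimal illustration of the two tasks; the datasets and models are illustrative assumptions:

```python
# Classification: f maps X to a class label.   Regression: f maps X to a real value.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.linear_model import LinearRegression, LogisticRegression

X_cls, y_cls = load_iris(return_X_y=True)          # y is a class label (0, 1, 2)
clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
print(clf.predict(X_cls[:3]))                      # predicted class labels

X_reg, y_reg = load_diabetes(return_X_y=True)      # y is a real-valued response
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict(X_reg[:3]))                      # predicted real values
```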
How good is f ?

Figure: underfitting vs. appropriate fitting vs. overfitting

Underfitting: an ML model is unable to capture the relationship between the input and
output variables accurately, generating a high error rate on both training and test sets
Overfitting: an ML model is unable to generalize, hence the relationship it learns between
the input and output variables in the training set does not reflect the true relationship in the
test set, generating a low error rate on the training set but a high error rate on the test set

What determines a good fit?


How good is f ?
the U-shape effect

Plot: as model complexity grows, the training error keeps decreasing while the
test error first decreases and then increases again (a U-shape).

Underfitting: an ML model is unable to capture the relationship between the input and
output variables accurately, generating a high error rate on both training and test sets
Overfitting: an ML model is unable to generalize, hence the relationship it learns between
the input and output variables in the training set does not reflect the true relationship in the
test set, generating a low error rate on the training set but a high error rate on the test set

What determines a good fit?


The bias-variance decomposition
• Assume the data complies with the following statistical model:

      Y = f(X) + ε

• The relationship between X and Y is non-deterministic (hence the noise term ε)

• But we assume:  E[ε] = 0  and  Var(ε) = σ²ε

• We also assume that f is fixed and unknown, and we let f̂ denote the
obtained predictions

• The metric to optimize is the MSE (mean squared error):

      MSE = E[(Y − f̂(X))²]

The bias-variance decomposition
• Bias: the error induced by erroneous assumptions made by the learning algorithm.

• High bias can cause an algorithm to miss the relevant relations between
features and target outputs (underfitting).

• Bias: over different realizations of the training set, how far f̂(X) is,
on average, from the true f(X):

      Bias[f̂(X)] = E[f̂(X)] − f(X)
The bias-variance decomposition
• Variance: the error from sensitivity to small fluctuations in the training set.

• High variance may cause an algorithm to model random noise or erroneous
relations in the training dataset (overfitting).

• Variance: over different realizations of the training set, how much does
f̂(X) deviate from its expectation:

      Var[f̂(X)] = E[(f̂(X) − E[f̂(X)])²]

• Bias–variance decomposition: a way of analyzing the expected
generalization error of a learning algorithm with respect to a particular
problem, as a sum of three terms:

      expected error = bias² + variance + irreducible error

irreducible error: resulting from noise in the problem itself.
Proof (sketch)

Recall:  Y = f(X) + ε,  E[ε] = 0,  Var(ε) = σ²ε,  and ε is independent of f̂

Steps used in the decomposition:
• square expansion
• linearity of expectation
• properties of the error term ε and its independence from f̂(X)
• f(X) is a constant, hence its expectation is also constant (equal to f(X))
• add and subtract E[f̂(X)] and continue the decomposition
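
A compact version of the derivation, following the steps named above (square expansion, linearity of expectation, E[ε] = 0, independence of ε from f̂, and f(X) being constant under the expectation):

```latex
\begin{align*}
\mathbb{E}\!\left[(Y - \hat{f}(X))^2\right]
  &= \mathbb{E}\!\left[(f(X) + \varepsilon - \hat{f}(X))^2\right] \\
  &= \mathbb{E}\!\left[(f(X) - \hat{f}(X))^2\right]
     + 2\,\mathbb{E}\!\left[\varepsilon\,(f(X) - \hat{f}(X))\right]
     + \mathbb{E}\!\left[\varepsilon^2\right]
     & \text{(square expansion, linearity)} \\
  &= \mathbb{E}\!\left[(f(X) - \hat{f}(X))^2\right] + \sigma_\varepsilon^2
     & \text{($\mathbb{E}[\varepsilon]=0$, $\varepsilon$ independent of $\hat{f}$)} \\
  &= \left(f(X) - \mathbb{E}[\hat{f}(X)]\right)^2
     + \mathbb{E}\!\left[\left(\hat{f}(X) - \mathbb{E}[\hat{f}(X)]\right)^2\right]
     + \sigma_\varepsilon^2
     & \text{(add/subtract $\mathbb{E}[\hat{f}(X)]$; cross term vanishes)} \\
  &= \mathrm{Bias}\!\left[\hat{f}(X)\right]^2 + \mathrm{Var}\!\left[\hat{f}(X)\right] + \sigma_\varepsilon^2
\end{align*}
```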
Model selection & assessment

Ideal scenario: plenty of data, so divide it into three compartments:
(1) training, (2) validation, (3) test

• training → model fitting
• validation → model selection
• test → assessment of the generalization error of the final model


Hold out & subsampling
• Holdout (or using a validation set)
– reserve 50% for training, 25% for validation, and 25% for testing
• Random subsampling
– repeated holdout
• Stratified subsampling
– preserve the class balance in the samples

• Shortcoming:
– the estimate depends highly on which data points end up in the training
set and which end up in the test set
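
A minimal sketch of holdout and stratified subsampling with scikit-learn, using the split ratios from the slide; the dataset is an illustrative assumption:

```python
# Sketch of holdout (50/25/25) and stratified random subsampling (repeated holdout).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Holdout: 50% training, 25% validation, 25% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.50, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

# Stratified random subsampling: repeated holdout that preserves the class balance
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
for train_idx, test_idx in splitter.split(X, y):
    pass  # train on X[train_idx], evaluate on X[test_idx], then average the 10 scores
```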
k-fold Cross Validation
• Partition the N data samples into k disjoint subsets
– train on k-1 partitions
– test on the remaining one

      D:  | 1 | 2 | 3 | … | k-1 | k |

• Each subset is given a chance to be in the test set

• Performance is averaged over the k iterations:

      Err = (1/k) · Σᵢ₌₁…ₖ Errᵢ

• If k = N, this is leave-one-out Cross Validation (LOOCV)

What is good about LOOCV?
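
A minimal sketch of k-fold cross-validation and LOOCV with scikit-learn; the dataset and estimator are illustrative assumptions:

```python
# Sketch of k-fold CV and LOOCV (the LOOCV run fits one model per sample, so it is slow).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# k-fold: train on k-1 partitions, test on the remaining one, average over the k iterations
scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold estimate:", scores.mean())

# LOOCV is the special case k = N: N model fits, one held-out sample each time
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOOCV estimate:", loo_scores.mean())
```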


LOOCV vs k-fold
• LOOCV:
o the test error estimate is approximately unbiased for the true (expected)
prediction error, since we train on “almost” the whole training set
o training on almost identical training sets produces models that are
highly correlated, hence the error estimate will have high variance
if different data samples are used
o the computational burden will be very high: the training algorithm
has to be rerun from scratch N times
• k-fold:
o less correlated models
o the computational time is lower
o the bias of the procedure increases as the training set will contain
fewer examples
o the lower the k the higher the bias!
Learning Curve
• Learning curve: a plot that shows
changes in learning performance (y-
axis) over time in terms of
experience (x-axis).
• Learning curves can be defined to
depict model performance on the (1)
the training set or (2) validation set.
• They can be used to diagnose an
underfit, overfit, or well-fit model.
• They can be used to diagnose whether the train or validation sets are not
representative of the problem domain.

Train learning curve: how well the model is learning.
Validation learning curve: how well the model is generalizing.
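
A minimal sketch of computing the two learning curves with scikit-learn's learning_curve; the estimator and training-set sizes are assumptions:

```python
# Sketch: train and validation learning curves as a function of training-set size.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} training examples: train={tr:.3f}  validation={va:.3f}")
```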
Learning Curve: size of the training set
Two observations:
• performance improves as we move to
100 examples
• increasing to 200 brings a very small
benefit

What training set size works for 5-fold CV?

What if the training set = 200 examples?
• 5-fold: training sets of size 160
• Same performance as for 200 examples; hence no bias

What if the training set = 50 examples?
• 5-fold: training sets of size 40
• The error will be overestimated; hence bias
Learning Curve: underfitting

Either the training loss remains flat regardless of training, or the training
loss continues to decrease until the end of training.
Learning Curve: overfitting

The training loss continues to decrease with experience while the validation
loss is stable, or the validation loss decreases to a point and then begins
increasing again.
Learning Curve: unrepresentative datasets

The training set is not representative enough (too few examples relative to
the validation set), or the validation set is not representative enough (too
few examples relative to the training set).
Comparing Learning Curves
• Indication of high bias
As # of training examples increases:
o training error increases too
o model is not fitting the data with
an appropriate curve

Bias leads to comparable errors in cross-validation and training.

Adding more data does not help!

What should one do to reduce bias?
• increase the number of features
• add more complex features
Comparing Learning Curves
• Indication of high variance
Training error grows slowly but
well within the desired range

High variance leads to overfitting and hence high test error.

With more examples, the model is forced to learn more generalized properties.

Adding more examples reduces the test error and the gap between the curves
closes down.

What should one do to reduce variance?
• get more data
• reduce the number of features
Nested Cross Validation (5x2 CV)
Inner loop: optimize parameters (2 splits)
Outer loop: evaluate with the optimal parameters (5 splits)

Which model do we finally deploy?
Cross-validation tests a model-building procedure and not a single model instance.
We deploy the feature set and parameters obtained by applying the winning
procedure to the whole dataset.
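
A minimal nested (5x2) cross-validation sketch with scikit-learn; the estimator and parameter grid are illustrative assumptions:

```python
# Sketch of nested CV: the inner loop selects parameters, the outer loop
# estimates the generalization error of the whole model-building procedure.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=2)   # inner loop: 2 splits
outer_scores = cross_val_score(inner, X, y, cv=5)                   # outer loop: 5 splits
print("estimated error of the procedure:", 1 - outer_scores.mean())

# What we deploy: the winning procedure applied to the whole dataset
final_model = inner.fit(X, y).best_estimator_
```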
Bootstrapping
• Create B replications (of the same size) of a training set of size N
by selecting examples uniformly with replacement

• f̂*ᵇ(xᵢ): the predicted value at xᵢ of the model built on the b-th
replication (i.e., bootstrap sample)

• Train on each replication b
• Test on the whole training set

Any problem with that?
Leave-one-out bootstrap
• Train on each replication b

• Test each sample i only on the replications it does not appear in, i.e.,
using the set C⁻ⁱ of bootstrap samples that do not contain i

• Problems?
– the average number of distinct observations in each bootstrap sample is
about 0.632 N
– upward bias on the validation error
– behaves roughly like two-fold Cross Validation
.632 bootstrap

• Main idea:
– extension of the leave-one-out bootstrap
– intuition: pulls the leave-one-out bootstrap estimate down towards the
training error rate, and hence reduces its upward bias

      Err(.632) = 0.368 · err̄ + 0.632 · Err(1)

where err̄ is the training error (on the whole training set) and Err(1) is
the leave-one-out bootstrap error.
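
A minimal sketch of the leave-one-out bootstrap and the .632 estimate; the number of replications B, the dataset, and the model are illustrative choices:

```python
# Sketch of bootstrap error estimates: out-of-bag (leave-one-out bootstrap) and .632.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
N, B, rng = len(y), 50, np.random.default_rng(0)

errors = [[] for _ in range(N)]                     # per-sample out-of-bag errors
for _ in range(B):
    idx = rng.integers(0, N, size=N)                # sample N points uniformly with replacement
    oob = np.setdiff1d(np.arange(N), idx)           # samples not in this replication
    model = LogisticRegression(max_iter=5000).fit(X[idx], y[idx])
    pred = model.predict(X[oob])
    for i, p in zip(oob, pred):
        errors[i].append(p != y[i])                 # test i only on replications it is not in

err_loo_boot = np.mean([np.mean(e) for e in errors if e])                 # leave-one-out bootstrap error
err_train = 1 - LogisticRegression(max_iter=5000).fit(X, y).score(X, y)   # training error on whole set
err_632 = 0.368 * err_train + 0.632 * err_loo_boot                        # the .632 estimate
print(err_632)
```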
Coming up next…
Jan 15 Introduction to machine learning
Jan 17 Regression analysis
Jan 18 Laboratory session I: numpy and linear regression
Jan 22 Machine learning pipelines
Jan 25 Laboratory session II: ML pipelines and grid search
Jan 29 Training neural networks
Jan 31* Laboratory session III: training NNs and tensorflow basics
Feb 6 Convolutional neural networks
Feb 7 Recurrent neural networks and autoencoders
Feb 12 Time series classification
Feb 15 Laboratory session IV: CNNs and RNNs
Feb 19 Deep time series classification
TODOs
• Reading:
– Main course book (ESL): Chapters 1, 2, and 7

• Prepare for this week’s lab:
– Start reading HML: Chapter 4
