EE2211 Introduction To Machine Learning

Lecture 8

Thomas Yeo
thomas.yeo@nus.edu.sg

Electrical and Computer Engineering Department
National University of Singapore

Acknowledgement: EE2211 development team
Thomas Yeo, Kar-Ann Toh, Chen Khong Tham, Helen Zhou, Robby Tan & Haizhou

© Copyright EE, NUS. All Rights Reserved.

Course Contents
• Introduction and Preliminaries (Haizhou)
– Introduction
– Data Engineering
– Introduction to Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Haizhou)
– Performance Issues
– K-means Clustering
– Neural Networks

Fundamental ML Algorithms:
Optimization, Gradient Descent

Module III Contents


• Overfitting, underfitting and model complexity
• Regularization
• Bias-variance trade-off
• Loss function
• Optimization
• Gradient descent
• Decision trees
• Random forest

Review
• Supervised learning: given feature(s) x, we want to predict target y
• Most supervised learning algorithms can be formulated as the following optimization problem:

      min_w  Data-Loss(w) + λ · Regularization(w)

• Data-Loss(w) quantifies the fitting error to the training set given parameters w: smaller error => better fit to the training data
• Regularization(w) penalizes more complex models
• For example, in the case of polynomial (ridge) regression (previous lectures):

      min_w  ‖Pw − y‖²  +  λ‖w‖²
             Data-Loss(w)   Reg(w)
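As a concrete sketch of the "data loss + regularization" cost, here is a minimal numpy version for polynomial ridge regression; the data points, polynomial order, and λ value are made up for illustration:

```python
import numpy as np

# Hypothetical 1-D training set (illustrative values only)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.8, 3.2, 5.9, 9.1])

order = 3
P = np.vander(x, order + 1, increasing=True)  # columns: 1, x, x^2, x^3

lam = 0.1  # regularization weight (lambda), chosen arbitrarily here

def cost(w):
    data_loss = np.sum((P @ w - y) ** 2)  # Data-Loss(w): squared fitting error
    reg = np.sum(w ** 2)                  # Regularization(w): penalizes large w
    return data_loss + lam * reg
```

With w = 0 the model predicts nothing, so the cost is just the total squared magnitude of the targets; any w that fits the data better lowers the first term.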
Review
• ŷᵢ = f(xᵢ; w) is the prediction for the i-th training sample; yᵢ is the target of the i-th training sample
• For a linear model with two features:

      ŷᵢ = w₀ + w₁·xᵢ₁ + w₂·xᵢ₂

  where w₀ is the bias/offset, xᵢ₁ is feature 1 of the i-th sample and xᵢ₂ is feature 2 of the i-th sample
• More generally, with d features:

      ŷᵢ = w₀ + w₁·xᵢ₁ + … + w_d·xᵢ_d

  where w₀ is the bias/offset and xᵢⱼ is feature j of the i-th sample
Loss Function & Learning Model
• Each piece of the optimization problem has a name:

      C(w)  =  Σᵢ L( f(xᵢ; w), yᵢ )  +  λ · Reg(w)

      C(w): Cost Function        L(ŷ, y): Loss Function
      f(xᵢ; w): Learning Model   Reg(w): Regularization
Building Blocks of ML algorithms
• Learning model f(x; w) reflects our belief about the relationship between the features & the target
• Loss function L(ŷ, y) is the penalty for predicting ŷ when the true value is y
• Regularization Reg(w) encourages less complex models
• Cost function C(w) is the final optimization criterion we want to minimize
• Optimization routine finds the solution that minimizes the cost function
Motivation for Gradient Descent
• Different learning functions f, loss functions L & regularization terms give rise to different learning algorithms
• In polynomial (ridge) regression (previous lectures), the optimal w can be written with the following “closed-form” formula (primal solution):

      w = (PᵀP + λI)⁻¹ Pᵀ y

• For other learning functions, loss functions & regularization terms, optimizing C(w) might not be so easy
• We usually have to estimate w iteratively with some algorithm
• The optimization workhorse for modern machine learning is gradient descent
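The closed-form primal solution can be computed directly with numpy; the synthetic data below is hypothetical, and `np.linalg.solve` is used instead of an explicit matrix inverse for numerical stability:

```python
import numpy as np

# Hypothetical noisy quadratic data; P is the polynomial feature matrix
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
y = 2.0 + 3.0 * x - x ** 2 + 0.05 * rng.standard_normal(20)
P = np.vander(x, 3, increasing=True)  # columns: 1, x, x^2
lam = 1e-3

# Primal (closed-form) ridge solution: w = (P^T P + lam*I)^(-1) P^T y
w = np.linalg.solve(P.T @ P + lam * np.eye(P.shape[1]), P.T @ y)
```

A quick sanity check: at this w the gradient of the ridge cost, 2·Pᵀ(Pw − y) + 2λw, should be essentially zero.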
Questions?

Gradient Descent Algorithm
• Goal: find the w that minimizes the cost C(w)
• Initialize w⁽⁰⁾ and pick a learning rate η > 0
• At each iteration t, update the parameters in the direction opposite to the gradient:

      w⁽ᵗ⁺¹⁾ = w⁽ᵗ⁾ − η ∇C(w⁽ᵗ⁾)

• According to multi-variable calculus, if η is not too big, then C(w⁽ᵗ⁺¹⁾) ≤ C(w⁽ᵗ⁾) => we get a better w after each iteration
Gradient Descent Algorithm
• Possible convergence criteria
– Set a maximum number of iterations
– Check that the percentage or absolute change in w is below a threshold
– Check that the percentage or absolute change in C(w) is below a threshold
• Gradient descent can only find a local minimum
– Because the gradient = 0 at a local minimum, so w won’t change after that
[Figure: a cost curve with a local minimum and the global minimum marked]
• Many variations of gradient descent, e.g., change how the gradient is computed, or let the learning rate η decrease with increasing iteration t
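The update rule and one of the convergence criteria above (change in w below a threshold, plus a maximum iteration count) can be sketched as a generic loop; the quadratic test cost at the bottom is an illustrative choice, not from the slides:

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, tol=1e-8, max_iter=10_000):
    """Generic gradient descent: w <- w - eta * grad(w).
    Stops at max_iter, or when the change in w falls below tol."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        w_new = w - eta * grad(w)
        if np.linalg.norm(w_new - w) < tol:  # convergence criterion on w
            return w_new
        w = w_new
    return w

# Illustrative cost: C(w) = (w0 - 1)^2 + (w1 + 2)^2, gradient 2*(w - [1, -2])
w_star = gradient_descent(lambda w: 2 * (w - np.array([1.0, -2.0])),
                          w0=[0.0, 0.0], eta=0.1)
```

For this convex toy cost the loop converges to the unique (global) minimum at (1, −2); on a non-convex cost the same loop would stop at whichever local minimum it reaches first.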
Questions?

Find w to minimize C(w)
• Obviously the minimum corresponds to the point where the gradient is zero, but let’s do gradient descent
– Gradient: ∇C(w)
– Initialize w⁽⁰⁾, pick learning rate η
– At each iteration, w⁽ᵗ⁺¹⁾ = w⁽ᵗ⁾ − η ∇C(w⁽ᵗ⁾)

η is just nice => rapid convergence
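The notes do not show the actual function used on this slide, so as a stand-in take C(w) = w² (minimum at w = 0, gradient 2w); the initial point and learning rate below are hypothetical:

```python
# Toy stand-in for the slide's example: C(w) = w^2, gradient 2w, minimum at 0
def grad(w):
    return 2.0 * w

w, eta = 4.0, 0.4          # hypothetical initial point and learning rate
trace = [w]
for _ in range(10):
    w = w - eta * grad(w)  # each step multiplies w by (1 - 2*eta) = 0.2
    trace.append(w)
```

Each iteration shrinks w by a factor of 5 (4.0, 0.8, 0.16, ...), so within ten steps w is within about 10⁻⁶ of the minimum: a "just nice" η gives rapid convergence.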
What if learning rate is too small?
• Same example, but run gradient descent with a much smaller learning rate η
– At each iteration, w⁽ᵗ⁺¹⁾ = w⁽ᵗ⁾ − η ∇C(w⁽ᵗ⁾)

η too small => slow convergence
What if learning rate is too big?
• Same example, but run gradient descent with a much bigger learning rate η
– At each iteration, w⁽ᵗ⁺¹⁾ = w⁽ᵗ⁾ − η ∇C(w⁽ᵗ⁾)

η too big => overshoot the local minimum => slow convergence, or may never converge
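The three learning-rate regimes can be seen side by side on the same toy cost C(w) = w² (a stand-in, since the slides' function is not shown); the specific η values are illustrative:

```python
def run_gd(eta, w0=4.0, iters=50):
    """Gradient descent on the toy cost C(w) = w^2 (gradient 2w).
    Each step multiplies w by (1 - 2*eta)."""
    w = w0
    for _ in range(iters):
        w = w - eta * 2.0 * w
    return w

w_small = run_gd(eta=0.01)  # too small: barely moved after 50 steps
w_nice  = run_gd(eta=0.4)   # just nice: essentially at the minimum
w_big   = run_gd(eta=1.1)   # too big: |1 - 2*eta| > 1, overshoots and diverges
```

The per-step factor (1 − 2η) explains all three cases: near 1 means slow progress, near 0 means rapid convergence, and magnitude above 1 means each overshoot is bigger than the last, so the iteration never converges.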
Questions?

Different Learning Models
• The learning model f(x; w) can take many forms, e.g., linear, polynomial, sigmoid, ReLU, exponential
[Figures: plots of the ReLU and exponential learning models]
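The model families named above can be written down in a few lines; these one-feature parameterizations (a bias w₀ plus a slope w₁) are a minimal sketch, not the exact forms from the slides:

```python
import numpy as np

# Common choices for the learning model f(x; w), one feature for simplicity
def linear(x, w):       # f(x; w) = w0 + w1*x
    return w[0] + w[1] * x

def sigmoid(x, w):      # squashes the linear score into (0, 1)
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x)))

def relu(x, w):         # zero for negative scores, linear for positive
    return np.maximum(0.0, w[0] + w[1] * x)

def exponential(x, w):  # always positive, grows (or decays) exponentially
    return np.exp(w[0] + w[1] * x)
```

The choice among these encodes the belief about the feature-target relationship: e.g., a sigmoid suits probabilities, an exponential suits strictly positive growth.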
Different Loss Functions
• The loss function L(ŷ, y) can also take many forms, e.g., square error, binary (0/1), logistic
[Figure: plot of the logistic loss]
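The three loss families named above can likewise be sketched directly; the label conventions (y ∈ {−1, +1} for the logistic loss) are standard choices assumed here, not taken from the slides:

```python
import numpy as np

def square_loss(y_hat, y):
    # Squared error: penalty grows quadratically with the prediction error
    return (y_hat - y) ** 2

def binary_loss(y_hat, y):
    # 0/1 loss: 1 if the prediction is wrong, 0 if it is right
    return float(y_hat != y)

def logistic_loss(score, y):
    # y in {-1, +1}, score = f(x; w); log(1 + exp(-y * score)),
    # a smooth surrogate for the 0/1 loss (log1p for numerical stability)
    return np.log1p(np.exp(-y * score))
```

Unlike the 0/1 loss, the logistic loss is differentiable everywhere, which is exactly what gradient descent needs.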
Questions?

Summary
• Building blocks of machine learning algorithms
– Learning model: reflects our belief about the relationship between the features & the target we want to predict
– Loss function: penalty for a wrong prediction
– Regularization: penalizes complex models
– Optimization routine: finds the minimum of the overall cost function
• Gradient descent algorithm
– At each iteration, compute the gradient & update the model parameters in the direction opposite to the gradient
– If the learning rate is too big => may not converge
– If the learning rate is too small => converges very slowly
• Different learning models, e.g., linear, polynomial, sigmoid, ReLU, exponential, etc.
• Different loss functions, e.g., square error, binary, logistic, etc.
