Professional Documents
Culture Documents
L02 Fundamentals of ML
L02 Fundamentals of ML
Unit 2:
The Fundamentals of Machine Learning
Outline
• Machine Learning
Reference:
• Introduction
• Types of machine learning
• Challenges of Machine Learning
• The Machine Learning Framework
Géron Chapter 1
Problems with Traditional Programming
Traditional programming paradigm:
Deploy
Study the
Write rules Evaluate
problem
Analyze errors
3
The Machine Learning Framework
▪ Instead of handcraft rules, ML learns a model from training samples
▪ One learning algorithm for different problems
Training Deploy
Samples
Analyze errors
4
What is Machine Learning?
5
Why use Machine Learning?
▪ Some problems are too complex to solve by using rules
• An example: image classification
6
Why use machine learning?
Training Update
Deploy
Samples data
Can be
updated
Study the Train ML Evaluate
problem algorithm Solution
Analyze errors
7
Why use machine learning?
Lots of
training samples
8
Categories of Machine learning systems
1. Amount of supervision
9
Supervised Learning
10
Supervised Learning Tasks
Classification Regression
Classification predicts discrete Regression predicts continuous
valued output (e.g., present/not valued output (e.g., house price)
present) 400
300
Price (RM x1000)
Yes No 200
100
0 Size
0 500 1000 1500 2000 2500
Object Detection (Images with Car)
Housing Price Prediction
11
Supervised Learning Algorithms
13
Supervised Learning Applications
14
Unsupervised learning
x1
x1 15
Unsupervised Learning Task: Clustering
16
Unsupervised Learning Task: Anomaly detection
▪ Identify items, events or observations which do not conform
to an expected pattern or other items in a dataset
▪ Anomalies are also referred to as outliers, novelties or
noise
outliers
18
Unsupervised Learning Tasks and Algorithms
▪ Clustering
• k-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization
19
Semi-supervised learning
Features
▪ Partially labeled training data. length width weight
Label
20
Reinforcement Learning
The agent
1. observes the environment
2. Select action using policy
3. Perform action
4. Get reward or penalty
22
Instance-based Learning
▪ Lazy learner: simply remembers all training samples
▪ Predict the label of new samples based on their similarity to training samples
▪ Need a measure of similarity between two samples
width f=pos
20
x xx ▪ The model is a straight line (linear
15
x x x
model) that separates the two
x x
x x categories of fishes in a 2-D
10 x x space.
x x
5
x ▪ Has the form:
f=neg
weight 𝑓 = 𝜃0 + 𝜃1 × weight + 𝜃2 × width
2 4 6 8
Once model is trained, no need to store training samples and fast prediction time24
A Simple Model-based Learning Example (1/6)
25
A Simple Model-based Learning Example (2/6)
y
(salary)
X (year)
26
A Simple Model-based Learning Example (3/6)
Training set
Test set
y
(salary)
X (year)
27
A Simple Model-based Learning Example (4/6)
𝑦 = 𝜃0 + 𝜃1 𝑥
prediction error
y
(salary)
X (year)
28
A Simple Model-based Learning Example (5/6)
y
(salary)
X (year)
29
A Simple Model-based Learning Example (6/6)
𝑦 = 𝜃0 + 𝜃1 𝑥
y
(salary)
X (year)
30
Challenges of Machine Learning
31
Insufficient training data
M. Banko & E. Brill (2001), Scaling to very very large corpora for Natural Language Disambiguation
A. Halevy, P. Norvig and F. Pereira (2009), The Unreasonable Effectiveness of Data,
32
Non-representative data
▪ Training data should be representative of the new cases that you want
to generalize to
▪ Consider if we add 7 missing countries :
without missing
countries
with missing
countries
34
Irrelevant features
▪ Selected features must be relevant to the task at hand. Having
irrelevant features in your data can decrease the accuracy of
the models.
• For example, area or perimeter length are irrelevant feature for
classifying shapes like circle and rectangle.
▪ Overfitting (high variance) may happen when our model is too complex
and fits too specifically to the training set, but it does not generalize
well to new data (low training error but high testing error).
𝑦 = 𝜃0 + 𝜃1 𝑥 + 𝜃2 𝑥 2 +𝜃3 𝑥 3 𝑦 = 𝜃0 + 𝜃1 𝑥 𝑦 = 𝜃0 + 𝜃1 𝑥 + 𝜃2 𝑥 2 + … + 𝜃9 𝑥 9
37
The Machine Learning Framework
▪ Divided into two phases
Training Phase: Testing Phase:
Learning Test
Algorithm
Tune
Hyperparameter
tuning & Train Model Final Model
Validation
Predict
38
Next: