
UCCD2063

Artificial Intelligence Techniques

Unit 2:
The Fundamentals of Machine Learning
Outline
• Machine Learning
  • Introduction
  • Types of machine learning
  • Challenges of Machine Learning
  • The Machine Learning Framework

Reference: Géron, Chapter 1
Problems with Traditional Programming
Traditional programming paradigm:
[Flowchart] Study the problem → Write rules → Evaluate → Deploy, with an "Analyze errors" loop back to studying the problem
▪ Problems with the traditional programming paradigm:
  • Complexity – too many rules; it is very hard to cover all aspects of the problem, and different problems require different rules
  • Static – cannot adapt to new input; new rules must keep being written, making the program very hard to maintain

The Machine Learning Framework
▪ Instead of handcrafting rules, ML learns a model from training samples
▪ The same learning algorithm can be applied to different problems

[Flowchart] Study the problem → Train ML algorithm → ML model → Evaluate solution → Deploy, with training samples feeding the training algorithm and an "Analyze errors" loop back to studying the problem
What is Machine Learning?

“Machine Learning: Field of study that gives computers the ability to learn (from data) without being explicitly programmed.”
[Arthur Samuel (1959)]
Why use Machine Learning?
▪ Some problems are too complex to solve by using rules
• An example: image classification

[Figure] Input image → complex, specific program → Output: “car”

  if this local region looks like a side mirror, and
  if this local region looks like a wheel,
  then classify the image as a car

This will work if given the same image again but, given new images, the algorithm is bound to fail.
Why use machine learning?

▪ Good for problems that evolve with time


• Machine learning can automatically rebuild the model when necessary
• Example: a spam classifier learns new spam words when they become unusually frequent in spam flagged by users

[Flowchart] Study the problem → Train ML algorithm → Evaluate solution → Deploy; update data flows back into the training samples so the deployed model can be updated, with an "Analyze errors" loop back to studying the problem
Why use machine learning?

▪ Help humans learn


• We can inspect ML models to see what they have learned
  − may reveal unsuspected correlations or new trends
  − helps us understand the problem better
  − Example: we can examine the list of words (or combinations of words) that the ML model identifies as the best predictors of spam

[Flowchart] Lots of training samples → Study the problem → Train ML algorithm → Solution → Inspect the solution → Understand the problem better → Iterate if necessary
Categories of Machine Learning Systems

1. Amount of supervision:
   • supervised learning
   • unsupervised learning
   • semi-supervised learning
   • reinforcement learning

2. Comparing samples vs. building predictive models:
   • instance-based learning
   • model-based learning
Supervised Learning

▪ In supervised learning, the algorithm is given example input-output pairs and learns a function that maps from input to output
▪ Input: the set of features used to describe the samples
▪ Output: the attribute we are interested in predicting
▪ Example: Fruit classification
Sample     length   width   weight   Label (output)
fruit 1    165      38      172      Banana
fruit 2    218      39      230      Banana
fruit 3    76       80      145      Orange
fruit 4    145      35      150      Banana
…          …        …       …        …
fruit m    …        …       …        …

(The features length, width and weight are the input; the model learns a function that maps them to the output label.)
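A minimal sketch (not part of the original slides) of how this fruit classifier could be trained with scikit-learn, assuming the table above is typed in by hand and a decision tree is chosen as the learned function:

# Hypothetical sketch: fruit classification with scikit-learn (assumed installed)
from sklearn.tree import DecisionTreeClassifier

# Input features: [length, width, weight], taken from the table above
X = [[165, 38, 172],
     [218, 39, 230],
     [ 76, 80, 145],
     [145, 35, 150]]
y = ["Banana", "Banana", "Orange", "Banana"]   # output labels

model = DecisionTreeClassifier()   # the "function" to be learned
model.fit(X, y)                    # learn the mapping from input to output

print(model.predict([[160, 40, 180]]))   # predict the label of a new fruit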
Supervised Learning Tasks

Two main supervised learning tasks:

• Classification – predicts a discrete-valued output (e.g., present / not present).
  Example: object detection – does the image contain a car? Yes / No.
• Regression – predicts a continuous-valued output (e.g., house price).
  Example: housing price prediction – price (RM ×1000) as a function of size.

[Figure] Left: images labelled Yes/No for "contains a car". Right: scatter plot of housing price (RM ×1000, 0-400) against size (0-2500) with a fitted curve.
Supervised Learning Algorithms

▪ Algorithms for classification:


• k-Nearest neighbour (k-NN) – instance-based
• Logistic Regression
• Decision Tree and Random Forests
• Support Vector Machine (SVM)
• Neural Networks (NN)
• ...

▪ Algorithms for regression:


• k-NN Regressor – instance-based
• Linear Regression
• Decision Tree and Random Forests
• SVM Regressor
• NN
• Non-linear Regression ...
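All of the algorithms listed above follow the same basic workflow in libraries such as scikit-learn. A minimal sketch (not part of the original slides), on made-up synthetic data, of how classifiers and regressors share the same fit/score interface:

# Hypothetical sketch: classifiers and regressors share the fit/score interface
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC

Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

for clf in (LogisticRegression(max_iter=1000), SVC()):
    clf.fit(Xc, yc)                               # classification: discrete labels
    print(type(clf).__name__, clf.score(Xc, yc))  # training accuracy

reg = LinearRegression().fit(Xr, yr)              # regression: continuous output
print("LinearRegression R^2:", reg.score(Xr, yr))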
Supervised Learning Applications

Digit classification
▪ Input: images / pixel grids
▪ Output: a digit 0-9
▪ Features:
  • Intensity values of pixels
  • Histogram of gradients
  • Shape patterns: NumComponents, AspectRatio, NumLoops
  • …
[Figure] Example digit images passed through feature extraction, with outputs 0, 1, 2, 1, ??

Spam mail classification
▪ Input: an email
▪ Output: spam or non-spam
▪ Features:
  • Words: FREE!, Earn, Call now, …
  • Text patterns: $$$, ALL CAPS
  • Non-text: SenderInContacts
  • …
[Figure] Example inputs: a personal email ("Dear Andy, How are you doing, buddy? Hopefully you are adapting well to your new school. All of us miss you dearly here. We miss your silly jokes.") vs. a spam message ("Hello, I have a special offer for you… WANT TO LOSE WEIGHT? The most powerful weight loss is now available without prescription.")
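A minimal sketch (not part of the original slides) of digit classification, using scikit-learn's built-in 8×8 digits dataset so that pixel intensity values serve as the features:

# Hypothetical sketch: digit classification with pixel intensities as features
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                      # inputs: 8x8 images, outputs: digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))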
Supervised Learning Applications

Predicting the house price in the Boston area (Regression)


▪ Input: numerical and categorical data
▪ Output: house price (in thousands of USD)
▪ Features:
• CRIM: per capita crime rate by town
• RM: average number of rooms per dwelling
• CHAS: Charles River dummy variable (= 1 if near river; 0 otherwise) (categorical)
• RAD: index of accessibility to radial highways
• TAX: full-value property-tax rate per $10,000
• INDUS: proportion of non-retail business acres per town
• NOX: nitric oxides concentration (parts per 10 million)
• …

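A minimal sketch (not part of the original slides) of this kind of house-price regression. The Boston dataset has been removed from recent scikit-learn releases, so scikit-learn's California housing data (downloaded on first use) is used here as a stand-in; the workflow is the same:

# Hypothetical sketch: house-price regression on tabular numerical features
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X, y = fetch_california_housing(return_X_y=True)   # features and price target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))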
Unsupervised Learning

▪ No labels are provided for the training samples
▪ Discovers the underlying structure, relationships or patterns based only on the features of the training samples

Sample     length   width   weight   (no label)
fruit 1    165      38      172
fruit 2    218      39      230
fruit 3    76       80      145
fruit 4    145      35      150
…          …        …       …
fruit m    …        …       …

[Figure] Supervised learning: positive vs. negative samples separated by a decision boundary in (x1, x2) space. Unsupervised learning: unlabelled samples grouped into a cluster in the same space.
Unsupervised Learning Task: Clustering

▪ Detect groups of similar samples
▪ Example: detecting groups of visitors who visit your blog
  • Example analysis: 40% of visitors love comic books and read in the evening; 20% are young sci-fi lovers who visit before school; etc.
  • How does it help? You can target your posts to each group.

[Figure] Scatter plot of visitors by age (young → old) and time of day (day → night), with the discovered groups circled.
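A minimal sketch (not part of the original slides) of such a clustering, assuming each visitor is described by two made-up features, age and visiting hour:

# Hypothetical sketch: clustering blog visitors with k-means
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups: young morning visitors and older evening visitors
visitors = np.vstack([
    rng.normal(loc=[15, 7],  scale=[2, 1], size=(50, 2)),   # (age, hour of day)
    rng.normal(loc=[35, 21], scale=[5, 2], size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(visitors)
print(kmeans.cluster_centers_)   # approximate centres of the two visitor groups
print(kmeans.labels_[:10])       # cluster assignment of the first 10 visitors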
Unsupervised Learning Task: Anomaly detection
▪ Identify items, events or observations which do not conform to an expected pattern or to the other items in a dataset
▪ Anomalies are also referred to as outliers, novelties or noise

[Figure] Scatter plot of normal samples with a few points far from the main group marked as outliers.

Applications: bank fraud, medical problems, errors in text.
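A minimal sketch (not part of the original slides) of anomaly detection using an Isolation Forest, one of several possible detectors, on synthetic 2-D data:

# Hypothetical sketch: flagging outliers with an Isolation Forest
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # the expected pattern
outliers = np.array([[6.0, 6.0], [-5.0, 7.0]])           # points far from the rest
X = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
pred = detector.predict(X)        # +1 = normal, -1 = anomaly
print(np.where(pred == -1)[0])    # indices flagged as outliers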


Unsupervised Learning Task: Association Rule Mining

▪ Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
  • Example: {bread, butter} → {milk} – customers who buy bread and butter often also buy milk.
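A minimal sketch (not part of the original slides) of association rule mining with the Apriori algorithm. It assumes the third-party mlxtend library is installed (pip install mlxtend), and the transactions are made up:

# Hypothetical sketch: Apriori with the mlxtend library (assumed installed)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "butter", "beer"],
]

# One-hot encode the transactions, then mine frequent itemsets and rules
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])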
Unsupervised Learning Tasks and Algorithms

▪ Clustering
• k-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization

▪ Association rule learning


• Apriori
• Eclat

Semi-supervised Learning

▪ Partially labeled training data; typically there is more unlabeled data than labeled data
▪ Most semi-supervised algorithms are combinations of unsupervised and supervised algorithms

Sample     length   width   weight   Label
fruit 1    165      38      172      Banana
fruit 2    218      39      230      ?
fruit 3    76       80      145      Orange
fruit 4    145      35      150      ?
…          …        …       …        …
fruit m    …        …       …        …
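A minimal sketch (not part of the original slides) of one such combination, scikit-learn's self-training wrapper around a supervised classifier, where unlabeled samples are marked with -1:

# Hypothetical sketch: semi-supervised learning via self-training
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
hidden = rng.choice(len(y), size=250, replace=False)
y_partial[hidden] = -1                    # hide most labels (unlabeled samples)

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                   # uses both labeled and unlabeled data
print("accuracy against the true labels:", model.score(X, y))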
Reinforcement Learning

▪ No training set is provided
▪ Learns based on feedback from the environment:
  • the effect of the agent's actions on the environment (its state)
  • the rewards of taking a particular action
▪ Learns by itself the best policy to get the most reward over time

[Figure] Agent-environment loop: the agent acts according to its policy, the environment changes state and returns a reward, and the agent updates its policy.

Applications: games (chess, Go, video games), robotics, traffic control, trading, ...


Reinforcement Learning

The agent:
1. Observes the environment
2. Selects an action using its policy
3. Performs the action
4. Gets a reward or penalty
5. Updates the policy (learning step)
6. Iterates until an optimal policy is found
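A minimal sketch (not part of the original slides) of this loop as tabular Q-learning on a made-up toy environment: a 1-D corridor of 5 states where the agent moves left or right and is rewarded only for reaching the last state:

# Hypothetical sketch: tabular Q-learning on a tiny corridor environment
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # the policy is derived from this Q-table
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # observe the state and select an action (epsilon-greedy policy)
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)  # perform action
        r = 1.0 if s_next == n_states - 1 else 0.0                      # reward or penalty
        # update the policy (learning step)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:-1], axis=1))   # learned policy for states 0-3: prefers "right" (1)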
Instance-based Learning
▪ Lazy learner: simply remembers all training samples
▪ Predicts the label of new samples based on their similarity to the training samples
▪ Needs a measure of similarity between two samples

▪ Example: spam email detection
  • Remember all spam emails and flag new emails that are similar to them
  • Similarity measure: count the number of words they have in common

No training time, but storage is required for all samples and prediction is slow.
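A minimal sketch (not part of the original slides) of instance-based learning with k-nearest neighbours, reusing the fruit measurements from the earlier table:

# Hypothetical sketch: k-nearest neighbours as an instance-based learner
from sklearn.neighbors import KNeighborsClassifier

X = [[165, 38, 172], [218, 39, 230], [76, 80, 145], [145, 35, 150]]
y = ["Banana", "Banana", "Orange", "Banana"]

# fit() merely stores the samples; predict() compares a new sample to the
# most similar (Euclidean-closest) stored sample(s)
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([[80, 75, 140]]))   # closest stored sample is fruit 3 -> ['Orange']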
Model-based Learning

▪ Generalize from a set of examples by building a model of these examples
▪ Then, use the model to make predictions on unseen data

[Figure] Scatter plot of salmon (•) and sea bass (x) in a 2-D space of weight vs. width, separated by a straight line (f = pos on one side, f = neg on the other).

▪ The model is a straight line (a linear model) that separates the two categories of fish in the 2-D space.
▪ It has the form:

  𝑓 = 𝜃0 + 𝜃1 × weight + 𝜃2 × width

  where 𝜃0, 𝜃1, 𝜃2 are the model parameters.

Once the model is trained, there is no need to store the training samples, and prediction is fast.
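A minimal sketch (not part of the original slides) of learning such a linear separator with logistic regression, using made-up (weight, width) measurements:

# Hypothetical sketch: a linear model f = θ0 + θ1*weight + θ2*width learned
# from labelled fish measurements
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.0, 14], [2.5, 16], [3.0, 15],     # salmon
              [5.0, 8],  [6.0, 9],  [7.0, 7]])     # sea bass
y = np.array(["salmon", "salmon", "salmon", "sea bass", "sea bass", "sea bass"])

model = LogisticRegression().fit(X, y)
print("θ0 =", model.intercept_, " (θ1, θ2) =", model.coef_)   # learned parameters
print(model.predict([[4.0, 12]]))   # classify a new, unseen fish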
A Simple Model-based Learning Example (1/6)

▪ Problem: we want to predict IT salary
▪ Step 1: Collect data
  • Consult domain experts / surveys / research to find the key factors (features) affecting IT salary:
    − Experience in years ✓ (the feature used in this example)
    − Job title
    − Size of organization
    − Gender
    − Industry sector
    − Geographic region
    − ...
  • Collect the data
    − From surveys / databases
    − Data processing for missing/incomplete data, normalization (see L03)

  Samples:
  X (years)   y (salary, k)
  5           38
  10          64
  7           36
  8           44
  0           na
  …           …
A Simple Model-based Learning Example (2/6)

▪ Step 2: Model selection


• Study the data and determine what model is suitable
− For this plot, select linear regression

[Figure] Scatter plot of salary (y) against years of experience (X), showing a roughly linear trend.
A Simple Model-based Learning Example (3/6)

▪ Step 3: Train model


• Separate data into training set (80%) and test set (20%)
− Train model on training set, validate model using test set

[Figure] The same scatter plot of salary (y) against years of experience (X), with the points split into a training set (80%) and a test set (20%).
A Simple Model-based Learning Example (4/6)

▪ Step 3: Train model (cont.)


• Train model with training set
− Use normal equation or gradient descent (see L05)
− Minimize sum-of-squared error (SSE) – training error

𝑦 = 𝜃0 + 𝜃1 𝑥

[Figure] The fitted line over the training points; the vertical gaps between the points and the line are the prediction errors.
A Simple Model-based Learning Example (5/6)

▪ Step 4: Validate model


• Validate the model using test set – test error

[Figure] The fitted line evaluated against the test-set points of salary (y) vs. years of experience (X) – the test error.
A Simple Model-based Learning Example (6/6)

▪ Step 5: Deploy model


• Use the trained model for prediction

𝑦 = 𝜃0 + 𝜃1 𝑥

[Figure] The trained line used to predict the salary (y) for a new value of X (years of experience).
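A minimal sketch (not part of the original slides) that runs the whole example end to end, using made-up data and scikit-learn in place of a hand-coded normal equation or gradient descent:

# Hypothetical sketch: collect data, split 80/20, train y = θ0 + θ1·x by
# least squares, validate on the test set, then deploy for prediction
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1: data (years of experience -> salary in k); the values are made up
X = np.array([[5], [10], [7], [8], [3], [12], [1], [6], [9], [4]])
y = np.array([38, 64, 36, 44, 30, 75, 25, 42, 58, 33])

# Step 3: 80/20 train/test split, then fit the linear model on the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Step 4: validate on the test set (test error)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))

# Step 5: deploy - predict the salary for 11 years of experience
print("θ0, θ1 =", model.intercept_, model.coef_)
print("predicted salary:", model.predict([[11]]))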
Challenges of Machine Learning

▪ Insufficient quantity of training data


▪ Non-representative training data
▪ Poor quality data
▪ Irrelevant features
▪ Underfitting & overfitting the training data

Insufficient training data

▪ Humans can learn from very few samples
▪ Machines still need a lot of samples to work properly: thousands for simple problems and millions for complex ones

[Figure] Learning curves from Banko & Brill (2001): as the amount of training data grows, different ML algorithms reach similar accuracy.

▪ Given enough data, the performance of different ML algorithms is similar
▪ Data matters more than algorithms
▪ Should we invest in data or in algorithms?
  It is not easy or cheap to get extra training data, so do not abandon algorithms yet.

M. Banko & E. Brill (2001), Scaling to Very Very Large Corpora for Natural Language Disambiguation.
A. Halevy, P. Norvig & F. Pereira (2009), The Unreasonable Effectiveness of Data.
Non-representative data

▪ Training data should be representative of the new cases that you want
to generalize to
▪ Consider what happens if we add 7 missing countries:

[Figure] The model fitted without the missing countries vs. the model fitted with them – the previous model does not generalize well!

▪ Causes of non-representative data:


• Sampling noise, especially if the sample is too small
• Sampling bias, if the sampling method is flawed
Poor-quality data

▪ Training data may contain:


• Errors
• Outliers
• Missing values
• Noise
▪ Poor-quality data makes it harder for the ML algorithm to detect the underlying pattern
▪ Data cleaning: most data scientists spend a significant amount of time cleaning the data, for example (see the sketch below):
  • Remove outliers or fix errors manually
  • Fill in missing values
  • Drop columns with many missing values
  • Drop instances with errors
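A minimal sketch (not part of the original slides) of these cleaning steps with pandas, on a small made-up table:

# Hypothetical sketch: typical data-cleaning steps with pandas
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "experience": [5, 10, 7, np.nan, 300],        # 300 looks like an error/outlier
    "salary_k":   [38, 64, np.nan, 44, 40],
    "notes":      [None, None, None, "relocated", None],   # mostly missing
})

df = df.drop(columns=["notes"])                   # drop a column with many missing values
df = df[df["experience"].between(0, 60)]          # remove an obvious outlier/error
df["salary_k"] = df["salary_k"].fillna(df["salary_k"].median())   # fill in a missing value
df = df.dropna()                                  # drop any remaining bad instances
print(df)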
Irrelevant features
▪ Selected features must be relevant to the task at hand. Having
irrelevant features in your data can decrease the accuracy of
the models.
• For example, area or perimeter length are irrelevant features for classifying shapes such as circles and rectangles.

▪ Feature engineering is the process of coming up with a good set of features to train on (see the sketch below). It involves:
  • Feature extraction – using tools to extract features from the samples (e.g., extracting color histograms from images)
  • Feature selection – selecting the most useful features to train on among the existing features of the samples

Deep learning learns features automatically, but requires lots of training data.
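A minimal sketch (not part of the original slides) of feature selection with scikit-learn's SelectKBest, which keeps the features most related to the label and drops the irrelevant ones:

# Hypothetical sketch: keeping only the most useful features
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 10 features, but only 3 of them are actually informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))
X_reduced = selector.transform(X)          # training data with irrelevant features removed
print("reduced shape:", X_reduced.shape)   # (300, 3)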
Underfitting & Overfitting
▪ Underfitting (high bias) may happen when our model is over-simplified
or not expressive enough (both training error and test error are high).

▪ Overfitting (high variance) may happen when our model is too complex
and fits too specifically to the training set, but it does not generalize
well to new data (low training error but high testing error).

[Figure] Three fits to the same data (green = ground-truth model, unknown; red = fitted model):
• Just right: 𝑦 = 𝜃0 + 𝜃1𝑥 + 𝜃2𝑥² + 𝜃3𝑥³
• Underfit / high bias: 𝑦 = 𝜃0 + 𝜃1𝑥
• Overfit / high variance: 𝑦 = 𝜃0 + 𝜃1𝑥 + 𝜃2𝑥² + … + 𝜃9𝑥⁹
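A minimal sketch (not part of the original slides) of the same idea on made-up data: fitting polynomials of degree 1, 3 and 9 and comparing training and test error:

# Hypothetical sketch: underfitting vs. a good fit vs. overfitting
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=2.0, size=60)   # cubic + noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 9):           # underfit, just right, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print("degree", degree,
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 2),
          "test MSE:",  round(mean_squared_error(y_test, model.predict(X_test)), 2))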
Hyperparameter Tuning

▪ A hyperparameter is a parameter whose value is set before the learning process begins and which is used to control the learning process.
  • For example, the train-test split percentage is a hyperparameter
▪ Different models have different sets of hyperparameters. For example:
  • k-NN: n_neighbors, distance metric (Manhattan or Euclidean)
  • Neural Networks: α (learning rate), max_iter (maximum number of iterations)
  • SVM: kernel (linear, rbf), C (penalty parameter)
▪ In machine learning, hyperparameter tuning is the process of choosing a set of optimal hyperparameters for a learning algorithm (see the sketch below).
▪ We need to balance between fitting the data perfectly and keeping the model simple, so that it generalizes well (avoiding both underfitting and overfitting).
▪ Hyperparameter tuning is an important step in building a machine learning system.
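A minimal sketch (not part of the original slides) of hyperparameter tuning for an SVM with a cross-validated grid search over the hyperparameters named above:

# Hypothetical sketch: tuning kernel and C with GridSearchCV
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)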
The Machine Learning Framework
▪ Divided into two phases: a training phase and a testing phase

[Flowchart]
Raw data → data preprocessing (cleaning, normalization, dimension reduction, etc.) → split into training set, validation set and test set.
Training phase: the learning algorithm is trained on the training set; hyperparameter tuning and validation on the validation set (tune → retrain) produce the final model.
Testing phase: the final model is tested on the test set and used to predict.
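A minimal sketch (not part of the original slides) of this two-phase framework: preprocessing, a train/validation/test split, hyperparameter tuning on the validation set, and a final test:

# Hypothetical sketch: train/validation/test workflow with preprocessing
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Split the raw data: 60% training, 20% validation, 20% testing
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Training phase: preprocess + train, tuning n_neighbors on the validation set
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Testing phase: retrain the final model and evaluate it on the untouched test set
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_train, y_train)
print("best k:", best_k, " test accuracy:", final_model.score(X_test, y_test))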
Next:

The Regression Pipeline
