
UCCD2063

Artificial Intelligence Techniques

Unit 2:
The Fundamentals of Machine Learning
Outline
• Machine Learning
  • Introduction
  • Types of machine learning
  • Challenges of Machine Learning
  • The Machine Learning Framework

Reference: Géron, Chapter 1
Problems with Traditional Programming
Traditional programming paradigm:
[Flowchart] Study the problem → Write rules → Evaluate → Deploy, with an "Analyze errors" loop back to studying the problem
▪ Problems with the traditional programming paradigm:
  • Complexity – too many rules; it is very hard to cover all aspects of the problem, and different problems require different rules
  • Static – cannot adapt to new input; new rules must keep being written, making the program very hard to maintain

The Machine Learning Framework
▪ Instead of handcrafting rules, ML learns a model from training samples
▪ The same learning algorithm can be applied to different problems

[Flowchart] Study the problem → Train ML algorithm → ML model → Evaluate solution → Deploy, with training samples feeding the training algorithm and an "Analyze errors" loop back to studying the problem
What is Machine Learning?

“Machine Learning: Field of study that gives computers the ability to learn (from data) without being explicitly programmed.”
[Arthur Samuel (1959)]
Why use Machine Learning?
▪ Some problems are too complex to solve by using rules
• An example: image classification

[Figure] Input image → complex, specific program → Output: “car”

  if this local region looks like a side mirror, and
  if this local region looks like a wheel,
  then classify the image as a car

This will work if given the same image again but, given new images, the algorithm is bound to fail.
Why use machine learning?

▪ Good for problems that evolve with time


• Machine learning can automatically rebuild the model when necessary
• Example: a spam classifier learns new spam words when they become unusually frequent in spam flagged by users

[Flowchart] Study the problem → Train ML algorithm → Evaluate solution → Deploy; update data flows back into the training samples so the deployed model can be updated, with an "Analyze errors" loop back to studying the problem
Why use machine learning?

▪ Help humans learn


• We can inspect ML models to see what they have learned
  − may reveal unsuspected correlations or new trends
  − helps us understand the problem better
  − Example: we can examine the list of words (or combinations of words) that the ML model identifies as the best predictors of spam

[Flowchart] Lots of training samples → Study the problem → Train ML algorithm → Solution → Inspect the solution → Understand the problem better → Iterate if necessary
Categories of Machine Learning Systems

1. Amount of supervision:
   • supervised learning
   • unsupervised learning
   • semi-supervised learning
   • reinforcement learning

2. Comparing samples vs. building predictive models:
   • instance-based learning
   • model-based learning
Supervised Learning

▪ In supervised learning, the algorithm is given example input-output pairs and learns a function that maps from input to output
▪ Input: the set of features used to describe the samples
▪ Output: the attribute we are interested in predicting
▪ Example: Fruit classification
Sample     length   width   weight   Label (output)
fruit 1    165      38      172      Banana
fruit 2    218      39      230      Banana
fruit 3    76       80      145      Orange
fruit 4    145      35      150      Banana
…          …        …       …        …
fruit m    …        …       …        …

(The features length, width and weight are the input; the model learns a function that maps them to the output label.)
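A minimal sketch (not part of the original slides) of how this fruit classifier could be trained with scikit-learn, assuming the table above is typed in by hand and a decision tree is chosen as the learned function:

# Hypothetical sketch: fruit classification with scikit-learn (assumed installed)
from sklearn.tree import DecisionTreeClassifier

# Input features: [length, width, weight], taken from the table above
X = [[165, 38, 172],
     [218, 39, 230],
     [ 76, 80, 145],
     [145, 35, 150]]
y = ["Banana", "Banana", "Orange", "Banana"]   # output labels

model = DecisionTreeClassifier()   # the "function" to be learned
model.fit(X, y)                    # learn the mapping from input to output

print(model.predict([[160, 40, 180]]))   # predict the label of a new fruit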
Supervised Learning Tasks

Two main supervised learning tasks:

• Classification – predicts a discrete-valued output (e.g., present / not present).
  Example: object detection – does the image contain a car? Yes / No.
• Regression – predicts a continuous-valued output (e.g., house price).
  Example: housing price prediction – price (RM ×1000) as a function of size.

[Figure] Left: images labelled Yes/No for "contains a car". Right: scatter plot of housing price (RM ×1000, 0-400) against size (0-2500) with a fitted curve.
Supervised Learning Algorithms

▪ Algorithms for classification:


• k-Nearest neighbour (k-NN) – instance-based
• Logistic Regression
• Decision Tree and Random Forests
• Support Vector Machine (SVM)
• Neural Networks (NN)
• ...

▪ Algorithms for regression:


• k-NN Regressor – instance-based
• Linear Regression
• Decision Tree and Random Forests
• SVM Regressor
• NN
• Non-linear Regression ...
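All of the algorithms listed above follow the same basic workflow in libraries such as scikit-learn. A minimal sketch (not part of the original slides), on made-up synthetic data, of how classifiers and regressors share the same fit/score interface:

# Hypothetical sketch: classifiers and regressors share the fit/score interface
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.svm import SVC

Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

for clf in (LogisticRegression(max_iter=1000), SVC()):
    clf.fit(Xc, yc)                               # classification: discrete labels
    print(type(clf).__name__, clf.score(Xc, yc))  # training accuracy

reg = LinearRegression().fit(Xr, yr)              # regression: continuous output
print("LinearRegression R^2:", reg.score(Xr, yr))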
Supervised Learning Applications

Digit classification
▪ Input: images / pixel grids
▪ Output: a digit 0-9
▪ Features:
  • Intensity values of pixels
  • Histogram of gradients
  • Shape patterns: NumComponents, AspectRatio, NumLoops
  • …
[Figure] Example digit images passed through feature extraction, with outputs 0, 1, 2, 1, ??

Spam mail classification
▪ Input: an email
▪ Output: spam or non-spam
▪ Features:
  • Words: FREE!, Earn, Call now, …
  • Text patterns: $$$, ALL CAPS
  • Non-text: SenderInContacts
  • …
[Figure] Example inputs: a personal email ("Dear Andy, How are you doing, buddy? Hopefully you are adapting well to your new school. All of us miss you dearly here. We miss your silly jokes.") vs. a spam message ("Hello, I have a special offer for you… WANT TO LOSE WEIGHT? The most powerful weight loss is now available without prescription.")
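A minimal sketch (not part of the original slides) of digit classification, using scikit-learn's built-in 8×8 digits dataset so that pixel intensity values serve as the features:

# Hypothetical sketch: digit classification with pixel intensities as features
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                      # inputs: 8x8 images, outputs: digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))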
Supervised Learning Applications

Predicting the house price in the Boston area (Regression)


▪ Input: numerical and categorical data
▪ Output: house price (in thousands of USD)
▪ Features:
• CRIM: per capita crime rate by town
• RM: average number of rooms per dwelling
• CHAS: Charles River dummy variable (= 1 if near river; 0 otherwise) (categorical)
• RAD: index of accessibility to radial highways
• TAX: full-value property-tax rate per $10,000
• INDUS: proportion of non-retail business acres per town
• NOX: nitric oxides concentration (parts per 10 million)
• …

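A minimal sketch (not part of the original slides) of this kind of house-price regression. The Boston dataset has been removed from recent scikit-learn releases, so scikit-learn's California housing data (downloaded on first use) is used here as a stand-in; the workflow is the same:

# Hypothetical sketch: house-price regression on tabular numerical features
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X, y = fetch_california_housing(return_X_y=True)   # features and price target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))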
Unsupervised Learning

▪ No labels are provided for the training samples
▪ Discovers the underlying structure, relationships or patterns based only on the features of the training samples

Sample     length   width   weight   (no label)
fruit 1    165      38      172
fruit 2    218      39      230
fruit 3    76       80      145
fruit 4    145      35      150
…          …        …       …
fruit m    …        …       …

[Figure] Supervised learning: positive vs. negative samples separated by a decision boundary in (x1, x2) space. Unsupervised learning: unlabelled samples grouped into a cluster in the same space.
Unsupervised Learning Task: Clustering

▪ Detect groups of similar samples
▪ Example: detecting groups of visitors who visit your blog
  • Example analysis: 40% of visitors love comic books and read in the evening; 20% are young sci-fi lovers who visit before school; etc.
  • How does it help? You can target your posts to each group.

[Figure] Scatter plot of visitors by age (young → old) and time of day (day → night), with the discovered groups circled.
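A minimal sketch (not part of the original slides) of such a clustering, assuming each visitor is described by two made-up features, age and visiting hour:

# Hypothetical sketch: clustering blog visitors with k-means
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups: young morning visitors and older evening visitors
visitors = np.vstack([
    rng.normal(loc=[15, 7],  scale=[2, 1], size=(50, 2)),   # (age, hour of day)
    rng.normal(loc=[35, 21], scale=[5, 2], size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(visitors)
print(kmeans.cluster_centers_)   # approximate centres of the two visitor groups
print(kmeans.labels_[:10])       # cluster assignment of the first 10 visitors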
Unsupervised Learning Task: Anomaly detection
▪ Identify items, events or observations which do not conform to an expected pattern or to the other items in a dataset
▪ Anomalies are also referred to as outliers, novelties or noise

[Figure] Scatter plot of normal samples with a few points far from the main group marked as outliers.

Applications: bank fraud, medical problems, errors in text.
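A minimal sketch (not part of the original slides) of anomaly detection using an Isolation Forest, one of several possible detectors, on synthetic 2-D data:

# Hypothetical sketch: flagging outliers with an Isolation Forest
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # the expected pattern
outliers = np.array([[6.0, 6.0], [-5.0, 7.0]])           # points far from the rest
X = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
pred = detector.predict(X)        # +1 = normal, -1 = anomaly
print(np.where(pred == -1)[0])    # indices flagged as outliers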


Unsupervised Learning Task: Association Rule Mining

▪ Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
  • Example: {bread, butter} → {milk} – customers who buy bread and butter often also buy milk.
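A minimal sketch (not part of the original slides) of association rule mining with the Apriori algorithm. It assumes the third-party mlxtend library is installed (pip install mlxtend), and the transactions are made up:

# Hypothetical sketch: Apriori with the mlxtend library (assumed installed)
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "butter", "beer"],
]

# One-hot encode the transactions, then mine frequent itemsets and rules
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])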
Unsupervised Learning Tasks and Algorithms

▪ Clustering
• k-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization

▪ Association rule learning


• Apriori
• Eclat

Semi-supervised Learning

▪ Partially labeled training data; typically there is more unlabeled data than labeled data
▪ Most semi-supervised algorithms are combinations of unsupervised and supervised algorithms

Sample     length   width   weight   Label
fruit 1    165      38      172      Banana
fruit 2    218      39      230      ?
fruit 3    76       80      145      Orange
fruit 4    145      35      150      ?
…          …        …       …        …
fruit m    …        …       …        …
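A minimal sketch (not part of the original slides) of one such combination, scikit-learn's self-training wrapper around a supervised classifier, where unlabeled samples are marked with -1:

# Hypothetical sketch: semi-supervised learning via self-training
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
hidden = rng.choice(len(y), size=250, replace=False)
y_partial[hidden] = -1                    # hide most labels (unlabeled samples)

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                   # uses both labeled and unlabeled data
print("accuracy against the true labels:", model.score(X, y))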
Reinforcement Learning

▪ No training set is provided
▪ Learns based on feedback from the environment:
  • the effect of the agent's actions on the environment (its state)
  • the rewards of taking a particular action
▪ Learns by itself the best policy to get the most reward over time

[Figure] Agent-environment loop: the agent acts according to its policy, the environment changes state and returns a reward, and the agent updates its policy.

Applications: games (chess, Go, video games), robotics, traffic control, trading, ...


Reinforcement Learning

The agent:
1. Observes the environment
2. Selects an action using its policy
3. Performs the action
4. Gets a reward or penalty
5. Updates the policy (learning step)
6. Iterates until an optimal policy is found
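A minimal sketch (not part of the original slides) of this loop as tabular Q-learning on a made-up toy environment: a 1-D corridor of 5 states where the agent moves left or right and is rewarded only for reaching the last state:

# Hypothetical sketch: tabular Q-learning on a tiny corridor environment
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # the policy is derived from this Q-table
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # observe the state and select an action (epsilon-greedy policy)
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)  # perform action
        r = 1.0 if s_next == n_states - 1 else 0.0                      # reward or penalty
        # update the policy (learning step)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q[:-1], axis=1))   # learned policy for states 0-3: prefers "right" (1)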
Instance-based Learning
▪ Lazy learner: simply remembers all training samples
▪ Predicts the label of new samples based on their similarity to the training samples
▪ Needs a measure of similarity between two samples

▪ Example: spam email detection
  • Remember all spam emails and flag new emails that are similar to them
  • Similarity measure: count the number of words they have in common

No training time, but storage is required for all samples and prediction is slow.
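A minimal sketch (not part of the original slides) of instance-based learning with k-nearest neighbours, reusing the fruit measurements from the earlier table:

# Hypothetical sketch: k-nearest neighbours as an instance-based learner
from sklearn.neighbors import KNeighborsClassifier

X = [[165, 38, 172], [218, 39, 230], [76, 80, 145], [145, 35, 150]]
y = ["Banana", "Banana", "Orange", "Banana"]

# fit() merely stores the samples; predict() compares a new sample to the
# most similar (Euclidean-closest) stored sample(s)
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([[80, 75, 140]]))   # closest stored sample is fruit 3 -> ['Orange']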
Model-based Learning

▪ Generalize from a set of examples by building a model of these examples
▪ Then, use the model to make predictions on unseen data

[Figure] Scatter plot of salmon (•) and sea bass (x) in a 2-D space of weight vs. width, separated by a straight line (f = pos on one side, f = neg on the other).

▪ The model is a straight line (a linear model) that separates the two categories of fish in the 2-D space.
▪ It has the form:

  𝑓 = 𝜃0 + 𝜃1 × weight + 𝜃2 × width

  where 𝜃0, 𝜃1, 𝜃2 are the model parameters.

Once the model is trained, there is no need to store the training samples, and prediction is fast.
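A minimal sketch (not part of the original slides) of learning such a linear separator with logistic regression, using made-up (weight, width) measurements:

# Hypothetical sketch: a linear model f = θ0 + θ1*weight + θ2*width learned
# from labelled fish measurements
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.0, 14], [2.5, 16], [3.0, 15],     # salmon
              [5.0, 8],  [6.0, 9],  [7.0, 7]])     # sea bass
y = np.array(["salmon", "salmon", "salmon", "sea bass", "sea bass", "sea bass"])

model = LogisticRegression().fit(X, y)
print("θ0 =", model.intercept_, " (θ1, θ2) =", model.coef_)   # learned parameters
print(model.predict([[4.0, 12]]))   # classify a new, unseen fish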
A Simple Model-based Learning Example (1/6)

▪ Problem: we want to predict IT salary
▪ Step 1: Collect data
  • Consult domain experts / surveys / research to find the key factors (features) affecting IT salary:
    − Experience in years ✓ (the feature used in this example)
    − Job title
    − Size of organization
    − Gender
    − Industry sector
    − Geographic region
    − ...
  • Collect the data
    − From surveys / databases
    − Data processing for missing/incomplete data, normalization (see L03)

  Samples:
  X (years)   y (salary, k)
  5           38
  10          64
  7           36
  8           44
  0           na
  …           …
A Simple Model-based Learning Example (2/6)

▪ Step 2: Model selection


• Study the data and determine what model is suitable
− For this plot, select linear regression

[Figure] Scatter plot of salary (y) against years of experience (X), showing a roughly linear trend.
A Simple Model-based Learning Example (3/6)

▪ Step 3: Train model


• Separate data into training set (80%) and test set (20%)
− Train model on training set, validate model using test set

[Figure] The same scatter plot of salary (y) against years of experience (X), with the points split into a training set (80%) and a test set (20%).
A Simple Model-based Learning Example (4/6)

▪ Step 3: Train model (cont.)


• Train model with training set
− Use normal equation or gradient descent (see L05)
− Minimize sum-of-squared error (SSE) – training error

𝑦 = 𝜃0 + 𝜃1 𝑥

[Figure] The fitted line over the training points; the vertical gaps between the points and the line are the prediction errors.
A Simple Model-based Learning Example (5/6)

▪ Step 4: Validate model


• Validate the model using test set – test error

[Figure] The fitted line evaluated against the test-set points of salary (y) vs. years of experience (X) – the test error.
A Simple Model-based Learning Example (6/6)

▪ Step 5: Deploy model


• Use the trained model for prediction

𝑦 = 𝜃0 + 𝜃1 𝑥

[Figure] The trained line used to predict the salary (y) for a new value of X (years of experience).
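A minimal sketch (not part of the original slides) that runs the whole example end to end, using made-up data and scikit-learn in place of a hand-coded normal equation or gradient descent:

# Hypothetical sketch: collect data, split 80/20, train y = θ0 + θ1·x by
# least squares, validate on the test set, then deploy for prediction
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1: data (years of experience -> salary in k); the values are made up
X = np.array([[5], [10], [7], [8], [3], [12], [1], [6], [9], [4]])
y = np.array([38, 64, 36, 44, 30, 75, 25, 42, 58, 33])

# Step 3: 80/20 train/test split, then fit the linear model on the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Step 4: validate on the test set (test error)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))

# Step 5: deploy - predict the salary for 11 years of experience
print("θ0, θ1 =", model.intercept_, model.coef_)
print("predicted salary:", model.predict([[11]]))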
Challenges of Machine Learning

▪ Insufficient quantity of training data


▪ Non-representative training data
▪ Poor quality data
▪ Irrelevant features
▪ Underfitting & overfitting the training data

Insufficient training data

▪ Humans can learn from very few samples
▪ Machines still need a lot of samples to work properly: thousands for simple problems and millions for complex ones

[Figure] Learning curves from Banko & Brill (2001): as the amount of training data grows, different ML algorithms reach similar accuracy.

▪ Given enough data, the performance of different ML algorithms is similar
▪ Data matters more than algorithms
▪ Should we invest in data or in algorithms?
  It is not easy or cheap to get extra training data, so do not abandon algorithms yet.

M. Banko & E. Brill (2001), Scaling to Very Very Large Corpora for Natural Language Disambiguation.
A. Halevy, P. Norvig & F. Pereira (2009), The Unreasonable Effectiveness of Data.
Non-representative data

▪ Training data should be representative of the new cases that you want
to generalize to
▪ Consider what happens if we add 7 missing countries:

[Figure] The model fitted without the missing countries vs. the model fitted with them – the previous model does not generalize well!

▪ Causes of non-representative data:


• Sampling noise, especially if the sample is too small
• Sampling bias, if the sampling method is flawed
Poor-quality data

▪ Training data may contain:


• Errors
• Outliers
• Missing values
• Noise
▪ Poor-quality data makes it harder for the ML algorithm to detect the underlying pattern
▪ Data cleaning: most data scientists spend a significant amount of time cleaning the data, for example (see the sketch below):
  • Remove outliers or fix errors manually
  • Fill in missing values
  • Drop columns with many missing values
  • Drop instances with errors
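A minimal sketch (not part of the original slides) of these cleaning steps with pandas, on a small made-up table:

# Hypothetical sketch: typical data-cleaning steps with pandas
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "experience": [5, 10, 7, np.nan, 300],        # 300 looks like an error/outlier
    "salary_k":   [38, 64, np.nan, 44, 40],
    "notes":      [None, None, None, "relocated", None],   # mostly missing
})

df = df.drop(columns=["notes"])                   # drop a column with many missing values
df = df[df["experience"].between(0, 60)]          # remove an obvious outlier/error
df["salary_k"] = df["salary_k"].fillna(df["salary_k"].median())   # fill in a missing value
df = df.dropna()                                  # drop any remaining bad instances
print(df)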
Irrelevant features
▪ Selected features must be relevant to the task at hand. Having
irrelevant features in your data can decrease the accuracy of
the models.
• For example, area or perimeter length are irrelevant features for classifying shapes such as circles and rectangles.

▪ Feature engineering is the process of coming up with a good set of features to train on (see the sketch below). It involves:
  • Feature extraction – using tools to extract features from the samples (e.g., extracting color histograms from images)
  • Feature selection – selecting the most useful features to train on among the existing features of the samples

Deep learning learns features automatically, but requires lots of training data.
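A minimal sketch (not part of the original slides) of feature selection with scikit-learn's SelectKBest, which keeps the features most related to the label and drops the irrelevant ones:

# Hypothetical sketch: keeping only the most useful features
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 10 features, but only 3 of them are actually informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))
X_reduced = selector.transform(X)          # training data with irrelevant features removed
print("reduced shape:", X_reduced.shape)   # (300, 3)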
Underfitting & Overfitting
▪ Underfitting (high bias) may happen when our model is over-simplified
or not expressive enough (both training error and test error are high).

▪ Overfitting (high variance) may happen when our model is too complex
and fits too specifically to the training set, but it does not generalize
well to new data (low training error but high testing error).

[Figure] Three fits to the same data (green = ground-truth model, unknown; red = fitted model):
• Just right: 𝑦 = 𝜃0 + 𝜃1𝑥 + 𝜃2𝑥² + 𝜃3𝑥³
• Underfit / high bias: 𝑦 = 𝜃0 + 𝜃1𝑥
• Overfit / high variance: 𝑦 = 𝜃0 + 𝜃1𝑥 + 𝜃2𝑥² + … + 𝜃9𝑥⁹
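A minimal sketch (not part of the original slides) of the same idea on made-up data: fitting polynomials of degree 1, 3 and 9 and comparing training and test error:

# Hypothetical sketch: underfitting vs. a good fit vs. overfitting
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=2.0, size=60)   # cubic + noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 9):           # underfit, just right, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print("degree", degree,
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 2),
          "test MSE:",  round(mean_squared_error(y_test, model.predict(X_test)), 2))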
Hyperparameter Tuning

▪ A hyperparameter is a parameter whose value is set before the learning process begins and which is used to control the learning process.
  • For example, the train-test split percentage is a hyperparameter
▪ Different models have different sets of hyperparameters. For example:
  • k-NN: n_neighbors, distance metric (Manhattan or Euclidean)
  • Neural Networks: α (learning rate), max_iter (maximum number of iterations)
  • SVM: kernel (linear, rbf), C (penalty parameter)
▪ In machine learning, hyperparameter tuning is the process of choosing a set of optimal hyperparameters for a learning algorithm (see the sketch below).
▪ We need to balance between fitting the data perfectly and keeping the model simple, so that it generalizes well (avoiding both underfitting and overfitting).
▪ Hyperparameter tuning is an important step in building a machine learning system.
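A minimal sketch (not part of the original slides) of hyperparameter tuning for an SVM with a cross-validated grid search over the hyperparameters named above:

# Hypothetical sketch: tuning kernel and C with GridSearchCV
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)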
The Machine Learning Framework
▪ Divided into two phases: a training phase and a testing phase

[Flowchart]
Raw data → data preprocessing (cleaning, normalization, dimension reduction, etc.) → split into training set, validation set and test set.
Training phase: the learning algorithm is trained on the training set; hyperparameter tuning and validation on the validation set (tune → retrain) produce the final model.
Testing phase: the final model is tested on the test set and used to predict.
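A minimal sketch (not part of the original slides) of this two-phase framework: preprocessing, a train/validation/test split, hyperparameter tuning on the validation set, and a final test:

# Hypothetical sketch: train/validation/test workflow with preprocessing
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Split the raw data: 60% training, 20% validation, 20% testing
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Training phase: preprocess + train, tuning n_neighbors on the validation set
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Testing phase: retrain the final model and evaluate it on the untouched test set
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_train, y_train)
print("best k:", best_k, " test accuracy:", final_model.score(X_test, y_test))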
Next:

The Regression Pipeline
