
CHAPTER 7
Supervised & Unsupervised
Learning
CSC3005 – INTRODUCTION TO DATA SCIENCE
MACHINE LEARNING
“A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E.” (Tom Mitchell, 1997,
Carnegie Mellon University)

Experience * Task = Performance

Machine learning is the study of computer programs that allow computer programs to automatically improve through experience.

Input data            Task                        Performance
Movie rating          Recommend movie             Movie is enjoyed
Shopping cart         Segment customers           Increased sales
Photo gallery         Classify image              Correctly labeled images
Traffic flow          Predict congestion          Congestion is balanced
Symptoms              Diagnose disease            Accurate diagnosis
Stock records         Predict inventory balance   Always ready stock
Mixed color objects   Cluster by colour           Grouped objects
Imagine that you want to build a machine that can recognize different objects, such as separating oranges from apples.
You need to train the machine by providing examples of apples and oranges, so that the machine can experience, recognize and learn the difference between the fruits.

You will be satisfied with the machine if it can categorize the fruits correctly.
The machine learns from the features of each piece of fruit, such as color, width, smell, texture and weight, to distinguish apples from oranges and come up with a model. Some features, such as color, are more important than others, such as weight. The model is then evaluated by testing how well it categorizes new objects. This approach is called supervised learning.

The machine may also separate one piece of fruit from another by comparing the fruits and finding their similarities or differences, eventually building clusters or groups of similar objects. This approach is called unsupervised learning.

The machine may learn about the fruits by guessing, receiving a reward or penalty, and adjusting its understanding until it achieves consistently correct predictions. This is the reinforcement learning approach.
[Figure: supervised learning workflow]
Experience: INPUT RAW DATA WITH LABELS ("These are oranges", "These are apples") as feature vectors with labels.
Train: SUPERVISED LEARNING ALGORITHM (Decision Tree, Naïve Bayes, Regression, Neural Network, Support Vector Machine) learns a pattern, the LEARNED MODEL.
Task: INPUT NEW DATA TO OBTAIN LABEL; the model is tested by classifying unlabeled fruit (?) as orange or apple.
Performance: ACCURACY (100% in the illustration).

Supervised learning

• Training data you feed to the algorithm includes the desired solutions, called LABELS
• Classification – Spam filter
  • SpamAssassin

[Figure: unsupervised learning workflow]
Experience: INPUT RAW DATA WITHOUT LABELS (a bunch of fruits) as feature vectors without labels.
Task: ANALYZE DATA TO OBTAIN PATTERN with an UNSUPERVISED LEARNING ALGORITHM (Clustering: K-Means, K-Medoid; Association Rules: Apriori), which yields the LEARNED MODEL:
• Clustering: Cluster1, Cluster2
• Association rules: If color=orange and weight > 80 then orange; Else If color=red and weight < 79 then apple
Performance: Silhouette Score for clustering; Support for association rules.

Unsupervised Learning

• Training data is unlabelled.
• System tries to learn without a teacher
TYPES OF MACHINE LEARNING

Supervised learning: "I can classify spam and non-spam mail by analyzing features such as its sender, subject and content"
Unsupervised learning: "I can group similar messages based on their content"
Reinforcement learning: "I can guess which mail is spam or not but I need to ask you whether I got it correct or not"

By: Nurfadhlina, FSKTM UPM

Types of Machine Learning systems



Criteria of Machine Learning

How are they supervised?

• Supervised
• Unsupervised
• Semi-supervised
• Self-supervised
• Reinforcement

How do they learn?

• Online learning
• Batch learning

How do they work?

• Instance-based learning
• Model-based learning
Steps in Machine Learning Development
1. Data collection
2. Data preparation and cleaning
3. Choose a model based on the required task
4. Train the model on the training set by executing several machine learning algorithms
5. Iteratively test the model on the validation set and tune the model's parameters
6. Evaluate the model on the test set

Python Libraries for Machine Learning
# linear algebra
import numpy as np
# data processing
import pandas as pd
# data visualization
import seaborn as sns
# IPython/Jupyter magic: only valid inside a notebook
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import style
# Machine learning algorithms
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
Dataset Preparation

[Figure: the original dataset is split into a training set and a test set; the training set is split again into a training set and a validation set. The training and validation sets are used for training, tuning and evaluation; the test set is used for performance estimation. A machine learning algorithm (classification, regression, clustering, association rules) produces the learned model.]

Overfitting is a modeling error which occurs when a function is too closely fit to a limited set of data points.

The more complicated the task, the more data is needed.
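As a rough illustration of overfitting, the sketch below compares a shallow decision tree with an unrestricted one on synthetic, deliberately noisy data. The dataset, the noise level (`flip_y=0.2`) and the two depths are assumptions chosen for illustration; they are not part of the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels (assumed values, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scores = {}
for depth in (2, None):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, "
          f"test={scores[depth][1]:.2f}")
```

A large gap between training and test accuracy for the unrestricted tree is the usual symptom of a model that fits its limited training points too closely.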
Dataset Preparation

Training Dataset
• Used to train an algorithm to understand how to apply concepts such as neural networks, to learn and produce results.
• Includes both input data and the expected output.
• Makes up the majority of the total data, around 60 %.

Validation Dataset
• Provides an unbiased evaluation of a model fit on the training dataset while tuning the model's hyperparameters (e.g. the number of hidden units (layers and layer widths) in a neural network).
• Can be used for regularization by early stopping (stopping training when the error on the validation dataset increases, as this is a sign of overfitting to the training dataset).

Testing Dataset
• Used to evaluate how well your algorithm was trained with the training data set.
• We can't use the training data set in the testing stage because the algorithm will already know in advance the expected output, which is not our goal.
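One possible way to obtain the three datasets is sketched below, assuming a 60/20/20 split of the Iris data bundled with scikit-learn and a k-nearest-neighbours model whose number of neighbours is the hyperparameter being tuned. The 20/20 proportions, the model and the candidate values are assumptions for illustration; only the 60 % training share comes from the notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
n = len(X)
n_train = n * 60 // 100   # 60% for training
n_val = n * 20 // 100     # 20% for validation, the rest for testing

# Split off the training set first, then divide the remainder.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=n_train, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=n_val, stratify=y_rest, random_state=0)

# Tune one hyperparameter (number of neighbours) on the validation set only.
best_k, best_val_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val_acc:
        best_k, best_val_acc = k, val_acc

# The test set is used once, after tuning, to estimate performance.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(f"best k={best_k}, validation={best_val_acc:.2f}, test={test_acc:.2f}")
```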
Cross Validation
•Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.

•The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.

•When a specific value for k is chosen, it may be used in place of k in the reference to the model, such as k=10 becoming 10-fold cross-validation.

•It generally results in a less biased or less optimistic estimate of the model skill than other methods, such as a simple train/test split.
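A minimal sketch of 10-fold cross-validation with scikit-learn's `cross_val_score` follows. The Iris data and the logistic regression model are assumptions picked only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=10 requests 10-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the spread of the fold scores gives a steadier picture than a single train/test split.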
Cross Validation
The general procedure is as follows:
● Shuffle the dataset randomly.
● Split the dataset into k groups
● For each unique group:
  ● Take the group as a hold out or test data set
  ● Take the remaining groups as a training data set
  ● Fit a model on the training set and evaluate it on the test set
  ● Retain the evaluation score and discard the model
● Summarize the skill of the model using the sample of model evaluation scores

For example:
Model1: Trained on Fold1 + Fold2, Tested on Fold3
Model2: Trained on Fold2 + Fold3, Tested on Fold1
Model3: Trained on Fold1 + Fold3, Tested on Fold2
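The procedure can be written out by hand, as in the sketch below, which uses scikit-learn's `KFold` with k=3. The Iris data and the decision tree are assumptions for illustration; any estimator with `fit` and `score` would do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# shuffle=True covers the "shuffle the dataset randomly" step.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

scores = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    model = DecisionTreeClassifier(random_state=0)  # a fresh model per fold
    model.fit(X[train_idx], y[train_idx])
    # Keep the score; the model itself is discarded on the next iteration.
    scores.append(model.score(X[test_idx], y[test_idx]))
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, "
          f"accuracy={scores[-1]:.3f}")

print(f"mean accuracy over {len(scores)} folds: {np.mean(scores):.3f}")
```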
Cross Validation Example (k=3)

k=3, so set three pairs of train and test data, each with the same ratio of training and test, but a different selection of instances.

[Figure: the three folds]
1st fold: Training set (n=15) | Test set (n=5)
2nd fold: Training set (n=5) | Test set (n=5) | Training set (n=10)
3rd fold: Test set (n=5) | Training set (n=10)
Python Libraries for Machine Learning

# importing data modeling libraries
from sklearn import linear_model
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier, ExtraTreesClassifier,
                              VotingClassifier)
from sklearn.model_selection import (GridSearchCV, cross_val_score, cross_val_predict,
                                     StratifiedKFold, learning_curve)
from sklearn.metrics import (confusion_matrix, accuracy_score)
Supervised Learning
• The goal of supervised learning is to develop a finely tuned predictor function h(x) (sometimes called the "hypothesis").
• "Learning" consists of using sophisticated mathematical algorithms to optimize this function so that, given input data x about a certain domain (e.g. weight or width), it will accurately predict some interesting value h(x) (say, the price of a fruit).
• In practice, x almost always represents multiple data points.
• The predictor h(x) is optimized through training: we measure the difference between the known, correct value y and our predicted value h(x), and adjust h to reduce that difference.
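To make the idea of optimizing h(x) concrete, here is a minimal sketch that assumes a linear predictor h(x) = theta0 + theta1 * x and a squared-error measure of the difference between y and h(x). The weights, prices and the linear form are hypothetical choices for illustration, not something the notes prescribe.

```python
import numpy as np

# Hypothetical training data: fruit weight in grams (x) and price (y).
x = np.array([60.0, 80.0, 100.0, 120.0, 140.0])
y = np.array([1.1, 1.3, 1.5, 1.8, 1.9])

def h(x, theta):
    """Linear predictor h(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def mean_squared_error(theta):
    """Average squared difference between the known y and the predicted h(x)."""
    return np.mean((y - h(x, theta)) ** 2)

# Least-squares fit: choose theta to minimise the squared error.
design = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(design, y, rcond=None)

print("theta:", theta)
print("error before training (theta = 0):", mean_squared_error(np.zeros(2)))
print("error after training:", mean_squared_error(theta))
```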
Supervised Learning: Classification vs. Regression

Classification
• predicts categorical class labels (discrete or nominal)
• classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data

Regression
• models continuous-valued functions, i.e., predicts unknown or missing values

Typical applications
• Credit/loan approval: if an applicant passes several criteria
• Medical diagnosis: if a tumor is cancerous or benign
• Fraud detection: if a transaction is fraudulent
• Web page categorization: which category it is
Classification

1. Given a collection of records (training set)
   • Each record contains a set of attributes, one of the attributes is the class.
2. Find a model (e.g., regression, Naïve Bayes, gradient boosting) for the class attribute as a function (f(x)) of the values of the other attributes.
3. Goal: previously unseen records should be assigned a class as accurately as possible.
   • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

X = Training Set (attributes: Weight, Width, Colour; class: Class). Find f(x).

Weight  Width  Colour  Class
120     82     Orange  Orange
62      84     Green   Apple
74      88     Red     Apple
123     75     Orange  Orange
134     80     Yellow  Orange
80      80     Red     Apple
140     85     Orange  Orange
76      80     Red     Apple

Test Set: f(attributes) = ? (Class)

Weight  Width  Colour  Class
110     80     Orange  ?
68      70     Green   ?
100     75     Orange  ?
70      80     Red     ?
80      80     Green   ?
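The fruit tables above can be fed to a classifier as in the sketch below. It assumes a scikit-learn decision tree as the model f and one-hot encoding for the Colour attribute; both are illustrative choices, since the notes do not fix a particular model for this data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

train = pd.DataFrame({
    "Weight": [120, 62, 74, 123, 134, 80, 140, 76],
    "Width":  [82, 84, 88, 75, 80, 80, 85, 80],
    "Colour": ["Orange", "Green", "Red", "Orange", "Yellow", "Red", "Orange", "Red"],
    "Class":  ["Orange", "Apple", "Apple", "Orange", "Orange", "Apple", "Orange", "Apple"],
})
test = pd.DataFrame({
    "Weight": [110, 68, 100, 70, 80],
    "Width":  [80, 70, 75, 80, 80],
    "Colour": ["Orange", "Green", "Orange", "Red", "Green"],
})

# One-hot encode Colour, then align the test columns with the training columns
# (the test set has no Yellow fruit, so that column is filled with 0).
X_train = pd.get_dummies(train[["Weight", "Width", "Colour"]], dtype=int)
X_test = pd.get_dummies(test, dtype=int).reindex(columns=X_train.columns,
                                                 fill_value=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, train["Class"])
train_accuracy = model.score(X_train, train["Class"])
test["Predicted"] = model.predict(X_test)
print(test)
```

Because the test set has no known classes here, only the predictions can be inspected; measuring accuracy would need labeled test records.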
Classification

Classification can be broken down into two areas:
a) Binary classification, where we wish to group an outcome into one of two groups.
b) Multi-class classification, where we wish to group an outcome into one of multiple (more than two) groups.

Classification models:
• Decision Tree based Methods
• Rule-based Methods
• Memory based reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines

Recommended reading:
https://stackabuse.com/classification-in-python-with-scikit-learn-and-pandas/

Numerical variables need to be transformed to their categorical counterparts (using a binning technique) before constructing their frequency tables.

The other option we have is using the distribution of the numerical variable to have a good guess of the frequency. For example, one common practice is to assume normal distributions for numerical variables.
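Both ways of handling a numerical variable before building frequency tables (binning, or assuming a normal distribution) can be sketched with pandas. The weights come from the fruit training table in these notes; the bin edges (0, 100, 200) and the bin names are assumptions made for illustration.

```python
import pandas as pd

# Weights and classes from the fruit training table; bin edges are assumed.
fruits = pd.DataFrame({
    "weight": [120, 62, 74, 123, 134, 80, 140, 76],
    "label":  ["Orange", "Apple", "Apple", "Orange",
               "Orange", "Apple", "Orange", "Apple"],
})

# Option 1: bin the numerical variable, then count bin versus class.
fruits["weight_bin"] = pd.cut(fruits["weight"], bins=[0, 100, 200],
                              labels=["light", "heavy"])
frequency = pd.crosstab(fruits["weight_bin"], fruits["label"])
print(frequency)

# Option 2: summarise each class by the mean and standard deviation
# of an assumed normal distribution.
stats = fruits.groupby("label")["weight"].agg(["mean", "std"])
print(stats)
```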
Decision Tree
Examples are sourced from https://www.saedsayad.com/decision_tree.htm

• Powerful and popular tools for classification and prediction.
• Classify instances or examples by starting at the root of the tree and moving through it until a leaf node.
• Trees represent rules.
• There could be more than one tree that fits the same data!

Algorithms used in Decision Tree:
ID3 → (Iterative Dichotomiser 3)
C4.5 → (successor of ID3)
CART → (Classification And Regression Tree)
CHAID → (Chi-square automatic interaction detection. Performs multi-level splits when computing classification trees)
MARS → (multivariate adaptive regression splines)
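To see how a tree represents rules, the sketch below fits a small scikit-learn decision tree and prints it with `export_text`. The Iris data and the depth limit of 2 are assumptions chosen to keep the printed rules short; scikit-learn's trees follow the CART approach.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text prints the tree as nested if/else rules, from root to leaves.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each path from the root to a leaf reads as one rule: a chain of threshold tests on features that ends in a predicted class.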
Unsupervised Learning

1. Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

2. Unsupervised machine learning can identify previously unknown patterns in data. The system is presented with unlabeled, uncategorized data and the system's algorithms act on the data without prior training.

3. Useful in exploratory analysis because it can automatically identify structure in data. It can be easier, faster and less costly to use than supervised learning as unsupervised learning does not require the manual work associated with labeling data that supervised learning requires.

4. Unsupervised learning can work with real-time data to identify patterns.
Clustering
•Customer segmentation or understanding different customer groups around which to build marketing or other business strategies.
•Examples:
  • Clustering DNA patterns to analyze evolutionary biology.
  • Recommender systems, which involve grouping together users with similar viewing patterns in order to recommend similar content.
  • Anomaly detection, including fraud detection or detecting defective mechanical parts (i.e., predictive maintenance).

Association Rules
•Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.
•It is intended to identify strong rules discovered in databases using some measures of interestingness.
•Examples:
  • Market basket analysis, stock analysis, web log mining, medical diagnosis, customer market analysis, bioinformatics
Clustering
10 clustering algorithms with Python: https://machinelearningmastery.com/clustering-algorithms-with-python/

Cluster: A collection of data objects
● similar (or related) to one another within the same group
● dissimilar (or unrelated) to the objects in other groups

Cluster analysis (or clustering, data segmentation, …)
● Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
● Cluster quality can be measured with the Davies-Bouldin index

Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples: supervised)

Typical applications
● As a stand-alone tool to get insight into data distribution
● As a preprocessing step for other algorithms
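As a sketch of how cluster quality might be measured, the code below runs scikit-learn's `KMeans` for several cluster counts and reports the Davies-Bouldin index (lower is better) alongside the silhouette score (higher is better). The Iris data and the range of k values are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = load_iris(return_X_y=True)  # labels are ignored: unsupervised setting

results = {}
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[k] = (davies_bouldin_score(X, labels), silhouette_score(X, labels))
    print(f"k={k}: Davies-Bouldin={results[k][0]:.3f}, "
          f"silhouette={results[k][1]:.3f}")
```

Comparing these scores across k is one common way to pick a number of clusters when no class labels are available.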
Tasks in Clustering Modeling

Feature selection
• Select info concerning the task of interest
• Minimal information redundancy

Proximity measure
• Similarity of two feature vectors

Clustering criterion
• Expressed via a cost function or some rules

Clustering algorithms
• Choice of algorithms (e.g., k-means, k-medoid)

Validation of the results
• Validation test (also, clustering tendency test)

Interpretation of the results
• Integration with applications
K-Means Clustering
K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.
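The assign-by-distance idea can be sketched in a few lines of NumPy. This is a simplified version of the standard iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points); the function name `k_means`, the random initialisation and the toy points are assumptions for illustration, not the implementation used in the linked notebook.

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Simplified k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)  # nearest centroid per point
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (hypothetical data).
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```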

Video (start at minute 12:27): https://www.youtube.com/watch?v=BSRsXi-gvms

This note is developed by copying from https://www.kaggle.com/khotijahs1/k-means-clustering-of-iris-dataset

Association Rule Learning

• Goal: to dig into large amounts of data and discover interesting relations between attributes.
• Example: Supermarket owners
  • Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak.
  • Thus, you may want to place these items close to one another.
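The barbecue sauce example can be quantified with the usual interestingness measures: support, confidence and lift. The sketch below computes them by hand for the rule {barbecue sauce, potato chips} → {steak}; the five transactions are a hypothetical sales log invented for illustration.

```python
# Hypothetical sales log: each transaction is the set of items bought together.
transactions = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "bread"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent = {"barbecue sauce", "potato chips"}
consequent = {"steak"}

rule_support = support(antecedent | consequent)  # how often all items co-occur
confidence = rule_support / support(antecedent)  # P(steak | sauce and chips)
lift = confidence / support(consequent)          # above 1: positive association

print(f"support={rule_support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds, rather than checking one rule at a time as done here.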

Popular unsupervised learning algorithms

• Clustering
  • K-Means
  • Hierarchical Cluster Analysis (HCA)
  • Expectation Maximization
• Visualization and dimensionality reduction
  • Principal Component Analysis (PCA)
  • Kernel PCA
  • Locally-Linear Embedding (LLE)
  • t-Distributed Stochastic Neighbour Embedding (t-SNE)
• Association rule learning
  • Apriori
  • Eclat
