Chapter 7 Learning
CHAPTER 7
Supervised & Unsupervised Learning
CSC3005 – INTRODUCTION TO DATA SCIENCE
MACHINE LEARNING
“A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E.” (Tom Mitchell, 1997,
Carnegie Mellon University)
Experience * Task = Performance
Imagine that you want to build a machine that can recognize different objects, such as telling oranges apart from apples.
You need to train the machine by providing examples of apples and oranges, so that the machine can experience, recognize and learn the differences between the fruits.
Figure: a trained classifier (e.g. Neural Network, Support Vector Machine) assigns the label "apple" to an unseen fruit; accuracy shown: 100%.
Supervised learning
The training data you feed to the algorithm includes the desired solutions, called labels.
Example of classification: a spam filter, such as SpamAssassin.
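The idea of learning from labelled examples can be sketched on the fruit example. This is only an illustration, assuming two hypothetical features (weight in grams and colour hue in degrees) and a 1-nearest-neighbour rule, which is one of many possible classifiers; all data values are invented.

```python
import math

# Hypothetical labelled training data: (weight in grams, hue in degrees) -> label.
# The labels are the "desired solutions" supplied with each example.
TRAINING_DATA = [
    ((72.0, 5.0), "apple"),
    ((76.0, 8.0), "apple"),
    ((70.0, 2.0), "apple"),
    ((90.0, 30.0), "orange"),
    ((95.0, 28.0), "orange"),
    ((85.0, 33.0), "orange"),
]


def predict(features, training_data=TRAINING_DATA):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]


def accuracy(examples, training_data=TRAINING_DATA):
    """Fraction of labelled examples that the classifier gets right."""
    correct = sum(predict(x, training_data) == y for x, y in examples)
    return correct / len(examples)


# Unseen fruits with known labels, used to measure performance.
unseen = [((74.0, 6.0), "apple"), ((92.0, 31.0), "orange")]
print(predict((74.0, 6.0)), accuracy(unseen))
```

Here the experience is the labelled training data, the task is classification, and the performance measure is accuracy.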
Figure (unsupervised learning): a bunch of fruits without labels → feature vector → learning algorithm → pattern → learned model.
Clustering: K-Means, K-Medoid
Association Rules: Apriori (support)
If color=orange and weight > 80 then orange
Else If color=red and weight <79 then apple
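The two rules above can be written as a small function. This is a sketch only: returning `None` when neither rule matches is an assumption, since the slide does not say what happens in that case.

```python
def classify_fruit(color, weight):
    """Apply the two rules from the slide; return None when neither matches."""
    if color == "orange" and weight > 80:
        return "orange"
    elif color == "red" and weight < 79:
        return "apple"
    # Assumption: fruits that match no rule are left unclassified.
    return None


print(classify_fruit("orange", 95))
print(classify_fruit("red", 70))
```

Note that the rules leave gaps: a red fruit weighing exactly 79, for example, matches neither rule.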
Unsupervised Learning
• Supervised
• Unsupervised
• Semi-supervised
• Self-supervised
• Reinforcement
• Online learning
• Batch learning
• Instance-based learning
• Model-based learning
Steps in Machine Learning Development
1. Data Collection
2. Data Preparation and cleaning
3. Choose a model based on the required task
4. Train the model on training set by executing several machine learning algorithms
5. Iteratively test the model on validation set and tune the model’s parameter
6. Evaluate the model on test set
Python Libraries for Machine Learning
# linear algebra
import numpy as np
# data processing
import pandas as pd
# data visualization
import seaborn as sns
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import style
# Algorithms
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
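Steps 4 to 6 of the development process (train, tune on a validation set, evaluate on a test set) can be sketched without any external library. The example below is an illustration under stated assumptions: the fruit weights are synthetic, the 60/20/20 split is an arbitrary choice, and a simple k-nearest-neighbours classifier stands in for whichever model is chosen in step 3.

```python
import random
from collections import Counter


def make_dataset(n_per_class=30, seed=0):
    """Generate hypothetical fruit weights in grams (apples lighter, oranges heavier)."""
    rng = random.Random(seed)
    data = [(rng.gauss(70, 6), "apple") for _ in range(n_per_class)]
    data += [(rng.gauss(95, 6), "orange") for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def split(data, train_frac=0.6, val_frac=0.2):
    """Split shuffled data into training, validation and test sets."""
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


def knn_predict(train, x, k):
    """Majority vote among the k training examples whose weight is closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, examples, k):
    """Fraction of `examples` classified correctly using `train` as the model."""
    return sum(knn_predict(train, x, k) == y for x, y in examples) / len(examples)


data = make_dataset()
train, val, test = split(data)

# Step 5: tune the parameter k on the validation set only.
candidate_ks = [1, 3, 5, 7]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# Step 6: report performance once, on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Keeping the test set out of the tuning loop is the point of the three-way split: the test accuracy then estimates performance on data the model has never influenced.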
Machine Learning Algorithms
Dataset Preparation
Figure: the original dataset split into folds (1st-fold, 2nd-fold, 3rd-fold).
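Assuming the folds refer to k-fold cross-validation, the split can be sketched as follows. The function name `k_fold_indices` is hypothetical; each fold is used once as the test portion while the remaining folds form the training portion.

```python
def k_fold_indices(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=7, n_folds=3))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

In practice the dataset is usually shuffled before splitting; that step is omitted here to keep the output easy to follow.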
Python Libraries for Machine Learning
# importing data modeling libraries
from sklearn import linear_model
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC
Decision Tree
https://www.saedsayad.com/decision_tree.htm
• Powerful and popular tools for classification and prediction.
• Classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
• Trees represent rules.
• There could be more than one tree that fits the same data!
Algorithms used in Decision Tree:
ID3 → (Iterative Dichotomiser 3)
C4.5 → (successor of ID3)
CART → (Classification And Regression Tree)
CHAID → (Chi-square automatic interaction detection; performs multi-level splits when computing classification trees)
MARS → (multivariate adaptive regression splines)
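A short sketch of how a tree classifies an example and how it encodes rules. The tree below is hand-written and hypothetical (it is not learned by ID3, C4.5 or any other algorithm named here); the nested-dictionary layout is just one convenient representation.

```python
# Hypothetical tree: internal nodes test one feature, leaves hold a label.
TREE = {
    "feature": "color",
    "branches": {
        "orange": {"label": "orange"},
        "red": {"label": "apple"},
        "green": {
            "feature": "size",
            "branches": {
                "small": {"label": "lime"},
                "large": {"label": "apple"},
            },
        },
    },
}


def classify(node, example):
    """Start at the root and follow matching branches until a leaf is reached."""
    while "label" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    return node["label"]


def extract_rules(node, conditions=()):
    """List every root-to-leaf path as an IF ... THEN rule."""
    if "label" in node:
        premise = " and ".join(f"{f}={v}" for f, v in conditions) or "true"
        return [f"IF {premise} THEN {node['label']}"]
    rules = []
    for value, child in node["branches"].items():
        rules += extract_rules(child, conditions + ((node["feature"], value),))
    return rules


print(classify(TREE, {"color": "green", "size": "small"}))
for rule in extract_rules(TREE):
    print(rule)
```

Each leaf corresponds to exactly one rule, which is why a tree with four leaves yields four rules.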
Unsupervised Learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
Association Rules
• Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.
• It is intended to identify strong rules discovered in databases using some measures of interestingness.
• Examples:
  • Market basket analysis, stock analysis, web log mining, medical diagnosis, customer market analysis, bioinformatics
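The slides do not say which measures of interestingness are meant. Support, confidence and lift are three commonly used ones, so the sketch below assumes those; the transactions are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical market-basket data.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(TRANSACTIONS, {"bread", "milk"}))
print(confidence(TRANSACTIONS, {"bread"}, {"milk"}))
print(lift(TRANSACTIONS, {"bread"}, {"milk"}))
```

A lift above 1 suggests the two sides of the rule occur together more often than independence would predict; a lift below 1, as for bread → milk in this toy data, suggests the opposite.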
Clustering
10 clustering algorithms with Python: https://machinelearningmastery.com/clustering-algorithms-with-python/
K-Means Clustering
K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.
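The assign-then-update loop behind K-Means can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance, random initial centroids drawn from the data, and invented 2-D points; library implementations add refinements such as smarter initialisation.

```python
import math
import random


def k_means(points, k, n_iterations=100, seed=0):
    """Return (centroids, assignments) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    assignments = None
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Hypothetical data: two well-separated groups of points.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(POINTS, k=2)
print(centroids, labels)
```

Which cluster gets index 0 depends on the random initialisation, so only the grouping of the points is meaningful, not the label numbers.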
Association rule learning
Goal: to dig into large amounts of data and discover interesting relations between attributes.
Example: supermarket owners. Running an association rule algorithm on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
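The supermarket example can be sketched with a level-wise, Apriori-style search for frequent itemsets. The sales log and the minimum support of 0.3 are invented for illustration, and the code is a simplified sketch rather than a full Apriori implementation.

```python
from itertools import combinations

# Hypothetical sales log: one set of purchased items per transaction.
SALES_LOG = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "beer"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "beer"},
    {"steak", "beer"},
    {"milk", "bread"},
]


def frequent_itemsets(transactions, min_support):
    """Map each itemset with support >= min_support to its support value."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates that are one item larger ...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ... and prune candidates that have an infrequent subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


frequent = frequent_itemsets(SALES_LOG, min_support=0.3)
premise = frozenset({"barbecue sauce", "potato chips"})
rule_items = premise | {"steak"}
rule_confidence = frequent[rule_items] / frequent[premise]
print(f"{{barbecue sauce, potato chips}} -> steak: confidence {rule_confidence:.2f}")
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, which keeps the number of candidates small.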
Clustering
• K-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization
Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• Locally-Linear Embedding (LLE)
• t-Distributed Stochastic Neighbour Embedding (t-SNE)
Association rule learning
• Apriori
• Eclat