
Chapter 2

Supervised Learning

HCMC, 10-2022
What is supervised learning?

Supervised learning is the type of ML in which machines are trained using well-labelled training data, and on the basis of that data, machines predict the output.
In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly.
The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.
How Does Supervised Learning Work?

In supervised learning, models are trained on a labelled dataset, where the model learns about each type of data.
Once the training process is completed, the model is tested on test data (labelled examples held out from training), and then it predicts the output.
Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and polygons. The first step is to train the model on each shape:
• If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
• If the given shape has three sides, then it will be labelled as a triangle.
• If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
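As a toy illustration of the labelling rules above, here is a minimal Python sketch; representing each shape as a list of side lengths is an assumption made purely for this example.

```python
def label_shape(sides):
    """Label a shape from its side lengths (hypothetical encoding)."""
    if len(sides) == 4 and len(set(sides)) == 1:
        return "square"      # four sides, all equal
    if len(sides) == 3:
        return "triangle"    # three sides
    if len(sides) == 6 and len(set(sides)) == 1:
        return "hexagon"     # six equal sides
    return "unknown"

print(label_shape([2, 2, 2, 2]))  # square
print(label_shape([3, 4, 5]))     # triangle
```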
Steps Involved in Supervised Learning:

1. First, determine the type of training dataset.
2. Collect/gather the labelled training data.
3. Split the dataset into a training set, a test set, and a validation set.
4. Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
5. Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
6. Execute the algorithm on the training dataset. Sometimes we need a validation set as a control for tuning, held out from the training data.
7. Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
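To make steps 3-7 concrete, here is a minimal sketch using scikit-learn; the synthetic dataset, the 60/20/20 split, and the choice of a decision tree are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 3: split into training, validation, and test sets (60/20/20).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Steps 5-6: pick an algorithm and execute it on the training set.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The validation set acts as the control while tuning; the test set gives step 7's evaluation.
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```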
Types of supervised ML Algorithms: supervised learning splits into Regression and Classification algorithms, covered in the following sections.

Advantages of Supervised learning:
• With the help of supervised learning, the model can predict the output on the basis of prior experience.
• In supervised learning, we can have an exact idea about the classes of objects.
• Supervised learning models help us solve various real-world problems such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:

• Supervised learning models are not suitable for handling very complex tasks.
• Supervised learning cannot predict the correct output if the test data differs from the training data.
• Training requires a lot of computation time.
• In supervised learning, we need enough knowledge about the classes of objects.
ML Algorithms – Regression

Regression algorithms are used when there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, as in weather forecasting, market-trend prediction, etc.
Below are some popular regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
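A minimal linear regression sketch with scikit-learn follows; the tiny one-feature dataset is made up and merely stands in for real continuous data such as weather records.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # input variable x
y = np.array([2.1, 4.0, 6.2, 7.9])          # continuous output variable y

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)        # learned slope and intercept
print(model.predict([[5.0]]))               # prediction for an unseen input
```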
Classification Algorithm in Machine Learning

The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data.
There are two types of classification:
• Binary Classifier: If the classification problem has only two possible outcomes, it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier. Examples: classification of types of crops, classification of types of music.
Types of ML Classification Algorithms:
Classification algorithms can be further divided into mainly two categories:
• Linear Models
  – Logistic Regression
  – Support Vector Machines
• Non-linear Models
  – K-Nearest Neighbours
  – Kernel SVM
  – Naïve Bayes
  – Decision Tree Classification
  – Random Forest Classification
Use cases of Classification Algorithms
Classification algorithms can be used in many places. Below are some popular use cases:
• Email spam detection
• Speech recognition
• Identification of cancer tumor cells
• Drug classification
• Biometric identification, etc.
Evaluating a Classification model

Once our model is complete, it is necessary to evaluate its performance, whether it is a classification or a regression model. For evaluating a classification model, we have the following options:
1. Loss Function (Cross-Entropy, RMSE, MAE, MSE, ...)
2. Confusion Matrix
3. AUC-ROC curve

Log Loss or Cross-Entropy Loss:
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#:~:text=Cross%2Dentropy%20loss%2C%20or%20log,diverges%20from%20the%20actual%20label.
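As a small sketch of the loss linked above, binary cross-entropy (log loss) can be computed by hand and checked against scikit-learn; the labels and predicted probabilities below are made up.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])  # predicted P(class = 1)

# Cross-entropy: -mean( y*log(p) + (1 - y)*log(1 - p) )
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(manual, log_loss(y_true, y_prob))  # the two values agree
```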
Evaluating a Classification model
Confusion matrix

A confusion matrix is a table used to visualize and summarize the performance of a classification algorithm. It is a useful machine learning method from which you can compute Recall, Precision, Accuracy, and the AUC-ROC curve. Its cells count:
• "true positives" for correctly predicted event values.
• "false positives" for incorrectly predicted event values.
• "true negatives" for correctly predicted no-event values.
• "false negatives" for incorrectly predicted no-event values.
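A minimal confusion-matrix sketch with scikit-learn; the two label vectors are hypothetical predictions from some classifier.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall:", recall_score(y_true, y_pred))        # TP / (TP + FN)
```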
Evaluating a Classification model
AUC-ROC curve

The Receiver Operating Characteristic (ROC) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the TPR against the FPR at various threshold values, essentially separating the 'signal' from the 'noise'. The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve.
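A minimal ROC/AUC sketch; the scores are hypothetical predicted probabilities for the positive class.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR vs FPR at each threshold
print("AUC:", roc_auc_score(y_true, y_score))
```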
K-Nearest Neighbors

K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. The algorithm assumes that similar data points can be found near each other. It therefore calculates the distance between data points, usually the Euclidean distance, and then assigns a category based on the most frequent category (classification) or the average (regression).
Advantages of the KNN algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
Disadvantages of the KNN algorithm:
• It always needs a value of K, which may be complex to determine.
• The computation cost is high, because the distance to every training sample must be calculated.
K-Nearest Neighbors

The working of K-NN can be explained with the following steps:
• Step 1: Select the number K of neighbors.
• Step 2: Calculate the Euclidean distance to the training points.
• Step 3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step 4: Among these K neighbors, count the number of data points in each category.
• Step 5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step 6: Our model is ready.
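The steps above map directly onto scikit-learn's KNeighborsClassifier, as in this minimal sketch; the tiny 2-D dataset and K = 3 are arbitrary choices for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 2], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

# Step 1: choose K; steps 2-5 (Euclidean distances, majority vote) run inside fit/predict.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict([[2, 1]]))  # its 3 nearest neighbors are all class 0
```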
Logistic Regression Classification

Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. However, instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1. The sigmoid function is the mathematical function used to map the predicted values to probabilities.
In logistic regression, we use the concept of a threshold value, which defines the cut-off between predicting 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
Logistic Regression is much like Linear Regression except in how it is used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.

Types of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into three types:
• Binomial: In binomial Logistic Regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial Logistic Regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal Logistic Regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
Logistic Regression

Logistic Regression Equation:
The Logistic Regression equation can be obtained from the Linear Regression equation. The mathematical steps to get the Logistic Regression equation are given below:
• We know the equation of a straight line can be written as:
  y = b0 + b1*x1 + b2*x2 + ... + bn*xn
• In Logistic Regression, y can be between 0 and 1 only, so let's divide the above equation by (1 - y):
  y / (1 - y), which is 0 for y = 0 and infinity for y = 1
• But we need a range between -infinity and +infinity; taking the logarithm of the equation, it becomes:
  log[y / (1 - y)] = b0 + b1*x1 + b2*x2 + ... + bn*xn
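A minimal numeric sketch of these equations: a made-up linear score is passed through the sigmoid, and taking the log-odds recovers the linear form.

```python
import numpy as np

b0, b1 = -1.0, 2.0          # hypothetical intercept and coefficient

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = 1.5
z = b0 + b1 * x             # linear part: b0 + b1*x1
p = sigmoid(z)              # probability, always between 0 and 1
print(p)
print(np.log(p / (1 - p)))  # log-odds: equals z again
```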
Support Vector Machine Algorithm

Support Vector Machine, or SVM, is one of the most popular Supervised Learning algorithms, used for Classification as well as Regression problems.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed SVM. Picture two different categories classified using a decision boundary, or hyperplane.

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points.

Support Vectors: The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data; if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

SVM kernel functions

A kernel is a function used in SVM to help solve problems. Kernels provide shortcuts that avoid complex calculations; the remarkable thing about a kernel is that it lets us work in higher dimensions while keeping the calculations cheap. Kernels are a way to solve non-linear problems with the help of linear classifiers; this is known as the kernel trick. The kernel function is passed as a parameter in SVM code, and it helps determine the shape of the hyperplane and decision boundary.
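A minimal sketch of choosing kernels via scikit-learn's SVC; the XOR-like toy dataset is an assumption for illustration. A linear kernel cannot classify all four points correctly, while the non-linear RBF kernel can.

```python
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: not linearly separable

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear:", linear.predict(X))  # misclassifies at least one point
print("rbf:   ", rbf.predict(X))     # the kernel trick separates the classes
```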
Linear SVM

The working of the SVM algorithm can be understood with an example. Suppose we have a dataset that has two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue. As this is a 2-D space, a straight line can easily separate the two classes; but there can be multiple lines that separate them.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. So to separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z, calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space becomes 3-D, and SVM can now divide the datasets into classes with a linear boundary. Since we are in 3-D space, that boundary looks like a plane parallel to the x-axis. Converting it back to 2-D space with z = 1, the boundary becomes a circle of radius 1.
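A minimal sketch of this lifting: points that only a circle can separate in 2-D become separable by a plane after adding z = x^2 + y^2. The synthetic inner-cluster/outer-ring data is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(0.0, 0.5, size=(50, 2))                 # class 0: near the origin
angles = rng.uniform(0.0, 2.0 * np.pi, 50)
outer = np.c_[3.0 * np.cos(angles), 3.0 * np.sin(angles)]  # class 1: a surrounding ring
X2d = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

z = (X2d ** 2).sum(axis=1, keepdims=True)  # the third dimension: z = x^2 + y^2
X3d = np.hstack([X2d, z])

clf = SVC(kernel="linear").fit(X3d, y)          # a flat plane suffices in 3-D
print("training accuracy:", clf.score(X3d, y))  # expected to be 1.0
```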
Naïve Bayes Classifier Algorithm

The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular examples of the Naïve Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the probability of a hypothesis with prior knowledge. It depends on conditional probability:
P(A|B) = P(B|A) * P(A) / P(B)
• P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
• P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
• P(B) is the Marginal probability: the probability of the evidence.
Types of Naïve Bayes Model: Gaussian, Multinomial, and Bernoulli.
Working of the Naïve Bayes Classifier:
The working of the Naïve Bayes classifier can be understood with the following example. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the weather dataset, as in the sketch below.
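The dataset table from the original slide is not reproduced here, so the sketch below uses the classic 14-row weather/"Play" frequencies as a hypothetical stand-in, with scikit-learn's CategoricalNB applying Bayes' theorem to the encoded feature.

```python
from sklearn.naive_bayes import CategoricalNB

# Outlook encoded as 0 = Sunny, 1 = Overcast, 2 = Rainy; Play as 0 = No, 1 = Yes.
weather = [[0], [0], [1], [2], [2], [2], [1], [0], [0], [2], [0], [1], [1], [2]]
play = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

model = CategoricalNB().fit(weather, play)
print(model.predict([[0]]))        # should the player play when it is Sunny?
print(model.predict_proba([[0]]))  # posterior probabilities from Bayes' theorem
```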
Decision Tree Classifier

• Decision Tree is a Supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a decision tree, there are two kinds of nodes: decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
Why use Decision Trees?
Below are the two main reasons for using a decision tree:
• Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
• The logic behind a decision tree can be easily understood, because it shows a tree-like structure.

Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure). The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer), as in the sketch below.
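A minimal sketch of the job-offer example with scikit-learn's decision tree; encoding salary, distance, and cab facility as binary features (and the labels themselves) is a hypothetical reconstruction made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [salary_ok, near_office, cab_facility] with 1 = yes, 0 = no
X = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
y = [0, 0, 0, 1, 0, 0]  # 1 = accepted offer, 0 = declined offer

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["salary_ok", "near_office", "cab"]))
```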
Random Forest Classifier

Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset, and it averages their predictions to improve the predictive accuracy on that dataset. A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
Applications of Random Forest
• Banking: The banking sector mostly uses this algorithm for identifying loan risk.
• Medicine: With the help of this algorithm, disease trends and disease risks can be identified.
• Land use: We can identify areas of similar land use with this algorithm.
• Marketing: Marketing trends can be identified using this algorithm.
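A minimal random-forest sketch; the synthetic dataset is a stand-in for a real task such as loan-risk identification.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators sets the number of trees; more trees generally stabilize accuracy.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```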
