
Machine Learning

UNIT-2
Supervised Learning

By
T.Satya Kumari
Assistant Professor
Dept. of Computer Science & Engineering
Aditya College of Engineering
Surampalem
Classifier

What is a Classifier?
• In machine learning, a classifier is an algorithm that automatically
sorts or categorizes data into one or more "classes." Targets, labels,
and categories are all terms used to describe classes. 
• One of the most prominent instances is an email classifier, which
examines emails and filters them according to whether they are spam
or not. 
Types of Classifiers

Types of classifiers in Machine learning:


There are six different classifiers in machine learning that we will discuss below:
1. Perceptron:
The Perceptron is a linear machine learning algorithm for binary classification problems. It is one of the earliest and simplest forms of artificial neural networks.
Because the Perceptron is a linear classification algorithm, it learns a decision boundary in the feature space that divides the two classes with a line (called a hyperplane).
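A minimal sketch of a Perceptron on a toy two-class dataset, using scikit-learn (the dataset and parameter values are illustrative):

```python
# Minimal sketch: scikit-learn's Perceptron on a synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)
clf = Perceptron(max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, y)
# coef_ and intercept_ define the learned hyperplane w·x + b = 0
print(clf.coef_, clf.intercept_, clf.score(X, y))
```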
Types of Classifiers

2. Logistic Regression:
• Logistic regression is one of the most prominent machine learning algorithms under the supervised learning approach. It is a method for predicting a categorical dependent variable from a set of independent variables.
3. Naive Bayes:
• The Naive Bayes family of probabilistic algorithms calculates the likelihood that any given data point falls into one or more of a set of categories. It is a supervised learning approach, based on Bayes' theorem, for addressing classification problems. It is a probabilistic classifier, which means it makes predictions based on an object's probability.
Types of Classifiers

4. K-Nearest Neighbors:
• K-nearest neighbors is a straightforward method that stores all existing examples and categorizes new ones using a similarity metric (e.g., distance functions).
5. Support Vector Machine:
• The Support Vector Machine, or SVM, is a common supervised learning technique that may be used to solve both classification and regression problems. However, it is mostly used in machine learning for classification problems.
• The SVM algorithm's purpose is to find the optimal line or decision boundary for dividing n-dimensional space into classes, so that additional data points can readily be placed in the proper category in the future. This optimal decision boundary is called a hyperplane.
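A minimal sketch of a linear SVM, using scikit-learn (synthetic data; the parameters are illustrative):

```python
# Minimal sketch: a linear SVM looks for the maximum-margin hyperplane.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=1)
clf = SVC(kernel="linear", C=1.0)   # C controls the margin/error trade-off
clf.fit(X, y)
print(clf.predict(X[:5]))           # new points fall on one side of the hyperplane
```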
Types of Classifiers

6. Random Forest:
• Random forest is a supervised learning approach used in machine learning for classification and regression. It is a classifier that averages the results of many decision trees applied to distinct subsets of a dataset to improve predictive accuracy.
• It is also known as a meta-estimator, since it fits a number of decision trees on different sub-samples of the dataset and uses their average to improve the model's accuracy and prevent over-fitting. The size of each sub-sample is the same as the size of the original input sample, but the samples are drawn with replacement.
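A minimal sketch of a random forest, using scikit-learn (synthetic data; the settings mirror the bootstrap idea described above):

```python
# Minimal sketch: averaging many decision trees fit on bootstrap samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=2)
clf = RandomForestClassifier(n_estimators=100,  # number of trees
                             bootstrap=True,    # sub-samples drawn with replacement
                             random_state=2)
clf.fit(X, y)
print(clf.score(X, y))
```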
Types of Classification tasks

There are four different types of classification tasks in machine learning:
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
• Imbalanced Classification
Binary Classification

Classification tasks with only two class labels are referred to as binary classification.
Examples include:
• Prediction of conversion (buy or not).
• Churn forecast (churn or not).
• Detection of spam email (spam or not).
Binary classification problems often require two classes, one
representing the normal state and the other representing the abnormal
state.
Binary Classification

The following are well-known binary classification algorithms:
• Logistic Regression
• Support Vector Machines
• Naive Bayes
• Decision Trees
• Some algorithms, such as Support Vector Machines and Logistic
Regression, were created expressly for binary classification and do not
by default support more than two classes.
Multi-Class Classification

Classification tasks with more than two class labels are referred to as multi-class classification.
Examples include:
• Face categorization.
• Classifying plant species.
• Optical character recognition.
• The multi-class classification does not have the idea of normal and
abnormal outcomes, in contrast to binary classification. Instead,
instances are grouped into one of several well-known classes.
• In some cases, the number of class labels can be rather high. In a facial recognition system, for instance, a model might predict that a photo belongs to one of thousands or tens of thousands of faces.
Multi-Class Classification

Many binary classification techniques are applicable to multi-class classification.
The following well-known algorithms can be used for multi-class classification:
• Gradient Boosting
• Decision Trees
• K-Nearest Neighbors
• Random Forest
• Naive Bayes
Multi-class problems can be solved using algorithms created for binary
classification.
Multi-Class Classification

To do this, strategies known as "one-vs-rest" and "one-vs-one" are used, which involve fitting multiple binary classification models:
• One-vs-Rest: Fit a single binary classification model for each class versus all other classes.
• One-vs-One: Fit a single binary classification model for each pair of classes.
The following binary classification algorithms can apply these multi-class classification techniques:
• Support Vector Machine
• Logistic Regression
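A minimal sketch of both strategies, using scikit-learn's wrappers around logistic regression (the Iris dataset here is just an illustrative three-class problem):

```python
# Minimal sketch: wrapping a binary classifier for multi-class problems.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)  # 3 classes
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)  # one model per class
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)   # one model per class pair
print(ovr.score(X, y), ovo.score(X, y))
```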
Multi-Label Classification

Multi-label classification problems are those that feature two or more class labels and allow for the prediction of one or more class labels for each example.
Think about the photo classification example. Here a model can predict
the existence of many known things in a photo, such as “person”,
“apple”, "bicycle," etc. A particular photo may have multiple objects in
the scene.
This greatly contrasts with multi-class classification and binary classification, which predict a single class label for each instance.
Multi-Label Classification

It is not possible to directly apply classification methods designed for multi-class or binary classification to multi-label problems. Instead, so-called multi-label versions of the algorithms, which are specialized versions of the conventional classification algorithms, are used, including:
• Multi-label Gradient Boosting
• Multi-label Random Forests
• Multi-label Decision Trees
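A minimal sketch of multi-label classification, using a random forest on synthetic multi-label data (sizes and names are illustrative):

```python
# Minimal sketch: a multi-label random forest predicts several labels per example.
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, Y)  # Y has one column per label
print(clf.predict(X[:2]))  # e.g. [[1 0 1], [0 1 0]] — multiple labels per example
```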
Imbalanced Classification

• The term "imbalanced classification" describes classification tasks where the distribution of examples across the classes is not equal.
• Imbalanced classification tasks are generally binary classification tasks in which a majority of the training dataset's instances belong to the normal class and a minority belong to the abnormal class.
Examples include:
• Clinical diagnostic procedures
• Outlier detection
• Fraud detection
Types of Classification Algorithms

You can apply many different classification methods depending on the dataset you are working with, because the study of classification in statistics is extensive. The most widely used machine learning classification algorithms are listed below.
1. Logistic Regression
It is a supervised learning classification technique that forecasts the likelihood of a target variable. There will only be a choice between two classes: data can be coded as 1 or yes, representing success, or as 0 or no, representing failure. The dependent variable can be predicted most effectively using logistic regression when the forecast is categorical, such as true or false, yes or no, or 0 or 1. For example, a logistic regression technique can be used to determine whether or not an email is spam.
Types of Classification Algorithms

2. Naive Bayes
Naive Bayes determines whether a data point falls into a particular
category. It can be used to classify phrases or words in text analysis
as either falling within a predetermined classification or not.
Text                                        Tag
"A great game"                              Sports
"The election is over"                      Not Sports
"What a great score"                        Sports
"A clean and unforgettable game"            Sports
"The spelling bee winner was a surprise"    Not Sports
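A minimal sketch of this kind of text classification, using scikit-learn's MultinomialNB on the tiny table above (the test phrase is illustrative):

```python
# Minimal sketch: word counts fed into a multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["A great game", "The election is over", "What a great score",
         "A clean and unforgettable game", "The spelling bee winner was a surprise"]
tags = ["Sports", "Not Sports", "Sports", "Sports", "Not Sports"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, tags)
print(model.predict(["a very close game"]))  # likely ['Sports']
```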
Types of Classification Algorithms

3. K-Nearest Neighbors
• It calculates the likelihood that a data point will join a group based on which group the data points closest to it belong to. When using k-NN for classification, you classify the data according to its nearest neighbors.
4. Decision Tree
• A decision tree is an example of supervised learning. Although it can solve
regression and classification problems, it excels in classification problems.
Similar to a flow chart, it divides data points into two similar groups at a
time, starting with the "tree trunk" and moving through the "branches"
and "leaves" until the categories are more closely related to one another.
Types of Classification Algorithms

5. Random Forest Algorithm


• The random forest algorithm is an extension of the decision tree algorithm: you first create a number of decision trees using training data, and then fit your new data into one of the created trees as a "random forest". It averages the data to connect it to the nearest tree on the data scale. These models are great for improving on the decision tree's problem of forcing data points unnecessarily within a category.
6. Support Vector Machine
• Support Vector Machine is a popular supervised machine learning technique
for classification and regression problems. It goes beyond X/Y prediction by
using algorithms to classify and train the data according to polarity.
K-Nearest Neighbor(KNN) Algorithm

• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying
data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately instead it stores the dataset and at the time of classification, it performs an action on
the dataset.
• KNN algorithm at the training phase just stores the dataset and when it gets new data, then it
classifies that data into a category that is much similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. Our KNN model will compare the features of the new image with the cat and dog images and, based on the most similar features, put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?

• Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
How does K-NN work?
The K-NN working can be explained on the basis of the below
algorithm:
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated Euclidean
distance.
• Step-4: Among these k neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the
required category. Consider the below image:
• Firstly, we will choose the number of neighbors: we will choose k = 5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it is calculated as:

d = √((x2 − x1)² + (y2 − y1)²)
• By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
• As we can see, the 3 nearest neighbors are from category A; hence this new data point must belong to category A.
Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which can be complex at times.
• The computation cost is high, because the distance between the new data point and all the training samples must be calculated.
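A minimal sketch of the steps above, using scikit-learn on an illustrative 2-D toy dataset:

```python
# Minimal sketch: KNN with k=5 and Euclidean distance on toy 2-D points.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]  # illustrative points
y = ["A", "A", "A", "B", "B", "B"]                    # Category A / Category B

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")  # Steps 1-2: k and distance
knn.fit(X, y)                 # lazy learner: just stores the dataset
print(knn.predict([[3, 4]]))  # Steps 3-5: majority vote among the 5 nearest -> ['A']
```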
Naïve Bayes Classifier Algorithm

• Naïve Bayes algorithm is a supervised learning algorithm, which is based


on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• Naïve Bayes Classifier is one of the simple and most effective Classification
algorithms which helps in building the fast machine learning models that
can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Naïve Bayes Classifier Algorithm

Why is it called Naïve Bayes?


• The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes' Theorem

• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is


used to determine the probability of a hypothesis with prior
knowledge. It depends on the conditional probability.
• The Naive Bayes classifier depends on Bayes' theorem, which is based on conditional probability: the likelihood that an event (A) will occur given that another event (B) has already occurred. The theorem allows a hypothesis to be updated every time new evidence is introduced.
• The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) · P(A) / P(B)
Bayes' Theorem

• P(A|B) is Posterior probability: Probability of hypothesis A on the


observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that hypothesis A is true.
• P(A) is Prior Probability: Probability of hypothesis before observing
the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
Bayes' Theorem

A learned naive Bayes model stores the following probabilities:
• Class Probabilities: The probability of each class in the training dataset.
• Conditional Probabilities: The conditional probability of each input value given every class value.
To perform Naïve Bayes classification, the dataset is divided into two parts, namely the feature matrix and the response vector.
• The feature matrix contains all the vectors (rows) of the dataset, in which every vector consists of the values of the features.
• The response vector contains the value of the class variable (prediction or output) for each row of the feature matrix.
Bayes' Theorem

Prior Probability: A prior probability is the probability that an observation will fall into a category before you collect the data. The prior is a probability distribution that represents your uncertainty over θ before you have sampled any data and tried to estimate it; it is usually denoted by π(θ).
Posterior Probability: A posterior probability is the probability of assigning observations to categories or groups given the data. The posterior is a probability distribution representing your uncertainty over θ after you have sampled data, and is denoted by π(θ|X). It is a conditional distribution because it conditions on the observed data.
Bayes' Theorem

From Bayes' theorem we can relate the two as:

π(θ|X) = f(X|θ) · π(θ) / f(X)

that is, the posterior is proportional to the likelihood times the prior.
Naïve Bayes' Classifier

Working of Naïve Bayes' Classifier:


• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Now, use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, then the Player should play or not?
Solution: To solve this, first consider the below dataset:
SNO Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the Weather Conditions:
Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4
Likelihood table for the weather conditions:
Weather    No            Yes           P(Weather)
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.3

P(Sunny) = 0.35

P(Yes) = 0.71

• So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No) = 0.29

P(Sunny) = 0.35

So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

• Hence, on a sunny day, the player can play the game.
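A minimal sketch that reproduces the calculation above in Python (the values are taken directly from the tables):

```python
# Minimal sketch: the Sunny-day Naive Bayes calculation from the tables above.
p_sunny_yes = 3 / 10        # P(Sunny|Yes)
p_sunny_no  = 2 / 4         # P(Sunny|No)
p_yes, p_no = 10 / 14, 4 / 14
p_sunny     = 5 / 14        # P(Sunny)

p_yes_given_sunny = p_sunny_yes * p_yes / p_sunny   # ≈ 0.60
p_no_given_sunny  = p_sunny_no  * p_no  / p_sunny   # ≈ 0.41
print(p_yes_given_sunny > p_no_given_sunny)         # True -> play
```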


Naïve Bayes' Classifier

Advantages of Naïve Bayes Classifier:


• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for Binary as well as Multi-class Classifications.
• It performs well in Multi-class predictions as compared to the other Algorithms.
• It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.
Applications of Naïve Bayes Classifier:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
• It is used in Text classification such as Spam filtering and Sentiment analysis.
Naïve Bayes' Classifier

Types of Naïve Bayes Classifiers


There are three types of Naïve Bayes classifiers:
• Gaussian Naïve Bayes
• Multinomial Naïve Bayes
• Bernoulli Naïve Bayes
Naïve Bayes' Classifier

Gaussian Naive Bayes


• Gaussian Naive Bayes is useful when we are working with continuous
values whose probabilities can be modeled using a Gaussian
distribution.
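A minimal sketch, using scikit-learn's GaussianNB on the Iris dataset (chosen here only because its features are continuous):

```python
# Minimal sketch: GaussianNB models each continuous feature per class as a Gaussian.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)         # continuous measurements
clf = GaussianNB().fit(X, y)              # learns per-class mean and variance
print(clf.predict(X[:3]), clf.theta_[0])  # theta_: per-class feature means
```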
Naïve Bayes' Classifier

Multinomial naive Bayes


• A multinomial distribution is helpful for modeling feature vectors where each value represents, for example, the number of occurrences of a term or its relative frequency. If the feature vectors have n elements and each element can assume one of k different values with probability pk, then:

P(x1, …, xk) = n! / (x1! ⋯ xk!) · p1^x1 ⋯ pk^xk
Naïve Bayes' Classifier

Bernoulli naive Bayes


• If X is a Bernoulli-distributed random variable, it assumes only two values (0 and 1), and its probability is given as follows:

P(X = x) = p^x · (1 − p)^(1 − x),  x ∈ {0, 1}
MNIST

• The MNIST database (Modified National Institute of Standards and


Technology database) is a large database of handwritten digits that is
commonly used for training various image processing systems. The
database is also widely used for training and testing in the field of
machine learning.
• The MNIST database contains 60,000 training images and 10,000
testing images. Half of the training set and half of the test set were
taken from NIST's training dataset, while the other half of the training
set and the other half of the test set were taken from NIST's testing
dataset. Each image is labeled with the digit it represents.
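One way to load MNIST is via OpenML through scikit-learn; a minimal sketch (the standard 60,000/10,000 split is recovered by slicing):

```python
# Minimal sketch: fetching MNIST (70,000 images of 28x28 pixels, flattened to 784).
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
print(X.shape, y.shape)  # (70000, 784), labels are the digit strings '0'-'9'

X_train, X_test = X[:60000], X[60000:]  # the standard 60k/10k split
y_train, y_test = y[:60000], y[60000:]
```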
MNIST

• The set of images in the MNIST database was created in 1998 as a


combination of two of NIST's databases: Special Database 1 and
Special Database 3. Special Database 1 and Special Database 3 consist
of digits written by high school students and employees of the United
States Census Bureau, respectively.
Ranking
• Ranking is a type of machine learning that sorts data in a relevant order. Companies
use ranking to optimize search and recommendations.
What is a ranking model?
• Ranking is a type of supervised machine learning (ML) that uses labeled datasets to train models that predict outcomes on future data. Quite simply, the goal of a ranking model is to sort data in an optimal and relevant order.
• Ranking was first largely deployed within search engines: people search for a topic, the ranking algorithm reorders the search results (for example, based on PageRank), and the search engine displays the most relevant results to its users.
How does ranking work?
• Ranking models are made up of 2 main factors: queries and documents. Queries are
any input value, such as a question on Google or an interaction on an e-commerce site.
Documents are the output values or results of the query. Given the query and the associated documents, a scoring function, with a list of parameters to rank on, scores the documents so they can be sorted in order of relevancy.
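A minimal sketch of the query/document scoring idea; the scoring function below is an illustrative placeholder, not a production ranker:

```python
# Minimal sketch: score documents for a query, then sort by relevance.
def score(query: str, document: str) -> float:
    """Toy relevance score: fraction of document terms that match the query."""
    q_terms = set(query.lower().split())
    d_terms = document.lower().split()
    return sum(t in q_terms for t in d_terms) / (len(d_terms) or 1)

docs = ["Mage A.I.", "Mage definition", "Mage World of Warcraft"]
ranked = sorted(docs, key=lambda d: score("Mage", d), reverse=True)
print(ranked)  # documents with higher scores are ranked first
```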
Ranking
• As an example, a search for “Mage” is done on Google Search (“Mage” is the
query). After the search, a list of associated documents matching the query will
be displayed (Mage A.I., Mage definition, Mage World of Warcraft, etc.). The
function will score each of the documents based on their relevance to the query
(Mage A.I. = 1, Mage definition = 2, Mage World of Warcraft =3, and so on). The
documents with higher scores will be ranked higher when there is a search for
Mage.
Ranking
Here are a few companies that have used ranking to maximize user engagement.
• Amazon

With millions of listings or documents, for every product search or query, Amazon
needed to find a way to rank its products in order to maximize the chance of
purchase. Using a combination of individual preferences, gathered from users'
search and purchasing history and a product’s popularity, Amazon created a
ranking system that would display the most relevant products at the top of their
feed. Additionally, ranking was used in Amazon’s recommendation system, which
would use users' ranked preferences in order to predict what products a user is
most likely to purchase in the future.
Ranking
Netflix

Similar to Amazon, Netflix uses ranking to fuel their recommendation system. The
recommendation system predicts what content a user is most likely to watch and
displays the most relevant content at the top of the home page. Netflix uses a few
different features to rank and recommend content; such as: watch history, search
history, and general popularity. They also use ranking to fuel their collaborative
filtering.
TikTok

TikTok’s standout feature is the For You page which is built on a ranking system.
This feature has allowed TikTok to customize each home page to be reflective of the
preferences and interests of its user. TikTok uses similar metrics to Netflix to rank its
content: watch history, re-watch rate, and engagement. Similar to Netflix, TikTok’s
ranking system also aids in collaborative filtering.
Ranking
• Ranking is a machine learning technique to rank items.
• Ranking is useful for many applications in information retrieval such as e-
commerce, social networks, recommendation systems, and so on. For example, a
user searches for an article or an item to buy online. To build a recommendation
system, it becomes important that similar articles or items of relevance appear to
the user such that the user clicks or purchases the item.
• The ranking technique directly ranks items by training a model to predict the ranking of one item over another. In the trained model, items can be ranked one over the other by assigning a "score" to each item: higher-ranked items have higher scores and lower-ranked items have lower scores. Using these scores, a model is built to predict which item ranks higher than another, as sketched below.
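A minimal sketch of this pairwise idea: train a binary model on feature differences to predict which of two items ranks higher (all data here is synthetic and illustrative):

```python
# Minimal sketch: pairwise ranking via logistic regression on feature differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
items = rng.normal(size=(50, 4))                         # illustrative item features
true_scores = items @ np.array([1.0, 0.5, -0.3, 0.2])    # hidden relevance scores

# Build pairs: label 1 if item i should rank above item j.
pairs, labels = [], []
for i in range(len(items)):
    for j in range(len(items)):
        if i != j:
            pairs.append(items[i] - items[j])
            labels.append(int(true_scores[i] > true_scores[j]))

model = LogisticRegression().fit(pairs, labels)
# The learned weights act as a scoring function: higher score = higher rank.
scores = items @ model.coef_[0]
print(np.argsort(-scores)[:5])  # indices of the top-5 ranked items
```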
