UNIT-2 (Classification)
UNIT-2
Supervised Learning
By
T.Satya Kumari
Assistant Professor
Dept. of Computer Science & Engineering
Aditya College of Engineering
Surampalem
Classifier
What is a Classifier?
• In machine learning, a classifier is an algorithm that automatically
sorts or categorizes data into one or more "classes." Targets, labels,
and categories are all terms used to describe classes.
• A well-known example is an email classifier, which examines emails
and labels each one as spam or not spam.
Types of Classifiers
2. Logistic Regression:
• Logistic regression is one of the most widely used supervised
learning algorithms. It predicts a categorical dependent variable
(the class) from a set of independent variables (the features).
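As a rough illustration of the idea above, the following sketch fits a one-feature logistic regression by plain gradient descent. The toy data (hours studied versus pass/fail), the learning rate, and the function names are all assumptions made for this example and do not come from the slides.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for a single feature by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(hours, passed)
```

Calling `predict(w, b, 4.5)` then returns the class for a new student. The model outputs a probability first, and the threshold turns it into a category.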
3. Naive Bayes:
• The Naive Bayes family of probabilistic algorithms calculates the
probability that a given data point belongs to each of a set of
categories. It is a supervised learning approach to classification
problems that is based on Bayes' theorem. It is a probabilistic
classifier, which means it predicts the class that is most probable
given an object's features.
4. K-Nearest Neighbors:
• K-nearest neighbors is a straightforward method that stores all
existing examples and classifies new ones using a similarity measure
(e.g., a distance function).
5. Support Vector Machine:
• The Support Vector Machine, or SVM, is a common supervised
learning technique that can be used to solve both classification and
regression problems. However, it is mostly used for classification
problems.
• The SVM algorithm's purpose is to find the best line or decision
boundary that separates n-dimensional space into classes, so that
new data points can be readily placed in the correct category in
the future. This optimal decision boundary is called a hyperplane.
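The sketch below shows only how a hyperplane is used once it has been found. A point is classified by the side of the hyperplane it falls on, and its distance to the boundary can also be measured. The weights `w` and bias `b` are hypothetical values chosen by hand, and no SVM training is performed here.

```python
import math

def decision_value(w, b, x):
    """Compute w . x + b; the hyperplane is the set of points where this equals 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 for points on the positive side of the hyperplane, -1 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane in 2-D: the line x1 + x2 = 5.
w, b = (1.0, 1.0), -5.0

label_a = classify(w, b, (4.0, 4.0))   # point above the line
label_b = classify(w, b, (1.0, 1.0))   # point below the line
```

An actual SVM chooses `w` and `b` so that the closest training points of each class are as far from the hyperplane as possible.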
6. Random Forest:
• Random forest is a supervised learning approach used in machine
learning for classification and regression. It combines the results
of many decision trees, each trained on a different subset of the
dataset, to improve predictive accuracy.
• It's also known as a meta-estimator since it fits a number of decision
trees on different sub-samples of the dataset and combines their
outputs to improve the model's predictive accuracy and reduce
over-fitting. In the classic formulation, each sub-sample is the same
size as the original input sample, but it is drawn with replacement.
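The following simplified sketch shows the two ingredients described above: sub-samples drawn with replacement, and many models whose outputs are combined. To stay short it uses one-split "stumps" in place of full decision trees, combines them by majority vote, and omits the random feature selection that real random forests also use. The data and all function names are assumptions for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some rows repeat, others are left out."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Hypothetical one-split 'tree': threshold halfway between the two class means."""
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    if not xs0 or not xs1:
        # The sample happened to contain a single class: always predict it.
        only = 1 if xs1 else 0
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    threshold = (m0 + m1) / 2
    high_class = 1 if m1 > m0 else 0
    return lambda x: high_class if x > threshold else 1 - high_class

def fit_forest(rows, n_trees=25, seed=0):
    """Fit each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

def forest_predict(forest, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, class label).
rows = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
forest = fit_forest(rows)
```

Because every stump sees a slightly different sample, their individual errors tend to differ, and the vote smooths them out.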
Types of Classification tasks
Classification tasks with only two class labels are referred to as
binary classification.
Examples include:
• Prediction of conversion (buy or not).
• Churn forecast (churn or not).
• Detection of spam email (spam or not).
Binary classification problems typically involve two classes, one
representing the normal state and the other representing the abnormal
state.
Binary Classification
2. Naive Bayes
Naive Bayes determines whether a data point falls into a particular
category. It can be used to classify phrases or words in text analysis
as either falling within a predetermined classification or not.
3. K-Nearest Neighbors
• It estimates which group a data point belongs to based on which
groups the data points closest to it belong to. When using k-NN for
classification, you classify a data point according to its nearest
neighbors.
4. Decision Tree
• A decision tree is an example of supervised learning. Although it can solve
both regression and classification problems, it is used mostly for
classification. Similar to a flow chart, it splits the data points into two
groups at a time, starting with the "tree trunk" and moving through the
"branches" and "leaves" until the points within each group are closely
related to one another.
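A single splitting step of the kind described above can be sketched as follows. The slides do not say which criterion is used to choose a split, so Gini impurity is an assumption here, as are the toy data and the function names.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Try thresholds midway between sorted feature values; keep the purest split."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weighted average impurity of the two groups produced by this split.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical one-feature data set with two classes.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
```

A full tree repeats this step on each resulting group until the groups are pure enough or a depth limit is reached.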
Types of Classification Algorithms
• K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms based on the
supervised learning technique.
• The K-NN algorithm measures the similarity between a new case and the available cases, and puts
the new case into the category whose cases it is most similar to.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a well-suited
category using the K-NN algorithm.
• The K-NN algorithm can be used for regression as well as for classification, but it is mostly used
for classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data distribution.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately. Instead, it stores the dataset and performs its computation on the dataset at
classification time.
• During the training phase, the K-NN algorithm just stores the dataset. When it gets new data, it
classifies that data into the category it is most similar to.
Example: Suppose we have an image of a creature that looks similar
to both a cat and a dog, and we want to know whether it is a cat or a
dog. For this identification we can use the K-NN algorithm, as it works
on a similarity measure. Our K-NN model will compare the features of
the new image with those of the cat and dog images and, based on the
most similar features, put it in either the cat or the dog category.
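The cat-or-dog example can be sketched in a few lines, assuming each image has already been reduced to two numeric features. The feature values, the labels, and the choice of Euclidean distance with k = 3 are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all the work happens at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature descriptions of known cat and dog images.
points = [(1.0, 1.2), (1.5, 1.8), (1.1, 0.9),
          (5.0, 5.5), (5.5, 4.8), (6.0, 5.2)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

new_creature = (1.3, 1.4)
prediction = knn_predict(points, labels, new_creature)
```

Here the three stored points closest to `new_creature` are all cats, so the new image is placed in the cat category.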
Why do we need a K-NN Algorithm?
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
• So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
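The two Bayes' theorem calculations can be checked with a short script. The helper name `posterior` is an assumption, and the inputs are the rounded values given above. With those rounded inputs the first result comes out near 0.61, slightly above the 0.60 quoted, presumably because the inputs themselves were rounded.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    return likelihood * prior / evidence

# Values taken from the calculation above.
p_yes_given_sunny = posterior(likelihood=0.3, prior=0.71, evidence=0.35)
p_no_given_sunny = posterior(likelihood=0.5, prior=0.29, evidence=0.35)

# A Naive Bayes classifier picks the class with the larger posterior.
predicted = "Yes" if p_yes_given_sunny > p_no_given_sunny else "No"

print(round(p_yes_given_sunny, 2))  # 0.61
print(round(p_no_given_sunny, 2))   # 0.41
print(predicted)                    # Yes
```

Since P(Yes|Sunny) is larger than P(No|Sunny), the classifier predicts "Yes" for a sunny day.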
Amazon
With millions of listings to choose from, Amazon needed a way to rank its
products for every search query so as to maximize the chance of a purchase.
Using a combination of individual preferences, gathered from users' search and
purchasing history, and each product's popularity, Amazon created a ranking
system that displays the most relevant products at the top of the results.
Ranking is also used in Amazon's recommendation system, which uses users'
ranked preferences to predict which products a user is most likely to purchase
in the future.
Ranking
Netflix
Similar to Amazon, Netflix uses ranking to power its recommendation system. The
recommendation system predicts what content a user is most likely to watch and
displays the most relevant content at the top of the home page. Netflix uses a few
different features to rank and recommend content, such as watch history, search
history, and general popularity. It also uses ranking to support its collaborative
filtering.
TikTok
TikTok's standout feature is the For You page, which is built on a ranking system.
This feature allows TikTok to customize each home page to reflect the preferences
and interests of its user. TikTok uses metrics similar to Netflix's to rank its
content: watch history, re-watch rate, and engagement. As with Netflix, TikTok's
ranking system also aids in collaborative filtering.
Ranking
• Ranking is a machine learning technique for ordering items by relevance.
• Ranking is useful for many applications in information retrieval, such as
e-commerce, social networks, and recommendation systems. For example, a
user searches for an article or an item to buy online. To build a recommendation
system, it is important that the most relevant articles or items are shown to
the user first, so that the user is likely to click on or purchase them.
• The ranking technique ranks items directly by training a model to predict the
ranking of one item over another. During training, each item is given a
"score" so that one item can be ranked over another. Higher ranked items have
higher scores and lower ranked items have lower scores. Using these scores, a
model is built to predict which item ranks higher than the other.
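The score-based ordering described above can be sketched as follows. The feature names, the item values, and especially the hand-set weights are hypothetical. In a trained ranking model the weights would be learned from examples of one item being preferred over another.

```python
def score(item, weights):
    """Weighted sum of an item's features; a higher score means a higher rank."""
    return sum(weights[name] * value for name, value in item["features"].items())

def ranks_higher(item_a, item_b, weights):
    """Pairwise comparison: does item_a outrank item_b under these weights?"""
    return score(item_a, weights) > score(item_b, weights)

def rank(items, weights):
    """Order items from highest score to lowest."""
    return sorted(items, key=lambda item: score(item, weights), reverse=True)

# Hypothetical weights; a real system would learn these from user behaviour.
weights = {"relevance": 0.7, "popularity": 0.3}

items = [
    {"name": "A", "features": {"relevance": 0.9, "popularity": 0.2}},
    {"name": "B", "features": {"relevance": 0.4, "popularity": 0.9}},
    {"name": "C", "features": {"relevance": 0.8, "popularity": 0.8}},
]

ordered_names = [item["name"] for item in rank(items, weights)]
```

With these values the order is C, A, B: item C scores highest because it is both relevant and popular.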