
Naïve Bayes Classifier

Naïve Bayesian rule: intuition
Let's talk about a factory that makes spanners.
• We have two machines in the factory.
• Each machine has different characteristics, i.e., they work at different rates and consume different amounts of power.
• But they produce the same spanners.
• The spanners produced by machine 1 and machine 2 are labeled m1 and m2 respectively.
• In the figure we can see some defective spanners among the good ones, marked in black.
• What is the probability of machine 2 producing a defective spanner?
• We can answer this using Bayes' theorem.
Introduction to Naïve Bayes Classifier
It is a classification technique based on Bayes' Theorem.

Assumptions made by Naïve Bayes
• The fundamental Naïve Bayes assumption is that each feature makes an:
  • Independent contribution to the outcome
  • Equal contribution to the outcome
• Under these assumptions, the posterior factorizes as P(y | x1, ..., xn) ∝ P(y) * P(x1 | y) * ... * P(xn | y).
Types of Naïve Bayes Classifier
Multinomial Naïve Bayes:
•Use Case: Multinomial Naïve Bayes is commonly used for text classification tasks, such as spam email
detection or document categorization. It's suited for data with discrete features, like word counts.
•How It Works: It models the likelihood of each term (word) occurring in a document as a multinomial
distribution. It assumes that each term is a count of word occurrences and doesn't consider the order of
words.
Bernoulli Naïve Bayes:
•Use Case: Bernoulli Naïve Bayes is used for binary classification problems where features are binary
(present/absent), such as sentiment analysis (positive/negative) or document classification (relevant/irrelevant).
•How It Works: It treats features as binary variables, indicating whether a feature is "on" (1) or "off" (0). It models
the probability of these binary events using a Bernoulli distribution.
Gaussian Naïve Bayes:
•Use Case: Gaussian Naïve Bayes is suitable for classification tasks where the features follow a Gaussian (normal)
distribution. It's commonly used in cases where the data is continuous and normally distributed, such as in some
scientific measurements or sensor data.
•How It Works: It assumes that the features are normally distributed and uses mean and standard deviation to model
the distribution. It's a good choice when dealing with continuous data.
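As a quick reference, the three variants map onto three scikit-learn estimators. A minimal sketch, assuming scikit-learn is installed; the toy feature arrays are invented purely for illustration:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

y = np.array([0, 1])  # two toy class labels

# Word counts (discrete features) -> Multinomial Naive Bayes
MultinomialNB().fit(np.array([[3, 0, 1], [0, 2, 4]]), y)

# Present/absent features (0/1) -> Bernoulli Naive Bayes
BernoulliNB().fit(np.array([[1, 0, 1], [0, 1, 1]]), y)

# Continuous measurements -> Gaussian Naive Bayes
GaussianNB().fit(np.array([[1.2, 0.7], [3.4, 2.1]]), y)
```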
Types of Naïve Bayes Classifier:
Multinomial Naïve Bayes Classifier:
• Feature vectors represent the frequencies with which certain events have been generated by
a multinomial distribution. This is the event model typically used for document classification.

Bernoulli Naïve Bayes Classifier:

• In the multivariate Bernoulli event model, features are independent booleans (binary variables)
describing inputs. Like the multinomial model, this model is popular for document
classification tasks, where binary term occurrence (i.e. a word occurs in a document or not)
features are used rather than term frequencies (i.e. frequency of a word in the document).

Gaussian Naïve Bayes Classifier:

• In Gaussian Naïve Bayes, continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution (Normal distribution). When plotted, this gives a bell-shaped curve which is symmetric about the mean of the feature values.

Note: We will consider the Gaussian Naïve Bayes Classifier


Naïve Bayes Classifier
It is a probabilistic machine learning model that is used for classification tasks. This classifier is based on Bayes' Theorem.

Bayes' Theorem is:
P(A | B) = P(B | A) * P(A) / P(B)
Naïve Bayes Classifier
Example
Naïve Bayes Classifier

• Argmax returns the class with the highest probability.

• Suppose P(Yes) = 0.7 and P(No) = 0.3.
• argmax(y) = Yes, since 0.7 is the maximum value; the class with the highest probability is the output.
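In code, that argmax step is a one-liner; a sketch using the probabilities from the bullet above:

```python
# Pick the class whose posterior probability is largest.
posteriors = {"Yes": 0.7, "No": 0.3}
prediction = max(posteriors, key=posteriors.get)
print(prediction)  # -> "Yes"
```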
Example # 01

The Frequency table shows how often labels appear for each feature. These tables will
assist us in calculating the prior and posterior probabilities.
• Now let us divide the above frequency table into prior and posterior probabilities. The prior probability is the probability of an event before new data is collected. In contrast, the posterior probability is the probability that the hypothesis is true in light of the relevant observations.
• Table 1 shows the prior probabilities of labels, and Table 2 shows the
posterior probability.
• We want to calculate the probability of playing when the weather is
overcast.
• Solution:
Probability of Playing:
We have the formula for the Naive Bayes classification, which is:
P(Yes | Overcast) = P(Overcast | Yes) * P(Yes) / P(Overcast)
Now, let’s calculate the prior probabilities:
P(Overcast) = 4/14 = 0.29
P(Yes)= 9/14 = 0.64
• The next step is to find the posterior probability, which can be easily
calculated by:
• P(Overcast | Yes) = 4/9 = 0.44
• Once we have the posterior and prior probabilities, we can put them
back in our main formula to calculate the probability of playing when
the weather is overcast.
• P(Yes | Overcast) = 0.44 * 0.64 / 0.29 ≈ 0.97 (with unrounded fractions this is exactly 1, which matches P(No | Overcast) = 0 below).
• Probability of Not Playing:
• Similarly, we can calculate the probability of not playing any sports when the weather is overcast.
• First, let us calculate the prior probabilities
• P(Overcast) = 4/14 = 0.29
• P(No)= 5/14 = 0.36
• The next step is to calculate the posterior probability, which is:
• P(Overcast | No) = 0/5 = 0
• By putting these probabilities in the main formula, we get:
• P(No | Overcast) = 0 * 0.36 / 0.29 = 0
• We can see that the probability of the playing class is higher, so if the weather is overcast, players will play sports.
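The whole calculation above can be checked in a few lines of Python; a sketch assuming the standard 14-day frequency counts used above:

```python
# Frequency counts from the tables above.
n_total, n_yes, n_no = 14, 9, 5
overcast_yes, overcast_no = 4, 0   # overcast days per class

p_overcast = (overcast_yes + overcast_no) / n_total  # 4/14
p_yes, p_no = n_yes / n_total, n_no / n_total        # 9/14, 5/14

# Bayes' theorem; unrounded fractions give exactly 1.0 and 0.0
# (the slide's 0.97 comes from rounding the intermediate values).
print((overcast_yes / n_yes) * p_yes / p_overcast)   # P(Yes | Overcast) = 1.0
print((overcast_no / n_no) * p_no / p_overcast)      # P(No | Overcast) = 0.0
```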
Example # 02
Naïve Bayes Classifier
• Outlook and Temperature are features, and Play is the output.
• Today = (Sunny, Hot). Play or Not = ?
Outlook:
            Yes   No   P(Y)   P(N)
  Sunny      2     3   2/9    3/5
  Overcast   4     0   4/9    0/5
  Rainy      3     2   3/9    2/5
  Total      9     5   100%   100%

Temperature:
            Yes   No   P(Y)   P(N)
  Hot        2     2   2/9    2/5
  Mild       4     2   4/9    2/5
  Cool       3     1   3/9    1/5
  Total      9     5   100%   100%

Play:
            Count  P(Y) & P(N)
  Yes        9     9/14
  No         5     5/14
  Total     14     100%
Naïve Bayes Classifier
P(Yes | Sunny, Hot) ∝ P(Sunny | Yes) * P(Hot | Yes) * P(Yes) = (2/9) * (2/9) * (9/14) ≈ 0.0317
P(No | Sunny, Hot) ∝ P(Sunny | No) * P(Hot | No) * P(No) = (3/5) * (2/5) * (5/14) ≈ 0.0857
Normalize: P(Yes) = 0.0317 / (0.0317 + 0.0857) ≈ 0.27
P(No) = 1 – 0.27 = 0.73
Output = argmax(P) = 0.73 => No
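The same two-feature computation in Python, using the counts from the tables above:

```python
# Priors and per-feature likelihoods read off the frequency tables.
p_yes, p_no = 9 / 14, 5 / 14
score_yes = (2 / 9) * (2 / 9) * p_yes  # P(Sunny|Y) * P(Hot|Y) * P(Y), ~0.0317
score_no = (3 / 5) * (2 / 5) * p_no    # P(Sunny|N) * P(Hot|N) * P(N), ~0.0857

p_yes_norm = score_yes / (score_yes + score_no)   # ~0.27
print("Play" if p_yes_norm > 0.5 else "No play")  # -> No play
```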
Python Implementation
Problem
The problem is the same as before.

You are given data by your manager on customers who have previously bought some older models of your company's SUV. The data includes a total of 400 instances.

Independent variables

• Age
• Estimated salary

Dependent Variable

•Purchased (0 = no SUV purchased, 1 = SUV purchased)

Your company has just introduced a new SUV vehicle.

You are asked to predict who will buy the new SUV vehicle.
Steps
1. Import the libraries
2. Import the dataset
3. Split the dataset into training and testing samples
4. Feature scaling
5. Training the model
6. Predicting new results
7. Predicting the test set results
8. Making the confusion matrix
9. Visualize the training set results
10. Visualize the test set results
Importing libraries and dataset + dividing the dataset into train and test sets
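A minimal sketch of these first steps; the CSV file name is a placeholder and the 75/25 split is an assumption, since the original code screenshots did not survive conversion:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the 400-instance dataset (file name is hypothetical).
dataset = pd.read_csv('suv_data.csv')
X = dataset[['Age', 'EstimatedSalary']].values  # independent variables
y = dataset['Purchased'].values                 # dependent variable

# Hold out a test set; 25% is an assumed split ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```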
Scaling the data
• We do not need to scale the dependent variable ‘purchased’ as it is already in the
form of ‘0’ and ‘1’.
• Remember scaling is always applied after splitting the dataset into test and train sets,
because we want to avoid data leakage.
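A sketch of the scaling step; note the scaler is fit on the training set only, which is exactly how the leakage above is avoided:

```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit on the training data only
X_test = sc.transform(X_test)        # reuse the same scaling; no refitting
```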
Training the model
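Since the slides settle on the Gaussian variant, training is just a couple of lines, continuing from the split and scaled data above:

```python
from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X_train, y_train)  # learn per-class means and variances
```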
Predicting unknown values
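A sketch of predicting a single new customer; the age and salary values are made up, and the raw features must pass through the same scaler as the training data:

```python
# Predict whether a (hypothetical) 30-year-old earning 87,000 buys the SUV.
print(classifier.predict(sc.transform([[30, 87000]])))  # e.g. [0]
```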
Predicting results on the test data
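Predicting the whole test set at once:

```python
y_pred = classifier.predict(X_test)  # one 0/1 label per test customer
print(y_pred[:10])                   # first ten predictions
```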
Confusion matrix and accuracy

• Accuracy = 90%
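A sketch of how the confusion matrix and the 90% figure are obtained:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

print(confusion_matrix(y_test, y_pred))  # rows: true class, cols: predicted
print(accuracy_score(y_test, y_pred))    # ~0.90 per the slide above
```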
Decision boundary: training set

Decision boundary: test set
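The decision-boundary figures did not survive conversion; a sketch of how such plots are typically drawn, assuming matplotlib is available and reusing the scaled splits from the earlier steps:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(X_set, y_set, title):
    # Evaluate the classifier on a dense grid covering the feature range.
    x1, x2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
    z = classifier.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape)
    plt.contourf(x1, x2, z, alpha=0.3)  # shaded decision regions
    plt.scatter(X_set[:, 0], X_set[:, 1], c=y_set, edgecolors='k')
    plt.title(title)
    plt.xlabel('Age (scaled)')
    plt.ylabel('Estimated salary (scaled)')
    plt.show()

plot_boundary(X_train, y_train, 'Naive Bayes (training set)')
plot_boundary(X_test, y_test, 'Naive Bayes (test set)')
```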
