Naive Bayes Classifier

Naive Bayes Classifier + NLP

● Extremely fast at training and classification compared to many other algorithms.
● Particularly useful for very large datasets.
● Mostly used in text classification (NLP).
● It works on Bayes’ theorem of probability to predict the class of unknown data points.
● Belongs to the family of generative learning algorithms.
● Unlike logistic regression, it doesn’t learn which features are most crucial for
distinguishing between classes.
● Widely used in text classification, spam filtering and recommendation systems.
● Very popular supervised machine learning algorithm.
● The approach is based on the assumption that the features of the input data are
conditionally independent given the class, which allows it to make predictions quickly
and, in many cases, accurately.
● It applies Bayes’ theorem, which gives the probability of a hypothesis (the assumption
that a data point belongs to a class) given the data and some prior knowledge.
● It assumes that the features of the input data are independent of each other given the
class, which is often not true in real-world scenarios.
● Understanding Bayes’ theorem is crucial for understanding the Naive Bayes
Classifier algorithm.
● Understanding Bayes’ Theorem with an example:
○ Bayes’ theorem links the two directions of conditioning: the probability of A given B
can be computed from the probability of B given A, so observing B updates our belief in A.
● Understand the Naive Bayes algorithm with an example:
○ If a fruit is red, round and about 3 inches wide, we might call it an apple. Even if
these features depend on each other, each one is treated as contributing independently
to the evidence that it’s probably an apple. That’s why the algorithm is called “naive”.
○ Bayes’ theorem provides a way of computing the posterior probability P(c|X) from
P(c), P(X) and P(X|c). Look at the equation below:

P(c|X) = P(X|c) * P(c) / P(X)
○ P(c|X) - the posterior probability of class c (target) given predictor X (attributes)
○ P(c) - the prior probability of the class
○ P(X|c) - the likelihood of the predictor X given class c
○ P(X) - the marginal probability of the predictor (the evidence)
● So our goal is to calculate P(c|X). Since the denominator P(X) is the same for every
class, it is often ignored to simplify the comparison: P(c|X) ∝ P(X|c) * P(c). A short
sketch of this scoring rule follows below.
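To make the naive factorization concrete, here is a minimal Python sketch that scores the fruit example from above as P(c) * P(x1|c) * P(x2|c) * P(x3|c). All priors and per-feature likelihoods are invented, purely illustrative numbers:

```python
# Naive Bayes scoring: P(c|X) ∝ P(c) * product of P(x_i|c), assuming the
# features (red, round, ~3 inches) are conditionally independent given the class.
# All probabilities below are hypothetical, for illustration only.

priors = {"apple": 0.5, "cherry": 0.5}

# P(feature | class) for each observed feature value (invented numbers)
likelihoods = {
    "apple":  {"red": 0.7, "round": 0.9, "3in": 0.6},
    "cherry": {"red": 0.8, "round": 0.9, "3in": 0.01},
}

features = ["red", "round", "3in"]

scores = {}
for cls, prior in priors.items():
    score = prior
    for f in features:
        score *= likelihoods[cls][f]  # multiply per-feature likelihoods
    scores[cls] = score

# The class with the highest unnormalized posterior is the prediction.
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)  # apple scores higher here
```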
● How does the Naive Bayes Classifier algorithm work?
○ Convert the dataset into a frequency table.
○ Create a likelihood table by computing the probabilities.
○ Use the naive Bayes equation to calculate the posterior probability for each class.
○ The class with the highest posterior probability is the prediction.
○ Example: a weather vs. play dataset of 14 observations, summarized as frequency and
likelihood tables (Sunny occurs 5 times, 3 with Play = Yes and 2 with Play = No;
overall 9 Yes, 5 No).
○ Problem: players will play if the weather is Sunny. Is this statement correct?
○ First, calculate the probability that players will play given that the weather is Sunny:
i. Posterior probability of players playing given that the weather is Sunny:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
ii. Likelihood of the weather being Sunny given that players play:
P(Sunny|Yes) = 3/9 = 0.33
iii. Prior probability of players playing: P(Yes) = 9/14 = 0.64
iv. Marginal probability of the weather being Sunny: P(Sunny) = 5/14 = 0.36
v. Putting the values into equation i): 0.33 * 0.64 / 0.36 ≈ 0.59 (exactly 3/5 = 0.60
with unrounded fractions)
○ Now calculate the probability that players will not play given that the weather is Sunny:
i. Posterior probability of players not playing given that the weather is Sunny:
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
ii. Likelihood of the weather being Sunny given that players don’t play:
P(Sunny|No) = 2/5 = 0.4
iii. Prior probability of players not playing: P(No) = 5/14 = 0.36
iv. Marginal probability of the weather being Sunny: P(Sunny) = 5/14 = 0.36
v. Putting the values into equation i): 0.4 * 0.36 / 0.36 = 0.4
○ The posterior probability P(Yes|Sunny) is higher than the posterior probability
P(No|Sunny), hence the prediction is that players will play if the weather is Sunny.
A short Python sketch of this calculation follows below.
○ Similarly, we can calculate posterior probabilities for multiclass problems.
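Here is a minimal Python sketch that reproduces the calculation above directly from the counts in the example (14 observations, 9 Yes and 5 No, with Sunny occurring 3 times among Yes and 2 times among No):

```python
# Reproduce the worked example: predict Play given Weather = Sunny.
# Counts are taken directly from the example above.
n_total = 14
n_yes, n_no = 9, 5            # Play = Yes / No
sunny_yes, sunny_no = 3, 2    # Sunny days among Yes / No

p_sunny = (sunny_yes + sunny_no) / n_total  # P(Sunny) = 5/14

# Bayes' theorem: P(class | Sunny) = P(Sunny | class) * P(class) / P(Sunny)
p_yes_given_sunny = (sunny_yes / n_yes) * (n_yes / n_total) / p_sunny
p_no_given_sunny = (sunny_no / n_no) * (n_no / n_total) / p_sunny

print(f"P(Yes|Sunny) = {p_yes_given_sunny:.2f}")  # 0.60
print(f"P(No|Sunny)  = {p_no_given_sunny:.2f}")   # 0.40
```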
● Applications of the Naive Bayes algorithm:
○ Real-time prediction
○ Multiclass prediction
○ Text classification / spam filtering / sentiment analysis (see the sketch after this list)
○ Recommendation systems
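To illustrate the text-classification use case, below is a small sketch using scikit-learn’s CountVectorizer and MultinomialNB. The toy messages and labels are invented for the example:

```python
# A minimal spam-filtering sketch with scikit-learn.
# The toy messages and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",
    "free cash click here",
    "meeting rescheduled to monday",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "ham", "ham"]

# Convert raw text to word-count vectors (bag-of-words).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Fit a multinomial Naive Bayes model on the counts.
model = MultinomialNB()
model.fit(X, labels)

test = vectorizer.transform(["free prize meeting"])
print(model.predict(test))  # predicted class for the new message
```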
● Some important terminologies in NLP:
○ Document: A document is a single text data point. For example, a review of a
particular product by a user.
○ Lemma: the base (dictionary) form of a word produced by lemmatization, e.g. “running” → “run”. Lemmatization is typically applied to text already cleaned of punctuation and stop words.
○ Vectorization (Count Vectorizer): a technique to convert raw text into a numerical
format that the machine can understand. It builds the list of unique words in the text
(often called features) and represents each document as a vector of each word’s
frequency in that document (as in the text-classification sketch above).
○ Bag-of-Words (BoW): represents text as the collection of unique words in the whole
dataset together with their frequencies, ignoring grammar and word order.
○ Term Frequency-Inverse Document Frequency (TF-IDF): used to assess the importance
of words in a document relative to a collection of documents. It re-weights the raw
Bag-of-Words frequencies, and the resulting weights help the model classify better. It
gives higher weights to rare, more discriminative words and lower weights to common
words that appear in many documents (see the sketch at the end of these notes).
i. TF(w, d): number of times word w appears in document d
ii. IDF(w): log_e(total number of documents / number of documents containing word w)
iii. TF-IDF(w, d) = TF(w, d) * IDF(w)
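Below is a small from-scratch sketch of the TF-IDF formulas above, applied to an invented three-document corpus:

```python
# TF-IDF from scratch, following the formulas above.
# The three-document corpus is invented for illustration.
import math

docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]
tokenized = [d.split() for d in docs]
n_docs = len(docs)

def tf(word, doc_tokens):
    # Raw term frequency: how many times the word appears in the document.
    return doc_tokens.count(word)

def idf(word):
    # log_e(total documents / documents containing the word)
    containing = sum(1 for d in tokenized if word in d)
    return math.log(n_docs / containing)

def tf_idf(word, doc_tokens):
    return tf(word, doc_tokens) * idf(word)

# "the" appears in every document, so its IDF (and TF-IDF) is 0;
# "mat" appears in only one document, so it gets a higher weight.
for word in ["the", "cat", "mat"]:
    print(word, round(tf_idf(word, tokenized[0]), 3))
```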
