Lecture 4: Classification (Naïve Bayes)


Classification : Naïve Bayes

Adama Science and Technology University


School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2020)
Outline

 What is Naïve Bayes?
 Pros and Cons of Naïve Bayes
 Probability theory
 Conditional probability
 Naïve Bayes classification

Probability Theory : Naïve Bayes

 In both kNN and decision trees, we asked the classifiers to make hard decisions.
 We asked for a definite answer to the question.
 Asking the classifier for its best guess about the class, together with a probability for that guess, is often better.

 Probability theory forms the basis for many machine learning algorithms.

 Probability theory can help us classify things.

Probability Theory : Naïve Bayes

Classifying with Bayesian Decision Theory:


 Pros:
 Works with a small amount of data; handles multiple classes.
 Cons:
 Sensitive to how the input data is prepared.

 Works with:
 Nominal values

Probability Theory : Naïve Bayes

 Naïve Bayes is a subset of Bayesian Decision Theory.

 The decision tree wouldn't be very successful, and kNN would require a lot of calculations compared to the simple probability calculation.

 Conditional Probability:
 P(gray | bucket B) = P(gray and bucket B) / P(bucket B)

Probability Theory : Naïve Bayes

Figure 1: Seven stones in two buckets

Probability Theory : Naïve Bayes

 Conditional Probability:
 Calculating the probability of a gray stone, given that the unknown stone comes from bucket B:
 P(gray | bucket B) = 1/3
 P(gray | bucket A) = 2/4
 To formalize how to calculate the conditional probability, we can say:
 P(gray | bucket B) = P(gray and bucket B) / P(bucket B)
 P(gray and bucket B) = 1/7 (one gray stone in bucket B, out of seven stones in total)
 P(bucket B) = 3/7 (three of the seven stones are in bucket B)

Probability Theory : Naïve Bayes

 Conditional Probability:
 P(gray | bucket B) = P(gray and bucket B) / P(bucket B)
 P(gray | bucket B) = (1/7) / (3/7)
 P(gray | bucket B) = 1/3

 Another useful way to manipulate conditional probabilities is known as Bayes’ rule.


 If we have P(x | c) but want P(c | x), Bayes' rule tells us how to get it:
 P(c | x) = P(x | c) P(c) / P(x)
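For illustration, here is a minimal Python sketch of the stone calculation above. The stone list is reconstructed from the numbers on these slides (two gray and two black stones in bucket A, one gray and two black in bucket B), and the variable names are my own:

# Seven stones from Figure 1, listed as (color, bucket).
stones = [("gray", "A"), ("gray", "A"), ("black", "A"), ("black", "A"),
          ("gray", "B"), ("black", "B"), ("black", "B")]

total = len(stones)                                                           # 7 stones
p_gray_and_B = sum(1 for c, b in stones if c == "gray" and b == "B") / total  # 1/7
p_B = sum(1 for _, b in stones if b == "B") / total                           # 3/7

# Conditional probability: P(gray | bucket B) = P(gray and bucket B) / P(bucket B)
p_gray_given_B = p_gray_and_B / p_B
print(p_gray_given_B)   # 0.333... = 1/3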

Classifying with Conditional Probabilities
 Bayesian decision theory tells us to compare two probabilities:
 If p1(x, y) > p2(x, y), then the class is 1.
 If p1(x, y) < p2(x, y), then the class is 2.

 But what we really need to compare are p(c1 | x, y) and p(c2 | x, y):


 Given a point identified as (x, y), what is the probability it came from class c1?
 What is the probability it came from class c2?

Classifying with Conditional Probabilities

 Posterior = (likelihood * prior) / evidence

 With these definitions, we can define the Bayesian classification rule:


 If P(c1 | x, y) > P(c2 | x, y), the class is c1.
 If P(c1 | x, y) < P(c2 | x, y), the class is c2.
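A minimal sketch of this decision rule, assuming the likelihood, prior, and evidence values are already available as numbers (the function and argument names are my own, and the sample values are purely illustrative):

def bayes_classify(likelihood_c1, prior_c1, likelihood_c2, prior_c2, evidence):
    # Posterior = (likelihood * prior) / evidence
    posterior_c1 = likelihood_c1 * prior_c1 / evidence
    posterior_c2 = likelihood_c2 * prior_c2 / evidence
    # If P(c1 | x, y) > P(c2 | x, y), the class is c1; otherwise c2.
    return "c1" if posterior_c1 > posterior_c2 else "c2"

print(bayes_classify(0.30, 0.5, 0.05, 0.5, 0.175))   # -> "c1"

Since the evidence P(x, y) is the same for both classes, it cancels in the comparison and can be dropped when only the winning class is needed.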

Uses of Naïve Bayes Classification

 Application of Naïve Bayes:


 Naïve Bayes text classification
 Spam filtering
 Hybrid recommender system (Collaborative and Content based filtering)
 Online application

 Bayesian reasoning is applied to decision making and to inferential statistics that deal with probabilistic inference.
 It uses knowledge of prior events to predict future events.

Example One

Figure 2: Example training data


Example One

 X = (age = youth, income = medium, student = yes, credit_rating = fair)
 Will a person described by tuple X buy a computer?

 Maximum a Posteriori (MAP) Hypothesis:

 P(Ci | X) = P(X | Ci) P(Ci) / P(X)
 Maximize P(Ci | X); equivalently, maximize P(X | Ci) P(Ci), since P(X) is constant.

Example One

 P(C1 = yes) = P(buys_computer = yes) = 9/14 = 0.643
 P(C2 = no) = P(buys_computer = no) = 5/14 = 0.357
 P(age = youth | buys_computer = yes) = 2/9 = 0.222
 P(age = youth | buys_computer = no) = 3/5 = 0.600
 P(income = medium | buys_computer = yes) = 4/9 = 0.444
 P(income = medium | buys_computer = no) = 2/5 = 0.400
 P(student = yes | buys_computer = yes) = 6/9 = 0.667
 P(student = yes | buys_computer = no) = 1/5 = 0.200
 P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
 P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

Example One

 P(X | buys_computer = yes) = P(age = youth | buys_computer = yes) × P(income = medium | buys_computer = yes) × P(student = yes | buys_computer = yes) × P(credit_rating = fair | buys_computer = yes)

 P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044

 P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

Example One

 Find the class Ci that maximizes P(X | Ci) × P(Ci):

 P(X | buys_computer = yes) × P(buys_computer = yes)
 = 0.044 × 0.643
 = 0.028
 P(X | buys_computer = no) × P(buys_computer = no)
 = 0.019 × 0.357
 = 0.007

 Prediction: the person described by tuple X buys a computer.
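A minimal Python sketch that reproduces this calculation from the conditional probabilities listed above (the training table itself is in Figure 2 and is not re-entered here; the dictionary layout is my own):

# Priors and class-conditional probabilities taken from the slides above.
priors = {"yes": 9/14, "no": 5/14}
cond = {
    "yes": {"age=youth": 2/9, "income=medium": 4/9, "student=yes": 6/9, "credit_rating=fair": 6/9},
    "no":  {"age=youth": 3/5, "income=medium": 2/5, "student=yes": 1/5, "credit_rating=fair": 2/5},
}

X = ["age=youth", "income=medium", "student=yes", "credit_rating=fair"]

# Score each class with P(X | Ci) * P(Ci); P(X) is constant and can be ignored.
scores = {}
for c in priors:
    score = priors[c]
    for attr in X:
        score *= cond[c][attr]
    scores[c] = score

print(scores)                        # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))   # 'yes' -> tuple X buys a computer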

Example Two

 Consider a set of documents, each of which is related either to Sports (S) or to Informatics (I).

 Given a training set of 11 documents, we would like to estimate a Naïve Bayes classifier, using the Bernoulli document model, to classify unlabelled documents as S or I.

 We define a vocabulary of eight words:

Example Two

 Types of Naïve Bayes:

Example Two

Figure 3: Vocabulary of eight words

Example Two

 Thus each document is represented as an 8-dimensional binary vector.


 The training data is presented below as a matrix for each class, in which each row represents an 8-dimensional document vector.

Example Two

 Classify the following into Sports or Informatics using a Naïve Bayes classifier:
 b1 = (1, 0, 0, 1, 1, 1, 0, 1): S or I?
 b2 = (0, 1, 1, 0, 1, 0, 1, 0): S or I?

Example Two

 The total number of documents in the training set is N = 11, with NS = 6 and NI = 5.

 We can estimate the prior probabilities from the training data as:
 P(S) = 6/11
 P(I) = 5/11

Example Two

 The word counts in the training data are:

Example Two

 We can estimate the word likelihood using:
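The estimator from the original slide is not reproduced here. For the Bernoulli document model, the usual relative-frequency estimate, which is consistent with the worked numbers below, is:

P(wt | C) = nC(wt) / NC

where nC(wt) is the number of class-C training documents that contain word wt, and NC is the number of training documents in class C (6 for S and 5 for I).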

Example Two

 The word likelihood for class I:

Example Two

 To compute the posterior probabilities of the two test vectors and hence classify them:
 b1 = (1, 0, 0, 1, 1, 1, 0, 1)
 P(S | b1) ∝ P(b1 | S) × P(S)
 = (1/2 × 5/6 × 2/3 × 1/2 × 1/2 × 2/3 × 1/3 × 2/3) × (6/11)
 = 5/891 ≈ 5.6 × 10⁻³
 P(I | b1) ∝ P(b1 | I) × P(I)
 = (1/5 × 2/5 × 2/5 × 1/5 × 1/5 × 1/5 × 2/5 × 1/5) × (5/11)
 = 8/859375 ≈ 9.3 × 10⁻⁶
 Classify this document as S.

Example Two

 To compute the posterior probabilities of the two test vectors and hence classify them:
 b2 = (0, 1, 1, 0, 1, 0, 1, 0)
 P(S | b2) ∝ P(b2 | S) × P(S)
 = (1/2 × 1/6 × 1/3 × 1/2 × 1/2 × 1/3 × 2/3 × 1/3) × (6/11)
 = 12/42768 ≈ 2.8 × 10⁻⁴
 P(I | b2) ∝ P(b2 | I) × P(I)
 = (4/5 × 3/5 × 3/5 × 4/5 × 1/5 × 4/5 × 3/5 × 4/5) × (5/11)
 = 34560/4296875 ≈ 8.0 × 10⁻³
 Classify this document as I.
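A minimal Python sketch of the Bernoulli-model calculation for both test vectors. The per-word likelihoods are reconstructed from the factors shown above (the original count tables are not reproduced here), so treat them as an illustration under that assumption:

# Priors from the training set: 6 Sports and 5 Informatics documents out of 11.
priors = {"S": 6/11, "I": 5/11}

# P(wt | class) for the eight vocabulary words, reconstructed from the worked factors above.
likelihood = {
    "S": [1/2, 1/6, 1/3, 1/2, 1/2, 2/3, 2/3, 2/3],
    "I": [1/5, 3/5, 3/5, 1/5, 1/5, 1/5, 3/5, 1/5],
}

def bernoulli_score(doc, cls):
    # P(class) times, for each word, P(w | class) if present and (1 - P(w | class)) if absent.
    score = priors[cls]
    for present, p in zip(doc, likelihood[cls]):
        score *= p if present else (1 - p)
    return score

for doc in [(1, 0, 0, 1, 1, 1, 0, 1), (0, 1, 1, 0, 1, 0, 1, 0)]:
    scores = {cls: bernoulli_score(doc, cls) for cls in priors}
    print(doc, scores, "->", max(scores, key=scores.get))
# b1 -> S (5.6e-3 vs 9.3e-6); b2 -> I (2.8e-4 vs 8.0e-3)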

Naïve Bayes: Syntax

 Import the class containing the classification method:
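The code from the original slide is not reproduced above; a typical scikit-learn version is sketched below. GaussianNB is one choice (MultinomialNB and BernoulliNB are the other common variants), and the tiny training arrays are placeholder data of my own:

from sklearn.naive_bayes import GaussianNB

X_train = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]   # toy feature vectors
y_train = ["yes", "no", "no", "yes"]                          # toy class labels
X_test = [[0.5, 0.9]]

model = GaussianNB()             # create the classifier
model.fit(X_train, y_train)      # estimate priors and per-class likelihoods from the training data
print(model.predict(X_test))     # predict the most probable class for the test sample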

Summary

 Using probabilities can sometimes be more effective than using hard rules for classification.
 Bayesian probability and Bayes' rule give us a way to estimate unknown probabilities from known values.

 You can reduce the need for a lot of data by assuming conditional independence among the features in your data.

 The assumption we make is that the probability of one word doesn’t depend on any other words in the document.
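Written out for features x1, ..., xn (here, the words of a document), the conditional independence assumption says:

P(x1, x2, ..., xn | c) = P(x1 | c) × P(x2 | c) × ... × P(xn | c)

which is exactly what Examples One and Two exploit when they multiply the per-feature probabilities.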

Summary

 Despite its often-violated independence assumption, naïve Bayes is effective at classification.

 Underflow is one problem that can be addressed by using the logarithm of probabilities in your calculations.
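A minimal sketch of the log-probability fix, reusing the numbers from Example One; comparing sums of logarithms avoids multiplying many small probabilities together:

import math

# log P(Ci) + sum of log P(attribute | Ci), using the Example One probabilities.
log_score_yes = math.log(9/14) + sum(math.log(p) for p in [2/9, 4/9, 6/9, 6/9])
log_score_no = math.log(5/14) + sum(math.log(p) for p in [3/5, 2/5, 1/5, 2/5])

# The ordering of the log-scores matches the ordering of the original products.
print("yes" if log_score_yes > log_score_no else "no")   # -> "yes"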

Question & Answer

Thank You !!!

Assignment Three
 Predict the outcome for the following instance: x' = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong)

