
Naïve Bayes Classifier

(Week 8)
Dr Muhammad Atif Tahir
Professor
School of Computer Science
National University of Computing & Emerging Sciences
Karachi Campus
Review Week 7

• Decision Tree Classifier
• Information Gain
• Gini Index
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Memory based reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
Bayes Classifier
• A probabilistic framework for solving classification problems

• Conditional probability:

  $P(C \mid A) = \frac{P(A, C)}{P(A)}$

  $P(A \mid C) = \frac{P(A, C)}{P(C)}$

• Bayes theorem:

  $P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}$
Example of Bayes Theorem
• Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time
  – Prior probability of any patient having meningitis is 1/50,000
  – Prior probability of any patient having stiff neck is 1/20

• If a patient has a stiff neck, what is the probability he/she has meningitis?

  $P(M \mid S) = \frac{P(S \mid M)\, P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$
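The same calculation works as a quick Python check (a minimal sketch; the variable names are illustrative):

```python
# Bayes theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```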
Bayesian Classifiers
• Consider each attribute and class label as random variables

• Given a record with attributes (A1, A2, …, An):
  – Goal is to predict class C
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)

• Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers
• Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem:

    $P(C \mid A_1 A_2 \ldots A_n) = \frac{P(A_1 A_2 \ldots A_n \mid C)\, P(C)}{P(A_1 A_2 \ldots A_n)}$

  – Choose the value of C that maximizes P(C | A1, A2, …, An)
  – Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator is the same for every class

• How to estimate P(A1, A2, …, An | C)?
Naïve Bayes Classifier

• Assume independence among attributes Ai when the class is given:
  – P(A1, A2, …, An | C) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
  – Can estimate P(Ai | Cj) for all Ai and Cj
  – A new point is classified to Cj if P(Cj) ∏ P(Ai | Cj) is maximal, as in the sketch below
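A minimal sketch of this decision rule in Python, assuming the priors P(Cj) and conditionals P(Ai | Cj) have already been estimated (all names here are illustrative):

```python
def naive_bayes_predict(record, priors, cond_probs):
    """Return the class C that maximizes P(C) * prod_i P(A_i | C).

    record:     dict mapping attribute name -> observed value
    priors:     dict mapping class label -> P(C)
    cond_probs: dict mapping class label -> {(attribute, value): P(value | C)}
    """
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = prior
        for attr, value in record.items():
            # Unseen (attribute, value) pairs get probability 0 here;
            # see the Laplace / m-estimate correction later in the slides.
            score *= cond_probs[c].get((attr, value), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```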
How to Estimate Probabilities from Data?

  Tid  Refund  Marital Status  Taxable Income  Evade
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, and Evade is the class.)

• Class: P(C) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

• For discrete attributes: P(Ai | Ck) = |Aik| / Nck
  – where |Aik| is the number of instances having attribute value Ai and belonging to class Ck
  – Examples:
    P(Status=Married | No) = 4/7
    P(Refund=Yes | Yes) = 0
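These counts are straightforward to reproduce; below is a minimal sketch over the table above (the data is hard-coded purely for illustration):

```python
from collections import Counter

# (Refund, Marital Status, Evade) for the 10 training records above
records = [
    ("Yes", "Single",   "No"),  ("No",  "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married",  "No"),
    ("No",  "Divorced", "Yes"), ("No",  "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No",  "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No",  "Single",   "Yes"),
]

class_counts = Counter(evade for _, _, evade in records)
p_no = class_counts["No"] / len(records)                 # P(No) = 7/10

married_no = sum(1 for _, status, evade in records
                 if status == "Married" and evade == "No")
p_married_given_no = married_no / class_counts["No"]     # 4/7
print(p_no, p_married_given_no)
```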
How to Estimate Probabilities from Data?

• For continuous attributes:
  – Discretize the range into bins (see the sketch below)
    • one ordinal attribute per bin
    • violates the independence assumption
  – Two-way split: (A < v) or (A > v)
    • choose only one of the two splits as the new attribute
  – Probability density estimation:
    • assume the attribute follows a normal distribution
    • use data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    • once the probability distribution is known, it can be used to estimate the conditional probability P(Ai | c)
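A minimal sketch of the discretization option, assuming equal-width bins (the bin count of 4 is arbitrary here):

```python
import numpy as np

# Taxable Income values from the training table (in thousands)
incomes = np.array([125, 100, 70, 120, 95, 60, 220, 85, 75, 90])

# Equal-width discretization: the continuous attribute becomes an
# ordinal one, with one bin index per record.  Using only the interior
# edges so every value, including the endpoints, lands in bins 0..3.
edges = np.linspace(incomes.min(), incomes.max(), num=5)  # 5 edges -> 4 bins
bin_index = np.digitize(incomes, edges[1:-1])
print(bin_index)
```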
How to Estimate Probabilities from Data?

• Normal distribution:

  $P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \; e^{-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}}$

  – one pair of parameters (μij, σij) for each (Ai, ci) pair

• For (Income, Class=No), using the training table above:
  – sample mean = 110
  – sample variance = 2975

  $P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)} \; e^{-\frac{(120 - 110)^2}{2(2975)}} = 0.0072$
Example of Naïve Bayes Classifier

Given a test record:

  X = (Refund = No, Married, Income = 120K)

Naïve Bayes classifier probabilities estimated from the training data:

  P(Refund=Yes | No) = 3/7          P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0           P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7
  P(Marital Status=Divorced | No) = 1/7
  P(Marital Status=Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/7
  P(Marital Status=Divorced | Yes) = 1/7
  P(Marital Status=Married | Yes) = 0

  For Taxable Income:
    If class = No:  sample mean = 110, sample variance = 2975
    If class = Yes: sample mean = 90,  sample variance = 25

• P(X | Class=No) = P(Refund=No | Class=No)
                    × P(Married | Class=No)
                    × P(Income=120K | Class=No)
                  = 4/7 × 4/7 × 0.0072 = 0.0024

• P(X | Class=Yes) = P(Refund=No | Class=Yes)
                     × P(Married | Class=Yes)
                     × P(Income=120K | Class=Yes)
                   = 1 × 0 × 1.2 × 10⁻⁹ = 0

Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X)
  => Class = No
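The same comparison in Python (a minimal sketch; the numbers are taken directly from the tables above, with the Gaussian estimate for Income computed as on the previous slide):

```python
# X = (Refund=No, Marital Status=Married, Income=120K)
p_no, p_yes = 7 / 10, 3 / 10          # class priors

# P(X | No):  P(Refund=No|No) * P(Married|No) * P(Income=120|No)
p_x_no = (4 / 7) * (4 / 7) * 0.0072   # = 0.0024

# P(X | Yes): P(Refund=No|Yes) * P(Married|Yes) * P(Income=120|Yes)
p_x_yes = 1.0 * 0.0 * 1.2e-9          # = 0

prediction = "No" if p_x_no * p_no > p_x_yes * p_yes else "Yes"
print(prediction)                     # => No
```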
Naïve Bayes Classifier

• If one of the conditional probabilities is zero, the entire expression becomes zero

• Probability estimation:

  Original:    $P(A_i \mid C) = \frac{N_{ic}}{N_c}$

  Laplace:     $P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + c}$

  m-estimate:  $P(A_i \mid C) = \frac{N_{ic} + mp}{N_c + m}$

  where c is the number of classes, p is the prior probability, and m is a parameter
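A minimal sketch of the three estimators (N_ic and N_c are counts from the training data; the function names are illustrative, and m and p must be supplied by the user):

```python
def original_estimate(n_ic, n_c):
    """P(Ai | C) = N_ic / N_c -- zero if the combination never occurs."""
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, c):
    """Laplace correction; c is the number of classes (as on the slide)."""
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    """m-estimate; p is the prior probability, m a user-chosen parameter."""
    return (n_ic + m * p) / (n_c + m)

# P(Refund=Yes | Yes) was 0/3 in the example; smoothing makes it non-zero:
print(original_estimate(0, 3))        # 0.0
print(laplace_estimate(0, 3, 2))      # 0.2
print(m_estimate(0, 3, m=3, p=1/3))   # ~0.167
```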
Example of Naïve Bayes Classifier

  Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
  human          yes         no       no             yes        mammals
  python         no          no       no             no         non-mammals
  salmon         no          no       yes            no         non-mammals
  whale          yes         no       yes            no         mammals
  frog           no          no       sometimes      yes        non-mammals
  komodo         no          no       no             yes        non-mammals
  bat            yes         yes      no             yes        mammals
  pigeon         no          yes      no             yes        non-mammals
  cat            yes         no       no             yes        mammals
  leopard shark  yes         no       yes            no         non-mammals
  turtle         no          no       sometimes      yes        non-mammals
  penguin        no          no       sometimes      yes        non-mammals
  porcupine      yes         no       no             yes        mammals
  eel            no          no       yes            no         non-mammals
  salamander     no          no       sometimes      yes        non-mammals
  gila monster   no          no       no             yes        non-mammals
  platypus       no          no       no             yes        mammals
  owl            no          yes      no             yes        non-mammals
  dolphin        yes         no       yes            no         mammals
  eagle          no          yes      no             yes        non-mammals

A: attributes, M: mammals, N: non-mammals

Test record:

  Give Birth  Can Fly  Live in Water  Have Legs  Class
  yes         no       yes            no         ?

  P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
  P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

  P(A | M) P(M) = 0.06 × 7/20 = 0.021
  P(A | N) P(N) = 0.0042 × 13/20 = 0.0027

Since P(A | M) P(M) > P(A | N) P(N) => Mammals
Exercise

Test record:

  Give Birth  Can Fly  Live in Water  Have Legs  Class
  yes         yes      yes            yes        ?

  P(A | M) = 6/7 × 1/7 × 2/7 × 5/7 = 0.025
  P(A | N) = 1/13 × 3/13 × 3/13 × 9/13 = 0.0028

  P(A | M) P(M) = 0.025 × 7/20 = 0.0088
  P(A | N) P(N) = 0.0028 × 13/20 = 0.0018

Since P(A | M) P(M) > P(A | N) P(N) => Mammals
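Both the worked example and this exercise can be checked with a short script (a minimal sketch; the dataset is hard-coded from the table above, with M/N abbreviating mammals/non-mammals, and the function name is illustrative):

```python
# (give_birth, can_fly, live_in_water, have_legs, class)
data = [
    ("yes", "no",  "no",        "yes", "M"), ("no",  "no",  "no",        "no",  "N"),
    ("no",  "no",  "yes",       "no",  "N"), ("yes", "no",  "yes",       "no",  "M"),
    ("no",  "no",  "sometimes", "yes", "N"), ("no",  "no",  "no",        "yes", "N"),
    ("yes", "yes", "no",        "yes", "M"), ("no",  "yes", "no",        "yes", "N"),
    ("yes", "no",  "no",        "yes", "M"), ("yes", "no",  "yes",       "no",  "N"),
    ("no",  "no",  "sometimes", "yes", "N"), ("no",  "no",  "sometimes", "yes", "N"),
    ("yes", "no",  "no",        "yes", "M"), ("no",  "no",  "yes",       "no",  "N"),
    ("no",  "no",  "sometimes", "yes", "N"), ("no",  "no",  "no",        "yes", "N"),
    ("no",  "no",  "no",        "yes", "M"), ("no",  "yes", "no",        "yes", "N"),
    ("yes", "no",  "yes",       "no",  "M"), ("no",  "yes", "no",        "yes", "N"),
]

def score(record, cls):
    """P(class) * prod_i P(A_i | class), estimated by counting."""
    rows = [r for r in data if r[-1] == cls]
    s = len(rows) / len(data)                      # class prior
    for i, value in enumerate(record):
        s *= sum(1 for r in rows if r[i] == value) / len(rows)
    return s

for record in [("yes", "no", "yes", "no"), ("yes", "yes", "yes", "yes")]:
    label = "M" if score(record, "M") > score(record, "N") else "N"
    print(record, "->", label)     # both records come out as mammals
```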
Naïve Bayes (Summary)

• Robust to isolated noise points

• Handles missing values by ignoring the instance during probability estimate calculations

• Robust to irrelevant attributes

• The independence assumption may not hold for some attributes
  – Use other techniques such as Bayesian Belief Networks (BBN)
References

• Introduction to Data Mining by Tan, Steinbach, Kumar (Lecture Slides)
• http://robotics.stanford.edu/~ronnyk/glossary.html
• http://www.cs.tufts.edu/comp/135/Handouts/introduction-lecture-12-handout.pdf
• http://en.wikipedia.org/wiki/Confusion_matrix
• http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html
Questions!
