Naïve Bayes Classifier
Ke Chen
• Background
• Probability Basics
• Probabilistic Classification
• Naïve Bayes
– Principle and Algorithms
– Example: Play Tennis
• Relevant Issues
• Summary
COMP24111 Machine Learning
Background
• There are three broad approaches to building a classifier
a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class memberships given input data
Example: perceptron with the cross-entropy cost
c) Make a probabilistic model of data within each class
Examples: naive Bayes, model based classifiers
• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification
Probability Basics
• Prior, conditional and joint probability for random variables
– Prior probability: $P(X)$
– Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$
– Joint probability: $X = (X_1, X_2)$, $P(X) = P(X_1, X_2)$
– Relationship: $P(X_1, X_2) = P(X_2 \mid X_1)\,P(X_1) = P(X_1 \mid X_2)\,P(X_2)$
– Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, $P(X_1, X_2) = P(X_1)\,P(X_2)$
• Bayes' rule: $P(C \mid X) = \dfrac{P(X \mid C)\,P(C)}{P(X)}$, i.e. posterior $\propto$ likelihood $\times$ prior
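As a quick numeric illustration of Bayes' rule (all numbers are made up for this sketch): suppose a condition has 1% prevalence, and a test has 90% sensitivity and a 5% false-positive rate.

```python
# Bayes' rule: posterior = likelihood * prior / evidence.
prior = 0.01                    # P(C)
like_given_c = 0.90             # P(X | C)
like_given_not_c = 0.05         # P(X | not C)

# Evidence P(X) by the law of total probability.
evidence = like_given_c * prior + like_given_not_c * (1 - prior)
posterior = like_given_c * prior / evidence   # P(C | X)
print(round(posterior, 3))  # -> 0.154
```

Even with a fairly accurate test, the small prior keeps the posterior well below one half.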
Probability Basics
• Quiz: We have two fair six-sided dice. When they are rolled, consider
the following events: (A) die 1 lands on side “3”, (B) die 2 lands
on side “1”, and (C) the two dice sum to eight. Answer the following questions:
1) P(A) = ?
2) P(B) = ?
3) P(C) = ?
4) P(A | B) = ?
5) P(C | A) = ?
6) P(A, B) = ?
7) P(A, C) = ?
8) Is P(A, C) equal to P(A)P(C)?
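One way to check your answers is to enumerate all 36 equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction

# All 36 equally likely outcomes of two fair six-sided dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Probability of an event, given as a predicate over an outcome pair."""
    hits = [o for o in outcomes if event(o)]
    return Fraction(len(hits), len(outcomes))

A = lambda o: o[0] == 3          # die 1 shows 3
B = lambda o: o[1] == 1          # die 2 shows 1
C = lambda o: o[0] + o[1] == 8   # the two dice sum to 8

p_A = prob(A)                         # 1/6
p_C = prob(C)                         # 5/36: (2,6),(3,5),(4,4),(5,3),(6,2)
p_AC = prob(lambda o: A(o) and C(o))  # 1/36: only (3,5)
print(p_A, p_C, p_AC, p_A * p_C)
```

Note that P(A, C) = 1/36 while P(A)P(C) = 5/216, so A and C are not independent, whereas A and B are.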
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model: model the posterior $P(C \mid X)$ directly, where $C \in \{c_1, \ldots, c_L\}$ and $X = (X_1, \ldots, X_n)$.
For an input $x = (x_1, \ldots, x_n)$, the discriminative probabilistic classifier outputs $P(c_1 \mid x), P(c_2 \mid x), \ldots, P(c_L \mid x)$.
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
– Generative model: model the class-conditional $P(X \mid C)$ within each class, where $C \in \{c_1, \ldots, c_L\}$ and $X = (X_1, \ldots, X_n)$.
For an input $x = (x_1, \ldots, x_n)$, a generative probabilistic classifier produces the likelihoods $P(x \mid c_1), P(x \mid c_2), \ldots, P(x \mid c_L)$.
Probabilistic Classification
• MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to $c^*$ if
$P(C = c^* \mid X = x) > P(C = c \mid X = x)$ for all $c \neq c^*$, $c \in \{c_1, \ldots, c_L\}$
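The rule is just an argmax over classes; a minimal sketch, where `posterior` is a hypothetical function returning P(C = c | X = x):

```python
def map_classify(posterior, classes, x):
    """MAP rule: assign x to the class c* maximising P(C = c | X = x)."""
    return max(classes, key=lambda c: posterior(c, x))

# Toy usage with made-up posterior values:
toy_posterior = lambda c, x: {'yes': 0.4, 'no': 0.6}[c]
print(map_classify(toy_posterior, ['yes', 'no'], None))  # -> no
```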
Naïve Bayes
• Bayes classification
$P(C \mid X) \propto P(X \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$
– Naïve Bayes adds the conditional independence assumption $P(X_1, \ldots, X_n \mid C) = \prod_{j=1}^{n} P(X_j \mid C)$, so only per-feature conditionals need to be estimated.
Naïve Bayes
• Algorithm: Discrete-Valued Features
– Learning Phase: Given a training set S,
For each target value $c_i$ ($c_i \in \{c_1, \ldots, c_L\}$):
$\hat P(C = c_i) \leftarrow$ estimate $P(C = c_i)$ from the examples in S;
For every feature value $x_{jk}$ of each feature $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
$\hat P(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ from the examples in S;
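The learning phase can be sketched with plain frequency counts (no smoothing); the function name and the `(features_dict, label)` data layout are illustrative, not from the slides:

```python
from collections import Counter
from fractions import Fraction

def train_naive_bayes(examples):
    """Learning phase for discrete-valued features.

    examples: list of (features_dict, label) pairs.
    Returns (priors, conditionals) as exact frequency estimates.
    """
    n = len(examples)
    label_counts = Counter(label for _, label in examples)

    # Prior estimates P^(C = c_i).
    priors = {c: Fraction(k, n) for c, k in label_counts.items()}

    # Co-occurrence counts for each (feature, value, label) triple.
    joint_counts = Counter()
    for feats, label in examples:
        for feature, value in feats.items():
            joint_counts[(feature, value, label)] += 1

    # Conditional estimates P^(X_j = x_jk | C = c_i).
    conditionals = {
        (f, v, c): Fraction(k, label_counts[c])
        for (f, v, c), k in joint_counts.items()
    }
    return priors, conditionals
```

In practice a smoothed estimate (e.g. Laplace smoothing) is preferred, so unseen feature values do not force a zero probability.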
Example
• Example: Play Tennis
Example
• Learning Phase
Outlook    Play=Yes  Play=No
Sunny      2/9       3/5
Overcast   4/9       0/5
Rain       3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5
Example
• Test Phase
– Given a new instance, predict its label
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase:
P(Outlook=Sunny|Play=Yes) = 2/9 P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9 P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9 P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9 P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14 P(Play=No) = 5/14
– Apply the MAP rule:
P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since 0.0206 > 0.0053, the MAP rule labels x’ with Play=No.
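The two scores are easy to reproduce from the lookup tables (they are unnormalised posteriors, so they need not sum to one); a quick check:

```python
# Entries copied from the learned lookup tables above.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # P(x'|Yes) P(Play=Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # P(x'|No)  P(Play=No)
print(round(p_yes, 4), round(p_no, 4))  # -> 0.0053 0.0206
# p_no > p_yes, so the MAP rule predicts Play = No for x'.
```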
Naïve Bayes
• Algorithm: Continuous-Valued Features
– A continuous feature can take uncountably many values, so frequencies cannot be counted
– The conditional probability is therefore often modelled with the normal (Gaussian) distribution:
$\hat P(X_j \mid C = c_i) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\dfrac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$
$\mu_{ji}$: mean (average) of the values of feature $X_j$ over examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the values of feature $X_j$ over examples for which $C = c_i$
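A direct transcription of the density above, plus an illustrative helper (`fit_gaussian` is an assumed name, not from the slides) for estimating the per-class mean and standard deviation:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Normal density, used as the class-conditional estimate P^(X_j = x | C = c_i)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussian(values):
    """Estimate (mu_ji, sigma_ji) from the X_j values of examples in one class."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)
```

Note this returns a density, not a probability, so values above 1 are possible for small sigma; that is fine for the MAP rule, which only compares relative magnitudes.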
Exercise
• Using naïve Bayes, classify buys_computer for the case:
(age=youth, income=medium, student=yes, credit_rating=fair).
Thank You