Lecture 5: Naïve Bayes
Hanaa Bayomi
Updated by: Prof. Abeer ElKorany
Bayes' Theorem and the MAP Decision Rule

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

The classifier picks the class with the largest posterior (the maximum a posteriori, or MAP, class). Since $P(x_1, x_2, \ldots, x_n)$ is the same for every class, it can be dropped:

$$c_{MAP} = \arg\max_{c_j \in C} \frac{P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \ldots, x_n)} = \arg\max_{c_j \in C} P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)$$
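As a minimal sketch of this decision rule (the priors and likelihoods below are invented for illustration, not taken from the lecture), the MAP class is just an argmax of prior times likelihood:

```python
# Minimal sketch of the MAP rule: c_MAP = argmax_c P(X | c) P(c).
priors = {"yes": 0.6, "no": 0.4}          # assumed P(c)
likelihoods = {"yes": 0.02, "no": 0.05}   # assumed P(x1,...,xn | c) for one test instance

# P(X) is the same for every class, so it can be dropped from the argmax.
c_map = max(priors, key=lambda c: likelihoods[c] * priors[c])
print(c_map)  # -> "no", since 0.05 * 0.4 = 0.020 > 0.02 * 0.6 = 0.012
```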
The Naïve Bayes Model
▪ The Naïve Bayes assumption: the effect of a predictor value (X) on a given class (C) is independent of the values of the other predictors.
▪ This assumption is called class conditional independence:

$$P(x_1, x_2, \ldots, x_n \mid C) = P(x_1 \mid C)\,P(x_2 \mid C) \cdots P(x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$
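For instance, taking two of the attributes from the tax-evasion table used later in this lecture, the assumption factors a joint likelihood into per-attribute terms:

$$P(\text{Refund}=\text{No},\ \text{Status}=\text{Married} \mid C) = P(\text{Refund}=\text{No} \mid C)\cdot P(\text{Status}=\text{Married} \mid C)$$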
Naïve Bayes Algorithm
• The Naïve Bayes algorithm (for discrete input attributes) has two phases.
– 1. Learning Phase: given a training set S, learning is easy; just build probability tables.

For each target value $c_i$ ($c_i = c_1, \ldots, c_L$):
  $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ from the examples in S
For every attribute value $x_{jk}$ of each attribute $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
  $\hat{P}(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ from the examples in S

Output: conditional probability tables; for each $X_j$, a table with $N_j \times L$ elements.

– 2. Test Phase: classification is easy; just look up the tables and multiply probabilities. Given an unknown instance $X' = (a_1, \ldots, a_n)$, assign it the label $c^*$ if

$$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) \;>\; [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c), \qquad c \neq c^*,\; c = c_1, \ldots, c_L$$
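A minimal Python sketch of both phases (the function and variable names are illustrative, not from the lecture): the learning phase only counts, and the test phase only multiplies looked-up probabilities.

```python
from collections import Counter, defaultdict

def learn_naive_bayes(examples):
    """Learning phase: build the P(C = c) and P(Xj = v | C = c) tables by counting.

    examples: list of (attribute_tuple, class_label) pairs.
    """
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    prior = {c: class_counts[c] / n for c in class_counts}        # P(C = c)

    value_counts = defaultdict(Counter)   # (j, c) -> Counter of attribute-j values
    for x, c in examples:
        for j, v in enumerate(x):
            value_counts[(j, c)][v] += 1
    cond = {jc: {v: cnt / class_counts[jc[1]] for v, cnt in counts.items()}
            for jc, counts in value_counts.items()}               # P(Xj = v | C = c)
    return prior, cond

def classify(x, prior, cond):
    """Test phase: look up the tables, multiply, and take the argmax over classes."""
    def score(c):
        s = prior[c]
        for j, v in enumerate(x):
            s *= cond[(j, c)].get(v, 0.0)  # 0 for unseen values; see smoothing later
        return s
    return max(prior, key=score)
```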
How to Estimate Probabilities from Data?

Tid  Refund  Marital Status  Taxable Income  Evade
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, and Evade is the class.)

▪ Class prior: $P(C) = N_c / N$
  e.g., P(No) = 7/10, P(Yes) = 3/10
▪ For discrete attributes/features: $P(A_i \mid C_k) = |A_{ik}| / N_c$
  where $|A_{ik}|$ is the number of instances having attribute value $A_i$ and belonging to class $C_k$
▪ Examples:
  P(Status=Married | No) = 4/7
  P(Refund=Yes | Yes) = 0
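Running the `learn_naive_bayes` sketch from above on the two categorical columns of this table reproduces the slide's estimates (the continuous Income column is handled later with Gaussians):

```python
rows = [(1, "Yes", "Single",   "No"), (2, "No", "Married", "No"),
        (3, "No", "Single",    "No"), (4, "Yes", "Married", "No"),
        (5, "No", "Divorced", "Yes"), (6, "No", "Married", "No"),
        (7, "Yes", "Divorced", "No"), (8, "No", "Single", "Yes"),
        (9, "No", "Married",   "No"), (10, "No", "Single", "Yes")]
examples = [((refund, status), evade) for _, refund, status, evade in rows]
prior, cond = learn_naive_bayes(examples)

print(prior["No"])                        # 0.7       -> P(No) = 7/10
print(cond[(1, "No")]["Married"])         # 0.5714... -> P(Status=Married | No) = 4/7
print(cond[(0, "Yes")].get("Yes", 0.0))   # 0.0       -> P(Refund=Yes | Yes) = 0
```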
Example 1: Play Tennis

• Learning Phase (see the sketch below)
[Learning-phase frequency and conditional-probability tables for the Play-Tennis training data]
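The slide's tables are not reproduced here; as a hedged reconstruction, the sketch below builds the same learning-phase tables from the standard 14-example Play-Tennis dataset (Mitchell, 1997) that this example conventionally uses, reusing `learn_naive_bayes` from above:

```python
# Standard 14-example Play-Tennis data (assumed from the textbook version of the
# example); attributes are (Outlook, Temperature, Humidity, Wind), class PlayTennis.
tennis = [
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Strong"), "No"),
]
prior, cond = learn_naive_bayes(tennis)
print(prior)                                 # P(Yes) = 9/14, P(No) = 5/14
print(cond[(0, "Yes")]["Sunny"])             # P(Outlook=Sunny | Yes) = 2/9
print(cond[(0, "No")].get("Overcast", 0.0))  # 0.0 -- the zero-probability case below
```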
Naïve Bayes: Continuous Features

▪ $P(X_i \mid Y)$ is Gaussian.
▪ Training: estimate the class-conditional mean and standard deviation:
  $\mu_i = E[X_i \mid Y = y]$
  $\sigma_i^2 = E[(X_i - \mu_i)^2 \mid Y = y]$

 X1    X2   X3   Y
 2.0   3.0  1.0  1
-1.2   2.0  0.4  1
 1.2   0.3  0.0  0
 2.2   1.1  0.0  1
▪ Worked estimates for $X_1$ on this table:

$\mu_{11} = E[X_1 \mid Y = 1] = \dfrac{2 + (-1.2) + 2.2}{3} = 1$

$\mu_{10} = E[X_1 \mid Y = 0] = \dfrac{1.2}{1} = 1.2$

$\sigma_{11}^2 = E[(X_1 - \mu_{11})^2 \mid Y = 1] = \dfrac{(2-1)^2 + (-1.2-1)^2 + (2.2-1)^2}{3} = 2.43$

$\sigma_{10}^2 = E[(X_1 - \mu_{10})^2 \mid Y = 0] = \dfrac{(1.2-1.2)^2}{1} = 0$
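A quick check of these numbers (a sketch; `gaussian_params` is an illustrative helper that uses the slide's $1/N$ population variance):

```python
# Per-class mean/variance estimation for X1 from the toy table above.
data = [((2.0, 3.0, 1.0), 1), ((-1.2, 2.0, 0.4), 1),
        ((1.2, 0.3, 0.0), 0), ((2.2, 1.1, 0.0), 1)]

def gaussian_params(data, feature, label):
    vals = [x[feature] for x, y in data if y == label]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, var

print(gaussian_params(data, 0, 1))  # (1.0, 2.4267) -> mu_11 = 1, sigma11^2 = 2.43
print(gaussian_params(data, 0, 0))  # (1.2, 0.0)    -> mu_10 = 1.2, sigma10^2 = 0
```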
Naïve Bayes: Continuous-Valued Features

• Example: Temperature is naturally continuous-valued.
  Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
  No: 27.3, 30.1, 17.4, 29.5, 15.1
• Estimate the mean and variance for each class:

$\mu = \dfrac{1}{N}\sum_{n=1}^{N} x_n, \qquad \sigma^2 = \dfrac{1}{N}\sum_{n=1}^{N} (x_n - \mu)^2$

$\mu_{Yes} = 21.64,\ \sigma_{Yes} = 2.35 \qquad \mu_{No} = 23.88,\ \sigma_{No} = 7.09$

• Learned class-conditional densities:

$\hat{P}(x \mid Yes) = \dfrac{1}{2.35\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - 21.64)^2}{2 \cdot 2.35^2}\right) = \dfrac{1}{2.35\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - 21.64)^2}{11.09}\right)$

$\hat{P}(x \mid No) = \dfrac{1}{7.09\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - 23.88)^2}{2 \cdot 7.09^2}\right) = \dfrac{1}{7.09\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - 23.88)^2}{100.5}\right)$
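A small sketch that evaluates these learned densities (`gaussian_pdf` is an illustrative helper; the test temperature 23.0 is an assumed value, not from the lecture):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Class-conditional densities using the slide's estimates.
print(gaussian_pdf(23.0, 21.64, 2.35))  # P_hat(Temperature = 23 | Yes) ~ 0.144
print(gaussian_pdf(23.0, 23.88, 7.09))  # P_hat(Temperature = 23 | No)  ~ 0.056
```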
Zero Conditional Probability

• If no training example of class $c_i$ contains a given attribute value $a_{jk}$, then $\hat{P}(a_{jk} \mid c_i) = 0$ and the whole product collapses to zero at test time:

$\hat{P}(x_1 \mid c_i) \cdots \hat{P}(a_{jk} \mid c_i) \cdots \hat{P}(x_n \mid c_i) = 0 \quad \text{for } x_j = a_{jk}$

• Fix: smooth the estimate with the m-estimate:

$\hat{P}(a_{jk} \mid c_i) = \dfrac{n_c + m\,p}{n + m}$

where
  $n_c$: number of training examples for which $x_j = a_{jk}$ and $c = c_i$
  $n$: number of training examples for which $c = c_i$
  $p$: prior estimate (usually $p = 1/t$ for $t$ possible values of $x_j$)
  $m$: weight given to the prior (the number of "virtual" examples, $m \geq 1$)
• Example: P(Outlook=Overcast | no) = 0 in the Play-Tennis dataset.
  – Fix it by adding m "virtual" examples (m is typically chosen up to 1% of the number of training examples).
  – In this dataset the "no" class has n = 5 training examples, and Outlook takes t = 3 values, so p = 1/3 and $\hat{P}(\text{Overcast} \mid \text{no}) = \dfrac{0 + m/3}{5 + m} > 0$.
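A one-line check of the m-estimate on this example (`m_estimate` is an illustrative helper; m = 1 is an assumed prior weight):

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# P(Outlook=Overcast | no): n_c = 0, n = 5, p = 1/3 (Outlook takes 3 values).
print(m_estimate(0, 5, 1 / 3, 1))  # 0.0556 -- no longer zero
```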
Conclusion
▪ Naïve Bayes is based on the class conditional independence assumption.
▪ Training is very easy and fast: it only requires estimating, for each class separately, the probability of each attribute value.
▪ Testing is straightforward: just look up the tables, or evaluate the conditional probabilities with normal distributions for continuous features.