Professional Documents
Culture Documents
Naive Ba Yes
Naive Ba Yes
Naive Ba Yes
Terminology/Notation Primer
X and Y (two different variables)
Joint probability: P(X=x, Y=y)
The probability that variable X takes on the value x and
variable Y has the value y
Conditional probability: P( Y=y | X=x )
Probability that variable Y has the value y, given that
variable X takes on the value x
Given that I’m observed with an umbrella, what’s the probability that it will rain today?
Terminology/Notation Primer
Single Probability:
“X has the value x”
Joint Probability:
“X and Y”
Conditional Probability:
“Y” given observation of “X”
Relation of Joint and Conditional Probabilities:
Terminology/Notation Primer
Bayes’ Theorem:
Predicted Probability Example
Scenario:
1. A doctor knows that meningitis causes a stiff neck 50% of the
time
2. Prior probability of any patient having meningitis is 1/50,000
3. Prior probability of any patient having a stiff neck is 1/20
If a patient has a stiff neck, what’s the probability that they
have meningitis?
• A doctor knows that meningitis causes a stiff neck 50% of the
Predicted Probability Example
time
• Prior probability of any patient having meningitis is 1/50,000
• Prior probability of any patient having a stiff neck is 1/20
Apply Bayes’ Rule:
If a patient has a stiff neck, what’s the probability that they
have meningitis?
Interested in:
“given Z, what is
the joint
probability of X
and Y?”
Naïve Bayes Classifier
Before (simple Bayes’ rule):
Single predictor variable X
Example: will compute P(E=Yes | Status, Income, Refund) and P(E=No | Status, Income, Refund)
• Find which one is greater (greater likelihood)
Cannot compute / hard to compute:
Can compute from training data:
P(Evade=yes)
Tid Refund M arital Taxable
1 Yes
Status
Single
Incom e
125K
Evade
No
= 3/10
2
3
No
No
M arried
Single
100K
70K
No
No
P(Evade=no)
4
5
Yes
No
M arried
Divorced
120K
95K
No
Yes
=7/10
6 No M arried 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No M arried 75K No
10 No Single 90K Yes
10
Estimating Conditional Probabilities for
Categorical Attributes
P(Refund=yes|Evade=no)
Tid Refund M arital Taxable
Status Incom e Evade = 3/7
1 Yes Single 125K No
2 No M arried 100K No P(Status=married|Evade=yes)
3 No Single 70K No
4 Yes M arried 120K No =0/3
5 No Divorced 95K Yes
6 No M arried 60K No Yikes!
7 Yes Divorced 220K No
8 No Single 85K Yes
Will handle the 0% probability
9
10
No
No
M arried
Single
75K
90K
No
Yes
later
10
Estimating Conditional Probabilities for
Continuous Attributes
P(Refund=YES|NO) = 3/7
P(Refund=NO|NO) = 4/7
P(Refund=YES|YES) = 0/3
Tid Refund M arital Taxable
Status Incom e Evade P(Refund=NO|YES) = 3/3
1 Yes Single 125K No
P(Status=SINGLE|NO) = 2/7
2 No M arried 100K No
P(Status=DIVORSED|NO) = 1/7
3 No Single 70K No P(Status=MARRIED|NO) = 4/7
4 Yes M arried 120K No P(Status=SINGLE|YES) = 2/3
5 No Divorced 95K Yes P(Status=DIVORSED|YES) = 1/3
6 No M arried 60K No P(Status=MARRIED|YES) = 0/3
7 Yes Divorced 220K No
8 No Single 85K Yes
For taxable income:
No
P(Income=above 101K|NO) = 3/7
9 No M arried 75K
P(Income=below101K|NO) = 4/7
10 No Single 90K Yes
10
P(NO) = 7/10
P(YES) = 3/10