Naive Bayes Part 1


Classification Methods:

Naïve Bayes

Probability Problem

• A factory produces widgets on three machines: A, B, and C
• 50% are produced on A, 30% on B, and 20% on C
• 1% of widgets from A are defective
• 2% from B are defective
• 4% from C are defective
• Suppose you are given a defective widget – what is the probability
that it was produced on machine A?

Solution

• Let P(A) be the probability that a widget is made on machine A (similarly for B and C), and let D be the event that a widget is defective
• P(A) = 0.5; P(B) = 0.3; P(C) = 0.2
• P(D|A) = 0.01; P(D|B) = 0.02; P(D|C) = 0.04
• Bayes Rule: P(A|D) = P(D|A) P(A) / P(D)
• By total probability:
  P(D) = P(D|A) P(A) + P(D|B) P(B) + P(D|C) P(C)
       = 0.01×0.5 + 0.02×0.3 + 0.04×0.2 = 0.019, or 1.9%
• What is P(A|D)? 0.005/0.019 ≈ 26%
• Similarly, P(B|D) ≈ 32% and P(C|D) ≈ 42%
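
To make the arithmetic concrete, here is a minimal Python sketch of the same calculation (the variable names are ours):

```python
# Priors P(machine) and defect rates P(D | machine) from the problem.
priors = {"A": 0.50, "B": 0.30, "C": 0.20}
defect = {"A": 0.01, "B": 0.02, "C": 0.04}

# Law of total probability: P(D) = sum over machines of P(D|m) P(m).
p_d = sum(defect[m] * priors[m] for m in priors)           # 0.019

# Bayes Rule: P(m | D) = P(D | m) P(m) / P(D).
posterior = {m: defect[m] * priors[m] / p_d for m in priors}
print(posterior)  # A: ~0.263, B: ~0.316, C: ~0.421
```
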
Updating Beliefs

• Bayes theorem tells us how to update beliefs based upon conditioning information
• Probability a widget is from machine A given that it is defective
• Previously we would have assigned probability 50% that the widget was from
machine A (a priori probability)
• Then we are informed that the widget is defective
• We deduce the probability that this widget is from A is 26%
(a posteriori probability)

Application

• In the training set, the proportion of Class = 1 (Class = 0) records is the prior probability of Class = 1 (Class = 0)
• A new case with specified values for features (X) is to be labeled
• What is the class probability P(Class = 1|X)?
• Bayes Rule can be applied
• We need P(X|Class = 1) and P(X|Class = 0): the proportion of Class = 1 (Class = 0) records with features X
Bayes Classifier

• Assigns each observation to the most likely class given its feature values
• Assign a test observation with features x0 to the class j for which P(Y = j | X = x0) is largest
• The Bayes classifier minimizes the test error rate (called the Bayes error rate)
• Since the conditional distribution of Y given X is unknown, computing the Bayes classifier is impossible in practice
• Many methods attempt to estimate it
Bayes Decision Boundary

• Here the pink-blue boundary delineates the Bayes decision boundary for some given conditional probability
• Note the irreducible error (Bayes error rate) for a sample
• In general, we try to approximate this boundary from a sample

[Figure: simulated two-class data plotted on X1 and X2, with the Bayes decision boundary separating the classes]
Naïve Bayes Decision Boundary
Naïve Bayes is a classification rule based on Bayes theorem with an assumption of independence among features.

Here independence is violated and features are correlated. Uncannily similar!

[Figure: the same two-class data on X1 and X2, with the naïve Bayes decision boundary overlaid on the Bayes boundary]

Confusion Matrix:

        Yes    No
Yes      66     4
No        7    23
Example: Personal Loan Offer

• As part of customer acquisition efforts, Universal Bank wants to run a campaign encouraging current customers to purchase a loan
• In order to improve target marketing, they want to find customers
that are most likely to accept the personal loan offer
• They use data from a previous campaign on 5000 customers, out of which 480 (9.6%) accepted the personal loan offer

Personal Loan Data Description

• The data has information about the customers’ relationship with the bank, as well as some demographic information

ID                  Customer ID
Age                 Customer's age in completed years
Experience          # years of professional experience
Income              Annual income of the customer ($000)
ZIPCode             Home address ZIP code
Family              Family size of the customer
CCAvg               Avg. spending on credit cards per month ($000)
Education           Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
Mortgage            Value of house mortgage, if any ($000)
Personal Loan       Did this customer accept the personal loan offered in the last campaign?
Securities Account  Does the customer have a securities account with the bank?
CD Account          Does the customer have a certificate of deposit (CD) account with the bank?
Online              Does the customer use internet banking facilities?
CreditCard          Does the customer use a credit card issued by UniversalBank?
The Exact Bayesian classifier

• Assume (for now) that we have only categorical predictors
  • Numerical predictors could be binned
• The basic idea is simple. To classify a new record (see the sketch after this list):
  1. First find all records in the training set that have the same predictor values
  2. Determine which class is the most prevalent amongst the records you found in step 1
  3. Assign the most prevalent class to the new record
• The method can be adjusted to predict based on a cut-off value
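
A minimal Python sketch of steps 1-3, assuming training data stored as (features, label) pairs; this layout is our own assumption, not from the slides:

```python
from collections import Counter

def exact_bayes_classify(train, x_new):
    """Classify x_new with the exact Bayesian procedure above.

    train: list of (features_dict, class_label) pairs
    x_new: dict of predictor -> category for the new record
    """
    # Step 1: find all training records with the same predictor values
    # (assumes at least one matching record exists)
    matches = [label for feats, label in train
               if all(feats.get(f) == v for f, v in x_new.items())]
    # Steps 2-3: assign the most prevalent class among the matches
    return Counter(matches).most_common(1)[0][0]

# Hypothetical usage with the loan example:
# train = [({"CreditCard": 1, "Online": 1}, 0), ...]
# exact_bayes_classify(train, {"CreditCard": 1, "Online": 1})
```
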
Example: Exact Bayesian classifier
• Let’s simplify our motivating example
• Assume we only have two predictors:
• CreditCard (0/1) and Online (0/1)
• We are trying to classify records to “will accept”/ “will not
accept”
• (Y = personal loan acceptance (0/1))
• How would you classify a record for customers with
CreditCard=1, Online=1?
Count of Personal Loan by Online:

CreditCard   Personal Loan   Online=0   Online=1   Grand Total
0            0                    794       1123          1917
0            1                     70        130           200
0            Total                864       1253          2117
1            0                    323        481           804
1            1                     31         48            79
1            Total                354        529           883
Grand Total                      1218       1782          3000
Example: Exact Bayesian classifier

1. First find all records that have the same predictor values in the
training set
Count of Personal Loan by Online:

CreditCard   Personal Loan   Online=0   Online=1   Grand Total
0            0                    794       1123          1917
0            1                     70        130           200
0            Total                864       1253          2117
1            0                    323        481           804
1            1                     31         48            79
1            Total                354        529           883
Grand Total                      1218       1782          3000

   For CreditCard=1, Online=1 these are the 529 records in the Online=1 column of the CreditCard=1 rows: 481 with Personal Loan = 0 and 48 with Personal Loan = 1.

2. Determine which class is the most prevalent amongst records you found in step 1: class 0
3. Assign the most prevalent class to the new record: class 0
Assigning Probabilities

• It may be desirable to tweak the method so that it answers the question: What is an estimated probability of belonging to the class of interest?
• This allows analysis of misclassification costs, ROC curves, etc. to identify the appropriate model
• Exact Bayes model update:
  • Instead of assigning the most prevalent class, you assign the probability of class = 1
  • In our example, a new customer with CreditCard=1 and Online=1 would be assigned p = 48/529 ≈ 0.0907: the proportion of records with CreditCard=1 and Online=1 that have Personal Loan = 1 (see the sketch below)

Count of Personal Loan by Online:

CreditCard   Personal Loan   Online=0   Online=1   Grand Total
0            0                    794       1123          1917
0            1                     70        130           200
0            Total                864       1253          2117
1            0                    323        481           804
1            1                     31         48            79
1            Total                354        529           883
Grand Total                      1218       1782          3000
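
The probability version is just a ratio of counts. A minimal sketch using the pivot-table counts above (the dict encoding of the table is ours):

```python
# Counts from the table, keyed by (CreditCard, Online, Personal Loan).
counts = {
    (0, 0, 0): 794, (0, 1, 0): 1123,
    (0, 0, 1): 70,  (0, 1, 1): 130,
    (1, 0, 0): 323, (1, 1, 0): 481,
    (1, 0, 1): 31,  (1, 1, 1): 48,
}

def p_loan(cc, online):
    """Exact Bayes estimate of P(Personal Loan = 1 | CreditCard, Online)."""
    n0 = counts[(cc, online, 0)]   # matching records with PL = 0
    n1 = counts[(cc, online, 1)]   # matching records with PL = 1
    return n1 / (n0 + n1)

print(p_loan(1, 1))  # 48 / (481 + 48) = 48/529 ≈ 0.0907
```
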
Bayes Rule

• We have a record with CC = 1 and O = 1
• What is the probability P(PL = 1 | CC = 1, O = 1)?
• From Bayes Rule:

  P(PL = 1 | CC = 1, O = 1) = P(CC = 1, O = 1 | PL = 1) P(PL = 1) / P(CC = 1, O = 1)

  = P(CC = 1, O = 1 | PL = 1) P(PL = 1) / [P(CC = 1, O = 1 | PL = 1) P(PL = 1) + P(CC = 1, O = 1 | PL = 0) P(PL = 0)]

• The denominator is the proportion of records with CC=1 and O=1. The numerator is the proportion of records with CC=1, O=1, and PL=1.
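
A quick numeric check of this formula, plugging in the counts from the two-predictor table (our own verification sketch):

```python
# Class totals from the table: PL=1 has 200 + 79 = 279 records; PL=0 has 2721.
p_pl1 = 279 / 3000                 # P(PL = 1)
p_pl0 = 2721 / 3000                # P(PL = 0)
p_x_pl1 = 48 / 279                 # P(CC = 1, O = 1 | PL = 1)
p_x_pl0 = 481 / 2721               # P(CC = 1, O = 1 | PL = 0)

num = p_x_pl1 * p_pl1              # = 48/3000
den = num + p_x_pl0 * p_pl0        # = 529/3000
print(num / den)                   # 48/529 ≈ 0.0907, matching exact Bayes
```
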
Practical difficulty with Exact Bayes

• We need to have ALL the combinations of predictor categories:
  • CC=1, Online=1
  • CC=1, Online=0
  • CC=0, Online=1
  • CC=0, Online=0
• Easy in small examples with 1 or 2 predictors
• As the number of predictors increases, it is unlikely that we will have enough (or even any) data for every combination: with p binary predictors there are 2^p combinations, so 20 predictors already require over a million cells
Example with (only) 3 predictors:
CC, Online, CD account
Count of Personal Loan by CD Account and Online:

                            CD Account = 0               CD Account = 1
CreditCard  Personal Loan  Online=0  Online=1  Total   Online=0  Online=1  Total   Grand Total
0           0                   794      1116   1910          0         7      7          1917
0           1                    68       103    171          2        27     29           200
0           Total               862      1219   2081          2        34     36          2117
1           0                   320       402    722          3        79     82           804
1           1                    28         0     28          3        48     51            79
1           Total               348       402    750          6       127    133           883
Grand Total                    1210      1621   2831          8       161    169          3000

(Cells that are blank in the original pivot table, i.e., have no records, are shown as 0.)

• For CD Account=0, Online=1, CreditCard=1 there are no training records with Personal Loan = 1, so the exact Bayes estimate for that combination would be 0
Solution: The Naïve Bayes Method

• With the naïve Bayes method, we no longer restrict the probability calculation to those records that exactly match the record to be classified
• Instead we use the entire dataset
• How is this done? Answer: a probability trick!
  • Use Bayes' rule
  • Then make a simplifying assumption
  • And get a powerful classifier!
• First let's review the algorithm
• Then dive into how it is derived
The Naïve Bayes Algorithm

• Goal: to classify a new record with values X1=x1, …, Xp=xp as one of k classes (a sketch in code follows this list)
1. For class 1, find the individual probabilities that each predictor value in the record to be classified (x1, . . . , xp) occurs in class 1
   • In other words: calculate P(Xi=xi|Y=1), the proportion of the Y=1 cases that have Xi = xi
2. Multiply these probabilities by each other, then by the proportion of records belonging to class 1
   • If p1 is the proportion of records belonging to class 1
   • Then multiply P(X1=x1|Y=1) * P(X2=x2|Y=1) * … * P(Xp=xp|Y=1) * p1
   • Let's call this product PC1
3. Repeat steps 1 and 2 for all the classes
4. Estimate a probability for class i by taking the value calculated in step 2 for class i and dividing it by the sum of such values for all classes
   • P(new record belongs to class 1) = PC1 / ∑i=1..k PCi
5. Assign the record to the class with the highest probability value for this set of predictor values
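
A minimal Python sketch of steps 1-5 for categorical predictors. The data structures are our own assumption; note there is no smoothing, so a zero count in step 1 zeroes out a class:

```python
from collections import Counter, defaultdict

def naive_bayes_probs(train, x_new):
    """Estimate class probabilities for x_new per steps 1-4 above.

    train: list of (features_dict, class_label) pairs
    x_new: dict of predictor -> value for the record to classify
    """
    n = len(train)
    class_counts = Counter(label for _, label in train)
    cond = defaultdict(int)                 # (class, predictor, value) -> count
    for feats, label in train:
        for f, v in feats.items():
            cond[(label, f, v)] += 1

    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / n                         # class proportion p_c (step 2)
        for f, v in x_new.items():
            p *= cond[(c, f, v)] / n_c      # P(Xi = xi | Y = c) (step 1)
        scores[c] = p                       # the product PC_c

    total = sum(scores.values())            # step 4: normalize over classes
    return {c: s / total for c, s in scores.items()}

# Step 5: assign the highest-probability class, e.g.
# probs = naive_bayes_probs(train, {"CreditCard": 1, "Online": 1})
# predicted = max(probs, key=probs.get)
```
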