Lecture - Naive Bayesian


Classifiers

Where in the catalog should I place this product listing?


Is this email spam?
Is this politician Democrat/Republican/Green?

• Classification: assign labels to objects.


• Usually supervised: training set of pre-classified examples.
• Our examples:
• Naïve Bayesian
• Decision Trees
• (and Logistic Regression)

Module 4: Analytics Theory/Methods 1


Naïve Bayesian Classifier
• Determine the most probable class label for each object
• Based on the observed object attributes
• Naïvely assumed to be conditionally independent of each other
• Example:
• Based on the object's attributes {shape, color, weight}
• A given object that is {spherical, yellow, < 60 grams}
may be classified (labeled) as a tennis ball
• Class label probabilities are determined using Bayes’ Law
• Input variables are discrete
• Output:
• Probability score – proportional to the true probability
• Class label – based on the highest probability score

Module 4: Analytics Theory/Methods 2


Naïve Bayesian Classifier - Use Cases
• Preferred method for many text classification problems.
• Try this first; if it doesn't work, try something more complicated
• Use cases
• Spam filtering, other text classification tasks
• Fraud detection

Module 4: Analytics Theory/Methods 3
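
As a concrete illustration of the spam-filtering use case (not part of the original slides), here is a minimal sketch using scikit-learn's MultinomialNB; the messages and labels are toy data invented for demonstration.

# Minimal spam-filtering sketch with scikit-learn (illustrative only;
# the messages and labels below are made-up toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now",       # spam
    "limited offer, claim cash",  # spam
    "meeting moved to 3pm",       # not spam
    "please review the report",   # not spam
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()           # word-count features
X = vectorizer.fit_transform(messages)

clf = MultinomialNB()                    # Naive Bayes for count data
clf.fit(X, labels)

test = vectorizer.transform(["free cash prize"])
print(clf.predict(test))                 # expected: ['spam']
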


Building a Training Dataset to Predict Good or Bad Credit

• Predict the credit behavior of a credit card applicant from the applicant's attributes:
• Personal status
• Job type
• Housing type
• Savings amount
• These are all categorical variables and are better suited to a Naïve Bayesian classifier than to logistic regression.

Module 4: Analytics Theory/Methods 4
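
To make this concrete, a small, purely hypothetical set of applicant records with these four attributes might look as follows (the values and labels are invented for illustration, not taken from the course data); the class priors P(C_i) are just the label frequencies in the training data.

# Hypothetical applicant records (attribute values and labels invented for illustration).
records = [
    {"personal_status": "female single", "job": "skilled",   "housing": "own",  "savings": ">1K", "credit": "good"},
    {"personal_status": "male married",  "job": "self emp",  "housing": "own",  "savings": "<1K", "credit": "good"},
    {"personal_status": "male single",   "job": "skilled",   "housing": "rent", "savings": ">1K", "credit": "good"},
    {"personal_status": "female single", "job": "self emp",  "housing": "rent", "savings": "<1K", "credit": "bad"},
    {"personal_status": "male single",   "job": "unskilled", "housing": "rent", "savings": "<1K", "credit": "bad"},
]

# Class priors P(C_i) are the label frequencies.
labels = [r["credit"] for r in records]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}
print(priors)   # e.g. {'good': 0.6, 'bad': 0.4}
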


Apply the Naïve Assumption and Remove a Constant

• For observed attributes A = (a_1, a_2, ..., a_m), we want to compute

  P(C_i | A) = P(a_1, a_2, ..., a_m | C_i) P(C_i) / P(a_1, a_2, ..., a_m),   i = 1, 2, ..., n

and assign the class label C_i with the largest P(C_i | A)

• Two simplifications to the calculations

 Apply the naïve assumption - each a_j is conditionally independent of the others given C_i; then

  P(a_1, a_2, ..., a_m | C_i) = P(a_1 | C_i) P(a_2 | C_i) ... P(a_m | C_i) = prod_{j=1}^{m} P(a_j | C_i)

 The denominator P(a_1, a_2, ..., a_m) is a constant (the same for every class label) and can be ignored


Module 4: Analytics Theory/Methods 6
Building a Naïve Bayesian Classifier
• Applying the two simplifications:

  P(C_i | a_1, a_2, ..., a_m) ∝ [ prod_{j=1}^{m} P(a_j | C_i) ] P(C_i),   i = 1, 2, ..., n

• To build a Naïve Bayesian Classifier, collect the following statistics from the
training data:
• P(C_i) for all the class labels
• P(a_j | C_i) for all possible a_j and C_i
• Assign the class label C_i that maximizes the value of

  [ prod_{j=1}^{m} P(a_j | C_i) ] P(C_i),   i = 1, 2, ..., n

Module 4: Analytics Theory/Methods 7
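
This procedure can be written out directly. Below is a minimal sketch assuming small categorical training records (the attribute values and labels are toy data invented for illustration): it estimates P(C_i) and P(a_j | C_i) by counting and scores a new object with the product rule. A real implementation would also smooth the counts (e.g., Laplace smoothing) to avoid zero probabilities.

# Minimal Naive Bayes from counts (illustrative sketch; toy data invented here).
from collections import Counter, defaultdict

# (attribute tuple, class label) pairs: (housing, job, savings) -> credit label.
train = [
    (("own",  "self emp", "savings>1K"), "good"),
    (("own",  "employed", "savings>1K"), "good"),
    (("rent", "employed", "savings<1K"), "good"),
    (("rent", "self emp", "savings<1K"), "bad"),
    (("own",  "self emp", "savings<1K"), "bad"),
]

class_counts = Counter(label for _, label in train)
attr_counts = defaultdict(Counter)           # attr_counts[label][(j, value)]
for attrs, label in train:
    for j, value in enumerate(attrs):
        attr_counts[label][(j, value)] += 1

def score(attrs, label):
    """Unnormalized P(C_i) * prod_j P(a_j | C_i), estimated by counting."""
    p = class_counts[label] / len(train)                              # P(C_i)
    for j, value in enumerate(attrs):
        p *= attr_counts[label][(j, value)] / class_counts[label]     # P(a_j | C_i)
    return p

new_applicant = ("own", "self emp", "savings>1K")
best = max(class_counts, key=lambda label: score(new_applicant, label))
print(best, {label: score(new_applicant, label) for label in class_counts})
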


Naïve Bayesian Classifiers for the Credit Example
• Class labels: {good, bad}
• P(good) = 0.7
• P(bad) = 0.3
• Conditional Probabilities
• P(own|bad) = 0.62
• P(own|good) = 0.75
• P(rent|bad) = 0.23
• P(rent|good) = 0.14
• … and so on

Module 4: Analytics Theory/Methods 8


Naïve Bayesian Classifier for a Particular Applicant

• Given applicant attributes of
  A = {female single, owns home, self-employed, savings > $1000}

  a_j             C_i    P(a_j | C_i)
  female single   good   0.28
  female single   bad    0.36
  own             good   0.75
  own             bad    0.62
  self emp        good   0.14
  self emp        bad    0.17
  savings>1K      good   0.06
  savings>1K      bad    0.02

P(good|A) ∝ (0.28 * 0.75 * 0.14 * 0.06) * 0.7 = 0.0012
P(bad|A)  ∝ (0.36 * 0.62 * 0.17 * 0.02) * 0.3 = 0.0002

• Since P(good|A) > P(bad|A), assign the applicant the label "good" credit

Module 4: Analytics Theory/Methods 9
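
As a quick check, the slide's arithmetic can be reproduced directly; the probabilities below are the ones listed above, and only the variable names are my own.

# Reproduce the slide's score calculation for applicant A
# (probabilities taken from the slide; variable names are illustrative).
p_good, p_bad = 0.7, 0.3                      # class priors P(C_i)

cond_good = [0.28, 0.75, 0.14, 0.06]          # P(a_j | good) for A's attributes
cond_bad  = [0.36, 0.62, 0.17, 0.02]          # P(a_j | bad)

score_good = p_good
for p in cond_good:
    score_good *= p

score_bad = p_bad
for p in cond_bad:
    score_bad *= p

print(round(score_good, 4), round(score_bad, 4))    # 0.0012 0.0002
print("good" if score_good > score_bad else "bad")  # good
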


Logistic Regression Model
The classification problem is just like the regression
problem, except that the values y we now want to predict
take on only a small number of discrete values.

Some examples of classification problems


• Email : Spam / Not spam
• Tumor: Malignant/ Benign
 Binary Logistic Regression
• We have a set of feature vectors X with corresponding binary outputs:

  X = {x_1, x_2, ..., x_n}^T,   Y = {y_1, y_2, ..., y_n}^T,   where y_i ∈ {0, 1}

• We want to model p(y | x)
• By definition, p(y_i = 1 | x_i, θ) ∈ [0, 1]. We want to transform the
probability to remove the range restrictions, as x_i θ = sum_j θ_j x_ij can take any
real value.
 Odds
• p_i : probability of an event occurring
• 1 - p_i : probability of the event not occurring
• The odds for event i are then defined as

  odds_i = p_i / (1 - p_i)

Taking the log of the odds removes the range restrictions.

This way we map the probabilities from the [0, 1] range to the
entire real number line.
Setting the log-odds equal to the linear predictor x_i θ and solving for p_i:

  log( p_i / (1 - p_i) ) = x_i θ

  p_i / (1 - p_i) = e^{x_i θ}

  p_i = e^{x_i θ} / (1 + e^{x_i θ}) = 1 / (1 + e^{-x_i θ})

Standard logistic sigmoid function:

  g(z) = 1 / (1 + e^{-z})

  p_i = g(θ^T x_i) = 1 / (1 + e^{-θ^T x_i})
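
A small sketch of this mapping, with invented feature and parameter values: it computes p_i = g(θ^T x_i) and confirms that the log-odds of the result recovers the linear term x_i θ.

# Sigmoid / log-odds sketch: p = g(theta^T x) and log(p / (1 - p)) = theta^T x.
# Feature and parameter values are invented for illustration.
import math

def sigmoid(z):
    """Standard logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

theta = [0.8, -1.5, 0.3]        # model parameters (illustrative)
x     = [1.0,  0.4, 2.0]        # feature vector; first entry is the intercept term

z = sum(t * v for t, v in zip(theta, x))    # theta^T x, can be any real value
p = sigmoid(z)                              # mapped into (0, 1)

log_odds = math.log(p / (1 - p))            # recovers theta^T x
print(round(z, 4), round(p, 4), round(log_odds, 4))
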
