
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Bayesian Classifier
and its Variants

2
Classification Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– one of every 50,000 persons has meningitis
– one of every 20 persons has stiff neck

3
Classification Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– one of every 50,000 persons has meningitis
– one of every 20 persons has stiff neck

• If a patient has a stiff neck, does he/she have meningitis?

4
Classification Example 2

Attribute types: Refund (categorical), Marital Status (categorical),
Taxable Income (continuous), Evade (class label)

Tid  Refund  Marital Status  Taxable Income  Evade
  1  Yes     Single          125K            No
  2  No      Married         100K            No
  3  No      Single           70K            No
  4  Yes     Married         120K            No
  5  No      Divorced         95K            Yes
  6  No      Married          60K            No
  7  Yes     Divorced        220K            No
  8  No      Single           85K            Yes
  9  No      Married          75K            No
 10  No      Single           90K            Yes

5
Classification Example 2

(training data as in the table above)

• A married person with income 100K did not refund the loan previously
6
Classification Example 2

(training data as in the table above)

• A married person with income 100K did not refund the loan previously
• Can we trust him?

7
Classification Example 3
• The sea bass/salmon example

– We know the previous counts of salmon/sea bass


– Can we predict which fish is coming on the
conveyor?

8
Bayes Classifier
• A probabilistic framework for solving classification
problems

• Bayes theorem:

P(A, C) = P(A) P(C | A) = P(A | C) P(C)

9
Bayes Classifier
• A probabilistic framework for solving classification
problems

• Conditional Probabilities:

P(C | A) = P(A, C) / P(A)

P(A | C) = P(A, C) / P(C)

10
Example 1
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20

• If a patient has a stiff neck, does he/she have meningitis?

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
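As a quick check of this calculation, here is a minimal Python sketch (the probabilities are exactly those given above):

    # Minimal sketch: Bayes rule for the meningitis example
    p_s_given_m = 0.5          # P(S | M): stiff neck given meningitis
    p_m = 1 / 50000            # P(M): prior probability of meningitis
    p_s = 1 / 20               # P(S): prior probability of stiff neck

    p_m_given_s = p_s_given_m * p_m / p_s     # Bayes theorem
    print(p_m_given_s)                        # 0.0002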

11
Example 3
• The sea bass/salmon example

– We know the previous counts of salmon/sea bass


• the amount of each catch
• the class, or state of nature (salmon / sea bass)
(the state of nature is a random variable)

12
Example 3
• The sea bass/salmon example

• if the catch of salmon and sea bass is equi-probable

– P(ω1) = P(ω2) (uniform priors)

– P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

13
Example 3
• Decision rule with only the prior information

– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

14
Example 3
• Decision rule with only the prior information

– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

– This rule will misclassify many fish

15
Example 3
• Use of the class-conditional information
– Use lightness

• P(x | ω1) and P(x | ω2) describe lightness in sea bass and salmon

16
Example 3

[Figure: class-conditional densities p(x | ω1) and p(x | ω2) for the lightness feature x]
17
Example 3
• Given the lightness evidence x, calculate Posterior
from Likelihood and evidence

– P(ωj | x) = p(x | ωj) P(ωj) / p(x)

– Posterior = (Likelihood × Prior) / Evidence

18
Example 3
• Given the lightness evidence x, calculate Posterior
from Likelihood and evidence

– P(ωj | x) = p(x | ωj) P(ωj) / p(x)

where, in the case of two categories,

p(x) = Σ_j p(x | ωj) P(ωj),   j = 1, 2

19
Example 3

[Figure: posterior functions P(ω1 | x) and P(ω2 | x) versus the lightness x]

20
Example 3
Decision rules with the posterior probabilities

x is an observation for which:

if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

21
Example 3
Decision rules with the posterior probabilities

x is an observation for which:

if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

Therefore:
whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1

22
• Minimizing the probability of error

• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Therefore:
P(error | x) = min [P(ω1 | x), P(ω2 | x)]
(Bayes decision)
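A minimal Python sketch of the Bayes decision and its error probability (the posterior values are illustrative):

    import numpy as np

    # Bayes decision: pick the class with the larger posterior;
    # the probability of error at x is the smaller posterior.
    posteriors = np.array([0.3, 0.7])     # illustrative P(w1 | x), P(w2 | x)

    decision = np.argmax(posteriors) + 1  # classes numbered 1 and 2
    p_error  = np.min(posteriors)         # P(error | x)
    print("decide class", decision, "with P(error | x) =", p_error)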

23
Bayesian Decision Theory – Continuous
Features

• Generalization of the preceding ideas

– Use of more than one feature


– Use more than two states of nature
– Allowing actions, and not only decisions on the state of nature
– Introducing a loss function that is more general than the probability of error

24
Classification to Minimize Loss
Let {ω1, ω2, …, ωm} be the set of m states of nature
(or "categories" or "classes")

Let {α1, α2, …, αn} be the set of n possible actions

25
Classification to Minimize Loss
Let {ω1, ω2, …, ωm} be the set of m states of nature
(or "categories" or "classes")

Let {α1, α2, …, αn} be the set of n possible actions

Let λ(αi | ωj) be the loss incurred for taking
action αi when the true state of nature (class) is ωj

26
Classification to Minimize Loss

The risk of taking action αi, given the observation x, is

R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x),   j = 1, …, m

27
Classification to Minimize Loss

The risk of taking action αi, given the observation x, is

R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x),   j = 1, …, m

Overall risk:
R = sum of R(αi | x) for i = 1, …, n

28
Classification to Minimize Loss

The risk of taking action αi, given the observation x, is

R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x),   j = 1, …, m

Overall risk:
R = sum of R(αi | x) for i = 1, …, n

Minimizing R ⇔ minimizing R(αi | x) for i = 1, …, n
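A minimal Python sketch of minimum-risk classification; the loss matrix and posteriors are illustrative values, not taken from the slides:

    import numpy as np

    # lam[i, j] = loss lambda(a_i | w_j) for taking action a_i when the true class is w_j
    lam = np.array([[0.0, 2.0],      # losses for action a1
                    [1.0, 0.0]])     # losses for action a2
    posteriors = np.array([0.3, 0.7])        # P(w_j | x) for the observed x

    risks = lam @ posteriors                 # R(a_i | x) = sum_j lam_ij * P(w_j | x)
    best = np.argmin(risks)                  # Bayes decision: minimum conditional risk
    print(risks, "-> take action", best + 1)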

29
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2

30
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj

31
Classification to Minimize Loss
• Two-category classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:

R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

32
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
take action α1: "decide ω1"

33
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
take action α1: "decide ω1"

Now use these formulas:

R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

34
Classification to Minimize Loss
Our rule is the following:
if R(α1 | x) < R(α2 | x)
take action α1: "decide ω1"

This results in the equivalent rule:

decide ω1 if:
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

and decide ω2 otherwise


35
Classification to Minimize Loss
The preceding rule
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

is equivalent to the following rule:

if   P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]

then take action α1 (decide ω1)
otherwise take action α2 (decide ω2)
36
Classification to Minimize Loss

The preceding rule

(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)

is equivalent to the following rule:

if   P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]

(the left-hand side is the likelihood ratio)

then take action α1 (decide ω1)
otherwise take action α2 (decide ω2)
37
Classification to Minimize Loss
Optimal decision property

"If the likelihood ratio exceeds a threshold value
independent of the input pattern x, we can take optimal actions"

P(x | ω1) / P(x | ω2)  >  [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]
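A minimal Python sketch of this likelihood-ratio test; the loss, prior, and likelihood values are illustrative:

    # Likelihood-ratio test: compare p(x | w1) / p(x | w2) with a threshold
    # that depends only on losses and priors, not on x. Values are illustrative.
    lam11, lam12, lam21, lam22 = 0.0, 2.0, 1.0, 0.0   # losses lambda_ij
    p_w1, p_w2 = 0.4, 0.6                             # priors P(w1), P(w2)
    px_w1, px_w2 = 0.5, 0.1                           # likelihoods at the observed x

    threshold = (lam12 - lam22) / (lam21 - lam11) * (p_w2 / p_w1)
    ratio = px_w1 / px_w2                             # likelihood ratio
    print("decide w1" if ratio > threshold else "decide w2")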

38
Minimum Error Rate Classification
Assume the (zero-one) loss function for the two-class case:

λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j

The risk is now:

R(αi | x) = Σ_{j ≠ i} P(ωj | x) = 1 − P(ωi | x)

39
Minimum Error Rate Classification
Assume the (zero-one) loss function for the two-class case:

λ(αi | ωj) = 0 if i = j, and 1 if i ≠ j

The risk is now:

R(αi | x) = 1 − P(ωi | x)

Minimizing R ⇔ minimizing R(αi | x) ⇔ maximizing P(ωi | x)


Minimum Error Rate Classification

For loss minimization, our rule was:


if R(α1 | x) < R(α2 | x)
take action α1: "decide ω1"
Minimum Error Rate Classification

For loss minimization, our rule was:


if R(α1 | x) < R(α2 | x)
take action α1: "decide ω1"

which, under the zero-one loss, is equivalent to:

decide ω1 if P(ω1 | x) > P(ω2 | x)


Classifiers, Discriminant Functions
and Decision Surfaces
Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:
Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:

gi(x) > gj(x)


Classifiers, Discriminant Functions
and Decision Surfaces
• Remember the Bayesian classifier:

• This is equivalent to:


decide x to class ωi
if gi(x) > gj(x) ∀ j ≠ i
where gi(x) = P(ωi | x), i = 1, …, c
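A minimal Python sketch of such a discriminant-function classifier; the posterior values standing in for gi(x) are illustrative:

    import numpy as np

    # Assign x to the class whose discriminant g_i(x) = P(w_i | x) is largest.
    g = np.array([0.2, 0.5, 0.3])        # illustrative g_i(x) for c = 3 classes
    predicted = np.argmax(g) + 1         # classes numbered 1..c
    print("assign x to class", predicted)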
Classifiers, Discriminant Functions
and Decision Surfaces

[Figure: c discriminant functions g1(x), …, gc(x) computed from a d-dimensional feature vector]
47
Classifiers, Discriminant Functions
and Decision Surfaces

• Based on minimum risk classification


– Let gi(x) = −R(αi | x)
(max. discriminant corresponds to min. risk!)

48
Classifiers, Discriminant Functions
and Decision Surfaces

• For the minimum error rate, we take


gi(x) = P(ωi | x)

(max. discriminant corresponds to max. posterior!)

49
Classifiers, Discriminant Functions
and Decision Surfaces

• For the minimum error rate, we take


gi(x) = P(ωi | x)

Some alternative representations giving equivalent results:

gi(x) = P(x | ωi) P(ωi)
gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
50
Decision Surface
• Let, Ri and Rj : two regions identifying classes ωi and
ωj

[Figure: feature space partitioned into decision regions Ri and Rj]
Decision Surface
• Let, Ri and Rj : two regions identifying classes ωi and
ωj
Decision Rule: Decide ωi if P(ωi | x) > P(ωj | x)

[Figure: feature space partitioned into decision regions Ri and Rj]
Decision Surface
• Let, Ri and Rj : two regions identifying classes ωi and
ωj
Decision Rule: Decide ωi if P(ωi | x) − P(ωj | x) > 0

[Figure: feature space partitioned into decision regions Ri and Rj]
Decision Surface
• Let, Ri and Rj : two regions identifying classes ωi and
ωj
Decision Rule: Decide ωi if g(x) ≡ P(ωi | x) − P(ωj | x) > 0

[Figure: feature space partitioned into decision regions Ri and Rj]
Decision Surface
• Let, Ri, Rj : two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0


Decision Surface
• If Ri, Rj : two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0

g(x) = 0 : the decision surface
56
Decision Surface
• If Ri, Rj : two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0

g(x) > 0 on the Ri side (+), g(x) < 0 on the Rj side (−)
g(x) = 0 : the decision surface
57
Decision Surface
• If Ri, Rj : two regions identifying classes ωi and ωj

g(x) ≡ P(ωi | x) − P(ωj | x) > 0

Ri : P(ωi | x) > P(ωj | x)   (the + side, g(x) > 0)
Rj : P(ωj | x) > P(ωi | x)   (the − side, g(x) < 0)

g(x) = 0 : the decision surface
58
Decision Surface in Multi-categories

• Feature space is divided into c decision regions


if gi(x) > gj(x) ∀ j ≠ i, then x is in Ri
(Ri means: assign x to ωi)

59
Decision Surface in Two-categories

• The two-category case


– A classifier is a “dichotomizer” that has two
discriminant functions g1 and g2

Let g(x) ≡ g1(x) − g2(x)

Decide ω1 if g(x) > 0; otherwise decide ω2

60
– The computation of g(x) (equivalent forms):

g(x) = P(ω1 | x) − P(ω2 | x)
g(x) = P(x | ω1) P(ω1) − P(x | ω2) P(ω2)
g(x) = ln [ P(x | ω1) / P(x | ω2) ] + ln [ P(ω1) / P(ω2) ]
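A minimal Python sketch of the dichotomizer using the log form of g(x); the likelihood and prior values are illustrative:

    import numpy as np

    # g(x) = ln[P(x|w1)/P(x|w2)] + ln[P(w1)/P(w2)]; decide w1 if g(x) > 0.
    px_w1, px_w2 = 0.15, 0.40     # class-conditional densities at the observed x
    p_w1, p_w2   = 0.6, 0.4       # priors

    g = np.log(px_w1 / px_w2) + np.log(p_w1 / p_w2)
    print("g(x) =", g, "->", "decide w1" if g > 0 else "decide w2")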

61
62
The Normal Density
• Density which is analytically tractable
• Continuous density
• Many processes are asymptotically Gaussian
– Handwritten characters, speech sounds, and many more
– Any prototype corrupted by a random process

63
The Normal Density
• Univariate density:

P(x) = (1 / (√(2π) σ)) · exp[ −(1/2) ((x − μ) / σ)² ]

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance
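A minimal Python sketch that evaluates this density (the μ and σ values are illustrative):

    import numpy as np

    def univariate_normal(x, mu, sigma):
        """Univariate normal density N(mu, sigma^2) evaluated at x."""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

    # Illustrative parameters: mean 0, standard deviation 1
    print(univariate_normal(0.0, mu=0.0, sigma=1.0))   # about 0.3989, the peak of N(0, 1)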

64
65
66
• Multivariate density

– Multivariate normal density in d dimensions is:

P(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) · exp[ −(1/2) (x − μ)^t Σ⁻¹ (x − μ) ]

where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ = the d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively
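A minimal Python sketch that evaluates the multivariate density; the mean vector and covariance matrix are illustrative:

    import numpy as np

    def multivariate_normal_density(x, mu, sigma):
        """Multivariate normal density N(mu, Sigma) evaluated at the point x."""
        d = len(mu)
        diff = x - mu
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
        return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

    mu    = np.array([0.0, 0.0])                 # illustrative mean vector
    sigma = np.array([[1.0, 0.5],
                      [0.5, 2.0]])               # illustrative covariance matrix
    print(multivariate_normal_density(np.array([0.5, -0.5]), mu, sigma))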

67
• Multivariate density

– Multivariate normal density in d dimensions is:

P(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) · exp[ −(1/2) (x − μ)^t Σ⁻¹ (x − μ) ]

68
2D Gaussian Example - 1

70
2D Gaussian Example - 2

[Figure: 2D Gaussian density plot]

71
2D Gaussian Example - 3

[Figure: 2D Gaussian density plot]

72
Computer Exercise

• Use MATLAB to generate Gaussian plots

• Try with different Σ and μ
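For those without MATLAB, an equivalent sketch in Python (NumPy/Matplotlib); the μ and Σ values are illustrative and meant to be varied:

    import numpy as np
    import matplotlib.pyplot as plt

    # Evaluate a 2D Gaussian density on a grid and draw its contours.
    mu    = np.array([0.0, 0.0])
    sigma = np.array([[1.0, 0.6],
                      [0.6, 2.0]])

    xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
    pts  = np.stack([xs - mu[0], ys - mu[1]], axis=-1)       # shape (200, 200, 2)
    quad = np.einsum('...i,ij,...j->...', pts, np.linalg.inv(sigma), pts)
    density = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))

    plt.contourf(xs, ys, density, levels=20)
    plt.colorbar()
    plt.title('2D Gaussian density')
    plt.show()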

73
Classification Example 2

(training data as in the Classification Example 2 table above)

• A married person with income 120K did not refund the loan previously
• Can we trust him?

74
