Introduction to Bayesian Decision Theory
2.1 Introduction
Credits and Acknowledgments
Materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher; and also from other material on the web:
⚫ Dr. A. Aydin Atalan, Middle East Technical University, Turkey
⚫ Dr. Djamel Bouchaffra, Oakland University
⚫ Dr. Adam Krzyzak, Concordia University
⚫ Dr. Joseph Picone, Mississippi State University
⚫ Dr. Robi Polikar, Rowan University
⚫ Dr. Stefan A. Robila, University of New Orleans
⚫ Dr. Sargur N. Srihari, State University of New York at Buffalo
⚫ David G. Stork, Stanford University
⚫ Dr. Godfried Toussaint, McGill University
⚫ Dr. Chris Wyatt, Virginia Tech
⚫ Dr. Alan L. Yuille, University of California, Los Angeles
⚫ Dr. Song-Chun Zhu, University of California, Los Angeles
TYPICAL APPLICATIONS
IMAGE PROCESSING EXAMPLE
• Problem Analysis:
▪ set up a camera and take some sample images from which to extract features
▪ consider features such as …
[Figure: pattern-recognition pipeline (sensing, segmentation, feature extraction)]
$$P(x) \ge 0, \qquad \sum_{x \in X} P(x) = 1$$
• Probability mass functions are typically used for discrete random variables, while densities describe continuous random variables (the latter must be integrated).
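A quick numeric sketch of this distinction; the pmf values and the uniform density on [0, 2] are made-up illustrative choices:

```python
import numpy as np

# A discrete pmf: values are non-negative and SUM to 1.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
print("pmf sum:", sum(pmf.values()))  # 1.0

# A continuous density must INTEGRATE to 1: here the uniform
# density p(x) = 1/2 on [0, 2], integrated with a Riemann sum.
xs = np.linspace(0.0, 2.0, 10_001)
p = np.full_like(xs, 0.5)
print("density integral:", (p[:-1] * np.diff(xs)).sum())  # ~1.0
```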
BAYESIAN DECISION THEORY
BAYES FORMULA
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

where, in the case of two categories:

$$p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\, P(\omega_j)$$
BAYESIAN DECISION THEORY
POSTERIOR PROBABILITIES
Bayes formula:
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$
can be expressed in words as:
$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
By measuring x, we can convert the prior probability, P(ωj), into a posterior probability, P(ωj | x).
Special cases:
⚫ if p(x | ω1) = p(x | ω2): x gives us no useful information, and the posterior equals the prior, P(ωj | x) = P(ωj)
⚫ if P(ω1) = P(ω2): the posterior depends only on the likelihood, P(ωj | x) = p(x | ωj) / Σk p(x | ωk)
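A minimal sketch of the formula and both special cases in Python; the likelihood and prior values below are made up for illustration:

```python
import numpy as np

def posteriors(likelihoods, priors):
    """Bayes formula: P(w_j|x) = p(x|w_j) P(w_j) / p(x),
    with evidence p(x) = sum_j p(x|w_j) P(w_j)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    joint = likelihoods * priors   # p(x|w_j) P(w_j)
    return joint / joint.sum()     # divide by the evidence p(x)

priors = np.array([2/3, 1/3])

# General case: unequal likelihoods reweight the prior.
print(posteriors([0.5, 2.0], priors))       # [1/3, 2/3]

# Special case: equal likelihoods -> x is uninformative, posterior == prior.
print(posteriors([1.2, 1.2], priors))       # [2/3, 1/3]

# Special case: equal priors -> posterior follows the likelihood alone.
print(posteriors([0.5, 2.0], [0.5, 0.5]))   # [0.2, 0.8]
```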
2.3 Minimum Error Rate Classification
MINIMUM ERROR RATE
Consider a symmetrical or zero-one loss function:

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \ne j \end{cases} \qquad i, j = 1, 2, \ldots, c$$

The conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \ne i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$
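A sketch of the resulting rule in Python: under zero-one loss, minimizing the conditional risk 1 − P(ωi | x) is the same as picking the class with the largest posterior (the posterior vector below is made up):

```python
import numpy as np

def conditional_risks_zero_one(posteriors):
    # Under zero-one loss, R(a_i|x) = sum_{j != i} P(w_j|x) = 1 - P(w_i|x).
    return 1.0 - np.asarray(posteriors, dtype=float)

def decide(posteriors):
    # Minimizing 1 - P(w_i|x) is equivalent to taking argmax of the posterior.
    return int(np.argmin(conditional_risks_zero_one(posteriors)))

post = [0.7, 0.2, 0.1]
print(conditional_risks_zero_one(post))  # [0.3, 0.8, 0.9]
print("decide class:", decide(post))     # 0
```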
Exercise
Consider a 2-class problem with P(C1) = 2/3, P(C2) = 1/3; a scalar feature x; and three possible actions a1, a2, a3 defined as:
a1: choose C1
a2: choose C2
a3: do not classify

Let the loss matrix λ(ai | Cj) be:

        a1    a2    a3
C1      0     1     1/4
C2      1     0     1/4
Questions:
1) Which action should be taken for a pattern x, 0 ≤ x ≤ 2?
2) What proportion of patterns receive action a3 (i.e., “do not classify”)?
3) Compute the total minimum risk.
4) If you decide to take action a1 for all x, how much does the total risk change?
Solution:
p(x) = (5 − 2x)/6
P(C1 | x) = (4 − 2x)/(5 − 2x), 0 ≤ x ≤ 2
P(C2 | x) = 1/(5 − 2x)

This leads to the conditional risks:
r1(x) = r(a1 | x) = 0 · P(C1 | x) + 1 · P(C2 | x) = 1/(5 − 2x)
r2(x) = r(a2 | x) = 1 · P(C1 | x) + 0 · P(C2 | x) = (4 − 2x)/(5 − 2x)
r3(x) = r(a3 | x) = (1/4) · P(C1 | x) + (1/4) · P(C2 | x) = 1/4
The Bayes decision rule assigns to each x the action with the minimum conditional risk. The conditional risks are sketched in the following figure, and the optimal decision rule is therefore:

⚫ choose a1 (decide C1) for 0 ≤ x < 1/2
⚫ choose a3 (do not classify) for 1/2 < x < 11/6
⚫ choose a2 (decide C2) for 11/6 < x ≤ 2

[Figure: conditional risks r1(x), r2(x), r3(x) versus x; r1 crosses r3 at x = 0.5 and r2 crosses r3 at x = 11/6]
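A numeric cross-check of the exercise, assuming the densities implied by the stated posteriors; it recovers the two thresholds and evaluates questions 2 to 4 by quadrature:

```python
import numpy as np

# p(x) = (5-2x)/6, P(C1|x) = (4-2x)/(5-2x), P(C2|x) = 1/(5-2x) on [0, 2].
xs = np.linspace(0.0, 2.0, 200_001)
p_x  = (5 - 2*xs) / 6
p_c1 = (4 - 2*xs) / (5 - 2*xs)
p_c2 = 1 / (5 - 2*xs)

r1 = p_c2                    # risk of a1: 0*P(C1|x) + 1*P(C2|x)
r2 = p_c1                    # risk of a2: 1*P(C1|x) + 0*P(C2|x)
r3 = np.full_like(xs, 0.25)  # risk of a3: (1/4)(P(C1|x) + P(C2|x))

risks = np.vstack([r1, r2, r3])
best = risks.argmin(axis=0)  # index of the optimal action at each x
r_min = risks.min(axis=0)

dx = xs[1] - xs[0]
print("a3 region:", xs[best == 2].min(), "to", xs[best == 2].max())  # ~0.5 .. ~11/6
print("P(do not classify):", (p_x * (best == 2)).sum() * dx)          # question 2
print("total minimum risk:", (p_x * r_min).sum() * dx)                # question 3
print("risk if always a1: ", (p_x * r1).sum() * dx)                   # question 4
```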
Case 1: Σi = σ²I
Features are statistically independent and all features have the same variance; the distributions are spherical in d dimensions.
$$\Sigma_i = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} \qquad |\Sigma_i| = \sigma^{2d}, \qquad \Sigma_i^{-1} = (1/\sigma^2)\, I$$

Σi = σ²I is independent of i.
GAUSSIAN CLASSIFIERS
THRESHOLD DECODING
This has a simple geometric interpretation:
$$\|x - \mu_i\|^2 - \|x - \mu_j\|^2 = 2\sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)}$$

Note how the priors shift the boundary away from the more likely mean!
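A one-dimensional sketch of this shift; the means, variance, and priors are made-up values. Solving the boundary equation for x puts the equal-prior boundary at the midpoint of the means and moves it toward the less likely mean as the priors become unequal:

```python
import numpy as np

# Two 1-D Gaussians with equal variance sigma^2.
m1, m2, sigma = 0.0, 4.0, 1.0

def boundary(p1, p2):
    # Boundary: ||x-m1||^2 - ||x-m2||^2 = 2 sigma^2 ln(p1/p2).
    # In 1-D the squares expand to an equation linear in x.
    return (2*sigma**2*np.log(p1/p2) + m2**2 - m1**2) / (2*(m2 - m1))

print(boundary(0.5, 0.5))  # equal priors: midpoint of the means, x = 2.0
print(boundary(0.9, 0.1))  # P(w1) larger: boundary moves away from m1, toward m2
print(boundary(0.1, 0.9))  # P(w2) larger: boundary moves toward m1
```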
GAUSSIAN CLASSIFIERS
3-D case
GAUSSIAN CLASSIFIERS
Case 2: Σi = Σ
• Covariance matrices are arbitrary, but equal to each other
for all classes. Features then form hyper-ellipsoidal
clusters of equal size and shape.
$$g_i(x) = \mathbf{w}_i^T x + w_{i0}$$

where

$$\mathbf{w}_i = \Sigma^{-1} \mu_i, \qquad w_{i0} = -\frac{1}{2}\, \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)$$
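A minimal sketch of this linear discriminant; the shared covariance, means, and priors are made-up values:

```python
import numpy as np

# Shared covariance: g_i(x) = w_i^T x + w_i0, with
# w_i = Sigma^{-1} mu_i and w_i0 = -0.5 mu_i^T Sigma^{-1} mu_i + ln P(w_i).
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
priors = [0.5, 0.5]

Sigma_inv = np.linalg.inv(Sigma)

def g(x, mu, prior):
    w = Sigma_inv @ mu                            # weight vector
    w0 = -0.5 * mu @ Sigma_inv @ mu + np.log(prior)  # bias term
    return w @ x + w0

x = np.array([1.0, 0.5])
scores = [g(x, mu, p) for mu, p in zip(mus, priors)]
print("scores:", scores, "-> class", int(np.argmax(scores)))
```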
Case 3: Σi = arbitrary
The covariance matrices are different for each category
All bets are off! In the two-class case, the decision boundaries are hyperquadrics.
(Hyperquadrics are: hyperplanes, pairs of hyperplanes,
hyperspheres, hyperellipsoids, hyperparaboloids,
hyperhyperboloids)
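A sketch of the general quadratic discriminant in the standard form gi(x) = −½(x − μi)ᵀ Σi⁻¹ (x − μi) − ½ ln|Σi| + ln P(ωi), with the constant (d/2) ln 2π dropped; the per-class means, covariances, and priors are made-up values:

```python
import numpy as np

def g(x, mu, Sigma, prior):
    # Quadratic discriminant for arbitrary per-class covariances.
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mus    = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.array([[3.0, 0.8],
                               [0.8, 0.5]])]   # different for each class
priors = [0.5, 0.5]

x = np.array([1.0, 1.5])
scores = [g(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
print("scores:", scores, "-> class", int(np.argmax(scores)))
# With unequal Sigma_i, the boundary g_1(x) = g_2(x) is quadratic in x:
# a hyperquadric, as described above.
```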
GAUSSIAN CLASSIFIERS
ARBITRARY COVARIANCES