
Worked Examples - Please read carefully CSCI433/CSCI933: Machine Learning

Bayes minimum error/risk classifier

Assume we have a three-class (ω1, ω2, ω3) classification problem with two features (x1, x2)
forming a feature vector x. The feature vectors are normally distributed with the same
covariance matrix

    Σ = [1.6  0.4]
        [0.4  2.8].

The mean vectors for each class are, respectively, µ1 = [0.2 0.3]^t, µ2 = [2.4 1.5]^t and
µ3 = [−1.5 2.0]^t. It is also given that the classes are equiprobable, so that
p(ω1) = p(ω2) = p(ω3) = 1/3.
The distribution of the feature vectors in each class can be written as:

    p(x|ωi) = 1/(2π|Σ|^(1/2)) exp(−(1/2)(x − µi)^t Σ^(-1)(x − µi)),   i = 1, 2, 3.

1. Classify the feature vector x = [1.6 1.5]^t according to the Bayes minimum error
   probability classifier.

2. Classify the feature vector x = [1.6 1.5]^t according to the Bayes minimum risk
   classifier with loss matrix:

       Λ = [0    1  1]
           [0.5  0  1]
           [0    1  0].

Note the following from the statement of the problem. The distribution of the features in
each class is given as

    p(x|ωi) = 1/(2π|Σ|^(1/2)) exp(−(1/2)(x − µi)^t Σ^(-1)(x − µi)),   i = 1, 2, 3.

This implies that the features are normally distributed. We need to figure out the
determinant (|Σ|) and inverse (Σ^(-1)) of the covariance matrix (Σ) of the data. Recall
that for a (2 × 2)-matrix

    G = [g11  g12]
        [g21  g22],

the determinant is simply |G| = (g11 × g22) − (g12 × g21) and the inverse is given by

    G^(-1) = (1/|G|) [ g22  −g12]
                     [−g21   g11].
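If you want to check these two quantities numerically, here is a minimal sketch using
NumPy (the variable names are my own):

    import numpy as np

    # Covariance matrix from the problem statement.
    Sigma = np.array([[1.6, 0.4],
                      [0.4, 2.8]])

    det_Sigma = np.linalg.det(Sigma)   # (1.6 * 2.8) - (0.4 * 0.4) = 4.32
    Sigma_inv = np.linalg.inv(Sigma)   # (1/4.32) * [[2.8, -0.4], [-0.4, 1.6]]

    print(det_Sigma)
    print(Sigma_inv)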
Furthermore, recall that the minimum error classifier in terms of the class-conditional
densities can be written as the following decision rule (for C classes): assign feature
vector x to ωj if


    p(x|ωj)p(ωj) > p(x|ωk)p(ωk),   k = 1, . . . , C, k ≠ j.

Hence we must evaluate p(x|ωj) for the three classes. For example, for class ω1,
p(x|ω1) is computed as follows:

    p(x|ω1) = 1/(2π|Σ|^(1/2)) exp(−(1/2)(x − µ1)^t Σ^(-1)(x − µ1)),

" #
1.6 0.4
where Σ = and µ1 = [0.2 0.3]t and x = [1.6 1.5]t .
0.4 2.8
" #
1 2.8 −0.4
We know that |Σ| = (1.6 × 2.8) − (0.4 × 0.4) = 4.32 and Σ−1 =
4.32 −0.4 1.6
We can also compute x − µ1 as [(1.6 − 0.2) (1.5 − 0.3)] = [1.4 1.2]t .
t

The quantity (x − µ1)^t Σ^(-1) (x − µ1) can be computed as

    (x − µ1)^t Σ^(-1) (x − µ1) = (1/4.32) [1.4 1.2] [ 2.8  −0.4] [1.4]
                                                    [−0.4   1.6] [1.2]
                               = 1.492.
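As a quick check, the same quadratic form in NumPy (a minimal sketch; variable names
are mine):

    import numpy as np

    Sigma = np.array([[1.6, 0.4],
                      [0.4, 2.8]])
    x   = np.array([1.6, 1.5])
    mu1 = np.array([0.2, 0.3])

    d = x - mu1                          # [1.4, 1.2]
    quad = d @ np.linalg.inv(Sigma) @ d  # the quadratic form above
    print(quad)                          # approximately 1.492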

We can now substitute values into

    p([1.6 1.5]^t|ω1) = 1/(2π|Σ|^(1/2)) exp(−(1/2)(x − µ1)^t Σ^(-1)(x − µ1))
                      = (1/(2π√4.32)) exp(−(1/2)(1.492))
                      = 0.036.

The value of p(ω1) is 1/3 since the three classes are equiprobable. With this we have

    p([1.6 1.5]^t|ω1)p(ω1) = 0.036 × (1/3) = 0.012.

Please convince yourself that you understand this working and that you are able to carry
it out.
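One way to check the working is to evaluate the Gaussian density directly with SciPy.
Here is a minimal sketch (assuming NumPy and SciPy are available; variable names are
my own):

    import numpy as np
    from scipy.stats import multivariate_normal

    Sigma = np.array([[1.6, 0.4],
                      [0.4, 2.8]])
    mu1 = np.array([0.2, 0.3])
    x   = np.array([1.6, 1.5])

    likelihood = multivariate_normal(mean=mu1, cov=Sigma).pdf(x)  # ~0.036
    print(likelihood * (1 / 3))                                   # ~0.012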
Next, we need to compute p([1.6 1.5]^t|ω2)p(ω2) and p([1.6 1.5]^t|ω3)p(ω3). Our
classification is the class that gives the largest value of the three.
I encourage you to complete this numerical computation; a sketch for checking your
answer follows.
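The following sketch evaluates all three products and applies the minimum error
decision rule stated earlier. It is meant as a check on your hand computation, not a
substitute for it (variable names are mine):

    import numpy as np
    from scipy.stats import multivariate_normal

    Sigma  = np.array([[1.6, 0.4],
                       [0.4, 2.8]])
    means  = [np.array([0.2, 0.3]),     # mu1
              np.array([2.4, 1.5]),     # mu2
              np.array([-1.5, 2.0])]    # mu3
    priors = np.array([1/3, 1/3, 1/3])  # equiprobable classes
    x = np.array([1.6, 1.5])

    # p(x|omega_i) * p(omega_i) for i = 1, 2, 3.
    products = np.array([multivariate_normal(mean=m, cov=Sigma).pdf(x)
                         for m in means]) * priors

    print(products)                 # first entry should match 0.012 above
    print(np.argmax(products) + 1)  # winning class (1-based index)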


For the second question, we consider the minimum risk classifier. You will need to
recall the formula for the Bayes minimum risk classifier, namely that the conditional
risk of assigning x to class ωi is R(ωi|x) = Σj λij p(ωj|x), minimized over i, and
perform workings similar to the one above.
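Since p(ωj|x) ∝ p(x|ωj)p(ωj) and the common factor p(x) does not affect which risk is
smallest, the conditional risks can be compared using the unnormalised products. A
minimal sketch of that check, assuming the usual convention that entry (i, j) of Λ is
the loss for deciding ωi when the true class is ωj (variable names are mine):

    import numpy as np
    from scipy.stats import multivariate_normal

    Sigma  = np.array([[1.6, 0.4],
                       [0.4, 2.8]])
    means  = [np.array([0.2, 0.3]),
              np.array([2.4, 1.5]),
              np.array([-1.5, 2.0])]
    priors = np.array([1/3, 1/3, 1/3])
    x = np.array([1.6, 1.5])

    # Loss matrix: row i = action "assign to omega_i", column j = true class.
    Lambda = np.array([[0.0, 1.0, 1.0],
                       [0.5, 0.0, 1.0],
                       [0.0, 1.0, 0.0]])

    products = np.array([multivariate_normal(mean=m, cov=Sigma).pdf(x)
                         for m in means]) * priors

    # R(omega_i|x) = sum_j lambda_ij p(omega_j|x); the normalising factor
    # p(x) is common to every row, so we drop it before taking the argmin.
    risks = Lambda @ products
    print(risks)
    print(np.argmin(risks) + 1)  # class with minimum risk (1-based index)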

Philip O. Ogúnbọna, EIS, UOW
