
Discriminant Functions

Master MLSD - Université Paris Cité

Lazhar.labiod@parisdescartes.fr

1 Université Paris Descartes


Discriminant Functions
▪ A classifier can be viewed as a network that computes
m discriminant functions g1(x), …, gm(x) and selects the
category corresponding to the largest discriminant

(network diagram: features → discriminant functions → select class giving maximum)

▪ Each gi(x) can be replaced with f(gi(x)) for any monotonically
increasing function f; the results are unchanged
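
As an illustration of this argmax rule, here is a minimal sketch (not from the slides); the discriminant functions are ordinary Python callables and the two g's below are hypothetical:

```python
import numpy as np

def classify(x, discriminants):
    """Return the index of the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Two hand-made (hypothetical) discriminant functions:
g = [lambda x: -np.sum((x - 0.0) ** 2),   # favours points near 0
     lambda x: -np.sum((x - 3.0) ** 2)]   # favours points near 3
print(classify(np.array([2.5]), g))        # -> 1
```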
20
Discriminant Functions
▪ The minimum error-rate classification is achieved by
the discriminant function
gi(x) = P(ci|x) = P(x|ci)P(ci)/P(x)
▪ Since P(x) does not depend on the class, an equivalent
discriminant function is
gi(x) = P(x|ci)P(ci)
▪ For normal densities it is convenient to take logarithms.
Since the logarithm is a monotonically increasing
function, an equivalent discriminant function is
gi(x) = ln P(x|ci) + ln P(ci)
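
As a quick check of this equivalence (a sketch, not part of the slides; the two-class setup is hypothetical), all three forms of gi(x) select the same class:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Hypothetical two-class, 2-D problem.
means  = [np.zeros(2), np.array([3.0, 0.0])]
covs   = [np.eye(2), np.eye(2)]
priors = np.array([0.3, 0.7])
x = np.array([1.0, 0.5])

lik = np.array([mvn.pdf(x, m, S) for m, S in zip(means, covs)])
posterior = lik * priors / np.sum(lik * priors)   # g_i(x) = P(c_i|x)
joint     = lik * priors                          # g_i(x) = P(x|c_i) P(c_i)
log_form  = np.log(lik) + np.log(priors)          # g_i(x) = ln P(x|c_i) + ln P(c_i)
print(np.argmax(posterior), np.argmax(joint), np.argmax(log_form))  # same class each time
```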

21
Discriminant Functions for the Normal Density
▪ Suppose that for class ci the class-conditional density
p(x|ci) is N(μi, Σi)
p(x|ci) = 1 / ((2π)^(d/2) |Σi|^(1/2)) · exp{ −(1/2)(x − μi)^t Σi^(-1)(x − μi) }

▪ Discriminant function gi(x) = ln P(x|ci)+ ln P(ci)

▪ Plugging p(x|ci) into gi(x) gives

gi(x) = −(1/2)(x − μi)^t Σi^(-1)(x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ci)

▪ The term −(d/2) ln 2π is constant for all i, so an equivalent
discriminant function is

gi(x) = −(1/2)(x − μi)^t Σi^(-1)(x − μi) − (1/2) ln|Σi| + ln P(ci)
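
A minimal sketch that evaluates this discriminant directly with NumPy (the class parameters below are hypothetical):

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)
    (the -d/2 ln 2pi term is dropped since it is the same for every class)."""
    diff = x - mu
    mahal = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^t Sigma^{-1} (x-mu)
    return -0.5 * mahal - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)

# Hypothetical parameters for one class:
mu    = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_discriminant(np.array([0.5, 1.5]), mu, Sigma, prior=0.5))
```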
22
Case i = 2I
▪ That is

▪ In this case, features x1, x2. ,…, xd are independent


with different means and equal variances 2

23
Case i = 2I
▪ Discriminant function

▪ Det(i)= 2d and

▪ Can simplify discriminant function

constant for all i

24
Case i = 2I Geometric Interpretation
If ln P( ci ) = ln P( c j ), then If ln P( ci )  ln P( c j ), then
g i (x ) = − 2 x −  i + ln P( c i )
1
g i (x ) = − x −  i
2 2

2

decision region decision region decision region


for c1 for c3 for c1
1 3 1 3
x in c3 decision regio
2
2 for c3
decision region
decision region
for c2
for c2
voronoi diagram: points in each
cell are closer to the mean in that cell
than to any other mean
25
Case i = 2I

constant
for all classes

discriminant function is linear

26
Case i = 2I

constant in x
gi (x ) = w x + w i0
t
i

linear in x:
d
w x=
t
i Σw x i i
i =1

▪ Thus the discriminant function is linear

▪ Therefore the decision boundaries
gi(x) = gj(x) are linear:
▪ lines if x has dimension 2
▪ planes if x has dimension 3
▪ hyper-planes if x has dimension larger than 3
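
A minimal numerical sketch of this linear form (the class means, σ², and priors below are hypothetical):

```python
import numpy as np

def linear_discriminants(x, means, sigma2, priors):
    """g_i(x) = w_i^t x + w_i0 with w_i = mu_i / sigma^2 and
    w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(c_i)   (case Sigma_i = sigma^2 I)."""
    g = []
    for mu, p in zip(means, priors):
        w  = mu / sigma2
        w0 = -mu @ mu / (2 * sigma2) + np.log(p)
        g.append(w @ x + w0)
    return np.array(g)

# Hypothetical 3-class, 2-D example:
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
g = linear_discriminants(np.array([2.0, 1.0]), means, sigma2=1.0, priors=[1/4, 1/4, 1/2])
print(np.argmax(g))
```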
27
Case i = 2I: Example
▪ 3 classes, each 2-dimensional Gaussian with

P (c1 ) = P (c2 ) = P (c3 ) =


1 1
4 and 2

▪ Discriminant function is

▪ Plug in parameters for each class

28
Case i = 2I: Example
▪ Need to find out when gi(x) < gj(x) for i,j=1,2,3
▪ Can be done by solving gi(x) = gj(x) for i,j=1,2,3
▪ Let’s take g1(x) = g2(x) first

▪ Simplifying,

line equation
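
For concreteness, here is a small sketch of how g1(x) = g2(x) reduces to a line in this case (the means and priors below are hypothetical, since the slide's actual numbers are not reproduced above):

```python
import numpy as np

# For Sigma_i = sigma^2 I, the boundary g1(x) = g2(x) reduces to the line
# (mu1 - mu2)^t x = (||mu1||^2 - ||mu2||^2)/2 - sigma^2 ln(P(c1)/P(c2)).
# Hypothetical parameters:
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
sigma2, p1, p2 = 1.0, 0.25, 0.25

a = mu1 - mu2                                               # normal vector of the line
b = (mu1 @ mu1 - mu2 @ mu2) / 2 - sigma2 * np.log(p1 / p2)  # right-hand side
print(f"boundary: {a[0]:.2f} x1 + {a[1]:.2f} x2 = {b:.2f}") # -4 x1 = -8, i.e. x1 = 2
```

With equal priors the boundary is simply the perpendicular bisector of the segment joining the two means, as the printed line confirms.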
29
Case i = 2I: Example
▪ Next solve g2(x) = g3(x)

▪ Almost finally solve g1(x) = g3(x)

▪ And finally solve g1(x) = g2(x) = g3(x)

30
Case i = 2I: Example
▪ Priors P (c1 ) = P (c2 ) =
1
and P (c3 ) = 1
4 2

c3

lines connecting
c2 means
are perpendicular to
decision boundaries

c1

31
Case i = 

▪ Covariance matrices are equal but arbitrary

▪ In this case, features x1, x2. ,…, xd are not


necessarily independent

32
Case i = 
▪ Discriminant function

constant
for all classes
▪ Discriminant function becomes
( x − i )t Σ −1( x − i ) + ln P( c i )
1
gi ( x ) = −
2
squared Mahalanobis Distance

▪ Mahalanobis Distance
▪ If =I, Mahalanobis Distance becomes usual
Eucledian distance
x−y = ( x − y )t( x − y )
2
I −1
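
A small sketch of the squared Mahalanobis distance (the covariance below is hypothetical), showing that Σ = I recovers the Euclidean case:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

x, mu = np.array([2.0, 1.0]), np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])    # hypothetical covariance
print(mahalanobis_sq(x, mu, np.eye(2)))       # 5.0: with Sigma = I, just Euclidean
print(mahalanobis_sq(x, mu, Sigma))           # 2.0: the high-variance x1 direction counts less
```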

33
Euclidean vs. Mahalanobis Distances
▪ Euclidean: ||x − μ||² = (x − μ)^t (x − μ)
Points x at equal Euclidean distance from μ lie on a circle
▪ Mahalanobis: ||x − μ||²_Σ⁻¹ = (x − μ)^t Σ^(-1)(x − μ)
Points x at equal Mahalanobis distance from μ lie on an ellipse
whose axes are the eigenvectors of Σ: Σ stretches circles into ellipses
34
Case i = Geometric Interpretation
If ln P( ci ) = ln P( c j ), then If ln P( ci )  ln P( c j ), then
gi (x ) = − x − i
1
−1 g i (x ) = − x − i −1 + ln P( ci )
2
decision region decision region
for c2 for c2
2 decision region 2 decision region
for c3 for c3

1 3 1 3
decision region
for c1 decision region
for c1
points in each cell are closer to the
mean in that cell than to any other
mean under Mahalanobis distance
35
Case i = 
▪ Can simplify discriminant function:

▪ Thus in this case discriminant is also linear
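
A minimal sketch of this linear discriminant for a shared covariance (the means and Σ below are hypothetical):

```python
import numpy as np

def shared_cov_discriminants(x, means, Sigma, priors):
    """Linear discriminants for the case Sigma_i = Sigma:
    g_i(x) = w_i^t x + w_i0,  w_i = Sigma^{-1} mu_i,
    w_i0 = -1/2 mu_i^t Sigma^{-1} mu_i + ln P(c_i)."""
    Sinv = np.linalg.inv(Sigma)
    g = []
    for mu, p in zip(means, priors):
        w  = Sinv @ mu
        w0 = -0.5 * mu @ Sinv @ mu + np.log(p)
        g.append(w @ x + w0)
    return np.array(g)

# Hypothetical example with a non-diagonal shared covariance:
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(np.argmax(shared_cov_discriminants(np.array([1.0, 1.0]), means, Sigma, [0.5, 0.5])))
```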

36
Case i = : Example
▪ 3 classes, each 2-dimensional Gaussian with

▪ Again can be done by solving gi(x) = gj(x) for i,j=1,2,3

37
Case i = : Example
▪ Let’s solve in general first

row vector scalar

38
Case i = : Example

▪ Now substitute for i,j=1,2


− 2 0x =0
x1 = 0

▪ Now substitute for i,j=2,3


− 3.14 − 1.4x = −2.41
3.14 x1 + 1.4 x 2 = 2.41

▪ Now substitute for i,j=1,3


− 5.14 − 1.43x = −2.41
5.14 x1 + 1.43 x 2 = 2.41
39
Case i = : Example
▪ Priors P (c1 ) = P (c2 ) =
1
and P (c3 ) = 1
4 2

c2

c1 lines connecting
means
are not in general
perpendicular to
decision boundaries
c3

40
General Case i are arbitrary
▪ Covariance matrices for each class are arbitrary
▪ In this case, features x1, x2. ,…, xd are not
necessarily independent

41
General Case i are arbitrary
▪ From previous discussion,
1 1
g i (x ) = − (x −  i )  i (x − )i − ln  i + lnP(ci )
t −1

2 2

▪ This can’t be simplified, but we can rearrange it:

42
General Case i are arbitrary
linear in x

constant in x
gi (x ) = xtWx + w t x + w i 0

quadratic in x since
d d d
x tWx = Σ Σ w ij x i x j = Σ w ij x i x j
j =1 i =1 i , j =1

▪ Thus the discriminant function is quadratic


▪ Therefore the decision boundaries are quadratic
(ellipses and parabolloids)
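
A minimal sketch of this quadratic form, obtained by expanding the discriminant from the previous slide (the class parameters below are hypothetical):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0, obtained by expanding
    -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)."""
    Sinv = np.linalg.inv(Sigma)
    W  = -0.5 * Sinv
    w  = Sinv @ mu
    w0 = -0.5 * mu @ Sinv @ mu - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0

# Hypothetical class with its own covariance:
mu, Sigma = np.array([1.0, 0.0]), np.array([[1.0, 0.4], [0.4, 2.0]])
print(quadratic_discriminant(np.array([0.5, -0.5]), mu, Sigma, prior=0.5))
```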

43
General Case i are arbitrary: Example
▪ 3 classes, each 2-dimensional Gaussian with

▪ Priors: P (c1 ) = P (c2 ) = 1 and P (c3 ) = 1


4 2

▪ Again can be done by solving gi(x) = gj(x) for i,j=1,2,3

▪ Need to solve a bunch of quadratic inequalities of 2


variables
44
General Case i are arbitrary: Example

c2
c1

c3 c1

45
Important Points
▪ The Bayes classifier for normally distributed classes is
in general quadratic
▪ If the covariance matrices are equal and proportional to the
identity matrix, the Bayes classifier is linear
▪ If, in addition, the class priors are equal, the Bayes
classifier is the minimum Euclidean distance classifier
▪ If the covariance matrices are equal, the Bayes
classifier is linear
▪ If, in addition, the class priors are equal, the Bayes
classifier is the minimum Mahalanobis distance classifier
▪ Popular classifiers (minimum Euclidean and Mahalanobis
distance) are optimal only if the distribution of the data
is appropriate (normal)

46
