
Discriminant Functions

Master MLSD - Université Paris Cité

Lazhar.labiod@parisdescartes.fr

1 Université Paris Descartes


Discriminant Functions
▪ A classifier can be viewed as a network that computes
m discriminant functions g1(x), …, gm(x) and selects the
category corresponding to the largest discriminant

(network diagram: features → discriminant functions → select class giving maximum)

▪ Each gi(x) can be replaced with f(gi(x)) for any monotonically
increasing function f; the results are unchanged
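
As an illustration of this argmax rule, here is a minimal sketch (not from the slides); the discriminant functions are ordinary Python callables and the two g's below are hypothetical:

```python
import numpy as np

def classify(x, discriminants):
    """Return the index of the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Two hand-made (hypothetical) discriminant functions:
g = [lambda x: -np.sum((x - 0.0) ** 2),   # favours points near 0
     lambda x: -np.sum((x - 3.0) ** 2)]   # favours points near 3
print(classify(np.array([2.5]), g))        # -> 1
```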
20
Discriminant Functions
▪ The minimum error-rate classification is achieved by
the discriminant function
gi(x) = P(ci|x) = P(x|ci)P(ci)/P(x)
▪ Since P(x) does not depend on the class, an equivalent
discriminant function is
gi(x) = P(x|ci)P(ci)
▪ For normal densities it is convenient to take logarithms.
Since the logarithm is a monotonically increasing
function, an equivalent discriminant function is
gi(x) = ln P(x|ci) + ln P(ci)
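
As a quick check of this equivalence (a sketch, not part of the slides; the two-class setup is hypothetical), all three forms of gi(x) select the same class:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Hypothetical two-class, 2-D problem.
means  = [np.zeros(2), np.array([3.0, 0.0])]
covs   = [np.eye(2), np.eye(2)]
priors = np.array([0.3, 0.7])
x = np.array([1.0, 0.5])

lik = np.array([mvn.pdf(x, m, S) for m, S in zip(means, covs)])
posterior = lik * priors / np.sum(lik * priors)   # g_i(x) = P(c_i|x)
joint     = lik * priors                          # g_i(x) = P(x|c_i) P(c_i)
log_form  = np.log(lik) + np.log(priors)          # g_i(x) = ln P(x|c_i) + ln P(c_i)
print(np.argmax(posterior), np.argmax(joint), np.argmax(log_form))  # same class each time
```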

21
Discriminant Functions for the Normal Density
▪ Suppose that for class ci the class-conditional density
p(x|ci) is N(μi, Σi)
p(x|ci) = 1 / ((2π)^(d/2) |Σi|^(1/2)) · exp{ −(1/2)(x − μi)^t Σi^(-1)(x − μi) }

▪ Discriminant function gi(x) = ln P(x|ci)+ ln P(ci)

▪ Plugging p(x|ci) into gi(x) gives

gi(x) = −(1/2)(x − μi)^t Σi^(-1)(x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ci)

▪ The term −(d/2) ln 2π is constant for all i, so an equivalent
discriminant function is

gi(x) = −(1/2)(x − μi)^t Σi^(-1)(x − μi) − (1/2) ln|Σi| + ln P(ci)
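
A minimal sketch that evaluates this discriminant directly with NumPy (the class parameters below are hypothetical):

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)
    (the -d/2 ln 2pi term is dropped since it is the same for every class)."""
    diff = x - mu
    mahal = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^t Sigma^{-1} (x-mu)
    return -0.5 * mahal - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)

# Hypothetical parameters for one class:
mu    = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_discriminant(np.array([0.5, 1.5]), mu, Sigma, prior=0.5))
```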
22
Case i = 2I
▪ That is

▪ In this case, features x1, x2. ,…, xd are independent


with different means and equal variances 2

23
Case i = 2I
▪ Discriminant function

▪ Det(i)= 2d and

▪ Can simplify discriminant function

constant for all i

24
Case i = 2I Geometric Interpretation
If ln P( ci ) = ln P( c j ), then If ln P( ci )  ln P( c j ), then
g i (x ) = − 2 x −  i + ln P( c i )
1
g i (x ) = − x −  i
2 2

2

decision region decision region decision region


for c1 for c3 for c1
1 3 1 3
x in c3 decision regio
2
2 for c3
decision region
decision region
for c2
for c2
voronoi diagram: points in each
cell are closer to the mean in that cell
than to any other mean
25
Case i = 2I

constant
for all classes

discriminant function is linear

26
Case i = 2I

constant in x
gi (x ) = w x + w i0
t
i

linear in x:
d
w x=
t
i Σw x i i
i =1

▪ Thus the discriminant function is linear

▪ Therefore the decision boundaries
gi(x) = gj(x) are linear:
▪ lines if x has dimension 2
▪ planes if x has dimension 3
▪ hyper-planes if x has dimension larger than 3
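
A minimal numerical sketch of this linear form (the class means, σ², and priors below are hypothetical):

```python
import numpy as np

def linear_discriminants(x, means, sigma2, priors):
    """g_i(x) = w_i^t x + w_i0 with w_i = mu_i / sigma^2 and
    w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(c_i)   (case Sigma_i = sigma^2 I)."""
    g = []
    for mu, p in zip(means, priors):
        w  = mu / sigma2
        w0 = -mu @ mu / (2 * sigma2) + np.log(p)
        g.append(w @ x + w0)
    return np.array(g)

# Hypothetical 3-class, 2-D example:
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
g = linear_discriminants(np.array([2.0, 1.0]), means, sigma2=1.0, priors=[1/4, 1/4, 1/2])
print(np.argmax(g))
```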
27
Case i = 2I: Example
▪ 3 classes, each 2-dimensional Gaussian with

P (c1 ) = P (c2 ) = P (c3 ) =


1 1
4 and 2

▪ Discriminant function is

▪ Plug in parameters for each class

28
Case i = 2I: Example
▪ Need to find out when gi(x) < gj(x) for i,j=1,2,3
▪ Can be done by solving gi(x) = gj(x) for i,j=1,2,3
▪ Let’s take g1(x) = g2(x) first

▪ Simplifying,

line equation
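
For concreteness, here is a small sketch of how g1(x) = g2(x) reduces to a line in this case (the means and priors below are hypothetical, since the slide's actual numbers are not reproduced above):

```python
import numpy as np

# For Sigma_i = sigma^2 I, the boundary g1(x) = g2(x) reduces to the line
# (mu1 - mu2)^t x = (||mu1||^2 - ||mu2||^2)/2 - sigma^2 ln(P(c1)/P(c2)).
# Hypothetical parameters:
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
sigma2, p1, p2 = 1.0, 0.25, 0.25

a = mu1 - mu2                                               # normal vector of the line
b = (mu1 @ mu1 - mu2 @ mu2) / 2 - sigma2 * np.log(p1 / p2)  # right-hand side
print(f"boundary: {a[0]:.2f} x1 + {a[1]:.2f} x2 = {b:.2f}") # -4 x1 = -8, i.e. x1 = 2
```

With equal priors the boundary is simply the perpendicular bisector of the segment joining the two means, as the printed line confirms.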
29
Case i = 2I: Example
▪ Next solve g2(x) = g3(x)

▪ Almost finally solve g1(x) = g3(x)

▪ And finally solve g1(x) = g2(x) = g3(x)

30
Case i = 2I: Example
▪ Priors P (c1 ) = P (c2 ) =
1
and P (c3 ) = 1
4 2

c3

lines connecting
c2 means
are perpendicular to
decision boundaries

c1

31
Case i = 

▪ Covariance matrices are equal but arbitrary

▪ In this case, features x1, x2. ,…, xd are not


necessarily independent

32
Case i = 
▪ Discriminant function

constant
for all classes
▪ Discriminant function becomes
( x − i )t Σ −1( x − i ) + ln P( c i )
1
gi ( x ) = −
2
squared Mahalanobis Distance

▪ Mahalanobis Distance
▪ If =I, Mahalanobis Distance becomes usual
Eucledian distance
x−y = ( x − y )t( x − y )
2
I −1
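
A small sketch of the squared Mahalanobis distance (the covariance below is hypothetical), showing that Σ = I recovers the Euclidean case:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

x, mu = np.array([2.0, 1.0]), np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])    # hypothetical covariance
print(mahalanobis_sq(x, mu, np.eye(2)))       # 5.0: with Sigma = I, just Euclidean
print(mahalanobis_sq(x, mu, Sigma))           # 2.0: the high-variance x1 direction counts less
```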

33
Euclidean vs. Mahalanobis Distances
▪ Euclidean: ||x − μ||² = (x − μ)^t (x − μ)
Points x at equal Euclidean distance from μ lie on a circle
▪ Mahalanobis: ||x − μ||²_Σ⁻¹ = (x − μ)^t Σ^(-1)(x − μ)
Points x at equal Mahalanobis distance from μ lie on an ellipse
whose axes are the eigenvectors of Σ: Σ stretches circles into ellipses
34
Case i = Geometric Interpretation
If ln P( ci ) = ln P( c j ), then If ln P( ci )  ln P( c j ), then
gi (x ) = − x − i
1
−1 g i (x ) = − x − i −1 + ln P( ci )
2
decision region decision region
for c2 for c2
2 decision region 2 decision region
for c3 for c3

1 3 1 3
decision region
for c1 decision region
for c1
points in each cell are closer to the
mean in that cell than to any other
mean under Mahalanobis distance
35
Case i = 
▪ Can simplify discriminant function:

▪ Thus in this case discriminant is also linear
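
A minimal sketch of this linear discriminant for a shared covariance (the means and Σ below are hypothetical):

```python
import numpy as np

def shared_cov_discriminants(x, means, Sigma, priors):
    """Linear discriminants for the case Sigma_i = Sigma:
    g_i(x) = w_i^t x + w_i0,  w_i = Sigma^{-1} mu_i,
    w_i0 = -1/2 mu_i^t Sigma^{-1} mu_i + ln P(c_i)."""
    Sinv = np.linalg.inv(Sigma)
    g = []
    for mu, p in zip(means, priors):
        w  = Sinv @ mu
        w0 = -0.5 * mu @ Sinv @ mu + np.log(p)
        g.append(w @ x + w0)
    return np.array(g)

# Hypothetical example with a non-diagonal shared covariance:
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(np.argmax(shared_cov_discriminants(np.array([1.0, 1.0]), means, Sigma, [0.5, 0.5])))
```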

36
Case i = : Example
▪ 3 classes, each 2-dimensional Gaussian with

▪ Again can be done by solving gi(x) = gj(x) for i,j=1,2,3

37
Case i = : Example
▪ Let’s solve in general first

row vector scalar

38
Case i = : Example

▪ Now substitute for i,j=1,2


− 2 0x =0
x1 = 0

▪ Now substitute for i,j=2,3


− 3.14 − 1.4x = −2.41
3.14 x1 + 1.4 x 2 = 2.41

▪ Now substitute for i,j=1,3


− 5.14 − 1.43x = −2.41
5.14 x1 + 1.43 x 2 = 2.41
39
Case i = : Example
▪ Priors P (c1 ) = P (c2 ) =
1
and P (c3 ) = 1
4 2

c2

c1 lines connecting
means
are not in general
perpendicular to
decision boundaries
c3

40
General Case i are arbitrary
▪ Covariance matrices for each class are arbitrary
▪ In this case, features x1, x2. ,…, xd are not
necessarily independent

41
General Case i are arbitrary
▪ From previous discussion,
1 1
g i (x ) = − (x −  i )  i (x − )i − ln  i + lnP(ci )
t −1

2 2

▪ This can’t be simplified, but we can rearrange it:

42
General Case i are arbitrary
linear in x

constant in x
gi (x ) = xtWx + w t x + w i 0

quadratic in x since
d d d
x tWx = Σ Σ w ij x i x j = Σ w ij x i x j
j =1 i =1 i , j =1

▪ Thus the discriminant function is quadratic


▪ Therefore the decision boundaries are quadratic
(ellipses and parabolloids)
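
A minimal sketch of this quadratic form, obtained by expanding the discriminant from the previous slide (the class parameters below are hypothetical):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0, obtained by expanding
    -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)."""
    Sinv = np.linalg.inv(Sigma)
    W  = -0.5 * Sinv
    w  = Sinv @ mu
    w0 = -0.5 * mu @ Sinv @ mu - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0

# Hypothetical class with its own covariance:
mu, Sigma = np.array([1.0, 0.0]), np.array([[1.0, 0.4], [0.4, 2.0]])
print(quadratic_discriminant(np.array([0.5, -0.5]), mu, Sigma, prior=0.5))
```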

43
General Case i are arbitrary: Example
▪ 3 classes, each 2-dimensional Gaussian with

▪ Priors: P (c1 ) = P (c2 ) = 1 and P (c3 ) = 1


4 2

▪ Again can be done by solving gi(x) = gj(x) for i,j=1,2,3

▪ Need to solve a bunch of quadratic inequalities of 2


variables
44
General Case i are arbitrary: Example

c2
c1

c3 c1

45
Important Points
▪ The Bayes classifier for normally distributed classes is
in general quadratic
▪ If the covariance matrices are equal and proportional to the
identity matrix, the Bayes classifier is linear
▪ If, in addition, the class priors are equal, the Bayes
classifier is the minimum Euclidean distance classifier
▪ If the covariance matrices are equal, the Bayes
classifier is linear
▪ If, in addition, the class priors are equal, the Bayes
classifier is the minimum Mahalanobis distance classifier
▪ Popular classifiers (minimum Euclidean and Mahalanobis
distance) are optimal only if the distribution of the data
is appropriate (normal)

46
