Fisher Discriminant Analysis


Fisher’s Linear Discriminant

■ Take a different view on linear classification:


• Find a linear projection of our data and classify the projected
values:

[Figure: data projected onto a line; labels: decision boundary, linear projection]

• This is really the same thing as a linear discriminant function.


- Projection: $y = \mathbf{w}^T \mathbf{x}$

- Checking against a threshold: $\mathbf{w}^T \mathbf{x} \ge -w_0$, or equivalently $\mathbf{w}^T \mathbf{x} + w_0 \ge 0$
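The projection and the threshold check can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides: the data, `w`, `w0`, the function name, and the convention that the non-negative side is class 1 are all assumptions.

```python
import numpy as np

def classify_by_projection(X, w, w0):
    """Label a row as class 1 if w^T x + w0 >= 0, else class 2 (assumed convention)."""
    y = X @ w                          # projection y = w^T x, one value per row
    return np.where(y + w0 >= 0, 1, 2)

# Hypothetical toy values, chosen only for illustration.
X = np.array([[2.0, 1.0], [0.0, 0.0], [-1.0, 3.0]])
w = np.array([1.0, 0.0])
w0 = -0.5

labels = classify_by_projection(X, w, w0)
print(labels)
```

Both forms of the test, $\mathbf{w}^T \mathbf{x} \ge -w_0$ and $\mathbf{w}^T \mathbf{x} + w_0 \ge 0$, select the same points.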

© Stefan Roth, 20.05.2009 | Department of Computer Science | GRIS | 8


Fisher’s Linear Discriminant
■ What is a good projection w ?
• Idea: Maximize the “distance” between the two classes to allow
for a good separation.
■ First (and not final) attempt: Maximize the distance
between the class means:
$$\mathbf{m}_1 = \frac{1}{|C_1|} \sum_{n \in C_1} \mathbf{x}_n \qquad \mathbf{m}_2 = \frac{1}{|C_2|} \sum_{n \in C_2} \mathbf{x}_n$$

• Projection of the means:


$$m_1 = \mathbf{w}^T \mathbf{m}_1 \qquad m_2 = \mathbf{w}^T \mathbf{m}_2$$
• Maximize squared distance between means:
$$(m_1 - m_2)^2 \to \max$$
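As a small sketch of this first attempt, the class means and the squared distance between their projections can be computed as follows. The toy data and the names `X1`, `X2`, and `squared_mean_gap` are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical toy data: one row per sample x_n, one array per class.
X1 = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])   # class C1
X2 = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 2.0]])   # class C2

# Class means in the input space: (1/|C_k|) * sum of x_n over C_k.
m1_vec = X1.mean(axis=0)
m2_vec = X2.mean(axis=0)

def squared_mean_gap(w):
    """Squared distance (m1 - m2)^2 between the projected class means."""
    m1 = w @ m1_vec
    m2 = w @ m2_vec
    return (m1 - m2) ** 2

w = np.array([1.0, 0.0])
gap = squared_mean_gap(w)
print(m1_vec, m2_vec, gap)
```

Because the projection is linear, projecting the class mean gives the same value as averaging the projected samples.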

■ Maximize squared distance between means:
$$\mathbf{w}^* = \arg\max_{\mathbf{w}} \, (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2$$

• Obvious problem: The objective grows without bound with the norm of w

• (Obvious) solution: Fix the norm of w:


$$\max_{\mathbf{w}} \, (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2 \quad \text{s.t.} \quad \|\mathbf{w}\|^2 = 1$$

• Constrained optimization problem!
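The scaling problem and the effect of fixing the norm can be checked numerically. This is a sketch with hypothetical class means; the names are illustrative assumptions.

```python
import numpy as np

# Hypothetical class means, for illustration only.
m1_vec = np.array([2.0, 3.0])
m2_vec = np.array([6.0, 2.0])

def squared_mean_gap(w):
    """Unconstrained objective (w^T m1 - w^T m2)^2."""
    return (w @ m1_vec - w @ m2_vec) ** 2

w = np.array([1.0, 1.0])
for scale in (1.0, 10.0, 100.0):
    # Scaling w by c multiplies the objective by c**2.
    print(scale, squared_mean_gap(scale * w))

# With the norm fixed to 1, only the direction of w matters.
w_unit = w / np.linalg.norm(w)
gap_unit = squared_mean_gap(w_unit)
print(gap_unit)
```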

■ Constrained optimization problem:
$$\max_{\mathbf{w}} \, (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2 \quad \text{s.t.} \quad \|\mathbf{w}\|^2 = 1$$

■ Necessary conditions:
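$\nabla_{\mathbf{x}} f(\mathbf{x}) + \lambda \nabla_{\mathbf{x}} g(\mathbf{x}) = 0 \;\Leftrightarrow\; 2(\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2) + 2\lambda \mathbf{w} = 0$
$g(\mathbf{x}) = 0 \;\Leftrightarrow\; \|\mathbf{w}\|^2 - 1 = 0$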

■ It follows that: $\mathbf{w} = \dfrac{\mathbf{m}_1 - \mathbf{m}_2}{\|\mathbf{m}_1 - \mathbf{m}_2\|}$
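A quick numerical sanity check of this result: among unit vectors, the normalized mean difference should give the largest squared gap. The class means and the grid of candidate directions below are hypothetical choices for a 2-D illustration.

```python
import numpy as np

# Hypothetical class means, for illustration only.
m1_vec = np.array([2.0, 3.0])
m2_vec = np.array([6.0, 2.0])
diff = m1_vec - m2_vec

# Closed-form solution of the constrained problem.
w_star = diff / np.linalg.norm(diff)
best_gap = (w_star @ diff) ** 2

# Compare against unit vectors sampled on the circle, one per degree.
angles = np.linspace(0.0, 2.0 * np.pi, 361)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
gaps = (candidates @ diff) ** 2

print(best_gap, gaps.max())
```

No sampled direction exceeds `best_gap`, which equals the squared norm of the mean difference.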

■ Here is what we get:
[Figure: the two classes projected along the direction connecting the class means]
• Obvious problem: Large class overlap!

■ Here is what we could get:
[Figure: the same two classes projected along a different direction]
• Much better separation between classes.
• How do we get this?
■ Idea:
• Separate the means as far as possible while minimizing the
variance of each class.
■ Second (and final) attempt:
• Define within-class variances:
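$$s_1^2 = \sum_{n \in C_1} (\mathbf{w}^T \mathbf{x}_n - m_1)^2 \qquad s_2^2 = \sum_{n \in C_2} (\mathbf{w}^T \mathbf{x}_n - m_2)^2$$

• with $m_1 = \mathbf{w}^T \mathbf{m}_1$ and $m_2 = \mathbf{w}^T \mathbf{m}_2$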

■ Fisher criterion:
$$J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} \to \max$$
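The criterion can be evaluated directly from the projected samples. The following is a minimal sketch; the toy data and the function name `fisher_criterion` are hypothetical.

```python
import numpy as np

def fisher_criterion(w, X1, X2):
    """J(w) = (m1 - m2)^2 / (s1^2 + s2^2), computed on projected samples."""
    y1, y2 = X1 @ w, X2 @ w              # projections w^T x_n per class
    m1, m2 = y1.mean(), y2.mean()        # projected class means
    s1_sq = np.sum((y1 - m1) ** 2)       # within-class variance s1^2 (unnormalized sum)
    s2_sq = np.sum((y2 - m2) ** 2)
    return (m1 - m2) ** 2 / (s1_sq + s2_sq)

# Hypothetical toy data, for illustration only.
X1 = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
X2 = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 2.0]])

w = np.array([1.0, 0.0])
J = fisher_criterion(w, X1, X2)
print(J, fisher_criterion(3.0 * w, X1, X2))
```

Unlike the first attempt, rescaling `w` leaves J unchanged, because the numerator and the denominator scale by the same factor.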


■ Rewrite numerator:
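$$\begin{aligned}
(m_1 - m_2)^2 &= (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2 \\
&= \left( \mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2) \right)^2 \\
&= \mathbf{w}^T \underbrace{(\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T}_{=: S_B} \mathbf{w}
\end{aligned}$$

$S_B$: between-class covariance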
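The identity between the squared projected gap and the quadratic form in $S_B$ can be verified numerically. This sketch uses hypothetical class means and an arbitrary direction `w`.

```python
import numpy as np

# Hypothetical class means, for illustration only.
m1_vec = np.array([2.0, 2.0])
m2_vec = np.array([6.0, 2.0])
diff = m1_vec - m2_vec

# Between-class matrix S_B = (m1 - m2)(m1 - m2)^T, an outer product.
S_B = np.outer(diff, diff)

w = np.array([0.5, -1.5])                   # arbitrary direction
lhs = (w @ m1_vec - w @ m2_vec) ** 2        # (m1 - m2)^2 on projected means
rhs = w @ S_B @ w                           # w^T S_B w
print(lhs, rhs, np.linalg.matrix_rank(S_B))
```

As an outer product of a single vector, $S_B$ has rank one in the two-class case.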

■ Rewrite denominator:
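$$\begin{aligned}
s_1^2 + s_2^2 &= \sum_{n \in C_1} (\mathbf{w}^T \mathbf{x}_n - m_1)^2 + \sum_{n \in C_2} (\mathbf{w}^T \mathbf{x}_n - m_2)^2 \\
&= \sum_{n \in C_1} \left( \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_1) \right)^2 + \sum_{n \in C_2} \left( \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_2) \right)^2 \\
&= \sum_{n \in C_1} \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T \mathbf{w} + \sum_{n \in C_2} \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T \mathbf{w} \\
&= \mathbf{w}^T \underbrace{\left[ \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T \right]}_{=: S_W} \mathbf{w}
\end{aligned}$$

$S_W$: within-class covariance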
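The same kind of check works for the denominator. The sketch below builds $S_W$ as a sum of outer products of centered samples (an unnormalized sum, as in the derivation) and compares the quadratic form with the projected variances. Data and names are hypothetical.

```python
import numpy as np

# Hypothetical toy data, for illustration only.
X1 = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
X2 = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 2.0]])

def scatter(X):
    """Sum over n of (x_n - m)(x_n - m)^T for one class."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

S_W = scatter(X1) + scatter(X2)

w = np.array([0.5, -1.5])                   # arbitrary direction
y1, y2 = X1 @ w, X2 @ w
s_sum = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
quad = w @ S_W @ w                          # w^T S_W w
print(S_W, s_sum, quad)
```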
■ Fisher criterion:
$$J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}} \to \max$$

■ Necessary condition for a maximum:


$$(\mathbf{w}^T S_B \mathbf{w}) \, S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w}) \, S_B \mathbf{w}$$
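One way to arrive at this condition is to set the gradient of $J$ to zero using the quotient rule. The short derivation below assumes that $S_B$ and $S_W$ are symmetric, which holds because both are sums of outer products:

```latex
\begin{aligned}
\nabla_{\mathbf{w}} J(\mathbf{w})
  &= \frac{2 S_B \mathbf{w} \, (\mathbf{w}^T S_W \mathbf{w}) - 2 S_W \mathbf{w} \, (\mathbf{w}^T S_B \mathbf{w})}
          {(\mathbf{w}^T S_W \mathbf{w})^2} = 0 \\
  &\Longrightarrow \quad (\mathbf{w}^T S_B \mathbf{w}) \, S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w}) \, S_B \mathbf{w}
\end{aligned}
```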

• We have that:
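$S_B \mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T \mathbf{w} \;\Rightarrow\; S_B \mathbf{w} \propto (\mathbf{m}_1 - \mathbf{m}_2)$, since $(\mathbf{m}_1 - \mathbf{m}_2)^T \mathbf{w}$ is a scalar
$\Rightarrow\; S_W \mathbf{w} \propto (\mathbf{m}_1 - \mathbf{m}_2)$, since $\mathbf{w}^T S_B \mathbf{w}$ and $\mathbf{w}^T S_W \mathbf{w}$ in the necessary condition are scalars as well
$\Rightarrow\; \mathbf{w} \propto S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$ (Fisher's linear discriminant)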
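Putting the pieces together, Fisher's direction can be computed by solving a linear system and compared with the plain mean-difference direction. This is a sketch under the assumption that $S_W$ is nonsingular; the toy data and helper names are hypothetical.

```python
import numpy as np

# Hypothetical toy data, for illustration only.
X1 = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
X2 = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 2.0]])

def scatter(X):
    """Sum over n of (x_n - m)(x_n - m)^T for one class."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

def fisher_criterion(w, S_B, S_W):
    """J(w) = (w^T S_B w) / (w^T S_W w)."""
    return (w @ S_B @ w) / (w @ S_W @ w)

diff = X1.mean(axis=0) - X2.mean(axis=0)
S_B = np.outer(diff, diff)
S_W = scatter(X1) + scatter(X2)

# Solve S_W w = (m1 - m2) instead of forming the inverse explicitly.
# Assumes S_W is nonsingular.
w_fisher = np.linalg.solve(S_W, diff)
w_means = diff / np.linalg.norm(diff)       # first-attempt direction

J_fisher = fisher_criterion(w_fisher, S_B, S_W)
J_means = fisher_criterion(w_means, S_B, S_W)
print(J_fisher, J_means)
```

On this toy data the Fisher direction scores higher than the mean-difference direction, and it satisfies the necessary condition for a maximum.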

$$\mathbf{w} \propto S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$$
■ The Fisher linear discriminant only gives us a projection.
• We still need to find the threshold.
• E.g., use Bayes classifier with Gaussian class-conditionals.
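One possible way to pick the threshold is sketched below: fit a 1-D Gaussian to the projected values of each class and assign a point to the class with the larger prior-weighted density. The direction `w`, the equal priors, the maximum-likelihood variance estimate, and all names are assumptions made for this illustration, not something the slides specify.

```python
import numpy as np

def fit_gaussian_1d(y):
    """Maximum-likelihood mean and variance of projected values."""
    return y.mean(), y.var()

def log_gaussian(y, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mean) ** 2 / var

def classify(y, params1, params2, prior1=0.5, prior2=0.5):
    """Pick the class with the larger log prior plus log likelihood."""
    score1 = np.log(prior1) + log_gaussian(y, *params1)
    score2 = np.log(prior2) + log_gaussian(y, *params2)
    return np.where(score1 >= score2, 1, 2)

# Hypothetical toy data and an assumed, already computed projection w.
X1 = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]])
X2 = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 2.0]])
w = np.array([-2.0, 1.0])

params1 = fit_gaussian_1d(X1 @ w)
params2 = fit_gaussian_1d(X2 @ w)

new_labels = classify(np.array([-5.0, -7.0]), params1, params2)
print(new_labels)
```

With equal priors and equal projected variances, as in this toy example, the decision threshold is the midpoint between the two projected means.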

■ Bayes optimality:
• Fisher’s linear discriminant is Bayes optimal if the class-conditional
distributions are Gaussian with equal, diagonal covariance.

■ Essentially equivalent:
Linear discriminant analysis (LDA)
