Fisher Discriminant Analysis
[Figure: linear projection and decision boundary]
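■ Necessary conditions:
\[ \nabla_x f(x) + \lambda \nabla_x g(x) = 0 \;\Leftrightarrow\; 2(\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2) + 2\lambda \mathbf{w} = 0 \]
\[ g(x) = 0 \;\Leftrightarrow\; \|\mathbf{w}\|^2 - 1 = 0 \]
■ It follows that:
\[ \mathbf{w} = \frac{\mathbf{m}_1 - \mathbf{m}_2}{\|\mathbf{m}_1 - \mathbf{m}_2\|} \]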
• Obvious problem: large class overlap!
[Figure: two panels, axis ticks −2, 2, 6]
■ Idea:
• Separate the means as far as possible while minimizing the
variance of each class.
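To make the first attempt and its problem concrete, here is a minimal NumPy sketch. The toy data (two elongated, correlated Gaussian classes), the function name `mean_difference_direction`, and all variable names are assumptions chosen for illustration, not part of the slides.

```python
import numpy as np

def mean_difference_direction(X1, X2):
    """Unit vector along m1 - m2, i.e. the solution of the first attempt."""
    m1 = X1.mean(axis=0)          # class mean of C1
    m2 = X2.mean(axis=0)          # class mean of C2
    diff = m1 - m2
    return diff / np.linalg.norm(diff)

# Hypothetical toy data: both classes share a strongly correlated covariance.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.6],
                [3.6, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=500)
X2 = rng.multivariate_normal([3.0, 0.0], cov, size=500)

w = mean_difference_direction(X1, X2)

# Project both classes onto w and compare the mean gap with the spread.
y1, y2 = X1 @ w, X2 @ w
print("gap between projected means:", abs(y1.mean() - y2.mean()))
print("projected standard deviations:", y1.std(), y2.std())
```

When the projected standard deviations are comparable to the gap between the projected means, the two classes overlap heavily along `w`, which is the problem the slide points out.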
© Stefan Roth, 20.05.2009 | Department of Computer Science | GRIS |13
Fisher’s Linear Discriminant
■ Second (and final) attempt:
• Define within-class variances:
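\[ s_1^2 = \sum_{n \in C_1} (\mathbf{w}^T \mathbf{x}_n - m_1)^2, \qquad s_2^2 = \sum_{n \in C_2} (\mathbf{w}^T \mathbf{x}_n - m_2)^2 \]
• with $m_1 = \mathbf{w}^T \mathbf{m}_1$ and $m_2 = \mathbf{w}^T \mathbf{m}_2$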
■ Fisher criterion:
\[ J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} \to \max \]
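The criterion can be evaluated directly from the one-dimensional projections. The following is a small sketch; the function name `fisher_criterion` and the four-point data set are hypothetical and chosen so the result can be checked by hand.

```python
import numpy as np

def fisher_criterion(w, X1, X2):
    """J(w) = (m1 - m2)^2 / (s1^2 + s2^2), computed from the projections w^T x_n."""
    y1, y2 = X1 @ w, X2 @ w                 # projected samples of C1 and C2
    m1, m2 = y1.mean(), y2.mean()           # projected class means
    s1_sq = np.sum((y1 - m1) ** 2)          # within-class variance of C1 (unnormalized)
    s2_sq = np.sum((y2 - m2) ** 2)          # within-class variance of C2 (unnormalized)
    return (m1 - m2) ** 2 / (s1_sq + s2_sq)

# Hypothetical data, small enough to verify by hand.
X1 = np.array([[0.0, 0.0], [2.0, 0.0]])     # class C1
X2 = np.array([[4.0, 1.0], [6.0, 1.0]])     # class C2
w = np.array([1.0, 0.0])

J = fisher_criterion(w, X1, X2)
print(J)                                     # (1 - 5)^2 / (2 + 2) = 4.0
print(fisher_criterion(3.0 * w, X1, X2))     # rescaling w leaves J unchanged
```

Because both numerator and denominator are quadratic in w, J depends only on the direction of w, not on its length.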
■ Rewrite numerator:
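\begin{align*}
(m_1 - m_2)^2 &= (\mathbf{w}^T \mathbf{m}_1 - \mathbf{w}^T \mathbf{m}_2)^2 \\
&= \left( \mathbf{w}^T (\mathbf{m}_1 - \mathbf{m}_2) \right)^2 \\
&= \mathbf{w}^T \underbrace{(\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T}_{=:\, S_B} \mathbf{w}
\end{align*}
• $S_B$: between-class covariance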
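■ Rewrite denominator:
\begin{align*}
s_1^2 + s_2^2 &= \sum_{n \in C_1} (\mathbf{w}^T \mathbf{x}_n - m_1)^2 + \sum_{n \in C_2} (\mathbf{w}^T \mathbf{x}_n - m_2)^2 \\
&= \sum_{n \in C_1} \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T \mathbf{w} + \sum_{n \in C_2} \mathbf{w}^T (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T \mathbf{w} \\
&= \mathbf{w}^T \underbrace{\left[ \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T \right]}_{=:\, S_W} \mathbf{w}
\end{align*}
• $S_W$: within-class covariance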
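The two rewrites can be checked numerically. This sketch builds $S_B$ and $S_W$ as defined above and confirms that the quadratic forms reproduce the numerator and denominator of the Fisher criterion. The function name `scatter_matrices` and the random test data are assumptions for illustration.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Return (S_B, S_W) for two classes given as (n_samples, n_features) arrays."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m1 - m2)[:, None]                  # column vector m1 - m2
    S_B = d @ d.T                           # between-class covariance (rank 1)
    D1, D2 = X1 - m1, X2 - m2               # centered samples per class
    S_W = D1.T @ D1 + D2.T @ D2             # within-class covariance (unnormalized)
    return S_B, S_W

# Hypothetical random data and an arbitrary projection direction.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(50, 3))
X2 = rng.normal(size=(60, 3)) + np.array([1.0, 0.5, -0.5])
w = rng.normal(size=3)

S_B, S_W = scatter_matrices(X1, X2)

# Numerator and denominator computed from the 1-D projections.
y1, y2 = X1 @ w, X2 @ w
numerator = (y1.mean() - y2.mean()) ** 2
denominator = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)

print(np.isclose(w @ S_B @ w, numerator))
print(np.isclose(w @ S_W @ w, denominator))
```

Note that, following the slides, the sums are not divided by the number of samples, so $S_W$ is a scatter matrix rather than a normalized covariance estimate. This does not affect the maximizing direction.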
Fisher’s Linear Discriminant
■ Fisher criterion:
\[ J(\mathbf{w}) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^T S_B \mathbf{w}}{\mathbf{w}^T S_W \mathbf{w}} \to \max \]
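• Setting the derivative of $J(\mathbf{w})$ with respect to $\mathbf{w}$ to zero gives:
\[ (\mathbf{w}^T S_B \mathbf{w})\, S_W \mathbf{w} = (\mathbf{w}^T S_W \mathbf{w})\, S_B \mathbf{w} \]
• We have that:
\[ S_B \mathbf{w} = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T \mathbf{w} \;\Rightarrow\; S_B \mathbf{w} \propto (\mathbf{m}_1 - \mathbf{m}_2) \]
\[ \Rightarrow\; S_W \mathbf{w} \propto (\mathbf{m}_1 - \mathbf{m}_2) \]
• Fisher's linear discriminant:
\[ \Rightarrow\; \mathbf{w} \propto S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2) \]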
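The closed-form direction is straightforward to compute. Below is a minimal sketch that assumes $S_W$ is invertible and uses `np.linalg.solve` rather than forming the inverse explicitly. The names `fisher_direction` and `fisher_criterion` and the toy data are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m1 - m2); assumes S_W is invertible."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    D1, D2 = X1 - m1, X2 - m2
    S_W = D1.T @ D1 + D2.T @ D2             # within-class covariance (unnormalized)
    w = np.linalg.solve(S_W, m1 - m2)       # solves S_W w = m1 - m2
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (m1 - m2)^2 / (s1^2 + s2^2) on the projected data."""
    y1, y2 = X1 @ w, X2 @ w
    s_sq = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
    return (y1.mean() - y2.mean()) ** 2 / s_sq

# Hypothetical toy data: two classes sharing a strongly correlated covariance.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.6],
                [3.6, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=500)
X2 = rng.multivariate_normal([3.0, 0.0], cov, size=500)

w_fisher = fisher_direction(X1, X2)
d = X1.mean(axis=0) - X2.mean(axis=0)
w_means = d / np.linalg.norm(d)             # direction from the first attempt

J_fisher = fisher_criterion(w_fisher, X1, X2)
J_means = fisher_criterion(w_means, X1, X2)
print("J (Fisher direction):         ", J_fisher)
print("J (mean-difference direction):", J_means)
```

Since the Fisher direction maximizes J on the given sample, `J_fisher` is never smaller than `J_means`. On data like this, where the classes are elongated along a direction that differs from the mean difference, the gap is typically large.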
■ Bayes optimality:
• Fisher's linear discriminant is Bayes optimal if the class-conditional distributions are Gaussian with equal, diagonal covariance.
■ Essentially equivalent:
Linear discriminant analysis (LDA)
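The Bayes-optimality statement above can be motivated by a short derivation. This is a sketch under assumptions that the slide does not spell out: Gaussian class-conditionals $p(\mathbf{x} \mid C_k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma)$ with a shared covariance $\Sigma$ and class priors $p(C_k)$. The symbols $\boldsymbol{\mu}_k$ and $\Sigma$ are introduced here for illustration.

```latex
\begin{align*}
\ln \frac{p(C_1 \mid \mathbf{x})}{p(C_2 \mid \mathbf{x})}
&= \ln \frac{p(\mathbf{x} \mid C_1)\, p(C_1)}{p(\mathbf{x} \mid C_2)\, p(C_2)} \\
&= -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_1)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)
   + \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_2)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2)
   + \ln \frac{p(C_1)}{p(C_2)} \\
&= \left( \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \right)^T \mathbf{x}
   - \tfrac{1}{2} \boldsymbol{\mu}_1^T \Sigma^{-1} \boldsymbol{\mu}_1
   + \tfrac{1}{2} \boldsymbol{\mu}_2^T \Sigma^{-1} \boldsymbol{\mu}_2
   + \ln \frac{p(C_1)}{p(C_2)}
\end{align*}
```

The quadratic terms in $\mathbf{x}$ and the normalization constants cancel because both classes share $\Sigma$, so the Bayes decision boundary is linear with normal direction $\Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$. This has the same form as $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$ when the sample means estimate $\boldsymbol{\mu}_k$ and $S_W$ estimates $\Sigma$ up to scale. The cancellation holds for any shared covariance, with the diagonal case as a special instance.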