Module 03: Linear Discriminant Analysis
[Figure: linear decision boundaries found by LDA (left); quadratic decision boundaries obtained using LDA (right)]
Bayes Theorem for Classification
• $\Pr(Y=k \mid X=x) = \dfrac{\Pr(X=x \mid Y=k)\,\Pr(Y=k)}{\Pr(X=x)}$
• $\Pr(Y=k \mid X=x) = \dfrac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)}$
• where $f_k(x) = \Pr(X=x \mid Y=k)$ is the conditional density of $X$ in class $k$, and $\pi_k = \Pr(Y=k)$ is the prior probability of class $k$
• Optimal classification requires knowing the class posterior $\Pr(Y=k \mid X=x)$
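As a concrete illustration of the posterior formula above, here is a minimal Python sketch. The two-class setup, Gaussian class-conditional densities, and all parameter values are assumed for illustration only:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, priors, mus, sigmas):
    """Class posteriors Pr(Y=k | X=x) = pi_k f_k(x) / sum_l pi_l f_l(x)."""
    numer = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, mus, sigmas)]
    total = sum(numer)
    return [n / total for n in numer]

# Illustrative parameters (assumed): equal priors, x closer to class 0's mean
post = posterior(x=0.5, priors=[0.5, 0.5], mus=[0.0, 2.0], sigmas=[1.0, 1.0])
print(post)
```

By construction the posteriors sum to one, and the class whose mean is closer to $x$ receives the larger posterior.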
Linear Discriminant Analysis for One Predictor
• An observation at $X=x$ is assigned to the class for which $p_k(x) = \Pr(Y=k \mid X=x)$ is greatest
• $f_k(x) = \dfrac{1}{\sigma_k \sqrt{2\pi}} \exp\!\left(-\dfrac{1}{2\sigma_k^2}(x-\mu_k)^2\right)$
where $\mu_k$ and $\sigma_k^2$ are the mean and variance of the $k$th class
• Linear Discriminant Analysis additionally assumes a variance shared by all classes:
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 = \sigma^2$
• $p_k(x) = \Pr(Y=k \mid X=x) = \dfrac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)} = \dfrac{\pi_k \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu_k)^2\right)}{\sum_l \pi_l \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu_l)^2\right)}$
• The Bayes classifier assigns an observation at $X=x$ to the class for which $p_k(x)$ is largest
• This is equivalent to assigning the observation to the class for which the discriminant
$\delta_k(x) = \dfrac{x\,\mu_k}{\sigma^2} - \dfrac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$
is largest, obtained by taking the log of $p_k(x)$ and dropping terms that do not depend on $k$
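A minimal sketch of the discriminant score above, with two classes sharing one variance (the LDA assumption); the means, variance, and priors are illustrative values, not from the source:

```python
import math

def delta(x, mu, sigma2, pi):
    """delta_k(x) = x*mu_k/sigma^2 - mu_k^2/(2 sigma^2) + log(pi_k)."""
    return x * mu / sigma2 - mu ** 2 / (2 * sigma2) + math.log(pi)

# Assumed toy parameters: shared variance sigma^2 = 1, equal priors
mus, sigma2, priors = [0.0, 2.0], 1.0, [0.5, 0.5]
x = 0.5
scores = [delta(x, m, sigma2, p) for m, p in zip(mus, priors)]
print(scores.index(max(scores)))  # index of the class with the largest discriminant
```

Since $x=0.5$ lies below the midpoint of the two means, class 0 wins.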
Bayes Decision Boundary
• For $K=2$ classes with equal priors ($\pi_1 = \pi_2$), the Bayes decision boundary is the point where $\delta_1(x) = \delta_2(x)$:
$x = \dfrac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \dfrac{\mu_1 + \mu_2}{2}$
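A quick numeric check that with equal priors the two discriminants agree at the midpoint of the means; the parameter values are assumed for illustration:

```python
import math

def delta(x, mu, sigma2, pi):
    # delta_k(x) = x*mu_k/sigma^2 - mu_k^2/(2 sigma^2) + log(pi_k)
    return x * mu / sigma2 - mu ** 2 / (2 * sigma2) + math.log(pi)

mu1, mu2, sigma2 = -1.0, 3.0, 2.0   # illustrative (assumed) values
boundary = (mu1 + mu2) / 2          # the claimed Bayes boundary for equal priors
d1 = delta(boundary, mu1, sigma2, 0.5)
d2 = delta(boundary, mu2, sigma2, 0.5)
print(d1, d2)
```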
Parameter Estimation
• $\hat{\mu}_k = \dfrac{1}{n_k} \sum_{i:\,y_i = k} x_i$
• If no knowledge of the prior probability $\pi_k$ is available, it can be estimated by
$\hat{\pi}_k = \dfrac{n_k}{N}$
• $\hat{\sigma}^2 = \dfrac{1}{N-K} \sum_{k=1}^{K} \sum_{i:\,y_i = k} (x_i - \hat{\mu}_k)^2$
• The LDA classifier plugs these estimates into the discriminant function and classifies an observation $X=x$ by
$\hat{\delta}_k(x) = \dfrac{x\,\hat{\mu}_k}{\hat{\sigma}^2} - \dfrac{\hat{\mu}_k^2}{2\hat{\sigma}^2} + \log(\hat{\pi}_k)$
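The plug-in estimates and the classification rule can be sketched end to end. The toy data below (class 0 near 0, class 1 near 4) is assumed purely for illustration:

```python
import math
from collections import defaultdict

def fit_lda_1d(x, y):
    """Plug-in estimates for 1-D LDA: class means, priors, pooled variance."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    N, K = len(x), len(groups)
    mu = {k: sum(v) / len(v) for k, v in groups.items()}
    pi = {k: len(v) / N for k, v in groups.items()}
    # Pooled variance estimate with denominator N - K
    sigma2 = sum((xi - mu[k]) ** 2 for k, v in groups.items() for xi in v) / (N - K)
    return mu, pi, sigma2

def predict(x0, mu, pi, sigma2):
    """Assign x0 to the class with the largest estimated discriminant."""
    return max(mu, key=lambda k: x0 * mu[k] / sigma2 - mu[k] ** 2 / (2 * sigma2) + math.log(pi[k]))

# Assumed toy data
x = [-0.5, 0.0, 0.5, 3.5, 4.0, 4.5]
y = [0, 0, 0, 1, 1, 1]
mu, pi, sigma2 = fit_lda_1d(x, y)
print(predict(1.0, mu, pi, sigma2), predict(3.0, mu, pi, sigma2))
```

With equal priors the estimated boundary falls at the midpoint of the class means, so points below it go to class 0 and points above it to class 1.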
Multivariate Gaussian
• With $p$ predictors, each class-conditional density is modelled as multivariate Gaussian:
$f_k(x) = \dfrac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right)$
• LDA is the special case in which the covariance matrix is assumed to be the same for all classes:
$\Sigma_k = \Sigma \quad \forall k$
LDA with Multiple Predictors
Estimated values:
$\hat{\pi}_k = \dfrac{N_k}{N}$
$\hat{\mu}_k = \dfrac{\sum_{g_i = k} x_i}{N_k}$
$\hat{\Sigma} = \dfrac{\sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T}{N - K}$
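These multivariate estimates can be sketched with NumPy; the synthetic two-class data is assumed for illustration:

```python
import numpy as np

def fit_lda(X, y):
    """Estimate pi_k, mu_k, and the pooled covariance Sigma (denominator N - K)."""
    classes = np.unique(y)
    N, p = X.shape
    K = len(classes)
    pi = {k: float(np.mean(y == k)) for k in classes}
    mu = {k: X[y == k].mean(axis=0) for k in classes}
    Sigma = np.zeros((p, p))
    for k in classes:
        D = X[y == k] - mu[k]   # centred observations of class k
        Sigma += D.T @ D        # accumulates sum of (x_i - mu_k)(x_i - mu_k)^T
    return pi, mu, Sigma / (N - K)

# Assumed synthetic data: two Gaussian classes in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pi, mu, Sigma = fit_lda(X, y)
print(pi, Sigma.shape)
```

The pooled covariance is symmetric by construction, and the priors here are simply the class proportions.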
Quadratic Discriminant function
• When the assumption of a common covariance matrix for all classes is dropped, we get Quadratic Discriminant Analysis (QDA)
• Discriminant function:
$\delta_k(x) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x-\mu_k)^T \Sigma_k^{-1}(x-\mu_k) + \log \pi_k$
Decision rule: $G(x) = \operatorname{argmax}_k \, \delta_k(x)$
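The QDA discriminant and decision rule can be sketched as follows; the class means and covariances are assumed toy parameters:

```python
import numpy as np

def qda_delta(x, mu, Sigma, pi):
    """delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k."""
    d = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma, d) + np.log(pi)

# Assumed toy parameters: two classes with different covariances
mu0, Sigma0 = np.array([0.0, 0.0]), np.eye(2)
mu1, Sigma1 = np.array([3.0, 3.0]), 2 * np.eye(2)
x = np.array([0.5, 0.5])
scores = [qda_delta(x, mu0, Sigma0, 0.5), qda_delta(x, mu1, Sigma1, 0.5)]
print(int(np.argmax(scores)))  # G(x) = argmax_k delta_k(x)
```

Unlike LDA, each class keeps its own $\Sigma_k$, so the $-\tfrac{1}{2}\log|\Sigma_k|$ term no longer cancels and the boundary is quadratic in $x$.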
Example: Stock Market Data
> plot(lda.fit)
(plots histograms of the linear discriminant scores for each class)
Test Error Rate
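The original slides evaluate the fitted model (presumably `MASS::lda` in R) on held-out stock-market data; that dataset is not reproduced here. As a stand-in, the following NumPy sketch fits a two-class LDA on assumed synthetic data and reports the test error rate on a held-out set:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic two-class data in place of the stock-market example
def sample(n, mean):
    return rng.normal(mean, 1.0, (n, 2))

Xtr = np.vstack([sample(100, 0.0), sample(100, 2.0)])
ytr = np.array([0] * 100 + [1] * 100)
Xte = np.vstack([sample(50, 0.0), sample(50, 2.0)])
yte = np.array([0] * 50 + [1] * 50)

# Plug-in estimates: class means, pooled covariance (denominator N - K), priors
mu = [Xtr[ytr == k].mean(axis=0) for k in (0, 1)]
D = np.vstack([Xtr[ytr == k] - mu[k] for k in (0, 1)])
Sigma = D.T @ D / (len(Xtr) - 2)
pi = [0.5, 0.5]

def delta(x, k):
    """Multivariate LDA discriminant: x^T Sigma^-1 mu_k - 0.5 mu_k^T Sigma^-1 mu_k + log pi_k."""
    Sinv_mu = np.linalg.solve(Sigma, mu[k])
    return x @ Sinv_mu - 0.5 * mu[k] @ Sinv_mu + np.log(pi[k])

pred = np.array([max((0, 1), key=lambda k: delta(x, k)) for x in Xte])
test_error = float(np.mean(pred != yte))
print(test_error)
```

The test error rate is simply the fraction of held-out observations the fitted rule misclassifies; with well-separated class means it falls far below the 50% expected from guessing.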