Module 03: Linear Discriminant Analysis

Dr. Sayak Roychowdhury


Department of Industrial & Systems Engineering,
IIT Kharagpur
Reference Books
• James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. New York: Springer.
• Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). New York: Springer.
Why LDA?
• When the classes are well separated, the parameter estimates for logistic regression can be unstable.
• When $n$ is small and the distribution of $X$ is approximately normal within each class, LDA is more stable than logistic regression.
• LDA is also useful when there are more than two classes, since it provides a low-dimensional view of the data.
• With the right population model, the Bayes rule is the best possible classifier.
Decision boundaries

[Figure: linear decision boundaries found by LDA (left); quadratic decision boundaries found using LDA (right)]
Bayes Theorem for Classification
• $\Pr(Y = k \mid X = x) = \dfrac{\Pr(X = x \mid Y = k)\,\Pr(Y = k)}{\Pr(X = x)}$
• Equivalently, $\Pr(Y = k \mid X = x) = \dfrac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)}$
• where $f_k(x) = \Pr(X = x \mid Y = k)$ is the conditional density of $X$ in class $k$, and $\pi_k = \Pr(Y = k)$ is the prior probability of class $k$
• We need to know the class posterior $\Pr(Y = k \mid X = x)$ for optimal classification
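As a quick numerical sketch of this formula (the priors, means, and evaluation point below are made-up values, not from the slides), the posterior can be computed directly in R:

# Two-class example: posterior Pr(Y = k | X = x) from priors and densities
pi_k <- c(0.3, 0.7)                       # assumed prior probabilities
f_x  <- c(dnorm(1.5, mean = 0, sd = 1),   # f_1(x): class-1 density at x = 1.5
          dnorm(1.5, mean = 2, sd = 1))   # f_2(x): class-2 density at x = 1.5
posterior <- pi_k * f_x / sum(pi_k * f_x) # Bayes theorem
posterior                                 # the two posteriors sum to 1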
Linear Discriminant Analysis for One Predictor
• An observation at $X = x$ is classified to the class $k$ for which $p_k(x) = \Pr(Y = k \mid X = x)$ is greatest
• The class densities are assumed Gaussian:
$f_k(x) = \dfrac{1}{\sigma_k \sqrt{2\pi}} \exp\!\left(-\dfrac{1}{2\sigma_k^2}(x - \mu_k)^2\right)$
where $\mu_k$ and $\sigma_k^2$ are the mean and variance parameters of the $k$th class
• For linear discriminant analysis, all classes are assumed to share a common variance:
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 = \sigma^2$
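In R this class density is just dnorm; a minimal sketch, with assumed parameter values:

mu_k  <- 1.0                    # assumed mean of class k
sigma <- 0.5                    # common standard deviation (LDA assumption)
f_k   <- function(x) dnorm(x, mean = mu_k, sd = sigma)
f_k(1.2)                        # density of class k at x = 1.2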
Linear Discriminant Analysis for One Predictor
• $p_k(x) = \Pr(Y = k \mid X = x) = \dfrac{\pi_k f_k(x)}{\sum_l \pi_l f_l(x)} = \dfrac{\pi_k \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu_k)^2\right)}{\sum_l \pi_l \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu_l)^2\right)}$
• The Bayes classifier assigns an observation at $X = x$ to the class for which $p_k(x)$ is largest
• Taking logs and discarding the terms that are the same for every class, this is equivalent to assigning the observation to the class for which $\delta_k(x)$ is largest:
$\delta_k(x) = \dfrac{x \mu_k}{\sigma^2} - \dfrac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$
Bayes Decision Boundary
• For $K = 2$ classes with equal priors $\pi_1 = \pi_2$, the Bayes decision boundary corresponds to the point where $\delta_1(x) = \delta_2(x)$:
$x = \dfrac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \dfrac{\mu_1 + \mu_2}{2}$
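Filling in the step between the two formulas: setting the two discriminants equal gives
\[
\frac{x\mu_1}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2} + \log\pi_1
= \frac{x\mu_2}{\sigma^2} - \frac{\mu_2^2}{2\sigma^2} + \log\pi_2 .
\]
With $\pi_1 = \pi_2$ the log terms cancel, leaving
\[
\frac{x(\mu_1 - \mu_2)}{\sigma^2} = \frac{\mu_1^2 - \mu_2^2}{2\sigma^2}
\quad\Rightarrow\quad
x = \frac{\mu_1 + \mu_2}{2}.
\]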
Parameter Estimation
• $\hat{\mu}_k = \dfrac{1}{n_k} \sum_{i:\, y_i = k} x_i$
• If no knowledge of the prior probability $\pi_k$ is available, it can be estimated by $\hat{\pi}_k = \dfrac{n_k}{N}$
• $\hat{\sigma}^2 = \dfrac{1}{N - K} \sum_{k=1}^{K} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)^2$
• The LDA classifier plugs these estimates into the discriminant for an observation $X = x$:
$\hat{\delta}_k(x) = \dfrac{x \hat{\mu}_k}{\hat{\sigma}^2} - \dfrac{\hat{\mu}_k^2}{2\hat{\sigma}^2} + \log(\hat{\pi}_k)$
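A from-scratch sketch of these estimators and the plug-in rule (the variable names and simulated data below are my own, not from the slides):

# One-predictor LDA: estimate parameters, then classify with delta_hat_k(x)
set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 2))  # simulated predictor
y <- rep(1:2, each = 20)                          # class labels

N  <- length(x)
K  <- 2
nk <- tabulate(y)                           # class sizes n_k
pi_hat <- nk / N                            # estimated priors
mu_hat <- tapply(x, y, mean)                # estimated class means
s2_hat <- sum((x - mu_hat[y])^2) / (N - K)  # pooled variance, denominator N - K

# plug-in discriminant: the class with the largest delta_hat_k wins
delta_hat <- function(x0) x0 * mu_hat / s2_hat - mu_hat^2 / (2 * s2_hat) + log(pi_hat)
which.max(delta_hat(1.0))                   # predicted class at x = 1.0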
Multivariate Gaussian

[Figure: two bivariate Gaussian densities; left: $X_1$ and $X_2$ uncorrelated, with $\mathrm{Var}(X_1) = \mathrm{Var}(X_2)$; right: $X_1$ and $X_2$ correlated]
Gaussian Density Multiple Predictors
• Suppose each class density is multivariate Gaussian
$f_k(x) = \dfrac{1}{(2\pi)^{p/2}\, |\Sigma_k|^{1/2}} \exp\!\left(-\dfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$

• LDA is the special case in which it is assumed that the covariance matrix is the same for all classes:
$\Sigma_k = \Sigma \quad \forall k$
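This density transcribes directly into R; a minimal sketch, with mu_k and Sigma_k as assumed example values:

dmvnorm_manual <- function(x, mu, Sigma) {
  # multivariate Gaussian density, following the formula above
  p <- length(mu)
  d <- x - mu
  drop(exp(-0.5 * t(d) %*% solve(Sigma) %*% d)) /
    ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}
mu_k    <- c(0, 0)                          # assumed class mean
Sigma_k <- matrix(c(1, 0.5, 0.5, 1), 2, 2)  # assumed covariance matrix
dmvnorm_manual(c(0.5, -0.2), mu_k, Sigma_k)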
LDA with Multiple Predictors

[Figure: 3 Gaussian distributions (left); samples from the 3 Gaussian distributions, with solid lines indicating the LDA boundaries (right)]
Linear Discriminant function
• Discriminant function:
$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \dfrac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$
• Decision rule: $G(x) = \operatorname{argmax}_k \delta_k(x)$
(the predicted class of $x$ is the one with the largest $\delta_k(x)$ value)
• Estimated values:
$\hat{\pi}_k = \dfrac{N_k}{N}$
$\hat{\mu}_k = \dfrac{1}{N_k} \sum_{g_i = k} x_i$
$\hat{\Sigma} = \dfrac{1}{N - K} \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$
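A from-scratch sketch of these estimators and the linear decision rule, using the built-in iris data purely as placeholder input (not from the slides):

# Multivariate LDA from scratch on iris (illustration only)
X <- as.matrix(iris[, 1:4])
g <- iris$Species
N <- nrow(X); K <- nlevels(g)

pi_hat <- table(g) / N                        # N_k / N
mu_hat <- t(sapply(levels(g), function(k) colMeans(X[g == k, , drop = FALSE])))
# pooled within-class covariance, denominator N - K
S <- Reduce(`+`, lapply(levels(g), function(k) {
  D <- scale(X[g == k, , drop = FALSE], center = mu_hat[k, ], scale = FALSE)
  t(D) %*% D
})) / (N - K)

Sinv <- solve(S)
# delta_k(x) = x' Sinv mu_k - 0.5 mu_k' Sinv mu_k + log pi_k, for every row of X
delta <- X %*% Sinv %*% t(mu_hat) -
  matrix(0.5 * rowSums((mu_hat %*% Sinv) * mu_hat), N, K, byrow = TRUE) +
  matrix(log(pi_hat), N, K, byrow = TRUE)
pred <- levels(g)[max.col(delta)]             # argmax over classes
mean(pred == g)                               # training accuracy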
Quadratic Discriminant function
• When the assumption of an equal covariance matrix for all classes is dropped, we get quadratic discriminant analysis (QDA)
• Discriminant function:
$\delta_k(x) = -\dfrac{1}{2} \log |\Sigma_k| - \dfrac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k$
• Decision rule: $G(x) = \operatorname{argmax}_k \delta_k(x)$
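In R, QDA is available as qda() in the MASS package, with the same interface as lda(); a minimal sketch, again using iris as placeholder data:

library(MASS)                        # provides qda()
qda.fit  <- qda(Species ~ ., data = iris)
qda.pred <- predict(qda.fit, iris)$class
mean(qda.pred != iris$Species)       # training error rate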
Example: Stock Market Data
> plot(lda.fit)
Test Error Rate
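The plot(lda.fit) call above comes from the standard lab in James et al. (2013); a sketch of the full sequence, assuming the Smarket data from the ISLR package: fit LDA on pre-2005 data, then compute the test error rate on 2005.

library(MASS)   # lda()
library(ISLR)   # Smarket data

train   <- Smarket$Year < 2005                      # training indicator
lda.fit <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
plot(lda.fit)                                       # discriminant histograms by class

lda.pred <- predict(lda.fit, Smarket[!train, ])
table(lda.pred$class, Smarket$Direction[!train])    # confusion matrix
mean(lda.pred$class != Smarket$Direction[!train])   # test error rate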
