DADM S14 Linear Discriminant Analysis

Session 14: Linear Discriminant Analysis (LDA)

Dr. Mahesh K C 1
LDA: Basic Concept and Objectives
• A technique for analyzing multivariate data when the response variable is categorical
and the predictors are interval in nature.
• In most cases the dependent variable consists of two groups or classifications, such as
high versus normal blood pressure, loan defaulting versus non-defaulting, or use versus
non-use of internet banking.
• The choice between three candidates, A, B or C, in an election is an example where the
dependent variable consists of more than two groups.

• Objectives:
• Develop a discriminant function: A linear combination of predictors that will best
discriminate between the categories of response variable (groups).
• Examine whether significant differences exist between the groups on the
predictors.
• Classify the cases to one of the groups based on the value of the predictors.
• Evaluate the accuracy of the classification.
Some examples of LDA
• The technique can be used to answer questions such as:
• Based on demographic characteristics, how do customers who exhibit store
loyalty differ from those who do not?
• Do heavy, medium and light users of soft drinks differ in terms of their
consumption of frozen foods?
• Do the various market segments differ in their media consumption habits?
• What psychographic characteristics help in differentiating price-sensitive
and non-price-sensitive buyers?
• Studying the bankruptcy problem.

Fisher’s Linear Discriminant Function
• Introduced by R. A. Fisher in 1936; typically considered more of a statistical
classification method than a data mining method.
• Obtain a linear combination (known as the discriminant function) of the independent
variables that best discriminates between the groups of the dependent variable.

• The idea is to find linear functions of the measurements that maximize the ratio of
between-class variability to within-class variability. In other words, form groups
that are internally homogeneous and differ the most from each other.

• For each record, these functions are used to compute scores that measure the
proximity of that record to each of the classes.
• A record is classified as belonging to the class for which it has the highest
classification score.
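The score-then-classify idea above can be sketched for the two-class case. The session's own analysis uses R's MASS package, so the following stdlib-Python version is only an illustrative sketch: the toy data, function names, and midpoint cutoff rule are assumptions for demonstration, not from the slides. The weight vector is taken proportional to Sw⁻¹(m1 − m2), which maximizes the ratio of between-class to within-class variability along the projection.

```python
# Minimal two-class Fisher discriminant on toy 2-D data (illustrative only).

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def within_scatter(points, m):
    # 2x2 scatter matrix: sum over records of (x - m)(x - m)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_weights(class1, class2):
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = within_scatter(class1, m1), within_scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    # invert the 2x2 pooled within-class scatter matrix
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # cutoff: score of the midpoint between the two centroids
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    cut = w[0] * mid[0] + w[1] * mid[1]
    return w, cut

def classify(x, w, cut):
    # w is proportional to Sw^-1 (m1 - m2), so class-1 scores always
    # exceed the midpoint cutoff
    score = w[0] * x[0] + w[1] * x[1]
    return "class1" if score > cut else "class2"

class1 = [(1, 2), (2, 1), (1.5, 1.5), (2, 2)]
class2 = [(6, 7), (7, 6), (6.5, 6.5), (7, 7)]
w, cut = fisher_weights(class1, class2)
print(classify((2, 2), w, cut))  # class1
print(classify((6, 6), w, cut))  # class2
```

A record near the first centroid scores above the cutoff and is assigned to class 1; one near the second centroid falls below it, mirroring the highest-score rule described above.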
The LDA Model and assumptions
• Let X1, X2, …, Xk denote the predictors and let D be the discriminant score. Then a
linear combination of the predictors is given by:
D = b1X1 + b2X2 + … + bkXk,
where the bi are the discriminant coefficients or weights. Note that with g groups we
need g − 1 discriminant functions.
• Assumptions:
1) The groups must be mutually exclusive and have equal sample size.
2) Groups should have the same variance-covariance matrices on independent
variables.
3) The independent variables should be multivariate normally distributed.

• If assumption 3 is met, then LDA is a more powerful tool than other classification
methods such as logistic regression (roughly 30% more efficient, see Efron 1975).
• LDA performs better as the sample size increases.
Statistics associated with LDA
• Canonical correlation: measures the extent of association between the
discriminant function and the groups in the response variable.
• Confusion (classification) matrix: a matrix representing the number of correctly
classified and misclassified cases. Correctly classified cases appear on the diagonal
and misclassified cases appear off the diagonal.
• Hit ratio (accuracy): sum of the diagonal elements divided by the total number of
cases.
• Eigenvalue: the ratio of the between-group to the within-group sum of squares.
Large eigenvalues imply a superior function.
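The hit-ratio definition above is just the trace of the confusion matrix over the total count. A quick sketch in Python, using a hypothetical 2×2 confusion matrix (not from the iris analysis):

```python
# Hit ratio = sum of diagonal elements / total number of cases.
def hit_ratio(cm):
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

# Hypothetical two-group confusion matrix (rows = predicted, cols = actual).
cm = [[50, 5],
      [10, 35]]
print(hit_ratio(cm))  # 0.85
```

Here 50 + 35 = 85 correctly classified cases out of 100 give a hit ratio of 0.85.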

The Iris Flower Data
• This famous (Fisher's or Anderson's) iris data set gives the measurements in
centimeters of the variables sepal length and width and petal length and width,
respectively, for 50 flowers from each of 3 species of iris. The species are Iris
setosa, Iris versicolor, and Iris virginica.
• Predictors: Sepal.Length, Sepal.Width, Petal.Length and Petal.Width
• Dependent Variable: Species with three levels Setosa, Versicolor and Virginica
• Total Observations: 150

[Images: Iris setosa, Iris versicolor, Iris virginica]
LDA of iris data
• Required R packages: MASS & psych.
• The scatter plots show, in most cases, a clear grouping of the species.
• Partition the data into 70% training and 30% testing.
• The three groups have equal prior probabilities (1/3 each).
• The group means (centroids) clearly show a separation between the groups on the
corresponding predictors.
• Since Species has three levels, we get two discriminant functions (LD1 and LD2) with
corresponding weights:
LD1  0.534Sepal.Length  2.125Sepal.Width  1.962 Petal.Length  3.561Petal.Width

LD 2  0.294 Sepal .Length  1.933Sepal .Width  1.143 Petal .Length  3.003 Petal .Width

Matrix plot: Iris data

LDA of iris data Cont’d
• The proportion of trace for LD1 = 0.993 and for LD2 = 0.007, i.e., LD1 alone
achieves 99.3% of the between-group separation.
• The predicted classifications in the training data are: Setosa = 35, Versicolor = 36
and Virginica = 35.
• The eigenvalues corresponding to LD1 and LD2 are 44.23 and 3.71; the higher the
eigenvalue, the better the separation.
• The discriminant scores are obtained by applying LD1 and LD2 to the predictor
values of each record.

Histogram of Discriminant Scores based on LD1 & LD2

The histograms of discriminant scores based on LD1 (left panel) show a clear
separation of the species, while those based on LD2 show substantial overlap.

Confusion Matrix & Accuracy
• Confusion matrix: a matrix that summarizes the correct and incorrect classifications
a classifier produced for a given dataset (see Table 1).
• Accuracy: the overall proportion of correct classifications. For the training data,
accuracy is 100%.
Table 1: Confusion matrix for training data

Predicted      Actual (Training)
               Setosa  Versicolor  Virginica
Setosa             35           0          0
Versicolor          0          36          0
Virginica           0           0         35

Table 2: Confusion matrix for test data

Predicted      Actual (Test)
               Setosa  Versicolor  Virginica
Setosa             15           0          0
Versicolor          0          13          1
Virginica           0           1         14

• For the validation data the accuracy is 95.45%; a small drop from the training
accuracy is expected.
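The accuracy figures quoted above follow directly from Tables 1 and 2; a quick check in Python:

```python
# Accuracy = diagonal (correct) cases over all cases in the confusion matrix.
def accuracy(cm):
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(r) for r in cm)

train_cm = [[35, 0, 0], [0, 36, 0], [0, 0, 35]]   # Table 1
test_cm  = [[15, 0, 0], [0, 13, 1], [0, 1, 14]]   # Table 2

print(accuracy(train_cm))                 # 1.0
print(round(100 * accuracy(test_cm), 2))  # 95.45
```

The training matrix has no off-diagonal entries (106/106 correct), while the test matrix has two misclassifications, giving 42/44 ≈ 95.45%.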

References

• Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R., & Lichtendahl, K.C. (2018),
Data Mining for Business Analytics, Wiley.
• Larose, D.T., & Larose, C.D. (2016), Data Mining and Predictive Analytics,
2nd edition, Wiley.
• Kumar, U.D. (2018), Business Analytics: The Science of Data-Driven Decision
Making, 1st edition, Wiley.
