MDA Session 14

27-07-2023
Canonical Discriminant Analysis
The Good, the Bad and the Dividing Line
[Figure: scatter plot of Good Accounts and Bad Accounts, Return on Investment (vertical axis) against Current Ratio (horizontal axis), separated by a dividing line]
Why a Line?
• Simple -- ease of interpretation
• 'Best discriminator' in the case of NORMAL populations with equal variance-covariance matrices
• At times, a line is not good enough.
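Objective criterion for choosing the 'Best' line
$$Z = aX + bY, \qquad X = \text{CR (current ratio)}, \quad Y = \text{ROI (return on investment)}$$
Choose a, b so that the Z-values of 'good accounts' are as 'different' from the Z-values of 'bad accounts' as possible:
$$\max_{a,b} \frac{\text{between-group variation}}{\text{within-group variation}} = \max_{a,b} \frac{(\bar Z_1 - \bar Z_2)^2}{\sum_i (Z_{1i} - \bar Z_1)^2 + \sum_i (Z_{2i} - \bar Z_2)^2}$$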
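The criterion above can be evaluated directly for any candidate (a, b). A minimal Python sketch, assuming the twenty accounts of the numerical illustration in this session (good accounts 1-10, bad accounts 11-20); it compares two single-variable lines with the coefficients a = 2.539, b = 0.185 derived later. The function name `criterion` is illustrative only.

```python
# Account data from the numerical illustration: (current ratio X, return on investment Y)
GOOD = [(1.10, 13), (1.50, 15), (1.20, 17), (0.90, 21), (1.60, 7),
        (2.20, 8), (0.90, 16), (1.00, 13), (1.30, 8), (1.30, 3)]
BAD = [(0.70, 11), (0.90, -4), (0.80, 6), (1.30, 2), (1.10, 6),
       (0.50, 8), (0.30, 8), (1.40, 6), (0.90, 3), (1.10, 14)]


def criterion(a, b, good=GOOD, bad=BAD):
    """(Z1bar - Z2bar)^2 divided by the summed within-group sums of squares of Z = aX + bY."""
    z1 = [a * x + b * y for x, y in good]
    z2 = [a * x + b * y for x, y in bad]
    m1, m2 = sum(z1) / len(z1), sum(z2) / len(z2)
    between = (m1 - m2) ** 2
    within = sum((z - m1) ** 2 for z in z1) + sum((z - m2) ** 2 for z in z2)
    return between / within


if __name__ == "__main__":
    for a, b in [(1, 0), (0, 1), (2.539, 0.185)]:
        print(f"a={a}, b={b}: criterion={criterion(a, b):.4f}")
```

Multiplying a and b by the same constant leaves the ratio unchanged, so only the direction of (a, b) matters.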
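Optimal Choice of discriminant coefficients
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}^{-1} \begin{pmatrix} \bar X_1 - \bar X_2 \\ \bar Y_1 - \bar Y_2 \end{pmatrix} = \frac{1}{\sigma_x^2\sigma_y^2 - \sigma_{xy}\sigma_{xy}} \begin{pmatrix} \sigma_y^2 & -\sigma_{xy} \\ -\sigma_{xy} & \sigma_x^2 \end{pmatrix} \begin{pmatrix} \delta_x \\ \delta_y \end{pmatrix}$$
where $\delta_x = \bar X_1 - \bar X_2$ and $\delta_y = \bar Y_1 - \bar Y_2$, so that
$$a = \frac{\sigma_y^2\,\delta_x - \sigma_{xy}\,\delta_y}{\sigma_x^2\sigma_y^2 - \sigma_{xy}\sigma_{xy}}, \qquad b = \frac{\sigma_x^2\,\delta_y - \sigma_{xy}\,\delta_x}{\sigma_x^2\sigma_y^2 - \sigma_{xy}\sigma_{xy}}$$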
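The closed-form 2 x 2 solution translates directly into code. A sketch only, with the hypothetical helper name `fisher_ab`; the inputs are the variances, covariance and mean differences used in the numerical illustration.

```python
def fisher_ab(var_x, var_y, cov_xy, dx, dy):
    """Solve [[var_x, cov_xy], [cov_xy, var_y]] @ (a, b) = (dx, dy) in closed form."""
    det = var_x * var_y - cov_xy ** 2
    if det == 0:
        raise ValueError("variance-covariance matrix is singular")
    a = (var_y * dx - cov_xy * dy) / det
    b = (var_x * dy - cov_xy * dx) / det
    return a, b


# Inputs from the numerical illustration: sigma_x^2, sigma_y^2, sigma_xy, delta_x, delta_y
a, b = fisher_ab(0.163, 33.948, -0.075, 0.40, 6.10)
print(round(a, 3), round(b, 3))  # prints 2.539 0.185
```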
Numerical Illustration: Case
X = Current Ratio, Y = Return on Investment

Good Accounts                          Bad Accounts
Account  Current  Return on            Account  Current  Return on
Number   Ratio    Investment           Number   Ratio    Investment
1        1.10     13                   11       0.70     11
2        1.50     15                   12       0.90     -4
3        1.20     17                   13       0.80     6
4        0.90     21                   14       1.30     2
5        1.60     7                    15       1.10     6
6        2.20     8                    16       0.50     8
7        0.90     16                   17       0.30     8
8        1.00     13                   18       1.40     6
9        1.30     8                    19       0.90     3
10       1.30     3                    20       1.10     14
Average  1.30     12.10                Average  0.90     6.00

Overall average: Current Ratio 1.10, Return on Investment 9.05
$$\delta_x = \bar X_1 - \bar X_2 = 1.30 - 0.90 = 0.40, \qquad \delta_y = \bar Y_1 - \bar Y_2 = 12.10 - 6.00 = 6.10$$
Numerical Illustration (cont.)
Deviations are measured from the overall means (X̄ = 1.10, Ȳ = 9.05).

Account  X (Current  Y (Return on  (X-X̄)^2  (Y-Ȳ)^2  (X-X̄)(Y-Ȳ)
Number   Ratio)      Investment)
1        1.10        13            0.000     15.603    0.000
2        1.50        15            0.160     35.403    2.380
3        1.20        17            0.010     63.203    0.795
4        0.90        21            0.040     142.803   -2.390
5        1.60        7             0.250     4.203     -1.025
6        2.20        8             1.210     1.103     -1.155
7        0.90        16            0.040     48.303    -1.390
8        1.00        13            0.010     15.603    -0.395
9        1.30        8             0.040     1.103     -0.210
10       1.30        3             0.040     36.603    -1.210
11       0.70        11            0.160     3.803     -0.780
12       0.90        -4            0.040     170.303   2.610
13       0.80        6             0.090     9.303     0.915
14       1.30        2             0.040     49.703    -1.410
15       1.10        6             0.000     9.303     0.000
16       0.50        8             0.360     1.103     0.630
17       0.30        8             0.640     1.103     0.840
18       1.40        6             0.090     9.303     -0.915
19       0.90        3             0.040     36.603    1.210
20       1.10        14            0.000     24.503    0.000
Average  1.10        9.05          0.163     33.948    -0.075

The averages of the last three columns are the variances and covariance used below: σx² = 0.163, σy² = 33.948, σxy = -0.075.
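The three averages in the last row can be reproduced from the raw data. A minimal sketch, assuming (as the table does) that deviations are taken from the overall means of all 20 accounts and that the sums are divided by n = 20 rather than n - 1. For two groups, using these total moments instead of the pooled within-group covariance only rescales (a, b) by a positive constant, so the dividing line is unchanged.

```python
# Current ratio (X) and return on investment (Y): accounts 1-10 are good, 11-20 are bad
CR = [1.10, 1.50, 1.20, 0.90, 1.60, 2.20, 0.90, 1.00, 1.30, 1.30,
      0.70, 0.90, 0.80, 1.30, 1.10, 0.50, 0.30, 1.40, 0.90, 1.10]
ROI = [13, 15, 17, 21, 7, 8, 16, 13, 8, 3,
       11, -4, 6, 2, 6, 8, 8, 6, 3, 14]


def mean(values):
    return sum(values) / len(values)


def total_moments(xs, ys):
    """Variances and covariance about the overall means, dividing by n as in the table."""
    mx, my, n = mean(xs), mean(ys), len(xs)
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return var_x, var_y, cov_xy


var_x, var_y, cov_xy = total_moments(CR, ROI)
dx = mean(CR[:10]) - mean(CR[10:])    # delta_x = X1bar - X2bar
dy = mean(ROI[:10]) - mean(ROI[10:])  # delta_y = Y1bar - Y2bar
print(var_x, var_y, cov_xy, dx, dy)
```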
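Numerical Illustration (cont.)
$$a = \frac{\sigma_y^2\,\delta_x - \sigma_{xy}\,\delta_y}{\sigma_x^2\sigma_y^2 - \sigma_{xy}\sigma_{xy}} = \frac{33.948 \times 0.4 - (-0.075) \times 6.1}{0.163 \times 33.948 - (-0.075)^2} = 2.539$$
$$b = \frac{\sigma_x^2\,\delta_y - \sigma_{xy}\,\delta_x}{\sigma_x^2\sigma_y^2 - \sigma_{xy}\sigma_{xy}} = \frac{0.163 \times 6.1 - (-0.075) \times 0.4}{0.163 \times 33.948 - (-0.075)^2} = 0.185$$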
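$$Z = aX + bY = \begin{pmatrix} a & b \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}, \qquad \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}^{-1} \begin{pmatrix} \bar X_1 - \bar X_2 \\ \bar Y_1 - \bar Y_2 \end{pmatrix}$$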
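General form of the discriminant function
If there are p independent variables, $X = (X_1, X_2, \ldots, X_p)'$, the discriminant function is
$$Z = (\bar X_1 - \bar X_2)'\,\Sigma^{-1} X$$
where $\bar X_1$ and $\bar X_2$ are the group mean vectors and Σ is the variance-covariance matrix of X.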
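For p variables the coefficient vector solves Σc = X̄1 - X̄2. A dependency-free sketch that uses Gaussian elimination with partial pivoting instead of forming Σ⁻¹ explicitly (a standard numerical choice, not something the slides prescribe); `solve`, `discriminant_coefficients` and `score` are illustrative names. With p = 2 and the illustration's numbers it reproduces a ≈ 2.539 and b ≈ 0.185.

```python
def solve(matrix, rhs):
    """Solve matrix @ c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance-covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def discriminant_coefficients(sigma, mean1, mean2):
    """Coefficients of Z = (X1bar - X2bar)' Sigma^{-1} X."""
    return solve(sigma, [m1 - m2 for m1, m2 in zip(mean1, mean2)])


def score(coef, x):
    """Discriminant score Z for one observation x."""
    return sum(c * xi for c, xi in zip(coef, x))


coef = discriminant_coefficients([[0.163, -0.075], [-0.075, 33.948]],
                                 [1.30, 12.10], [0.90, 6.00])
print(coef, score(coef, [1.30, 12.10]))
```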
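Classification Rule based on the discriminant function
[Figure: the group means (X̄2, Ȳ2) and (X̄1, Ȳ1) and a new observation (x, y) are projected onto the discriminant axis as the Z-values Z̄2, z and Z̄1]
Classify the new observation to population 1 if z is closer to Z̄1 than to Z̄2.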
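The rule can be sketched as follows, assuming the rounded coefficients a = 2.539, b = 0.185 and the group means from the numerical illustration; the two new accounts are hypothetical. Comparing distances on the Z axis is equivalent to a cutoff at the midpoint (Z̄1 + Z̄2)/2; sending ties to population 1 is an arbitrary choice made here.

```python
A, B = 2.539, 0.185          # coefficients from the numerical illustration
MEAN_GOOD = (1.30, 12.10)    # (X1bar, Y1bar): population 1, good accounts
MEAN_BAD = (0.90, 6.00)      # (X2bar, Y2bar): population 2, bad accounts


def z_value(x, y):
    return A * x + B * y


def classify(x, y):
    """Return 1 if z is at least as close to Z1bar as to Z2bar, otherwise 2."""
    z = z_value(x, y)
    z1, z2 = z_value(*MEAN_GOOD), z_value(*MEAN_BAD)
    return 1 if abs(z - z1) <= abs(z - z2) else 2


print(classify(1.2, 10), classify(0.6, 4))  # prints 1 2
```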
Multiple Discriminant Analysis
• When you have to discriminate between MORE than two groups
• More than one [as many as min(g-1, p)] discriminant functions may be used (g = number of groups, p = number of predictors)
Canonical Correlation in Discriminant analysis
Predictors (X1, X2, ..., Xp) and indicators of group membership (U1, U2, ..., Ug-1)
• Find the best linear combination of the predictors that predicts group membership.
• Then find the best linear combination among all those that are independent of (uncorrelated with) the first, and so on.
Correlation Multiple correlation

 canonical correlation
Correlation: between two variables. What is 2

R: between Y and (X1,X2…Xp) What is R2
CC: between (Y1,Y2,…Yq) and (X1,X2…Xp) What is CC2
Example: Multiple discriminant analysis
Households that visited the resort

Family  Travel    Importance of    HH    Age of   Amount spent
income  attitude  family vacation  size  HH head  on vacation
50.2    5         8                3     43       2
70.3    6         7                4     61       3
62.9    7         5                6     52       3
48.5    7         5                5     36       1
52.7    6         6                4     55       3
75      8         7                5     68       3
46.2    5         3                3     62       2
57      2         4                6     51       2
64.1    7         5                4     57       3
68.1    7         6                5     45       3
73.4    6         7                5     44       3
71.9    5         8                4     64       3
56.2    1         8                6     54       2
49.3    4         2                3     56       3
62      5         6                2     58       3

Resort visit 1: visited the resort
Amount spent on vacation: 1 (Low), 2 (Medium), 3 (High)
Data for those who did not visit the resort

Resort  Family  Travel    Importance of    HH    Age of   Amount spent
visit   income  attitude  family vacation  size  HH head  on vacation
2       32.1    5         4                3     58       1
2       36.2    4         3                2     55       1
2       43.2    2         5                2     57       2
2       50.4    5         2                4     37       2
2       44.1    6         6                3     42       2
2       38.3    6         6                2     45       1
2       55      1         2                2     57       2
2       46.1    3         5                3     51       1
2       35      6         4                5     64       1
2       37.3    2         7                4     54       1
2       41.8    5         1                3     56       2
2       57      8         3                2     36       2
2       33.4    6         8                2     50       1
2       37.5    3         2                3     48       1

Resort visit 2: did not visit the resort
Amount spent on vacation: 1 (Low), 2 (Medium), 3 (High)
Objective
• Predict/explain the different categories of amount spent on the basis of:
  • Annual family income
  • Attitude towards travel
  • Importance given to family vacation
  • Household size
  • Age of the head of the household
• Which of the above variables are 'good' discriminators?
• Predict the expense category of families for which only the predictor variables are available
Group Statistics
Amount spent  Variable                      Mean    Std. Deviation
on vacation
1             family income                 38.57   5.30
              importance family vacation    4.70    1.89
              travel attitude               4.50    1.72
              household size                3.10    1.20
              age of household head         50.30   8.10
Total         family income                 51.22   12.80
Within-Group Correlation Matrix
                            Income  Travel    Imp. fam.  HH size  Age of
                                    attitude  vacation            HH head
family income               1.00    0.05      0.31       0.38     -0.21
travel attitude             0.05    1.00      0.04       0.00     -0.34
importance family vacation  0.31    0.04      1.00       0.22     -0.01
household size              0.38    0.00      0.22       1.00     -0.03
age of household head       -0.21   -0.34     -0.01      -0.03    1.00
Discrimination power of variables individually (amount spent)
Tests of Equality of Group Means
Variable                     Wilks' Lambda   F       Sig.
family income                0.26            38.00   0.0000
importance family vacation   0.88            1.83    0.1797
travel attitude              0.79            3.63    0.0400
household size               0.87            1.94    0.1626
age of household head        0.88            1.80    0.1840

Wilks' Lambda = Within-group SS / Total SS
Good discrimination between groups ⇒ small Lambda
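For a single variable, Wilks' Lambda and the F statistic in the table are linked by F = [(1 - Λ)/Λ]·[(n - g)/(g - 1)], since (1 - Λ)/Λ = SSB/SSW. A sketch: the first part uses the current ratios of the 20 accounts from the earlier illustration (g = 2); the second assumes n = 30 households in g = 3 groups for the resort example (10 per group, as in the classification matrix). Because the tabled Λ values are rounded to two decimals, the recomputed F (about 38.4 for family income) differs slightly from the reported 38.00.

```python
def wilks_lambda_univariate(groups):
    """Within-group SS divided by total SS for one predictor measured in several groups."""
    values = [v for group in groups for v in group]
    grand = sum(values) / len(values)
    sst = sum((v - grand) ** 2 for v in values)
    ssw = 0.0
    for group in groups:
        m = sum(group) / len(group)
        ssw += sum((v - m) ** 2 for v in group)
    return ssw / sst


def f_from_lambda(lam, n, g):
    """Univariate F = (SSB/(g-1)) / (SSW/(n-g)), written in terms of Lambda."""
    return (1 - lam) / lam * (n - g) / (g - 1)


# Current ratio of the good (1-10) and bad (11-20) accounts
good_cr = [1.10, 1.50, 1.20, 0.90, 1.60, 2.20, 0.90, 1.00, 1.30, 1.30]
bad_cr = [0.70, 0.90, 0.80, 1.30, 1.10, 0.50, 0.30, 1.40, 0.90, 1.10]
lam_cr = wilks_lambda_univariate([good_cr, bad_cr])
print(round(lam_cr, 4), round(f_from_lambda(lam_cr, n=20, g=2), 2))

# Resort example, assuming n = 30 households in g = 3 spending groups
print(round(f_from_lambda(0.26, n=30, g=3), 2))
```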

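Results from MDA
Bartlett's chi-square test of the discriminant functions remaining after the first k have been removed:
$$\chi^2 = -\left[n - 1 - \frac{p+g}{2}\right]\ln \Lambda_k, \qquad \Lambda_k = \prod_{i=k+1}^{q} \frac{1}{1+\lambda_i}, \qquad df = (p-k)(g-k-1)$$
where $\lambda_i$ are the eigenvalues of $W^{-1}B$ (W and B are the within-group and between-group sums-of-squares-and-cross-products matrices) and q is the number of discriminant functions.

k  Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
0  1 through 2          0.166          44.831      10  0.0000
1  2                    0.802          5.517       4   0.2383

Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         3.82        93.93          93.93         0.89
2         0.25        6.07           100.00        0.44
a. First 2 canonical discriminant functions were used in the analysis.

$$\lambda_i = \frac{SS_B}{SS_W}, \qquad CC_i = \sqrt{\frac{\lambda_i}{1+\lambda_i}}$$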
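All of the quantities on the previous slide follow from the two eigenvalues. A sketch assuming n = 30, p = 5 predictors and g = 3 groups (consistent with df = 10 for the first test). The p-value uses the closed-form chi-square tail probability, which holds only for even degrees of freedom, so the helper refuses odd df. With the rounded eigenvalues 3.82 and 0.25 the results match the table only approximately.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def mda_summary(eigenvalues, n, p, g):
    """Bartlett tests, canonical correlations and % of variance from the eigenvalues of W^-1 B."""
    tests = []
    for k in range(len(eigenvalues)):
        wilks = math.prod(1 / (1 + lam) for lam in eigenvalues[k:])
        chi2 = -(n - 1 - (p + g) / 2) * math.log(wilks)
        df = (p - k) * (g - k - 1)
        tests.append((k, wilks, chi2, df, chi2_upper_tail_even_df(chi2, df)))
    total = sum(eigenvalues)
    canonical = [math.sqrt(lam / (1 + lam)) for lam in eigenvalues]
    pct = [100 * lam / total for lam in eigenvalues]
    return tests, canonical, pct


# Assumed: n = 30 households, p = 5 predictors, g = 3 spending groups
tests, canonical, pct = mda_summary([3.82, 0.25], n=30, p=5, g=3)
for k, wilks, chi2, df, sig in tests:
    print(f"k={k}: Wilks={wilks:.3f} chi2={chi2:.3f} df={df} sig={sig:.4f}")
print([round(c, 2) for c in canonical], [round(v, 2) for v in pct])
```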
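Significance of discriminant functions: justification through Λ and CC
$$\Lambda = \frac{SS_W}{SS_T}$$
$$\begin{array}{ccccc}
\text{Discrim. fn.} & \text{Eigenvalue} & CC & \text{Prop. explained} & \text{Prop. unexplained} \\
1 & \lambda_1 & \sqrt{\lambda_1/(1+\lambda_1)} & \lambda_1/(1+\lambda_1) & 1/(1+\lambda_1) \\
2 & \lambda_2 & \sqrt{\lambda_2/(1+\lambda_2)} & \lambda_2/(1+\lambda_2) & 1/(1+\lambda_2)
\end{array}$$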
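Since each unexplained proportion is 1/(1 + λi) = 1 - CCi², Wilks' Lambda can be checked by hand from either column. A worked check using the rounded values from the eigenvalue table:

$$\Lambda = \prod_i \frac{1}{1+\lambda_i} = \frac{1}{1+3.82} \times \frac{1}{1+0.25} = 0.2075 \times 0.8000 \approx 0.166$$
$$\Lambda = \prod_i \left(1 - CC_i^2\right) = (1 - 0.89^2)(1 - 0.44^2) = 0.2079 \times 0.8064 \approx 0.168$$

Both agree with the reported 0.166 up to rounding of the inputs.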
Un-standardized Discriminant Function Coefficients
                              Function 1   Function 2
family income                 0.1543       -0.0620
importance family vacation    -0.0695      0.2613
household size                -0.1265      0.1003
(Constant)                    -11.0944     -3.7916
Use these coefficients to classify future observations.
Standardized Discriminant Function Coefficients
                              Function 1   Function 2
family income                 1.0474       -0.4208
importance family vacation    -0.1420      0.5335
household size                -0.1632      0.1293
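Standardized coefficients are conventionally the un-standardized ones multiplied by the pooled within-group standard deviation of each predictor (this is how SPSS reports them, for example). If that convention holds here, the ratio standardized/un-standardized should be the same for Function 1 and Function 2. A quick sketch checking this with the two tables; the implied standard deviations are inferred, not reported on the slides.

```python
# Coefficients copied from the two tables: (Function 1, Function 2)
UNSTD = {
    "family income": (0.1543, -0.0620),
    "importance family vacation": (-0.0695, 0.2613),
    "household size": (-0.1265, 0.1003),
}
STD = {
    "family income": (1.0474, -0.4208),
    "importance family vacation": (-0.1420, 0.5335),
    "household size": (-0.1632, 0.1293),
}


def implied_pooled_sd(unstd, std):
    """Ratio standardized / un-standardized for each function (should agree across functions)."""
    out = {}
    for var, (u1, u2) in unstd.items():
        s1, s2 = std[var]
        out[var] = (s1 / u1, s2 / u2)
    return out


for var, (r1, r2) in implied_pooled_sd(UNSTD, STD).items():
    print(f"{var}: {r1:.3f} vs {r2:.3f}")
```

The standardized coefficients are the ones to compare for relative importance; the un-standardized ones (with the constant) are the ones applied to raw data.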
Structure Matrix: Discriminant Loadings

Correlation between discriminant functions
and predictor variables
Functions at Group Centroids
Amount spent on vacation   Function 1   Function 2
1 (Low)                    -2.0410      0.4185
2 (Medium)                 -0.4048      -0.6587
3 (High)                   2.4458       0.2402

Function 1 separates group 1 (Low) from group 3 (High): income.
Function 2 separates group 1 from group 2: travel attitude, importance of family vacation and age.
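A new case can be placed on the territorial map by computing its two discriminant scores and assigning it to the nearest group centroid. A sketch using the centroids above; it assumes equal prior probabilities and relies on the canonical functions being uncorrelated with unit variance within groups, so that Euclidean distance in this space is appropriate. The score pairs used below are hypothetical.

```python
import math

# Group centroids (Function 1, Function 2) from the table above
CENTROIDS = {1: (-2.0410, 0.4185), 2: (-0.4048, -0.6587), 3: (2.4458, 0.2402)}


def nearest_group(d1, d2, centroids=CENTROIDS):
    """Group whose centroid is closest (Euclidean) to the discriminant scores (d1, d2)."""
    return min(centroids, key=lambda grp: math.dist((d1, d2), centroids[grp]))


print(nearest_group(1.5, 0.0), nearest_group(-1.0, -0.5), nearest_group(-2.5, 1.0))
```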
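library(candisc)   # candisc() and the plot() method for "candisc" objects
library(heplots)   # heplot()
# travel.can1 is assumed to be a "candisc" object fitted earlier with candisc()
plot(travel.can1)                            # plot of the canonical discriminant scores
heplot(travel.can1, scale = 6, fill = TRUE)  # HE plot in canonical space; scale rescales variable vectors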
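Territorial Map
[Figure: territorial map in the plane of the two discriminant functions, showing the regions assigned to groups 1, 2 and 3 with the group centroids marked *; prepared by S. Das]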
Hold-out sample

Resort  Family  Travel    Importance of    HH    Age of   Amount spent
visit   income  attitude  family vacation  size  HH head  on vacation
1       50.8    4         7                3     45       2
1       63.6    7         4                7     55       3
1       54.0    6         7                4     58       2
1       45.0    5         4                3     60       2
1       68.0    6         6                6     46       3
1       62.1    5         6                3     56       3
2       35.0    4         3                4     54       1
2       49.6    5         3                5     39       1
2       39.4    6         5                3     44       3
2       37.0    2         6                5     51       1
2       54.5    7         3                3     37       2
2       38.2    2         2                3     49       1

A random part of the data set aside for validation.

Estimating Misclassification Probabilities / Classification Matrix
• Default (re-substitution): use estimates from the entire data to predict the classification
• Performance in a hold-out sample
• Principle of cross-validation: while classifying a specific case, use all observations except that one to estimate the function
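The difference between re-substitution and leave-one-out estimates can be sketched on the two-group account data from earlier in this session. This is an illustrative implementation, not the procedure of any particular package: it fits Fisher coefficients from the pooled within-group covariance, uses the midpoint of the two projected group means as the cutoff (equal priors), and refits once per held-out account.

```python
# Account data from the numerical illustration: group 1 = good (accounts 1-10), group 2 = bad (11-20)
GOOD = [(1.10, 13), (1.50, 15), (1.20, 17), (0.90, 21), (1.60, 7),
        (2.20, 8), (0.90, 16), (1.00, 13), (1.30, 8), (1.30, 3)]
BAD = [(0.70, 11), (0.90, -4), (0.80, 6), (1.30, 2), (1.10, 6),
       (0.50, 8), (0.30, 8), (1.40, 6), (0.90, 3), (1.10, 14)]


def mean(points):
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def fit(good, bad):
    """Fisher coefficients from the pooled within-group covariance plus a midpoint cutoff."""
    m1, m2 = mean(good), mean(bad)
    sxx = syy = sxy = 0.0
    for points, (mx, my) in ((good, m1), (bad, m2)):
        for x, y in points:
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
            sxy += (x - mx) * (y - my)
    dof = len(good) + len(bad) - 2
    sxx, syy, sxy = sxx / dof, syy / dof, sxy / dof
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    det = sxx * syy - sxy ** 2
    a = (syy * dx - sxy * dy) / det
    b = (sxx * dy - sxy * dx) / det
    cutoff = (a * (m1[0] + m2[0]) + b * (m1[1] + m2[1])) / 2  # midpoint of Z1bar and Z2bar
    return a, b, cutoff


def predict(model, x, y):
    a, b, cutoff = model
    return 1 if a * x + b * y >= cutoff else 2


def resubstitution_hit_ratio(good, bad):
    model = fit(good, bad)  # one fit on all the data, then score the same data
    hits = sum(predict(model, x, y) == 1 for x, y in good)
    hits += sum(predict(model, x, y) == 2 for x, y in bad)
    return hits / (len(good) + len(bad))


def loo_hit_ratio(good, bad):
    hits = 0
    for i, (x, y) in enumerate(good):  # refit without the case being classified
        hits += predict(fit(good[:i] + good[i + 1:], bad), x, y) == 1
    for i, (x, y) in enumerate(bad):
        hits += predict(fit(good, bad[:i] + bad[i + 1:]), x, y) == 2
    return hits / (len(good) + len(bad))


print(resubstitution_hit_ratio(GOOD, BAD), loo_hit_ratio(GOOD, BAD))
```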
Classification matrix: Hit-Ratio
Can the correct classifications be attributed to chance?
Hit ratio: proportion of correct classifications

Analysis sample (Hit Ratio = 86.67%)
Actual    Predicted group membership
group     1     2     3
1         9     1     0
2         1     9     0
3         0     2     8

Hold-out sample (Hit Ratio = 75%)
Actual    Predicted group membership
group     1     2     3
1         3     1     0
2         0     3     1
3         1     0     3
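One way to ask whether the correct classifications could be due to chance is to compare the hit ratio with a chance benchmark and to compute Press's Q = (N - nK)² / (N(K - 1)), where N is the number of cases, n the number correctly classified and K the number of groups; Q is compared with the chi-square critical value for 1 df (3.84 at the 5% level). A sketch using the analysis-sample matrix; for the hold-out sample it assumes the 12 cases listed earlier with 75% (9 cases) classified correctly.

```python
# Analysis-sample classification matrix: rows = actual group 1-3, columns = predicted group 1-3
ANALYSIS_SAMPLE = [
    [9, 1, 0],
    [1, 9, 0],
    [0, 2, 8],
]


def hit_ratio(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def proportional_chance(matrix):
    """Sum of squared actual-group proportions: expected hit ratio of a chance assignment."""
    total = sum(map(sum, matrix))
    return sum((sum(row) / total) ** 2 for row in matrix)


def press_q(n, correct, k):
    """Press's Q statistic; values above 3.84 reject 'no better than chance' at the 5% level."""
    return (n - correct * k) ** 2 / (n * (k - 1))


print(round(hit_ratio(ANALYSIS_SAMPLE), 4), round(proportional_chance(ANALYSIS_SAMPLE), 4))
print(press_q(30, 26, 3), press_q(12, 9, 3))  # analysis and hold-out samples
```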
https://www.rdocumentation.org/packages/candisc/versions/0.8-6/topics/candisc
Linear Discriminant Analysis (LDA) using R

Peter Nistrup
Data
The dataset ‘Breast Cancer Wisconsin (Diagnostic) Data Set’ was used for the
analysis.
There are 569 observations on 32 variables.
Features are computed from a digitized image of a fine needle aspirate (FNA) of a
breast mass. They describe characteristics of the cell nuclei present in the image.
The diagnostic classification of the breast mass is given as either Benign or
Malignant.
This dataset is suitable for understanding how characteristics of the FNA image of the breast mass relate to the diagnosis of the mass as benign or malignant.
Data
ID number
Diagnosis (M = malignant, B = benign)
Ten real-valued features are computed for each cell nucleus. The mean, standard error and 'worst' value of each feature are given, bringing the number of features to 30:
radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter^2 / area - 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension ("coastline approximation" - 1)
Why LDA?
One of the objectives is to understand which qualities of a tumor contribute to whether or not it is malignant.
The PC1-PC2 plot shows a clear separation between the two categories.
Breast Cancer Diagnostic (Wisconsin) Data
• Is LDA after PCA better than LDA on the raw features?
• ROC (receiver operating characteristic) curve and AUC (area under the curve)
• Try with different random seeds
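The breast cancer data are not reproduced here, so the AUC idea is sketched on the two-group account example instead. The AUC equals the probability that a randomly chosen good account scores higher than a randomly chosen bad one (the Mann-Whitney form), which the sketch counts over all pairs. The scores use the rounded coefficients a = 2.539, b = 0.185 from the numerical illustration; no train/test split is made, so this is an in-sample figure.

```python
# Account data from the numerical illustration: (current ratio, return on investment)
GOOD = [(1.10, 13), (1.50, 15), (1.20, 17), (0.90, 21), (1.60, 7),
        (2.20, 8), (0.90, 16), (1.00, 13), (1.30, 8), (1.30, 3)]
BAD = [(0.70, 11), (0.90, -4), (0.80, 6), (1.30, 2), (1.10, 6),
       (0.50, 8), (0.30, 8), (1.40, 6), (0.90, 3), (1.10, 14)]
A, B = 2.539, 0.185  # rounded discriminant coefficients from the illustration


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    wins = 0.0
    for s in pos_scores:
        for t in neg_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


good_scores = [A * x + B * y for x, y in GOOD]
bad_scores = [A * x + B * y for x, y in BAD]
print(auc(good_scores, bad_scores))
```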
Why LDA?
LDA looks for the linear decision boundary at which classification is most successful.
For example, with only two dimensions and two distinct clusters, LDA projects the clusters onto a single dimension (a line) along which they are best separated.