
SMA 2437 MULTIVARIATE METHODS
DISCRIMINANT ANALYSIS
Lecture 6 and 7
01/03/2023

Discriminant Analysis


NOTE: We use the term group to represent either a population or a sample from the population.

The objective is the description of group separation, in which linear functions of the variables (discriminant functions) are used to describe or elucidate the differences between two or more groups. The goal of descriptive discriminant analysis is therefore to identify the relative contribution of the p variables to the separation of the groups.


Discriminant Analysis


Therefore we may state that: discriminant functions are linear combinations of variables that best separate groups.

THE DISCRIMINANT FUNCTION FOR TWO GROUPS

We assume that the two populations to be compared have the same covariance matrix but distinct mean vectors. We work with samples $\mathbf{y}_{11}, \mathbf{y}_{12}, \ldots, \mathbf{y}_{1n_1}$ and $\mathbf{y}_{21}, \mathbf{y}_{22}, \ldots, \mathbf{y}_{2n_2}$ from the two populations. As usual, each vector $\mathbf{y}_{ij}$ consists of measurements on p variables.


Discriminant Analysis

The discriminant function is the linear combination of these p variables that maximizes the distance between the two (transformed) group mean vectors.
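Written out, the standard two-group discriminant function (the form assumed in the calculations that follow) is

$$z = \mathbf{a}'\mathbf{y}, \qquad \mathbf{a} = S_{\mathrm{pl}}^{-1}\left(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2\right),$$

where $S_{\mathrm{pl}}$ is the pooled sample covariance matrix of the two samples; this choice of $\mathbf{a}$ maximizes the standardized separation $(\bar{z}_1 - \bar{z}_2)^2 / s_z^2$ of the projected group means.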



Discriminant Analysis


Thus the maximizing vector $\mathbf{a}$ is not unique. However, its "direction" is unique; that is, the relative values or ratios of $a_1, a_2, \ldots, a_p$ are unique, and $z = \mathbf{a}'\mathbf{y}$ projects points $\mathbf{y}$ onto the line on which the separation $(\bar{z}_1 - \bar{z}_2)^2 / s_z^2$ is maximized.


Discriminant Analysis


Example 8.2. Samples of steel produced at two different rolling temperatures are compared in Table 8.1 (Kramer and Jensen 1969a). The variables are y1 = yield point and y2 = ultimate strength.


Discriminant Analysis


From the data, we calculate the two sample mean vectors $\bar{\mathbf{y}}_1$ and $\bar{\mathbf{y}}_2$ and the pooled covariance matrix $S_{\mathrm{pl}}$.

We see that if the points were projected on either the $y_1$ or the $y_2$ axis, there would be considerable overlap. In fact, when the two groups are compared by means of a t-statistic for each variable separately, both t's are nonsignificant.
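A sketch of such variable-by-variable t-tests (the two samples here are made up for illustration; Table 8.1 itself is not reproduced in these notes):

```python
import numpy as np
from scipy import stats

# Made-up two-group samples: rows = observations, columns = the p = 2 variables.
y1 = np.array([[33., 60.], [36., 61.], [35., 64.], [38., 63.]])
y2 = np.array([[35., 57.], [36., 59.], [38., 59.], [39., 61.]])

# Univariate two-sample t-test (pooled variance) on each variable separately
for j in range(y1.shape[1]):
    t, p = stats.ttest_ind(y1[:, j], y2[:, j])
    print(f"y{j + 1}: t = {t:.3f}, p = {p:.3f}")
```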

Discriminant Analysis

However, it is clear in Figure 8.3 that the two groups can be separated. If they are projected in an appropriate direction, as in Figure 8.1, there will be no overlap. The single dimension onto which the points would be projected is the discriminant function.

Discriminant Analysis

The values of the projected points are found by calculating z for each observation vector y in the two groups. The results are given in Table 8.2, where the separation provided by the discriminant function is clearly evident.
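A minimal NumPy sketch of the whole computation, reusing the made-up samples above (again, not the Table 8.1 data):

```python
import numpy as np

# Made-up two-group samples (rows = observations, p = 2 variables)
y1 = np.array([[33., 60.], [36., 61.], [35., 64.], [38., 63.]])
y2 = np.array([[35., 57.], [36., 59.], [38., 59.], [39., 61.]])
n1, n2 = len(y1), len(y2)

# Pooled sample covariance matrix S_pl = ((n1-1)S1 + (n2-1)S2) / (n1+n2-2)
S_pl = ((n1 - 1) * np.cov(y1, rowvar=False)
        + (n2 - 1) * np.cov(y2, rowvar=False)) / (n1 + n2 - 2)

# Discriminant coefficient vector a = S_pl^{-1} (ybar1 - ybar2)
a = np.linalg.solve(S_pl, y1.mean(axis=0) - y2.mean(axis=0))

# Projected points z = a'y for every observation in each group
print("z, group 1:", y1 @ a)
print("z, group 2:", y2 @ a)
```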

PRINCIPAL COMPONENT ANALYSIS

PCA is a statistical technique used to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables; i.e., it is used to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. We typically have a data matrix of n observations on p correlated variables $x_1, x_2, \ldots, x_p$. PCA looks for a transformation of the $x_i$ into p new variables $y_i$ that are uncorrelated.

The number of variables is reduced while retaining as much of the variability of the original variables as possible. The new variables, known as principal components, are linear combinations of the original variables and are uncorrelated. It is preferable if PCA is applied to data that have approximately the same scale in each original variable.
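As a minimal sketch of this idea (toy data; the mixing matrix is arbitrary), the components obtained from the eigenvectors of the sample covariance matrix are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.4],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(100, 3)) @ mix      # toy data with correlated columns

Xc = X - X.mean(axis=0)                  # center each variable
S = np.cov(Xc, rowvar=False)             # sample covariance matrix
lam, E = np.linalg.eigh(S)               # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]           # reorder: largest variance first

Y = Xc @ E                               # principal component scores
print(np.round(np.cov(Y, rowvar=False), 6))  # ~diagonal: the new variables are uncorrelated
print(np.round(lam, 6))                      # the diagonal entries are the eigenvalues
```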


Suppose we have a population measured on p random variables $X_1, \ldots, X_p$. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability. This is accomplished by rotating the axes.

Consider our random vector

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.


We can construct p linear combinations

$$\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
Y_2 &= \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\
&\;\;\vdots \\
Y_p &= \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p
\end{aligned}$$

It is easy to show that

$$\operatorname{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \quad i = 1, \ldots, p$$

$$\operatorname{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The principal components are those uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible. Thus the first principal component is the linear combination of maximum variance; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_1} \; \mathbf{a}_1'\Sigma\mathbf{a}_1 \quad \text{(the objective is the source of the nonlinearity)}$$

$$\text{s.t. } \mathbf{a}_1'\mathbf{a}_1 = 1 \quad \text{(restrict to coefficient vectors of unit length)}$$
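A quick numerical check of the claim that no unit-length coefficient vector beats the leading eigenvector (the covariance matrix below is an arbitrary made-up example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T                        # an arbitrary (positive semidefinite) covariance

lam, E = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
lam1, e1 = lam[-1], E[:, -1]           # largest eigenvalue and its eigenvector

# Compare a'Σa over many random unit vectors with the eigenvector's value
a = rng.normal(size=(3, 10000))
a /= np.linalg.norm(a, axis=0)         # normalize each column to unit length
vals = np.einsum('ij,ik,kj->j', a, Sigma, a)   # a_j' Σ a_j for every column j

print(vals.max() <= lam1 + 1e-9)       # True: no unit vector exceeds λ1
print(np.isclose(e1 @ Sigma @ e1, lam1))  # True: e1 attains the maximum, λ1
```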


The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_2} \; \mathbf{a}_2'\Sigma\mathbf{a}_2$$

$$\text{s.t. } \mathbf{a}_2'\mathbf{a}_2 = 1, \qquad \mathbf{a}_1'\Sigma\mathbf{a}_2 = 0 \quad \text{(restricts the covariance to zero)}$$

The third principal component is the solution to the nonlinear optimization problem

$$\max_{\mathbf{a}_3} \; \mathbf{a}_3'\Sigma\mathbf{a}_3$$

$$\text{s.t. } \mathbf{a}_3'\mathbf{a}_3 = 1, \qquad \mathbf{a}_1'\Sigma\mathbf{a}_3 = 0, \quad \mathbf{a}_2'\Sigma\mathbf{a}_3 = 0 \quad \text{(restricts the covariances to zero)}$$


Generally, the ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_i} \; \mathbf{a}_i'\Sigma\mathbf{a}_i$$

$$\text{s.t. } \mathbf{a}_i'\mathbf{a}_i = 1, \qquad \mathbf{a}_k'\Sigma\mathbf{a}_i = 0 \;\; \text{for } k < i$$

We can show that, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p$$

where $\mathbf{e}_i$ is the eigenvector associated with $\lambda_i$.


We can also show, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, that

$$\sigma_{11} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \operatorname{Var}(X_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p} \operatorname{Var}(Y_i)$$

We can assess how well a subset of the principal components $Y_i$ summarizes the original random variables $X_i$. One common method of doing so is the proportion of total population variance due to the kth principal component,

$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}$$
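In code this decomposition is immediate; a sketch using the eigenvalues computed in the worked example below:

```python
import numpy as np

lam = np.array([13.219396, 3.3793317, 0.4012723])  # eigenvalues, largest first
prop = lam / lam.sum()               # proportion of total variance per component
print(np.round(prop, 4))             # [0.7776 0.1988 0.0236]
print(np.round(np.cumsum(prop), 4))  # cumulative: [0.7776 0.9764 1.    ]
```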


If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information.

We can also easily find the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$$

These values are often used in interpreting the principal components $Y_i$.


Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0    6.0    9.0
4.0   12.0   10.0
3.0   12.0   15.0
4.0   10.0   12.0

Find the three population principal components Y1, Y2, and Y3.

First we need the covariance matrix $\Sigma$:

$$\Sigma = \begin{pmatrix} 2.00 & 3.33 & 1.33 \\ 3.33 & 8.00 & 4.67 \\ 1.33 & 4.67 & 7.00 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$\lambda_1 = 13.219396, \quad \mathbf{e}_1 = \begin{pmatrix} 0.2910381 \\ 0.7342493 \\ 0.6133309 \end{pmatrix}$$

$$\lambda_2 = 3.3793317, \quad \mathbf{e}_2 = \begin{pmatrix} -0.4150386 \\ -0.4807165 \\ 0.7724340 \end{pmatrix}$$

$$\lambda_3 = 0.4012723, \quad \mathbf{e}_3 = \begin{pmatrix} 0.8619976 \\ -0.4793640 \\ 0.1648350 \end{pmatrix}$$

so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{X} = -0.4150386X_1 - 0.4807165X_2 + 0.7724340X_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$

and

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 2.0 + 8.0 + 7.0 = 17.0 = 13.219396 + 3.3793317 + 0.4012723 = \lambda_1 + \lambda_2 + \lambda_3$$
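These eigenpairs are easy to verify numerically (a sketch; note that np.cov's default n − 1 divisor reproduces the matrix $\Sigma$ shown above, and that eigenvectors are determined only up to sign):

```python
import numpy as np

X = np.array([[1., 6., 9.],
              [4., 12., 10.],
              [3., 12., 15.],
              [4., 10., 12.]])

S = np.cov(X, rowvar=False)       # divisor n-1 = 3, matching the entries above
lam, E = np.linalg.eigh(S)        # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]    # largest eigenvalue first

print(np.round(S, 2))             # the covariance matrix shown above
print(np.round(lam, 7))           # approximately [13.219396, 3.3793317, 0.4012723]
print(np.round(E, 7))             # columns are e1, e2, e3 (possibly sign-flipped)
```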


and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum_{i=1}^{3}\lambda_i} = \frac{13.219396}{17.0} = 0.777611529 \qquad
\frac{\lambda_2}{\sum_{i=1}^{3}\lambda_i} = \frac{3.3793317}{17.0} = 0.198784220 \qquad
\frac{\lambda_3}{\sum_{i=1}^{3}\lambda_i} = \frac{0.4012723}{17.0} = 0.023604251$$

Next we obtain the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_1,X_1} = \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{0.2910381\sqrt{13.219396}}{\sqrt{2.00}} = 0.7482$$

$$\rho_{Y_1,X_2} = \frac{e_{12}\sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{0.7342493\sqrt{13.219396}}{\sqrt{8.00}} = 0.9439$$

$$\rho_{Y_1,X_3} = \frac{e_{13}\sqrt{\lambda_1}}{\sqrt{\sigma_{33}}} = \frac{0.6133309\sqrt{13.219396}}{\sqrt{7.00}} = 0.8429$$


$$\rho_{Y_2,X_1} = \frac{e_{21}\sqrt{\lambda_2}}{\sqrt{\sigma_{11}}} = \frac{-0.4150386\sqrt{3.3793317}}{\sqrt{2.00}} = -0.5395$$

$$\rho_{Y_2,X_2} = \frac{e_{22}\sqrt{\lambda_2}}{\sqrt{\sigma_{22}}} = \frac{-0.4807165\sqrt{3.3793317}}{\sqrt{8.00}} = -0.3124$$

$$\rho_{Y_2,X_3} = \frac{e_{23}\sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{0.7724340\sqrt{3.3793317}}{\sqrt{7.00}} = 0.5367$$

$$\rho_{Y_3,X_1} = \frac{e_{31}\sqrt{\lambda_3}}{\sqrt{\sigma_{11}}} = \frac{0.8619976\sqrt{0.4012723}}{\sqrt{2.00}} = 0.3861$$

$$\rho_{Y_3,X_2} = \frac{e_{32}\sqrt{\lambda_3}}{\sqrt{\sigma_{22}}} = \frac{-0.4793640\sqrt{0.4012723}}{\sqrt{8.00}} = -0.1074$$

$$\rho_{Y_3,X_3} = \frac{e_{33}\sqrt{\lambda_3}}{\sqrt{\sigma_{33}}} = \frac{0.1648350\sqrt{0.4012723}}{\sqrt{7.00}} = 0.0395$$
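All nine correlations can be computed at once (a sketch; rows are the components Y1-Y3, columns the variables X1-X3, up to the arbitrary sign of each eigenvector):

```python
import numpy as np

X = np.array([[1., 6., 9.], [4., 12., 10.], [3., 12., 15.], [4., 10., 12.]])
S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]            # largest eigenvalue first

# rho[i, k] = e_{ik} * sqrt(lam_i) / sqrt(sigma_kk)
rho = (E * np.sqrt(lam)).T / np.sqrt(np.diag(S))
print(np.round(rho, 4))                   # approx. [[0.7482 0.9439 0.8429] ...]
```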


We can display these results in a correlation matrix:

        X1       X2       X3
Y1   0.7482   0.9439   0.8429
Y2  -0.5395  -0.3124   0.5367
Y3   0.3861  -0.1074   0.0395

- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a residual of X1

We could standardize the variables X1, X2, and X3 and then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:

$$\rho = \begin{pmatrix} 1.000 & 0.833 & 0.356 \\ 0.833 & 1.000 & 0.624 \\ 0.356 & 0.624 & 1.000 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:



$$\lambda_1 = 2.2149347, \quad \mathbf{e}_1 = \begin{pmatrix} 0.5843738 \\ 0.6345775 \\ 0.5057853 \end{pmatrix}$$

$$\lambda_2 = 0.6226418, \quad \mathbf{e}_2 = \begin{pmatrix} -0.5449250 \\ -0.1549791 \\ 0.8240377 \end{pmatrix}$$

$$\lambda_3 = 0.1624235, \quad \mathbf{e}_3 = \begin{pmatrix} 0.6013018 \\ -0.7571610 \\ 0.2552315 \end{pmatrix}$$

so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{Z} = 0.5843738Z_1 + 0.6345775Z_2 + 0.5057853Z_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{Z} = -0.5449250Z_1 - 0.1549791Z_2 + 0.8240377Z_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{Z} = 0.6013018Z_1 - 0.7571610Z_2 + 0.2552315Z_3$$

Note that these results differ from the covariance-based principal components!

and

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.0 + 1.0 + 1.0 = 3.0 = 2.2149347 + 0.6226418 + 0.1624235 = \lambda_1 + \lambda_2 + \lambda_3$$


and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum_{i=1}^{3}\lambda_i} = \frac{2.2149347}{3.0} = 0.738311567 \qquad
\frac{\lambda_2}{\sum_{i=1}^{3}\lambda_i} = \frac{0.6226418}{3.0} = 0.207547267 \qquad
\frac{\lambda_3}{\sum_{i=1}^{3}\lambda_i} = \frac{0.1624235}{3.0} = 0.054141167$$

Next we obtain the correlations between the standardized variables $Z_k$ and the principal components $Y_i$; since $\sigma_{kk} = 1$ for standardized variables, $\rho_{Y_i,Z_k} = e_{ik}\sqrt{\lambda_i}$:

$$\rho_{Y_1,Z_1} = 0.5843738\sqrt{2.2149347} = 0.869703464$$

$$\rho_{Y_1,Z_2} = 0.6345775\sqrt{2.2149347} = 0.944419907$$

$$\rho_{Y_1,Z_3} = 0.5057853\sqrt{2.2149347} = 0.752742749$$

$$\rho_{Y_2,Z_1} = -0.5449250\sqrt{0.6226418} = -0.429987538$$


$$\rho_{Y_2,Z_2} = -0.1549791\sqrt{0.6226418} = -0.122290294$$

$$\rho_{Y_2,Z_3} = 0.8240377\sqrt{0.6226418} = 0.650228824$$

$$\rho_{Y_3,Z_1} = 0.6013018\sqrt{0.1624235} = 0.242335443$$

$$\rho_{Y_3,Z_2} = -0.7571610\sqrt{0.1624235} = -0.305149504$$

$$\rho_{Y_3,Z_3} = 0.2552315\sqrt{0.1624235} = 0.102862886$$

We can display these results in a correlation matrix:

        Z1          Z2          Z3
Y1   0.8697035   0.9444199   0.7527427
Y2  -0.4299875  -0.1222903   0.6502288
Y3   0.2423354  -0.3051495   0.1028629

- the first principal component (Y1) is a mixture of all three random variables (Z1, Z2, and Z3)
- the second principal component (Y2) is a trade-off between Z1 and Z3
- the third principal component (Y3) is a trade-off between Z1 and Z2

• If you multiply one variable by a scalar, you get different results if using the covariance matrix.
• However, the correlation matrix is invariant to scale (see the sketch below).
• PCA should be applied on data that have approximately the same scale in each variable.

Interpretation of components:

Assess the weight of each of the original variables in the component; for example, suppose we have five original variables and the first principal component is

$$Y_1 = 0.897X_1 + 0.023X_2 - 0.768X_3 + 0.169X_4 - 0.324X_5$$

Then $X_1$ and $X_3$ have the highest weights and are the most important variables in the component.

Another way of assessing the relationship is through the correlation of the original variable $X_j$ and the component $Y_i$:

$$r_{ij} = \frac{a_{ij}\sqrt{\lambda_i}}{\sqrt{\sigma_{jj}}}$$
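A minimal sketch of the scale issue (toy data and scaling factors are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 0.1])  # very different scales

def top_direction(M):
    """Unit eigenvector belonging to the largest eigenvalue of M."""
    lam, E = np.linalg.eigh(M)     # eigenvalues in ascending order
    return E[:, -1]

# Covariance-based PCA changes when one variable is rescaled ...
print(top_direction(np.cov(X, rowvar=False)))
print(top_direction(np.cov(X * np.array([100.0, 1.0, 1.0]), rowvar=False)))

# ... while correlation-based PCA is unaffected (identical up to sign)
print(top_direction(np.corrcoef(X, rowvar=False)))
print(top_direction(np.corrcoef(X * np.array([100.0, 1.0, 1.0]), rowvar=False)))
```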

How many components to keep?

▪ Select components that have a cumulative variance > 70%.
▪ Keep components with eigenvalues > 1 (both rules are sketched in code below).
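A small helper applying both rules (a hypothetical convenience function, not part of the lecture; the 70% and eigenvalue-greater-than-one cutoffs are the ones quoted above):

```python
import numpy as np

def n_components_to_keep(eigenvalues, cum_var_threshold=0.70):
    """Return (k by cumulative-variance rule, k by eigenvalue > 1 rule)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cum = np.cumsum(lam) / lam.sum()
    k_variance = int(np.searchsorted(cum, cum_var_threshold)) + 1
    k_kaiser = int((lam > 1.0).sum())  # sensible for correlation-matrix PCA
    return k_variance, k_kaiser

# With the correlation-based eigenvalues of the example above:
print(n_components_to_keep([2.2149347, 0.6226418, 0.1624235]))  # (1, 1)
```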

Regression

The following data gives the measurement in centimeters of four components of a certain plant for a sample of size twelve.

END OF LECTURE 6
