
SMA 2437 MULTIVARIATE METHODS
DISCRIMINANT ANALYSIS
Lecture 6 and 7
01/03/2023

Discriminant Analysis


NOTE: We use the term group to represent either a population or a sample from the population.

The objective is the description of group separation, in which linear functions of the variables (discriminant functions) are used to describe or elucidate the differences between two or more groups. The goal of descriptive discriminant analysis is therefore to identify the relative contribution of the p variables to the separation of the groups.


Discriminant Analysis


Therefore we may state that: discriminant functions are linear combinations of variables that best separate groups.

THE DISCRIMINANT FUNCTION FOR TWO GROUPS

We assume that the two populations to be compared have the same covariance matrix but distinct mean vectors. We work with samples $\mathbf{y}_{11}, \mathbf{y}_{12}, \ldots, \mathbf{y}_{1n_1}$ and $\mathbf{y}_{21}, \mathbf{y}_{22}, \ldots, \mathbf{y}_{2n_2}$ from the two populations. As usual, each vector $\mathbf{y}_{ij}$ consists of measurements on p variables.


Discriminant Analysis

The discriminant function is the linear combination of these p variables that maximizes the distance between the two (transformed) group mean vectors.
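Written out, the standard two-group discriminant function (the form assumed in the calculations that follow) is

$$z = \mathbf{a}'\mathbf{y}, \qquad \mathbf{a} = S_{\mathrm{pl}}^{-1}\left(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2\right),$$

where $S_{\mathrm{pl}}$ is the pooled sample covariance matrix of the two samples; this choice of $\mathbf{a}$ maximizes the standardized separation $(\bar{z}_1 - \bar{z}_2)^2 / s_z^2$ of the projected group means.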



Discriminant Analysis


Thus the maximizing vector $\mathbf{a}$ is not unique. However, its "direction" is unique; that is, the relative values or ratios of $a_1, a_2, \ldots, a_p$ are unique, and $z = \mathbf{a}'\mathbf{y}$ projects points $\mathbf{y}$ onto the line on which the separation $(\bar{z}_1 - \bar{z}_2)^2 / s_z^2$ is maximized.


Discriminant Analysis


Example 8.2. Samples of steel produced at two different rolling temperatures are compared in Table 8.1 (Kramer and Jensen 1969a). The variables are y1 = yield point and y2 = ultimate strength.


Discriminant Analysis


From the data, we calculate the two sample mean vectors $\bar{\mathbf{y}}_1$ and $\bar{\mathbf{y}}_2$ and the pooled covariance matrix $S_{\mathrm{pl}}$.

We see that if the points were projected on either the $y_1$ or the $y_2$ axis, there would be considerable overlap. In fact, when the two groups are compared by means of a t-statistic for each variable separately, both t's are nonsignificant.
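A sketch of such variable-by-variable t-tests (the two samples here are made up for illustration; Table 8.1 itself is not reproduced in these notes):

```python
import numpy as np
from scipy import stats

# Made-up two-group samples: rows = observations, columns = the p = 2 variables.
y1 = np.array([[33., 60.], [36., 61.], [35., 64.], [38., 63.]])
y2 = np.array([[35., 57.], [36., 59.], [38., 59.], [39., 61.]])

# Univariate two-sample t-test (pooled variance) on each variable separately
for j in range(y1.shape[1]):
    t, p = stats.ttest_ind(y1[:, j], y2[:, j])
    print(f"y{j + 1}: t = {t:.3f}, p = {p:.3f}")
```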

Discriminant Analysis

However, it is clear in Figure 8.3 that the two groups can be separated. If they are projected in an appropriate direction, as in Figure 8.1, there will be no overlap. The single dimension onto which the points would be projected is the discriminant function.

Discriminant Analysis

The values of the projected points are found by calculating z for each observation vector y in the two groups. The results are given in Table 8.2, where the separation provided by the discriminant function is clearly evident.
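A minimal NumPy sketch of the whole computation, reusing the made-up samples above (again, not the Table 8.1 data):

```python
import numpy as np

# Made-up two-group samples (rows = observations, p = 2 variables)
y1 = np.array([[33., 60.], [36., 61.], [35., 64.], [38., 63.]])
y2 = np.array([[35., 57.], [36., 59.], [38., 59.], [39., 61.]])
n1, n2 = len(y1), len(y2)

# Pooled sample covariance matrix S_pl = ((n1-1)S1 + (n2-1)S2) / (n1+n2-2)
S_pl = ((n1 - 1) * np.cov(y1, rowvar=False)
        + (n2 - 1) * np.cov(y2, rowvar=False)) / (n1 + n2 - 2)

# Discriminant coefficient vector a = S_pl^{-1} (ybar1 - ybar2)
a = np.linalg.solve(S_pl, y1.mean(axis=0) - y2.mean(axis=0))

# Projected points z = a'y for every observation in each group
print("z, group 1:", y1 @ a)
print("z, group 2:", y2 @ a)
```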

PRINCIPAL COMPONENT ANALYSIS

PCA is a statistical technique used to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables; i.e., it is used to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. We typically have a data matrix of n observations on p correlated variables $x_1, x_2, \ldots, x_p$. PCA looks for a transformation of the $x_i$ into p new variables $y_i$ that are uncorrelated.

The number of variables is reduced while retaining as much of the variability of the original variables as possible. The new variables, known as principal components, are linear combinations of the original variables and are uncorrelated. It is preferable if PCA is applied to data that have approximately the same scale in each original variable.
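As a minimal sketch of this idea (toy data; the mixing matrix is arbitrary), the components obtained from the eigenvectors of the sample covariance matrix are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.4],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(100, 3)) @ mix      # toy data with correlated columns

Xc = X - X.mean(axis=0)                  # center each variable
S = np.cov(Xc, rowvar=False)             # sample covariance matrix
lam, E = np.linalg.eigh(S)               # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]           # reorder: largest variance first

Y = Xc @ E                               # principal component scores
print(np.round(np.cov(Y, rowvar=False), 6))  # ~diagonal: the new variables are uncorrelated
print(np.round(lam, 6))                      # the diagonal entries are the eigenvalues
```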


Suppose we have a population measured on p random variables $X_1, \ldots, X_p$. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability. This is accomplished by rotating the axes.

Consider our random vector

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.


We can construct p linear combinations

$$\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
Y_2 &= \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\
&\;\;\vdots \\
Y_p &= \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p
\end{aligned}$$

It is easy to show that

$$\operatorname{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \quad i = 1, \ldots, p$$

$$\operatorname{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The principal components are those uncorrelated linear combinations $Y_1, \ldots, Y_p$ whose variances are as large as possible. Thus the first principal component is the linear combination of maximum variance; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_1} \; \mathbf{a}_1'\Sigma\mathbf{a}_1 \quad \text{(the objective is the source of the nonlinearity)}$$

$$\text{s.t. } \mathbf{a}_1'\mathbf{a}_1 = 1 \quad \text{(restrict to coefficient vectors of unit length)}$$
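A quick numerical check of the claim that no unit-length coefficient vector beats the leading eigenvector (the covariance matrix below is an arbitrary made-up example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T                        # an arbitrary (positive semidefinite) covariance

lam, E = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
lam1, e1 = lam[-1], E[:, -1]           # largest eigenvalue and its eigenvector

# Compare a'Σa over many random unit vectors with the eigenvector's value
a = rng.normal(size=(3, 10000))
a /= np.linalg.norm(a, axis=0)         # normalize each column to unit length
vals = np.einsum('ij,ik,kj->j', a, Sigma, a)   # a_j' Σ a_j for every column j

print(vals.max() <= lam1 + 1e-9)       # True: no unit vector exceeds λ1
print(np.isclose(e1 @ Sigma @ e1, lam1))  # True: e1 attains the maximum, λ1
```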


The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_2} \; \mathbf{a}_2'\Sigma\mathbf{a}_2$$

$$\text{s.t. } \mathbf{a}_2'\mathbf{a}_2 = 1, \qquad \mathbf{a}_1'\Sigma\mathbf{a}_2 = 0 \quad \text{(restricts the covariance to zero)}$$

The third principal component is the solution to the nonlinear optimization problem

$$\max_{\mathbf{a}_3} \; \mathbf{a}_3'\Sigma\mathbf{a}_3$$

$$\text{s.t. } \mathbf{a}_3'\mathbf{a}_3 = 1, \qquad \mathbf{a}_1'\Sigma\mathbf{a}_3 = 0, \quad \mathbf{a}_2'\Sigma\mathbf{a}_3 = 0 \quad \text{(restricts the covariances to zero)}$$


Generally, the ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components; i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_i} \; \mathbf{a}_i'\Sigma\mathbf{a}_i$$

$$\text{s.t. } \mathbf{a}_i'\mathbf{a}_i = 1, \qquad \mathbf{a}_k'\Sigma\mathbf{a}_i = 0 \;\; \text{for } k < i$$

We can show that, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p$$

where $\mathbf{e}_i$ is the eigenvector associated with $\lambda_i$.


We can also show, for a random vector $\mathbf{X}$ with covariance matrix $\Sigma$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, that

$$\sigma_{11} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \operatorname{Var}(X_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p} \operatorname{Var}(Y_i)$$

We can assess how well a subset of the principal components $Y_i$ summarizes the original random variables $X_i$. One common method of doing so is the proportion of total population variance due to the kth principal component,

$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}$$
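In code this decomposition is immediate; a sketch using the eigenvalues computed in the worked example below:

```python
import numpy as np

lam = np.array([13.219396, 3.3793317, 0.4012723])  # eigenvalues, largest first
prop = lam / lam.sum()               # proportion of total variance per component
print(np.round(prop, 4))             # [0.7776 0.1988 0.0236]
print(np.round(np.cumsum(prop), 4))  # cumulative: [0.7776 0.9764 1.    ]
```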


If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information.

We can also easily find the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$$

These values are often used in interpreting the principal components $Y_i$.


Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0    6.0    9.0
4.0   12.0   10.0
3.0   12.0   15.0
4.0   10.0   12.0

Find the three population principal components Y1, Y2, and Y3.

First we need the covariance matrix $\Sigma$:

$$\Sigma = \begin{pmatrix} 2.00 & 3.33 & 1.33 \\ 3.33 & 8.00 & 4.67 \\ 1.33 & 4.67 & 7.00 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:

$$\lambda_1 = 13.219396, \quad \mathbf{e}_1 = \begin{pmatrix} 0.2910381 \\ 0.7342493 \\ 0.6133309 \end{pmatrix}$$

$$\lambda_2 = 3.3793317, \quad \mathbf{e}_2 = \begin{pmatrix} -0.4150386 \\ -0.4807165 \\ 0.7724340 \end{pmatrix}$$

$$\lambda_3 = 0.4012723, \quad \mathbf{e}_3 = \begin{pmatrix} 0.8619976 \\ -0.4793640 \\ 0.1648350 \end{pmatrix}$$

so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{X} = -0.4150386X_1 - 0.4807165X_2 + 0.7724340X_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$

and

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 2.0 + 8.0 + 7.0 = 17.0 = 13.219396 + 3.3793317 + 0.4012723 = \lambda_1 + \lambda_2 + \lambda_3$$
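These eigenpairs are easy to verify numerically (a sketch; note that np.cov's default n − 1 divisor reproduces the matrix $\Sigma$ shown above, and that eigenvectors are determined only up to sign):

```python
import numpy as np

X = np.array([[1., 6., 9.],
              [4., 12., 10.],
              [3., 12., 15.],
              [4., 10., 12.]])

S = np.cov(X, rowvar=False)       # divisor n-1 = 3, matching the entries above
lam, E = np.linalg.eigh(S)        # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]    # largest eigenvalue first

print(np.round(S, 2))             # the covariance matrix shown above
print(np.round(lam, 7))           # approximately [13.219396, 3.3793317, 0.4012723]
print(np.round(E, 7))             # columns are e1, e2, e3 (possibly sign-flipped)
```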


and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum_{i=1}^{3}\lambda_i} = \frac{13.219396}{17.0} = 0.777611529 \qquad
\frac{\lambda_2}{\sum_{i=1}^{3}\lambda_i} = \frac{3.3793317}{17.0} = 0.198784220 \qquad
\frac{\lambda_3}{\sum_{i=1}^{3}\lambda_i} = \frac{0.4012723}{17.0} = 0.023604251$$

Next we obtain the correlations between the original random variables $X_k$ and the principal components $Y_i$:

$$\rho_{Y_1,X_1} = \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{0.2910381\sqrt{13.219396}}{\sqrt{2.00}} = 0.7482$$

$$\rho_{Y_1,X_2} = \frac{e_{12}\sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{0.7342493\sqrt{13.219396}}{\sqrt{8.00}} = 0.9439$$

$$\rho_{Y_1,X_3} = \frac{e_{13}\sqrt{\lambda_1}}{\sqrt{\sigma_{33}}} = \frac{0.6133309\sqrt{13.219396}}{\sqrt{7.00}} = 0.8429$$


$$\rho_{Y_2,X_1} = \frac{e_{21}\sqrt{\lambda_2}}{\sqrt{\sigma_{11}}} = \frac{-0.4150386\sqrt{3.3793317}}{\sqrt{2.00}} = -0.5395$$

$$\rho_{Y_2,X_2} = \frac{e_{22}\sqrt{\lambda_2}}{\sqrt{\sigma_{22}}} = \frac{-0.4807165\sqrt{3.3793317}}{\sqrt{8.00}} = -0.3124$$

$$\rho_{Y_2,X_3} = \frac{e_{23}\sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{0.7724340\sqrt{3.3793317}}{\sqrt{7.00}} = 0.5367$$

$$\rho_{Y_3,X_1} = \frac{e_{31}\sqrt{\lambda_3}}{\sqrt{\sigma_{11}}} = \frac{0.8619976\sqrt{0.4012723}}{\sqrt{2.00}} = 0.3861$$

$$\rho_{Y_3,X_2} = \frac{e_{32}\sqrt{\lambda_3}}{\sqrt{\sigma_{22}}} = \frac{-0.4793640\sqrt{0.4012723}}{\sqrt{8.00}} = -0.1074$$

$$\rho_{Y_3,X_3} = \frac{e_{33}\sqrt{\lambda_3}}{\sqrt{\sigma_{33}}} = \frac{0.1648350\sqrt{0.4012723}}{\sqrt{7.00}} = 0.0395$$
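All nine correlations can be computed at once (a sketch; rows are the components Y1-Y3, columns the variables X1-X3, up to the arbitrary sign of each eigenvector):

```python
import numpy as np

X = np.array([[1., 6., 9.], [4., 12., 10.], [3., 12., 15.], [4., 10., 12.]])
S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]            # largest eigenvalue first

# rho[i, k] = e_{ik} * sqrt(lam_i) / sqrt(sigma_kk)
rho = (E * np.sqrt(lam)).T / np.sqrt(np.diag(S))
print(np.round(rho, 4))                   # approx. [[0.7482 0.9439 0.8429] ...]
```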


We can display these results in a correlation matrix:

        X1       X2       X3
Y1   0.7482   0.9439   0.8429
Y2  -0.5395  -0.3124   0.5367
Y3   0.3861  -0.1074   0.0395

- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a residual of X1

We could standardize the variables X1, X2, and X3 and then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:

$$\rho = \begin{pmatrix} 1.000 & 0.833 & 0.356 \\ 0.833 & 1.000 & 0.624 \\ 0.356 & 0.624 & 1.000 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:



$$\lambda_1 = 2.2149347, \quad \mathbf{e}_1 = \begin{pmatrix} 0.5843738 \\ 0.6345775 \\ 0.5057853 \end{pmatrix}$$

$$\lambda_2 = 0.6226418, \quad \mathbf{e}_2 = \begin{pmatrix} -0.5449250 \\ -0.1549791 \\ 0.8240377 \end{pmatrix}$$

$$\lambda_3 = 0.1624235, \quad \mathbf{e}_3 = \begin{pmatrix} 0.6013018 \\ -0.7571610 \\ 0.2552315 \end{pmatrix}$$

so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{Z} = 0.5843738Z_1 + 0.6345775Z_2 + 0.5057853Z_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{Z} = -0.5449250Z_1 - 0.1549791Z_2 + 0.8240377Z_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{Z} = 0.6013018Z_1 - 0.7571610Z_2 + 0.2552315Z_3$$

Note that these results differ from the covariance-based principal components!

and

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.0 + 1.0 + 1.0 = 3.0 = 2.2149347 + 0.6226418 + 0.1624235 = \lambda_1 + \lambda_2 + \lambda_3$$


and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum_{i=1}^{3}\lambda_i} = \frac{2.2149347}{3.0} = 0.738311567 \qquad
\frac{\lambda_2}{\sum_{i=1}^{3}\lambda_i} = \frac{0.6226418}{3.0} = 0.207547267 \qquad
\frac{\lambda_3}{\sum_{i=1}^{3}\lambda_i} = \frac{0.1624235}{3.0} = 0.054141167$$

Next we obtain the correlations between the standardized variables $Z_k$ and the principal components $Y_i$; since $\sigma_{kk} = 1$ for standardized variables, $\rho_{Y_i,Z_k} = e_{ik}\sqrt{\lambda_i}$:

$$\rho_{Y_1,Z_1} = 0.5843738\sqrt{2.2149347} = 0.869703464$$

$$\rho_{Y_1,Z_2} = 0.6345775\sqrt{2.2149347} = 0.944419907$$

$$\rho_{Y_1,Z_3} = 0.5057853\sqrt{2.2149347} = 0.752742749$$

$$\rho_{Y_2,Z_1} = -0.5449250\sqrt{0.6226418} = -0.429987538$$


$$\rho_{Y_2,Z_2} = -0.1549791\sqrt{0.6226418} = -0.122290294$$

$$\rho_{Y_2,Z_3} = 0.8240377\sqrt{0.6226418} = 0.650228824$$

$$\rho_{Y_3,Z_1} = 0.6013018\sqrt{0.1624235} = 0.242335443$$

$$\rho_{Y_3,Z_2} = -0.7571610\sqrt{0.1624235} = -0.305149504$$

$$\rho_{Y_3,Z_3} = 0.2552315\sqrt{0.1624235} = 0.102862886$$

We can display these results in a correlation matrix:

        Z1          Z2          Z3
Y1   0.8697035   0.9444199   0.7527427
Y2  -0.4299875  -0.1222903   0.6502288
Y3   0.2423354  -0.3051495   0.1028629

- the first principal component (Y1) is a mixture of all three random variables (Z1, Z2, and Z3)
- the second principal component (Y2) is a trade-off between Z1 and Z3
- the third principal component (Y3) is a trade-off between Z1 and Z2

• If you multiply one variable by a scalar, you get different results if using the covariance matrix.
• However, the correlation matrix is invariant to scale (see the sketch below).
• PCA should be applied on data that have approximately the same scale in each variable.

Interpretation of components:

Assess the weight of each of the original variables in the component; for example, suppose we have five original variables and the first principal component is

$$Y_1 = 0.897X_1 + 0.023X_2 - 0.768X_3 + 0.169X_4 - 0.324X_5$$

Then $X_1$ and $X_3$ have the highest weights and are the most important variables in the component.

Another way of assessing the relationship is through the correlation of the original variable $X_j$ and the component $Y_i$:

$$r_{ij} = \frac{a_{ij}\sqrt{\lambda_i}}{\sqrt{\sigma_{jj}}}$$
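A minimal sketch of the scale issue (toy data and scaling factors are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 0.1])  # very different scales

def top_direction(M):
    """Unit eigenvector belonging to the largest eigenvalue of M."""
    lam, E = np.linalg.eigh(M)     # eigenvalues in ascending order
    return E[:, -1]

# Covariance-based PCA changes when one variable is rescaled ...
print(top_direction(np.cov(X, rowvar=False)))
print(top_direction(np.cov(X * np.array([100.0, 1.0, 1.0]), rowvar=False)))

# ... while correlation-based PCA is unaffected (identical up to sign)
print(top_direction(np.corrcoef(X, rowvar=False)))
print(top_direction(np.corrcoef(X * np.array([100.0, 1.0, 1.0]), rowvar=False)))
```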

How many components to keep?

▪ Select components that have a cumulative variance > 70%.
▪ Keep components with eigenvalues > 1 (both rules are sketched in code below).
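A small helper applying both rules (a hypothetical convenience function, not part of the lecture; the 70% and eigenvalue-greater-than-one cutoffs are the ones quoted above):

```python
import numpy as np

def n_components_to_keep(eigenvalues, cum_var_threshold=0.70):
    """Return (k by cumulative-variance rule, k by eigenvalue > 1 rule)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cum = np.cumsum(lam) / lam.sum()
    k_variance = int(np.searchsorted(cum, cum_var_threshold)) + 1
    k_kaiser = int((lam > 1.0).sum())  # sensible for correlation-matrix PCA
    return k_variance, k_kaiser

# With the correlation-based eigenvalues of the example above:
print(n_components_to_keep([2.2149347, 0.6226418, 0.1624235]))  # (1, 1)
```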

Regression

The following data gives the measurement in centimeters of four components of a certain plant for a sample of size twelve.

END OF LECTURE 6
