
Multi-Variate Analysis

Prof. Ayanendra Basu


2022

Contents
1 Introduction
  1.1 Topics
  1.2 Books to follow

2 Principal Components
  2.1 Variability
  2.2 Construction of Y's
    2.2.1 Lemma
    2.2.2 Proof
  2.3 Components
  2.4 Note
  2.5 A special case
  2.6 How to choose k?

Contents by Lecture

Monday        Tuesday       Wednesday     Thursday      Friday
Lec 1 (1/8)                               Lec 2 (4/8)

Lecture 1, Aug 1

1 Introduction
1.1 Topics
• Traditional Topics

  Multivariate Analysis
  Multivariate Normal
  Wishart Distribution
  Hotelling's T²
  MANOVA
  Union-Intersection Test

• Non-traditional Topics (Applied Multivariate Techniques)

  Dimension Reduction:
    Principal Components
    Factor Analysis
    Canonical Correlation Analysis

  Classification:
    Cluster Analysis
    Discriminant Analysis

1.2 Books to follow


1. T.W. Anderson : Multivariate Analysis
2. C.R. Rao : Linear Statistical Inference and its Applications
3. R. Johnson, D.W. Wichern : Applied Multivariate Statistical Analysis
4. Goldstein, Dillon
5. G.A.F. Seber : Multivariate Observations
6. Mardia, Kent, Bibby

7. S.F. Arnold : Linear Statistical Inference and Multivariate Analysis

2 Principal Components
Suppose we have a random vector $X_{p \times 1}$ with dispersion matrix $\Sigma_{p \times p}$ (real, symmetric, p.d.).
We want to base the future analysis on $k \ (\ll p)$ variables.

2.1 Variability
Total variability of a dataset is defined as
\[
\text{Total variability} := \sum_{i=1}^{p} \sigma_{ii} = \sum_{i=1}^{p} \sigma_i^2,
\]
where $\sigma_{ii}$ (also written $\sigma_i^2$) is the $i$-th diagonal entry of $\Sigma$.

We would like to reduce the dimension while retaining as much variability as possible.

Write $X := (X_1, X_2, \ldots, X_p)^T$.

We would like to replace this by $Y_1, Y_2, \ldots, Y_k$, $k \ll p$, without losing much variability.
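
As a quick numerical illustration of total variability, here is a minimal sketch assuming numpy is available; the covariance matrix Sigma below is made up purely for the example. The total variability is just the sum of the diagonal entries, i.e. the trace of $\Sigma$.

import numpy as np

# A made-up 3x3 covariance matrix (symmetric, positive definite) for illustration.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Total variability = sum of the diagonal entries of Sigma = trace(Sigma).
total_variability = np.trace(Sigma)
print(total_variability)  # 9.0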

2.2 Construction of Y’s


Now we construct the $Y_i$'s sequentially.
Set $Y_1 := l_1^T X$, where $l_1$ is chosen so that maximum variability is attained.

Now, $\operatorname{Var}(Y_1) = l_1^T \Sigma l_1$, so we are trying to find

\[
\max_{l_1 \neq 0} \frac{l_1^T \Sigma l_1}{l_1^T l_1}.
\]

2.2.1 Lemma
This is maximized when $l_1 = e_1$, the eigenvector of $\Sigma$ corresponding to the largest eigenvalue.

2.2.2 Proof
Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > 0$ be the eigenvalues of $\Sigma$ (all positive since $\Sigma$ is p.d.), with corresponding orthonormal eigenvectors $e_1, e_2, \ldots, e_p$. Write the spectral decomposition $\Sigma = P \Lambda P^T$, where $P = (e_1, \ldots, e_p)$ is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, and set $y := P^T l_1$. Then

\[
\frac{l_1^T \Sigma l_1}{l_1^T l_1}
= \frac{l_1^T P \Lambda P^T l_1}{l_1^T l_1}
= \frac{(P^T l_1)^T \Lambda (P^T l_1)}{(P^T l_1)^T (P^T l_1)} \quad [\because P \text{ is orthogonal}]
= \frac{y^T \Lambda y}{y^T y}
= \frac{\sum_i \lambda_i y_i^2}{\sum_i y_i^2}
\leq \lambda_1 \quad (\text{and} \geq \lambda_p).
\]
Taking $l_1 = e_1$ gives
\[
\frac{l_1^T \Sigma l_1}{l_1^T l_1} = \frac{e_1^T \Sigma e_1}{e_1^T e_1} = \frac{\lambda_1\, e_1^T e_1}{e_1^T e_1} = \lambda_1,
\]
so the maximum is achieved.
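
Here is a quick numerical check of the lemma, as a sketch assuming numpy; the matrix Sigma and the random directions are made up for illustration. The Rayleigh quotient $l^T \Sigma l / l^T l$ never exceeds the largest eigenvalue, and the top eigenvector attains it.

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# eigh returns eigenvalues in increasing order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam1, e1 = eigvals[-1], eigvecs[:, -1]

def rayleigh(l, S):
    return (l @ S @ l) / (l @ l)

# Random directions never beat the top eigenvalue ...
assert all(rayleigh(rng.normal(size=3), Sigma) <= lam1 + 1e-12 for _ in range(1000))
# ... and the top eigenvector attains it.
assert np.isclose(rayleigh(e1, Sigma), lam1)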

2.3 Components
$Y_1 = e_1^T X$ is the 1st principal component, with

\[
\operatorname{Var}(Y_1) = \lambda_1.
\]

Principal components are defined to be uncorrelated. Writing the second one as $Y_2 = l_2^T X$, we need to find

\[
\max_{l_2 \neq 0} \frac{l_2^T \Sigma l_2}{l_2^T l_2}, \quad \text{subject to } \operatorname{Cov}(Y_1, Y_2) = 0.
\]

Now, $\operatorname{Cov}(Y_1, Y_2) = \operatorname{Cov}(e_1^T X, l_2^T X) = l_2^T \Sigma e_1 = \lambda_1\, l_2^T e_1$.

So, for this to be $0$ we need $l_2 \perp e_1$, i.e. $l_2 \in \operatorname{span}\{e_2, e_3, \ldots, e_p\}$.
So the solution is $Y_2 = e_2^T X$, with $\operatorname{Var}(Y_2) = \lambda_2$.
Similarly,

\[
Y_j = e_j^T X, \qquad \operatorname{Var}(Y_j) = \lambda_j, \qquad 1 \leq j \leq p.
\]
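
Putting the construction together, here is a minimal sketch assuming numpy; the data are simulated from a made-up $\Sigma$ purely for illustration. The loadings are the eigenvectors of the covariance matrix ordered by decreasing eigenvalue, and the component variances are the corresponding eigenvalues.

import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Simulated mean-zero data with dispersion matrix Sigma; observations in rows.
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=5000)

# Eigendecomposition of the sample covariance matrix, sorted by decreasing eigenvalue.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, E = eigvals[order], eigvecs[:, order]

# Y_j = e_j^T X: project each observation on the eigenvectors.
Y = X @ E

# Var(Y_j) = lambda_j, and the components are uncorrelated (in sample).
print(np.var(Y, axis=0, ddof=1))                   # matches lam
print(np.round(np.corrcoef(Y, rowvar=False), 3))   # ~identity matrix

Note that the sketch works with the sample covariance matrix $S$ in place of $\Sigma$, which is what one does with real data.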

2.4 Note

\[
\sum_{i=1}^{p} \operatorname{Var}(X_i) = \sum_{i=1}^{p} \sigma_{ii} = \operatorname{tr}(\Sigma) = \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \operatorname{Var}(Y_i),
\]
so the total variability is preserved when we pass from the $X$'s to the $Y$'s.

If the variables are uncorrelated, i.e. $\Sigma$ is diagonal, then $Y_1, Y_2, \ldots, Y_p$ will just be a permutation of $X_1, X_2, \ldots, X_p$ in decreasing order of variance.
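
This is easy to check numerically; a small sketch assuming numpy, with made-up diagonal entries. For a diagonal $\Sigma$ the eigenvectors are the coordinate axes, so the principal components simply reorder the original variables by variance.

import numpy as np

# Diagonal Sigma: the variables are already uncorrelated.
Sigma = np.diag([2.0, 5.0, 1.0])

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
print(eigvals[order])        # [5. 2. 1.] -- the variances in decreasing order
print(eigvecs[:, order])     # coordinate vectors: the Y's are X2, X1, X3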

2.5 A special case
Suppose we have bivariate data $(X_1, X_2)$. We will try to get its principal components. We know
the components will be uncorrelated, so once we get the 1st principal component, the other one
will be perpendicular to it.
Suppose the data look like this:

[Figure: scatter of the bivariate data on the $(X_1, X_2)$ axes, with the rotated axes $Y_1$ and $Y_2$ drawn through the cloud.]

Then $Y_1$ will be the 1st principal component, as most of the variation lies along that axis.
So the components are essentially a rotation of the rectangular axes.
This idea extends to higher dimensions for the multivariate normal.
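
A minimal two-dimensional sketch of this rotation, assuming numpy; the $2 \times 2$ covariance matrix is made up for illustration.

import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=2000)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
E = eigvecs[:, order]          # columns: directions of Y1 and Y2

# E is orthogonal, so the PCs are just the data expressed in rotated axes.
angle = np.degrees(np.arctan2(E[1, 0], E[0, 0]))
print(f"Y1 axis makes an angle of about {angle:.1f} degrees with the X1 axis")
Y = X @ E
print(np.round(np.cov(Y, rowvar=False), 2))   # ~diag(lambda1, lambda2)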

2.6 How to choose k?


\[
\text{Proportion of variability explained by the } i\text{-th principal component} := \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
\]

\[
\text{Proportion of variability explained by the first } k \text{ principal components} := \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j}
\]

• One way of choosing is to pre-determine a threshold for the variability explained: we take the smallest $k$ for which
\[
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j} \geq a,
\]
where $a$ is a pre-specified threshold.

• The other way is to use a scree plot: we plot
\[
\frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
\]
against $i$, and stop (choosing that $k$) where we observe a major change in slope. A small code sketch of both rules follows below.
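
The following is a minimal sketch of both rules, assuming numpy and matplotlib are available; the eigenvalues lam below are made up purely for illustration.

import numpy as np
import matplotlib.pyplot as plt

# Made-up eigenvalues of some Sigma, in decreasing order.
lam = np.array([4.1, 2.3, 0.9, 0.4, 0.2, 0.1])

# Rule 1: smallest k whose cumulative proportion of variability reaches the threshold a.
prop = lam / lam.sum()
cum_prop = np.cumsum(prop)
a = 0.90
k = int(np.argmax(cum_prop >= a)) + 1
print(f"smallest k with cumulative proportion >= {a}: k = {k}")

# Rule 2: scree plot -- look for the point where the slope changes sharply.
plt.plot(np.arange(1, len(lam) + 1), prop, marker="o")
plt.xlabel("i")
plt.ylabel("proportion of variability explained")
plt.title("Scree plot")
plt.show()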

Lecture 2, Aug 4
