MV - Canonical Correlation (Final)
where
$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)$$
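The sample correlation matrix:
$$R_{p\times p} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{12} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1p} & r_{2p} & \cdots & 1 \end{bmatrix}$$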
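where
$$r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = \frac{\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)^2}\,\sqrt{\sum_{j=1}^{n}\left(x_{kj}-\bar{x}_k\right)^2}}$$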
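Note:
$$R = D^{-1} S D^{-1}$$
where
$$D_{p\times p} = \begin{bmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{bmatrix}$$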
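The relation between $S$, $D$ and $R$ can be checked numerically. The following is a minimal NumPy sketch, assuming the data are stored with one row per variable and one column per case; the function name and the small data matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def covariance_and_correlation(X):
    """Return (S, R) for a p x n data matrix X (rows = variables, columns = cases)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    centered = X - X.mean(axis=1, keepdims=True)   # x_ij - xbar_i
    S = centered @ centered.T / (n - 1)            # entries s_ik
    D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))     # D^{-1}, with D = diag(sqrt(s_ii))
    R = D_inv @ S @ D_inv                          # R = D^{-1} S D^{-1}
    return S, R

# Hypothetical data: p = 3 variables measured on n = 5 cases.
X_demo = np.array([[1.0, 2.0, 4.0, 3.0, 5.0],
                   [2.0, 1.0, 3.0, 5.0, 4.0],
                   [5.0, 3.0, 4.0, 1.0, 2.0]])
S_demo, R_demo = covariance_and_correlation(X_demo)
```

The result should agree with NumPy's own `np.cov` and `np.corrcoef`, which use the same $n-1$ divisor by default.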
Tests for Independence and Non-zero Correlation
Tests for Independence
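The test statistic for testing $H_0: \rho = \rho_0$
$$z = \frac{\dfrac{1}{2}\ln\left(\dfrac{1+r}{1-r}\right) - \dfrac{1}{2}\ln\left(\dfrac{1+\rho_0}{1-\rho_0}\right)}{\dfrac{1}{\sqrt{n-3}}}$$
If $H_0$ is true the test statistic $z$ will have approximately a standard Normal distribution.
We then reject $H_0$ if:
$$|z| > z_{\alpha/2}$$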
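As an illustration of the statistic above, here is a small sketch using only the Python standard library. It relies on the identity $\tfrac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{atanh}(r)$; the function name and the numbers in the example are assumptions made for the illustration, not values from the slides.

```python
import math

def fisher_z_test(r, rho0, n):
    """Return (z, two-sided p-value) for H0: rho = rho0 from a sample correlation r."""
    # atanh(r) equals 0.5 * ln((1 + r) / (1 - r)).
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # Two-sided tail area of the standard Normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical example: r = 0.5 from n = 28 cases, testing rho0 = 0.
z_demo, p_demo = fisher_z_test(0.5, 0.0, 28)
```

Rejecting $H_0$ when $|z| > z_{\alpha/2}$ is equivalent to rejecting when the two-sided p-value is below $\alpha$.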
Partial Correlation
Conditional Independence
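Recall
If $\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}$, where $\mathbf{x}_1$ is $q \times 1$ and $\mathbf{x}_2$ is $(p-q) \times 1$, has a p-variate Normal distribution
with mean vector $\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}$
and covariance matrix $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$
Then the conditional distribution of $\mathbf{x}_i$ given $\mathbf{x}_j$ is a $q_i$-variate Normal distribution
with mean vector $\boldsymbol{\mu}_{i\mid j} = \boldsymbol{\mu}_i + \Sigma_{ij}\Sigma_{jj}^{-1}\left(\mathbf{x}_j - \boldsymbol{\mu}_j\right)$
and covariance matrix $\Sigma_{ii\mid j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ij}'$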
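The conditional mean and covariance can be computed directly from the partitioned parameters. The sketch below handles the case of $\mathbf{x}_1$ given $\mathbf{x}_2$; the function name and the bivariate example values are hypothetical.

```python
import numpy as np

def conditional_normal(mu, Sigma, q, x2):
    """Mean and covariance of x1 (first q components) given x2 (remaining components)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12, S22 = Sigma[:q, :q], Sigma[:q, q:], Sigma[q:, q:]
    # W = Sigma12 Sigma22^{-1}, obtained without forming an explicit inverse.
    W = np.linalg.solve(S22, S12.T).T
    cond_mean = mu1 + W @ (np.asarray(x2, dtype=float) - mu2)
    cond_cov = S11 - W @ S12.T
    return cond_mean, cond_cov

# Hypothetical bivariate case: unit variances, correlation 0.5, observed x2 = 2.
mean_demo, cov_demo = conditional_normal([0.0, 0.0],
                                         [[1.0, 0.5], [0.5, 1.0]],
                                         q=1, x2=[2.0])
```

In this bivariate case the conditional variance reduces to $1-\rho^2$ and the conditional mean to $\rho\,x_2$, which gives a quick sanity check.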
The matrix
$$\Sigma_{2\mid 1} = \Sigma_{22} - \Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$$
is called the matrix of partial variances and covariances.
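The $(i, j)$th element of the matrix $\Sigma_{2\mid 1}$,
$$\sigma_{ij\cdot 1,2,\ldots,q}$$
is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
$$\rho_{ij\cdot 1,2,\ldots,q} = \frac{\sigma_{ij\cdot 1,2,\ldots,q}}{\sqrt{\sigma_{ii\cdot 1,2,\ldots,q}\;\sigma_{jj\cdot 1,2,\ldots,q}}}$$
is called the partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.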
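Let
$$S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix}$$
denote the sample covariance matrix, partitioned in the same way as $\Sigma$.
The $(i, j)$th element of the matrix $S_{2\mid 1} = S_{22} - S_{12}'S_{11}^{-1}S_{12}$,
$$s_{ij\cdot 1,2,\ldots,q}$$
is called the sample partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
Also
$$r_{ij\cdot 1,2,\ldots,q} = \frac{s_{ij\cdot 1,2,\ldots,q}}{\sqrt{s_{ii\cdot 1,2,\ldots,q}\;s_{jj\cdot 1,2,\ldots,q}}}$$
is the sample partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.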
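A short sketch of how the sample partial correlations might be computed from a partitioned covariance matrix, assuming the conditioning variables are the first $q$ rows and columns. The function name and the example matrix are hypothetical.

```python
import numpy as np

def partial_correlations(S, q):
    """Partial correlations among variables q+1..p given variables 1..q."""
    S = np.asarray(S, dtype=float)
    S11, S12, S22 = S[:q, :q], S[:q, q:], S[q:, q:]
    # Partial variances and covariances: S22 - S12' S11^{-1} S12.
    S2_given_1 = S22 - S12.T @ np.linalg.solve(S11, S12)
    d = np.sqrt(np.diag(S2_given_1))
    return S2_given_1 / np.outer(d, d)

# Hypothetical 3-variable example: partial correlation of variables 2 and 3 given variable 1.
S_example = np.array([[1.0, 0.5, 0.4],
                      [0.5, 1.0, 0.6],
                      [0.4, 0.6, 1.0]])
R_partial = partial_correlations(S_example, q=1)
```

With a single conditioning variable the off-diagonal entry should match the familiar formula $(r_{23} - r_{12}r_{13})/\sqrt{(1-r_{12}^2)(1-r_{13}^2)}$.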
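The test statistic for testing $H_0: \rho_{ij\cdot x_1,\ldots,x_p} = 0$
$$t = r_{ij\cdot x_1,\ldots,x_p}\,\frac{\sqrt{n-p-2}}{\sqrt{1-r^2_{ij\cdot x_1,\ldots,x_p}}}$$
The test statistic for testing $H_0: \rho_{ij\cdot x_1,\ldots,x_p} = \rho^0_{ij\cdot x_1,\ldots,x_p}$
$$z = \frac{\dfrac{1}{2}\ln\left(\dfrac{1+r_{ij\cdot x_1,\ldots,x_p}}{1-r_{ij\cdot x_1,\ldots,x_p}}\right) - \dfrac{1}{2}\ln\left(\dfrac{1+\rho^0_{ij\cdot x_1,\ldots,x_p}}{1-\rho^0_{ij\cdot x_1,\ldots,x_p}}\right)}{\dfrac{1}{\sqrt{n-p-3}}}$$
If $H_0$ is true the test statistic $z$ will have approximately a standard Normal distribution.
We then reject $H_0$ if: $|z| > z_{\alpha/2}$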
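The two statistics for a partial correlation can be sketched as follows, using only the standard library. Here `p` is the number of conditioning variables, as in the formulas above; the function name and example values are hypothetical.

```python
import math

def partial_corr_statistics(r, n, p, rho0=0.0):
    """Return (t, z) for a sample partial correlation r given p conditioning variables."""
    # t statistic for H0: partial correlation = 0.
    t = r * math.sqrt(n - p - 2) / math.sqrt(1.0 - r * r)
    # Fisher z statistic for H0: partial correlation = rho0.
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - p - 3)
    return t, z

# Hypothetical example: r = 0.5, n = 30 cases, p = 2 conditioning variables.
t_demo, z_demo = partial_corr_statistics(0.5, n=30, p=2)
```

Compared with the unconditional test, each conditioning variable costs one degree of freedom: $n-3$ becomes $n-p-3$ in the $z$ statistic.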
The Multiple Correlation Coefficient
Testing independence between a single variable and a group of variables
Definition
We are interested in whether the variable $y$ is independent of the vector $\mathbf{x}_1$.
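Equivalently, choose $\mathbf{a}$ to maximize
$$\rho^2(\mathbf{a}) = \frac{\left(\boldsymbol{\sigma}_{1y}'\mathbf{a}\right)^2}{\sigma_{yy}\,\mathbf{a}'\Sigma_{11}\mathbf{a}} = \frac{\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}}{\sigma_{yy}\,\mathbf{a}'\Sigma_{11}\mathbf{a}}$$
Note:
$$\frac{d\left(\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}\right)}{d\mathbf{a}} = 2\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}, \qquad \frac{d\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right)}{d\mathbf{a}} = 2\Sigma_{11}\mathbf{a}$$
$$\frac{d\rho^2(\mathbf{a})}{d\mathbf{a}} = \frac{1}{\sigma_{yy}}\,\frac{2\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right) - 2\Sigma_{11}\mathbf{a}\left(\mathbf{a}'\boldsymbol{\sigma}_{1y}\boldsymbol{\sigma}_{1y}'\mathbf{a}\right)}{\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right)^2}$$
$$= \frac{\boldsymbol{\sigma}_{1y}'\mathbf{a}\left[2\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right)\boldsymbol{\sigma}_{1y} - 2\Sigma_{11}\mathbf{a}\left(\mathbf{a}'\boldsymbol{\sigma}_{1y}\right)\right]}{\sigma_{yy}\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right)^2} = \mathbf{0}$$
or
$$\left(\mathbf{a}'\Sigma_{11}\mathbf{a}\right)\boldsymbol{\sigma}_{1y} = \Sigma_{11}\mathbf{a}\left(\mathbf{a}'\boldsymbol{\sigma}_{1y}\right)$$
or
$$\mathbf{a}_{opt} = \frac{\mathbf{a}'\Sigma_{11}\mathbf{a}}{\mathbf{a}'\boldsymbol{\sigma}_{1y}}\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y} = k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}$$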
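The multiple correlation coefficient is independent of the value of k.
$$\rho_{y\cdot x_1,\ldots,x_n} = \frac{\boldsymbol{\sigma}_{1y}'\mathbf{a}_{opt}}{\sqrt{\sigma_{yy}\,\mathbf{a}_{opt}'\Sigma_{11}\mathbf{a}_{opt}}} = \frac{\boldsymbol{\sigma}_{1y}'\left(k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}\right)}{\sqrt{\sigma_{yy}\left(k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}\right)'\Sigma_{11}\left(k\,\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}\right)}}$$
$$= \frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sqrt{\sigma_{yy}\,\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}} = \sqrt{\frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sigma_{yy}}}$$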
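The closed form above, and the claim that rescaling $\mathbf{a}_{opt}$ leaves the correlation unchanged, can be checked numerically. This is a minimal sketch; the function names and the population values are hypothetical, and the scale factor used in the check is positive.

```python
import numpy as np

def correlation_with_y(a, sigma_yy, sigma_1y, Sigma11):
    """Correlation between y and the linear combination a'x1."""
    a = np.asarray(a, dtype=float)
    return (sigma_1y @ a) / np.sqrt(sigma_yy * (a @ Sigma11 @ a))

def multiple_correlation(sigma_yy, sigma_1y, Sigma11):
    """Return (rho, a_opt), taking a_opt = Sigma11^{-1} sigma_1y (that is, k = 1)."""
    a_opt = np.linalg.solve(Sigma11, sigma_1y)
    rho = np.sqrt(sigma_1y @ a_opt / sigma_yy)
    return rho, a_opt

# Hypothetical population values, chosen only for illustration.
sigma_yy = 1.0
sigma_1y = np.array([0.5, 0.4])
Sigma11 = np.array([[1.0, 0.3],
                    [0.3, 1.0]])

rho, a_opt = multiple_correlation(sigma_yy, sigma_1y, Sigma11)
rho_k1 = correlation_with_y(a_opt, sigma_yy, sigma_1y, Sigma11)
rho_k3 = correlation_with_y(3.0 * a_opt, sigma_yy, sigma_1y, Sigma11)
```

Any other direction, for example using only the first predictor, should give a correlation no larger than `rho`.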
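We are interested in whether the variable $y$ is independent of the vector $\mathbf{x}_1$, that is,
if $\boldsymbol{\sigma}_{1y} = \mathbf{0}$
and
$$\rho_{y\cdot x_1,\ldots,x_n} = \sqrt{\frac{\boldsymbol{\sigma}_{1y}'\Sigma_{11}^{-1}\boldsymbol{\sigma}_{1y}}{\sigma_{yy}}} = 0$$
When this holds, the test statistic satisfies
$$F \sim F(p,\; n-p-1)$$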
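The slides do not show the formula for $F$ itself. A common form, stated here as an assumption rather than as the slides' own definition, is $F = \dfrac{R^2/p}{(1-R^2)/(n-p-1)}$, where $R$ is the sample multiple correlation coefficient, $p$ the number of predictors and $n$ the number of cases. A minimal sketch:

```python
def multiple_correlation_f(r_squared, n, p):
    """F statistic for H0: multiple correlation = 0 (assumed standard form)."""
    # Numerator: explained share per predictor; denominator: unexplained share per residual d.f.
    return (r_squared / p) / ((1.0 - r_squared) / (n - p - 1))

# Hypothetical example: R^2 = 0.5 with n = 23 cases and p = 2 predictors.
f_demo = multiple_correlation_f(0.5, n=23, p=2)
```

The statistic would then be compared with the upper $\alpha$ critical value of the $F(p, n-p-1)$ distribution.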
Canonical Correlation Analysis
The problem
Quite often, when one has collected data on several
variables, the variables are grouped into two (or more) sets
and the researcher is interested in whether one set of
variables is independent of the other set.
In addition, if it is found that the two sets of variates are
dependent, it is then important to describe and
understand the nature of this dependence.
The appropriate statistical procedure in this case is
called Canonical Correlation Analysis.
Canonical Correlation: An Example
A group of 65 third- and fourth-grade students were
rated after the instruction and immediately prior
to taking the Scholastic Achievement tests on:
• Reading,
• Language and
• Mathematics
Then $U_1$ and $V_1$ are called the first pair of canonical variates and
$\rho_1$ is called the first canonical correlation coefficient.
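derivation: (1st pair of canonical variates and canonical correlation)
Now
$$U_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q = \mathbf{a}_1'\mathbf{x}_1, \qquad V_1 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p = \mathbf{b}_1'\mathbf{x}_2$$
$$\begin{bmatrix} U_1 \\ V_1 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \end{bmatrix}\begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = A\mathbf{x}$$
Thus $\begin{bmatrix} U_1 \\ V_1 \end{bmatrix}$ has covariance matrix
$$A\Sigma A' = \begin{bmatrix} \mathbf{a}_1' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{b}_1' \end{bmatrix}\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}\begin{bmatrix} \mathbf{a}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{b}_1 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 \\ \mathbf{b}_1'\Sigma_{12}'\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 \end{bmatrix}$$
hence
$$\rho_{U_1V_1} = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1}\,\sqrt{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}}$$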
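Thus we want to choose $\mathbf{a}_1$ and $\mathbf{b}_1$ so that
$$\rho_{U_1V_1} = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1}\,\sqrt{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}}$$
is at a maximum, or
$$\rho^2_{U_1V_1} = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)}$$
is at a maximum.
Let
$$V = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)}$$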
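Computing derivatives
$$\frac{\partial V}{\partial \mathbf{a}_1} = \frac{1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\frac{2\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)\Sigma_{12}\mathbf{b}_1\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right) - \left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2\,2\Sigma_{11}\mathbf{a}_1}{\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)^2} = \mathbf{0}$$
so that
$$\Sigma_{12}\mathbf{b}_1\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right) = \left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)\Sigma_{11}\mathbf{a}_1$$
and
$$\frac{\partial V}{\partial \mathbf{b}_1} = \frac{1}{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1}\,\frac{2\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)\Sigma_{12}'\mathbf{a}_1\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right) - \left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2\,2\Sigma_{22}\mathbf{b}_1}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)^2} = \mathbf{0}$$
so that
$$\Sigma_{12}'\mathbf{a}_1\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right) = \left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)\Sigma_{22}\mathbf{b}_1$$
or
$$\mathbf{b}_1 = \frac{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1$$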
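Thus
$$\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\Sigma_{11}\mathbf{a}_1$$
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\mathbf{a}_1 = k\,\mathbf{a}_1$$
This shows that $\mathbf{a}_1$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$, with
$$k = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)} = \rho^2_{U_1V_1}$$
Thus $\rho^2_{U_1V_1}$ is maximized when $k$ is the largest eigenvalue of
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$
and $\mathbf{a}_1$ is the eigenvector associated with the
largest eigenvalue.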
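The eigenvalue characterization can be tried out numerically. The sketch below is one possible implementation with NumPy, not the only one; the function name is hypothetical and the covariance matrix is randomly generated purely for illustration. It takes $\mathbf{b}_1$ proportional to $\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1$, as in the derivative condition above.

```python
import numpy as np

def first_canonical_pair(Sigma, q):
    """Return (k, a1, b1, rho): largest eigenvalue, coefficient vectors, correlation of U1 and V1."""
    Sigma = np.asarray(Sigma, dtype=float)
    S11, S12, S22 = Sigma[:q, :q], Sigma[:q, q:], Sigma[q:, q:]
    # M = Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma12'
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    eigvals, eigvecs = np.linalg.eig(M)
    eigvals, eigvecs = eigvals.real, eigvecs.real   # eigenvalues of M are real in theory
    top = int(np.argmax(eigvals))
    a1 = eigvecs[:, top]
    b1 = np.linalg.solve(S22, S12.T @ a1)           # proportional to Sigma22^{-1} Sigma12' a1
    rho = (a1 @ S12 @ b1) / np.sqrt((a1 @ S11 @ a1) * (b1 @ S22 @ b1))
    return eigvals[top], a1, b1, rho

# Hypothetical 5 x 5 covariance matrix with q = 2 variables in the first set.
rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 20))
Sigma_demo = Z @ Z.T / 20
k_demo, a1_demo, b1_demo, rho_demo = first_canonical_pair(Sigma_demo, q=2)
```

The correlation between $U_1$ and $V_1$ computed from the coefficient vectors should equal the square root of the largest eigenvalue.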
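Also
$$\mathbf{b}_1 = \frac{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 \quad\text{or}\quad \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\Sigma_{22}\mathbf{b}_1 = \Sigma_{12}'\mathbf{a}_1$$
and
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\mathbf{a}_1$$
$$\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\Sigma_{12}'\mathbf{a}_1$$
$$\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\,\frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\Sigma_{22}\mathbf{b}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\Sigma_{22}\mathbf{b}_1$$
$$\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b}_1 = \frac{\left(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1\right)^2}{\left(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\right)\left(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1\right)}\,\mathbf{b}_1$$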
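Summary:
The first pair of canonical variates
$$U_1 = \mathbf{a}_1'\mathbf{x}_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q$$
$$V_1 = \mathbf{b}_1'\mathbf{x}_2 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p$$
are found by finding $\mathbf{a}_1$ and $\mathbf{b}_1$, eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the largest eigenvalue (same for both matrices)
$$\rho_1^2 = \text{the largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' = \text{the largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$$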
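Note: $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ have the same non-zero eigenvalues.
Let $\lambda$ and $\mathbf{a}$ be an eigenvalue and eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$, so that
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a} = \lambda\mathbf{a}$$
and
$$\Sigma_{22}^{-1}\Sigma_{12}'\,\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a} = \lambda\,\Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}$$
$$\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b} = \lambda\mathbf{b} \quad\text{where } \mathbf{b} = \Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}$$
Thus $\lambda$ and $\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}$ are an eigenvalue and
eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$.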
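The note can be verified on a small example. The following sketch builds both matrices from a hypothetical, randomly generated covariance matrix (an assumption made only for the illustration) and checks that an eigenvector $\mathbf{a}$ of the first matrix maps to an eigenvector $\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{12}'\mathbf{a}$ of the second with the same eigenvalue.

```python
import numpy as np

# Hypothetical 5 x 5 covariance matrix, first set of q = 2 variables.
rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 30))
Sigma = Z @ Z.T / 30
q = 2
S11, S12, S22 = Sigma[:q, :q], Sigma[:q, q:], Sigma[q:, q:]

M1 = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)   # Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma12'
M2 = np.linalg.solve(S22, S12.T) @ np.linalg.solve(S11, S12)   # Sigma22^{-1} Sigma12' Sigma11^{-1} Sigma12

vals1, vecs1 = np.linalg.eig(M1)
lam, a = vals1.real[0], vecs1.real[:, 0]      # one eigenpair of M1
b = np.linalg.solve(S22, S12.T @ a)           # b = Sigma22^{-1} Sigma12' a
residual = M2 @ b - lam * b                   # should be numerically zero

eig1 = np.sort(vals1.real)[::-1]
eig2 = np.sort(np.linalg.eigvals(M2).real)[::-1]
```

Because the second matrix is $(p-q)\times(p-q)$, it has $p-2q$ additional eigenvalues here, and those are zero up to rounding.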
The remaining canonical variates and canonical correlation coefficients
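Now
$$\frac{\partial V}{\partial \mathbf{a}_2} = 2\left(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2\right)\Sigma_{12}\mathbf{b}_2 - 2\lambda_1\Sigma_{11}\mathbf{a}_2 + \lambda_3\Sigma_{11}\mathbf{a}_1 + \lambda_5\Sigma_{12}\mathbf{b}_1 = \mathbf{0}$$
and
$$\frac{\partial V}{\partial \mathbf{b}_2} = 2\left(\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2\right)\Sigma_{12}'\mathbf{a}_2 - 2\lambda_2\Sigma_{22}\mathbf{b}_2 + \lambda_4\Sigma_{12}'\mathbf{a}_1 + \lambda_6\Sigma_{22}\mathbf{b}_1 = \mathbf{0}$$
also $\dfrac{\partial V}{\partial \lambda_i} = 0,\ i = 1, \ldots, 6$ gives the restrictions, where $\lambda_1, \ldots, \lambda_6$ are Lagrange multipliers.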
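These equations can be used to show that $\mathbf{a}_2$ and $\mathbf{b}_2$
are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the 2nd largest eigenvalue
$$\rho_2^2 = \text{the 2nd largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' = \text{the 2nd largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$$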
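continuing
Coefficients for the ith pair of canonical variates, $\mathbf{a}_i$ and $\mathbf{b}_i$,
are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively,
associated with the ith largest eigenvalue (same for both matrices)
The ith largest eigenvalue of the two matrices is the square of the
ith canonical correlation coefficient $\rho_i$
$$\rho_i^2 = \text{the } i\text{th largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}' = \text{the } i\text{th largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$$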
Example
Variables
• Relaxation score (X1),
• Motivation score (X2),
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
Summary Statistics
UNIVARIATE SUMMARY STATISTICS
-----------------------------
STANDARD
VARIABLE MEAN DEVIATION
CORRELATIONS
------------
                1        2        3        4        5
Relax   1     1.000
Mot     2     0.391    1.000
Read    3     0.002    0.280    1.000
Lang    4     0.050    0.510    0.781    1.000
Math    5     0.127    0.340    0.713    0.556    1.000
Canonical Correlation Statistics
                 CANONICAL               CHI-                TAIL
EIGENVALUE     CORRELATION             SQUARE     D.F.      PROB.
                                        27.86       6      0.0001
   0.35029         0.59186       1       1.56       2      0.4586
   0.02523         0.15885
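The slides do not say how the chi-square values are obtained. They appear consistent with Bartlett's approximation, $\chi^2 = -\left[n - 1 - \tfrac{1}{2}(p+1)\right]\sum \ln(1-\lambda_i)$ over the eigenvalues still being tested, which is stated here as an assumption. The sketch below uses $n = 65$ students and set sizes 2 and 3 from the example; the function and argument names are hypothetical.

```python
import math

def bartlett_chi_square(eigenvalues, n, q, m, skip=0):
    """Chi-square statistic and d.f. for H0: all canonical correlations after the first `skip` are zero.

    q and m are the numbers of variables in the two sets (assumed Bartlett form).
    """
    remaining = eigenvalues[skip:]
    log_term = sum(math.log(1.0 - lam) for lam in remaining)
    statistic = -(n - 1 - (q + m + 1) / 2.0) * log_term
    df = (q - skip) * (m - skip)
    return statistic, df

eigenvalues = [0.35029, 0.02523]          # from the table above
chi2_all, df_all = bartlett_chi_square(eigenvalues, n=65, q=2, m=3)
chi2_rest, df_rest = bartlett_chi_square(eigenvalues, n=65, q=2, m=3, skip=1)
```

Under this assumption the first row tests all canonical correlations at once and the second row tests what remains after the first pair is removed, which matches the degrees of freedom 6 and 2 in the table.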
               CNVRF1    CNVRF2
                  1         2
Relax   1       0.197     0.980
Mot     2       0.979     0.203
-----------------------------
               CNVRS1    CNVRS2
                  1         2
Read    3       0.504    -0.361
Lang    4       0.900    -0.354
Math    5       0.565     0.391
------------------------------
Summary
U1 = 0.197 Relax + 0.979 Mot
V1 = 0.504 Read + 0.900 Lang + 0.565 Math
$\rho_1 = 0.592$
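As a check on the example, the canonical correlations can be recomputed from the correlation matrix printed in the summary statistics. This is a sketch that applies the eigenvalue result to the correlation matrix instead of the covariance matrix (canonical correlations are unchanged by rescaling the variables); since the printed correlations are rounded to three decimals, the results should agree with the program output only approximately.

```python
import numpy as np

# Correlation matrix from the example, in the order Relax, Mot, Read, Lang, Math.
R = np.array([
    [1.000, 0.391, 0.002, 0.050, 0.127],
    [0.391, 1.000, 0.280, 0.510, 0.340],
    [0.002, 0.280, 1.000, 0.781, 0.713],
    [0.050, 0.510, 0.781, 1.000, 0.556],
    [0.127, 0.340, 0.713, 0.556, 1.000],
])
q = 2                                        # first set: Relax, Mot
R11, R12, R22 = R[:q, :q], R[:q, q:], R[q:, q:]

# Eigenvalues of R11^{-1} R12 R22^{-1} R12' are the squared canonical correlations.
M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R12.T)
squared = np.sort(np.linalg.eigvals(M).real)[::-1]
canonical_correlations = np.sqrt(squared)
```

The first entry of `canonical_correlations` should be close to the reported first canonical correlation, and the second close to the second value in the program output.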