MV - Canonical Correlation (Final)


Correlation

The sample covariance matrix:

$$
\underset{p\times p}{S} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{12} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1p} & s_{2p} & \cdots & s_{pp} \end{bmatrix}
$$

where

$$
s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)
$$
The sample correlation matrix:

$$
\underset{p\times p}{R} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{12} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1p} & r_{2p} & \cdots & 1 \end{bmatrix}
$$

where

$$
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}\,s_{kk}}} = \frac{\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)^2\sum_{j=1}^{n}\left(x_{kj}-\bar{x}_k\right)^2}}
$$
Note:

$$
R = D^{-1} S D^{-1}
$$

where

$$
\underset{p\times p}{D} = \begin{bmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{bmatrix}
$$

Tests for Independence
and
Non-zero correlation
Tests for Independence

Test for zero correlation (independence between two variables)

The test statistic:

$$
t = r_{ij}\sqrt{\frac{n-2}{1-r_{ij}^2}}
$$

If independence is true then the test statistic t will have a t-distribution with ν = n − 2 degrees of freedom.

The test is to reject independence if:

$$
\left|t\right| > t_{\alpha/2}\left(n-2\right)
$$
Test for non-zero correlation (H0: ρ = ρ0)

The test statistic:

$$
z = \frac{\tfrac{1}{2}\ln\!\left(\dfrac{1+r}{1-r}\right) - \tfrac{1}{2}\ln\!\left(\dfrac{1+\rho_0}{1-\rho_0}\right)}{1/\sqrt{n-3}}
$$

If H0 is true the test statistic z will have approximately a standard Normal distribution.

We then reject H0 if:

$$
\left|z\right| > z_{\alpha/2}
$$
Partial Correlation

Conditional Independence
Recall

If $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\begin{matrix} q \\ p-q \end{matrix}$ has a p-variate Normal distribution

with mean vector $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$

then the conditional distribution of $x_i$ given $x_j$ is a $q_i$-variate Normal distribution

with mean vector $\mu_{i\cdot j} = \mu_i + \Sigma_{ij}\Sigma_{jj}^{-1}\left(x_j - \mu_j\right)$

and covariance matrix $\Sigma_{ii\cdot j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ij}'$.

The matrix

$$
\Sigma_{2\cdot 1} = \Sigma_{22} - \Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$

is called the matrix of partial variances and covariances.

The (i, j)th element of the matrix $\Sigma_{2\cdot 1}$,

$$
\sigma_{ij\cdot 1,2,\ldots,q},
$$

is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

$$
\rho_{ij\cdot 1,2,\ldots,q} = \frac{\sigma_{ij\cdot 1,2,\ldots,q}}{\sqrt{\sigma_{ii\cdot 1,2,\ldots,q}\,\sigma_{jj\cdot 1,2,\ldots,q}}}
$$

is called the partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
Let

$$
S = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix}
$$

denote the sample covariance matrix.

Let $S_{2\cdot 1} = S_{22} - S_{12}'S_{11}^{-1}S_{12}$.

The (i, j)th element of the matrix $S_{2\cdot 1}$,

$$
s_{ij\cdot 1,2,\ldots,q},
$$

is called the sample partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

Also

$$
r_{ij\cdot 1,2,\ldots,q} = \frac{s_{ij\cdot 1,2,\ldots,q}}{\sqrt{s_{ii\cdot 1,2,\ldots,q}\,s_{jj\cdot 1,2,\ldots,q}}}
$$

is called the sample partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
Test for zero partial correlation (conditional independence between two variables given a set of p variables)

The test statistic:

$$
t = r_{ij\cdot x_1,\ldots,x_p}\sqrt{\frac{n-p-2}{1-r_{ij\cdot x_1,\ldots,x_p}^2}}
$$

where $r_{ij\cdot x_1,\ldots,x_p}$ = the partial correlation between $y_i$ and $y_j$ given $x_1, \ldots, x_p$.

If conditional independence is true then the test statistic t will have a t-distribution with ν = n − p − 2 degrees of freedom.

The test is to reject independence if:

$$
\left|t\right| > t_{\alpha/2}\left(n-p-2\right)
$$
Test for non-zero partial correlation

$$
H_0:\ \rho_{ij\cdot x_1,\ldots,x_p} = \rho^{0}_{ij\cdot x_1,\ldots,x_p}
$$

The test statistic:

$$
z = \frac{\tfrac{1}{2}\ln\!\left(\dfrac{1+r_{ij\cdot x_1,\ldots,x_p}}{1-r_{ij\cdot x_1,\ldots,x_p}}\right) - \tfrac{1}{2}\ln\!\left(\dfrac{1+\rho^{0}_{ij\cdot x_1,\ldots,x_p}}{1-\rho^{0}_{ij\cdot x_1,\ldots,x_p}}\right)}{1/\sqrt{n-p-3}}
$$

If H0 is true the test statistic z will have approximately a standard Normal distribution.

We then reject H0 if: $\left|z\right| > z_{\alpha/2}$
The Multiple Correlation
Coefficient
Testing independence between a single
variable and a group of variables
Definition

Suppose $x = \begin{bmatrix} y \\ x_1 \end{bmatrix}\begin{matrix} 1 \\ p \end{matrix}$ has a (p + 1)-variate Normal distribution

with mean vector $\mu = \begin{bmatrix} \mu_y \\ \mu_1 \end{bmatrix}$

and covariance matrix $\Sigma = \begin{bmatrix} \sigma_{yy} & \sigma_{1y}' \\ \sigma_{1y} & \Sigma_{11} \end{bmatrix}$.

We are interested in whether the variable y is independent of the vector $x_1$.

The multiple correlation coefficient is the maximum correlation between y and a linear combination of the components of $x_1$.
Derivation

Let

$$
u = \begin{bmatrix} y \\ a'x_1 \end{bmatrix} = \begin{bmatrix} 1 & 0' \\ 0 & a' \end{bmatrix}\begin{bmatrix} y \\ x_1 \end{bmatrix} = Ax
$$

This vector has a bivariate Normal distribution

with mean vector

$$
A\mu = \begin{bmatrix} \mu_y \\ a'\mu_1 \end{bmatrix}
$$

and covariance matrix

$$
A\Sigma A' = \begin{bmatrix} \sigma_{yy} & \sigma_{1y}'a \\ a'\sigma_{1y} & a'\Sigma_{11}a \end{bmatrix}
$$
The multiple correlation coefficient is the maximum correlation between y and $a'x_1$.

The correlation between y and $a'x_1$ is

$$
\rho\left(a\right) = \frac{\sigma_{1y}'a}{\sqrt{\sigma_{yy}\,a'\Sigma_{11}a}}
$$

Thus we want to choose a to maximize $\rho\left(a\right)$. Equivalently, maximize

$$
\rho^2\left(a\right) = \frac{\left(\sigma_{1y}'a\right)^2}{\sigma_{yy}\,a'\Sigma_{11}a} = \frac{1}{\sigma_{yy}}\,\frac{a'\sigma_{1y}\sigma_{1y}'a}{a'\Sigma_{11}a}
$$
     
Note:

$$
\frac{d\left(a'\sigma_{1y}\sigma_{1y}'a\right)}{da} = 2\sigma_{1y}\sigma_{1y}'a
\qquad\text{and}\qquad
\frac{d\left(a'\Sigma_{11}a\right)}{da} = 2\Sigma_{11}a
$$

so

$$
\frac{d\,\rho^2\left(a\right)}{da} = \frac{1}{\sigma_{yy}}\,\frac{2\sigma_{1y}\sigma_{1y}'a\left(a'\Sigma_{11}a\right) - 2\Sigma_{11}a\left(a'\sigma_{1y}\sigma_{1y}'a\right)}{\left(a'\Sigma_{11}a\right)^2}
= \frac{\sigma_{1y}'a}{\sigma_{yy}}\,\frac{2\left(a'\Sigma_{11}a\right)\sigma_{1y} - 2\left(a'\sigma_{1y}\right)\Sigma_{11}a}{\left(a'\Sigma_{11}a\right)^2} = 0
$$

or

$$
\left(a'\Sigma_{11}a\right)\sigma_{1y} = \left(a'\sigma_{1y}\right)\Sigma_{11}a
$$

or

$$
a_{\text{opt}} = \left(\frac{a'\Sigma_{11}a}{a'\sigma_{1y}}\right)\Sigma_{11}^{-1}\sigma_{1y} = k\,\Sigma_{11}^{-1}\sigma_{1y}
$$

The multiple correlation coefficient is independent of the value of k:

$$
\rho_{y\cdot x_1,\ldots,x_p} = \rho\left(a_{\text{opt}}\right) = \frac{\sigma_{1y}'a_{\text{opt}}}{\sqrt{\sigma_{yy}\,a_{\text{opt}}'\Sigma_{11}a_{\text{opt}}}}
= \frac{\sigma_{1y}'\,k\Sigma_{11}^{-1}\sigma_{1y}}{\sqrt{\sigma_{yy}\,k^2\,\sigma_{1y}'\Sigma_{11}^{-1}\Sigma_{11}\Sigma_{11}^{-1}\sigma_{1y}}}
= \frac{\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}{\sqrt{\sigma_{yy}\,\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}}
= \sqrt{\frac{\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}{\sigma_{yy}}}
$$

We are interested in whether the variable y is independent of the vector $x_1$. If $\sigma_{1y} = 0$ then

$$
\rho_{y\cdot x_1,\ldots,x_p} = \sqrt{\frac{\sigma_{1y}'\Sigma_{11}^{-1}\sigma_{1y}}{\sigma_{yy}}} = 0
$$

The sample multiple correlation coefficient

Let

$$
S = \begin{bmatrix} s_{yy} & s_{1y}' \\ s_{1y} & S_{11} \end{bmatrix}
$$

denote the sample covariance matrix.

Then the sample multiple correlation coefficient is

$$
r_{y\cdot x_1,\ldots,x_p} = \sqrt{\frac{s_{1y}'S_{11}^{-1}s_{1y}}{s_{yy}}}
$$
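The sample multiple correlation coefficient can be computed directly from the partitioned covariance matrix; it also equals the ordinary correlation between y and its least-squares fit, which gives a useful check. A sketch on invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3

# Hypothetical data: y depends linearly on the columns of x plus noise.
x = rng.normal(size=(n, p))
y = x @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

S = np.cov(np.column_stack([y, x]), rowvar=False)
s_yy, s_1y, S11 = S[0, 0], S[1:, 0], S[1:, 1:]

# r_{y.x1,...,xp} = sqrt( s_1y' S11^{-1} s_1y / s_yy )
r_mult = np.sqrt(s_1y @ np.linalg.solve(S11, s_1y) / s_yy)

# Check: it equals the correlation between y and its least-squares fit.
X1 = np.column_stack([np.ones(n), x])
yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
print(round(r_mult, 3), round(np.corrcoef(y, yhat)[0, 1], 3))
```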

Testing for independence between y and $x_1$

The test statistic:

$$
F = \frac{n-p-1}{p}\cdot\frac{r^2_{y\cdot x_1,\ldots,x_p}}{1-r^2_{y\cdot x_1,\ldots,x_p}}
= \frac{n-p-1}{p}\cdot\frac{s_{1y}'S_{11}^{-1}s_{1y}}{s_{yy}-s_{1y}'S_{11}^{-1}s_{1y}}
$$

If independence is true then the test statistic F will have an F-distribution with ν1 = p degrees of freedom in the numerator and ν2 = n − p − 1 degrees of freedom in the denominator.

The test is to reject independence if:

$$
F > F_{\alpha}\left(p,\ n-p-1\right)
$$
Canonical Correlation Analysis
The problem

Quite often one has collected data on several variables. The variables are grouped into two (or more) sets, and the researcher is interested in whether one set of variables is independent of the other set.

In addition, if the two sets of variates are found to be dependent, it is then important to describe and understand the nature of this dependence.

The appropriate statistical procedure in this case is called Canonical Correlation Analysis.
Canonical Correlation: An Example

In the following study the researcher was interested in whether specific instructions on how to relax when taking tests and how to increase motivation would affect performance on standardized achievement tests in:

• Reading,
• Language and
• Mathematics

A group of 65 third- and fourth-grade students were rated after the instruction and immediately prior to taking the Scholastic Achievement tests on:

• how relaxed they were (X1) and
• how motivated they were (X2).

In addition, data were collected on the three achievement tests:

• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).

The data are tabulated on the next page.
Relaxation Motivation Reading Language Math Relaxation Motivation Reading Language Math
Case X1 X2 Y1 Y2 Y3 Case X1 X2 Y1 Y2 Y3
1 7 14 311 436 154 34 40 20 362 416 107
2 43 25 501 455 765 35 40 18 596 592 622
3 32 21 507 473 702 36 35 17 431 346 493
4 17 12 453 392 401 37 33 17 361 414 404
5 23 12 419 337 284 38 40 27 663 451 651
6 10 16 545 538 414 39 31 15 569 462 398
7 22 21 509 512 491 40 29 19 699 622 478
8 13 19 320 308 517 41 37 16 187 223 221
9 31 21 357 296 496 42 21 23 1132 839 1044
10 24 26 485 372 685 43 24 15 457 410 400
11 26 21 811 748 902 44 19 14 413 448 520
12 35 20 367 436 393 45 33 22 569 605 615
13 24 17 242 349 137 46 19 19 650 685 440
14 20 8 237 140 331 47 26 22 424 427 482
15 38 27 417 648 618 48 20 15 475 604 742
16 32 19 429 446 458 49 22 21 519 612 446
17 14 11 555 579 438 50 37 22 338 463 327
18 24 12 599 497 414 51 41 28 674 613 534
19 38 25 403 383 606 52 29 35 381 624 565
20 30 8 550 324 674 53 25 12 199 171 316
21 22 25 377 496 242 54 27 21 577 523 699
22 36 28 671 585 710 55 22 20 425 466 402
23 3 22 498 488 481 56 4 11 392 192 354
24 44 28 477 583 260 57 27 22 401 520 558
25 24 25 609 413 670 58 28 23 321 410 460
26 33 18 521 522 716 59 33 20 682 433 743
27 24 21 495 645 491 60 33 24 719 727 1052
28 28 20 400 555 624 61 31 33 672 705 650
29 34 7 258 175 276 62 20 11 366 309 537
30 39 20 466 541 348 63 26 25 581 558 386
31 7 19 709 757 589 64 23 10 681 530 581
32 13 17 586 472 492 65 30 22 1019 917 880
33 32 18 418 361 428
Definition: (Canonical variates and canonical correlations)

Let $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\begin{matrix} q \\ p-q \end{matrix}$ have a p-variate Normal distribution

with $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}\begin{matrix} q \\ p-q \end{matrix}$ and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$.

Let

$$
U_1 = a_1'x_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q
$$

and

$$
V_1 = b_1'x_2 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p
$$

be such that $U_1$ and $V_1$ have achieved the maximum correlation $\phi_1$.

Then $U_1$ and $V_1$ are called the first pair of canonical variates and $\phi_1$ is called the first canonical correlation coefficient.
derivation: (1st pair of canonical variates and canonical correlation)

Now

$$
\begin{bmatrix} U_1 \\ V_1 \end{bmatrix}
= \begin{bmatrix} a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q \\ b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p \end{bmatrix}
= \begin{bmatrix} a_1'x_1 \\ b_1'x_2 \end{bmatrix}
= \begin{bmatrix} a_1' & 0' \\ 0' & b_1' \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax
$$

Thus $\begin{bmatrix} U_1 \\ V_1 \end{bmatrix}$ has covariance matrix

$$
A\Sigma A' = \begin{bmatrix} a_1' & 0' \\ 0' & b_1' \end{bmatrix}\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}\begin{bmatrix} a_1 & 0 \\ 0 & b_1 \end{bmatrix}
= \begin{bmatrix} a_1'\Sigma_{11}a_1 & a_1'\Sigma_{12}b_1 \\ b_1'\Sigma_{12}'a_1 & b_1'\Sigma_{22}b_1 \end{bmatrix}
$$

hence

$$
\rho_{U_1V_1} = \frac{a_1'\Sigma_{12}b_1}{\sqrt{a_1'\Sigma_{11}a_1}\sqrt{b_1'\Sigma_{22}b_1}}
$$

Thus we want to choose $a_1$ and $b_1$ so that

$$
\rho_{U_1V_1} = \frac{a_1'\Sigma_{12}b_1}{\sqrt{a_1'\Sigma_{11}a_1}\sqrt{b_1'\Sigma_{22}b_1}}
$$

is at a maximum, or equivalently

$$
\rho^2_{U_1V_1} = \frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(a_1'\Sigma_{11}a_1\right)\left(b_1'\Sigma_{22}b_1\right)}
$$

is at a maximum. Let

$$
V = \frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(a_1'\Sigma_{11}a_1\right)\left(b_1'\Sigma_{22}b_1\right)}
$$
Computing derivatives:

$$
\frac{\partial V}{\partial a_1} = \frac{2\left(a_1'\Sigma_{12}b_1\right)\Sigma_{12}b_1\left(a_1'\Sigma_{11}a_1\right) - \left(a_1'\Sigma_{12}b_1\right)^2 2\Sigma_{11}a_1}{\left(b_1'\Sigma_{22}b_1\right)\left(a_1'\Sigma_{11}a_1\right)^2} = 0
$$

giving

$$
\Sigma_{12}b_1\left(a_1'\Sigma_{11}a_1\right) = \left(a_1'\Sigma_{12}b_1\right)\Sigma_{11}a_1
$$

and

$$
\frac{\partial V}{\partial b_1} = \frac{2\left(a_1'\Sigma_{12}b_1\right)\Sigma_{12}'a_1\left(b_1'\Sigma_{22}b_1\right) - \left(a_1'\Sigma_{12}b_1\right)^2 2\Sigma_{22}b_1}{\left(a_1'\Sigma_{11}a_1\right)\left(b_1'\Sigma_{22}b_1\right)^2} = 0
$$

giving

$$
\Sigma_{12}'a_1\left(b_1'\Sigma_{22}b_1\right) = \left(a_1'\Sigma_{12}b_1\right)\Sigma_{22}b_1
\qquad\text{or}\qquad
b_1 = \left(\frac{b_1'\Sigma_{22}b_1}{a_1'\Sigma_{12}b_1}\right)\Sigma_{22}^{-1}\Sigma_{12}'a_1
$$

Thus, substituting $b_1$ into the first equation,

$$
\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'a_1 = \left(\frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(b_1'\Sigma_{22}b_1\right)\left(a_1'\Sigma_{11}a_1\right)}\right)\Sigma_{11}a_1
$$

and

$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,a_1 = \left(\frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(b_1'\Sigma_{22}b_1\right)\left(a_1'\Sigma_{11}a_1\right)}\right)a_1 = k\,a_1
$$

This shows that $a_1$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ with

$$
k = \frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(a_1'\Sigma_{11}a_1\right)\left(b_1'\Sigma_{22}b_1\right)} = \rho^2_{U_1V_1}
$$

Thus $\rho^2_{U_1V_1}$ is maximized when k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $a_1$ is the eigenvector associated with the largest eigenvalue.

Also

$$
b_1 = \left(\frac{b_1'\Sigma_{22}b_1}{a_1'\Sigma_{12}b_1}\right)\Sigma_{22}^{-1}\Sigma_{12}'a_1
\qquad\text{or}\qquad
\Sigma_{22}^{-1}\Sigma_{12}'a_1 = \left(\frac{a_1'\Sigma_{12}b_1}{b_1'\Sigma_{22}b_1}\right)b_1
$$

Applying $\Sigma_{22}^{-1}\Sigma_{12}'$ to both sides of the eigen-equation for $a_1$ gives

$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\left(\Sigma_{22}^{-1}\Sigma_{12}'a_1\right) = \left(\frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(b_1'\Sigma_{22}b_1\right)\left(a_1'\Sigma_{11}a_1\right)}\right)\left(\Sigma_{22}^{-1}\Sigma_{12}'a_1\right)
$$

and, since $\Sigma_{22}^{-1}\Sigma_{12}'a_1$ is proportional to $b_1$,

$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,b_1 = \left(\frac{\left(a_1'\Sigma_{12}b_1\right)^2}{\left(b_1'\Sigma_{22}b_1\right)\left(a_1'\Sigma_{11}a_1\right)}\right)b_1
$$

so $b_1$ is an eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ associated with the same eigenvalue.
Summary:

The first pair of canonical variates

$$
U_1 = a_1'x_1 = a_1^{(1)}x_1 + \cdots + a_q^{(1)}x_q
$$

$$
V_1 = b_1'x_2 = b_1^{(1)}x_{q+1} + \cdots + b_{p-q}^{(1)}x_p
$$

are found by finding $a_1$ and $b_1$, eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the largest eigenvalue (the same for both matrices).

The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\phi_1$:

$$
\phi_1^2 = \text{the largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
= \text{the largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
Note: $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ have exactly the same nonzero eigenvalues.

Proof:

Let $\lambda$ and $a$ be an eigenvalue and eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$. Then

$$
\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,a = \lambda a
$$

and

$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'\,a = \lambda\,\Sigma_{22}^{-1}\Sigma_{12}'a
$$

i.e.

$$
\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}\,b = \lambda b \qquad\text{where } b = \Sigma_{22}^{-1}\Sigma_{12}'a
$$

Thus $\lambda$ and $b = \Sigma_{22}^{-1}\Sigma_{12}'a$ are an eigenvalue and eigenvector of $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$.
The remaining canonical variates and canonical correlation coefficients

The second pair of canonical variates

$$
U_2 = a_2'x_1 = a_1^{(2)}x_1 + \cdots + a_q^{(2)}x_q
$$

$$
V_2 = b_2'x_2 = b_1^{(2)}x_{q+1} + \cdots + b_{p-q}^{(2)}x_p
$$

are found by finding $a_2$ and $b_2$ so that
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$.
2. The correlation between $U_2$ and $V_2$ is maximized.

The correlation, $\phi_2$, between $U_2$ and $V_2$ is called the second canonical correlation coefficient.

The ith pair of canonical variates

$$
U_i = a_i'x_1 = a_1^{(i)}x_1 + \cdots + a_q^{(i)}x_q
$$

$$
V_i = b_i'x_2 = b_1^{(i)}x_{q+1} + \cdots + b_{p-q}^{(i)}x_p
$$

are found by finding $a_i$ and $b_i$ so that
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \ldots, (U_{i-1}, V_{i-1})$.
2. The correlation between $U_i$ and $V_i$ is maximized.

The correlation, $\phi_i$, between $U_i$ and $V_i$ is called the ith canonical correlation coefficient.
derivation: (2nd pair of canonical variates and canonical correlation)

Now

$$
\begin{bmatrix} U_1 \\ V_1 \\ U_2 \\ V_2 \end{bmatrix}
= \begin{bmatrix} a_1'x_1 \\ b_1'x_2 \\ a_2'x_1 \\ b_2'x_2 \end{bmatrix}
= \begin{bmatrix} a_1' & 0' \\ 0' & b_1' \\ a_2' & 0' \\ 0' & b_2' \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax
$$

has covariance matrix

$$
A\Sigma A' =
\begin{bmatrix}
a_1'\Sigma_{11}a_1 & a_1'\Sigma_{12}b_1 & a_1'\Sigma_{11}a_2 & a_1'\Sigma_{12}b_2 \\
* & b_1'\Sigma_{22}b_1 & b_1'\Sigma_{12}'a_2 & b_1'\Sigma_{22}b_2 \\
* & * & a_2'\Sigma_{11}a_2 & a_2'\Sigma_{12}b_2 \\
* & * & * & b_2'\Sigma_{22}b_2
\end{bmatrix}
$$

Now

$$
\rho_{U_2V_2} = \frac{a_2'\Sigma_{12}b_2}{\sqrt{a_2'\Sigma_{11}a_2}\sqrt{b_2'\Sigma_{22}b_2}}
$$

and maximizing

$$
\rho^2_{U_2V_2} = \frac{\left(a_2'\Sigma_{12}b_2\right)^2}{\left(a_2'\Sigma_{11}a_2\right)\left(b_2'\Sigma_{22}b_2\right)}
$$

is equivalent to maximizing $\left(a_2'\Sigma_{12}b_2\right)^2$ subject to

$$
a_2'\Sigma_{11}a_2 = 1,\quad b_2'\Sigma_{22}b_2 = 1,\quad a_1'\Sigma_{11}a_2 = 0,\quad a_1'\Sigma_{12}b_2 = 0,\quad b_1'\Sigma_{12}'a_2 = 0,\quad b_1'\Sigma_{22}b_2 = 0
$$
Using the Lagrange multiplier technique, let

$$
V = \left(a_2'\Sigma_{12}b_2\right)^2 + \lambda_1\left(1 - a_2'\Sigma_{11}a_2\right) + \lambda_2\left(1 - b_2'\Sigma_{22}b_2\right) + \lambda_3\,a_1'\Sigma_{11}a_2 + \lambda_4\,a_1'\Sigma_{12}b_2 + \lambda_5\,b_1'\Sigma_{12}'a_2 + \lambda_6\,b_1'\Sigma_{22}b_2
$$

Now

$$
\frac{\partial V}{\partial a_2} = 2\left(a_2'\Sigma_{12}b_2\right)\Sigma_{12}b_2 - 2\lambda_1\Sigma_{11}a_2 + \lambda_3\Sigma_{11}a_1 + \lambda_5\Sigma_{12}b_1 = 0
$$

and

$$
\frac{\partial V}{\partial b_2} = 2\left(a_2'\Sigma_{12}b_2\right)\Sigma_{12}'a_2 - 2\lambda_2\Sigma_{22}b_2 + \lambda_4\Sigma_{12}'a_1 + \lambda_6\Sigma_{22}b_1 = 0
$$

Also $\dfrac{\partial V}{\partial \lambda_i} = 0$, $i = 1, \ldots, 6$, gives the restrictions.

These equations can be used to show that $a_2$ and $b_2$ are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the 2nd largest eigenvalue (the same for both matrices).

The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\phi_2$:

$$
\phi_2^2 = \text{the 2nd largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
= \text{the 2nd largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
continuing

Coefficients for the ith pair of canonical variates, $a_i$ and $b_i$, are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and $\Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the ith largest eigenvalue (the same for both matrices).

The ith largest eigenvalue of the two matrices is the square of the ith canonical correlation coefficient $\phi_i$:

$$
\phi_i^2 = \text{the ith largest eigenvalue of } \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
= \text{the ith largest eigenvalue of } \Sigma_{22}^{-1}\Sigma_{12}'\Sigma_{11}^{-1}\Sigma_{12}
$$
Example

Variables
• relaxation Score (X1)
• motivation score (X2).
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
Summary Statistics
UNIVARIATE SUMMARY STATISTICS
-----------------------------

STANDARD
VARIABLE MEAN DEVIATION

1 Relax 26.87692 9.50412


2 Mot 19.41538 5.83066
3 Read 499.03077 172.25508
4 Lang 485.83077 156.08957
5 Math 512.52308 195.18614

CORRELATIONS
------------

Relax Mot Read Lang Math

1 2 3 4 5
Relax 1 1.000
Mot 2 0.391 1.000
Read 3 0.002 0.280 1.000
Lang 4 0.050 0.510 0.781 1.000
Math 5 0.127 0.340 0.713 0.556 1.000
Canonical Correlation Statistics

                         NUMBER OF    BARTLETT'S TEST FOR REMAINING EIGENVALUES
EIGENVALUE  CORRELATION  EIGENVALUES  CHI-SQUARE   D.F.   TAIL PROB.
                         0            27.86        6      0.0001
0.35029     0.59186      1            1.56         2      0.4586
0.02523     0.15885      2

BARTLETT'S TEST ABOVE INDICATES THE NUMBER OF CANONICAL


VARIABLES NECESSARY TO EXPRESS THE DEPENDENCY BETWEEN THE
TWO SETS OF VARIABLES. THE NECESSARY NUMBER OF CANONICAL
VARIABLES IS THE SMALLEST NUMBER OF EIGENVALUES SUCH THAT
THE TEST OF THE REMAINING EIGENVALUES IS NON-SIGNIFICANT.
FOR EXAMPLE, IF A TEST AT THE .01 LEVEL WERE DESIRED,
THEN 1 VARIABLES WOULD BE CONSIDERED NECESSARY.
HOWEVER, THE NUMBER OF CANONICAL VARIABLES OF PRACTICAL
VALUE IS LIKELY TO BE SMALLER.
continued
CANONICAL VARIABLE LOADINGS
---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR FIRST SET OF VARIABLES

CNVRF1 CNVRF2
1 2
Relax 1 0.197 0.980
Mot 2 0.979 0.203
-----------------------------

CANONICAL VARIABLE LOADINGS


---------------------------
(CORRELATIONS OF CANONICAL VARIABLES WITH ORIGINAL VARIABLES)
FOR SECOND SET OF VARIABLES

CNVRS1 CNVRS2
1 2
Read 3 0.504 -0.361
Lang 4 0.900 -0.354
Math 5 0.565 0.391
------------------------------
Summary

U1 = 0.197 Relax + 0.979 Mot
V1 = 0.504 Read + 0.900 Lang + 0.565 Math
φ1 = 0.592

U2 = 0.980 Relax + 0.203 Mot
V2 = 0.391 Math − 0.361 Read − 0.354 Lang
φ2 = 0.159
