Multicollinearity in Regression
Principal Components Analysis

Standing Heights and Physical Stature Attributes Among Female Police Officer Applicants

S.Q. Lafi and J.B. Kaneene (1992). "An Explanation of the Use of Principal Components Analysis to Detect and Correct for Multicollinearity," Preventive Veterinary Medicine, Vol. 13, pp. 261-275.

Data Description
Subjects: 33 females applying for police officer positions
Dependent Variable: Y = Standing Height (cm)
Independent Variables:
X1 = Sitting Height (cm)
X2 = Upper Arm Length (cm)
X3 = Forearm Length (cm)
X4 = Hand Length (cm)
X5 = Upper Leg Length (cm)
X6 = Lower Leg Length (cm)
X7 = Foot Length (inches)
X8 = BRACH (100*X3/X2)
X9 = TIBIO (100*X6/X5)
Data

ID      Y     X1    X2    X3    X4    X5    X6   X7    X8     X9
 1  165.8   88.7  31.8  28.1  18.7  40.3  38.9  6.7  88.4   96.5
 2  169.8   90.0  32.4  29.1  18.3  43.3  42.7  6.4  89.8   98.6
 3  170.7   87.7  33.6  29.5  20.7  43.7  41.1  7.2  87.8   94.1
 4  170.9   87.1  31.0  28.2  18.6  43.7  40.6  6.7  91.0   92.9
 5  157.5   81.3  32.1  27.3  17.5  38.1  39.6  6.6  85.0  103.9
 6  165.9   88.2  31.8  29.0  18.6  42.0  40.6  6.5  91.2   96.7
 7  158.7   86.1  30.6  27.8  18.4  40.0  37.0  5.9  90.8   92.5
 8  166.0   88.7  30.2  26.9  17.5  41.6  39.0  5.9  89.1   93.8
 9  158.7   83.7  31.1  27.1  18.1  38.9  37.5  6.1  87.1   96.4
10  161.5   81.2  32.3  27.8  19.1  42.8  40.1  6.2  86.1   93.7
11  167.3   88.6  34.8  27.3  18.3  43.1  41.8  7.3  78.4   97.0
12  167.4   83.2  34.3  30.1  19.2  43.4  42.2  6.8  87.8   97.2
13  159.2   81.5  31.0  27.3  17.5  39.8  39.6  4.9  88.1   99.5
14  170.0   87.9  34.2  30.9  19.4  43.1  43.7  6.3  90.4  101.4
15  166.3   88.3  30.6  28.8  18.3  41.8  41.0  5.9  94.1   98.1
16  169.0   85.6  32.6  28.8  19.1  42.7  42.0  6.0  88.3   98.4
17  156.2   81.6  31.0  25.6  17.0  44.2  39.0  5.1  82.6   88.2
18  159.6   86.6  32.7  25.4  17.7  42.0  37.5  5.0  77.7   89.3
19  155.0   82.0  30.3  26.6  17.3  37.9  36.1  5.2  87.8   95.3
20  161.1   84.1  29.5  26.6  17.8  38.6  38.2  5.9  90.2   99.0
21  170.3   88.1  34.0  29.3  18.2  43.2  41.4  5.9  86.2   95.8
22  167.8   83.9  32.5  28.6  20.2  43.3  42.9  7.2  88.0   99.1
23  163.1   88.1  31.7  26.9  18.1  40.1  39.0  5.9  84.9   97.3
24  165.8   87.0  33.2  26.3  19.5  43.2  40.7  5.9  79.2   94.2
25  175.4   89.6  35.2  30.1  19.1  45.1  44.5  6.3  85.5   98.7
26  159.8   85.6  31.5  27.1  19.2  42.3  39.0  5.7  86.0   92.2
27  166.0   84.9  30.5  28.1  17.8  41.2  43.0  6.1  92.1  104.4
28  161.2   84.1  32.8  29.2  18.4  42.6  41.1  5.9  89.0   96.5
29  160.4   84.3  30.5  27.8  16.8  41.0  39.8  6.0  91.1   97.1
30  164.3   85.0  35.0  27.8  19.0  47.2  42.4  5.0  79.4   89.8
31  165.5   82.6  36.2  28.6  20.2  45.0  42.3  5.6  79.0   94.0
32  167.2   85.0  33.6  27.1  19.8  46.0  41.6  5.6  80.7   90.4
33  167.2   83.4  33.5  29.7  19.4  45.2  44.0  5.2  88.7   97.3

Standardizing the Predictors

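\[
X_{ij}^{*} = \frac{X_{ij}-\bar{X}_j}{\sqrt{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2}}
           = \frac{X_{ij}-\bar{X}_j}{\sqrt{(n-1)\,S_j^2}},
\qquad i=1,\dots,33;\quad j=1,\dots,9
\]

\[
X^{*} = \begin{bmatrix}
X_{11}^{*} & X_{12}^{*} & \cdots & X_{19}^{*} \\
X_{21}^{*} & X_{22}^{*} & \cdots & X_{29}^{*} \\
\vdots & \vdots & & \vdots \\
X_{33,1}^{*} & X_{33,2}^{*} & \cdots & X_{33,9}^{*}
\end{bmatrix}
\]

\[
r_{jk} = \frac{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)\left(X_{ik}-\bar{X}_k\right)}
              {\sqrt{\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2 \,\sum_{i=1}^{n}\left(X_{ik}-\bar{X}_k\right)^2}}
\]

\[
X^{*\prime}X^{*} = R = \begin{bmatrix}
1 & r_{12} & \cdots & r_{19} \\
r_{21} & 1 & \cdots & r_{29} \\
\vdots & \vdots & & \vdots \\
r_{91} & r_{92} & \cdots & 1
\end{bmatrix}
\]
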
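The standardization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `correlation_transform` is hypothetical, and only the first five applicants and the columns X1, X2, X3 are used, so the resulting correlations will not match the full 33-row matrix.

```python
import numpy as np

def correlation_transform(X):
    """Center each column and scale it to unit length,
    so that Xs.T @ Xs equals the correlation matrix R."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Column sum of squares equals (n - 1) * S_j^2
    scale = np.sqrt((centered ** 2).sum(axis=0))
    return centered / scale

# First five applicants; columns are X1 (sitting height),
# X2 (upper arm length) and X3 (forearm length)
X = np.array([
    [88.7, 31.8, 28.1],
    [90.0, 32.4, 29.1],
    [87.7, 33.6, 29.5],
    [87.1, 31.0, 28.2],
    [81.3, 32.1, 27.3],
])

Xs = correlation_transform(X)
R = Xs.T @ Xs          # X*'X* = R
print(np.round(R, 4))
```

Because every column of X* has mean zero and unit length, the cross-product X*'X* needs no further scaling to give correlations.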
Correlation Matrix of Predictors and Inverse

R
       X1       X2       X3       X4       X5       X6       X7       X8       X9
X1   1.0000   0.1441   0.2791   0.1483   0.1863   0.2264   0.3680   0.1147   0.0212
X2   0.1441   1.0000   0.4708   0.6452   0.7160   0.6616   0.1468  -0.5820  -0.0984
X3   0.2791   0.4708   1.0000   0.5050   0.3658   0.7284   0.4277   0.4420   0.4406
X4   0.1483   0.6452   0.5050   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
X5   0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4099
X6   0.2264   0.6616   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
X7   0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
X8   0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5082
X9   0.0212  -0.0984   0.4406  -0.0988  -0.4099   0.3434   0.3971   0.5082   1.0000

R^(-1)
        X1       X2       X3      X4       X5       X6      X7       X8       X9
X1    1.52    -3.48     3.15    0.41    13.15   -13.28   -0.62    -3.41    10.21
X2   -3.48   436.47  -390.31   -1.26   -83.83    77.01    1.18   425.55   -62.66
X3    3.15  -390.31   353.99   -0.07    91.67   -87.90   -1.25  -382.59    68.23
X4    0.41    -1.26    -0.07    2.46     4.89    -5.40   -0.81    -0.49     4.57
X5   13.15   -83.83    91.67    4.89   817.17  -807.75   -2.21   -76.90   603.81
X6  -13.28    77.01   -87.90   -5.40  -807.75   801.94    2.65    71.74  -597.88
X7   -0.62     1.18    -1.25   -0.81    -2.21     2.65    1.77     1.12    -2.49
X8   -3.41   425.55  -382.59   -0.49   -76.90    71.74    1.12   417.39   -58.24
X9   10.21   -62.66    68.23    4.57   603.81  -597.88   -2.49   -58.24   448.37

Variance Inflation Factors (VIFs)

VIF measures the extent to which a regression coefficient's variance is inflated due to correlations among the set of predictors.
\(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) is the coefficient of multiple determination when \(X_j\) is regressed on the remaining predictors.
Values > 10 are often considered to be problematic.
VIFs can be obtained as the diagonal elements of \(R^{-1}\).
VIFs
X1      1.52
X2    436.47
X3    353.99
X4      2.46
X5    817.17
X6    801.94
X7      1.77
X8    417.39
X9    448.37
Not surprisingly, X2, X3, X5, X6, X8, and X9 are problems (see definitions of X8 and X9)
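
The "diagonal of the inverse correlation matrix" shortcut can be sketched as follows. The data here are simulated (not the applicant measurements), the function name `vifs` is hypothetical, and the column `brach` is built as a ratio of two other columns to mimic how X8 and X9 are defined.

```python
import numpy as np

def vifs(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

# Simulated measurements (assumed values, for illustration only)
rng = np.random.default_rng(0)
upper_arm = rng.normal(32.0, 1.5, size=33)
forearm = rng.normal(28.0, 1.2, size=33)
hand = rng.normal(18.5, 1.0, size=33)
brach = 100 * forearm / upper_arm      # derived ratio, nearly linear in its parts
X = np.column_stack([upper_arm, forearm, hand, brach])

names = ["upper_arm", "forearm", "hand", "brach"]
for name, v in zip(names, vifs(X)):
    flag = "  <-- above 10" if v > 10 else ""
    print(f"{name:10s} VIF = {v:10.2f}{flag}")
```

A ratio of two predictors is not an exact linear combination of them, but over a narrow range of values it is close enough to one that all three variables involved receive very large VIFs.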
Regression of Y on [1|X*]
\[
E\{Y_i\} = \beta_0 + \beta_1 X_{i1}^{*} + \cdots + \beta_9 X_{i9}^{*}
\qquad\Longleftrightarrow\qquad
E\{Y\} = \beta_0 \mathbf{1} + X^{*}\beta
\]

Regression Statistics
Multiple R          0.944825
R Square            0.892694
Adjusted R Square   0.850704
Standard Error      1.890412
Observations        33

ANOVA
             df        SS        MS         F   Significance F
Regression    9  683.7823   75.9758   21.2600           0.0000
Residual     23   82.1941    3.5737
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3291  500.0743   0.0000   163.8829   165.2444
X1*              11.8900          2.3307    5.1015   0.0000     7.0686    16.7114
X2*               4.2752         39.4941    0.1082   0.9147   -77.4246    85.9751
X3*              -3.2845         35.5676   -0.0923   0.9272   -76.8616    70.2927
X4*               4.2764          2.9629    1.4433   0.1624    -1.8528    10.4057
X5*              -9.8372         54.0398   -0.1820   0.8571  -121.6270   101.9525
X6*              25.5626         53.5337    0.4775   0.6375   -85.1802   136.3055
X7*               3.3805          2.5166    1.3433   0.1923    -1.8255     8.5865
X8*               6.3735         38.6215    0.1650   0.8704   -73.5211    86.2682
X9*              -9.6391         40.0289   -0.2408   0.8118   -92.4453    73.1670

Note the surprising negative coefficients for X3*, X5*, and X9*.
Principal Components Analysis

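Using a statistical or matrix computer package, decompose the p × p correlation matrix R into its p eigenvalues and eigenvectors:

\[
X^{*\prime}X^{*} = R = \sum_{j=1}^{p} \lambda_j v_j v_j' = VLV'
\]

where \(\lambda_j\) is the j-th eigenvalue and \(v_j\) is the j-th eigenvector:

\[
V = \begin{bmatrix} v_1 & v_2 & \cdots & v_p \end{bmatrix}
\qquad
L = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_p
\end{bmatrix}
\qquad
v_j = \begin{bmatrix} v_{j1} \\ v_{j2} \\ \vdots \\ v_{jp} \end{bmatrix}
\]

subject to:

\[
\sum_{j=1}^{p} \lambda_j = p
\qquad
v_j' v_j = 1
\qquad
v_j' v_k = 0 \quad (j \neq k)
\]

Condition Index: \(\theta_j = \sqrt{\lambda_{\max} / \lambda_j}\)
Principal Components: W = X*V
While the columns of X* are highly correlated, the columns of W are uncorrelated.
The λs represent the variance corresponding to each principal component.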
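
As an illustration of the decomposition, the sketch below applies `numpy.linalg.eigh` to just three of the nine predictors (X2, X3, and X8 = BRACH) for all 33 applicants. Restricting to three columns is an assumption made to keep the example short, so the eigenvalues are not those of the full 9 × 9 problem.

```python
import numpy as np

# X2 (upper arm), X3 (forearm) and X8 (BRACH = 100*X3/X2) for the 33 applicants
X2 = [31.8, 32.4, 33.6, 31.0, 32.1, 31.8, 30.6, 30.2, 31.1, 32.3, 34.8,
      34.3, 31.0, 34.2, 30.6, 32.6, 31.0, 32.7, 30.3, 29.5, 34.0, 32.5,
      31.7, 33.2, 35.2, 31.5, 30.5, 32.8, 30.5, 35.0, 36.2, 33.6, 33.5]
X3 = [28.1, 29.1, 29.5, 28.2, 27.3, 29.0, 27.8, 26.9, 27.1, 27.8, 27.3,
      30.1, 27.3, 30.9, 28.8, 28.8, 25.6, 25.4, 26.6, 26.6, 29.3, 28.6,
      26.9, 26.3, 30.1, 27.1, 28.1, 29.2, 27.8, 27.8, 28.6, 27.1, 29.7]
X8 = [88.4, 89.8, 87.8, 91.0, 85.0, 91.2, 90.8, 89.1, 87.1, 86.1, 78.4,
      87.8, 88.1, 90.4, 94.1, 88.3, 82.6, 77.7, 87.8, 90.2, 86.2, 88.0,
      84.9, 79.2, 85.5, 86.0, 92.1, 89.0, 91.1, 79.4, 79.0, 80.7, 88.7]
X = np.column_stack([X2, X3, X8])

centered = X - X.mean(axis=0)
Xs = centered / np.sqrt((centered ** 2).sum(axis=0))  # standardized predictors X*
R = Xs.T @ Xs                                         # correlation matrix

eigvals, V = np.linalg.eigh(R)         # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # reorder so lambda_1 >= lambda_2 >= ...
eigvals, V = eigvals[order], V[:, order]

cond_index = np.sqrt(eigvals.max() / eigvals)
W = Xs @ V                             # principal components

print("eigenvalues:      ", np.round(eigvals, 4))
print("condition indices:", np.round(cond_index, 1))
print("W'W (should be diagonal):")
print(np.round(W.T @ W, 4))
```

The cross-product W'W comes out diagonal with the eigenvalues on the diagonal, which is the sense in which the components are uncorrelated and each λ measures the variance of its component.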
Police Applicants Height Data - I

V (column j is the eigenvector v_j)
         v1       v2       v3       v4       v5       v6       v7       v8       v9
X1*   0.1853   0.1523   0.8017   0.2782  -0.3707  -0.2327   0.1754  -0.0005   0.0104
X2*   0.4413  -0.2348  -0.0986  -0.2312  -0.2551  -0.3191  -0.3973   0.5850  -0.1414
X3*   0.3934   0.3336  -0.1642   0.2336   0.1239  -0.3183  -0.4953  -0.5205   0.1397
X4*   0.4182  -0.0813   0.0284  -0.2063   0.5765  -0.3703   0.5529   0.0009   0.0040
X5*   0.4125  -0.3000  -0.0121   0.3508   0.0559   0.4669   0.0250   0.1487   0.6106
X6*   0.4645   0.1011  -0.2518   0.1658  -0.2697   0.3798   0.2786  -0.1539  -0.6040
X7*   0.2141   0.3577   0.3790  -0.5862   0.2139   0.4811  -0.2484   0.0009  -0.0022
X8*  -0.0852   0.5467  -0.0498   0.4536   0.3674   0.0367  -0.0418   0.5738  -0.1352
X9*   0.0474   0.5261  -0.3320  -0.2685  -0.4396  -0.1027   0.3445   0.1089   0.4521

L = diag(3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322, 0.0009, 0.0005)
(all off-diagonal elements are 0.0000)

Police Applicants Height Data - II

VLV'
   1.0000   0.1441   0.2791   0.1483   0.1863   0.2263   0.3680   0.1147   0.0212
   0.1441   1.0000   0.4708   0.6452   0.7160   0.6617   0.1468  -0.5820  -0.0985
   0.2791   0.4708   1.0000   0.5051   0.3658   0.7284   0.4277   0.4420   0.4406
   0.1483   0.6452   0.5051   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
   0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4098
   0.2263   0.6617   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
   0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
   0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5083
   0.0212  -0.0985   0.4406  -0.0988  -0.4098   0.3434   0.3971   0.5083   1.0000

R
   1.0000   0.1441   0.2791   0.1483   0.1863   0.2264   0.3680   0.1147   0.0212
   0.1441   1.0000   0.4708   0.6452   0.7160   0.6616   0.1468  -0.5820  -0.0984
   0.2791   0.4708   1.0000   0.5050   0.3658   0.7284   0.4277   0.4420   0.4406
   0.1483   0.6452   0.5050   1.0000   0.6007   0.5500   0.3471  -0.1911  -0.0988
   0.1863   0.7160   0.3658   0.6007   1.0000   0.7150  -0.0298  -0.3882  -0.4099
   0.2264   0.6616   0.7284   0.5500   0.7150   1.0000   0.2821   0.0026   0.3434
   0.3680   0.1468   0.4277   0.3471  -0.0298   0.2821   1.0000   0.2445   0.3971
   0.1147  -0.5820   0.4420  -0.1911  -0.3882   0.0026   0.2445   1.0000   0.5082
   0.0212  -0.0984   0.4406  -0.0988  -0.4099   0.3434   0.3971   0.5082   1.0000

Regression of Y on [1|W]
\[
E\{Y\} = \gamma_0 \mathbf{1} + W\gamma
\]

Regression Statistics
Multiple R          0.944825
R Square            0.892694
Adjusted R Square   0.850704
Standard Error      1.890412
Observations        33

ANOVA
             df        SS        MS         F   Significance F
Regression    9  683.7823   75.9758   21.2600           0.0000
Residual     23   82.1941    3.5737
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3291  500.0743   0.0000   163.8829   165.2444
W1               12.1269          0.9922   12.2227   0.0000    10.0744    14.1793
W2                4.5224          1.2096    3.7389   0.0011     2.0202     7.0245
W3                7.6160          1.8769    4.0578   0.0005     3.7334    11.4985
W4                4.9552          2.1605    2.2935   0.0313     0.4858     9.4246
W5               -3.5819          2.4185   -1.4810   0.1522    -8.5850     1.4213
W6                3.2973          3.4376    0.9592   0.3474    -3.8139    10.4085
W7                6.8268          3.9230    1.7402   0.0952    -1.2885    14.9422
W8                1.4226         64.0508    0.0222   0.9825  -131.0766   133.9219
W9              -27.5954         87.0588   -0.3170   0.7541  -207.6903   152.4995

Note that W8 and W9 have very small eigenvalues and very small t-statistics. Their condition indices are 63.5 and 85.2, both well above 10.
Reduced Model
Remove the last 2 principal components because of their small, insignificant t-statistics and high condition indices.
Let V(g) be the p × g matrix of the eigenvectors for the g retained principal components (p = 9, g = 7).
Let W(g) = X*V(g).
Then regress Y on [1|W(g)].
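
The reduced fit can be sketched as below. The function `pcr_fit` and the simulated data are assumptions made for illustration (they are not from the paper or the slides); the point is only to show the mechanics of keeping the g components with the largest eigenvalues and regressing Y on [1|W(g)].

```python
import numpy as np

def pcr_fit(X, y, g):
    """Regress y on the first g principal components of the standardized predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    Xs = centered / np.sqrt((centered ** 2).sum(axis=0))   # X*
    eigvals, V = np.linalg.eigh(Xs.T @ Xs)                 # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:g]                  # keep the g largest
    lam_g, V_g = eigvals[order], V[:, order]
    W_g = Xs @ V_g                                         # retained components W(g)
    design = np.column_stack([np.ones(n), W_g])            # [1 | W(g)]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (n - g - 1)                       # residual mean square
    return {"intercept": coef[0], "gamma": coef[1:], "s2": s2,
            "V_g": V_g, "lam_g": lam_g}

# Simulated data (not the applicant data): x3 is nearly x1 + x2
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
x3 = x1 + x2 + 0.05 * rng.normal(size=40)
X = np.column_stack([x1, x2, x3])
y = 10 + 2 * x1 + 3 * x2 + rng.normal(size=40)

full = pcr_fit(X, y, g=3)
reduced = pcr_fit(X, y, g=2)
print("gamma (g=3):", np.round(full["gamma"], 3))
print("gamma (g=2):", np.round(reduced["gamma"], 3))
```

Because the columns of W are mutually orthogonal and each has mean zero, dropping components leaves the intercept and the coefficients of the retained components unchanged; only the residual mean square and the standard errors move. The slides show the same pattern when W8 and W9 are removed.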
V(g)
         v1       v2       v3       v4       v5       v6       v7
X1*   0.1853   0.1523   0.8017   0.2782  -0.3707  -0.2327   0.1754
X2*   0.4413  -0.2348  -0.0986  -0.2312  -0.2551  -0.3191  -0.3973
X3*   0.3934   0.3336  -0.1642   0.2336   0.1239  -0.3183  -0.4953
X4*   0.4182  -0.0813   0.0284  -0.2063   0.5765  -0.3703   0.5529
X5*   0.4125  -0.3000  -0.0121   0.3508   0.0559   0.4669   0.0250
X6*   0.4645   0.1011  -0.2518   0.1658  -0.2697   0.3798   0.2786
X7*   0.2141   0.3577   0.3790  -0.5862   0.2139   0.4811  -0.2484
X8*  -0.0852   0.5467  -0.0498   0.4536   0.3674   0.0367  -0.0418
X9*   0.0474   0.5261  -0.3320  -0.2685  -0.4396  -0.1027   0.3445

Reduced Regression Fit
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.944575
R Square            0.892223
Adjusted R Square   0.862045
Standard Error      1.817195
Observations        33

ANOVA
             df        SS        MS         F   Significance F
Regression    7  683.4215   97.6316   29.5657           0.0000
Residual     25   82.5549    3.3022
Total        32  765.9764

            Coefficients  Standard Error    t Stat  P-value  Lower 95%  Upper 95%
Intercept       164.5636          0.3163  520.2229   0.0000   163.9121   165.2151
W1               12.1268          0.9537   12.7151   0.0000    10.1625    14.0910
W2                4.5224          1.1627    3.8895   0.0007     2.1277     6.9170
W3                7.6160          1.8042    4.2213   0.0003     3.9002    11.3317
W4                4.9551          2.0768    2.3859   0.0249     0.6777     9.2324
W5               -3.5819          2.3249   -1.5407   0.1360    -8.3701     1.2063
W6                3.2972          3.3044    0.9978   0.3279    -3.5084    10.1028
W7                6.8268          3.7711    1.8103   0.0823    -0.9398    14.5934

Transforming Back to X-scale

\[
\hat{\beta}_{(g)} = V_{(g)}\,\hat{\gamma}_{(g)}
\qquad
s^2\{\hat{\beta}_{(g)}\} = s^2\, V_{(g)}\, L_{(g)}^{-1}\, V_{(g)}'
\qquad
s^2 = 3.3022
\]

      gamma-hat(g)
W1        12.1268
W2         4.5224
W3         7.6160
W4         4.9551
W5        -3.5819
W6         3.2972
W7         6.8268

      beta-hat(g)   StdErr
X1*      12.1779    2.0639
X2*      -0.4583    2.0549
X3*       1.3113    2.3006
X4*       4.3866    2.8275
X5*       6.8020    1.7926
X6*       9.1146    1.8993
X7*       3.3197    2.4118
X8*       1.8268    1.4407
X9*       2.6829    1.9731

V{beta-hat(g)}
   4.2598  -0.1779  -0.6883   1.0454  -0.8386  -0.0887  -1.8757  -0.4214   0.9289
  -0.1779   4.2228   3.6089  -2.2379  -1.9307  -2.4561  -0.1330  -1.0423  -0.7562
  -0.6883   3.6089   5.2928  -2.3318  -1.3892  -2.9496  -0.3347   1.1128  -2.2031
   1.0454  -2.2379  -2.3318   7.9948  -1.6401  -0.1911  -2.6329   0.1667   1.9223
  -0.8386  -1.9307  -1.3892  -1.6401   3.2135   2.3480   1.4626   0.7180  -1.1223
  -0.0887  -2.4561  -2.9496  -0.1911   2.3480   3.6074   0.1090  -0.1452   1.7520
  -1.8757  -0.1330  -0.3347  -2.6329   1.4626   0.1090   5.8170  -0.1949  -1.7317
  -0.4214  -1.0423   1.1128   0.1667   0.7180  -0.1452  -0.1949   2.0755  -1.2055
   0.9289  -0.7562  -2.2031   1.9223  -1.1223   1.7520  -1.7317  -1.2055   3.8931

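The back-transformation can be checked numerically from the four-decimal values printed in the slides. The sketch below assumes the rows of V(g) correspond to X1* through X9* and its columns to the seven retained eigenvectors; because the inputs are rounded, the results should agree with the tabulated beta-hat(g) and standard errors only to about three decimals.

```python
import numpy as np

# V(g): rows X1*..X9*, columns v1..v7 (values as printed in the slides)
V_g = np.array([
    [ 0.1853,  0.1523,  0.8017,  0.2782, -0.3707, -0.2327,  0.1754],
    [ 0.4413, -0.2348, -0.0986, -0.2312, -0.2551, -0.3191, -0.3973],
    [ 0.3934,  0.3336, -0.1642,  0.2336,  0.1239, -0.3183, -0.4953],
    [ 0.4182, -0.0813,  0.0284, -0.2063,  0.5765, -0.3703,  0.5529],
    [ 0.4125, -0.3000, -0.0121,  0.3508,  0.0559,  0.4669,  0.0250],
    [ 0.4645,  0.1011, -0.2518,  0.1658, -0.2697,  0.3798,  0.2786],
    [ 0.2141,  0.3577,  0.3790, -0.5862,  0.2139,  0.4811, -0.2484],
    [-0.0852,  0.5467, -0.0498,  0.4536,  0.3674,  0.0367, -0.0418],
    [ 0.0474,  0.5261, -0.3320, -0.2685, -0.4396, -0.1027,  0.3445],
])
lam_g = np.array([3.6304, 2.4427, 1.0145, 0.7656, 0.6109, 0.3024, 0.2322])
gamma_hat = np.array([12.1268, 4.5224, 7.6160, 4.9551, -3.5819, 3.2972, 6.8268])
s2 = 3.3022  # residual mean square of the reduced fit

beta_hat = V_g @ gamma_hat                            # beta-hat(g) = V(g) gamma-hat(g)
cov_beta = s2 * V_g @ np.diag(1.0 / lam_g) @ V_g.T    # s^2 V(g) L(g)^-1 V(g)'
std_err = np.sqrt(np.diag(cov_beta))

for j, (b, se) in enumerate(zip(beta_hat, std_err), start=1):
    print(f"X{j}*: beta-hat = {b:8.4f}   std err = {se:6.4f}")
```

Dividing by the eigenvalues is what makes small-eigenvalue components so costly: a component with λ near 0.0005 would contribute a term on the order of s²/0.0005 to the variances, which is why omitting W8 and W9 shrinks the standard errors.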
Comparison of Coefficients and SEs

                  Original Model               Principal Components
            Coefficients  Standard Error    beta-hat(g)    StdErr
Intercept       164.5636          0.3291
X1*              11.8900          2.3307        12.1779    2.0639
X2*               4.2752         39.4941        -0.4583    2.0549
X3*              -3.2845         35.5676         1.3113    2.3006
X4*               4.2764          2.9629         4.3866    2.8275
X5*              -9.8372         54.0398         6.8020    1.7926
X6*              25.5626         53.5337         9.1146    1.8993
X7*               3.3805          2.5166         3.3197    2.4118
X8*               6.3735         38.6215         1.8268    1.4407
X9*              -9.6391         40.0289         2.6829    1.9731
