HW9

Mr. วฐา มินเสน, student ID 48850226
http://beam.to/statistics
HW Chapter 8
8.6. Data on x1 = sales and x2 = profits for the 10 largest U.S. industrial
corporations were listed in Exercise 1.4 of Chapter 1.
From Example 4.12
\[
\bar{x} = \begin{bmatrix} 62309 \\ 2927 \end{bmatrix}, \qquad
S = \begin{bmatrix} 1000520000 & 25576000 \\ 25576000 & 1430000 \end{bmatrix}
\]
a) Determine the sample principal components and their variances for these
data.
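\[
\hat{\lambda}_1 = 1.0012 \times 10^{9}, \qquad
\hat{e}_1 = \begin{bmatrix} 0.9997 \\ 0.0256 \end{bmatrix}
\ \text{or} \
\hat{e}_1 = \begin{bmatrix} -0.9997 \\ -0.0256 \end{bmatrix}
\]
\[
\hat{\lambda}_2 = 7.7570 \times 10^{5}, \qquad
\hat{e}_2 = \begin{bmatrix} -0.0256 \\ 0.9997 \end{bmatrix}
\ \text{or} \
\hat{e}_2 = \begin{bmatrix} 0.0256 \\ -0.9997 \end{bmatrix}
\]
\[
\hat{y}_i = \hat{e}_i' x = \hat{e}_{i1} x_1 + \hat{e}_{i2} x_2
\]
The sample principal components:
\[
\hat{y}_1 = 0.9997 x_1 + 0.0256 x_2
\]
\[
\hat{y}_2 = -0.0256 x_1 + 0.9997 x_2
\]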
Notice that the variable x1, with coefficient 0.9997, receives the greatest
weight in the component ŷ1. It also has the largest correlation (in absolute value) with
ŷ1 [see Part d]. Thus x1 contributes more to the determination of ŷ1 than x2 does.
Their variances:
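\[
\text{Sample variance}(\hat{y}_1) = \hat{e}_1' S \hat{e}_1 = \hat{\lambda}_1 = 1.0012 \times 10^{9}
\]
\[
\text{Sample variance}(\hat{y}_2) = \hat{e}_2' S \hat{e}_2 = \hat{\lambda}_2 = 7.7570 \times 10^{5}
\]
\[
\text{Sample covariance}(\hat{y}_1, \hat{y}_2) = \hat{e}_1' S \hat{e}_2 = 0
\]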
Notice that, because of its large variance, x1 completely dominates the first sample
principal component. Moreover, this first sample principal component explains
nearly all (99.92%) of the total sample variance [see Part b].
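The eigenvalues and eigenvectors quoted in Part a can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names and the sign convention for the eigenvectors are choices made for this illustration, not part of the original solution.

```python
import numpy as np

# Sample covariance matrix S from the exercise (x1 = sales, x2 = profits).
S = np.array([[1000520000.0, 25576000.0],
              [25576000.0, 1430000.0]])

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)

# Reorder so that the largest eigenvalue (first principal component) comes first.
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
E = eigvecs[:, order]          # column j holds the eigenvector e_j

# Eigenvector signs are arbitrary. As a convention (an assumption of this sketch),
# flip each column so that its largest-magnitude entry is positive.
for j in range(E.shape[1]):
    k = np.argmax(np.abs(E[:, j]))
    if E[k, j] < 0:
        E[:, j] = -E[:, j]

# Covariance matrix of (y1, y2): E' S E, which is diagonal with the eigenvalues on it.
cov_y = E.T @ S @ E

print("eigenvalues:", lam)
print("eigenvectors (columns):")
print(E)
```

The diagonal of `cov_y` reproduces the component variances, and its off-diagonal entries are zero up to rounding, which is the zero sample covariance between ŷ1 and ŷ2.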
b) Find the proportion of the total sample variance explained by ŷ1
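\[
\left(\begin{array}{c}
\text{the proportion of} \\
\text{the total sample variance} \\
\text{explained by } \hat{y}_1
\end{array}\right)
= \frac{\hat{\lambda}_1}{\hat{\lambda}_1 + \hat{\lambda}_2} = 0.9992 \ \text{or} \ 99.92\%
\]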
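As a quick numerical check of this proportion, here is a minimal sketch assuming NumPy; the variable names are illustrative only.

```python
import numpy as np

# Sample covariance matrix S from the exercise.
S = np.array([[1000520000.0, 25576000.0],
              [25576000.0, 1430000.0]])

# Eigenvalues of a symmetric matrix, returned in ascending order.
eigvals = np.linalg.eigvalsh(S)

# Total sample variance: the sum of the eigenvalues, which equals trace(S) = s11 + s22.
total_variance = eigvals.sum()

# Share of the total sample variance carried by the first principal component.
proportion_first = eigvals[-1] / total_variance

print(f"proportion explained by y1: {proportion_first:.4f}")
```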
c) Sketch the constant density ellipse $(x - \bar{x})' S^{-1} (x - \bar{x}) = 1.4$, and indicate
the principal components ŷ1 and ŷ2 on your graph.
[Figure 1: Sketch of the constant density ellipse $(x - \bar{x})' S^{-1} (x - \bar{x}) = 1.4$, drawn with equal spacing on the horizontal axis (Sales, x1) and the vertical axis (Profits, x2). Marked points: center (62309, 2927); end of the ŷ1 axis (99735, 3884); end of the ŷ2 axis (62282, 3969).]
[Figure 2: Sketch of the constant density ellipse $(x - \bar{x})' S^{-1} (x - \bar{x}) = 1.4$, drawn with unequal spacing on the horizontal axis (Sales, x1) and the vertical axis (Profits, x2). Marked points: center (62309, 2927); end of the ŷ1 axis (99735, 3884); end of the ŷ2 axis (62282, 3969).]
Note 1: In Figure 2 the horizontal and vertical axes use different scales, so the axes of the
ellipse (the sample principal components ŷ1 and ŷ2) do not appear perpendicular. If the
horizontal and vertical axes are instead drawn with equal spacing, the shape of the ellipse is
hard to examine, as in Figure 1. Drawing the ellipse on unitless axes, with the scale of each
variable removed, makes the axes of the ellipse appear perpendicular and the figure easier to
read. This can be done by standardizing before extracting the sample principal components, as
in Figure 3 [see 8.7].
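Note 2: Computing the half-axes $\sqrt{\hat{\lambda}_i c^2}\, \hat{e}_i$, $i = 1, 2$, with $c^2 = 1.4$.

The ŷ1 axis uses the largest eigenvalue, $\hat{\lambda}_1 = 1.0012 \times 10^{9}$:
\[
\sqrt{1.0012 \times 10^{9}}\, \sqrt{1.4} \begin{bmatrix} 0.9997 \\ 0.0256 \end{bmatrix}
= \begin{bmatrix} 37426 \\ 957 \end{bmatrix}
\]
So the point to plot on the graph is
\[
\begin{bmatrix} 37426 \\ 957 \end{bmatrix} + \begin{bmatrix} 62309 \\ 2927 \end{bmatrix}
= \begin{bmatrix} 99735 \\ 3884 \end{bmatrix}
\]
The ŷ2 axis uses the next eigenvalue, $\hat{\lambda}_2 = 7.7570 \times 10^{5}$:
\[
\sqrt{7.7570 \times 10^{5}}\, \sqrt{1.4} \begin{bmatrix} -0.0256 \\ 0.9997 \end{bmatrix}
= \begin{bmatrix} -26.7 \\ 1041.8 \end{bmatrix}
\]
So the point to plot on the graph is
\[
\begin{bmatrix} -26.7 \\ 1041.8 \end{bmatrix} + \begin{bmatrix} 62309 \\ 2927 \end{bmatrix}
= \begin{bmatrix} 62282.3 \\ 3968.8 \end{bmatrix}
\]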
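The axis endpoints used in the sketch can be reproduced with a few lines of code. This is a minimal sketch assuming NumPy; the names `half_axes` and `endpoints`, and the sign convention for the eigenvectors, are assumptions made for this illustration.

```python
import numpy as np

# Sample mean vector and covariance matrix from the exercise.
xbar = np.array([62309.0, 2927.0])
S = np.array([[1000520000.0, 25576000.0],
              [25576000.0, 1430000.0]])
c2 = 1.4  # right-hand side of the ellipse equation (x - xbar)' S^{-1} (x - xbar) = c^2

# Eigen-decomposition, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
E = eigvecs[:, order]

# Sign convention (assumed here): largest-magnitude entry of each eigenvector is positive.
for j in range(E.shape[1]):
    k = np.argmax(np.abs(E[:, j]))
    if E[k, j] < 0:
        E[:, j] = -E[:, j]

# Column j is the half-axis vector sqrt(c^2 * lambda_j) * e_j.
half_axes = np.sqrt(c2 * lam) * E

# Endpoints of the axes: the center plus each half-axis vector.
endpoints = xbar[:, None] + half_axes

print(endpoints.round(1))
```

Subtracting the half-axis vectors from the center instead would give the opposite ends of the two axes.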
d) Compute the correlation coefficients $r_{\hat{y}_1, x_k}$, k = 1, 2. What interpretation, if
any, can you give to the first principal component?
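With the eigenvectors
\[
\hat{e}_1 = \begin{bmatrix} 0.9997 \\ 0.0256 \end{bmatrix}, \qquad
\hat{e}_2 = \begin{bmatrix} -0.0256 \\ 0.9997 \end{bmatrix}
\]
the correlation coefficients are
\[
r_{\hat{y}_1, x_1} = \frac{\hat{e}_{11} \sqrt{\hat{\lambda}_1}}{\sqrt{s_{11}}}
= \frac{0.9997 \sqrt{1.0012 \times 10^{9}}}{\sqrt{1000520000}} \approx 1
\]
\[
r_{\hat{y}_1, x_2} = \frac{\hat{e}_{12} \sqrt{\hat{\lambda}_1}}{\sqrt{s_{22}}}
= \frac{0.0256 \sqrt{1.0012 \times 10^{9}}}{\sqrt{1430000}} = 0.6767
\]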
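The two correlation coefficients follow from the formula $r_{\hat{y}_1, x_k} = \hat{e}_{1k} \sqrt{\hat{\lambda}_1} / \sqrt{s_{kk}}$. A minimal sketch of the arithmetic, assuming NumPy and with illustrative variable names:

```python
import numpy as np

# Sample covariance matrix S from the exercise.
S = np.array([[1000520000.0, 25576000.0],
              [25576000.0, 1430000.0]])

# eigh returns eigenvalues in ascending order, so the last pair is the first component.
eigvals, eigvecs = np.linalg.eigh(S)
lam1 = eigvals[-1]
e1 = eigvecs[:, -1]

# Fix the arbitrary sign so that the x1 coefficient is positive (assumed convention).
if e1[0] < 0:
    e1 = -e1

# r_{y1, xk} = e_{1k} * sqrt(lambda_1) / sqrt(s_kk), for k = 1, 2.
s_kk = np.diag(S)
r = e1 * np.sqrt(lam1) / np.sqrt(s_kk)

print(r.round(4))
```

Reversing the sign of `e1` reverses the sign of both correlations without changing their magnitudes.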
Interpretation
The variable x1, with coefficient 0.9997 [see Part a], receives the greatest weight in
the component ŷ1. It also has the largest correlation (in absolute value) with ŷ1
($r_{\hat{y}_1, x_1} = 1$), so x1 contributes more to the determination of ŷ1 than x2 does.
However, x2 has coefficient 0.0256 and correlation 0.6767 with ŷ1, so in this case both
variables aid in the interpretation of ŷ1.
Note: Although x2 can help explain ŷ1, since it has coefficient 0.0256 in the equation for ŷ1,
this value is very small compared with the coefficient 0.9997 of x1, so one may say that ŷ1 is
explained very largely by x1. This point is discussed again in Exercise 8.7.
What interpretation, if any, can you give to the first principal component?

The first sample principal component explains a proportion 0.9992 of the total
sample variance, so the second sample principal component is unimportant.
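Note: The signs of the eigenvectors can be reversed. With
\[
\hat{e}_1 = \begin{bmatrix} -0.9997 \\ -0.0256 \end{bmatrix}, \qquad
\hat{e}_2 = \begin{bmatrix} 0.0256 \\ -0.9997 \end{bmatrix}
\]
the correlation coefficients become
\[
r_{\hat{y}_1, x_1} = \frac{\hat{e}_{11} \sqrt{\hat{\lambda}_1}}{\sqrt{s_{11}}}
= \frac{-0.9997 \sqrt{1.0012 \times 10^{9}}}{\sqrt{1000520000}} \approx -1
\]
\[
r_{\hat{y}_1, x_2} = \frac{\hat{e}_{12} \sqrt{\hat{\lambda}_1}}{\sqrt{s_{22}}}
= \frac{-0.0256 \sqrt{1.0012 \times 10^{9}}}{\sqrt{1430000}} = -0.6767
\]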
8.7. Convert the covariance matrix S in Exercise 8.6 to a sample
correlation matrix R.
[Figure 3: Sketch of the constant density ellipse after standardizing, with horizontal axis Sales (z1) and vertical axis Profits (z2). Marked points: center (0, 0); end of the ŷ1 axis (1.0832, 1.0832); end of the ŷ2 axis (-0.4761, 0.4761).]
a) Find the sample principal components ŷ1, ŷ2 and their variances.
The sample correlation matrix:
\[
\left(V^{1/2}\right)^{-1} S \left(V^{1/2}\right)^{-1} = R =
\begin{bmatrix} 1 & 0.6762 \\ 0.6762 & 1 \end{bmatrix}
\]
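The conversion from S to R divides each entry by the product of the corresponding standard deviations. A minimal sketch, assuming NumPy; `V_inv_sqrt` is an illustrative name for the matrix $(V^{1/2})^{-1}$.

```python
import numpy as np

# Sample covariance matrix S from Exercise 8.6.
S = np.array([[1000520000.0, 25576000.0],
              [25576000.0, 1430000.0]])

# (V^{1/2})^{-1}: diagonal matrix holding the reciprocals of the sample standard deviations.
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))

# R = (V^{1/2})^{-1} S (V^{1/2})^{-1}
R = V_inv_sqrt @ S @ V_inv_sqrt

print(R.round(4))
```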
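\[
\hat{\lambda}_1 = 1.6762, \qquad
\hat{e}_1 = \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}
\ \text{or} \
\hat{e}_1 = \begin{bmatrix} -0.7071 \\ -0.7071 \end{bmatrix}
\]
\[
\hat{\lambda}_2 = 0.3238, \qquad
\hat{e}_2 = \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix}
\ \text{or} \
\hat{e}_2 = \begin{bmatrix} 0.7071 \\ -0.7071 \end{bmatrix}
\]
\[
\hat{y}_i = \hat{e}_i' z = \hat{e}_{i1} z_1 + \hat{e}_{i2} z_2
\]
The sample principal components:
\[
\hat{y}_1 = 0.7071 z_1 + 0.7071 z_2
\]
\[
\hat{y}_2 = -0.7071 z_1 + 0.7071 z_2
\]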
Their variances:
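\[
\text{Sample variance}(\hat{y}_1) = \hat{e}_1' R \hat{e}_1 = \hat{\lambda}_1 = 1.6762
\]
\[
\text{Sample variance}(\hat{y}_2) = \hat{e}_2' R \hat{e}_2 = \hat{\lambda}_2 = 0.3238
\]
\[
\text{Sample covariance}(\hat{y}_1, \hat{y}_2) = \hat{e}_1' R \hat{e}_2 = 0
\]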
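For a 2 x 2 correlation matrix with off-diagonal entry r, the eigenvalues are 1 + r and 1 - r, and the eigenvectors point along (1, 1) and (-1, 1). The following sketch, assuming NumPy and using illustrative names, checks this against the values above.

```python
import numpy as np

# Sample correlation matrix R from Exercise 8.7.
R = np.array([[1.0, 0.6762],
              [0.6762, 1.0]])

# eigh returns ascending eigenvalues; reverse to put the first component first.
eigvals, eigvecs = np.linalg.eigh(R)
lam = eigvals[::-1]
E = eigvecs[:, ::-1]           # column j holds the eigenvector e_j (sign is arbitrary)

# Variances and covariance of the components: E' R E is diagonal.
cov_y = E.T @ R @ E

print("eigenvalues:", lam.round(4))
print("eigenvectors (columns):")
print(E.round(4))
```

Because the signs returned by the solver are arbitrary, the printed columns may be the negatives of the vectors listed above.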
b) Compute the proportion of the total sample variance explained by ŷ1 .
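\[
\left(\begin{array}{c}
\text{the proportion of} \\
\text{the total sample variance} \\
\text{explained by } \hat{y}_1
\end{array}\right)
= \frac{\hat{\lambda}_1}{p} = \frac{\hat{\lambda}_1}{\operatorname{tr}(R)}
= \frac{\hat{\lambda}_1}{\hat{\lambda}_1 + \hat{\lambda}_2} = 0.8381 \ \text{or} \ 83.81\%
\]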
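c) Compute the correlation coefficients $r_{\hat{y}_1, z_k}$, k = 1, 2. Interpret ŷ1.
With the eigenvectors
\[
\hat{e}_1 = \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}, \qquad
\hat{e}_2 = \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix}
\]
the correlation coefficients are
\[
r_{\hat{y}_1, z_1} = \hat{e}_{11} \sqrt{\hat{\lambda}_1} = 0.7071 \sqrt{1.6762} = 0.9155
\]
\[
r_{\hat{y}_1, z_2} = \hat{e}_{12} \sqrt{\hat{\lambda}_1} = 0.7071 \sqrt{1.6762} = 0.9155
\]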
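For standardized variables the diagonal of R is 1, so the correlation formula reduces to $r_{\hat{y}_1, z_k} = \hat{e}_{1k} \sqrt{\hat{\lambda}_1}$. A minimal sketch of that calculation, assuming NumPy and with illustrative names:

```python
import numpy as np

# Sample correlation matrix R from Exercise 8.7.
R = np.array([[1.0, 0.6762],
              [0.6762, 1.0]])

# The last eigenpair returned by eigh belongs to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
lam1 = eigvals[-1]
e1 = eigvecs[:, -1]

# Fix the arbitrary sign so that the coefficients are positive (assumed convention).
if e1.sum() < 0:
    e1 = -e1

# r_{y1, zk} = e_{1k} * sqrt(lambda_1), because each z_k has unit sample variance.
r = e1 * np.sqrt(lam1)

print(r.round(4))
```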
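Note: The signs of the eigenvectors can be reversed. If the signs are reversed, so that the
eigenvectors are
\[
\hat{e}_1 = \begin{bmatrix} -0.7071 \\ -0.7071 \end{bmatrix}, \qquad
\hat{e}_2 = \begin{bmatrix} 0.7071 \\ -0.7071 \end{bmatrix}
\]
the correlation coefficients become
\[
r_{\hat{y}_1, z_1} = \hat{e}_{11} \sqrt{\hat{\lambda}_1} = -0.7071 \sqrt{1.6762} = -0.9155
\]
\[
r_{\hat{y}_1, z_2} = \hat{e}_{12} \sqrt{\hat{\lambda}_1} = -0.7071 \sqrt{1.6762} = -0.9155
\]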
Interpretation
The variables z1 and z2, with the same coefficient 0.7071, both receive great weight in the
component ŷ1. They also have large correlations (in absolute value) with ŷ1
($r_{\hat{y}_1, z_1} = 0.9155$, $r_{\hat{y}_1, z_2} = 0.9155$). The correlation of z1 is as large as that of z2,
indicating that the variables are about equally important to the first sample principal
component. Further, since both coefficients are reasonably large and have the
same sign, we would argue that both variables aid in the interpretation of ŷ1.
d) Compare the components obtained in Part a with those obtained in
Exercise 8.6(a). Given the original data displayed in Exercise 1.4, do you
feel that it is better to determine principal components from the sample
covariance matrix or sample correlation matrix? Explain.
The sample principal components of 8.6:
\[
\hat{y}_1 = 0.9997 x_1 + 0.0256 x_2
\]
\[
\hat{y}_2 = -0.0256 x_1 + 0.9997 x_2
\]
The sample principal components of 8.7:
\[
\hat{y}_1 = 0.7071 z_1 + 0.7071 z_2
\]
\[
\hat{y}_2 = -0.7071 z_1 + 0.7071 z_2
\]
When the sample principal components are obtained from S, as in Exercise 8.6, x2 can help
explain ŷ1, since it has coefficient 0.0256 in the equation for ŷ1, but this value is very small
compared with the coefficient 0.9997 of x1, so one may say that ŷ1 is explained very largely by
the single variable x1. When the sample principal components are obtained from R, as in
Exercise 8.7, the situation is different: z1 and z2 explain ŷ1 equally well, and neither
variable can be dropped (both correlation coefficients equal 0.9155). Which approach to use
depends on the following reasoning (taken from
Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis,
fifth edition, page 435):
“Variables should probably be standardized if they are measured on scales with
widely differing ranges or if the units of measurement are not commensurate. For
example, if x1 represents annual sales in the $10,000 to $350,000 range and x2 is
the ratio (net annual income)/(total assets) that falls in the .01 to .60 range, then the
total variation will be due almost exclusively to dollar sales. In this case, we would
expect a single (important) principal component with a heavy weighting of x1 .
Alternatively, if both variables are standardized, their subsequent magnitudes will be
of the same order, and x2 (or z2 ) will play a larger role in the construction of the
principal components.”
Therefore, examining the data of Exercise 1.4, the two variables have very different ranges,
so determining the sample principal components from the sample correlation matrix is more
appropriate. The first sample principal component is then ŷ1 = 0.7071 z1 + 0.7071 z2.
Converting z ⇒ x shows that x1, the variable with the larger range and the larger variance,
has its coefficient in the sample principal component scaled down more than that of x2:
\[
\hat{y}_1 = 0.7071 z_1 + 0.7071 z_2
\]
\[
\hat{y}_1 = \frac{0.7071}{\sqrt{1000520000}} (x_1 - \bar{x}_1) + \frac{0.7071}{\sqrt{1430000}} (x_2 - \bar{x}_2)
\]
\[
\hat{y}_1 = 0.0000223547 (x_1 - \bar{x}_1) + 0.000591 (x_2 - \bar{x}_2)
\]
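The coefficients on the original scale come from dividing each standardized coefficient by the sample standard deviation of the corresponding variable, since $z_k = (x_k - \bar{x}_k) / \sqrt{s_{kk}}$. A minimal sketch of that arithmetic, assuming NumPy; the variable names are illustrative.

```python
import numpy as np

# Sample variances of x1 (sales) and x2 (profits), taken from the diagonal of S.
s_kk = np.array([1000520000.0, 1430000.0])

# Coefficients of the first principal component on the standardized variables z1, z2.
e1 = np.array([0.7071, 0.7071])

# Substituting z_k = (x_k - xbar_k) / sqrt(s_kk) gives coefficients e_1k / sqrt(s_kk).
coef_x = e1 / np.sqrt(s_kk)

print(coef_x)
```

The coefficient on x1 comes out much smaller than the one on x2 because the standard deviation of sales is far larger than that of profits.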
