ECN302E - Problem Set 06 - Solutions

Prediction With Many Regressors And Big Data - Part 2

Created at: 11 April 2023, 01:22 PM

Table of contents

Question 1: Solution
Question 2
Question 2: Solution
Question 3
Question 3: Solution
Question 1: Solution
Define the following terms in your own words.

a) Principal components of a set of standardized variables X are linear combinations of those variables,
where the linear combinations are chosen so that the principal components are mutually uncorrelated and
sequentially contain as much of the information in the original variables as possible.
Specifically, the linear combination weights for the first principal component are chosen to maximize its
variance, in this sense capturing as much of the variation of the X’s as possible.
b) A scree plot is a plot of the sample variance of the $j$th principal component relative to the total sample variance in the Xs, that is, the sample value of $\dfrac{\mathrm{var}(PC_j)}{\sum_{i=1}^{k} \mathrm{var}(X_i)}$, against the number of the principal component, $j$.
c) Principal components regression is a dimension-reduction method that derives principal components from a large set of variables, the Xs, by exploiting the high mutual correlation among the Xs. It then uses these derived principal components to predict the outcome variable, $Y$, thereby avoiding issues such as multicollinearity.
One important decision in principal components regression is the choice of the optimal number of principal components. To make this choice, we can use m-fold cross-validation; a scree plot also helps to visualize the contribution of each additional principal component.
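To make these definitions concrete, here is a minimal sketch of principal components regression in Python, assuming numpy and scikit-learn are available. The simulated data, the variable names, and the choice of $p = 2$ retained components are all hypothetical and purely illustrative.

```python
# Minimal PCR sketch (assumed setup: numpy + scikit-learn).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: k mutually correlated regressors driven by a common factor.
n, k = 200, 10
common = rng.normal(size=(n, 1))
X = common + 0.5 * rng.normal(size=(n, k))
y = X[:, :3].sum(axis=1) + rng.normal(size=n)

# Standardize the Xs, then extract the principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# Scree values: var(PC_j) as a share of the total sample variance of the Xs.
print("scree values:", pca.explained_variance_ratio_.round(3))

# Regress y on the first p principal components; in practice p would be
# chosen by m-fold cross-validation rather than fixed in advance.
p = 2
pcs = pca.transform(X_std)[:, :p]
pcr = LinearRegression().fit(pcs, y)
print("in-sample R^2 with", p, "components:", round(pcr.score(pcs, y), 3))
```

Because the Xs here share a common factor, the first scree value dominates and a small number of components captures most of the variation, which is exactly the situation in which PCR is useful.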

Question 2
Suppose a data set with 10 variables produces a scree plot that is flat. What does this tell you about the
correlation of the variables? What does this suggest about the usefulness of using the first few principal
components of these variables in a predictive regression?

Question 2: Solution
A flat scree plot says that each principal component explains the same fraction of the sample variability in the
Xs. This will occur when the Xs are mutually uncorrelated. In this case, principal components will not simplify
the predictive regression, because they will be the same as the Xs.
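A quick simulation illustrates this, assuming numpy and scikit-learn: with 10 mutually uncorrelated standard normal variables, each estimated scree value is close to 1/10, so the scree plot is essentially flat.

```python
# Flat scree plot with mutually uncorrelated Xs (assumed setup: numpy + scikit-learn).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 10))  # 10 independent standard normal variables

# Each component explains roughly 1/10 of the total sample variance.
print(PCA().fit(X).explained_variance_ratio_.round(3))
```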

Question 3
Let $X_1$ and $X_2$ be two positively correlated random variables, both with variance 1.

a) The first principal component, $PC_1$, is the linear combination of $X_1$ and $X_2$ that maximizes $\mathrm{var}(w_1 X_1 + w_2 X_2)$, where $w_1^2 + w_2^2 = 1$. Show that $PC_1 = \dfrac{X_1 + X_2}{\sqrt{2}}$.
(Hint: First derive an expression for $\mathrm{var}(w_1 X_1 + w_2 X_2)$ as a function of $w_1$ and $w_2$, and then form the Lagrangian function for the optimization.)

b) The second principal component is $PC_2 = \dfrac{X_1 - X_2}{\sqrt{2}}$. Show that $\mathrm{cov}(PC_1, PC_2) = 0$.

c) Show that $\mathrm{var}(PC_1) = 1 + \rho$ and $\mathrm{var}(PC_2) = 1 - \rho$, where $\rho = \mathrm{cor}(X_1, X_2)$.

Question 3: Solution
a) We first simplify the objective function, then form the Lagrangian to find the candidate values of $w_1$ and $w_2$.

$$\max_{w_1, w_2} \; \mathrm{var}(w_1 X_1 + w_2 X_2) \quad \text{subject to} \quad w_1^2 + w_2^2 = 1$$

$$\mathrm{var}(w_1 X_1 + w_2 X_2) = w_1^2 \underbrace{\mathrm{var}(X_1)}_{1} + w_2^2 \underbrace{\mathrm{var}(X_2)}_{1} + \underbrace{2\,\mathrm{cov}(w_1 X_1, w_2 X_2)}_{2\,\mathrm{cov}(X_1, X_2)\, w_1 w_2} = w_1^2 + w_2^2 + 2\rho w_1 w_2,$$

where $\rho = \mathrm{cor}(X_1, X_2) = \dfrac{\mathrm{cov}(X_1, X_2)}{\underbrace{\sigma_{X_1}}_{1}\,\underbrace{\sigma_{X_2}}_{1}} = \mathrm{cov}(X_1, X_2)$.

Lagrangian (objective function plus $\gamma$ times the equality constraint):

$$\mathcal{L}(w_1, w_2, \gamma) = w_1^2 + w_2^2 + 2\rho w_1 w_2 + \gamma\,(w_1^2 + w_2^2 - 1)$$

First-order conditions:

$$\frac{\partial \mathcal{L}}{\partial w_1} = 2w_1 + 2\rho w_2 + 2\gamma w_1 = 0$$

$$\frac{\partial \mathcal{L}}{\partial w_2} = 2w_2 + 2\rho w_1 + 2\gamma w_2 = 0$$

$$\frac{\partial \mathcal{L}}{\partial \gamma} = w_1^2 + w_2^2 - 1 = 0 \;\Longrightarrow\; w_1^2 + w_2^2 = 1 \quad \text{(the weights lie on the unit circle)}$$
$PC_1$ lies on the 45° line in Figure 14.5 of the textbook, so the weights are equal in absolute value: $|w_1| = |w_2| = \sqrt{\tfrac{1}{2}}$. However, this fact is not given in the statement of the question, so we solve the Lagrangian directly:

Dividing the first first-order condition by the second:

$$\frac{2w_1 + 2\rho w_2}{2w_2 + 2\rho w_1} = \frac{-2\gamma w_1}{-2\gamma w_2} \;\Longrightarrow\; \frac{w_1 + \rho w_2}{w_2 + \rho w_1} = \frac{w_1}{w_2}$$

Cross-multiplying:

$$w_1 w_2 + \rho w_2^2 = w_2 w_1 + \rho w_1^2$$

$$\rho w_2^2 = \rho w_1^2 \;\Longrightarrow\; \sqrt{w_2^2} = \sqrt{w_1^2} \;\Longrightarrow\; |w_2| = |w_1| \quad \text{(the weights are equal in absolute value)}$$

The objective function is $w_1^2 + w_2^2 + 2\rho w_1 w_2$ with $\rho > 0$, so the values of $w_1$ and $w_2$ that maximize the variance must have the same sign. Thus $w_1 = w_2 = \sqrt{\tfrac{1}{2}}$ or $w_1 = w_2 = -\sqrt{\tfrac{1}{2}}$. Both yield the same value of the objective function because, regardless of the common sign of $w_1$ and $w_2$, every term in $w_1^2 + w_2^2 + 2\rho w_1 w_2$ is positive.

$$PC_1 = w_1 X_1 + w_2 X_2 = \sqrt{\tfrac{1}{2}}\, X_1 + \sqrt{\tfrac{1}{2}}\, X_2 = \frac{X_1 + X_2}{\sqrt{2}} \;\blacksquare$$
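As a numerical sanity check on this derivation, the sketch below (assuming numpy; the value $\rho = 0.6$ is arbitrary and purely illustrative) parameterizes the constraint $w_1^2 + w_2^2 = 1$ as $(w_1, w_2) = (\cos t, \sin t)$ and maximizes the variance over a fine grid, recovering $w_1 = w_2 = 1/\sqrt{2} \approx 0.7071$.

```python
# Grid-search check of part (a): maximize var(w1*X1 + w2*X2) = 1 + 2*rho*w1*w2
# over the unit circle, parameterized as (w1, w2) = (cos t, sin t).
import numpy as np

rho = 0.6  # arbitrary illustrative value of cor(X1, X2)
t = np.linspace(0.0, 2.0 * np.pi, 100_001)
w1, w2 = np.cos(t), np.sin(t)           # w1^2 + w2^2 = 1 by construction
variance = w1**2 + w2**2 + 2.0 * rho * w1 * w2

best = np.argmax(variance)
print(w1[best], w2[best])               # both ≈ 0.7071 = 1/sqrt(2)
print(variance[best])                   # ≈ 1 + rho = 1.6 (see part (c))
```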

b)

$$\begin{aligned}
\mathrm{cov}(PC_1, PC_2) &= \mathrm{cov}\!\left[\frac{X_1 + X_2}{\sqrt{2}},\; \frac{X_1 - X_2}{\sqrt{2}}\right] \\[4pt]
&= \frac{1}{2}\,\mathrm{cov}\big[(X_1 + X_2),\,(X_1 - X_2)\big] \\[4pt]
&= \frac{1}{2}\Big(\mathrm{cov}\big[X_1,\,(X_1 - X_2)\big] + \mathrm{cov}\big[X_2,\,(X_1 - X_2)\big]\Big) &&\text{(bilinearity of covariance in the first argument)} \\[4pt]
&= \frac{1}{2}\Big(\underbrace{\mathrm{cov}[X_1, X_1]}_{\mathrm{var}(X_1)} \underbrace{-\,\mathrm{cov}[X_1, X_2] + \mathrm{cov}[X_2, X_1]}_{0} - \underbrace{\mathrm{cov}[X_2, X_2]}_{\mathrm{var}(X_2)}\Big) &&\text{(bilinearity in the second argument)} \\[4pt]
&= \frac{1}{2}\big(\mathrm{var}(X_1) - \mathrm{var}(X_2)\big) \\[4pt]
&= \frac{1}{2}\,(1 - 1) &&\text{(the question states both variances equal 1)} \\[4pt]
&= 0 \;\blacksquare
\end{aligned}$$

c) Since the variances of both $X_1$ and $X_2$ are 1, their standard deviations are also 1, so $\rho = \mathrm{cor}(X_1, X_2) = \dfrac{\mathrm{cov}(X_1, X_2)}{\underbrace{\sigma_{X_1}}_{1}\,\underbrace{\sigma_{X_2}}_{1}} = \mathrm{cov}(X_1, X_2)$.

$$\begin{aligned}
\mathrm{var}(PC_1) &= \mathrm{var}\!\left[\frac{X_1 + X_2}{\sqrt{2}}\right] \\[4pt]
&= \frac{\mathrm{var}(X_1 + X_2)}{2} \\[4pt]
&= \frac{\mathrm{var}(X_1) + \mathrm{var}(X_2) + 2\,\mathrm{cov}(X_1, X_2)}{2} \\[4pt]
&= \frac{1 + 1 + 2\rho}{2} = \frac{2(1 + \rho)}{2} = 1 + \rho \;\blacksquare
\end{aligned}$$

" #
h i (X1 – X2 )
var PC2 = var √
2
var(X1 – X2 )
=
2
var(X1 ) + var(X2 ) – 2cov(X1 , X2 )
=
2
1 + 1 – 2cor(X1 , X2 ) cov(X1 , X2 )
= ρ = cor(X1 , X2 ) = = cov(X1 , X2 )
2 σX 1 σX 2
|{z} |{z}
1 1

2 – 2cor(X1 , X2 )
=
2
2(1 – ρ)
=
2
=1–ρ ■
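The results in parts (b) and (c) can also be checked by Monte Carlo simulation. The sketch below (assuming numpy, with $\rho = 0.6$ chosen arbitrarily) draws pairs with unit variances and correlation $\rho$, forms $PC_1$ and $PC_2$, and compares the sample moments with the theoretical values $1 + \rho$, $1 - \rho$, and 0.

```python
# Monte Carlo check of parts (b) and (c) (assumed setup: numpy).
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6  # arbitrary illustrative correlation
cov = np.array([[1.0, rho],
                [rho, 1.0]])
X1, X2 = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

pc1 = (X1 + X2) / np.sqrt(2.0)
pc2 = (X1 - X2) / np.sqrt(2.0)

print(np.var(pc1), 1 + rho)         # sample var(PC1) vs. theoretical 1 + rho
print(np.var(pc2), 1 - rho)         # sample var(PC2) vs. theoretical 1 - rho
print(np.cov(pc1, pc2)[0, 1])       # sample cov(PC1, PC2), close to 0
```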
