
36-401 Modern Regression HW #2 Solutions

DUE: 9/15/2017

Problem 1 [36 points total]


(a) (12 pts.)

In Lecture Notes 4 we derived the following estimators for the simple linear regression model:

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X, \qquad \hat\beta_1 = \frac{c_{XY}}{s_X^2},$$
where
$$c_{XY} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y) \quad\text{and}\quad s_X^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2.$$

Since the formula for $\hat\beta_0$ depends on $\hat\beta_1$, we calculate $\operatorname{Var}(\hat\beta_1)$ first. Some simple algebra$^1$ shows we can rewrite $\hat\beta_1$ as
$$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)\epsilon_i}{s_X^2}.$$
Now, treating the $X_i$'s as fixed, we have
$$\begin{aligned}
\operatorname{Var}\big(\hat\beta_1\big) &= \operatorname{Var}\left(\beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)\epsilon_i}{s_X^2}\right) \\
&= \operatorname{Var}\left(\frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)\epsilon_i}{s_X^2}\right) \\
&= \frac{\frac{1}{n^2}\sum_{i=1}^n (X_i - \bar X)^2\,\operatorname{Var}(\epsilon_i)}{s_X^4} \\
&= \frac{\frac{\sigma^2}{n^2}\sum_{i=1}^n (X_i - \bar X)^2}{s_X^4} \\
&= \frac{\frac{\sigma^2}{n}\, s_X^2}{s_X^4} \\
&= \frac{\sigma^2}{n\, s_X^2}.
\end{aligned}$$

$^1$ See (16)–(22) of Lecture Notes 4.

Thus, $\operatorname{Var}(\hat\beta_0)$ is given by
$$\begin{aligned}
\operatorname{Var}\big(\hat\beta_0\big) &= \operatorname{Var}\big(\bar Y - \hat\beta_1 \bar X\big) \\
&= \operatorname{Var}(\bar Y) + \bar X^2\operatorname{Var}\big(\hat\beta_1\big) - 2\bar X\operatorname{Cov}\left(\frac{1}{n}\sum_{i=1}^n Y_i,\ \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2}\right) \\
&= \frac{\sigma^2}{n} + \bar X^2\operatorname{Var}\big(\hat\beta_1\big) - \frac{2\bar X}{n\sum_{i=1}^n (X_i - \bar X)^2}\operatorname{Cov}\left(\sum_{i=1}^n Y_i,\ \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)\right) \\
&= \frac{\sigma^2}{n} + \bar X^2\operatorname{Var}\big(\hat\beta_1\big) - \frac{2\bar X}{n\sum_{i=1}^n (X_i - \bar X)^2}\sum_{i=1}^n (X_i - \bar X)\operatorname{Cov}(Y_i, Y_i) \\
&= \frac{\sigma^2}{n} + \bar X^2\operatorname{Var}\big(\hat\beta_1\big) - \frac{2\bar X\sigma^2}{n\sum_{i=1}^n (X_i - \bar X)^2}\underbrace{\sum_{i=1}^n (X_i - \bar X)}_{=0} \\
&= \frac{\sigma^2}{n} + \bar X^2\operatorname{Var}\big(\hat\beta_1\big) \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2\bar X^2}{n\, s_X^2} \\
&= \frac{\sigma^2\big(s_X^2 + \bar X^2\big)}{n\, s_X^2} \\
&= \frac{\sigma^2\sum_{i=1}^n X_i^2}{n^2\, s_X^2}.
\end{aligned}$$
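As a quick numerical sanity check (not part of the derivation), a small simulation with the $X_i$'s held fixed should reproduce both variance formulas; the sample size, coefficients, and $\sigma$ below are arbitrary choices:

set.seed(1)
n <- 200; sigma <- 2
X <- runif(n)                     # treated as fixed across replications
b0.hat <- b1.hat <- numeric(5000)
for (r in 1:5000){
  Y <- 1 + 2 * X + rnorm(n, 0, sigma)
  fit <- lm(Y ~ X)
  b0.hat[r] <- coef(fit)[1]
  b1.hat[r] <- coef(fit)[2]
}
s2X <- mean((X - mean(X))^2)      # 1/n convention, as in the notes
c(var(b1.hat), sigma^2 / (n * s2X))               # should roughly agree
c(var(b0.hat), sigma^2 * sum(X^2) / (n^2 * s2X))  # should roughly agree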

(b) (6 pts.)
$$\begin{aligned}
\sum_{i=1}^n \hat\epsilon_i &= \sum_{i=1}^n \big(Y_i - (\hat\beta_0 + \hat\beta_1 X_i)\big) \\
&= \sum_{i=1}^n \big(Y_i - (\bar Y - \hat\beta_1 \bar X) - \hat\beta_1 X_i\big) \\
&= \sum_{i=1}^n (Y_i - \bar Y) + \sum_{i=1}^n (\hat\beta_1 \bar X - \hat\beta_1 X_i) \\
&= (n\bar Y - n\bar Y) + (n\hat\beta_1 \bar X - n\hat\beta_1 \bar X) \\
&= 0 + 0 \\
&= 0.
\end{aligned}$$

(c) (12 pts.)
$$\begin{aligned}
\sum_{i=1}^n \hat Y_i \hat\epsilon_i &= \sum_{i=1}^n (\hat\beta_0 + \hat\beta_1 X_i)\hat\epsilon_i \\
&= \hat\beta_0 \underbrace{\sum_{i=1}^n \hat\epsilon_i}_{=0} + \hat\beta_1 \sum_{i=1}^n X_i \hat\epsilon_i \\
&= \hat\beta_1 \sum_{i=1}^n X_i \hat\epsilon_i \\
&= \hat\beta_1 \sum_{i=1}^n X_i \hat\epsilon_i - \hat\beta_1 \bar X \underbrace{\sum_{i=1}^n \hat\epsilon_i}_{=0} \\
&= \hat\beta_1 \sum_{i=1}^n (X_i - \bar X)\hat\epsilon_i \\
&= \hat\beta_1 \sum_{i=1}^n (X_i - \bar X)\big(Y_i - (\hat\beta_0 + \hat\beta_1 X_i)\big) \\
&= \hat\beta_1 \sum_{i=1}^n (X_i - \bar X)\big(Y_i - (\bar Y - \hat\beta_1 \bar X) - \hat\beta_1 X_i\big) \\
&= \hat\beta_1 \sum_{i=1}^n (X_i - \bar X)\big((Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X)\big) \\
&= \hat\beta_1 \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y) - \hat\beta_1^2 \sum_{i=1}^n (X_i - \bar X)^2 \\
&= \hat\beta_1 \cdot n \cdot c_{XY} - \hat\beta_1^2 \cdot n \cdot s_X^2 \\
&= \hat\beta_1 \cdot n \cdot c_{XY} - \hat\beta_1 \cdot \frac{c_{XY}}{s_X^2} \cdot n \cdot s_X^2 \\
&= 0.
\end{aligned}$$

Note: Combined with part (b), the above implies that the sample covariance between the fitted values and the observed residuals is zero:
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \left(\hat Y_i - \frac{1}{n}\sum_{j=1}^n \hat Y_j\right)\left(\hat\epsilon_i - \frac{1}{n}\sum_{j=1}^n \hat\epsilon_j\right) &= \frac{1}{n}\sum_{i=1}^n \left(\hat Y_i - \frac{1}{n}\sum_{j=1}^n \hat Y_j\right)\hat\epsilon_i \\
&= \frac{1}{n}\sum_{i=1}^n \hat Y_i\hat\epsilon_i - \frac{1}{n^2}\sum_{j=1}^n \hat Y_j \sum_{i=1}^n \hat\epsilon_i \\
&= \underbrace{\frac{1}{n}\sum_{i=1}^n \hat Y_i\hat\epsilon_i}_{=0} - \left(\frac{1}{n}\sum_{j=1}^n \hat Y_j\right)\underbrace{\left(\frac{1}{n}\sum_{i=1}^n \hat\epsilon_i\right)}_{=0} \\
&= 0.
\end{aligned}$$

Linear Algebra interpretation: The observed residuals are orthogonal to the fitted values.
Statistical interpretation: The observed residuals are linearly uncorrelated with the fitted values.
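Both identities are easy to verify numerically; here is a minimal sketch on arbitrary simulated data:

set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
sum(residuals(fit))                 # zero up to floating-point error
sum(fitted(fit) * residuals(fit))   # likewise zero
cov(fitted(fit), residuals(fit))    # sample covariance is zero as well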

(d) (6 pts.)

From the result in part (c) we have $\hat\beta_1 = 0$.

Substituting this into the equation for $\hat\beta_0$, we obtain the intercept
$$\hat\beta_0 = \bar Y - \hat\beta_1 \cdot \frac{1}{n}\sum_{i=1}^n \hat\epsilon_i = \bar Y - 0 \cdot 0 = \bar Y.$$

Problem 2 [24 points]
(a) (8 pts.)

We compute the least squares estimate $\hat\beta_1$ by minimizing the empirical mean squared error via a first-derivative test.

$$\frac{\partial}{\partial \beta_1}\widehat{MSE}(\beta_1) = \frac{\partial}{\partial \beta_1}\left(\frac{1}{n}\sum_{i=1}^n (Y_i - \beta_1 X_i)^2\right) = \frac{2}{n}\sum_{i=1}^n (Y_i - \beta_1 X_i)(-X_i)$$

Setting the derivative equal to 0 yields
$$\begin{aligned}
-\frac{2}{n}\sum_{i=1}^n (Y_i - \beta_1 X_i)X_i &= 0 \\
\sum_{i=1}^n (Y_i X_i - \beta_1 X_i^2) &= 0 \\
\sum_{i=1}^n Y_i X_i - \beta_1 \sum_{i=1}^n X_i^2 &= 0 \\
\implies \hat\beta_1 &= \frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}.
\end{aligned}$$

Furthermore,
$$\frac{\partial^2}{\partial \beta_1^2}\widehat{MSE}(\beta_1) = \frac{\partial}{\partial \beta_1}\left(-\frac{2}{n}\sum_{i=1}^n (Y_i X_i - \beta_1 X_i^2)\right) = \frac{2}{n}\sum_{i=1}^n X_i^2 > 0,$$
so $\hat\beta_1$ is indeed the minimizer of the empirical MSE.
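As a check, R's lm() fits regression through the origin when the intercept is suppressed, and its coefficient should match the closed form above; a quick sketch with arbitrary simulated data:

set.seed(1)
x <- runif(50)
y <- 3 * x + rnorm(50)
fit0 <- lm(y ~ x - 1)       # "- 1" removes the intercept
coef(fit0)
sum(y * x) / sum(x^2)       # closed-form estimate; the two agree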

(b) (8 pts.)
$$\begin{aligned}
E\big[\hat\beta_1\big] &= E\left[\frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}\right] \\
&= E\left[\frac{\sum_{i=1}^n X_i(\beta_1 X_i + \epsilon_i)}{\sum_{i=1}^n X_i^2}\right] \\
&= E\left[\frac{\beta_1\sum_{i=1}^n X_i^2 + \sum_{i=1}^n X_i\epsilon_i}{\sum_{i=1}^n X_i^2}\right] \\
&= E\left[\beta_1 + \frac{\sum_{i=1}^n X_i\epsilon_i}{\sum_{i=1}^n X_i^2}\right] \\
&= \beta_1 + \frac{1}{\sum_{i=1}^n X_i^2}\,E\left[\sum_{i=1}^n X_i\epsilon_i\right] \\
&= \beta_1 + \frac{1}{\sum_{i=1}^n X_i^2}\sum_{i=1}^n X_i \cdot \underbrace{E[\epsilon_i]}_{=0} \\
&= \beta_1.
\end{aligned}$$

Thus, if the true model is linear and through the origin, then $\hat\beta_1$ is an unbiased estimator of $\beta_1$.

(c) (8 pts.)

If the true model is linear, but not necessarily through the origin, then the bias of the regression-through-the-origin estimator $\hat\beta_1$ is
$$\begin{aligned}
\operatorname{Bias}\big(\hat\beta_1\big) &= E\big[\hat\beta_1\big] - \beta_1 \\
&= E\left[\frac{\sum_{i=1}^n Y_i X_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\left[\frac{\sum_{i=1}^n X_i(\beta_0 + \beta_1 X_i + \epsilon_i)}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\left[\frac{\beta_0\sum_{i=1}^n X_i + \beta_1\sum_{i=1}^n X_i^2 + \sum_{i=1}^n X_i\epsilon_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= E\left[\beta_1 + \frac{\beta_0\sum_{i=1}^n X_i + \sum_{i=1}^n X_i\epsilon_i}{\sum_{i=1}^n X_i^2}\right] - \beta_1 \\
&= \beta_1 + \frac{\beta_0\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2} + \frac{1}{\sum_{i=1}^n X_i^2}\sum_{i=1}^n X_i \cdot \underbrace{E[\epsilon_i]}_{=0} - \beta_1 \\
&= \frac{\beta_0\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i^2}.
\end{aligned}$$
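This formula can be verified by simulation; a minimal sketch with arbitrarily chosen values $\beta_0 = 2$ and $\beta_1 = 3$ and the $X_i$'s held fixed:

set.seed(1)
n <- 100; b0 <- 2; b1 <- 3
X <- runif(n)                 # held fixed across replications
est <- replicate(5000, {
  Y <- b0 + b1 * X + rnorm(n)
  sum(Y * X) / sum(X^2)       # regression-through-the-origin estimator
})
mean(est) - b1                # empirical bias
b0 * sum(X) / sum(X^2)        # theoretical bias; the two should agree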

Problem 3 [20 points total]
(a) (5 pts.)

set.seed(1)
n <- 100
X <- runif(n, 0, 1)
Y <- 5 + 3 * X + rnorm(n, 0, 1)

plot(X,Y, cex = 0.75)


model <- lm(Y ~ X)
abline(model, lwd = 2, col = "red")
Figure 1: One hundred data points with the simple linear regression fit

(b) (5 pts.)

n <- 100
betas <- rep(NA, 1000)    # storage for the 1000 slope estimates

for (itr in 1:1000){
  X <- runif(n, 0, 1)
  Y <- 5 + 3 * X + rnorm(n, 0, 1)
  model <- lm(Y ~ X)
  betas[itr] <- model$coefficients[2]
}

mean(betas)

## [1] 3.019629
Since 1000 is a reasonably large number of trials, we expect the mean of $\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(1000)}$ to be close to
$$E\big[\hat\beta_1\big] = E\Big[E\big[\hat\beta_1 \mid X_1, \ldots, X_n\big]\Big] = E[\beta_1] = E[3] = 3.$$

In the above experiment, we have
$$\frac{1}{1000}\sum_{i=1}^{1000} \hat\beta_1^{(i)} = 3.019629.$$

hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", breaks = 50)



Figure 2: Histogram of linear regression slope parameters for Gaussian data

(c) (5 pts.)

n <- 100
betas <- rep(NA, 1000)

for (itr in 1:1000){
  X <- runif(n, 0, 1)
  Y <- 5 + 3 * X + rcauchy(n, 0, 1)   # Cauchy errors in place of Gaussian
  model <- lm(Y ~ X)
  betas[itr] <- model$coefficients[2]
}

par(mfrow = c(1,2))
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", xlim = c(3-20,3+20),
breaks = 750)
abline(v = 3, col = "red", lwd = 2)
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", breaks = 200)

Figure 3: Histogram of linear regression slope parameters for Cauchy data (Left: restricted to the window
(-17,23). Right: The full window.)

Notice that the distribution of $\hat\beta_1^{(1)}, \ldots, \hat\beta_1^{(1000)}$ still appears to be approximately centered around the true value $\beta_1 = 3$, but the tails are now much fatter. In particular, from the plot on the right we see that at least one trial of the experiment produced an estimate near $\hat\beta_1 \approx -600$. This is not surprising: the Cauchy distribution has no finite mean or variance, so the variance formula from Problem 1 no longer applies and occasional wild estimates are expected.

(d) (5 pts.)

set.seed(1)
n <- 100
X <- runif(n, 0, 1)
W <- X + rnorm(n, 0, sqrt(2))   # W is a noisy measurement of X
Y <- 5 + 3 * X + rnorm(n, 0, 1)

plot(X, Y, cex = 0.75)

model <- lm(Y ~ W)              # regress Y on the noisy W, not on X
abline(model, lwd = 2, col = "red")
Figure 4: One hundred observations of Y vs. X with the simple linear regression fit of Y on W

n <- 100
betas <- rep(NA, 1000)

for (itr in 1:1000){
  X <- runif(n, 0, 1)
  W <- X + rnorm(n, 0, sqrt(2))
  Y <- 5 + 3 * X + rnorm(n, 0, 1)
  model <- lm(Y ~ W)            # slope estimated from the noisy W
  betas[itr] <- model$coefficients[2]
}

mean(betas)

## [1] 0.1198059
hist(betas, xlab = expression(hat(beta)[1]), prob = FALSE, main = "", breaks = 50)

In the above experiment, we have
$$\frac{1}{1000}\sum_{i=1}^{1000} \hat\beta_1^{(i)} = 0.1198059.$$

From this, and Figure 5, we conclude that having errors on the $X_i$'s biases $\hat\beta_1$ downwards, toward zero.
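This is consistent with the classical errors-in-variables attenuation factor (a sanity check using the known simulation parameters, not part of the graded solution): with $X \sim \mathrm{Unif}(0,1)$, so $\operatorname{Var}(X) = 1/12$, and measurement-error variance $2$,
$$E\big[\hat\beta_1\big] \approx \beta_1 \cdot \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + 2} = 3 \cdot \frac{1/12}{1/12 + 2} = \frac{3}{25} = 0.12,$$
which closely matches the simulated mean of 0.1198.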


Figure 5: Histogram of linear regression slope parameters for data with errors on the X’s

Problem 4 [20 points total]

data(airquality)

(a) (5 pts.)

summary(airquality)

## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
pairs(airquality, cex = 0.5)

(b) (5 pts.)

with(airquality, plot(Solar.R, Ozone, xlab = "Solar Radiation", ylab = "Ozone"))


model <- lm(Ozone ~ Solar.R, data = airquality)
abline(model, col = "red", lwd = 2)

Ozone and Solar Radiation appear to be positively correlated.

(c) (5 pts.)

summary(model)$coefficients

## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.5987278 6.74790416 2.756223 0.0068560215
## Solar.R 0.1271653 0.03277629 3.879795 0.0001793109
The intercept and slope of the least squares regression are
$$\hat\beta_0 = 18.59873 \quad\text{and}\quad \hat\beta_1 = 0.12717.$$
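For a concrete reading of these numbers (an illustration, with the radiation value chosen arbitrarily): the fitted line predicts about 0.127 additional units of ozone per unit of solar radiation, so for example

predict(model, newdata = data.frame(Solar.R = 200))   # 18.59873 + 0.12717 * 200, about 44.03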

Figure 6: Pairwise relationships of variables in the airquality data set

Figure 7: Ozone vs. solar radiation observations in the **airquality** data set

Figure 8: Linear regression residuals vs. solar radiation

(d) (5 pts.)

# computing residuals manually keeps them aligned with Solar.R;
# residuals(model) would drop the rows with missing values
resids <- airquality$Ozone - predict(model, newdata = data.frame(Solar.R = airquality$Solar.R))

plot(airquality$Solar.R, resids, xlab = "Solar Radiation", ylab = "Residuals")
abline(h = 0)

No, the standard regression assumptions do not appear to hold. The residuals are not symmetric about zero, so the linear functional form assumption is not suitable, and their spread varies markedly with solar radiation, so the residuals are also highly heteroskedastic.
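As a supplementary check (not required by the problem), R's built-in diagnostics for lm objects tell the same story:

par(mfrow = c(2, 2))
plot(model)   # residuals vs. fitted, normal Q-Q, scale-location, leverage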

