Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

STAT 3008: Applied Linear Regression

2014-15 Term 2
Assignment #4 Solutions
Problem 1:
(a) Recall from Ch5 that we can transform the WLS into OLS as follows:
Y X e Z M We ,

where Z WY, M WX, d We ~ 2 I n

Hence the OLS is given by (M' M) 1 M' Z (X' WX ) 1 X' WY . Now,


X HY XX' WX 1 X' WY HY H XX' WX 1 X' W .
Y

(b) H' X(X' WX ) 1 X' W' WX (X' WX ) 1 X' H . Hence H is NOT a symmetric matrix.
(c) HH XX' WX 1 X' WXX' WX 1 X' W XX' WX 1 X' W H
(d) HX XX' WX 1 X' WX X H . From part (c), we can put U=H.
(e) tr (H) tr XX' WX 1 X' W tr X' WX 1 X' WX tr[I p1 ] p 1
Problem 2: (a) The scatterplot below suggests positive association between the response
and the predictor. However, the relationship does not seem to be linear.

(b) Fitted Model y 375 473.74 x , R2=85.33%.


R Code: library(car); library(alr3); x<-baeskel$Sulfur;
fit<-lm(y~x); summary(fit)

y<-baeskel$Tension; plot(x,y);

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
x

375.00

29.14 12.868 1.51e-07 ***

473.74

62.11

7.627 1.78e-05 ***

Residual standard error: 59.85 on 10 degrees of freedom


Multiple R-squared: 0.8533,

Adjusted R-squared:

0.8386

F-statistic: 58.17 on 1 and 10 DF, p-value: 1.785e-05

(c) From the residual plot (1st chart below), there is a systematic pattern (quadratic!) on the
residuals, violating the assumptions of the null plot. The Normal Q-Q plot (2nd chart)
suggests that (i) the studentized residuals have thinner tails than the normal distribution,
and (ii) there is no outlier as the studentized residuals are within [-2,2].
Page 1/5

R Code: par(mfrow=c(2,2)); plot(fit)

(d) From the table of influence diagnostics below, |DFFITSi|<1 and |DFBETASi|<1 for all the
data points, suggesting that there is no influence point based on either the DFFITS or the
DFBETAS.

x
x

n
Problem 3: (a) ( X' X )
xj

j
2
j

x 2j

1
1

( X' X )

nSXX x j

1 x1

2
1 x2 1 x j
1
Hence, H X ( X' X ) X'
nSXX x j

1 x n

x j 1

n x1

xj

1 1

x2 xn

Consider the (i,i) element of H:

x x
x n n
2

hii

2
j

2 xi x j nx
nSXX

2
i

2
j

2 xi x j nxi2

nSXX
Page 2/5

SXX n( xi x ) 2 1 ( xi x ) 2

nSXX
n
SXX

(b) If xi a for i=1,2, , n-1, and xn a (n 1) , then

x a, SXX (n 1) 2 (n 1) 2 2 (n 1)n 2 .

Hence,
1
2
n ( n 1)n 2
1 ( xi x ) 2 1 ( xi a ) 2
hii

2 2
n
SXX
n ( n 1)n 2 1 ( n 1)

2
n ( n 1)n

i 1,2, , n 1
in

1
i 1,2, , n 1
n 1
1
in

Problem 4: (a)

> AIC.F<-stepAIC(fit0,scope=list(lower=fit0, upper=fit1),direction="forward",trace=1) # forward selection


Start:

AIC=253.58

Step:

y~1

AIC=181.58

y ~ x4
Df Sum of Sq

RSS

AIC

Df Sum of Sq

RSS

AIC

+ x4

1661.66

884.73 181.57

+ x3

141.523 743.21 171.37

+ x2

1120.45 1425.95 214.99

+ x5

90.016 794.72 176.06

+ x1

504.47 2041.92 240.12

+ x6

49.592 835.14 179.54

+ x3

462.22 2084.18 241.55

+ x2

25.046 859.69 181.56

+ x6

360.22 2186.17 244.90

<none>

+ x5

281.47 2264.92 247.38

+ x1

<none>

2546.39 253.58

Page 3/5

884.73 181.57
1

8.291 876.44 182.92

Step:

AIC=171.37

Step:

y ~ x4 + x3

AIC=168.96

y ~ x4 + x3 + x6

Df Sum of Sq

RSS

AIC

Df Sum of Sq

RSS

AIC

+ x6

45.431 697.78 168.96

<none>

+ x2

21.519 721.69 171.32

+ x1

15.1423 682.64 169.42

+ x2

13.2307 684.55 169.62

+ x5

4.2635 693.52 170.53

<none>

743.21 171.37

+ x1

8.535 734.68 172.56

+ x5

1.637 741.57 173.22

697.78 168.96

> AIC.B<-stepAIC(fit1,scope=list(lower=fit0, upper=fit1),direction="backward",trace=1) # backward selection


Start:

AIC=172.67

Step:

y ~ x1 + x2 + x3 + x4 + x5 + x6
Df Sum of Sq

RSS

AIC=169.42

y ~ x1 + x3 + x4 + x6
AIC

Df Sum of Sq

2.93

678.29 170.98

- x1

- x5

2.95

678.31 170.98

<none>

- x1

7.01

682.38 171.40

- x6

52.04

734.68 172.56

- x3

148.70

831.34 181.22

675.36 172.67

- x6

44.82

720.18 175.17

- x4

- x3

59.18

734.54 176.55

Step:

- x4

Step:

646.12 1321.48 217.66

682.64 169.42

AIC=168.96

<none>
RSS

AIC

RSS

AIC

697.78 168.96

- x6

45.43

743.21 171.37

137.36

835.14 179.54

- x5

4.34

682.64 169.42

- x3

- x1

15.22

693.52 170.53

- x4

<none>

1222.35 1904.99 239.26

Df Sum of Sq

y ~ x1 + x3 + x4 + x5 + x6

697.78 168.96

y ~ x3 + x4 + x6

AIC=170.98

Df Sum of Sq

15.14

AIC

- x2

<none>

RSS

1278.97 1976.75 239.85

678.29 170.98

- x6

54.83

733.12 174.42

- x3

74.01

752.30 176.22

- x4

1190.09 1868.39 239.90

> BIC.F<-stepAIC(fit0,scope=list(lower=fit0, upper=fit1),direction="forward",trace=1,k=log(n)) # forward selection


Start:

AIC=254.06

Step:

y~1

AIC=182.55

y ~ x4
Df Sum of Sq

RSS

AIC

Df Sum of Sq

RSS

AIC

+ x4

1661.66

884.73 182.54

+ x3

141.523 743.21 172.83

+ x2

1120.45 1425.95 215.96

+ x5

90.016 794.72 177.52

+ x1

504.47 2041.92 241.09

+ x6

49.592 835.14 180.99

+ x3

462.22 2084.18 242.52

<none>

+ x6

360.22 2186.17 245.87

+ x2

25.046 859.69 183.02

+ x5

281.47 2264.92 248.35

+ x1

8.291 876.44 184.37

<none>

2546.39 254.06

Page 4/5

884.73 182.54

Step:

AIC=172.83

Step:

y ~ x4 + x3

y ~ x4 + x3 + x6

Df Sum of Sq
+ x6

AIC=170.9

RSS

AIC

Df Sum of Sq

45.431 697.78 170.90

<none>

<none>

743.21 172.83

RSS

AIC

697.78 170.90

+ x1

15.1423 682.64 171.85

+ x2

21.519 721.69 173.26

+ x2

13.2307 684.55 172.04

+ x1

8.535 734.68 174.50

+ x5

4.2635 693.52 172.95

+ x5

1.637 741.57 175.16

> BIC.B<-stepAIC(fit1,scope=list(lower=fit0, upper=fit1),direction="backward",trace=1,k=log(n)) # backward selection


Start:

AIC=176.07

Step:

y ~ x1 + x2 + x3 + x4 + x5 + x6
Df Sum of Sq

y ~ x1 + x3 + x4 + x6

RSS

AIC

- x2

2.93

678.29 173.88

- x5

2.95

678.31 173.89

- x1

- x1

7.01

682.38 174.31

<none>

<none>

AIC=171.85

Df Sum of Sq

675.36 176.07

52.04

734.68 174.50

148.70

831.34 183.16

44.82

720.18 178.08

- x3

- x3

59.18

734.54 179.46

- x4

- x4

Step:

AIC=173.89

697.78 170.90

- x6

Step:

AIC

682.64 171.85

- x6

646.12 1321.48 220.57

15.14

RSS

1222.35 1904.99 241.20

AIC=170.9

y ~ x3 + x4 + x6

y ~ x1 + x3 + x4 + x5 + x6

Df Sum of Sq

Df Sum of Sq

RSS

AIC

<none>

RSS

AIC

697.78 170.90

- x5

4.34

682.64 171.85

- x6

45.43

743.21 172.83

- x1

15.22

693.52 172.95

- x3

137.36

835.14 180.99

- x4

<none>

678.29 173.88

- x6

54.83

733.12 176.84

- x3

74.01

752.30 178.65

- x4

1278.97 1976.75 241.31

1190.09 1868.39 242.33

From the above, (x3, x4, x6) were chosen regardless of the model selection criteria (AIC/BIC) and the model
selection method (forward/backward) we used. Hence the parsimonious model (with R2 = 72.6%) is
height at age 18 = 12.82162 -0.35326 (weight at age 9) +1.25046 (height at age 9) -0.06729 (strength at age 9)
> fit2<-lm(y~x3+x4+x6); summary(fit2)
Estimate

Std. Error

t value

Pr(>|t|)

(Intercept)

12.82162

12.70635

1.009

0.316625

x3

-0.35326

0.09801

-3.605

0.000601 ***

x4

1.25046

0.11369

10.999

< 2e-16 ***

x6

-0.06729

0.03246

-2.073

0.042086 *

Residual standard error: 3.252 on 66 degrees of freedom


Multiple R-squared:

0.726,

Adjusted R-squared:

0.7135

(b) From the R-output, VIR5 1 /(1 R ) 1 /(1 0.8343) 6.0346


2
5

> fitx5<-lm(x5~x1+x2+x3+x4+x6);

1/(1-summary(fitx5)$r.squared)

[1] 6.034562

Page 5/5

You might also like