Professional Documents
Culture Documents
Cu 3008 Assignment 4 - Solutions 2014T2
Cu 3008 Assignment 4 - Solutions 2014T2
2014-15 Term 2
Assignment #4 Solutions
Problem 1:
(a) Recall from Ch5 that we can transform the WLS into OLS as follows:
Y X e Z M We ,
(b) H' X(X' WX ) 1 X' W' WX (X' WX ) 1 X' H . Hence H is NOT a symmetric matrix.
(c) HH XX' WX 1 X' WXX' WX 1 X' W XX' WX 1 X' W H
(d) HX XX' WX 1 X' WX X H . From part (c), we can put U=H.
(e) tr (H) tr XX' WX 1 X' W tr X' WX 1 X' WX tr[I p1 ] p 1
Problem 2: (a) The scatterplot below suggests positive association between the response
and the predictor. However, the relationship does not seem to be linear.
y<-baeskel$Tension; plot(x,y);
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
x
375.00
473.74
62.11
Adjusted R-squared:
0.8386
(c) From the residual plot (1st chart below), there is a systematic pattern (quadratic!) on the
residuals, violating the assumptions of the null plot. The Normal Q-Q plot (2nd chart)
suggests that (i) the studentized residuals have thinner tails than the normal distribution,
and (ii) there is no outlier as the studentized residuals are within [-2,2].
Page 1/5
(d) From the table of influence diagnostics below, |DFFITSi|<1 and |DFBETASi|<1 for all the
data points, suggesting that there is no influence point based on either the DFFITS or the
DFBETAS.
x
x
n
Problem 3: (a) ( X' X )
xj
j
2
j
x 2j
1
1
( X' X )
nSXX x j
1 x1
2
1 x2 1 x j
1
Hence, H X ( X' X ) X'
nSXX x j
1 x n
x j 1
n x1
xj
1 1
x2 xn
x x
x n n
2
hii
2
j
2 xi x j nx
nSXX
2
i
2
j
2 xi x j nxi2
nSXX
Page 2/5
SXX n( xi x ) 2 1 ( xi x ) 2
nSXX
n
SXX
x a, SXX (n 1) 2 (n 1) 2 2 (n 1)n 2 .
Hence,
1
2
n ( n 1)n 2
1 ( xi x ) 2 1 ( xi a ) 2
hii
2 2
n
SXX
n ( n 1)n 2 1 ( n 1)
2
n ( n 1)n
i 1,2, , n 1
in
1
i 1,2, , n 1
n 1
1
in
Problem 4: (a)
AIC=253.58
Step:
y~1
AIC=181.58
y ~ x4
Df Sum of Sq
RSS
AIC
Df Sum of Sq
RSS
AIC
+ x4
1661.66
884.73 181.57
+ x3
+ x2
+ x5
+ x1
+ x6
+ x3
+ x2
+ x6
<none>
+ x5
+ x1
<none>
2546.39 253.58
Page 3/5
884.73 181.57
1
Step:
AIC=171.37
Step:
y ~ x4 + x3
AIC=168.96
y ~ x4 + x3 + x6
Df Sum of Sq
RSS
AIC
Df Sum of Sq
RSS
AIC
+ x6
<none>
+ x2
+ x1
+ x2
+ x5
<none>
743.21 171.37
+ x1
+ x5
697.78 168.96
AIC=172.67
Step:
y ~ x1 + x2 + x3 + x4 + x5 + x6
Df Sum of Sq
RSS
AIC=169.42
y ~ x1 + x3 + x4 + x6
AIC
Df Sum of Sq
2.93
678.29 170.98
- x1
- x5
2.95
678.31 170.98
<none>
- x1
7.01
682.38 171.40
- x6
52.04
734.68 172.56
- x3
148.70
831.34 181.22
675.36 172.67
- x6
44.82
720.18 175.17
- x4
- x3
59.18
734.54 176.55
Step:
- x4
Step:
682.64 169.42
AIC=168.96
<none>
RSS
AIC
RSS
AIC
697.78 168.96
- x6
45.43
743.21 171.37
137.36
835.14 179.54
- x5
4.34
682.64 169.42
- x3
- x1
15.22
693.52 170.53
- x4
<none>
Df Sum of Sq
y ~ x1 + x3 + x4 + x5 + x6
697.78 168.96
y ~ x3 + x4 + x6
AIC=170.98
Df Sum of Sq
15.14
AIC
- x2
<none>
RSS
678.29 170.98
- x6
54.83
733.12 174.42
- x3
74.01
752.30 176.22
- x4
AIC=254.06
Step:
y~1
AIC=182.55
y ~ x4
Df Sum of Sq
RSS
AIC
Df Sum of Sq
RSS
AIC
+ x4
1661.66
884.73 182.54
+ x3
+ x2
+ x5
+ x1
+ x6
+ x3
<none>
+ x6
+ x2
+ x5
+ x1
<none>
2546.39 254.06
Page 4/5
884.73 182.54
Step:
AIC=172.83
Step:
y ~ x4 + x3
y ~ x4 + x3 + x6
Df Sum of Sq
+ x6
AIC=170.9
RSS
AIC
Df Sum of Sq
<none>
<none>
743.21 172.83
RSS
AIC
697.78 170.90
+ x1
+ x2
+ x2
+ x1
+ x5
+ x5
AIC=176.07
Step:
y ~ x1 + x2 + x3 + x4 + x5 + x6
Df Sum of Sq
y ~ x1 + x3 + x4 + x6
RSS
AIC
- x2
2.93
678.29 173.88
- x5
2.95
678.31 173.89
- x1
- x1
7.01
682.38 174.31
<none>
<none>
AIC=171.85
Df Sum of Sq
675.36 176.07
52.04
734.68 174.50
148.70
831.34 183.16
44.82
720.18 178.08
- x3
- x3
59.18
734.54 179.46
- x4
- x4
Step:
AIC=173.89
697.78 170.90
- x6
Step:
AIC
682.64 171.85
- x6
15.14
RSS
AIC=170.9
y ~ x3 + x4 + x6
y ~ x1 + x3 + x4 + x5 + x6
Df Sum of Sq
Df Sum of Sq
RSS
AIC
<none>
RSS
AIC
697.78 170.90
- x5
4.34
682.64 171.85
- x6
45.43
743.21 172.83
- x1
15.22
693.52 172.95
- x3
137.36
835.14 180.99
- x4
<none>
678.29 173.88
- x6
54.83
733.12 176.84
- x3
74.01
752.30 178.65
- x4
From the above, (x3, x4, x6) were chosen regardless of the model selection criteria (AIC/BIC) and the model
selection method (forward/backward) we used. Hence the parsimonious model (with R2 = 72.6%) is
height at age 18 = 12.82162 -0.35326 (weight at age 9) +1.25046 (height at age 9) -0.06729 (strength at age 9)
> fit2<-lm(y~x3+x4+x6); summary(fit2)
Estimate
Std. Error
t value
Pr(>|t|)
(Intercept)
12.82162
12.70635
1.009
0.316625
x3
-0.35326
0.09801
-3.605
0.000601 ***
x4
1.25046
0.11369
10.999
x6
-0.06729
0.03246
-2.073
0.042086 *
0.726,
Adjusted R-squared:
0.7135
> fitx5<-lm(x5~x1+x2+x3+x4+x6);
1/(1-summary(fitx5)$r.squared)
[1] 6.034562
Page 5/5