Confidence Interval, Model Fitness and Prediction: S S T B
Example
CEO data
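• 90% CIs for β0 and β1
o t_{0.05,57} = 1.672
• 90% CI for β0
o b0 = 242.70
o SE(b0) = 168.76
o 242.70 ± 1.672 × 168.76 = (-39.47, 524.87)
o Includes zero, so β0 is not significantly different from zero at the 10% level
• 90% CI for β1
o b1 = 3.1327
o SE(b1) = 3.22
o 3.1327 ± 1.672 × 3.22 = (-2.26, 8.53)
o Includes zero, so β1 is not significantly different from zero at the 10% level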
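The interval arithmetic above can be reproduced with a short Python sketch. The helper names `coef_ci` and `includes_zero` are hypothetical, and the t value is hard-coded from the notes because the standard library has no t quantile function. With the rounded SE(b1) = 3.22 the sketch gives roughly (-2.25, 8.52) for β1; the slightly different limits quoted above presumably come from an unrounded standard error.

```python
# Minimal sketch: b +/- t * SE(b) for a regression coefficient.
# The t critical value is supplied by hand (no t quantile in the stdlib).

def coef_ci(estimate: float, se: float, t_crit: float) -> tuple[float, float]:
    """Return the (lower, upper) limits of estimate +/- t_crit * se."""
    half_width = t_crit * se
    return (estimate - half_width, estimate + half_width)


def includes_zero(ci: tuple[float, float]) -> bool:
    """True if the interval contains 0, i.e. no significance at the matching level."""
    lower, upper = ci
    return lower <= 0.0 <= upper


T_CRIT = 1.672  # t_{0.05,57}, as quoted in the notes

ci_b0 = coef_ci(242.70, 168.76, T_CRIT)
ci_b1 = coef_ci(3.1327, 3.22, T_CRIT)  # rounded inputs give about (-2.25, 8.52)

for name, ci in (("beta0", ci_b0), ("beta1", ci_b1)):
    print(name, tuple(round(v, 2) for v in ci), "includes zero:", includes_zero(ci))
```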
3.2. Quality of fitted model
• Does the data fit the model adequately?
• Will the model predict the response well enough?
• R² and correlation
o For simple linear regression (with only one regressor), the coefficient of determination R² equals the square of the correlation coefficient r between X and Y.
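o Since
$$ SSR = b_1^2 S_{XX} \ \text{(exercise)}, \qquad b_1 = \frac{S_{XY}}{S_{XX}}, \qquad SST = S_{YY} $$
o Therefore
$$ R^2 = \frac{SSR}{SST} = \frac{b_1^2 S_{XX}}{S_{YY}} = \frac{S_{XY}^2}{S_{XX}\, S_{YY}} = \left( \frac{S_{XY}}{\sqrt{S_{XX}\, S_{YY}}} \right)^2 = r^2 $$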
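A quick numerical check of the identity R² = r², sketched in Python on a small hypothetical data set (not taken from these notes). SSR is computed directly from the fitted values rather than from b1² S_XX, so the comparison is a genuine check and not just the algebra above restated.

```python
import math


def centered_sums(x, y):
    """Return (S_XX, S_YY, S_XY) for paired data."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def r_squared_from_fit(x, y):
    """R^2 = SSR / SST, with SSR summed over the fitted values."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx, syy, sxy = centered_sums(x, y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
    return ssr / syy  # SST = S_YY


def correlation(x, y):
    """Sample correlation r = S_XY / sqrt(S_XX * S_YY)."""
    sxx, syy, sxy = centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.7, 12.4]
print(r_squared_from_fit(x, y), correlation(x, y) ** 2)
```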
Example
Shock data
• When all cases are considered
o R² = 127.74 / 199.06 = 0.6417
o The linear regression model explains 64.17% of the variance of Y
o The model fits the data fairly well
• When obs. 3 and 14 are removed
o R² = 101.35 / 125.33 = 0.8087
o The linear regression model explains 80.87% of the variance of Y
o The model fits the reduced data much better
Example
CEO data
• R² = 45896 / 2820832 = 0.0163
o The linear regression model explains only 1.63% of the variance of Y
o The model does not fit the data well
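The R² values quoted in these examples are simple ratios SSR / SST; a few lines of Python reproduce them from the sums of squares given above (the function name `r_squared` is a hypothetical helper).

```python
def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of determination as the ratio SSR / SST."""
    return ssr / sst


# (SSR, SST) pairs as quoted in the examples above.
cases = {
    "Shock data, all cases": (127.74, 199.06),
    "Shock data, obs. 3 and 14 removed": (101.35, 125.33),
    "CEO data": (45896.0, 2820832.0),
}

for label, (ssr, sst) in cases.items():
    print(f"{label}: R^2 = {r_squared(ssr, sst):.4f}")
```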
3.3. Prediction
(a) Mean response
• Mean of Y given X = x0
$$ E(Y \mid X = x_0) = E(\beta_0 + \beta_1 X + \varepsilon \mid X = x_0) = \beta_0 + \beta_1 x_0 $$
• Estimated mean response
$$ \hat{y} = b_0 + b_1 x_0 $$
• Standard error for mean response given X = x0
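$$
\begin{aligned}
\operatorname{Var}(\hat{y}) &= \operatorname{Var}(b_0 + b_1 x_0) \\
&= \operatorname{Var}\big(\bar{y} + b_1 (x_0 - \bar{x})\big) && \because\ b_0 = \bar{y} - b_1 \bar{x} \\
&= \operatorname{Var}(\bar{y}) + (x_0 - \bar{x})^2 \operatorname{Var}(b_1) && \because\ \operatorname{Cov}(\bar{y}, b_1) = 0 \\
&= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right) && \because\ \operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n},\ \operatorname{Var}(b_1) = \frac{\sigma^2}{S_{XX}}
\end{aligned}
$$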
o Substituting s² for σ² gives the standard error of the estimated mean response
$$ s\{\hat{y}\} = s \sqrt{ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} } $$
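The variance formula can be checked by simulation. The sketch below assumes a hypothetical fixed design x = 1, ..., 10, hypothetical true parameters β0 = 5, β1 = 2, σ = 1.5 and normal errors. It refits the line on many simulated samples and compares the empirical variance of ŷ(x0) with σ²(1/n + (x0 − x̄)²/S_XX), both at x̄ and at a point outside the data.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def var_mean_response_formula(x, x0, sigma):
    """sigma^2 * (1/n + (x0 - xbar)^2 / S_XX)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sigma ** 2 * (1 / n + (x0 - xbar) ** 2 / sxx)


def var_mean_response_simulated(x, x0, beta0, beta1, sigma, reps=20000, seed=1):
    """Empirical variance of b0 + b1*x0 over repeated samples, x held fixed."""
    rng = random.Random(seed)
    fitted = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1 = fit_line(x, y)
        fitted.append(b0 + b1 * x0)
    return statistics.variance(fitted)


# Hypothetical design and true parameters, chosen only for illustration.
x = [float(i) for i in range(1, 11)]
beta0, beta1, sigma = 5.0, 2.0, 1.5
for x0 in (5.5, 12.0):  # x0 = xbar, and a point outside the data
    print(x0,
          var_mean_response_formula(x, x0, sigma),
          var_mean_response_simulated(x, x0, beta0, beta1, sigma))
```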
• Confidence interval for mean response
o Under the condition of normal errors,
ŷ is normal, since it is a linear combination of the yi
(n − 2)s²/σ² ~ χ²_{n−2}, independent of ŷ
o A 100(1 – α)% confidence interval for the mean response is
$$ \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} } $$
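A minimal Python sketch of the confidence interval for the mean response, working from summary statistics. The function name `mean_response_ci` and the inputs in the usage lines (b0 = 10, b1 = 2, s² = 7.5, n = 10, x̄ = 50, S_XX = 3400, t = 2.306) are hypothetical illustration values; in practice the t value must match n − 2 degrees of freedom.

```python
import math


def mean_response_ci(b0, b1, s, n, xbar, sxx, x0, t_crit):
    """Return (y_hat, SE, (lower, upper)) for the mean response at x0."""
    y_hat = b0 + b1 * x0
    se = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return y_hat, se, (y_hat - t_crit * se, y_hat + t_crit * se)


# Hypothetical summary statistics; t_crit = t_{0.025,8} because n - 2 = 8.
s = math.sqrt(7.5)
for x0 in (45.0, 50.0, 55.0):
    y_hat, se, (lo, hi) = mean_response_ci(10.0, 2.0, s, 10, 50.0, 3400.0, x0, 2.306)
    print(f"x0={x0:.0f}: y_hat={y_hat:.2f}, SE={se:.4f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The SE is smallest at x0 = x̄, where it reduces to s/√n, and it is symmetric about x̄.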
Example
Shock data
• 90% CI for the mean responses at number of shocks = 0, 8 and 15
• t_{0.05,14} = 1.761
x0    ŷ(x0)    SE[ŷ(x0)]    90% CI
0     10.48    1.0776       (8.58, 12.38)
8      5.58    0.5676       (4.58, 6.58)
15     1.29    1.0776       (-0.61, 3.19)
• The SE is symmetric about x̄
o SE[ŷ(0)] = SE[ŷ(15)], because x0 = 0 and x0 = 15 are equally far from x̄ = 7.5
• The confidence band does not include 90% of the observations. Why?
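The 90% intervals in the shock-data table can be recomputed from the ŷ and SE columns. As a guard against a typo in the fitted values, this sketch recovers ŷ(8) by interpolating along the fitted line between the x0 = 0 and x0 = 15 rows (ŷ is exactly linear in x0). The helper name `ci90` is hypothetical.

```python
T_CRIT = 1.761  # t_{0.05,14}, as quoted in the notes

# Fitted values at x0 = 0 and x0 = 15 from the table; the line through them
# gives the fitted value at any other x0.
y_hat_0, y_hat_15 = 10.48, 1.29
slope = (y_hat_15 - y_hat_0) / 15.0


def y_hat(x0: float) -> float:
    """Fitted value on the line through the x0 = 0 and x0 = 15 table rows."""
    return y_hat_0 + slope * x0


def ci90(x0: float, se: float) -> tuple[float, float]:
    """90% CI for the mean response: y_hat +/- t * SE."""
    centre = y_hat(x0)
    return (centre - T_CRIT * se, centre + T_CRIT * se)


# (x0, SE[y_hat(x0)]) from the table; SE is equal at 0 and 15 because xbar = 7.5.
for x0, se in ((0.0, 1.0776), (8.0, 0.5676), (15.0, 1.0776)):
    lo, hi = ci90(x0, se)
    print(f"x0={x0:4.0f}: y_hat={y_hat(x0):.2f}, 90% CI=({lo:.2f}, {hi:.2f})")
```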
Example
CEO data
• 95% CI for the mean responses at X = 32, 55 and 74
• t_{0.025,57} = 2.002
x0    ŷ(x0)      SE[ŷ(x0)]    95% CI
32    342.949    69.287       (204.204, 481.694)
55    415.001    30.815       (353.294, 476.708)
74    474.523    77.944       (318.442, 630.603)
• The confidence intervals are very wide at both ends, far from the center of the data
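Recomputing the CEO intervals from b0, b1 and the SE column makes the widening away from the center of the data visible. Because b0, b1 and t are rounded here, the limits agree with the table only to about 0.05; the helper name `ci95` is hypothetical.

```python
T_CRIT = 2.002          # t_{0.025,57}, as quoted in the notes
B0, B1 = 242.70, 3.1327  # rounded coefficient estimates for the CEO data


def ci95(x0: float, se: float) -> tuple[float, float]:
    """95% CI for the mean response at x0, given SE[y_hat(x0)]."""
    y_hat = B0 + B1 * x0
    return (y_hat - T_CRIT * se, y_hat + T_CRIT * se)


# (x0, SE[y_hat(x0)]) from the table.
rows = ((32.0, 69.287), (55.0, 30.815), (74.0, 77.944))
for x0, se in rows:
    lo, hi = ci95(x0, se)
    print(f"x0={x0:.0f}: 95% CI=({lo:.3f}, {hi:.3f}), width={hi - lo:.1f}")
```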
(b) Individual response
• Predicted individual value of Y given X = x0
$$ Y \mid X = x_0 \;=\; \beta_0 + \beta_1 x_0 + \varepsilon $$
• Estimated individual response
$$ \hat{y} = b_0 + b_1 x_0 $$
• Prediction interval for individual response
o Consider a single new observation at X = x0, denoted y0 (= β0 + β1x0 + ε), which is independent of ŷ
o The expected value of ŷ equals the expected value of y0:
$$ E(y_0 - \hat{y}) = E\big(\beta_0 + \beta_1 x_0 + \varepsilon - (b_0 + b_1 x_0)\big) = \beta_0 + \beta_1 x_0 - E(b_0) - E(b_1)\, x_0 = 0 $$
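o Variance of the difference between the observation and the prediction given X = x0 (there is no covariance term because y0 is independent of ŷ)
$$
\begin{aligned}
\operatorname{Var}(y_0 - \hat{y}) &= \operatorname{Var}(y_0) + \operatorname{Var}(\hat{y}) \\
&= \sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right) \\
&= \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} \right)
\end{aligned}
$$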
o Under the normal assumption, since y0 and ŷ are normally distributed,
$$ \frac{y_0 - \hat{y}}{\sigma \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim N(0, 1) $$
o (n − 2)s²/σ² ~ χ²_{n−2}, independent of y0 − ŷ
Replacing σ by s,
$$ \frac{y_0 - \hat{y}}{s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{XX}}}} \sim t_{n-2} $$
Prediction interval (CI for an individual response)
$$ \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}} } $$
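A small Monte Carlo check of why the extra "1 +" matters, assuming normal errors and a hypothetical design x = 1, ..., 10 with hypothetical parameters β0 = 5, β1 = 2, σ = 1.5 (so n − 2 = 8 and t_{0.025,8} = 2.306). The 95% prediction interval should contain a new independent observation y0 about 95% of the time, while the narrower confidence interval for the mean response contains y0 much less often.

```python
import math
import random

T_CRIT = 2.306  # t_{0.025,8}: valid only because n = 10 below (n - 2 = 8 df)


def fit_with_s(x, y):
    """Least-squares fit; returns (b0, b1, s, xbar, S_XX) with s^2 = SSE / (n - 2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, math.sqrt(sse / (n - 2)), xbar, sxx


def coverage(x, x0, beta0, beta1, sigma, reps=20000, seed=2):
    """Fraction of new y0 values inside the PI and inside the mean-response CI."""
    rng = random.Random(seed)
    n = len(x)
    hits_pi = hits_ci = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b0, b1, s, xbar, sxx = fit_with_s(x, y)
        y_hat = b0 + b1 * x0
        h = 1 / n + (x0 - xbar) ** 2 / sxx
        y0 = beta0 + beta1 * x0 + rng.gauss(0.0, sigma)  # new, independent observation
        if abs(y0 - y_hat) <= T_CRIT * s * math.sqrt(1 + h):
            hits_pi += 1
        if abs(y0 - y_hat) <= T_CRIT * s * math.sqrt(h):
            hits_ci += 1
    return hits_pi / reps, hits_ci / reps


x = [float(i) for i in range(1, 11)]  # hypothetical design, n = 10
pi_cov, ci_cov = coverage(x, x0=8.0, beta0=5.0, beta1=2.0, sigma=1.5)
print(f"PI covers y0: {pi_cov:.3f}, mean-response CI covers y0: {ci_cov:.3f}")
```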
Example
o With s² = 7.5, n = 10, x̄ = 50 and S_XX = 3400:
$$ SE(\hat{y}(70)) = \sqrt{7.5 \left( 1 + \frac{1}{10} + \frac{(70 - 50)^2}{3400} \right)} = 3.0220 $$
• A 95% prediction interval for an individual response at lot size = 55 (t_{0.025,8} = 2.306)
o 120 ± 2.306 × 2.8819 = (113.35, 126.65)
• A 95% prediction interval for an individual response at lot size = 70
o 150 ± 2.306 × 3.0220 = (143.03, 156.97)
• The prediction interval (blue lines) is wider than the confidence interval for the mean response (red lines)
[Figure: prediction interval band (blue) and mean-response confidence band (red) around the fitted line]
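The lot-size arithmetic can be reproduced in a few lines of Python. The sketch reads 7.5 as s² (MSE) together with n = 10, x̄ = 50 and S_XX = 3400 from the SE(ŷ(70)) formula in the example, and takes ŷ(55) = 120, ŷ(70) = 150 and t_{0.025,8} = 2.306 from the notes. The mean-response SE is included only for comparison; the function names are hypothetical.

```python
import math

# Summary statistics read off the example (7.5 is taken to be s^2 = MSE).
S2, N, XBAR, SXX = 7.5, 10, 50.0, 3400.0
T_CRIT = 2.306  # t_{0.025,8}


def se_mean(x0: float) -> float:
    """SE of the estimated mean response at x0."""
    return math.sqrt(S2 * (1 / N + (x0 - XBAR) ** 2 / SXX))


def se_pred(x0: float) -> float:
    """SE for predicting a single new observation at x0."""
    return math.sqrt(S2 * (1 + 1 / N + (x0 - XBAR) ** 2 / SXX))


def interval(y_hat: float, se: float) -> tuple[float, float]:
    """y_hat +/- t * SE."""
    return (y_hat - T_CRIT * se, y_hat + T_CRIT * se)


for x0, y_hat in ((55.0, 120.0), (70.0, 150.0)):
    pi = interval(y_hat, se_pred(x0))
    ci = interval(y_hat, se_mean(x0))
    print(f"x0={x0:.0f}: SE_pred={se_pred(x0):.4f}, "
          f"95% PI=({pi[0]:.2f}, {pi[1]:.2f}), "
          f"95% CI for mean=({ci[0]:.2f}, {ci[1]:.2f})")
```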
Example
Shock data
• 90% prediction intervals for individual responses at number of shocks = 0, 8 and 15
• t_{0.05,14} = 1.761
Example
CEO data
• 95% CI