FCDS - RA ch2 Sp21
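$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \sum_{i=1}^{n} k_i Y_i, \qquad k_i = \frac{x_i-\bar{x}}{S_{xx}} \qquad (2.2)$$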
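As a quick numerical check of (2.2), here is a minimal Python sketch that computes the slope both as the ratio of sums and as the weighted sum $\sum k_i Y_i$ with $k_i = (x_i - \bar{x})/S_{xx}$. The helper name `slope_two_ways` and the five data points are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration data (not taken from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def slope_two_ways(x, y):
    """Return the least squares slope as (ratio of sums, weighted sum of Y_i)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ratio_form = s_xy / s_xx
    # k_i = (x_i - x_bar) / S_xx depends only on the x's, so b1 is linear in the Y_i
    k = [(xi - x_bar) / s_xx for xi in x]
    weighted_form = sum(ki * yi for ki, yi in zip(k, y))
    return ratio_form, weighted_form

b1_ratio, b1_weighted = slope_two_ways(x, y)
print(b1_ratio, b1_weighted)
```

Because the weights $k_i$ are fixed constants, this is exactly why $\hat{\beta}_1$ inherits normality from the $Y_i$.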
Since the errors $\varepsilon_i$ are NID$(0, \sigma^2)$, the observations $Y_i$ are NID$(\beta_0 + \beta_1 x_i, \sigma^2)$. Now $\hat{\beta}_1$ is a linear combination of the r.v.'s $Y_i$, so $\hat{\beta}_1$ is normally distributed with mean and variance (see (1.11))
$$E(\hat{\beta}_1) = \beta_1 \quad\text{and}\quad \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$
Therefore, the statistic
$$Z = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2/S_{xx}}}$$
is $\chi^2_{n-2}$. Furthermore, we can show that the MSE and $\hat{\beta}_1$ are independent r.v.'s. These
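where $t_{\alpha/2,n-2}$ denotes the upper $(\alpha/2)100$ percentage point (that is, the $100(1-\alpha/2)$ percentile) of the t distribution with $n-2$ d.f. Therefore, the $100(1-\alpha)\%$ confidence limits for $\beta_1$ are
$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\sqrt{\frac{\text{MSE}}{S_{xx}}}$$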
Example 2.1
Let us return to the Westwood Company lot size example 1.1. Suppose we want 95% confidence limits for $\beta_1$. From the results obtained in example 1.1 we have
$$\sqrt{\frac{\text{MSE}}{S_{xx}}} = \sqrt{\frac{7.5}{3400}} = 0.04697$$
and for a 95% confidence coefficient, we have from the t-distribution table
$$t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306$$
$$2.0 - 2.306(0.04697) \le \beta_1 \le 2.0 + 2.306(0.04697)$$
$$1.89 \le \beta_1 \le 2.11$$
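The arithmetic of Example 2.1 can be reproduced with a minimal Python sketch. The helper name `slope_ci` is hypothetical, and the critical value is passed in from the t table rather than computed; if SciPy is available, `scipy.stats.t.ppf(0.975, 8)` gives the same 2.306 to three decimals.

```python
import math

def slope_ci(b1, mse, s_xx, t_crit):
    """Return (standard error, lower, upper) for beta_1: b1 -/+ t * sqrt(MSE / S_xx)."""
    se = math.sqrt(mse / s_xx)
    half_width = t_crit * se
    return se, b1 - half_width, b1 + half_width

# Westwood Company summaries as quoted in Example 2.1
se_b1, lo, hi = slope_ci(b1=2.0, mse=7.5, s_xx=3400.0, t_crit=2.306)
print(f"se = {se_b1:.5f}, CI = ({lo:.2f}, {hi:.2f})")  # se = 0.04697, CI = (1.89, 2.11)
```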
given in Example 1.1, using regression model (2.1). The two alternative hypotheses are:
$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \ne 0$$
Hence,
$$|T_c| = \frac{|\hat{\beta}_1 - 0|}{\sqrt{\text{MSE}/S_{xx}}} = \frac{2.0}{0.04697} = 42.58$$
and $t_{\alpha/2,n-2} = t_{0.025,8} = 2.306$; hence $|T_c| > t_{\alpha/2,n-2}$, and therefore $H_0$ must be rejected.
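The same two-sided t test can be written as a small Python sketch. The function name `slope_t_test` is hypothetical; the critical value 2.306 is taken from the t table as in the text.

```python
import math

def slope_t_test(b1, mse, s_xx, t_crit, beta1_null=0.0):
    """Return (|T_c|, reject) for H0: beta_1 = beta1_null against a two-sided alternative."""
    t_c = abs(b1 - beta1_null) / math.sqrt(mse / s_xx)
    return t_c, t_c > t_crit

t_c, reject = slope_t_test(b1=2.0, mse=7.5, s_xx=3400.0, t_crit=2.306)
print(f"|Tc| = {t_c:.2f}, reject H0: {reject}")  # |Tc| = 42.58, reject H0: True
```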
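$$P\left(-t_{\alpha/2,\,n-2} \le \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \le t_{\alpha/2,\,n-2}\right) = 1 - \alpha$$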
Therefore, the $100(1-\alpha)\%$ confidence limits for $\beta_0$ are
$$\hat{\beta}_0 \pm t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$$
where $t_{\alpha/2,n-2}$ denotes the upper $(\alpha/2)100$ percentage point of the t distribution with $n-2$ d.f.
Example 2.2
Let us return to the Westwood Company lot size example 1.1. Suppose we want 95% confidence limits for $\beta_0$.
From the results obtained in Examples 1.1 and 2.1, the 95% confidence interval is given by
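$$\hat{\beta}_0 - t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$$
$$10.0 - 2.306(2.503) \le \beta_0 \le 10.0 + 2.306(2.503)$$
$$4.23 \le \beta_0 \le 15.77$$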
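A minimal Python sketch of the intercept interval in Example 2.2 follows. The helper `intercept_ci` is hypothetical; $n = 10$ is inferred from the 8 d.f., and $\bar{x} = 50$ from the center of Table 2.1 and the $(55 - 50)$ term in Example 2.4.

```python
import math

def intercept_ci(b0, mse, n, x_bar, s_xx, t_crit):
    """Return (standard error, lower, upper) for beta_0."""
    # se(b0) = sqrt(MSE * (1/n + x_bar^2 / S_xx))
    se = math.sqrt(mse * (1.0 / n + x_bar ** 2 / s_xx))
    return se, b0 - t_crit * se, b0 + t_crit * se

se_b0, lo, hi = intercept_ci(b0=10.0, mse=7.5, n=10, x_bar=50.0, s_xx=3400.0, t_crit=2.306)
print(f"se = {se_b0:.3f}, CI = ({lo:.2f}, {hi:.2f})")
```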
2.4 Interval Estimation of the Mean Response
Let $x_h$ be the level of the regressor variable for which we wish to estimate the mean response, say $E[Y \mid x_h]$. We assume that $x_h$ is any value of the regressor variable within the range of the original data on x used to fit the model. An unbiased point estimator of $E[Y \mid x_h]$ is found from the fitted model as
$$\hat{E}[Y \mid x_h] = \hat{y}_h = \hat{\beta}_0 + \hat{\beta}_1 x_h$$
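To obtain a $100(1-\alpha)\%$ C.I. on $E[Y \mid x_h]$, first note that $\hat{y}_h$ is a normally distributed r.v., because it is a linear combination of the normal r.v.'s $\hat{\beta}_0$ and $\hat{\beta}_1$. Since $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}$ and, as noted in chapter 1, $\operatorname{cov}(\bar{Y}, \hat{\beta}_1) = 0$, the variance of $\hat{y}_h$ is
$$\begin{aligned}
\operatorname{var}(\hat{y}_h) &= \operatorname{var}(\hat{\beta}_0 + \hat{\beta}_1 x_h) \\
&= \operatorname{var}\left[\bar{Y} + \hat{\beta}_1 (x_h - \bar{x})\right] \\
&= \operatorname{var}(\bar{Y}) + (x_h - \bar{x})^2 \operatorname{var}(\hat{\beta}_1) \\
&= \sigma^2\left[\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right]
\end{aligned}$$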
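The variance formula can be checked by simulation. This minimal Python sketch repeatedly draws $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with normal errors, refits the line, and compares the sample variance of $\hat{y}_h$ with $\sigma^2[1/n + (x_h - \bar{x})^2/S_{xx}]$. The design points, parameter values and the helper name `simulate_var_yhat` are all hypothetical.

```python
import random
import statistics

def simulate_var_yhat(x, beta0, beta1, sigma, x_h, reps=20000, seed=1):
    """Return (simulated var of y_hat_h, theoretical var) under the normal error model."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    fitted = []
    for _ in range(reps):
        # Draw Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
        b0 = y_bar - b1 * x_bar
        fitted.append(b0 + b1 * x_h)
    simulated = statistics.variance(fitted)
    theoretical = sigma ** 2 * (1.0 / n + (x_h - x_bar) ** 2 / s_xx)
    return simulated, theoretical

# Hypothetical design and parameters, chosen only for illustration
x = [20, 30, 40, 50, 60, 70, 80]
sim, theo = simulate_var_yhat(x, beta0=10.0, beta1=2.0, sigma=2.0, x_h=70)
print(f"simulated {sim:.3f} vs formula {theo:.3f}")
```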
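Thus the sampling distribution of the statistic
$$\frac{\hat{y}_h - E(Y \mid x_h)}{\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right)}}$$
is t with $n-2$ d.f. Consequently, a $100(1-\alpha)\%$ C.I. for the mean response at the point $x = x_h$ is
$$\hat{y}_h \pm t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right)} \qquad (2.11)$$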
where $t_{\alpha/2,n-2}$ denotes the upper $(\alpha/2)100$ percentage point of the t distribution with $n-2$ d.f.
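Example 2.3
Consider finding a 95% C.I. on $E(Y \mid x_h)$ for the Westwood Company lot size data in example 1.1. This confidence interval is found from (2.11) as
$$\hat{y}_h - t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right)} \le E(Y \mid x_h) \le \hat{y}_h + t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right)}$$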
If we substitute values of $x_h$ and the fitted value $\hat{y}_h$ at that value of $x_h$ into this last equation, we obtain the 95% C.I. on the mean response. Table 2.1 contains the 95% confidence limits on $E[Y \mid x_h]$ for several values of $x_h$. These confidence limits are illustrated graphically in Fig. 2.1. Note that the width of the confidence interval increases as $|x_h - \bar{x}|$ increases.
Table 2.1. 95% confidence limits on E[Y | xh] for the Westwood Company data.

xh    ŷh     Lower confidence limit   Upper confidence limit
20     50    46.185                   53.815
30     70    67.053                   72.947
40     90    87.728                   92.272
50    110    108.002                  111.998
60    130    127.728                  132.272
70    150    147.053                  152.947
80    170    166.185                  173.815
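Table 2.1 can be regenerated from (2.11) with a minimal Python sketch. The helper name `mean_response_ci` is hypothetical, and the inputs $n = 10$ and $\bar{x} = 50$ are inferred (8 d.f.; the table is symmetric about $x_h = 50$). The last digit can differ by one from the table because of rounding in the original hand calculation.

```python
import math

def mean_response_ci(x_h, b0, b1, mse, n, x_bar, s_xx, t_crit):
    """Return (y_hat_h, lower, upper) for E[Y | x_h] using (2.11)."""
    y_hat = b0 + b1 * x_h
    half = t_crit * math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / s_xx))
    return y_hat, y_hat - half, y_hat + half

# Westwood summaries; n and x_bar are inferred, see the lead-in
params = dict(b0=10.0, b1=2.0, mse=7.5, n=10, x_bar=50.0, s_xx=3400.0, t_crit=2.306)

for x_h in range(20, 81, 10):
    y_hat, lower, upper = mean_response_ci(x_h, **params)
    print(f"{x_h:3d} {y_hat:6.1f} {lower:9.3f} {upper:9.3f}")
```

The interval is narrowest at $x_h = \bar{x}$, since the $(x_h - \bar{x})^2$ term vanishes there.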
Fig. 2.1. Fitted line $\hat{y} = 10 + 2x$ with upper and lower 95% confidence limits on the mean response.
freedom.
Example 2.4
The Westwood Company wishes to predict the number of man-hours required in
the forthcoming production run of size 55 with a 95% prediction interval.
From the results given before we have,
$$\hat{y}_0 = 10 + 2(55) = 120$$
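Thus the 95% prediction interval is found from (2.8) as
$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{\text{MSE}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$$
$$120 - 2.306\sqrt{7.5\left(1.0 + 0.1 + \frac{(55-50)^2}{3400}\right)} \le Y_0 \le 120 + 2.306\sqrt{7.5\left(1.0 + 0.1 + \frac{(55-50)^2}{3400}\right)}$$
$$113.4 \le Y_0 \le 126.6$$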
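A minimal Python sketch of the prediction interval in Example 2.4 follows, assuming the same inferred inputs $n = 10$ and $\bar{x} = 50$. The function name `prediction_interval` is hypothetical.

```python
import math

def prediction_interval(x_0, b0, b1, mse, n, x_bar, s_xx, t_crit):
    """Return (y_hat_0, lower, upper) for a new observation Y_0 at x_0."""
    y_hat = b0 + b1 * x_0
    # The leading "1 +" is the variance of the new observation itself;
    # the remaining terms are the variance of the estimated mean at x_0.
    half = t_crit * math.sqrt(mse * (1.0 + 1.0 / n + (x_0 - x_bar) ** 2 / s_xx))
    return y_hat, y_hat - half, y_hat + half

y0, lo, hi = prediction_interval(55, b0=10.0, b1=2.0, mse=7.5, n=10,
                                 x_bar=50.0, s_xx=3400.0, t_crit=2.306)
print(f"y0 = {y0:.0f}, PI = ({lo:.1f}, {hi:.1f})")  # y0 = 120, PI = (113.4, 126.6)
```

Because of the extra 1 under the square root, a prediction interval is always wider than the confidence interval for the mean response at the same $x$.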
$$Y_i - \bar{Y} = (\hat{y}_i - \bar{Y}) + (Y_i - \hat{y}_i) \qquad (2.13)$$
$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n}(\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) \qquad (2.14)$$
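Note that the third term on the R.H.S. of (2.14) is zero since, with $e_i = Y_i - \hat{y}_i$,
$$2\sum_{i=1}^{n}(\hat{y}_i - \bar{Y})(Y_i - \hat{y}_i) = 2\sum_{i=1}^{n}\hat{y}_i(Y_i - \hat{y}_i) - 2\bar{Y}\sum_{i=1}^{n}(Y_i - \hat{y}_i) = 2\sum_{i=1}^{n}\hat{y}_i e_i - 2\bar{Y}\sum_{i=1}^{n} e_i = 0,$$
because the least squares residuals satisfy $\sum e_i = 0$ and $\sum \hat{y}_i e_i = 0$.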
Therefore (2.14) becomes
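$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{y}_i)^2,$$
that is, SST = SSR + SSE, where
$$\text{SST} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = S_{yy} = \text{Total Sum of Squares},$$
which measures the total variability in the observations.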
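$$\text{SSR} = \sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2 = \hat{\beta}_1 S_{xy} = \hat{\beta}_1^2 S_{xx} = \text{Regression Sum of Squares},$$
which measures the variability of the $Y_i$ that is associated with the regression line, and
$$\text{SSE} = \sum_{i=1}^{n}(Y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \text{SST} - \text{SSR} = S_{yy} - \hat{\beta}_1 S_{xy} = S_{yy} - \hat{\beta}_1^2 S_{xx} = \text{Residual Sum of Squares},$$
which measures the variability of the $Y_i$ about the fitted regression line, that is, the variation not explained by the regression.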
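The partition and the vanishing cross term can be checked numerically. This minimal Python sketch fits a line to hypothetical data (the helper name `anova_partition` and the data are illustrative only) and returns SST, SSR, SSE, the cross term of (2.14), and $\hat{\beta}_1 S_{xy}$.

```python
def anova_partition(x, y):
    """Return (SST, SSR, SSE, cross term of (2.14), b1 * S_xy) for a least squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    e = [yi - fi for yi, fi in zip(y, y_hat)]            # residuals e_i
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum(ei ** 2 for ei in e)
    cross = 2 * sum((fi - y_bar) * ei for fi, ei in zip(y_hat, e))
    return sst, ssr, sse, cross, b1 * s_xy

# Hypothetical illustration data (not from the text)
sst, ssr, sse, cross, b1_sxy = anova_partition([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(sst, ssr + sse, cross, ssr, b1_sxy)
```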
Table 2.2. The ANOVA table for testing the significance of the regression line.

Source       Sum of Squares   D.F.   Mean Square    Fc
Regression   SSR = 13600      1      MSR = 13600    Fc = 13600/7.5 = 1813
Residual     SSE = 60         8      MSE = 7.5
Total        SST = 13660      9
From the F distribution table we have $F_{0.05,1,8} = 5.32$, and since $F_c = 1813 > 5.32$, we reject $H_0: \beta_1 = 0$. The regression is highly significant, that is, there is a strong linear association between man-hours and lot size. This result is identical with that obtained when the t test was used in section 2.1; indeed, in simple linear regression $F_c = T_c^2$, and $42.58^2 \approx 1813$.
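A minimal Python sketch of the ANOVA F test, using the sums of squares from Table 2.2 and the tabled critical value 5.32, also confirms that $F_c$ equals the square of the t statistic for the slope. The helper name `f_test_from_anova` is hypothetical.

```python
import math

def f_test_from_anova(ssr, sse, n, f_crit):
    """Return (F_c, reject) for H0: beta_1 = 0 using MSR/MSE with 1 and n - 2 d.f."""
    msr = ssr / 1.0      # regression has 1 d.f. in simple linear regression
    mse = sse / (n - 2)  # residual has n - 2 d.f.
    f_c = msr / mse
    return f_c, f_c > f_crit

f_c, reject = f_test_from_anova(ssr=13600.0, sse=60.0, n=10, f_crit=5.32)
# t statistic for the slope with b1 = 2.0, MSE = 7.5, S_xx = 3400
t_c = 2.0 / math.sqrt(7.5 / 3400.0)
print(f"F_c = {f_c:.1f}, reject H0: {reject}, T_c^2 = {t_c ** 2:.1f}")
```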
0.403, that is, less than 50% of the total variation in the Yi is explained by the
regression.
If all observations fall on the fitted regression line, SSE = 0 and $R^2 = 1$. This case is shown in Fig. 2.2c.
If the slope of the fitted regression line is $\hat{\beta}_1 = 0$, so that $\hat{y}_i = \bar{Y}$ for all i, then SSR = 0, SSE = SST and $R^2 = 0$. This case is shown in Fig. 2.2d. Here, there is no linear association between X and Y in the sample data.
For the Westwood Company lot size example 1.1, we obtained SST = 13660 and SSE = 60; hence
$$R^2 = 1 - \frac{\text{SSE}}{S_{yy}} = 1 - \frac{60}{13660} = 0.996$$
Thus, the variation in man-hours is reduced by 99.6% when lot size is considered.
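The computation can be sketched in Python, showing that $1 - \text{SSE}/\text{SST}$ and $\text{SSR}/\text{SST}$ give the same number. The helper name `r_squared` is hypothetical.

```python
def r_squared(sst, sse):
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    return 1.0 - sse / sst

sst, sse = 13660.0, 60.0          # Westwood values quoted in the text
r2 = r_squared(sst, sse)
ssr_share = (sst - sse) / sst     # the same quantity written as SSR/SST
print(f"R^2 = {r2:.4f}")          # R^2 = 0.9956
```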
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
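Hence,
$$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{S_{xy}}{S_{xx}} \cdot \frac{S_{xy}}{S_{yy}} = \frac{\hat{\beta}_1 S_{xy}}{S_{yy}} = \frac{\text{SSR}}{S_{yy}} = R^2$$
and therefore
$$r = \pm\sqrt{R^2},$$
where $R^2$ is the coefficient of determination and the sign of r is that of $\hat{\beta}_1$ (equivalently, of $S_{xy}$).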
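For the Westwood data, a minimal Python sketch confirms $r^2 = R^2$. Here $S_{xy} = 6800$ is derived from the text rather than quoted: SSR $= \hat{\beta}_1 S_{xy}$ with SSR = 13600 and $\hat{\beta}_1 = 2$. The helper name `correlation_from_summaries` is hypothetical.

```python
import math

def correlation_from_summaries(s_xy, s_xx, s_yy):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    return s_xy / math.sqrt(s_xx * s_yy)

s_xx, s_yy = 3400.0, 13660.0
s_xy = 13600.0 / 2.0              # from SSR = b1 * S_xy with SSR = 13600, b1 = 2
r = correlation_from_summaries(s_xy, s_xx, s_yy)
r2 = 1.0 - 60.0 / s_yy            # R^2 = 1 - SSE / S_yy
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r2:.4f}")
```

The sign of r (positive here) follows $S_{xy}$, which is why $\sqrt{R^2}$ alone does not determine it.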
<+><+><+><+><+><+><+><+>
EXERCISES
d- Show that $E(\hat{\sigma}^2) = \sigma^2$ and $E(\text{MSR}) = \sigma^2 + \beta_1^2 S_{xx}$.
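e- Show that $\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$ is a random variable having the normal distribution with mean $\mu_{Y|x_0}$ and variance $\sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$. Use this result to set up a $100(1-\alpha)\%$ confidence interval for $\mu_{Y|x_0}$.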
[2] The following data show the optical densities of a certain substance at different
concentration levels. Assume that the simple linear regression model is
appropriate.
Level of concentration (X) : 80 120 160 200 240 280 320 360 400 440 480 520
Optical density (Y) : .17 .22 .28 .29 .36 .38 .43 .54 .55 .60 .62 .72
Summary calculation results are:
$\bar{x} = 300$, $\bar{y} = .4325$, $S_{xx} = 228800$, $S_{yy} = .334625$, $S_{xy} = 275$, $\hat{\beta}_0 = 0.07192$, $\hat{\beta}_1 = 0.0012$
a- Calculate the sample correlation coefficient.
b- Find the equation of the regression line describing the linear relationship
between the Level of concentration and Optical density.
c- Predict the Optical density when the Level of concentration is x = 0.48 units.
d- Estimate σ2.
e- Calculate the coefficient of determination and interpret the result.
f- Obtain a 98% confidence interval of the mean optical density for substances whose concentration level is 500.
g- Obtain a 98% prediction interval of the optical density for a substance whose concentration level is 500.
h- Is the prediction interval in (g) wider than the confidence interval in (f)? Should it be? Why?
i- Set up the ANOVA table, conduct an F test of whether or not β1 = 0, and state the decision rule and conclusion. Use α = 0.01.
j- What is the absolute magnitude of the reduction in the variation of Y when X is
introduced into the regression model? What is the relative reduction? What is the
name of the latter measure?
[3] The accompanying MINITAB output is based on some observational data for the
level of concentration (X) and optical density (Y).
Regression Analysis
The regression equation is
Y = █ + 0.0012 X
Predictor Coef Stdev t-ratio p
Constant 0.0719 0.01397 5.15 0.000
X █ 0.00004231 28.41 0.000
s = 0.02024 R-sq = █ R-sq(adj) = █
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 █ █ 806.92 0.000
Error █ █ 0.00041
Total 11 0.33463
The regression equation is
y = ♦ + 1.00 x
Predictor Coef SE Coef T P
Constant 20.000 1.749 ♦ 0.000
x ♦ 0.05538 18.06 0.000
S=1.73205 R-Sq=♦% R-Sq(adj)= ♦%
Analysis of Variance
Source DF SS MS F P
Regression 1 ♦ 978.0 ♦ 0.000
Residual Error ♦ ♦ 3.0
Total 9 1002.0
[7] In a study of the effect of a dietary component (X) on plasma lipid composition (Y),
the following data were obtained on a sample of 10 experimental animals
X 18 19 41 34 45 32 40 42 28 21
Y 35 39 53 54 66 50 57 62 46 38
a- Obtain the least squares estimates of β0 and β1 and state the estimated regression function.
b- Calculate the residuals and verify that their sum is zero. What is their relation to the error terms εi?
c- Obtain the estimate of σ2 directly from the residuals and using its formula.
d- Test the null hypothesis H0: β1 = 0 against the alternative Ha: β1 ≠ 0, using α = 0.01. What is your conclusion?
e- Obtain a point estimate and a 98% confidence interval of the expected plasma lipid composition when the dietary component X = 35.
f- Obtain a 98% prediction interval for a new observation of the plasma lipid composition when the dietary component X = 35.
g- Is the prediction interval in (f) wider than the confidence interval in (e)?
Should it be? Why?
[8] In the spaces provided, write T or F to indicate whether each statement is true or false.
a. The prediction interval for the response at a new observation $x_0$ is wider when $x_0$ is far from $\bar{x}$ ---
b. The regression line $Y = \beta_0 + \beta_1 x^2 + \varepsilon$ can be considered as a linear regression model ----
c. When all the observations in a given problem fall on the fitted regression line
then R2=1 ---
d. In regression analysis, if there is no linear relationship between X and Y, this
implies that we do not reject the null hypothesis β0 = 0 --
e. In regression analysis, the error terms $\varepsilon_i$ included in the hypothetical model must satisfy $\sum_{i=1}^{n} x_i \varepsilon_i = 0$ ---
<-><-><-><-><-><-><-><-><->