Chapter 2

INFERENCES IN REGRESSION ANALYSIS

2.1 Model Assumptions


We are often interested in testing hypotheses and constructing confidence
intervals about the model parameters. These procedures require, as we have stated in the
previous chapter, that we make the additional assumption of normality of the errors εi .
Therefore, throughout this chapter, we assume that the normal error regression model is
applicable. This model is:
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n \tag{2.1}$$
where
i- β0 and β1 are unknown parameters.
ii- The regressors x1, x2, ..., xn are known constants.
iii- The error terms ε1, ε2, ..., εn are assumed to be NID(0, σ²).
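
To make assumptions i-iii concrete, the following minimal Python sketch (not part of the original text) generates data from model (2.1). The parameter values 10, 2, and 7.5 are borrowed from the Westwood fit of Chapter 1 and treated here as if they were the true parameters; the design points are an assumption, chosen so that they match the summary values x̄ = 50 and Sxx = 3400 of Example 1.1.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumed "true" parameters (the Westwood fitted values of Chapter 1).
beta0, beta1, sigma = 10.0, 2.0, 7.5 ** 0.5

# Hypothetical design points with x-bar = 50 and Sxx = 3400.
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)

eps = rng.normal(0.0, sigma, size=x.size)   # errors NID(0, sigma^2)
y = beta0 + beta1 * x + eps                 # responses under model (2.1)
print(np.round(y, 1))
```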

2.2 Inferences Concerning β1


The point estimator of β1, given by (1.7), can be written as follows
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \sum_{i=1}^{n} k_i Y_i, \qquad k_i = \frac{x_i - \bar x}{S_{xx}} \tag{2.2}$$

Since the errors εi are NID(0, σ²), the observations Yi are NID(β0 + β1xi, σ²). Now β̂1 is a linear combination of the r.v.'s Yi, so β̂1 is normally distributed with mean and variance (see (1.11)),

$$E(\hat\beta_1) = \beta_1 \qquad \text{and} \qquad \operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}.$$
Therefore, the statistic
$$Z = \frac{\hat\beta_1 - \beta_1}{\sqrt{\sigma^2 / S_{xx}}}$$
is distributed as N(0, 1). However, the residual variance σ² is usually unknown and


therefore we cannot use the statistic Z for inferences about β1. The residual mean square MSE is an unbiased estimator of σ², and the statistic
$$U = \frac{(n-2)\,\mathrm{MSE}}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{e_i}{\sigma}\right)^2$$
is distributed as $\chi^2_{n-2}$. Furthermore, we can show that MSE and β̂1 are independent r.v.'s. These conditions imply that the statistic


$$T = \frac{Z}{\sqrt{U/(n-2)}} = \frac{(\hat\beta_1 - \beta_1)\big/\sqrt{\sigma^2/S_{xx}}}{\sqrt{\mathrm{MSE}/\sigma^2}} = \frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{MSE}/S_{xx}}} \tag{2.3}$$
is distributed as the t-distribution with (n - 2) degrees of freedom.

2.2.1 Confidence interval for β1


Since
$$T = \frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{MSE}/S_{xx}}} \sim t_{n-2},$$
then
$$P\left(-t_{\alpha/2,\,n-2} \le \frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{MSE}/S_{xx}}} \le t_{\alpha/2,\,n-2}\right) = 1 - \alpha,$$
from which it follows that
$$P\left(\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\mathrm{MSE}}{S_{xx}}} \;\le\; \beta_1 \;\le\; \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\mathrm{MSE}}{S_{xx}}}\right) = 1 - \alpha,$$
where $t_{\alpha/2,\,n-2}$ denotes the upper α/2 percentage point of the t distribution with n-2 d.f. Therefore the 100(1-α)% confidence limits for β1 are
$$\hat\beta_1 \pm t_{\alpha/2,\,n-2}\sqrt{\frac{\mathrm{MSE}}{S_{xx}}}.$$
Example 2.1
Let us return to the Westwood Company lot size example 1.1. Suppose we want a 95% confidence interval for β1. From the results obtained in Example 1.1 we have
$$\sqrt{\frac{\mathrm{MSE}}{S_{xx}}} = \sqrt{\frac{7.5}{3400}} = 0.04697,$$
and for a 95% confidence coefficient we have, from the t-distribution table,
$$t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306.$$
Therefore the 95% confidence interval is given by
$$\hat\beta_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\mathrm{MSE}}{S_{xx}}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\mathrm{MSE}}{S_{xx}}}$$
$$2.0 - 2.306(0.04697) \le \beta_1 \le 2.0 + 2.306(0.04697)$$
$$1.89 \le \beta_1 \le 2.11$$
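
As a quick numerical check, here is a minimal Python sketch (assuming scipy is available; the code is not part of the original text) that reproduces this interval from the summary statistics of Example 1.1:

```python
from scipy import stats

# Summary values from Examples 1.1 and 2.1
n, beta1_hat, MSE, Sxx = 10, 2.0, 7.5, 3400.0

se_beta1 = (MSE / Sxx) ** 0.5                  # estimated s.e. = 0.04697
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # t_{0.025, 8} = 2.306

lo = beta1_hat - t_crit * se_beta1
hi = beta1_hat + t_crit * se_beta1
print(f"95% CI for beta1: ({lo:.2f}, {hi:.2f})")   # (1.89, 2.11)
```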

2.2.2 Tests concerning β1


Suppose that we wish to test the hypothesis that the slope equals a specific value, say β10. The appropriate hypotheses are
$$H_0: \beta_1 = \beta_{10} \quad \text{vs} \quad H_a: \beta_1 \ne \beta_{10} \tag{2.4}$$
For such a test, the appropriate test statistic is the T statistic given by (2.3). Thus, if H0 is true, then
$$T = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\mathrm{MSE}/S_{xx}}} \sim t_{n-2} \tag{2.5}$$
The statistic T is used to test H0 by comparing the calculated value Tc from (2.5) with the upper α/2 percentage point of the t distribution with n-2 d.f., $t_{\alpha/2,\,n-2}$, and rejecting the null hypothesis if
$$|T_c| > t_{\alpha/2,\,n-2}.$$
A very important special case of (2.4) is
$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \ne 0 \tag{2.6}$$
This hypothesis relates to the significance of regression. Failing to reject H0: β1 = 0 implies that there is no linear relationship between X and Y. Note that this may imply either that X is of little value in explaining the variation in Y, so that the best estimator of Y for any x is ŷ = Ȳ, or that the true relationship between X and Y is not linear. Both situations are illustrated in Fig. 2.1.

Fig. 2.1 Situations where the hypothesis H0: β1 = 0 cannot be rejected


For example, suppose that we are interested in testing whether or not there is a linear association between man-hours and lot size for the Westwood Company data given in Example 1.1, using regression model (2.1). The two alternative hypotheses are:
$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \ne 0$$
Hence,
$$|T_c| = \frac{|\hat\beta_1 - 0|}{\sqrt{\mathrm{MSE}/S_{xx}}} = \frac{2.0}{0.04697} = 42.58$$
and $t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306$; hence $|T_c| > t_{\alpha/2,\,n-2}$, and therefore H0 must be rejected, indicating that the regression line is highly significant.
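
The same test can be carried out numerically; a minimal sketch (again assuming scipy; the p-value line is an addition, not computed in the text):

```python
from scipy import stats

n, beta1_hat, MSE, Sxx = 10, 2.0, 7.5, 3400.0
se_beta1 = (MSE / Sxx) ** 0.5

Tc = (beta1_hat - 0.0) / se_beta1               # 42.58
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)    # 2.306
p_val = 2 * stats.t.sf(abs(Tc), df=n - 2)       # two-sided p-value

print(f"|Tc| = {abs(Tc):.2f} > {t_crit:.3f}: reject H0 (p = {p_val:.1e})")
```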

2.3 Inferences Concerning β0


The point estimator of β0 was given in (1.8) as follows
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar x \tag{2.7}$$
Again, since the observations Yi are NID(β0 + β1xi, σ²) and β̂0 is a linear combination of the observations Yi, β̂0 is normally distributed with mean
$$E(\hat\beta_0) = \beta_0$$
and variance
$$\operatorname{var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar x^{\,2}}{S_{xx}}\right).$$
Therefore the statistic
$$Z = \frac{\hat\beta_0 - \beta_0}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar x^{\,2}}{S_{xx}}\right)}}$$
is distributed as N(0, 1). Similarly, the statistic
$$T = \frac{\hat\beta_0 - \beta_0}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar x^{\,2}}{S_{xx}}\right)}} \tag{2.8}$$
is distributed as the t-distribution with (n - 2) degrees of freedom.

2.3.1 Confidence interval for β0


Since
$$T = \frac{\hat\beta_0 - \beta_0}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar x^{\,2}}{S_{xx}}\right)}} \sim t_{n-2},$$
then
$$P\left(-t_{\alpha/2,\,n-2} \le \frac{\hat\beta_0 - \beta_0}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar x^{\,2}}{S_{xx}}\right)}} \le t_{\alpha/2,\,n-2}\right) = 1 - \alpha.$$
Therefore the 100(1-α)% confidence limits for β0 are
$$\hat\beta_0 \pm t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar x^{\,2}}{S_{xx}}\right)}$$
where $t_{\alpha/2,\,n-2}$ denotes the upper α/2 percentage point of the t distribution with n-2 d.f.

Example 2.2
Let us return to the Westwood Company lot size example 1.1. Suppose we want a 95% confidence interval for β0.
From the results obtained in Examples 1.1 and 2.1, the 95% confidence interval is given by
$$\hat\beta_0 - t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar x^{\,2}}{S_{xx}}\right)} \le \beta_0 \le \hat\beta_0 + t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar x^{\,2}}{S_{xx}}\right)}$$
$$10.0 - 2.306(2.503) \le \beta_0 \le 10.0 + 2.306(2.503)$$
$$4.23 \le \beta_0 \le 15.77$$
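
A minimal Python sketch (assuming scipy; not part of the original text) that rebuilds the standard error 2.503 and this interval from the Example 1.1 summaries:

```python
from scipy import stats

n, beta0_hat, MSE, Sxx, xbar = 10, 10.0, 7.5, 3400.0, 50.0

se_beta0 = (MSE * (1 / n + xbar ** 2 / Sxx)) ** 0.5   # 2.503
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)          # 2.306

lo = beta0_hat - t_crit * se_beta0
hi = beta0_hat + t_crit * se_beta0
print(f"95% CI for beta0: ({lo:.2f}, {hi:.2f})")      # (4.23, 15.77)
```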

2.3.2 Tests concerning β0


Suppose that we wish to test the hypothesis that the intercept equals a specific value, say β00. The appropriate hypotheses are
$$H_0: \beta_0 = \beta_{00} \quad \text{vs} \quad H_a: \beta_0 \ne \beta_{00} \tag{2.9}$$
For such a test, the appropriate test statistic is the T statistic given by (2.8). Thus, if H0 is true, the statistic
$$T = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar x^{\,2}}{S_{xx}}\right)}} \sim t_{n-2} \tag{2.10}$$
The statistic T is used to test H0 by comparing the calculated value Tc from (2.10) with the upper α/2 percentage point of the t distribution with n-2 d.f., $t_{\alpha/2,\,n-2}$, and rejecting the null hypothesis if
$$|T_c| > t_{\alpha/2,\,n-2}.$$

2.4 Interval Estimation of the Mean Response
Let xh be the level of the regressor variable for which we wish to estimate the mean response, say E[Y|xh]. We assume that xh is any value of the regressor variable within the range of the original data on x used to fit the model. An unbiased point estimator of E[Y|xh] is found from the fitted model as
$$\widehat{E[Y \mid x_h]} = \hat y_h = \hat\beta_0 + \hat\beta_1 x_h.$$
To obtain a 100(1-α)% C.I. on E[Y|xh], first note that ŷh is a normally distributed r.v. because it is a linear combination of the normal r.v.'s β̂0 and β̂1. Since, as noted in Chapter 1, cov(Ȳ, β̂1) = 0, the variance of ŷh is
$$\operatorname{var}(\hat y_h) = \operatorname{var}(\hat\beta_0 + \hat\beta_1 x_h) = \operatorname{var}\!\left(\bar Y + \hat\beta_1(x_h - \bar x)\right) = \operatorname{var}(\bar Y) + (x_h - \bar x)^2\operatorname{var}(\hat\beta_1) = \sigma^2\left(\frac{1}{n} + \frac{(x_h - \bar x)^2}{S_{xx}}\right).$$
Thus the sampling distribution of the statistic
$$\frac{\hat y_h - E(Y \mid x_h)}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{(x_h - \bar x)^2}{S_{xx}}\right)}}$$
is t with n-2 d.f. Consequently a 100(1-α)% C.I. for the mean response at the point x = xh is
$$\hat y_h \pm t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar x)^2}{S_{xx}}\right)} \tag{2.11}$$
where $t_{\alpha/2,\,n-2}$ denotes the upper α/2 percentage point of the t distribution with n-2 d.f.

Example 2.3
Consider finding a 95% C.I. on E(Y|xh) for the Westwood Company lot size data in Example 1.1. This confidence interval is found from (2.11) as
$$\hat y_h - t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar x)^2}{S_{xx}}\right)} \le E(Y \mid x_h) \le \hat y_h + t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(\frac{1}{n} + \frac{(x_h - \bar x)^2}{S_{xx}}\right)}$$
$$\hat y_h - 2.306\sqrt{7.5\left(0.1 + \frac{(x_h - 50)^2}{3400}\right)} \le E(Y \mid x_h) \le \hat y_h + 2.306\sqrt{7.5\left(0.1 + \frac{(x_h - 50)^2}{3400}\right)}$$
If we substitute a value of xh and the fitted value ŷh at that xh into this last equation, we obtain the 95% C.I. on the mean response. Table 2.1 contains the 95% confidence limits on E[Y|xh] for several values of xh. These confidence limits are illustrated graphically in Fig. 2.1. Note that the width of the confidence interval increases as |xh - x̄| increases.

xh    ŷh     Lower confidence limit    Upper confidence limit
20     50     46.185                    53.815
30     70     67.053                    72.947
40     90     87.728                    92.272
50    110    108.002                   111.998
60    130    127.728                   132.272
70    150    147.053                   152.947
80    170    166.185                   173.815

Table 2.1 Confidence limits on E[Y|xh] for several values of xh
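
Table 2.1 can be regenerated, up to rounding, with a short Python sketch (assuming scipy; not part of the original text):

```python
from scipy import stats

n, MSE, Sxx, xbar = 10, 7.5, 3400.0, 50.0
b0, b1 = 10.0, 2.0                         # fitted Westwood coefficients
t_crit = stats.t.ppf(0.975, df=n - 2)      # 2.306

for xh in (20, 30, 40, 50, 60, 70, 80):
    yh = b0 + b1 * xh                      # point estimate of E[Y|xh]
    half = t_crit * (MSE * (1 / n + (xh - xbar) ** 2 / Sxx)) ** 0.5
    print(f"xh = {xh:2d}: yh = {yh:5.1f}, CI = ({yh - half:.3f}, {yh + half:.3f})")
```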

2.5 Prediction of a New Observation


An important application of the regression model is the prediction of a new observation Y corresponding to a specified level of the regressor variable X. The distinction between estimation of the mean response E[Y|xh], discussed in the preceding section, and prediction of a new response Y0 (here Y0 is a future observation of Y corresponding to x = x0), discussed now, is basic. In the former case, we estimate the mean of the distribution of Y. In the present case, we predict an individual outcome drawn from the distribution of Y. Of course, the great majority of individual outcomes deviate from the mean response, and this must be allowed for in the procedure for predicting Y0.
Now consider obtaining an interval estimate of this future observation Y0. The C.I. on the mean response E[Y|x0] (Equation (2.11)) is inappropriate for this problem because it is an interval estimate on the mean of Y (a parameter), not a probability statement about future observations from the distribution. We will develop a prediction interval for the future observation Y0. Consider the r.v. Y0 - E[Y|x0]; since the mean response E[Y|x0] of Y0 is unknown, we take its unbiased point estimate ŷ0 and consider instead the r.v.
$$D = Y_0 - \hat y_0.$$

Fig. 2.1 Confidence band (upper and lower 95% confidence limits) for the regression line ŷ = 10 + 2x, Westwood Company example

Then D is normally distributed with mean zero and variance
$$\operatorname{var}(D) = \operatorname{var}(Y_0 - \hat y_0) = \operatorname{var}(Y_0) + \operatorname{var}(\hat y_0) = \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right),$$
because the future observation Y0 is independent of ŷ0. Thus the sampling distribution of the statistic
$$\frac{Y_0 - \hat y_0}{\sqrt{\mathrm{MSE}\left(1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{S_{xx}}\right)}}$$
is the t distribution with n-2 d.f. Consequently a 100(1-α)% prediction interval on a future observation at x0 is
$$\hat y_0 \pm t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)} \tag{2.12}$$
where $t_{\alpha/2,\,n-2}$ denotes the upper α/2 percentage point of the t distribution with n-2 degrees of freedom.

Example 2.4
The Westwood Company wishes to predict the number of man-hours required for a forthcoming production run of size 55, with a 95% prediction interval.
From the results given before we have
$$\hat y_0 = 10 + 2(55) = 120.$$
Thus the 95% prediction interval is found from (2.12) as
$$\hat y_0 - t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)} \le Y_0 \le \hat y_0 + t_{\alpha/2,\,n-2}\sqrt{\mathrm{MSE}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}$$
$$120 - 2.306\sqrt{7.5\left(1.0 + 0.1 + \frac{(55 - 50)^2}{3400}\right)} \le Y_0 \le 120 + 2.306\sqrt{7.5\left(1.0 + 0.1 + \frac{(55 - 50)^2}{3400}\right)}$$
$$113.4 \le Y_0 \le 126.6$$
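
The same prediction interval in a minimal Python sketch (assuming scipy; not part of the original text); note the extra "1 +" inside the square root compared with the mean-response interval:

```python
from scipy import stats

n, MSE, Sxx, xbar = 10, 7.5, 3400.0, 50.0
b0, b1, x0 = 10.0, 2.0, 55.0

y0_hat = b0 + b1 * x0                                    # 120
half = stats.t.ppf(0.975, df=n - 2) * (
    MSE * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx)) ** 0.5   # 6.65
print(f"95% PI at x0 = 55: ({y0_hat - half:.1f}, {y0_hat + half:.1f})")
# -> (113.4, 126.6), matching Example 2.4
```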

2.6 ANOVA Approach to Regression Analysis


An alternative way of testing the statistical hypotheses (2.6) is the analysis of variance procedure. The analysis of variance approach is based on the partitioning of sums of squares and degrees of freedom associated with the response variable Y. Consider the total deviation Yi - Ȳ, the basic quantity measuring the total variation of the observations Yi. We can decompose this deviation as follows:
$$\underbrace{Y_i - \bar Y}_{\substack{\text{total}\\ \text{deviation}}} = \underbrace{\hat y_i - \bar Y}_{\substack{\text{deviation of fitted}\\ \text{regression value}\\ \text{around mean}}} + \underbrace{Y_i - \hat y_i}_{\substack{\text{deviation around}\\ \text{fitted regression line}}} \tag{2.13}$$
Squaring both sides of (2.13) and summing over all n observations produces
$$\sum_{i=1}^{n}(Y_i - \bar Y)^2 = \sum_{i=1}^{n}(\hat y_i - \bar Y)^2 + \sum_{i=1}^{n}(Y_i - \hat y_i)^2 + 2\sum_{i=1}^{n}(\hat y_i - \bar Y)(Y_i - \hat y_i) \tag{2.14}$$
Note that the third term on the R.H.S. of (2.14) is zero since
$$2\sum_{i=1}^{n}(\hat y_i - \bar Y)(Y_i - \hat y_i) = 2\sum_{i=1}^{n}\hat y_i(Y_i - \hat y_i) - 2\bar Y\sum_{i=1}^{n}(Y_i - \hat y_i) = 2\sum_{i=1}^{n}\hat y_i e_i - 2\bar Y\sum_{i=1}^{n}e_i = 0.$$
Therefore (2.14) becomes
$$\sum_{i=1}^{n}(Y_i - \bar Y)^2 = \sum_{i=1}^{n}(\hat y_i - \bar Y)^2 + \sum_{i=1}^{n}(Y_i - \hat y_i)^2 \tag{2.15}$$
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$
where
$$\mathrm{SST} = \sum_{i=1}^{n}(Y_i - \bar Y)^2 = S_{yy} = \text{Total Sum of Squares},$$
which measures the total variability in the observations;
$$\mathrm{SSR} = \sum_{i=1}^{n}(\hat y_i - \bar Y)^2 = \hat\beta_1 S_{xy} = \hat\beta_1^2 S_{xx} = \text{Regression Sum of Squares},$$
which measures the variability of the Yi that is associated with the regression line; and
$$\mathrm{SSE} = \sum_{i=1}^{n}(Y_i - \hat y_i)^2 = \sum_{i=1}^{n} e_i^2 = \mathrm{SST} - \mathrm{SSR} = S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \hat\beta_1^2 S_{xx} = \text{Residual Sum of Squares},$$
which measures the residual variability of the data around the regression line.

Source        Sum of Squares            D.F.    Mean Square    Fc
Regression    SSR = β̂1² Sxx             1       MSR            Fc = MSR/MSE
Residual      SSE = Syy - β̂1² Sxx       n-2     MSE
Total         SST = Syy                 n-1

Table 2.2. The ANOVA table for testing significance of the regression line.

The corresponding ANOVA table is shown in Table 2.2.


To test the hypothesis H0: β1 = 0, the test statistic used in the ANOVA table is
$$F_c = \frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)} = \frac{\mathrm{MSR}}{\mathrm{MSE}},$$
where MSR and MSE are the regression and residual mean squares, respectively. The expected values of these mean squares are
$$E(\mathrm{MSE}) = \sigma^2 \qquad \text{and} \qquad E(\mathrm{MSR}) = \sigma^2 + \beta_1^2 S_{xx}.$$
Then, if the null hypothesis H0: β1 = 0 is true, the test statistic Fc follows the F distribution with 1 and n-2 degrees of freedom. The expected mean squares indicate that if the observed value of Fc is large, then it is likely that the slope β1 ≠ 0. Therefore, to test the hypothesis H0: β1 = 0, compute the test statistic Fc and reject H0 at significance level α if
$$F_c > F_{\alpha,\,1,\,n-2}.$$
For example, the ANOVA table for the Westwood Company example is shown in Table 2.3.

Source        Sum of Squares    D.F.    Mean Square    Fc
Regression    SSR = 13600       1       MSR = 13600    Fc = 13600/7.5 = 1813
Residual      SSE = 60          8       MSE = 7.5
Total         SST = 13660       9

Table 2.3. The ANOVA table for the Westwood Company.

From the F distribution table we have F(0.05, 1, 8) = 5.32, and since Fc = 1813 > 5.32, we reject H0; that is, the regression is highly significant and there is a strong linear association between man-hours and lot size. This result is identical to that obtained when the t test was used in section 2.2.2; indeed Fc is the square of the t statistic there (42.58² ≈ 1813).
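
A minimal Python sketch of this F test (assuming scipy; not part of the original text):

```python
from scipy import stats

SSR, SSE, n = 13600.0, 60.0, 10
MSR, MSE = SSR / 1, SSE / (n - 2)

Fc = MSR / MSE                             # 1813.3
F_crit = stats.f.ppf(0.95, 1, n - 2)       # F(0.05; 1, 8) = 5.32
p_val = stats.f.sf(Fc, 1, n - 2)

print(f"Fc = {Fc:.1f} > {F_crit:.2f}: reject H0 (p = {p_val:.1e})")
print(f"Check: Tc^2 = {42.58 ** 2:.0f}")   # square of the t statistic of 2.2.2
```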

2.7 Coefficient of Determination


The quantity
$$R^2 = \frac{\mathrm{SSR}}{S_{yy}} = 1 - \frac{\mathrm{SSE}}{S_{yy}}$$
is called the coefficient of determination. Since Syy is a measure of the variability in Y without considering the effect of the regressor variable X, and SSE is a measure of the variability in Y remaining after X has been considered, R² is often called the proportion of variation explained by the regressor X. Because
$$0 \le \mathrm{SSR} \le S_{yy},$$
it follows that
$$0 \le R^2 \le 1.$$
The coefficient of determination measures the closeness of fit of the regression equation to the observed values of Y. When the quantities (Yi - ŷi) are small, SSE is small. This leads to a large SSR, which leads in turn to a large value of R². This is illustrated in Fig. 2.2.
In Fig. 2.2a we see that the observations all lie close to the regression line, and we would expect R² to be large. In fact, the computed R² for these data is 0.986, indicating that about 99% of the total variation in the Yi is explained by the regression.
In Fig. 2.2b we illustrate a case where the Yi are widely scattered about the regression line, and there we suspect that R² is small. The computed R² for these data is 0.403; that is, less than 50% of the total variation in the Yi is explained by the regression.
If all observations fall on the fitted regression line, then SSE = 0 and R² = 1. This case is shown in Fig. 2.2c.
If the slope of the fitted regression line is β̂1 = 0, so that ŷi = Ȳ for all i, then SSR = 0, SSE = SST and R² = 0. This case is shown in Fig. 2.2d. Here there is no linear association between X and Y in the sample data.
For the Westwood Company lot size example 1.1, we obtained SST = 13660 and SSE = 60; hence
$$R^2 = 1 - \frac{\mathrm{SSE}}{S_{yy}} = 1 - \frac{60}{13660} = 0.996.$$
Thus, the variation in man-hours is reduced by 99.6% when lot size is considered.

Fig. 2.2 R² as a measure of closeness of fit of the regression line

2.7.1 Correlation Coefficient


The sample correlation coefficient between the variables X and Y is
$$r = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2 \cdot \sum (y_i - \bar y)^2}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$
Hence,
$$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \left(\frac{S_{xy}}{S_{xx}}\right)\frac{S_{xy}}{S_{yy}} = \hat\beta_1 \frac{S_{xy}}{S_{yy}} = \frac{\mathrm{SSR}}{S_{yy}} = R^2,$$
and therefore
$$r = \pm\sqrt{R^2} = \pm\sqrt{\text{coefficient of determination}},$$
where the sign of r is the sign of β̂1.

2.7.2 Adjusted Coefficient of Determination


The coefficient of determination R² is a biased estimator of the population coefficient of determination ρ². A less biased point estimator of ρ² is provided by
$$\bar R^2 = 1 - \frac{\mathrm{SSE}/(n-2)}{S_{yy}/(n-1)} = 1 - \frac{(n-1)\,\mathrm{MSE}}{\mathrm{SST}},$$
which is called the Adjusted Coefficient of Determination.
For the Westwood Company lot size example 1.1, we have
$$\bar R^2 = 1 - \frac{(n-1)\,\mathrm{MSE}}{\mathrm{SST}} = 1 - \frac{(9)(7.5)}{13660} = 0.995.$$
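
Both quantities follow directly from the ANOVA sums of squares; a minimal Python sketch (plain Python, not part of the original text):

```python
SST, SSE, n = 13660.0, 60.0, 10

R2 = 1 - SSE / SST                               # coefficient of determination
R2_adj = 1 - (SSE / (n - 2)) / (SST / (n - 1))   # adjusted coefficient

print(f"R^2     = {R2:.4f}")      # 0.9956
print(f"adj R^2 = {R2_adj:.4f}")  # 0.9951, slightly smaller than R^2
```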

<+><+><+><+><+><+><+><+>

EXERCISES

[1] For the simple linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n,$$
where ε1, ε2, ..., εn are independent N(0, σ²) r.v.'s, the least squares estimates of β0 and β1 are given by $\hat\beta_0 = \bar Y - \hat\beta_1 \bar x$ and $\hat\beta_1 = S_{xy}/S_{xx}$, and e1, e2, ..., en are the resulting residuals. Prove the following:
a- Show that $\operatorname{cov}(\hat\beta_0, \hat\beta_1) = -\bar x\,\sigma^2 / S_{xx}$.
b- Show that $\operatorname{cov}(\bar Y, \hat\beta_1) = 0$.
c- Show that $E(\hat\sigma^2) = E(\mathrm{MSE}) = \sigma^2$.
d- Show that $E(\mathrm{MSR}) = \sigma^2 + \beta_1^2 S_{xx}$.
e- Show that $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$ is a random variable having the normal distribution with mean $\mu_{Y|x_0}$ and variance $\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)$. Use this result to set up a 100(1-α)% confidence interval for $\mu_{Y|x_0}$.

[2] The following data show the optical densities of a certain substance at different concentration levels. Assume that the simple linear regression model is appropriate.

Level of concentration (X): 80  120 160 200 240 280 320 360 400 440 480 520
Optical density (Y):        .17 .22 .28 .29 .36 .38 .43 .54 .55 .60 .62 .72

Summary calculation results are:
$$\bar X = 300, \quad \bar Y = 0.4325, \quad S_{xx} = 228800, \quad S_{yy} = 0.334625, \quad S_{xy} = 275, \quad \hat\beta_0 = 0.07192, \quad \hat\beta_1 = 0.0012.$$
a- Calculate the sample correlation coefficient.
b- Find the equation of the regression line describing the linear relationship between the level of concentration and the optical density.
c- Predict the optical density when the level of concentration is x = 0.48 units.
d- Estimate σ².
e- Calculate the coefficient of determination and interpret the result.
f- Obtain a 98% confidence interval of the mean optical density for substances whose concentration level is 500.
g- Obtain a 98% prediction interval of the optical density for a substance whose concentration level is 500.
h- Is the prediction interval in (g) wider than the confidence interval in (f)? Should it be? Why?
i- Set up the ANOVA table and conduct an F test of whether or not β1 = 0; state the decision rule and conclusion. Use α = 0.01.
j- What is the absolute magnitude of the reduction in the variation of Y when X is introduced into the regression model? What is the relative reduction? What is the name of the latter measure?
[3] The accompanying MINITAB output is based on some observational data for the level of concentration (X) and optical density (Y).

Regression Analysis

The regression equation is
Y = █ + 0.0012 X

Predictor    Coef      Stdev         t-ratio    p
Constant     0.0719    0.01397       5.15       0.000
X            █         0.00004231    28.41      0.000

s = 0.02024    R-sq = █    R-sq(adj) = █

Analysis of Variance
SOURCE        DF    SS         MS         F         p
Regression    1     █          █          806.92    0.000
Error         █     █          0.00041
Total         11    0.33463

a- Fill in all places marked with █.
b- What is the sample size n?
c- What is the sample correlation coefficient? Is it positive or negative? Why?
d- Does there appear to be a useful linear relationship between the two variables? Why?
e- Given that $\bar X = 300$ and $S_{xx} = 228800$, obtain a 98% confidence interval of the mean optical density for substances whose concentration level is 500.
[4] a- In the simple linear regression model (2.1), suppose the value of the predictor X is replaced by cX, where c is some nonzero constant. How are β̂0, β̂1, σ̂², R², and the t-test of H0: β1 = 0 affected by this change?
b- Suppose each value of the response Y is replaced by dY, for some d ≠ 0. Repeat (a).
[5] Various doses of a poisonous substance were given to groups of 25 mice and the following results were observed:

Dose (mg) X:           4  6  8 10 12 14 16 18 20 22
Number of deaths Y:    1  3  8  9 14 16 20 21 26 27

a. Find the equation of the least squares line fitted to these data.
b. Find a 95% prediction interval for the number of deaths in a group of 25 mice that receive a 7 mg dose of this poison.
c. Test the null hypothesis H0: β1 = 0 against the alternative Ha: β1 ≠ 0, with α = 0.1.
[6] Data on the body weight (X) and metabolic clearance rate/body weight (Y) of cattle give the following MINITAB output:

Regression Analysis: y versus x

The regression equation is
y = ♦ + 1.00 x

Predictor    Coef      SE Coef    T        P
Constant     20.000    1.749      ♦        0.000
x            ♦         0.05538    18.06    0.000

S = 1.73205    R-Sq = ♦%    R-Sq(adj) = ♦%

Analysis of Variance
Source            DF    SS        MS       F    P
Regression        1     ♦         978.0    ♦    0.000
Residual Error    ♦     ♦         3.0
Total             9     1002.0

a- Fill in all places marked with ♦.
b- What is the sample size n?
c- What is the sample correlation coefficient? Is it positive or negative? Why?
d- Does there appear to be a useful linear relationship between the two variables? Why?
e- Given that $\bar X = 30$ and $S_{xx} = 978$, obtain a 98% confidence interval of the mean rate/body weight (Y) for a body weight (X) of 40.
f- Obtain a 98% prediction interval of the rate/body weight (Y) for a body weight (X) of 40.
g- Is the prediction interval in (f) wider than the confidence interval in (e)? Should it be? Why?

[7] In a study of the effect of a dietary component (X) on plasma lipid composition (Y), the following data were obtained on a sample of 10 experimental animals:

X    18  19  41  34  45  32  40  42  28  21
Y    35  39  53  54  66  50  57  62  46  38

a- Obtain the least squares estimates of β0 and β1 and state the estimated regression function.
b- Calculate the residuals and verify that their sum is zero. What is their relation to the error terms εi?
c- Obtain the estimate of σ² directly from the residuals and by using its formula.
d- Test the null hypothesis H0: β1 = 0 against the alternative Ha: β1 ≠ 0, with α = 0.01. What is your conclusion?
e- Obtain a point estimate and a 98% confidence interval of the expected plasma lipid composition when the dietary component X = 35.
f- Obtain a 98% prediction interval of the plasma lipid composition when the dietary component X = 35.
g- Is the prediction interval in (f) wider than the confidence interval in (e)? Should it be? Why?
[8] In the spaces provided, write T or F to indicate whether the statement is true or false.
a. The prediction interval for the response at a new observation x0 is wider when x0 is far from $\bar X$. ---
b. The regression model Y = β0 + β1x² + ε can be considered as a linear regression model. ---
c. When all the observations in a given problem fall on the fitted regression line, then R² = 1. ---
d. In regression analysis, if there is no linear relationship between X and Y, this implies that we do not reject the null hypothesis β0 = 0. ---
e. In regression analysis, the error terms εi included in the hypothetical model must satisfy $\sum_{i=1}^{n} x_i \varepsilon_i = 0$. ---
f. The adjusted R² is always smaller than R² itself. ---
g. In simple linear regression, if x1 > x2 then var(ŷ1) > var(ŷ2). ---

<-><-><-><-><-><-><-><-><->
