
University of Alberta

Department of Mathematical and Statistical Sciences

STAT 252

EXTRA FINAL REVIEW QUESTIONS

Instructor: Greg Wagner

SOLUTIONS

The questions below are extra parts of questions that appear on the Practice Final Exam. They are meant to help
you get ready for a greater variety of possible questions. For each question below, use the information and
incomplete computer output given in the corresponding question of the Practice Final Exam to answer it.

Question 3 (see Question 3 in Practice Final Exam for information and computer output)

(e) (1 mark): What percentage of the variation in cholesterol level in men is explained by the regression
of cholesterol level against fat consumption?

The percentage of the variation in the response variable that is explained by the regression on the explanatory
variable is the coefficient of determination (expressed as a percentage), which was calculated in part (b).
Therefore, 91.0% of the variation in cholesterol level in men is explained by the regression of cholesterol
level against fat consumption.

(f) (1 mark): Find the standard error of the model.

$$MS_{\text{ERROR}} = \frac{SS_{\text{ERROR}}}{n-2} = \frac{502.165}{8-2} = 83.6942$$

The standard error of the model is:

$$\hat{\sigma} = \sqrt{MS_{\text{ERROR}}} = \sqrt{83.6942} = 9.148$$
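The same arithmetic can be checked in a few lines of Python. This is a minimal sketch that only plugs in the SS ERROR and sample size quoted above; the variable names are illustrative and not taken from any course software.

```python
import math

# Values quoted in part (f): error sum of squares and sample size
ss_error = 502.165
n = 8

# Simple linear regression has n - 2 error degrees of freedom
ms_error = ss_error / (n - 2)

# The standard error of the model is the square root of MS_ERROR
sigma_hat = math.sqrt(ms_error)

print(f"MS_ERROR = {ms_error:.4f}, sigma_hat = {sigma_hat:.3f}")
```

Running it prints MS_ERROR = 83.6942 and sigma_hat = 9.148, matching the hand calculation.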

(g) (3 marks): Calculate a 99% confidence interval for the slope of the regression line. Use this
confidence interval to write a conclusion about whether the slope of the regression line is significant.

At the 99% confidence level, with df = 8 − 2 = 6, $t_{\alpha/2} = t_{0.01/2} = t_{0.005} = 3.707$

A 99% confidence interval for the slope is:

$$\hat{\beta}_1 \pm t_{\alpha/2} \cdot SE(\hat{\beta}_1)$$
$$1.790 \pm 3.707 \times 0.230$$
$$1.790 \pm 0.85261$$
$$(0.94,\ 2.64)$$

Interpretation: We can be 99% confident that the slope of the population regression line is somewhere
between 0.94 and 2.64.

Since 0 is not inside this confidence interval, we reject $H_0\!: \beta_1 = 0$ at the 1% significance level, so the
slope of the regression line is significantly different from 0. In other words, the slope of the regression line is significant.
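As a cross-check, the sketch below reproduces the table value $t_{0.005} = 3.707$ (df = 6) by integrating the Student's t density numerically with Simpson's rule and bisecting for the required upper-tail area, then rebuilds the interval. The helper names `t_pdf`, `t_upper_tail` and `t_critical` are hypothetical; in the course a t table does this lookup.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: 0.5 minus the area under the density on [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha: float, df: int) -> float:
    """Find t with P(T > t) = alpha by bisection (assumes 0 < alpha < 0.5)."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2

# Part (g): 99% confidence level, df = 8 - 2 = 6
t_crit = t_critical(0.01 / 2, 8 - 2)
b1_hat, se_b1 = 1.790, 0.230
margin = t_crit * se_b1
lower, upper = b1_hat - margin, b1_hat + margin

print(f"t = {t_crit:.3f}, CI = ({lower:.2f}, {upper:.2f})")
print("slope significant at 1%:", not (lower <= 0 <= upper))
```

The decision rule in the last line is the same one used above: the slope is significant at the 1% level exactly when 0 lies outside the 99% interval.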

(h) (3 marks): Although the question states that the data fit the assumptions of linear regression
analysis, use the graphs shown above to verify that each of the assumptions (excluding the independence
assumption) is met.

1. The linearity assumption is met since, in the scatterplot, the data points fall fairly closely along a
straight line. Moreover, in the residual plot the points scatter in a horizontal band around the line
residual = 0, with no curved pattern.
2. The assumption of equal standard deviations is met since, in the residual plot (as well as the
scatterplot), the spread of the points about the line is approximately the same for all values of x.
3. The normality assumption is met since, in the normal probability plot, the points fall roughly along a
straight line.

Question 4 (see Question 4 in Practice Final Exam for information and computer output)

(c) (5 marks): At the 1% significance level, test whether there is a main effect of Method (Method 1
versus 2) on the strength of the metals.

H0: There is no main effect of method on the strength of the metals. (The mean strengths for the two methods are equal.)
Ha: There is a main effect of method on the strength of the metals. (The mean strengths for the two methods
are different.)

$$F_A = \frac{SSA/(a-1)}{SSE/(n-ab)} = \frac{MSA}{MSE} = \frac{48.214/(2-1)}{53485.714/(42-3\cdot 2)} = \frac{48.214}{1485.714} = 0.032$$
df (method) = [(a − 1), (n − ab)] = [(2 − 1), (42 − (2)(3))] = (1, 36)
P > 0.25. There is weak or no evidence against H0.
Since P > α (0.01), do not reject H0.

Conclusion: The data do not provide sufficient evidence to conclude that there is a significant main effect
of method on strength of the metals.
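The F statistic and its P-value can be checked numerically. A sketch under stated assumptions: because the numerator has 1 degree of freedom, an F(1, 36) statistic is the square of a t statistic with 36 df, so the upper F tail equals a two-sided t tail, which is integrated here with Simpson's rule. The helper names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

a, b, n = 2, 3, 42            # 2 methods, 3 metals, 42 observations
ms_a = 48.214 / (a - 1)
ms_e = 53485.714 / (n - a * b)
f_a = ms_a / ms_e

# F(1, v) = T^2 with T ~ t(v), so P(F > f) = P(|T| > sqrt(f))
p_value = 2 * t_upper_tail(math.sqrt(f_a), n - a * b)

print(f"F_A = {f_a:.3f}, P = {p_value:.3f}")
print("reject H0 at 1%:", p_value < 0.01)
```

The computed P-value is well above 0.25, consistent with the table-based bound.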

(d) (5 marks): At the 1% significance level, test whether there is a main effect of the type of metal (steel,
alloy and titanium) on the strength.

H0: There is no main effect of the type of metal on the strength. (The mean strengths for the three metals are equal.)
Ha: There is a main effect of the type of metal on the strength. (At least two of the metal means are different.)

$$F_B = \frac{SSB/(b-1)}{SSE/(n-ab)} = \frac{MSB}{MSE} = \frac{30389.286/(3-1)}{53485.714/(42-3\cdot 2)} = \frac{15194.643}{1485.714} = 10.227$$
df (metal) = [(b − 1), (n − ab)] = [(3 − 1), (42 − (2)(3))] = (2, 36)
P < 0.001. There is extremely strong evidence against H0.
Since P < α (0.01), reject H0.

Conclusion: The data provide sufficient evidence to conclude that there is a significant main effect of the
type of metal on strength.
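When the numerator has 2 degrees of freedom, the F distribution has a closed-form upper tail, $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, where $\nu_2$ is the denominator df. The sketch below uses that standard property of F(2, ν₂) (it is not part of the course output) to confirm that P < 0.001. The function name is hypothetical.

```python
def f2_upper_tail(f: float, df2: int) -> float:
    """Exact P(F > f) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

a, b, n = 2, 3, 42            # 2 methods, 3 metals, 42 observations
ms_b = 30389.286 / (b - 1)
ms_e = 53485.714 / (n - a * b)
f_b = ms_b / ms_e
p_value = f2_upper_tail(f_b, n - a * b)

print(f"F_B = {f_b:.3f}, P = {p_value:.5f}")
print("reject H0 at 1%:", p_value < 0.01)
```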

(e) (5 marks): At the 1% significance level, test whether the effect of Method on the strength of the
metals depends upon the type of Metal being produced, in other words, test whether there is an
interaction effect between Method and Metal.

H0: There is no interaction effect between method and type of metal.


Ha: There is an interaction effect between method and type of metal.

$$F_{AB} = \frac{SSAB/[(a-1)(b-1)]}{SSE/(n-ab)} = \frac{MSAB}{MSE} = \frac{10246.429/[(2-1)(3-1)]}{53485.714/(42-3\cdot 2)} = \frac{5123.214}{1485.714} = 3.448$$

df (interaction) = [(a − 1)(b − 1), (n − ab)] = [(2 − 1)(3 − 1), (42 − (2)(3))] = (2, 36)
0.025 < P < 0.05. So, there is strong evidence against H0.
Since P > α (0.01), do not reject H0. (Rejecting at the 1% level would require P < 0.01, that is, very strong evidence.)

Conclusion: The data do not provide sufficient evidence to conclude that there is a significant interaction
effect between method and type of metal.
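The interaction numerator also has (2 − 1)(3 − 1) = 2 degrees of freedom, so the same closed-form F(2, ν₂) tail applies. This sketch (hypothetical helper name) shows a P-value of roughly 0.04: significant at the 5% level but not at the 1% level used here.

```python
def f2_upper_tail(f: float, df2: int) -> float:
    """Exact P(F > f) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

a, b, n = 2, 3, 42            # 2 methods, 3 metals, 42 observations
ms_ab = 10246.429 / ((a - 1) * (b - 1))
ms_e = 53485.714 / (n - a * b)
f_ab = ms_ab / ms_e
p_value = f2_upper_tail(f_ab, n - a * b)

print(f"F_AB = {f_ab:.3f}, P = {p_value:.4f}")
print("reject H0 at 1%:", p_value < 0.01)
```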

Question 5 (see Question 5 in Practice Final Exam for information and computer output)

(f) (2 marks): What percentage of the variation in diatom density is explained by (or accounted for by) the
regression model? (Note: Determine the adjusted percentage.)

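$$SS_{\text{TOTAL}} = SS_{\text{REGR}} + SS_{\text{ERROR}} = 7510.626 + 9.374 = 7520$$

$$MS_{\text{TOTAL}} = \frac{SS_{\text{TOTAL}}}{n-1} = \frac{7520}{9-1} = 940$$

$$MS_{\text{ERROR}} = \frac{SS_{\text{ERROR}}}{n-(k+1)} = \frac{9.374}{9-(3+1)} = 1.8748$$

$$R^2_{adj} = 1 - \frac{MS_{\text{ERROR}}}{MS_{\text{TOTAL}}} = 1 - \frac{1.8748}{940} = 0.998$$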

The adjusted coefficient of determination shows that 99.8% of the variation in diatom density is explained
by (or accounted for by) the regression model.
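A minimal sketch of the same calculation, using only the sums of squares quoted above. It also prints the unadjusted R² for comparison; with only 3 predictors and a very small error sum of squares, the two are nearly equal.

```python
# Values quoted in part (f) of Question 5
ss_regr, ss_error = 7510.626, 9.374
n, k = 9, 3  # 9 observations, 3 predictors (depth, light, interaction term)

ss_total = ss_regr + ss_error
ms_total = ss_total / (n - 1)
ms_error = ss_error / (n - (k + 1))

r2 = ss_regr / ss_total            # ordinary coefficient of determination
r2_adj = 1 - ms_error / ms_total   # penalized for the number of predictors

print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.3f}")
```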

(g) (4 marks): At the 1% significance level, perform a hypothesis test to determine whether light intensity
is useful in predicting diatom density (in other words, whether the relationship between light intensity
and diatom density is significant).

H0: $\beta_2 = 0$ (Light intensity is not useful for predicting diatom density.)

Ha: $\beta_2 \neq 0$ (Light intensity is useful for predicting diatom density.)

$$t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = \frac{0.639}{0.152} = 4.204$$

df = n − (k + 1) = 9 − (3 + 1) = 5
P-value: 0.0025 < P(one tail) < 0.005, so doubling for the two-sided test gives 0.005 < P < 0.01. There is very strong evidence against H0.
Since P < α (0.01), reject H0.
Conclusion: At the 1% significance level, the data provide sufficient evidence to conclude that light
intensity is useful for predicting diatom density; in other words, the relationship between light intensity
and diatom density is significant.
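The two-sided P-value can be computed instead of bracketed from a table. This sketch integrates the t density with Simpson's rule (hypothetical helper names, standard library only) and doubles the upper tail because Ha is two-sided.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

beta2_hat, se_beta2 = 0.639, 0.152
n, k = 9, 3
df = n - (k + 1)
t_stat = beta2_hat / se_beta2
p_value = 2 * t_upper_tail(abs(t_stat), df)  # two-sided alternative

print(f"t = {t_stat:.3f}, df = {df}, P = {p_value:.4f}")
print("reject H0 at 1%:", p_value < 0.01)
```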

(h) (4 marks): At the 1% significance level, perform a hypothesis test to determine whether there is a
significant negative relationship between depth and diatom density.

H0: $\beta_1 = 0$ (There is no relationship between depth and diatom density.)

Ha: $\beta_1 < 0$ (There is a negative relationship between depth and diatom density.)

$$t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-0.306}{0.118} = -2.593$$

df = n − (k + 1) = 9 − (3 + 1) = 5
P-value: 0.02 < P < 0.025. There is strong evidence against H0.
Since P > α (0.01), do not reject H0.

Conclusion: At the 1% significance level, the data do not provide sufficient evidence to conclude that
there is a significant negative relationship between depth and diatom density.
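The same decision can be reached with the critical-value approach. This sketch hard-codes upper-tail critical values for df = 5 from a standard t table (2.571, 2.757 and 3.365 for tail areas 0.025, 0.02 and 0.01); for a left-tailed test, H0 is rejected only if t falls below −t₀.₀₁.

```python
beta1_hat, se_beta1 = -0.306, 0.118
t_stat = beta1_hat / se_beta1

# Upper-tail t critical values for df = 5, copied from a standard t table
T_TABLE_DF5 = {0.025: 2.571, 0.02: 2.757, 0.01: 3.365}

# Left-tailed test (Ha: beta_1 < 0): reject H0 only if t < -t_{0.01}
reject = t_stat < -T_TABLE_DF5[0.01]

# |t| lies between t_{0.025} and t_{0.02}, which brackets 0.02 < P < 0.025
bracketed = T_TABLE_DF5[0.025] < abs(t_stat) < T_TABLE_DF5[0.02]

print(f"t = {t_stat:.3f}, reject H0 at 1%: {reject}, 0.02 < P < 0.025: {bracketed}")
```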

(i) (1 mark): The marine ecologist only recorded depth, light intensity and diatom density. How, then,
were the numbers for the interaction term obtained?

The values of the interaction term were obtained by multiplying each water-depth value by the
corresponding light-intensity value.
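Building the interaction column is an element-wise product of the two predictor columns. A minimal sketch using only the single pair of values quoted in part (j); with the full data set the same comprehension would run over every observation.

```python
# Part (d) values: depth = 70 m, light intensity = 18%
depths = [70]
lights = [18]

# Interaction term: depth multiplied by light intensity, observation by observation
interaction = [d * l for d, l in zip(depths, lights)]
print(interaction)
```

This reproduces the interaction value of 1260 used in part (j).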

(j) (4 marks): Based on the values of the predictor variables given in part (d) (depth = 70 m, light = 18%,
interaction term = 1260), what is the 95% confidence interval for mean diatom density at those values
of the predictor variables? [Note again: SE(Fit) = 0.793]

At df = 5, $t_{\alpha/2} = t_{0.05/2} = t_{0.025} = 2.571$

Based on the values of the predictor variables given in part (d), $\hat{y}_p = 22.77$.

$$\hat{y}_p \pm t_{\alpha/2} \cdot SE(\text{Fit})$$
$$22.77 \pm 2.571 \times 0.793$$
$$22.77 \pm 2.039$$
$$(20.73,\ 24.81)$$
We are 95% confident that mean diatom density at the values of the predictor variables given in part (d) is
between 20.73 and 24.81 cells per ml.
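A short sketch of the interval arithmetic, taking $\hat{y}_p$, SE(Fit) and the table value $t_{0.025} = 2.571$ directly from the solution above.

```python
y_hat, se_fit = 22.77, 0.793
t_crit = 2.571  # t_{0.025} with df = 5, from the t table

margin = t_crit * se_fit
lower, upper = y_hat - margin, y_hat + margin
print(f"95% CI for the mean response: ({lower:.2f}, {upper:.2f})")
```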

(k) (3 marks): Compare the length of the prediction interval in part (e) with the confidence interval in part
(j). Explain the difference between these two confidence intervals and explain any possible difference
in their lengths.

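Based on the prediction interval in part (e), if we take random samples where the predictor variables have
the values given in part (d), we can be 95% confident that a single observation of diatom density will be
between 18.70 and 26.84 cells per ml; whereas, based on the confidence interval in part (j), we can be
95% confident that the mean diatom density at those values of the predictor variables is between 20.73
and 24.81 cells per ml. The prediction interval (length 26.84 − 18.70 = 8.14) is about twice as long as the
confidence interval (length 24.81 − 20.73 = 4.08). The reason is that the confidence interval accounts only
for the uncertainty in estimating the mean response, SE(Fit), while the prediction interval must also account
for the scatter of individual observations about that mean, so it uses $SE(\text{Pred}) = \sqrt{MS_{\text{ERROR}} + SE(\text{Fit})^2}$.
Consequently, at the same values of the predictor variables and the same confidence level, the confidence
interval for the mean response is always shorter than the prediction interval for a single response.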
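The comparison can be made concrete with a short sketch. It assumes part (e) used the usual prediction-interval formula, $SE(\text{Pred}) = \sqrt{MS_{\text{ERROR}} + SE(\text{Fit})^2}$, with $MS_{\text{ERROR}} = 9.374/5$ from part (f); under that assumption it reproduces the interval (18.70, 26.84) quoted above.

```python
import math

y_hat, se_fit, t_crit = 22.77, 0.793, 2.571
ms_error = 9.374 / (9 - (3 + 1))   # MS_ERROR from part (f)

# Confidence interval for the mean response: uses SE(Fit) only
ci_margin = t_crit * se_fit

# Prediction interval for a single response: adds the error variance MS_ERROR
se_pred = math.sqrt(ms_error + se_fit ** 2)
pi_margin = t_crit * se_pred

ci = (y_hat - ci_margin, y_hat + ci_margin)
pi = (y_hat - pi_margin, y_hat + pi_margin)
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f}), length {2 * ci_margin:.2f}")
print(f"PI = ({pi[0]:.2f}, {pi[1]:.2f}), length {2 * pi_margin:.2f}")
```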
