Stat 252-Extra Final Review Questions-Greg-Solutions
STAT 252
SOLUTIONS
The questions below are extra parts of questions that appear on the Practice Final Exam. They are meant to help
you get ready for a greater variety of possible questions. For each question below, use the information and
incomplete computer output given in the corresponding question of the Practice Final Exam to answer it.
Question 3 (see Question 3 in Practice Final Exam for information and computer output)
(e) (1 mark): What percentage of the variation in cholesterol level in men is explained by the regression
of cholesterol level against fat consumption?
The percentage of variation in the response variable that is explained by the variation in the explanatory
variable is the coefficient of determination, which was calculated in part (b).
Therefore, 91.0% of the variation in cholesterol level in men is explained by the regression of cholesterol
level against fat consumption.
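$$\text{MS}_{\text{ERROR}} = \frac{\text{SS}_{\text{ERROR}}}{n-2} = \frac{502.165}{8-2} = 83.6942$$
The standard error of the model is: $\hat{\sigma} = \sqrt{\text{MS}_{\text{ERROR}}} = \sqrt{83.6942} = 9.148$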
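The same arithmetic can be checked with a short script. This is a minimal sketch that only reuses the numbers above (SS ERROR = 502.165, n = 8); the function name is illustrative, not from the course materials.

```python
import math

def ms_error(ss_error, n):
    """Mean square error for simple linear regression: SS_ERROR / (n - 2)."""
    return ss_error / (n - 2)

mse = ms_error(502.165, 8)
sigma_hat = math.sqrt(mse)  # standard error of the model
print(round(mse, 4), round(sigma_hat, 3))  # 83.6942 9.148
```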
(g) (3 marks): Calculate a 99% confidence interval for the slope of the regression line. Use this
confidence interval to write a conclusion about whether the slope of the regression line is significant.
Interpretation: We can be 99% confident that the slope of the population regression line is somewhere
between 0.94 and 2.64.
Since 0 is not inside this confidence interval, we reject the hypothesis that the population slope is 0 at the
1% significance level; the slope of the regression line is significantly different from 0. In other words, the
slope of the regression line is significant.
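The interval has the form b1 ± t × SE(b1), and the conclusion rests only on whether 0 lies inside it. A minimal sketch of both steps follows. The slope estimate b1 and its standard error SE(b1) come from the Practice Final output and are not reproduced here, so the example applies the decision only to the reported endpoints. The critical value t = 3.707 (99%, df = n − 2 = 6) is the standard t-table value; the function names are hypothetical.

```python
def slope_ci(b1, se_b1, t_crit):
    """Confidence interval b1 +/- t_crit * SE(b1) for a regression slope."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin

def excludes_zero(lower, upper):
    """True when 0 is outside [lower, upper], i.e. the slope is significant."""
    return not (lower <= 0 <= upper)

T_CRIT_99_DF6 = 3.707  # t_{0.005} with df = 8 - 2 = 6, from a t table

lower, upper = 0.94, 2.64  # reported 99% interval from part (g)
print(excludes_zero(lower, upper))  # True
```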
(h) (3 marks): Although the question states that the data fit the assumptions of linear regression
analysis, use the graphs shown above to verify that each of the assumptions is met (excluding
the independence assumption).
1. The assumption of linearity can be verified since, in the scatterplot, the data points fall fairly closely
along a straight line. Moreover, in the residual plot the residuals scatter in a horizontal band around zero,
with no curved pattern.
2. The assumption of equal standard deviations can be verified in the residual plot (as well as the
scatterplot), since the spread of the y-values about the line is approximately the same for all values of x.
3. The assumption concerning normality can be verified since, in the normal probability plot, the data
points fall roughly along a straight line.
Question 4 (see Question 4 in Practice Final Exam for information and computer output)
(c) (5 marks): At the 1% significance level, test whether there is a main effect of Method (Method 1
versus 2) on the strength of the metals.
H0: There is no main effect of method on the strength of the metals. (The mean strengths for the two methods
are equal.)
Ha: There is a main effect of method on the strength of the metals. (The mean strengths for the two methods
are different.)
Conclusion: The data do not provide sufficient evidence to conclude that there is a significant main effect
of method on strength of the metals.
(d) (5 marks): At the 1% significance level, test whether there is a main effect of the type of metal (steel,
alloy and titanium) on the strength.
H0: There is no main effect of the type of metal on the strength. (The mean strengths for the three metals are
equal.)
Ha: There is a main effect of the type of metal on the strength. (At least two of the mean strengths for the
metals are different.)
Conclusion: The data provide sufficient evidence to conclude that there is a significant main effect of the
type of metal on strength.
(e) (5 marks): At the 1% significance level, test whether the effect of Method on the strength of the
metals depends on the type of Metal being produced; in other words, test whether there is an
interaction effect between Method and Metal.
Conclusion: The data do not provide sufficient evidence to conclude that there is a significant interaction
effect between method and type of metal.
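Each of the tests in parts (c) to (e) uses the same recipe: the F-ratio MS(effect) / MS(error) from one row of the two-way ANOVA table, and a decision that compares the P-value with α = 0.01. A minimal sketch follows, assuming 2 methods and 3 metals, so the numerator degrees of freedom are 1 (Method), 2 (Metal) and (2 − 1)(3 − 1) = 2 (interaction). The mean squares and P-values themselves are in the Practice Final output and are not reproduced here; the function names are hypothetical.

```python
def effect_dfs(a, b):
    """Numerator df for factor A, factor B and the A x B interaction."""
    return a - 1, b - 1, (a - 1) * (b - 1)

def f_ratio(ms_effect, ms_error):
    """F statistic for one row of a two-way ANOVA table."""
    return ms_effect / ms_error

def decide(p_value, alpha=0.01):
    """Reject H0 when the P-value is at most the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

print(effect_dfs(2, 3))  # (1, 2, 2)
```

Applying `decide` to the P-value in the Method, Metal and Method*Metal rows gives the three conclusions stated above.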
Question 5 (see Question 5 in Practice Final Exam for information and computer output)
(f) (2 marks): What percentage of the variation in diatom density is explained by (or accounted for by) the
regression model? (Note: Determine the adjusted percentage.)
The adjusted coefficient of determination shows that 99.8% of the variation in diatom density is explained
by (or accounted for by) the regression model.
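The adjusted coefficient of determination penalises R² for the number of predictors. A minimal sketch of the usual formula, assuming n = 9 observations and k = 3 predictors (depth, light intensity and the interaction term, consistent with the df calculation in part (h)). The unadjusted R² comes from the Practice Final output and is not reproduced here, so no value is plugged in; the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# For this model: adjusted_r2(r2, n=9, k=3), with r2 read from the output.
```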
(g) (4 marks): At the 1% significance level, perform a hypothesis test to determine whether light intensity
is useful in predicting diatom density (in other words, whether the relationship between light intensity
and diatom density is significant).
(h) (4 marks): At the 1% significance level, perform a hypothesis test to determine whether there is a
significant negative relationship between depth and diatom density.
df = n − (k + 1) = 9 − (3 + 1) = 5
P-value: 0.02 < P < 0.025. There is strong evidence against H0.
Since P > α = 0.01, do not reject H0.
Conclusion: At the 1% significance level, the data do not provide sufficient evidence to conclude that
there is a significant negative relationship between depth and diatom density.
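When a t table only brackets the P-value, the decision is clear as long as the whole bracket lies on one side of α. A minimal sketch of the df calculation and that decision rule, using only the numbers in part (h) (n = 9, k = 3, 0.02 < P < 0.025, α = 0.01); the function names are hypothetical.

```python
def residual_df(n, k):
    """Error degrees of freedom for multiple regression with k predictors."""
    return n - (k + 1)

def decide_from_bracket(p_low, p_high, alpha):
    """Decision when a t table only shows p_low < P < p_high."""
    if p_high <= alpha:
        return "reject H0"
    if p_low >= alpha:
        return "do not reject H0"
    return "bracket straddles alpha; an exact P-value is needed"

print(residual_df(9, 3))                       # 5
print(decide_from_bracket(0.02, 0.025, 0.01))  # do not reject H0
```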
(i) (1 mark): The marine ecologist only recorded depth, light intensity and diatom density. How, then,
were the numbers for the interaction term obtained?
The numbers in the interaction term were arrived at by multiplying each value for water depth by the
corresponding value for light intensity.
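A minimal sketch of building the interaction column, assuming depth and light intensity are stored as two equal-length lists (the raw data are not reproduced here). The single-value check uses the predictor values from part (d): 70 × 18 = 1260. The function names are hypothetical.

```python
def interaction_term(depth, light):
    """Interaction predictor: water depth (m) times light intensity (%)."""
    return depth * light

def interaction_column(depths, lights):
    """Multiply each depth by the corresponding light intensity."""
    if len(depths) != len(lights):
        raise ValueError("depths and lights must have the same length")
    return [interaction_term(d, l) for d, l in zip(depths, lights)]

print(interaction_term(70, 18))  # 1260
```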
(j) (4 marks): Based on the values of the predictor variables given in part (d) (depth = 70 m, light = 18%,
interaction term = 1260), what is the 95% confidence interval for mean diatom density at those values
of the predictor variables? [Note again: SE(Fit) = 0.793]
$$\hat{y}_p \pm t_{\alpha/2}\,\mathrm{SE(Fit)} = 22.77 \pm (2.571)(0.793) = 22.77 \pm 2.039 = (20.73,\ 24.81)$$
We are 95% confident that mean diatom density at the values of the predictor variables given in part (d) is
between 20.73 and 24.81 cells per ml.
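The interval arithmetic can be reproduced directly. A minimal sketch using the fitted value 22.77, SE(Fit) = 0.793 and t_{0.025} = 2.571 (df = 5) from the solution above; the function name is hypothetical.

```python
def mean_response_ci(y_hat, t_crit, se_fit):
    """Confidence interval for the mean response: y_hat +/- t_crit * SE(Fit)."""
    margin = t_crit * se_fit
    return y_hat - margin, y_hat + margin

low, high = mean_response_ci(22.77, 2.571, 0.793)
print(round(low, 2), round(high, 2))  # 20.73 24.81
```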
(k) (3 marks): Compare the length of the prediction interval in part (e) with the confidence interval in part
(j). Explain the difference between these two intervals and explain any possible difference
in their lengths.
Based on the prediction interval in part (e), we can be 95% confident that a single new observation of
diatom density at the predictor values given in part (d) will be between 18.70 and 26.84 cells per ml;
whereas, based on the confidence interval in part (j), we can be 95% confident that the mean diatom
density at those same predictor values is between 20.73 and 24.81 cells per ml. The prediction interval is
about twice as long (26.84 − 18.70 = 8.14 versus 24.81 − 20.73 = 4.08). The reason for this is that a
prediction interval must account both for the uncertainty in estimating the mean response and for the
variation of individual observations about that mean. Therefore, at the same values of the predictor
variables, the confidence interval for the mean response will always be shorter than the prediction interval
for a single observation.
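A minimal sketch of the length comparison, using the two reported intervals. It also shows why the prediction interval is always wider: its standard error is √(s² + SE(Fit)²), which exceeds SE(Fit) whenever the model's standard error s is positive. The value of s is in the Practice Final output and is not reproduced here; the function names are hypothetical.

```python
import math

def interval_length(lower, upper):
    """Length of an interval (lower, upper)."""
    return upper - lower

def se_prediction(s, se_fit):
    """Standard error for a single new observation: sqrt(s^2 + SE(Fit)^2)."""
    return math.sqrt(s ** 2 + se_fit ** 2)

pi_len = interval_length(18.70, 26.84)  # prediction interval, part (e)
ci_len = interval_length(20.73, 24.81)  # confidence interval, part (j)
print(round(pi_len, 2), round(ci_len, 2))  # 8.14 4.08
```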