Second Course in Statistics Regression Analysis 7th Edition Mendenhall Solutions Manual
Chapter 7
7.2 a. Multicollinearity first increases the likelihood of rounding errors in our calculations.
Second, the results themselves may not seem to make sense (i.e., a significant F test for model
utility but no significant t tests among the individual parameter estimates). A third problem is
that a high correlation between independent variables may affect the signs of the parameter
estimates; for example, a positive slope may be obtained when a negative relationship was expected.
b. One way to detect multicollinearity is to test for pairwise correlation of the independent
variables. Another way to detect multicollinearity is to look for confusing results in the
regression analysis.
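As a quick illustration, here is a minimal sketch in Python of the pairwise-correlation check; the data and variable names below are entirely hypothetical, with x2 deliberately built to be nearly a multiple of x1:

    import pandas as pd

    # Hypothetical independent variables; x2 is roughly 2*x1, so the two
    # carry overlapping information
    df = pd.DataFrame({
        "x1": [1.2, 2.3, 3.1, 4.8, 5.0, 6.1],
        "x2": [2.5, 4.7, 6.0, 9.5, 10.1, 12.3],
        "x3": [0.3, 1.9, 0.8, 2.2, 1.1, 0.5],
    })

    # Pairwise correlations among the IVs; |r| near 1 flags a potential
    # multicollinearity problem
    print(df.corr().round(3))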
c. When multicollinearity is present, one way of dealing with the problem is to remove all but one
of the highly correlated variables, possibly by using stepwise regression. When using the model
for estimation and prediction only, it is not necessary to drop any independent variables. It is
necessary, though, to make sure the values of the x variables fall within the ranges of x used in
the experiment. Making inferences about the β's in the model when multicollinearity is present is
a dangerous problem. The solution is to use a designed experiment to break up the patterns of
multicollinearity. To reduce rounding errors in polynomial models, it is useful to code the
x values so that the correlation between successive powers of x is reduced. Another way of
reducing rounding errors due to multicollinearity is to use ridge regression, which provides
biased estimates of the β's in the model; these estimates, however, have smaller standard errors
than their least squares analogs.
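A minimal sketch of the ridge idea on simulated data (scikit-learn's Ridge is used here as one possible implementation, and the penalty constant alpha = 1.0 is an arbitrary choice):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=0.05, size=50)   # nearly collinear with x1
    y = 2 * x1 - x2 + rng.normal(size=50)
    X = np.column_stack([x1, x2])

    # Least squares: unbiased, but the coefficients are unstable here
    print(LinearRegression().fit(X, y).coef_)

    # Ridge: introduces bias but shrinks the coefficient variance
    print(Ridge(alpha=1.0).fit(X, y).coef_)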
7.3 Another transformation that might work in Example 7.4 would be to use x = p², x = ln(p), or x = e^(−p).
7.4 a. Even though r = .983 is rather large, you must be cautious about inferring a cause-and-effect
relationship between the number of females in managerial positions and the number of females with a
college degree. These are observational data; the independent variable could not be controlled, so a
strong relationship between these two variables does not by itself establish causation.
7.5 a. Based on the correlation matrix, there is no evidence of extreme multicollinearity. None of the
pairwise correlations are larger than .5 in absolute value. Thus, it does not appear that
multicollinearity is a big problem in this exercise.
7.6 The "experimental region" is defined in the text as the range of values of the independent
variables in the sample data. Ultimately, we do not want to predict the DV for values of the IVs
outside the experimental region (the extrapolation problem).
Since there are two IVs (depth and ice type), we need to find the "joint" range of depth within each
ice type. Summarizing depth by ice type in Minitab, we obtain the following:
Note that the range for the First-Year ice is Range = .36 − .02 = .34, the range for the Landfast ice is
Range = .86 − .00 = .86, and the range for the Multi-Year ice is Range = .64 − .07 = .57.
Avoid selecting values of depth outside these ranges to avoid extrapolation errors.
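The same summary is easy to reproduce outside Minitab. A sketch in Python, seeding the data frame with just the endpoint depths reported above (the real exercise data would be loaded from its data file):

    import pandas as pd

    # Endpoint depths from the output above stand in for the full data set
    df = pd.DataFrame({
        "icetype": ["First-Year", "First-Year", "Landfast", "Landfast",
                    "Multi-Year", "Multi-Year"],
        "depth":   [0.02, 0.36, 0.00, 0.86, 0.07, 0.64],
    })

    # Minimum, maximum, and range of depth within each ice type
    stats = df.groupby("icetype")["depth"].agg(["min", "max"])
    stats["range"] = stats["max"] - stats["min"]
    print(stats)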
7.7 a. There is no evidence of extreme multicollinearity. The only high correlations are those between
aggressive behavior and the three independent variables, and since aggressive behavior is the
dependent variable, those correlations do not indicate multicollinearity.
b. Since narcissism is only somewhat correlated with irritability and trait anger, these two variables
should not cause much concern about multicollinearity. However, if you try to use all three
independent variables, which would include aggressive behavior, a multicollinearity problem
could occur.
7.8 We disagree with the bioengineer's statement. Although the model predicts the amount of
carbohydrate solubilized during steam processing of peat (y) very well, the given correlations between
pairs of independent variables in the model are large, indicating that multicollinearity is a severe
problem in this analysis (see "Detecting Multicollinearity in the Regression Model", page 365).
7.9 When fitting the pth-order polynomial regression model, two requirements must be met:
1. The number of levels of x must be greater than or equal to (p + 1). For a second-order model
(p = 2), we must have at least p + 1 = 3 levels of x.
2. The sample size n must be greater than (p + 1) in order to allow sufficient degrees of freedom
for estimating σ². For a second-order model, we must have n greater than (p + 1) = 3.
Note that for our sample data, requirement #1 is satisfied since x = 1, 2, or 5. However, requirement
#2 is not satisfied since n = 3. If we attempt to fit this model, we will have
n − (k + 1) = n − 3 = 0 df for estimating σ² and, thus, will be unable to test model adequacy.
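To see requirement #2 fail concretely, here is a short sketch in Python (the y values are hypothetical) showing that a quadratic through n = 3 points fits exactly, leaving nothing with which to estimate σ²:

    import numpy as np

    x = np.array([1.0, 2.0, 5.0])     # three levels of x: requirement #1 holds
    y = np.array([3.1, 4.9, 11.2])    # hypothetical responses, n = 3

    # The second-order fit uses k + 1 = 3 parameters, so n - (k + 1) = 0 df
    coefs = np.polyfit(x, y, deg=2)
    residuals = y - np.polyval(coefs, x)
    print(residuals)                  # essentially zero: a perfect fit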
b. A 95% prediction interval for y is ŷ ± t(α/2) · s · √(1 + 1/n + (xp − x̄)²/SSxx).

For xp = 300, ŷ = 5.7106 + .62597(300) = 193.50. Substituting into the formula yields:

193.50 ± (2.365)(8.30)√(1 + 1/9 + (300 − 413.33)²/17,200) ⇒ 193.50 ± 26.76 or (166.74, 220.26)
c. Note that the values of x in the sample range from 340 to 480 pounds. The prediction interval
should be used cautiously, or not at all, since the value xp = 300 lies outside the experimental
region.
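A sketch of the same computation in Python, with all numbers taken from the exercise above:

    import numpy as np
    from scipy import stats

    n, s, xbar, ss_xx = 9, 8.30, 413.33, 17_200   # summary statistics from above
    xp = 300
    yhat = 5.7106 + 0.62597 * xp                  # 193.50

    t = stats.t.ppf(0.975, df=n - 2)              # 2.365
    half = t * s * np.sqrt(1 + 1/n + (xp - xbar) ** 2 / ss_xx)
    print(yhat - half, yhat + half)               # about (166.74, 220.26)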
[SAS regression printouts: Model MODEL1, Dependent Variable Y — Analysis of Variance and Parameter Estimates]
Yes, t = 11.76. The p-value associated with the nicotine content (x2) variable is p = .0001. For
α > .0001, nicotine content is a useful predictor of carbon monoxide content.
Model: MODEL1
Dependent Variable: Y

Analysis of Variance

                        Sum of         Mean
Source        DF       Squares       Square      F Value    Prob>F
Model          1     116.05651    116.05651        6.309    0.0195
Error         23     423.09389     18.39539
C Total       24     539.15040

Root MSE     4.28898     R-square    0.2153
Dep Mean    12.52800     Adj R-sq    0.1811
C.V.        34.23519

Parameter Estimates
d. We see the estimate signs have changed dramatically when compared to the model with all three
variables present.
7.12 If there is no severe correlation between x1 and x2 , then the economist could try to fit the model.
7.13 a. There may be multicollinearity between the relative error in estimating effort and the company
role of the estimator, which could give β̂1 in the model the opposite sign from what is expected.
b. If no data are collected for project leaders with less than 20% accuracy, then all values
of x8 are forced to be x8 = 1. Thus, β3 is not estimable.
7.14 If Nickel is highly correlated with each of the other ten potential independent variables, then
multicollinearity could be a problem. You could drop Nickel altogether, or transform (code) this
variable to reduce the multicollinearity.
7.15 a. There is no evidence of extreme multicollinearity among the independent variables used in the model, DOTest and LBERATIO.
[SAS regression printout: Analysis of Variance and Parameter Estimates]
2. Sample size n > (k + 1) = 3 in order to have sufficient degrees of freedom for estimating σ².
For the complete second-order model E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²,
we have p = 2 and k = 5, where p is the order of the model and k is the number of β's in the model
excluding β0. To fit the model, we require:
2. Sample size n > (k + 1) = 6 in order to have sufficient degrees of freedom for estimating σ².
7.19 There are signs of multicollinearity since the VIF values for INLET-TEMP, AIRFLOW, and POWER
are all greater than 10. To modify the model, you could drop either POWER or AIRFLOW (their VIF
values are very high) and redo the analysis, transform (code) the highly correlated variables to
reduce the multicollinearity, or include only one of the three variables.
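A sketch of the VIF check in Python with statsmodels; the data values below are hypothetical stand-ins, deliberately made near-proportional so the VIFs come out large:

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    # Hypothetical, near-proportional values standing in for the real data
    df = pd.DataFrame({
        "INLET-TEMP": [80, 95, 110, 120, 135, 150, 160, 175],
        "AIRFLOW":    [50, 61, 72, 80, 88, 100, 106, 118],
        "POWER":      [1000, 1210, 1430, 1610, 1760, 2010, 2090, 2340],
    })

    X = add_constant(df)

    # VIF for each predictor (skipping the constant); VIF > 10 signals
    # severe multicollinearity
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, round(variance_inflation_factor(X.values, i), 1))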
b.
x y ln x ln y
54 6 3.989 1.792
42 16 3.738 2.773
28 33 3.332 3.497
38 18 3.678 2.890
25 41 3.219 3.714
70 3 4.248 1.099
48 10 3.871 2.303
41 14 3.714 2.639
20 45 2.996 3.807
52 9 3.951 2.197
65 5 4.174 1.609
The least squares fit of the transformed model is ln(ŷ) = 10.6364 − 2.16985 ln(x).
H0: β1 = 0
Ha: β1 ≠ 0

Test statistic: t = β̂1/(s/√SSxx) = −2.16985/(.2021/√1.5675) = −13.44
Conclusion: Reject H0 at α = .05. There is sufficient evidence to indicate that the transformed
model is adequate for predicting y.
ŷ = e^(ln ŷ) = e^(3.256) = 25.95
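The transformed fit is easy to verify. A sketch in Python using the (x, y) data tabled above; x = 30 is used for the back-transformed prediction, the value implied by ln ŷ = 3.256 under the fitted equation:

    import numpy as np

    x = np.array([54, 42, 28, 38, 25, 70, 48, 41, 20, 52, 65])
    y = np.array([ 6, 16, 33, 18, 41,  3, 10, 14, 45,  9,  5])

    # Regress ln(y) on ln(x); the slope and intercept should reproduce
    # -2.16985 and 10.6364 from the hand calculation
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    print(intercept, slope)

    # Back-transform to the original scale, e.g. at x = 30
    print(np.exp(intercept + slope * np.log(30)))   # about 25.95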
Because the value of r is so small, there is no evidence of a linear relationship between x1 and y.
SSx2x2 = Σx2² − (Σx2)²/n = 77,911.98 − (1049.8)²/15 = 4439.97733

SSyy = Σy² − (Σy)²/n = 216,900.86 − (1800)²/15 = 900.86
Since the value of r is not very close to 1, there is little evidence of a linear relationship between
x2 and y.
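A sketch of the correlation computation from the sums of squares; the vectors here are simulated placeholders, since the exercise's actual n = 15 observations would be used in practice:

    import numpy as np

    def ss(u, v):
        # Corrected sum of cross products: SS_uv = sum(uv) - (sum u)(sum v)/n
        return np.sum(u * v) - np.sum(u) * np.sum(v) / len(u)

    # Simulated placeholders for the exercise's n = 15 observations
    rng = np.random.default_rng(1)
    x2 = rng.uniform(50, 90, size=15)
    y = rng.uniform(100, 140, size=15)

    r = ss(x2, y) / np.sqrt(ss(x2, x2) * ss(y, y))
    print(r)                          # same as np.corrcoef(x2, y)[0, 1]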
c. Based on the values of the correlation coefficients computed in parts (a) and (b), there is no
evidence that the model will be useful for predicting the sale price.
[SAS printout — DEP VARIABLE: Y; Analysis of Variance; Parameter Estimates: the p-values for INTERCEP, X1, and X2 are each 0.0001]
From the printout, the least squares line is yˆ = −45.154 + 3.097x1 + 1.032x2 .
The p-value is .0001. Since this p-value is so small, H0 is rejected. There is sufficient evidence
to indicate the model is adequate for predicting sale price.
The value of R 2 is .9998. This indicates that 99.98% of the sample variation in the sale prices is
explained by the model including x1 and x2 .
The correlation coefficient between x1 and x2 is very close to −1, implying that x1 and
x2 are highly (negatively) correlated.
f. In this case, we would not want to throw out a redundant variable. The models with just one
independent variable are not significant. However, the model with both independent variables,
even though they are highly correlated, is very significant.
7.22 a. Variables that are moderately or highly correlated have correlation coefficients of .5 or higher in
absolute value. There are only two pairs of variables which are moderately or highly correlated.
They are "year GRE taken" and "years in graduate program"
(r = −.602), and "race" and "foreign status" (r = −.515).
b. When independent variables that are highly correlated with each other are included in a
regression model, the results may be confusing. Highly correlated independent variables
contribute overlapping information in the prediction of the dependent variable. The overall
global test can indicate that the model is useful in predicting the dependent variable, while the
individual t-tests on the independent variables can indicate that none of the independent variables
are significant. This happens because each individual t-test assesses the significance of an
independent variable after the other independent variables have been taken into account. Usually,
only one of a set of independent variables that are highly correlated with each other is included
in the regression model.
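The phenomenon is easy to reproduce by simulation. A sketch with statsmodels (all data here are artificial):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=40)
    x2 = x1 + rng.normal(scale=0.02, size=40)   # x2 nearly identical to x1
    y = x1 + x2 + rng.normal(size=40)

    X = sm.add_constant(np.column_stack([x1, x2]))
    fit = sm.OLS(y, X).fit()

    # Global F test: highly significant...
    print(fit.f_pvalue)
    # ...yet each slope, tested after accounting for the other, is not
    print(fit.pvalues[1:])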
7.23 Since df(Error) = 0, the researcher's sample size is not sufficient to estimate σ². Thus, no
estimate of the variation in the y values exists and no test of the model adequacy can be made.