m04 Rend6289 10 Im c04
REVISED
M04_REND6289_10_IM_C04.QXD 5/7/08 2:49 PM Page 46
CHAPTER 4
Regression Models
From this the coefficient of determination is
r^2 = SSR/SST = 120.76/150 = 0.81

Alternative Example 4.3: For Alternative Examples 4.1 and 4.2, dealing with ads, X, and apartments leased, Y, compute the correlation coefficient.
Since r^2 = 0.81 and the slope is positive (+0.395), the positive square root of 0.81 is the correlation coefficient: r = 0.90.

SOLUTIONS TO DISCUSSION QUESTIONS AND PROBLEMS

4-1. The term least-squares means that the regression line will minimize the sum of the squared errors (SSE). No other line will give a lower SSE.

4-2. Dummy variables are used when a qualitative factor such as the gender of an individual (male or female) is to be included in the model. Usually this is given a value of 1 when the condition is met (e.g., the person is male) and 0 otherwise. When there are more than two levels or values for the qualitative factor, more than one dummy variable must be used. The number of dummy variables is one less than the number of possible values or categories. For example, if students are classified as freshmen, sophomores, juniors, and seniors, three dummy variables would be necessary.

4-3. The coefficient of determination (r^2) is the square of the coefficient of correlation (r). Both of these give an indication of how well a regression model fits a particular set of data. An r^2 value of 1 would indicate a perfect fit of the regression model to the points. This would also mean that r would equal −1 or +1.

4-4. A scatter diagram is a plot of the data. This graphical image helps to determine if a linear relationship is present, or if another type of relationship would be more appropriate.

4-5. The adjusted r^2 value is used to help determine if a new variable should be added to a regression model. Generally, if the adjusted r^2 value increases when a new variable is added to a model, this new variable should be included in the model. If the adjusted r^2 value declines or does not increase when a new variable is added, then the variable should not be added to the model.

4-6. The F-test is used to determine if the overall regression model is helpful in predicting the value of the dependent variable (Y). If the F-value is large and the p-value or significance level is low, then we can conclude that there is a linear relationship and the model is useful, as these results would probably not occur by chance. If the significance level is high, then the model is not useful and the results in the sample could be due to random variations.

4-7. The SSE is the sum of the squared errors in a regression model. SST = SSE + SSR.

4-8. When the residuals (errors) are plotted after a regression line is found, the errors should be random and should not show any significant pattern. If a pattern does exist, then the assumptions may not be met or another model (perhaps nonlinear) would be more appropriate.

4-9. a. Ŷ = 36 + 4.3(70) = 337
b. Ŷ = 36 + 4.3(80) = 380
c. Ŷ = 36 + 4.3(90) = 423

4-10. a.
[Scatter diagram: Demand (vertical axis, 0 to 12) versus TV Appearances (horizontal axis, 0 to 10)]
4-10. b.
Demand (Y)   TV Appearances (X)   (X − X̄)^2   (Y − Ȳ)^2   (X − X̄)(Y − Ȳ)   Ŷ   (Y − Ŷ)^2   (Ŷ − Ȳ)^2
4-11. See the table for the solution to problem 4-10 to obtain some of these numbers.
MSE = SSE/(n − k − 1) = 12/(6 − 1 − 1) = 3
MSR = SSR/k = 17.5/1 = 17.5
F = MSR/MSE = 17.5/3 = 5.83
df1 = k = 1
df2 = n − k − 1 = 6 − 1 − 1 = 4
F0.05, 1, 4 = 7.71
Do not reject H0 since 5.83 < 7.71. Therefore, we cannot conclude there is a statistically significant relationship at the 0.05 level.
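The F-test arithmetic in 4-11 can be checked with a short sketch. The function name `f_statistic` and the variable names are illustrative and not from the text. The critical value 7.71 is the table value quoted in the solution, not something the code computes.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k independent
    variables fitted to n observations."""
    msr = ssr / k              # mean square regression
    mse = sse / (n - k - 1)    # mean squared error
    return msr, mse, msr / mse


# Values used in problem 4-11: SSR = 17.5, SSE = 12, n = 6, k = 1.
msr, mse, f_value = f_statistic(ssr=17.5, sse=12.0, n=6, k=1)

# F(0.05; df1 = 1, df2 = 4) as quoted from an F table in the solution.
F_CRITICAL = 7.71
reject_h0 = f_value > F_CRITICAL

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_value:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Keeping the critical value as a named constant makes it explicit that the decision depends on the chosen significance level and degrees of freedom.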
4-13.
Fin. Ave. (Y)   Test 1 (X)   (X − X̄)^2   (Y − Ȳ)^2   (X − X̄)(Y − Ȳ)   Ŷ      (Y − Ŷ)^2   (Ŷ − Ȳ)^2
93              98           285.235     196         236.444          91.5   2.264       156.135
78              77           16.901      1           4.111            76     4.168       9.252
84              88           47.457      25          34.444           84.1   0.009       25.977
73              80           1.235       36          6.667            78.2   26.811      0.676
84              96           221.679     25          74.444           90     36.188      121.345
64              61           404.457     225         301.667          64.1   0.015       221.396
64              66           228.346     225         226.667          67.8   14.592      124.994
95              95           192.901     256         222.222          89.3   32.766      105.592
76              69           146.679     9           36.333           70     35.528      80.291

b1 = 1143/1544.9 = 0.740
b0 = (711/9) − 0.740(730/9) = 18.99
MSE = SSE/(n − k − 1) = 152.341/(9 − 1 − 1) = 21.76
MSR = SSR/k = 845.659/1 = 845.659
F = MSR/MSE = 845.659/21.76 = 38.9
df1 = k = 1
df2 = n − k − 1 = 9 − 1 − 1 = 7
F0.05, 1, 7 = 5.59
Because 38.9 > 5.59, we can conclude (at the 0.05 level) that there is a statistically significant relationship between the first test grade and the final average.

4-15. F = 38.86; the significance level = 0.0004 (which is extremely small), so there is definitely a statistically significant relationship.

4-16. a. Ŷ = 13,473 + 37.65(1,860) = $83,502.
b. The predicted average selling price for a house this size would be $83,502. Some will sell for more and some will sell for less. There are other factors besides size that influence the price of the house.
c. Some other variables that might be included are age of the house, number of bedrooms, and size of the lot. There are other factors in addition to these that one can identify.
d. The coefficient of determination (r^2) = (0.63)^2 = 0.3969.

4-17. The multiple regression equation is Ŷ = $90.00 + $48.50X1 + $0.40X2
a. Number of days on the road: X1 = 5; Distance traveled: X2 = 300 miles
The amount he may be expected to claim is
Ŷ = $90.00 + $48.50(5) + $0.40(300) = $452.50
b. The reimbursement request, according to the model, appears to be too high. However, this does not mean that it is not justified. The accountants should question Thomas Williams about his expenses to see if there are other explanations for the high cost.
c. A number of other variables should be included, such as the type of travel (air or car), conference fees if any, expenses for entertainment of customers, and other transportation (cab and limousine) expenses. In addition, the coefficient of correlation is only 0.68 and r^2 = (0.68)^2 = 0.46. Thus, about 46% of the variability in the cost of the trip is explained by this model; the other 54% is due to other factors.

4-18. Using computer software to get the regression equation, we get
Ŷ = 1.03 + 0.0034X
where Ŷ = predicted GPA and X = SAT score.
If a student scores 450 on the SAT, we get
Ŷ = 1.03 + 0.0034(450) = 2.56.

4-19. a.
[Scatter diagram: Ridership (100,000s) (vertical axis, 0 to 35) versus Tourists (Millions) (horizontal axis, 0 to 25)]
b. Ŷ = 5.060 + 1.593X
c. Ŷ = 5.060 + 1.593(10) = 20.99, or 2,099,000 people.
d. If there are no tourists, the predicted ridership would be 5.06 (100,000s) or 506,000. Because X = 0 is outside the range of values that were used to construct the regression model, this number may be questionable.

4-20. The F-value for the F-test is 52.6 and the significance level is extremely small (0.00002), which indicates that there is a statistically significant relationship between number of tourists and ridership. The coefficient of determination is 0.84, indicating that 84% of the variability in ridership from one year to the next could be explained by the variations in the number of tourists.

4-21. a. Ŷ = 24,328 + 3026.67X1 + 6684X2
where Ŷ = predicted starting salary; X1 = GPA; X2 = 1 if business major, 0 otherwise.
b. Ŷ = 24,328 + 3026.67(3.0) + 6684(1) = $40,092.01.
c. The starting salary for business majors tends to be about $6,684 higher than for non-business majors in this sample, even after adjusting for variations in GPA.
d. The overall significance level is 0.099 and r^2 = 0.69. Thus, the model is significant at the 0.10 level and 69% of the variability in starting salary is explained by GPA and major. The model is useful in predicting starting salary.

4-22. a. Let
Ŷ = predicted selling price
X1 = square footage
X2 = number of bedrooms
X3 = age
The model with square footage: Ŷ = 2367.26 + 46.60X1; r^2 = 0.65
The model with number of bedrooms: Ŷ = 1923.5 + 36137.76X2; r^2 = 0.36
The model with age: Ŷ = 147670.9 − 2424.16X3; r^2 = 0.78
All of these models are significant at the 0.01 level or less. The best model uses age as the independent variable. The coefficient of determination is highest for this, and it is significant.

4-23. Ŷ = 5701.45 + 48.51X1 − 2540.39X2 and r^2 = 0.65.
Ŷ = 5701.45 + 48.51(2000) − 2540.39(3) = 95,100.28.
Notice the r^2 value is the same as it was in the previous problem with just square footage as the independent variable. Adding the number of bedrooms did not add any significant information that was not already captured by the square footage. It should not be included in the model. The r^2 for this is lower than for age alone in the previous problem.

4-24. Ŷ = 82185.5 + 25.94X1 − 2151.7X2 − 1711.5X3 and r^2 = 0.89.
Ŷ = 82185.5 + 25.94(2000) − 2151.7(3) − 1711.5(10) = $110,495.4.

4-25. Ŷ = 3071.885 + 6.5326X where Y = DJIA and X = S&P.
r = 0.84 and r^2 = 0.70.
Ŷ = 3071.885 + 6.5326(1100) = 10257.8 (rounded)

4-26. With one independent variable, beds, in the model, r^2 = 0.88. With just admissions in the model, r^2 = 0.974. When both variables are in the model, r^2 = 0.975. Thus, the model with only admissions as the independent variable is the best. Adding the number of beds had virtually no impact on r^2, and the adjusted r^2 decreased slightly. Thus, the best model is Ŷ = 1.518 + 0.6686X where Y = expense and X = admissions.

4-27. Using Excel with Y = MPG; X1 = horsepower; X2 = weight, the models are:
Ŷ = 53.87 − 0.269X1; r^2 = 0.77
Ŷ = 57.53 − 0.01X2; r^2 = 0.73.
Thus, the model with horsepower as the independent variable is better since r^2 is higher.

4-28. Ŷ = 57.69 − 0.17X1 − 0.005X2 where
Y = MPG
X1 = horsepower
X2 = weight
r^2 = 0.82.
This model is better because the coefficient of determination is higher with both variables than it is with either one individually.

4-29. Let Y = MPG; X1 = horsepower; X2 = weight
The model Ŷ = b0 + b1X1 + b2X1^2 is Ŷ = 69.93 − 0.620X1 + 0.001747X1^2 and has r^2 = 0.798.
The model Ŷ = b0 + b3X2 + b4X2^2 is Ŷ = 89.09 − 0.0337X2 + 0.0000039X2^2 and has r^2 = 0.800.
The model Ŷ = b0 + b1X1 + b2X1^2 + b3X2 + b4X2^2 is Ŷ = 89.2 − 0.51X1 + 0.001889X1^2 − 0.01615X2 + 0.00000162X2^2 and has r^2 = 0.883. This model has a higher r^2 value than the model in 4-28. A graph of the data would show a nonlinear relationship.

4-30. If SAT median score alone is used to predict the cost, we get
Ŷ = −7793.1 + 21.8X1 with r^2 = 0.22.
If both SAT and a dummy variable (X2 = 1 for private, 0 otherwise) are used to predict the cost, we get r^2 = 0.79. The model is
Ŷ = 7121.8 + 5.16X1 + 9354.99X2.
This says that a private school tends to be about $9,355 more expensive than a public school when the median SAT score is used to adjust for the quality of the school. The coefficient of determination indicates that about 79% of the variability in cost can be explained by these factors. The model is significant at the 0.001 level.

4-31. Ŷ = 67.8 + 0.0145X
There is a significant relationship between the number of victories (Y) and the payroll (X) at the 0.054 level, which is marginally significant. However, r^2 = 0.24, so the relationship is not very strong. Only about 24% of the variability in victories is explained by this model.

4-32. a. Ŷ = 42.43 + 0.0004X
b. Ŷ = −31.54 + 0.0058X
c. The correlation coefficient for the first stock is only 0.19 while the correlation coefficient for the second is 0.96. Thus, there is a much stronger correlation between stock 2 and the DJI than there is for stock 1 and the DJI.

CASE STUDIES
SOLUTION TO NORTH–SOUTH AIRLINE CASE

Northern Airline Data
Year   Airframe Cost per Aircraft   Engine Cost per Aircraft   Average Age (Hours)
2001   51.80                        43.49                      6,512
2002   54.92                        38.58                      8,404
2003   69.70                        51.48                      11,077
2004   68.90                        58.72                      11,717
2005   63.72                        45.47                      13,275
2006   84.73                        50.26                      15,215
2007   78.74                        79.60                      18,390

Southeast Airline Data
Year   Airframe Cost per Aircraft   Engine Cost per Aircraft   Average Age (Hours)
2001   13.29                        18.86                      5,107
2002   25.15                        31.55                      8,145
2003   32.18                        40.43                      7,360
2004   31.78                        22.10                      5,773
2005   25.34                        19.69                      7,150
2006   32.78                        32.58                      9,364
2007   35.56                        38.07                      8,259

Utilizing QM for Windows, we can develop the following regression equations for the variables of interest.
Northern Airline, airframe maintenance cost:
Cost = 36.10 + 0.0025 (airframe age)
Coefficient of determination = 0.7694
Coefficient of correlation = 0.8771
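As a cross-check on the case study's Northern Airline airframe regression, the fit can be recomputed from the seven data rows in the Northern Airline table. This is a minimal sketch in plain Python rather than the QM for Windows procedure the solution uses. The function and variable names are illustrative assumptions.

```python
import math

# Northern Airline data, 2001-2007: average age (hours) and
# airframe cost per aircraft, taken from the case study table.
age_hours = [6512, 8404, 11077, 11717, 13275, 15215, 18390]
airframe_cost = [51.80, 54.92, 69.70, 68.90, 63.72, 84.73, 78.74]


def fit_line(x, y):
    """Least-squares line plus coefficients of determination and correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    # r takes the sign of the slope, as in Alternative Example 4.3.
    r = math.copysign(math.sqrt(r_squared), slope)
    return slope, intercept, r_squared, r


slope, intercept, r_squared, r = fit_line(age_hours, airframe_cost)
print(f"Cost = {intercept:.2f} + {slope:.4f} (airframe age)")
print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```

The intercept and both coefficients should match the printed solution to within rounding. The slope may differ from the printed 0.0025 in the fourth decimal place, depending on whether the value is rounded or truncated.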
[Figure: two panels, each plotting Airframe and Engine Cost ($) (vertical axis, 10 to 50) against Average Airframe Age (Thousands) (horizontal axis, 5 to 19)]