PROBLEMS ch05

Chapter 5: Regression Analysis

1. Explain the difference between the model Y = β0 + β1X + ε and the model Ŷ = b0 + b1X.

The first is the population regression model. The second is computed from sample data and represents the estimate of the population mean of Y for any value of X within the sample range.

2. What is least squares estimation? Illustrate how least squares regression defines the best fitting
regression line.

Least squares estimation is the most common approach used to estimate the population regression
model from sample data. The example below shows the least squares fit for a set of 5 data points.
The least squares fit is Ŷ = 0.19 + 1.92X, and the error (the difference between the sample Y value and the
fitted value) is computed at each point. The sum of those squared errors is 4.7027; no other line
would produce a smaller total of squared errors.

[Scatter chart of the five points with fitted line f(x) = 1.9189x + 0.1892; Intercept = 0.19, Slope = 1.92]

X    Y    Regression Line (0.19 + 1.92X)    Error (Y - fit)    Error Squared
2    5     4.0270                            0.9730            0.9467
5    9     9.7838                           -0.7838            0.6143
4    7     7.8649                           -0.8649            0.7480
7   15    13.6216                            1.3784            1.8999
6   11    11.7027                           -0.7027            0.4938
                                    Sum of squared errors:     4.7027
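The fit above can be verified in a few lines of plain Python. This is a sketch (not part of the original solution) using the standard least squares formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄:

```python
# Verifying the least squares fit for the five data points above.
xs = [2, 5, 4, 7, 6]
ys = [5, 9, 7, 15, 11]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Least squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Sum of squared errors around the fitted line
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(round(b0, 2), round(b1, 2), round(sse, 4))  # 0.19 1.92 4.7027
```

The computed intercept, slope, and SSE reproduce the 0.19, 1.92, and 4.7027 shown in the table.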

3. Explain the coefficient of determination, R2. How does it differ from the sample correlation coefficient?

The coefficient of determination, R2, is the ratio SSR/SST and gives the proportion of variation that is
explained by the independent variable of the regression model. The value of R2 is between 0 and 1;
a value of 1.0 indicates a perfect fit, while a value of 0 indicates that no relationship exists.

The sample correlation coefficient, R, is the square root of the coefficient of determination, where the
sign is determined by the sign of the slope of the regression line. Values of R range from -1 to 1.
R = 1 indicates perfect positive correlation (as the independent variable increases, so does the
dependent variable); R = -1 indicates perfect negative correlation (as X increases, Y decreases).
As with R2, a value of R = 0 indicates no correlation. Because R2 measures the actual proportion of
the variation explained by the regression, it is generally easier to interpret than R.
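Both quantities can be computed from the five-point example in Problem 2. A sketch (values assumed from that example):

```python
# R^2 = SSR/SST and the correlation coefficient r for the five-point example.
xs = [2, 5, 4, 7, 6]
ys = [5, 9, 7, 15, 11]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

sst = sum((y - ybar) ** 2 for y in ys)                       # total variation
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
ssr = sst - sse                                              # explained by regression
r2 = ssr / sst
r = (1 if b1 > 0 else -1) * r2 ** 0.5   # sign of r follows the sign of the slope
print(round(r2, 2), round(r, 2))
```

For these points R² is about .92 and r about .96: the positive slope makes r positive.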

4. Explain the assumptions of linear regression. How can you determine if each of these assumptions
hold?

1. Linearity: the relationship between the dependent and independent variables is linear.

This can be checked by examining a plot of the residuals; if a pattern exists, some
other, nonlinear model may be more appropriate.

2. The errors for each individual value of X are normally distributed with a mean of zero and a
constant variance.

This can be verified by examining a histogram of the residuals associated with each value
of the independent variable and inspecting for a bell-shaped distribution, or using more
formal goodness-of-fit tests as described in Chapter 4.

3. Homoscedasticity, which means that the variation about the regression line is constant for
all values of the independent variable.

This can also be evaluated by examining the residual plot and looking for large differences in
the variances at different values of the independent variable.

4. The residuals should be independent for each value of the independent variable.

Correlation among successive observations over time is called autocorrelation, and can be
identified by residual plots having clusters of residuals with the same sign.
Autocorrelation can be evaluated more formally using a statistical test based on the
Durbin-Watson statistic.
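The Durbin-Watson statistic mentioned above is simple to compute directly. A sketch with made-up residual series for illustration:

```python
# Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no autocorrelation; near 0, positive autocorrelation;
# near 4, negative autocorrelation.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals drifting slowly from positive to negative (clusters of same sign):
print(durbin_watson([1.0, 0.8, 0.6, -0.2, -0.5, -0.9]))   # well below 2
# Residuals alternating sign every period:
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))   # well above 2
```

The first series exemplifies the "clusters of residuals with the same sign" pattern described above.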

5. Explain how Adjusted R2 is used in evaluating the fit of multiple regression models.

Adjusted R2, like R2, reflects the value of adding an independent variable to the multiple regression
model, but unlike R2 it is adjusted for both the number of independent variables and the sample size.
Also unlike R2, adjusted R2 may actually decrease if the added variable contributes little explanatory
power to the regression model. Adjusted R2 is therefore often more useful than R2 in identifying the
most beneficial variables to include in the multiple regression model.
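The adjustment can be written explicitly. With n observations and k independent variables, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). A sketch, using the R² from the Problem 8 output later in this chapter:

```python
# Adjusted R^2 penalizes each added independent variable.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = .2171 with n = 31 observations and k = 1 variable (Problem 8):
print(round(adjusted_r2(0.21709448, 31, 1), 4))  # 0.1901
```

Note that for a fixed R², increasing k always lowers the adjusted value, which is exactly why a useless added variable can make adjusted R² go down.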

6. Describe the differences, and advantages/disadvantages, of using best subsets regression, stepwise
regression, and examination of correlations in developing multiple regression models.

Best subsets and stepwise regression are both automated search methods for developing multiple
regression models. Best subsets regression considers every possible model, but it takes a while
to complete and leaves you with many candidate models to consider (the PHStat version also leaves a
whole set of new worksheets). Stepwise regression builds just one model by adding or removing one
independent variable at each step, thus taking less time but perhaps missing the one model you really
would want. Note that both methods depend on the criterion used for model selection.
Examination of correlations refers to manually examining the relationships between the independent
variables and the dependent variable, and also between independent variables, either to construct a
model manually or to better understand a regression model.
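One forward step of stepwise regression can be sketched as "fit each candidate variable alone and add the one with the highest R²". The toy data below is assumed for illustration; it is not from the textbook files:

```python
# Sketch of one forward step of stepwise regression.
import numpy as np

def r2(X, y):
    """R^2 of the least squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

x1 = np.linspace(0, 10, 50)
x2 = np.cos(x1)                  # nearly unrelated to y below
y = 3 * x1 + 0.5 * x2
candidates = {"x1": x1, "x2": x2}
best = max(candidates, key=lambda name: r2(candidates[name].reshape(-1, 1), y))
print(best)  # x1 -- the variable a forward step would add first
```

A real stepwise procedure repeats this step (and also tests whether to remove variables), using a significance or fit criterion as the stopping rule, which is the selection-criterion dependence noted above.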

7. Find real data on daily changes in the S&P 500 (or the Dow Jones Industrial Average) and a stock of your
interest. Use regression analysis to estimate the beta risk of the stock.

Cold Water Creek (CWTR) vs. S&P 500 Index

CWTR ROR   Market ROR
1.4%    0.6%
-24.5%  -4.3%
0.8%    5.8%
21.5%   5.9%
52.9%   4.3%
2.9%    7.8%
-12.2%  -5.7%
23.5%   5.3%
0.0%    -3.4%
6.9%    4.5%
8.9%    1.6%
11.7%   1.0%
10.0%   7.0%
-41.5%  5.0%
-18.6%  0.9%
15.1%   -1.9%
20.9%   3.9%
-28.2%  -1.2%
-26.3%  -14.6%
30.4%   6.2%
-34.5%  8.0%
-3.6%   5.9%
14.6%   5.6%
-4.9%   4.1%
-23.1%  -3.2%
14.3%   3.9%
28.4%   3.8%
35.5%   -2.5%
-2.5%   5.4%
-2.0%   -3.2%
-7.9%   -0.6%
13.5%   -2.9%
20.6%   6.3%
16.0%   1.9%
-26.8%  5.8%
-15.8%  -5.1%
5.1%    -2.0%
-6.2%   9.7%
47.8%   -3.1%
-8.8%   -2.2%
31.4% 2.4%
4.6% -1.6%
1.6% 6.1%
-15.6% -5.3%
9.9% -0.5%
-15.2% -8.0%
23.4% 0.4%
25.1% 3.5%
-52.4% -9.2%
21.8% -6.4%
-13.2% 7.7%
19.4% 0.5%
11.4% -2.5%
5.4% -1.1%
-8.5% -6.4%
-27.1% -8.2%
7.6% 1.8%
30.2% 7.5%
-17.4% 0.8%
-22.6% -1.6%
-5.2% -2.1%
14.3% 3.7%
23.4% -6.1%
-8.9% -0.9%
22.2% -7.2%
-36.0% -7.9%
-2.7% 0.5%
-13.2% -11.0%
13.8% 8.6%
6.3% 5.7%
20.3% -6.0%
-3.5% -2.7%
-2.9% -1.7%
-21.0% 0.8%
3.0% 8.1%
26.0% 5.1%
0.0% 1.1%
30.3% 1.6%
10.1% 1.8%
-17.2% -1.2%
17.5% 3.9%
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.3042348082
R Square           0.0925588185
Adjusted R Square  0.0810722213
Standard Error     0.196509777
Observations       81

ANOVA
             df   SS            MS            F         Significance F
Regression    1   0.3111678611  0.3111678611  8.057984  0.005757
Residual     79   3.0506713029  0.0386160924
Total        80   3.3618391641

                            Coefficients  Standard Error  t Stat        P-value   Lower 95%  Upper 95%
Intercept                   0.0204581536  0.0219269699    0.9330132548  0.353656  -0.02319   0.064103
X Variable 1 (Market ROR)   1.2098742641  0.4262133357    2.8386588659  0.005757   0.361517  2.058231

The slope on the market return estimates the stock's beta risk: beta is approximately 1.21, and it is
statistically significant (p = .0058), although the market explains only about 9% of the variation in
CWTR's returns (R2 = .09).
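Beta is simply the slope of the stock's returns regressed on the market's returns, i.e. cov(market, stock)/var(market). A sketch using only the first six return pairs from the table above (the full 81-pair sample gives the beta of about 1.21 shown in the output):

```python
# Estimating beta as the OLS slope of stock returns on market returns.
import numpy as np

stock  = np.array([0.014, -0.245, 0.008, 0.215, 0.529, 0.029])   # CWTR ROR
market = np.array([0.006, -0.043, 0.058, 0.059, 0.043, 0.078])   # Market ROR

# slope = cov(market, stock) / var(market); intercept from the means
beta = np.cov(market, stock, ddof=1)[0, 1] / np.var(market, ddof=1)
alpha = stock.mean() - beta * market.mean()
```

With only six observations the slope estimate is unstable; in practice all 81 pairs (or more) would be used, as in the solution above.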

8. Construct a scatter diagram for Takeaways and Yards Allowed in the 2000 NFL Data.xls worksheet.
Does there appear to be a linear relationship? Develop a regression model for predicting Yards Allowed
as a function of Takeaways. Explain the statistical significance of the model.

[Scatter chart: Yards Allowed vs. Takeaways; trendline f(x) = -32.412x + 6087.76, R² = .217]

There is a negative relationship; i.e., as takeaways increase, yards allowed tend to decrease
(r = -0.466). The model is statistically significant (F = 8.04, p = .008), but the strength of the
relationship is relatively weak (R2 = .22).
Takeaways (X)   Yards Allowed (Y)
30   3,813
49   3,967
31   4,546
37   5,249
18   5,701
31   4,820
44   5,544
41   4,636
22   5,357
41   4,800
25   5,494
35   4,743
35   4,820
35   4,713
28   5,069
42   5,033
33   4,474
29   4,426
38   5,656
30   4,845
29   5,293
29   6,391
21   5,709
25   5,329
20   5,234
23   5,353
25   5,607
21   5,487
25   5,643
20   5,737
22   4,959

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.46593399
R Square 0.21709448
Adjusted R Square 0.19009774
Standard Error 502.319706
Observations 31

ANOVA
df SS MS F Significance F
Regression 1 2029074 2029074 8.041507 0.008248
Residual 29 7317428 252325.1
Total 30 9346501

              Coefficients   Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     6087.76251     355.9877        17.10105  1.09E-16  5359.686   6815.839
X Variable 1  -32.4118178    11.4297         -2.83576  0.008248  -55.7882   -9.03545
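The slope's t statistic and 95% confidence interval in the output above can be reproduced directly. A sketch (the critical value 2.045 for 29 degrees of freedom is assumed from a t table):

```python
# Slope significance for the Problem 8 model.
b1, se_b1 = -32.4118178, 11.4297   # slope and its standard error from the output
t_stat = b1 / se_b1                # compare with the output's -2.83576
t_crit = 2.045                     # t_{.025, 29}, assumed from a t table
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(round(t_stat, 3), round(lower, 2), round(upper, 2))
```

Since the interval excludes zero (about -55.79 to -9.04), the negative slope is statistically significant at the .05 level.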

9. Construct a 95% confidence band chart for the model developed in Problem 8.

[Chart: regression line for Yards Allowed vs. Takeaways with 95% confidence bands, plotted for Takeaways 15-55; legend: b0 = intercept, b1 = slope]

Formulas:
Squared Deviation = (Xi - X̄)²
hi = 1/n + (Xi - X̄)² / Σ(Xi - X̄)²
Regression Line = b0 + b1X
Confidence Band: Regression Line ± t(α/2, n-2) · SYX · √hi

Calculations:
n = 31                    =COUNT(x's)
X̄ = 30.13                 =AVERAGE(x's)
Σ(X - X̄)² = 1,931.48      =DEVSQ(x's)
Slope = -32.41            =SLOPE(y's,x's)
Intercept = 6,087.76      =INTERCEPT(y's,x's)
SYX = 502.32              =STEYX(y's,x's)
α = 0.05
t = 2.05                  =TINV(alpha, n-2)

Takeaways  Yards      Squared                Regression  Lower        Upper
X          Allowed Y  Deviations   h         Line        Conf. Band   Conf. Band
18         5,701      147.1134     0.1084    5504.3498   5166.0629    5842.6367
20         5,234      102.5973     0.0854    5439.5262   5139.3395    5739.7128
20         5,737      102.5973     0.0854    5439.5262   5139.3395    5739.7128
21         5,709       83.3392     0.0754    5407.1143   5125.0002    5689.2284
21         5,487       83.3392     0.0754    5407.1143   5125.0002    5689.2284
22         5,357       66.0812     0.0665    5374.7025   5109.8297    5639.5754
22         4,959       66.0812     0.0665    5374.7025   5109.8297    5639.5754
23         5,353       50.8231     0.0586    5342.2907   5093.6548    5590.9266
25         5,494       26.3070     0.0459    5277.4671   5057.4151    5497.5190
25         5,329       26.3070     0.0459    5277.4671   5057.4151    5497.5190
25         5,607       26.3070     0.0459    5277.4671   5057.4151    5497.5190
25         5,643       26.3070     0.0459    5277.4671   5057.4151    5497.5190
28         5,069        4.5328     0.0346    5180.2316   4989.1184    5371.3449
29         4,426        1.2747     0.0329    5147.8198   4961.4227    5334.2169
29         5,293        1.2747     0.0329    5147.8198   4961.4227    5334.2169
29         6,391        1.2747     0.0329    5147.8198   4961.4227    5334.2169
30         3,813        0.0166     0.0323    5115.4080   4930.8642    5299.9518
30         4,845        0.0166     0.0323    5115.4080   4930.8642    5299.9518
31         4,546        0.7586     0.0327    5082.9962   4897.3571    5268.6352
31         4,820        0.7586     0.0327    5082.9962   4897.3571    5268.6352
33         4,474        8.2425     0.0365    5018.1725   4821.8273    5214.5177
35         4,743       23.7263     0.0445    4953.3489   4736.5249    5170.1729
35         4,820       23.7263     0.0445    4953.3489   4736.5249    5170.1729
35         4,713       23.7263     0.0445    4953.3489   4736.5249    5170.1729
37         5,249       47.2102     0.0567    4888.5253   4643.8918    5133.1587
38         5,656       61.9521     0.0643    4856.1134   4595.5347    5116.6922
41         4,636      118.1779     0.0934    4758.8780   4444.8300    5072.9259
41         4,800      118.1779     0.0934    4758.8780   4444.8300    5072.9259
42         5,033      140.9199     0.1052    4726.4662   4393.2192    5059.7131
44         5,544      192.4037     0.1319    4661.6425   4288.5647    5034.7204
49         3,967      356.1134     0.2166    4499.5834   4021.4131    4977.7538

Note: data sorted by takeaways.
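The band formula above can be evaluated for a single point; a sketch at x = 30 takeaways, using the quantities from the Calculations list:

```python
# Confidence band at one x value, following the formula above.
import math

n = 31
xbar = 30.13                  # =AVERAGE(x's)
devsq = 1931.48               # =DEVSQ(x's)
b0, b1 = 6087.76, -32.41      # =INTERCEPT, =SLOPE
syx = 502.32                  # =STEYX(y's,x's)
t = 2.045                     # t_{alpha/2, n-2} for alpha = .05

x = 30
h = 1 / n + (x - xbar) ** 2 / devsq
yhat = b0 + b1 * x
margin = t * syx * math.sqrt(h)
print(round(yhat - margin, 1), round(yhat + margin, 1))
```

With the rounded inputs this reproduces the tabulated band at x = 30 (about 4930.9 to 5300.0, versus 4930.8642 and 5299.9518 above) to within rounding.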

10. Develop simple linear regression models for predicting Games Won as a function of each of the
independent variables in the 2000 NFL Data.xls worksheet individually. Do the assumptions of linear
regression hold for your models? How do these models compare to the multiple regression model
developed in the chapter?

Simple regression fits for Games Won (chart trendlines and R² values):

x = Yards Gained:    f(x) = 0.00233x - 3.9866,    R² = .36
x = Takeaways:       f(x) = 0.22884x + 1.1053,    R² = .34
x = Giveaways:       f(x) = -0.24142x + 15.2736,  R² = .29
x = Yards Allowed:   f(x) = -0.00290x + 22.8155,  R² = .26
x = Points Scored:   f(x) = 0.02567x - 0.4899,    R² = .46

As pointed out in the text, these variables satisfy the assumptions of linear regression.

These models show the same signs on the x coefficients (negative for giveaways and yards
allowed, positive for the others) as the multiple regression model developed in the text. For the
multiple regression model, R2 = .77; the greatest R2 for any simple regression model is R2 = .46, with
points scored as the independent variable.
Team            Yards Gained  Takeaways  Giveaways  Yards Allowed  Points Scored  Games Won
Tennessee       5,350         30         30         3,813          346            13
Baltimore       5,014         49         26         3,967          333            12
New York Giants 6,376         31         24         4,546          328            12
Oakland         5,776         37         20         5,249          479            12
Minnesota       5,961         18         28         5,701          397            11
Philadelphia    5,006         31         29         4,820          351            11
Denver          6,567         44         25         5,544          485            11
Miami           4,461         41         26         4,636          323            11
Indianapolis    6,141         22         29         5,357          429            10
Tampa Bay       4,649         41         24         4,800          388            10
St. Louis       7,075         25         35         5,494          540            10
New Orleans     5,397         35         26         4,743          354            10
New York Jets   5,395         35         40         4,820          321            9
Pittsburgh      4,766         35         21         4,713          321            9
Green Bay       5,321         28         33         5,069          353            9
Detroit         4,422         42         31         5,033          307            9
Washington      5,396         33         33         4,474          281            8
Buffalo         5,498         29         23         4,426          315            8
Carolina        4,654         38         35         5,656          310            7
Jacksonville    5,690         30         29         4,845          367            7
Kansas City     5,614         29         26         5,293          355            7
Seattle         4,680         29         38         6,391          320            6
San Francisco   6,040         21         19         5,709          388            6
Dallas          4,475         25         39         5,329          294            5
Chicago         4,541         20         29         5,234          216            5
New England     4,571         23         25         5,353          276            5
Atlanta         3,994         25         34         5,607          252            4
Cincinnati      4,260         21         35         5,487          185            4
Cleveland       3,530         25         28         5,643          161            3
Arizona         4,528         20         44         5,737          210            3
San Diego       4,300         22         50         4,959          269            1

11. Data obtained from a County Auditor (see the file Market Value.xls) provides information about the age,
square footage, and current market value of houses along one street in a particular subdivision.
a. Construct scatter diagrams showing market value as a function
of the age and the size of the house, and add trendlines using the Add Trendline option in
Excel.
b. Develop simple linear regression models for estimating the market value as a function of
the age of the house and size of the house separately.
c. Develop a multiple linear regression model for estimating the market value as a function of
both the age and size of the house.
d. How do the models developed in parts b and c compare?

a., b. Simple regression fits (chart trendlines and R² values):

x = House Age:     f(x) = 1570.43x + 45217.76,  R² = .13
x = Square Feet:   f(x) = 35.036x + 32673.22,   R² = .53

c. Multiple Regression Model

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.7454947764
R Square           0.5557624616
Adjusted R Square  0.5329810494
Standard Error     7211.8484974
Observations       42

ANOVA
             df   SS           MS            F             Significance F
Regression    2   2537650171   1268825085    24.395435019  1.344304E-07
Residual     39   2028419591   52010758.749
Total        41   4566069762

              Coefficients   Standard Error  t Stat        P-value
Intercept     47331.381536   13884.346644     3.4089743472  0.0015278315
House Age     -825.1612203   607.31284208    -1.358708664   0.1820459095
Square Feet   40.911068448   6.6965239941     6.1092991654  3.651013E-07

d. R2 = .55 for the multiple regression model, only slightly better than R2 = .53 for the simple
regression model using Square Feet as the independent variable.
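The multiple model can be checked against an individual house. A sketch applying the coefficients above to the first house in the data (age 33, 1,812 square feet, actual market value $90,000):

```python
# Evaluating the multiple regression model at one house.
b0, b_age, b_sqft = 47331.38, -825.16, 40.91   # coefficients from the output above
predicted = b0 + b_age * 33 + b_sqft * 1812
print(round(predicted))  # about 94230, vs. the actual $90,000
```

The roughly $4,200 residual is in line with the model's standard error of about $7,200.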
Count  Street Address  House Age  Square Feet  Market Value
1      1357            33         1,812        $90,000.00
2      1358            32         1,914        $104,400.00
3      1361            32         1,842        $93,300.00
4      1362            33         1,812        $91,000.00
5      1365            32         1,836        $101,900.00
6      1366            33         2,028        $108,500.00
7      1369            32         1,732        $87,600.00
8      1370            33         1,850        $96,000.00
9      1373            32         1,791        $89,200.00
10     1374            33         1,666        $88,400.00
11     1377            32         1,852        $100,800.00
12     1378            32         1,620        $96,700.00
13     1381            32         1,692        $87,500.00
14     1382            32         2,372        $114,000.00
15     1385            32         2,372        $113,200.00
16     1386            33         1,666        $87,500.00
17     1389            32         2,123        $116,100.00
18     1390            32         1,620        $94,700.00
19     1393            32         1,731        $86,400.00
20     1394            32         1,666        $87,100.00
21     1405            28         1,520        $83,400.00
22     1406            27         1,484        $79,800.00
23     1409            28         1,588        $81,500.00
24     1410            28         1,598        $87,100.00
25     1413            28         1,484        $82,600.00
26     1414            28         1,484        $78,800.00
27     1417            28         1,520        $87,600.00
28     1418            27         1,701        $94,200.00
29     1421            28         1,484        $82,000.00
30     1425            28         1,468        $88,100.00
31     1426            28         1,520        $88,100.00
32     1429            27         1,520        $88,600.00
33     1430            27         1,484        $76,600.00
34     1434            28         1,520        $84,400.00
35     1438            27         1,668        $90,900.00
36     1442            28         1,588        $81,000.00
37     1446            28         1,784        $91,300.00
38     1450            27         1,484        $81,300.00
39     1453            27         1,520        $100,700.00
40     1454            28         1,520        $87,200.00
41     1457            27         1,684        $96,700.00
42     1458            27         1,581        $120,700.00


12. Excel file TV Viewing.xls provides sample data on the number of hours of TV viewing per week for six age
groups.
a. Using all the data, develop a simple linear regression model for estimating TV viewing time
as a function of age.
b. Is a linear model appropriate? If not, propose an alternative model.

a.

[Scatter chart: TV Viewing Time (y) vs. Age (x); trendline f(x) = 0.998x + 37.28, R² = .69]

b. Although a slightly higher R2 = .71 can be achieved with a polynomial (cubic) model (see below),
the linear model is appropriate for these data, and R2 = .69 is relatively high.

[Scatter chart: TV Viewing Time (y) vs. Age (x); cubic trendline f(x) = -0.000373x³ + 0.06198x² - 2.1000x + 82.227, R² = .71]
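The linear-versus-cubic comparison can be sketched with numpy.polyfit. The (age, hours) pairs below are an assumed toy subset for illustration, not the full data file:

```python
# Comparing linear and cubic fits; a cubic can never fit worse than a line
# on the same data because the line is a special case of the cubic.
import numpy as np

age = np.array([18.0, 21, 25, 30, 38, 44, 50, 56, 62, 70, 76, 80])
hours = np.array([60.0, 55, 68, 64, 66, 68, 90, 100, 95, 120, 121, 113])

def r_squared(x, y, degree):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

linear, cubic = r_squared(age, hours, 1), r_squared(age, hours, 3)
print(linear <= cubic)  # True
```

The small gain from the cubic (as in the .69 vs. .71 above) is the usual signal that the extra terms are not worth the added complexity.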
Age  TV hours/week  X   X^2  X^3
21 48 21 441 9261
21 47 21 441 9261
18 73 18 324 5832
23 65 23 529 12167
19 74 19 361 6859
19 50 19 361 6859
20 57 20 400 8000
24 64 24 576 13824
21 70 21 441 9261
23 51 23 529 12167
20 54 20 400 8000
21 63 21 441 9261
23 67 23 529 12167
24 75 24 576 13824
19 61 19 361 6859
24 51 24 576 13824
23 47 23 529 12167
18 76 18 324 5832
20 63 20 400 8000
19 72 19 361 6859
22 59 22 484 10648
18 57 18 324 5832
20 51 20 400 8000
24 62 24 576 13824
22 68 22 484 10648
20 46 20 400 8000
21 64 21 441 9261
20 69 20 400 8000
19 57 19 361 6859
24 57 24 576 13824
23 56 23 529 12167
22 62 22 484 10648
22 37 22 484 10648
20 69 20 400 8000
23 75 23 529 12167
23 52 23 529 12167
22 78 22 484 10648
23 63 23 529 12167
21 41 21 441 9261
19 50 19 361 6859
24 65 24 576 13824
18 62 18 324 5832
22 73 22 484 10648
21 50 21 441 9261
21 56 21 441 9261
30 61 30 900 27000
28 78 28 784 21952
26 72 26 676 17576
30 65 30 900 27000
34 73 34 1156 39304
34 69 34 1156 39304
29 54 29 841 24389
34 74 34 1156 39304
29 70 29 841 24389
30 57 30 900 27000
33 86 33 1089 35937
30 55 30 900 27000
30 64 30 900 27000
26 67 26 676 17576
32 71 32 1024 32768
33 57 33 1089 35937
27 71 27 729 19683
27 70 27 729 19683
29 87 29 841 24389
31 58 31 961 29791
34 62 34 1156 39304
33 91 33 1089 35937
29 63 29 841 24389
25 69 25 625 15625
28 79 28 784 21952
32 75 32 1024 32768
32 56 32 1024 32768
26 77 26 676 17576
30 86 30 900 27000
33 80 33 1089 35937
27 87 27 729 19683
25 67 25 625 15625
25 63 25 625 15625
26 70 26 676 17576
25 76 25 625 15625
26 70 26 676 17576
36 76 36 1296 46656
35 75 35 1225 42875
42 69 42 1764 74088
36 70 36 1296 46656
35 70 35 1225 42875
43 64 43 1849 79507
39 53 39 1521 59319
37 78 37 1369 50653
37 71 37 1369 50653
36 70 36 1296 46656
40 76 40 1600 64000
43 75 43 1849 79507
40 66 40 1600 64000
42 61 42 1764 74088
44 70 44 1936 85184
37 77 37 1369 50653
44 72 44 1936 85184
40 63 40 1600 64000
38 61 38 1444 54872
38 74 38 1444 54872
40 64 40 1600 64000
40 63 40 1600 64000
44 71 44 1936 85184
36 64 36 1296 46656
43 62 43 1849 79507
37 62 37 1369 50653
44 76 44 1936 85184
40 55 40 1600 64000
44 73 44 1936 85184
37 63 37 1369 50653
40 70 40 1600 64000
41 71 41 1681 68921
38 70 38 1444 54872
35 70 35 1225 42875
41 69 41 1681 68921
39 71 39 1521 59319
40 54 40 1600 64000
44 65 44 1936 85184
38 62 38 1444 54872
40 61 40 1600 64000
38 62 38 1444 54872
36 62 36 1296 46656
49 95 49 2401 117649
54 80 54 2916 157464
52 105 52 2704 140608
48 83 48 2304 110592
49 89 49 2401 117649
51 90 51 2601 132651
49 72 49 2401 117649
51 94 51 2601 132651
48 72 48 2304 110592
45 103 45 2025 91125
50 99 50 2500 125000
50 84 50 2500 125000
50 93 50 2500 125000
47 108 47 2209 103823
47 82 47 2209 103823
54 88 54 2916 157464
54 93 54 2916 157464
53 82 53 2809 148877
52 106 52 2704 140608
46 89 46 2116 97336
51 110 51 2601 132651
49 87 49 2401 117649
51 94 51 2601 132651
46 76 46 2116 97336
49 74 49 2401 117649
46 90 46 2116 97336
52 83 52 2704 140608
45 91 45 2025 91125
46 98 46 2116 97336
47 71 47 2209 103823
53 81 53 2809 148877
54 73 54 2916 157464
48 87 48 2304 110592
52 106 52 2704 140608
63 103 63 3969 250047
61 110 61 3721 226981
55 99 55 3025 166375
56 109 56 3136 175616
57 93 57 3249 185193
62 95 62 3844 238328
61 116 61 3721 226981
56 103 56 3136 175616
56 93 56 3136 175616
59 91 59 3481 205379
59 74 59 3481 205379
61 81 61 3721 226981
60 74 60 3600 216000
60 95 60 3600 216000
56 111 56 3136 175616
57 108 57 3249 185193
59 88 59 3481 205379
58 92 58 3364 195112
59 101 59 3481 205379
64 90 64 4096 262144
64 109 64 4096 262144
62 87 62 3844 238328
59 92 59 3481 205379
56 86 56 3136 175616
60 135 60 3600 216000
61 87 61 3721 226981
55 90 55 3025 166375
63 90 63 3969 250047
59 80 59 3481 205379
79 117 79 6241 493039
80 113 80 6400 512000
78 117 78 6084 474552
76 124 76 5776 438976
70 126 70 4900 343000
67 120 67 4489 300763
76 119 76 5776 438976
69 116 69 4761 328509
74 113 74 5476 405224
71 116 71 5041 357911
68 113 68 4624 314432
77 110 77 5929 456533
67 128 67 4489 300763
73 107 73 5329 389017
79 110 79 6241 493039
71 116 71 5041 357911
74 123 74 5476 405224
80 124 80 6400 512000
77 113 77 5929 456533
75 116 75 5625 421875
SUMMARY OUTPUT (cubic model)

Regression Statistics
Multiple R         0.843659
R Square           0.711760
Adjusted R Square  0.707479
Standard Error     10.95873
Observations       206

ANOVA
             df    SS        MS        F         Significance F
Regression     3   59903.37  19967.79  166.2683  2.62E-54
Residual     202   24258.95  120.0938
Total        205   84162.32

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     82.22709      14.24763        5.771284  2.93E-08  54.13398   110.3202
X Variable 1  -2.10002      1.067794       -1.96669   0.050588  -4.20547   0.005427
X Variable 2  0.06198       0.024179        2.563436  0.011091  0.014306   0.109655
X Variable 3  -0.00037      0.000169       -2.21406   0.027943  -0.00071   -4.08E-05

13. A deep foundation engineering contractor has bid on a foundation system for a new world headquarters
building for a Fortune 500 company. A part of the project consists of installing 311 augercast piles. The
contractor was given bid information for cost estimating purposes, which consisted of the estimated
depth of each pile; however, actual drill footage of each pile could not be determined exactly until
construction was performed. The Excel file Pile Foundation.xls contains the estimates and actual pile
lengths after the project was completed.
a. Develop a linear regression model to estimate the actual pile length as a function of the
estimated pile lengths. What do you conclude?

Note: The decimal point was missing for pile number 225.

[Scatter chart: Actual vs. Estimated pile length; trendline f(x) = 0.8142x + 11.614, R² = .64]

The estimate explains about 64% of the variation in the actual drill footage (R2 = .64); note that this
is the proportion of variation explained, not a measure of predictive accuracy.
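A sketch applying the fitted model (Actual = 11.61 + 0.814 × Estimated, from the regression output below) to a few estimated pile lengths:

```python
# Predicting actual drill footage from the bid estimate.
b0, b1 = 11.61438, 0.814189        # intercept and slope from the output
for estimate in (10.58, 20.58, 40.0, 61.33):
    print(estimate, round(b0 + b1 * estimate, 2))
```

Because the slope is below 1 with a positive intercept, the model shifts short estimates up and long estimates down relative to a one-to-one line.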
Pile Number  Estimated (X)  Actual (Y)
1    10.58   18.58
2    10.58   18.58
3    10.58   18.58
4    10.58   18.58
5    10.58   28.58
6    10.58   26.58
7    10.58   17.58
8    10.58   27.58
9    10.58   27.58
10   10.58   37.58
11   10.58   28.58
12   5.83    1.83
13   5.83    8.83
14   5.83    8.83
15   5.83    8.83
16   10.83   16.83
17   10.83   19.83
18   10.83   18.33
19   10.83   14.83
20 21.33 12
21 16.33 27
22 21.33 42
23 21.33 46
24 21.33 46
25 31.33 57
26 41.08 55.75
27 41.08 55.75
28 41.08 55.75
29 46.33 57
30 61.33 77
31 61.33 76
32 61.33 75
33 61.33 75
34 61.33 36
35 61.33 56.42
36 61.33 45
37 61.33 45.42
38 61.33 37
39 11.33 8
40 5.83 2.83
41 5.83 2.83
42 5.83 2.83
43 5.83 2.83
44 15.83 14.83
45 15.83 16.83
46 15.83 16.83
47 15.83 16.83
48 11.33 11
49 21.08 42.75
50 21.08 43.17
51 21.08 43.17
52 21.08 42.75
53 21.08 42.75
54 41.08 31.75
55 41.08 31.75
56 41.08 31.75
57 41.08 36.75
58 41.08 44.75
59 61.08 36.75
60 61.08 55.75
61 61.08 40.17
62 61.08 56.75
63 61.33 26.42
64 61.33 66.00
65 11.33 13.33
66 20.08 38.75
67 20.08 38.75
68 20.08 38.75
69 20.08 38.75
70 20.08 38.75
71 20.05 38.75
72 40.08 54.75
73 40.08 25.75
74 40.08 25.75
75 40.08 54.75
76 40.08 23.75
77 40.08 73.08
78 60.08 57.75
79 60.08 56.75
80 60.08 57.75
81 60.08 58.17
82 61.33 62.00
83 8.42 9.09
84 20.58 16.25
85 20.58 16.25
86 20.55 16.25
87 20.58 16.25
88 20.58 16.25
89 20.55 16.25
90 23.50 29.17
91 20.52 21.19
92 20.52 20.19
93 20.52 20.19
94 20.52 36.19
95 20.52 26.19
96 20.52 22.19
97 38.42 29.51
98 35.83 26.51
99 35.83 27.09
100 35.83 34.51
101 35.83 37.09
102 35.83 31.51
103 34.75 55.42
104 34.75 28.84
105 4.75 14.42
106 19.75 22.42
107 34.75 42.42
108 34.75 55.42
109 4.75 14.42
110 17.25 21.92
111 17.25 21.92
112 17.25 21.92
113 17.25 21.92
114 17.25 21.92
115 17.25 21.92
116 19.75 22.42
117 28.42 32.09
118 28.42 32.09
119 28.42 32.09
120 28.42 32.09
121 28.42 32.09
122 28.42 32.09
123 35.92 46.59
124 40.53 55.51
125 40.83 56.51
126 40.83 67.09
127 40.83 57.09
128 40.83 67.09
129 53.42 55.09
130 50.83 50.34
131 50.83 50.34
132 50.83 45.34
133 50.83 47.76
134 53.42 41.51
135 53.42 41.09
136 5.58 19.25
137 5.58 19.25
138 5.58 19.25
139 6.58 19.25
140 5.58 19.25
141 5.58 19.25
142 20.58 28.25
143 20.58 28.25
144 20.55 28.25
145 20.58 28.25
146 20.58 28.25
147 20.58 28.25
148 28.75 32.42
149 28.75 32.42
150 28.75 32.42
151 28.75 32.42
152 28.75 32.42
153 28.75 32.42
154 39.25 56.92
155 39.25 60.92
156 39.25 59.92
157 39.25 59.34
158 39.25 57.92
159 49.75 47.42
160 49.75 48.42
161 49.75 48.42
162 49.75 48.42
163 53.42 45.09
164 13.42 22.09
165 6.97 26.64
166 6.97 26.64
167 6.97 26.64
168 8.42 22.09
169 21.67 29.34
170 21.67 29.34
171 21.67 29.34
172 28.42 37.09
173 31.67 34.34
174 31.67 35.34
175 31.67 29.34
176 38.42 59.09
177 41.67 61.34
178 41.67 60.34
179 41.67 61.34
180 53.42 49.09
181 13.42 11.59
182 13.42 14.09
183 10.83 26.50
184 10.83 24.50
185 10.83 33.92
186 10.83 32.92
187 10.83 16.09
188 15.42 18.51
189 15.42 28.09
190 15.42 29.09
191 15.42 29.09
192 9.42 14.09
193 9.42 14.09
194 9.42 14.09
195 9.42 14.09
196 9.42 14.09
197 10.58 23.25
198 10.58 23.25
199 10.55 23.25
200 10.58 23.25
201 10.58 23.25
202 10.55 26.25
203 20.55 29.25
204 20.55 31.25
205 20.58 30.25
206 20.55 31.25
207 20.58 31.25
208 20.55 32.25
209 30.58 31.25
210 30.55 32.25
211 30.58 31.25
212 30.58 32.25
213 30.58 31.25
214 30.55 29.25
215 40.83 61.50
216 40.83 61.50
217 40.83 58.50
218 40.83 57.92
219 40.83 57.50
220 50.83 51.50
221 50.83 51.50
222 50.83 49.50
223 50.83 48.50
224 63.42 49.09
225 53.42 52.09
226 22.58 23.25
227 22.58 21.25
228 20.83 22.50
229 20.83 22.50
230 20.83 27.50
231 20.83 25.92
232 22.68 37.25
233 20.83 22.92
234 20.83 26.92
235 20.83 21.50
236 20.83 22.50
237 24.83 33.50
238 24.83 33.50
239 24.83 31.50
240 24.83 32.50
241 10.83 14.50
242 10.83 17.50
243 10.83 14.50
244 10.83 14.50
245 11.67 17.50
246 11.67 17.50
247 11.67 17.50
248 11.67 17.50
249 20.83 26.50
250 20.83 31.50
251 20.83 28.50
252 20.83 26.50
253 25.83 34.50
254 25.83 34.50
255 25.83 34.50
256 25.83 34.50
257 30.83 29.92
258 30.83 30.92
259 30.83 31.50
260 30.83 34.50
261 40.83 66.50
262 40.83 69.50
263 40.83 68.50
264 40.53 66.50
265 50.83 53.34
266 50.83 54.34
267 50.83 61.76
268 50.83 51.34
269 53.42 52.09
270 53.42 52.09
271 53.42 52.09
272 22.58 38.25
273 22.58 43.25
274 22.58 50.25
275 20.83 29.50
276 20.83 29.50
277 20.83 38
278 25.42 31.69
279 26.67 27.34
280 26.67 32.34
281 26.67 26.76
282 23.42 10.51
283 11.67 18.34
284 11.67 18.34
285 11.67 18.34
286 13.42 8.09
287 11.67 22.34
288 11.67 21.34
289 11.67 21.34
290 13.42 16.09
291 21.67 29.34
292 21.67 30.34
293 21.67 30.34
294 28.42 31.09
295 26.67 24.34
296 26.67 35.34
297 26.67 35.34
298 28.42 37.09
299 31.67 37.34
300 31.67 37.34
301 31.67 39.76
302 38.42 63.09
303 41.67 67.34
304 41.67 67.34
305 41.67 67.34
306 53.42 64.09
307 51.67 51.34
308 51.67 50.76
309 51.67 46.34
310 53.42 50.09
311 53.42 49.09

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.796928
R Square           0.635095
Adjusted R Square  0.633914
Standard Error     9.886816
Observations       311

ANOVA
             df    SS        MS        F         Significance F
Regression     1   52569.02  52569.02  537.7953  1.29E-69
Residual     309   30204.48  97.74913
Total        310   82773.5

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     11.61438      1.137095        10.21408  2.71E-21  9.376951   13.8518
X Variable 1  0.814189      0.035109        23.19041  1.29E-69  0.745107   0.883272

14. The file 1999 Baseball Data.xls contains data for professional baseball teams for 1999, including their
total payroll, winning percentage, batting average, home runs, runs, runs batted in, earned run average,
and pitching saves.
a. Develop a multiple regression equation for predicting the winning percentage as a function
of all the other variables. How good is your model? Is multicollinearity a problem?
b. Find the best set of independent variables that predict the winning percentage by
examining the correlation matrix.
c. Find the best set of independent variables that predict the winning percentage using best
subsets regression.
d. Find the best set of independent variables that predict the winning percentage using
stepwise regression.

a. SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9758640096
R Square 0.9523105651
Adjusted R Square  0.9371366541
Standard Error 0.0191261557
Observations 30

ANOVA
             df   SS             MS             F              Significance F
Regression   7    0.1607068837   0.0229581262   62.759730132   4.517097E-13
Residual     22   0.0080478163   0.0003658098
Total        29   0.1687547

Coefficients Standard Error t Stat P-value


Intercept 0.3039161185 0.1254830654 2.4219691922 0.0241268629
Payroll 3.791855E-10 2.281527E-10 1.661981282 0.1107011082
Avg 0.1332523285 0.7153080512 0.1862866331 0.8539270429
HR 0.0001717022 0.0002118592 0.8104543053 0.426359703
R 0.0001893587 0.0005771368 0.328100253 0.7459380636
RBI 0.0002724922 0.0006238626 0.4367823588 0.6665262106
ERA -0.073408488 0.0106600266 -6.886332555 6.467126E-07
Saves 0.0021375293 0.0007491865 2.8531336493 0.0092461977

The above regression model has a very high R2 = .95.


Multicollinearity is very high (e.g., Runs and RBIs have a correlation of .99); see the correlation matrix below.
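The multicollinearity check above comes down to a pairwise correlation. A minimal sketch follows; the two series are made-up stand-ins for Runs and RBIs, not the actual 1999 data.

```python
import numpy as np

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    return float(xd @ yd / np.sqrt((xd @ xd) * (yd @ yd)))

runs = [700, 750, 800, 850, 900]   # hypothetical team runs
rbi  = [670, 722, 768, 821, 869]   # nearly proportional, as in the real data

r = corr(runs, rbi)
# A pairwise correlation this close to 1 between two predictors signals
# multicollinearity; one of the pair should usually be dropped.
```

A correlation near 1 between two predictors is the warning sign used in this answer to drop Runs in favor of RBIs.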

b. Correlation Matrix

              Winning %   Payroll   Avg      HR       R        RBI      ERA      Saves
Winning %     1
Payroll       0.6828      1
Avg           0.3987      0.4133    1
HR            0.5116      0.4796    0.3156   1
R             0.6987      0.5414    0.7585   0.7113   1
RBI           0.7059      0.5581    0.7581   0.7239   0.9967   1
ERA           -0.6982     -0.3779   0.1576   0.0184   -0.0619  -0.0669  1
Saves         0.7562      0.4037    0.1976   0.2769   0.3879   0.3966   -0.6098  1

Select the four independent variables that have the highest correlation with the dependent
variable, leaving out runs because of multicollinearity.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9749021651
R Square 0.9504342315
Adjusted R Square 0.9425037086
Standard Error 0.0182914804
Observations 30

ANOVA
             df   SS             MS             F              Significance F
Regression   4    0.1603902436   0.0400975609   119.84508927   6.305265E-16
Residual     25   0.0083644564   0.0003345783
Total        29   0.1687547

Coefficients Standard Error t Stat P-value


Intercept 0.3148870066 0.0596623301 5.2778194525 1.819229E-05
RBI 0.0005272214 5.938142E-05 8.8785586742 3.340038E-09
ERA -0.071248068 0.0088069691 -8.089964584 1.915206E-08
Saves 0.0021766864 0.0007080789 3.0740730597 0.0050502186
Payroll 4.134863E-10 2.005378E-10 2.0618867164 0.0497586861

Using RBIs, ERA, Saves, and Payroll yields R2 = .95 and adjusted R2 = .94.

c. Best subset multiple regression.

Model            Cp             k   R Square       Adj. R Square   Std. Error
X1X2X3X4X5X6X7   8              8   0.9523105651   0.9371366541    0.0191261557
X1X2X3X5X6X7     6.107649776    7   0.9520772126   0.9395756158    0.0187514586
X1X2X4X5X6X7     6.6568361809   7   0.9508867403   0.9380745856    0.0189829367
X1X2X4X6X7       5.1402381585   6   0.9498388691   0.9393886335    0.0187804493
X1X2X5X6X7       4.706791222    6   0.9507784527   0.9405239637    0.0186037266
X1X3X4X5X6X7     6.0347027097   7   0.95223534     0.9397749939    0.0187204966
X1X3X4X6X7       4.2659298849   6   0.9517341085   0.9416787144    0.018422242
X1X3X5X6X7       4.1393010935   6   0.9520086019   0.942010394     0.0183697827
X1X4X5X6X7       4.8281070018   6   0.9505154763   0.9402062005    0.0186533576
X1X4X6X7         3.3026336809   5   0.949486844    0.9414047391    0.0184654623
X1X5X6X7         2.8655866767   5   0.9504342315   0.9425037086    0.0182914804

Note that the last model, with the highest adjusted R2, is the model picked in b.
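Best-subsets regression can be sketched directly: fit every non-empty subset of predictors by least squares and rank the subsets by adjusted R-squared. The data below are synthetic (only columns 0 and 2 actually drive y), not the baseball file.

```python
import numpy as np
from itertools import combinations

def adj_r2(X, y):
    """Adjusted R-squared of an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    k = A.shape[1]                              # parameters incl. intercept
    return 1 - (sse / (n - k)) / (sst / (n - 1))

def best_subset(X, y):
    """Exhaustive best-subsets search; returns (adjusted R^2, column indices)."""
    p = X.shape[1]
    return max((adj_r2(X[:, list(s)], y), s)
               for r in range(1, p + 1)
               for s in combinations(range(p), r))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=40)
score, subset = best_subset(X, y)
```

Exhaustive search is feasible here because the number of predictors is small (2^p - 1 subsets).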

d. Table of Results for General Stepwise

Saves entered.
RBI entered.
ERA entered.
Payroll entered.

Stepwise regression selects the same model as in parts b and c above.
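The entry order above can be sketched with a simplified forward-selection loop: repeatedly add the variable that most improves R-squared, stopping when the improvement is small. Real stepwise routines use entry/exit p-values rather than this tolerance, and the data here are synthetic, not the baseball file; the variable names are illustrative labels only.

```python
import numpy as np

def r2(A, y):
    """R-squared of an OLS fit for design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def forward_select(X, y, names, tol=0.01):
    """Greedy forward selection by R-squared improvement."""
    n = len(y)
    chosen, current = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        best_r2, best_j = max(
            (r2(np.column_stack([np.ones(n), X[:, chosen + [j]]]), y), j)
            for j in remaining)
        if best_r2 - current < tol:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current = best_r2
    return [names[j] for j in chosen]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 1] + X[:, 3] + rng.normal(scale=0.2, size=50)
entered = forward_select(X, y, ["Payroll", "Avg", "ERA", "Saves"])
```

The strongest predictor enters first, and irrelevant columns never clear the improvement threshold.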


Team                League     Winning %   Payroll       Avg     HR    R
Anaheim             American   0.432       $51,340,297   0.256   158   711
Arizona             National   0.617       $70,046,818   0.277   216   908
Atlanta             National   0.636       $79,256,599   0.266   197   840
Baltimore           American   0.481       $75,443,363   0.279   203   851
Boston              American   0.580       $72,330,656   0.278   176   836
Chicago White Sox   American   0.466       $24,535,000   0.277   162   777
Chicago Cubs        National   0.414       $55,419,648   0.257   189   747
Cincinnati          National   0.589       $38,031,285   0.272   209   865
Cleveland           American   0.599       $73,531,692   0.289   209   1009
Colorado            National   0.444       $54,367,504   0.288   223   906
Detroit             American   0.429       $36,954,666   0.261   212   747
Florida             National   0.395       $14,650,000   0.263   128   691
Houston             National   0.599       $56,389,000   0.267   168   823
Kansas City         American   0.398       $16,557,000   0.282   151   856
Los Angeles         National   0.475       $76,607,247   0.266   187   793
Milwaukee           National   0.460       $42,976,575   0.273   165   815
Minnesota           American   0.394       $15,845,000   0.264   105   686
Montreal            National   0.420       $15,015,250   0.265   163   718
New York            American   0.605       $91,990,955   0.282   193   900
New York            National   0.595       $71,510,523   0.279   181   853
Oakland             American   0.537       $25,208,858   0.259   235   893
Philadelphia        National   0.475       $30,441,500   0.275   161   841
Pittsburgh          National   0.484       $23,682,420   0.259   171   775
San Diego           National   0.457       $46,507,179   0.252   153   710
Seattle             American   0.488       $45,351,254   0.269   244   859
San Francisco       National   0.531       $45,991,934   0.271   188   872
St. Louis           National   0.466       $46,337,129   0.262   194   809
Tampa Bay           American   0.426       $37,860,451   0.274   145   772
Texas               American   0.586       $80,801,598   0.293   230   945
Toronto             American   0.519       $48,847,300   0.280   212   883

Pitching statistics (teams in the same order as the table above):

RBI   ERA    Saves
673   4.79   37
865   3.77   42
791   3.65   45
804   4.77   33
808   4.00   50
742   4.92   39
717   5.27   32
820   3.99   55
960   4.91   46
863   6.03   33
704   5.22   33
655   4.90   33
784   3.84   48
800   5.35   29
761   4.45   37
777   5.08   40
643   5.03   34
680   4.69   44
855   4.16   50
814   4.27   49
845   4.76   48
797   4.93   32
735   4.35   34
671   4.47   43
825   5.25   40
828   4.71   42
763   4.76   38
728   5.06   45
897   5.07   47
856   4.93   39

SUMMARY OUTPUT (Winning % vs. Payroll)

Regression Statistics
Multiple R          0.682823
R Square            0.466247
Adjusted R Square   0.447185
Standard Error      0.056718
Observations        30

ANOVA
             df   SS         MS         F          Significance F
Regression   1    0.078681   0.078681   24.45875   3.22E-05
Residual     28   0.090073   0.003217
Total        29   0.168755

                       Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept              0.386468       0.025165         15.35714   3.62E-15   0.334919    0.438017
X Variable (Payroll)   2.32E-09       4.7E-10          4.945579   3.22E-05   1.36E-09    3.29E-09

15. The State of Ohio Department of Education has a mandated 9th grade proficiency test that covers
writing, reading, mathematics, citizenship (social studies), and science. The Excel file Ohio Education
Performance.xls provides data on success rates (defined as the percentage of students passing) in
school districts in the Greater Cincinnati metropolitan area along with state averages.
a. Develop a multiple regression model to predict math success as a function of success in
all other subjects. Is multicollinearity a problem?
b. Develop the best regression model to predict math success as a function of success in
the other subjects by examining the correlation matrix.
c. Develop the best regression model to predict math success as a function of success in
the other subjects using best subsets regression.
d. Develop the best regression model to predict math success as a function of success in
the other subjects using stepwise regression.

a. SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9393726812
R Square 0.8824210342
Adjusted R Square 0.8643319625
Standard Error 5.5555151986
Observations 31

ANOVA
             df   SS             MS             F             Significance F
Regression   4    6022.3812325   1505.5953081   48.78199671   1.023771E-11
Residual     26   802.45747717   30.863749122
Total        30   6824.8387097

Coefficients Standard Error t Stat P-value


Intercept -23.19481099 10.409106704 -2.228319072 0.034720525
Writing 0.0909709778 0.182468793 0.4985563631 0.6222836232
Reading 0.180051632 0.396307037 0.4543235805 0.6533662081
Citizenship 0.1194335286 0.1707391246 0.6995088494 0.4904455593
Science 0.7807428634 0.2874752783 2.7158608839 0.0115926699

It appears that there will be a multicollinearity problem (see correlation matrix below).

b.               Math     Writing   Reading   Citizenship   Science
Math             1
Writing          0.6978   1
Reading          0.9093   0.8009    1
Citizenship      0.8580   0.5935    0.8597    1
Science          0.9353   0.6984    0.9488    0.8951        1

Due to multicollinearity, select only the subject with the highest correlation with Math (Science has extremely high correlation with Math) and use simple regression.

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9353020802
R Square 0.8747899812
Adjusted R Square 0.8704723943
Standard Error 5.4283362007
Observations 31

ANOVA
             df   SS             MS             F              Significance F
Regression   1    5970.3005263   5970.3005263   202.61085887   1.287741E-14
Residual     29   854.53818334   29.466833908
Total        30   6824.8387097

Coefficients Standard Error t Stat P-value


Intercept -12.90859859 5.5037465834 -2.345420232 0.0260567452
Science 1.0751986308 0.0755365845 14.234144121 1.287741E-14
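A simple regression fit like the one above can be reproduced with the closed-form least-squares formulas. The numbers below are a made-up exact example (y = 2x + 1), not the Ohio data.

```python
def simple_ols(x, y):
    """Closed-form simple regression: b1 = Sxy/Sxx, b0 = ybar - b1*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx                 # b1
    intercept = ybar - slope * xbar   # b0
    return intercept, slope

b0, b1 = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
# Recovers intercept 1 and slope 2 exactly for this noiseless example.
```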

c. Best subsets multiple regression.

Model Cp k R Square Adj. R Square Std. Error


X1X2X3X4 5 5 0.8824210342 0.8643319625 5.5555151986
X1X2X4 3.4893126303 4 0.8802082314 0.8668980349 5.5027249123
X1X3X4 3.2064099158 4 0.8814875933 0.8683195481 5.4732618111
X1X4 1.8288192365 3 0.8786728916 0.8700066695 5.4380864089
X2X3X4 3.2485584472 4 0.8812969863 0.8681077626 5.4776614459
X2X4 1.6312520879 3 0.8795663431 0.8709639391 5.4180264041
X3X4 2.2071805183 3 0.8769618405 0.8681734005 5.4762982604
X4 0.6874393957 2 0.8747899812 0.8704723943 5.4283362007

The model with science and reading as independent variables (X2X4) has a slightly higher
adjusted R2 value than the simple regression model with just science.

d. Table of Results for General Stepwise

Science entered.

Stepwise regression agrees with the simple regression model using science as the
independent variable.
School District      Math   Writing   Reading   Citizenship   Science   All
Indian Hill          89     95        98        95            91        83
Wyoming              86     98        96        93            87        81
Mason City           85     96        92        94            86        72
Madiera              88     94        95        82            88        69
Mariemont            74     99        92        89            88        68
Sycamore             80     85        88        87            84        68
Forest Hills         73     93        91        85            88        67
Kings Local          78     92        86        82            78        64
Lakota               73     90        88        85            81        64
Loveland             72     85        93        86            86        61
Southwest            73     92        82        78            73        58
Fairfield            71     90        86        83            77        57
Oak Hills            75     88        86        77            79        57
Three Rivers         66     87        85        84            77        56
Milford              72     82        86        82            76        53
Ross                 66     84        85        75            78        52
West Clermont        63     88        83        73            70        48
Reading              58     88        80        76            75        46
Princeton            59     83        75        76            63        46
Finneytown           61     79        71        67            62        45
Norwood              64     86        77        75            67        44
Lockland             52     88        79        82            64        41
Franklin City        49     85        79        70            67        40
Winton Woods         55     82        77        65            59        40
Northwest            51     75        74        62            61        38
North College Hill   50     77        76        66            57        35
Mount Healthy        40     87        72        62            53        32
Felicity Franklin    52     52        64        81            64        28
St. Bernard          40     81        59        41            48        26
Deer Park            40     69        66        43            52        25
Cincinnati           35     63        59        50            44        23

ation matrix below).

ce has extremely high


Significance F
1.287741E-14

Consider
This Model?
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes

2X4) has a slightly higher

ng science as the
Chapter 5: Regression Analysis

16. A national homebuilder builds single-family homes and condominium style townhouses. The Excel file
House Sales Data.xls provides information on the selling price, lot cost, type of home, and region of the
country (M = Midwest, S = South) for closings during one month.
a. Develop a multiple regression model for sales price as a function of lot cost, region of
country, and type of home.
b. Determine if any interactions exist between sales price, region, and type of home.

Note: set up zero-one variables for Region (M=1,S=0) and Type (SF=1, T=0).

a. SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8081105397
R Square 0.6530426444
Adjusted R Square 0.6466958635
Standard Error 48853.475443
Observations 168

ANOVA
             df    SS            MS            F              Significance F
Regression   3     7.36716E+11   2.45572E+11   102.89352274   1.670453E-37
Residual     164   3.91413E+11   2386662063
Total        167   1.12813E+12

Coefficients Standard Error t Stat P-value


Intercept 53557.56731 17960.741253 2.9819241063 0.0033013681
Lot Cost 3.5052670649 0.2981713639 11.755880978 1.561924E-23
Region -9081.588335 9890.7624738 -0.918188902 0.3598683702
Type 72565.857045 9755.3742424 7.4385518425 5.398865E-12

Note that Region is not significant.

b. Correlation Matrix.

Selling Price Lot Cost Region Type


Selling Price 1
Lot Cost 0.7262879646 1
Region -0.490752071 -0.571613013 1
Type 0.4281750016 0.1079874345 -0.1784362 1

Region and Lot Cost have a moderately high negative correlation.

Selling price is most highly correlated with Lot Cost.


Region   Type   Selling Price   Lot Cost   Region (M=1)   Type (SF=1)
M        SF     $348,744        $53,000    1              1
M        SF     $274,455        $41,000    1              1
M        SF     $277,720        $44,650    1              1
M        SF     $307,373        $41,292    1              1
M        SF     $271,105        $45,000    1              1
M        SF     $262,740        $44,900    1              1
M        SF     $175,000        $28,000    1              1
M        SF     $201,700        $40,940    1              1
M        SF     $283,440        $50,900    1              1
M        SF     $185,160        $29,000    1              1
M        SF     $323,716        $34,500    1              1
M        SF     $281,487        $57,285    1              1
M        SF     $184,460        $22,300    1              1
M        SF     $289,000        $44,000    1              1
M        SF     $410,810        $66,500    1              1
M        SF     $184,210        $28,000    1              1
M        SF     $223,890        $28,000    1              1
M        SF     $189,120        $35,000    1              1
M        SF     $230,440        $33,000    1              1
M        SF     $330,486        $35,000    1              1
M        SF     $250,005        $33,000    1              1
M        SF     $203,950        $33,000    1              1
M        SF     $230,555        $28,000    1              1
M SF $183,370 $28,000 1 1
M T $112,740 $20,700 1 0
M T $179,365 $32,200 1 0
M T $155,870 $24,650 1 0
M T $155,270 $19,600 1 0
M T $116,415 $19,600 1 0
M T $147,905 $24,650 1 0
M T $139,955 $30,400 1 0
M T $184,873 $33,400 1 0
M T $212,079 $33,400 1 0
M T $265,500 $35,800 1 0
M T $175,470 $28,600 1 0
M T $115,350 $18,030 1 0
M T $85,145 $17,030 1 0
M T $139,435 $29,155 1 0
M T $133,070 $24,455 1 0
M SF $165,220 $25,500 1 1
M SF $136,530 $25,500 1 1
M SF $153,845 $27,500 1 1
M SF $165,350 $25,000 1 1
M SF $168,354 $27,316 1 1
M SF $170,000 $25,200 1 1
M SF $210,380 $33,856 1 1
M SF $268,210 $29,700 1 1
M SF $233,900 $44,200 1 1
M SF $168,500 $33,000 1 1
M SF $248,500 $20,000 1 1
M SF $220,257 $31,300 1 1
M SF $214,900 $31,300 1 1
M SF $211,513 $31,300 1 1
M SF $188,603 $31,300 1 1
M T $187,390 $27,000 1 0
S T $335,000 $68,375 0 0
S T $294,450 $73,400 0 0
S T $267,060 $73,400 0 0
S T $250,800 $73,400 0 0
S T $269,410 $73,400 0 0
S T $267,640 $73,400 0 0
S T $260,100 $73,400 0 0
S SF $301,500 $59,000 0 1
S SF $309,075 $82,250 0 1
S SF $290,190 $82,250 0 1
S SF $322,920 $82,250 0 1
S SF $319,602 $82,250 0 1
S SF $294,990 $57,000 0 1
S SF $286,758 $57,000 0 1
S SF $352,781 $60,000 0 1
S SF $310,372 $60,000 0 1
S SF $400,330 $75,510 0 1
S SF $446,507 $75,510 0 1
S T $198,202 $45,025 0 0
S T $200,423 $45,025 0 0
S T $181,916 $45,025 0 0
S T $203,076 $45,025 0 0
S T $196,898 $45,025 0 0
S T $182,237 $45,025 0 0
S T $224,108 $45,025 0 0
S T $230,000 $45,025 0 0
S T $172,749 $45,025 0 0
S SF $318,274 $85,800 0 1
S SF $191,028 $45,000 0 1
S SF $200,119 $45,000 0 1
S SF $242,899 $48,252 0 1
S SF $387,527 $48,000 0 1
S SF $257,040 $37,631 0 1
S SF $270,518 $46,499 0 1
S SF $265,058 $41,404 0 1
S SF $255,000 $43,198 0 1
S SF $385,942 $49,123 0 1
S SF $354,065 $48,115 0 1
S SF $333,158 $49,123 0 1
S SF $254,048 $39,680 0 1
S SF $246,648 $41,600 0 1
S SF $367,600 $50,000 0 1
S SF $318,523 $50,000 0 1
S SF $359,949 $50,591 0 1
S SF $281,824 $50,448 0 1
S SF $355,688 $65,373 0 1
S SF $305,000 $49,067 0 1
S SF $299,096 $43,784 0 1
S SF $280,622 $45,130 0 1
S SF $404,510 $58,225 0 1
S SF $371,152 $58,223 0 1
S SF $219,990 $37,557 0 1
S SF $432,426 $57,422 0 1
S SF $268,000 $43,344 0 1
S SF $312,898 $40,768 0 1
S SF $267,250 $45,676 0 1
S SF $379,000 $72,915 0 1
S SF $342,423 $48,309 0 1
S SF $337,374 $70,399 0 1
S SF $358,162 $44,470 0 1
S SF $398,651 $65,429 0 1
S SF $280,804 $40,667 0 1
S SF $407,076 $48,668 0 1
S SF $268,500 $41,099 0 1
S SF $444,304 $53,938 0 1
S SF $324,266 $47,891 0 1
S SF $307,387 $45,850 0 1
S SF $369,101 $46,773 0 1
S SF $350,702 $46,386 0 1
S SF $329,611 $48,611 0 1
S SF $242,191 $33,434 0 1
S SF $379,424 $64,902 0 1
S SF $324,412 $62,523 0 1
S SF $340,730 $50,850 0 1
S SF $310,100 $41,800 0 1
S SF $354,117 $56,219 0 1
S SF $330,710 $49,920 0 1
S SF $417,790 $63,099 0 1
S SF $290,000 $48,300 0 1
S SF $274,903 $45,345 0 1
S SF $209,400 $43,579 0 1
S SF $205,821 $39,299 0 1
S SF $287,771 $46,300 0 1
S SF $575,120 $79,790 0 1
S SF $226,000 $35,600 0 1
S SF $216,049 $35,600 0 1
S SF $207,345 $35,600 0 1
S SF $211,797 $34,000 0 1
S SF $204,900 $34,000 0 1
S SF $206,400 $35,851 0 1
S SF $186,000 $35,851 0 1
S SF $249,900 $38,200 0 1
S SF $214,205 $36,500 0 1
S SF $256,235 $48,500 0 1
S SF $262,890 $48,500 0 1
S SF $338,065 $54,850 0 1
S SF $326,570 $51,000 0 1
S SF $239,000 $39,169 0 1
S SF $239,870 $41,354 0 1
S SF $241,195 $41,340 0 1
S SF $252,135 $41,341 0 1
S SF $253,055 $41,340 0 1
S SF $160,000 $29,500 0 1
S SF $337,380 $49,150 0 1
S SF $492,820 $84,122 0 1
S SF $385,000 $75,000 0 1
S SF $340,000 $40,000 0 1
S SF $202,000 $31,160 0 1
S SF $234,971 $29,202 0 1
S SF $225,900 $28,618 0 1
S SF $366,990 $55,508 0 1
S SF $307,663 $44,840 0 1
S SF $379,575 $44,294 0 1

17. The Excel file Salary Data.xls provides information on current salary, beginning salary, previous
experience when hired in months, and total years of education for a sample of 100 employees in a firm.
a. Develop a multiple regression model for predicting current salary as a function of the other
variables.
b. Find the best model for predicting current salary.

a. SUMMARY OUTPUT

Regression Statistics
Multiple R 0.896159669
R Square 0.8031021524
Adjusted R Square 0.7969490947
Standard Error 7790.8746383
Observations 100

ANOVA
             df   SS            MS           F              Significance F
Regression   3    23766951872   7922317291   130.52082178   9.407617E-34
Residual     96   5826981852    60697727.63
Total        99   29593933724

                               Coefficients   Standard Error   t Stat         P-value
Intercept                      -4139.237674   4203.3581586     -0.984745415   0.3272245227
Beginning Salary               1.7302416617   0.1138107661     15.20279426    2.667152E-27
Previous Experience (months)   -10.90710724   7.7709507968     -1.403574353   0.1636721199
Education (years)              719.12214722   351.73394061     2.0445059865   0.0436414029

b. Remove Previous Experience (not significant in above model).
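The removal decision above follows the usual rule of thumb: a predictor whose |t| = |coefficient / standard error| is below about 2 (p-value above about .05) is a candidate for removal. A sketch of that screen, with values copied from the part a output:

```python
# Coefficient and standard error pairs from the full model in part a.
coefs = {
    "Beginning Salary": (1.7302416617, 0.1138107661),
    "Previous Experience (months)": (-10.90710724, 7.7709507968),
    "Education (years)": (719.12214722, 351.73394061),
}

# Flag predictors whose |t| falls below the rough significance cutoff of 2.
to_drop = [name for name, (b, se) in coefs.items() if abs(b / se) < 2]
```

Only Previous Experience fails the screen, which matches the model chosen in part b.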

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8939024564
R Square 0.7990616015
Adjusted R Square 0.7949185417
Standard Error 7829.7329472
Observations 100

ANOVA
             df   SS            MS             F              Significance F
Regression   2    23647376076   11823688038    192.86750545   1.579619E-34
Residual     97   5946557648    61304718.024
Total        99   29593933724

                    Coefficients   Standard Error   t Stat         P-value
Intercept           -6276.569618   3937.3676995     -1.594102989   0.1141663434
Beginning Salary    1.6904400361   0.1107711257     15.260655932   1.580932E-27
Education (years)   852.91089426   340.26047453     2.5066411121   0.0138513269
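The reduced-model coefficients reported above can be used directly for prediction. The employee below is hypothetical, chosen only to illustrate the calculation.

```python
# Coefficients copied from the problem 17b output.
b0, b_begin, b_educ = -6276.569618, 1.6904400361, 852.91089426

def predict_salary(beginning_salary, education_years):
    """Point prediction of current salary from the reduced model."""
    return b0 + b_begin * beginning_salary + b_educ * education_years

pred = predict_salary(20000, 16)   # roughly $41,179
```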
Current Salary   Beginning Salary   Education (years)   Previous Experience (months)
$57,000          $27,000            15                  144
$40,200          $18,750            16                  36
$21,450          $12,000            12                  381
$21,900          $13,200            8                   190
$45,000          $21,000            15                  138
$32,100          $13,500            15                  67
$36,000          $18,750            15                  114
$21,900          $9,750             12                  0
$27,900          $12,750            15                  115
$24,000          $13,500            12                  244
$30,300          $16,500            16                  143
$28,350          $12,000            8                   26
$27,750          $14,250            15                  34
$35,100          $16,800            15                  137
$27,300          $13,500            12                  66
$40,800          $15,000            12                  24
$46,000          $14,250            15                  48
$103,750         $27,510            16                  70
$23,700          $13,500            15                  359
$26,550          $14,250            15                  61
$27,600          $15,000            12                  75
$25,800          $15,000            12                  143
$42,300          $26,250            16                  126
$30,750          $15,000            8                   451
$26,700          $12,900            12                  18
$20,850          $12,000            12                  163
$35,250          $15,000            15                  54
$26,700          $15,000            15                  56
$26,550 $13,050 12 11
$27,750 $12,000 12 11
$25,050 $12,750 16 123
$66,000 $47,490 16 150
$52,650 $19,500 16 20
$45,625 $23,250 16 60
$30,900 $15,000 15 25
$29,400 $16,500 15 24
$24,900 $11,250 12 0
$19,650 $10,950 12 11
$22,050 $10,950 12 9
$25,500 $12,000 12 11
$28,200 $12,750 15 19
$23,100 $11,250 12 13
$25,500          $11,400            12                  9
$17,100          $10,200            8                   0
$68,125          $32,490            18                  29
$30,600          $15,750            12                  460
$52,125          $27,480            19                  221
$61,875          $36,750            19                  199
$21,300          $11,550            8                   24
$19,650          $11,250            12                  5
$22,350          $11,250            12                  5
$23,400 $11,250 12 18
$24,300 $10,950 12 8
$28,500 $11,250 12 4
$19,950 $11,250 12 8
$23,400 $11,250 12 0
$34,500 $17,250 16 3
$18,150 $10,950 12 0
$21,750 $12,450 8 318
$59,400 $33,750 12 272
$24,450 $14,250 12 117
$103,500 $60,000 16 150
$35,700 $16,500 12 72
$22,200 $16,500 12 7
$22,950 $13,950 15 22
$23,100 $12,000 12 228
$56,750 $30,000 16 15
$29,100 $12,750 17 375
$37,650 $15,750 12 132
$27,900 $13,500 12 32
$21,150 $12,000 8 159
$31,200 $15,750 12 155
$20,550 $11,250 12 154
$20,700 $11,250 12 2
$21,300 $11,250 12 3
$24,300 $15,000 12 121
$19,650 $13,950 12 133
$60,000 $32,490 17 17
$30,300 $15,750 15 55
$61,250 $33,000 19 9
$36,000 $19,500 19 21
$25,200 $18,750 8 344
$30,750 $15,000 12 56
$33,540 $15,750 12 47
$34,950 $20,250 16 55
$40,350 $16,500 15 80
$30,270 $15,750 12 80
$26,250 $16,050 8 264
$32,400 $15,000 15 64
$20,400 $11,250 12 0
$24,150 $12,750 8 96
$23,850 $13,500 15 122
$29,700 $13,500 12 26
$21,600 $13,500 8 228
$24,450 $15,750 12 87
$28,050 $16,500 15 84
$100,000 $44,100 16 128
$49,000 $20,550 15 86
$16,350 $10,200 12 163
$70,000 $21,750 16 19
SUMMARY OUTPUT (reduced model, with confidence intervals)

ANOVA
             df   SS         MS         F          Significance F
Regression   2    2.36E+10   1.18E+10   192.8675   1.58E-34
Residual     97   5.95E+09   61304718
Total        99   2.96E+10

                    Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept           -6276.57       3937.368         -1.5941    0.114166   -14091.2    1538.011
Beginning Salary    1.69044        0.110771         15.26066   1.58E-27   1.47059     1.91029
Education (years)   852.9109       340.2605         2.506641   0.013851   177.5884    1528.233

18. The Excel file Cereal Data.xls provides a variety of nutritional information about 67 cereals and their shelf
location in a supermarket. Use regression analysis to determine if a relationship exists between calories
and the other variables. Investigate the model assumptions and clearly explain your conclusions.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8674241518
R Square 0.7524246591
Adjusted R Square 0.7321315983
Standard Error 9.7154511857
Observations 67

ANOVA
             df   SS             MS             F              Significance F
Regression   5    17498.926922   3499.7853843   37.077928706   2.874096E-17
Residual     61   5757.7894962   94.389991742
Total        66   23256.716418

Coefficients Standard Error t Stat P-value


Intercept 16.50551127 9.4952720551 1.738287347 0.0872058215
Sodium 0.0272954078 0.0161902133 1.6859202083 0.0969202676
Fiber 0.4297366193 0.6105746818 0.7038231884 0.4842209844
Carbs 3.4683591781 0.457334531 7.5838558926 2.290573E-10
Sugars 3.8784647054 0.3610111185 10.743338658 1.068954E-15
Shelf 2.4543920673 1.5553809545 1.5780005922 0.119737715

Only Carbs and Sugars are significant in the above model, so create a model with only Carbs
and Sugars as independent variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8519372382
R Square 0.7257970578
Adjusted R Square 0.7172282158
Standard Error 9.9820620902
Observations 67

ANOVA
             df   SS             MS             F              Significance F
Regression   2    16879.656349   8439.8281746   84.701884153   1.043056E-18
Residual     64   6377.0600687   99.641563573
Total        66   23256.716418

Coefficients Standard Error t Stat P-value


Intercept 28.760195322 6.7452659084 4.2637600522 6.750035E-05
Carbs 3.3583269216 0.3601604569 9.3245298225 1.54031E-13
Sugars 3.9055846021 0.3152959993 12.387041417 1.183381E-18


[Sugars Residual Plot: residuals plotted against Sugars; no pattern evident.]

           Calories   Sodium    Fiber     Carbs     Sugars   Shelf
Calories   1
Sodium     0.3796     1
Fiber      -0.3653    -0.1059   1
Carbs      0.2615     0.2479    -0.4251   1
Sugars     0.5944     0.1182    -0.1535   -0.4676   1
Shelf      0.0090     -0.1478   0.3029    -0.2797   0.1100   1

The residual plots do not exhibit any pattern. The correlation between Carbs and Sugars
is -.47.
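The residual plots above come from a standard calculation: fit the model by least squares and take y minus the fitted values. A sketch follows; the five rows are made up, not the cereal file.

```python
import numpy as np

def residuals(X, y):
    """OLS residuals for a model with intercept plus the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])   # intercept + predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

X = np.array([[5.0, 6.0], [14.0, 8.0], [18.0, 8.0], [12.0, 12.0], [21.0, 2.0]])
y = np.array([70.0, 110.0, 130.0, 120.0, 100.0])
res = residuals(X, y)
# With an intercept in the model, OLS residuals sum to (numerically) zero;
# plotting them against each predictor is the pattern check used above.
```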
Calories Sodium Fiber Carbs Sugars Shelf
70 130 10 5 6 3
70 260 9 7 5 3
50 140 14 8 0 3
110 200 1 14 8 3
110 180 1.50 10.50 10 1
110 125 1 11 14 2
130 210 2 18 8 3
90 200 4 15 6 1
90 210 5 13 5 3
120 220 0 12 12 2
110 290 2 17 1 1
120 210 0 13 9 2
110 140 2 13 7 3
110 180 0 12 13 2
110 280 0 22 3 1
100 290 1 21 2 1
110 90 1 13 12 2
110 180 0 12 13 2
110 140 4 10 7 3
100 80 1 21 0 2
110 220 1 21 3 3
100 140 2 11 10 3
100 190 1 18 5 3
110 125 1 11 13 2
110 200 1 14 11 1
100 0 3 14 7 2
120 160 5 12 10 3
120 240 5 14 12 3
110 135 0 13 12 2
110 280 0 15 9 2
100 140 3 15 5 3
110 170 3 17 3 3
120 75 3 13 4 3
110 180 0 14 11 1
120 220 1 12 11 2
110 250 1.50 11.50 10 1
110 170 1 17 6 3
140 170 2 20 9 3
110 260 0 21 3 2
100 150 2 12 6 2
110 180 0 12 12 2
160 150 3 17 13 3
100 220 2 15 6 1
120 190 0 15 9 2
130 170 1.50 13.50 10 3
120 200 6 11 14 3
100 320 1 20 3 3
50 0 0 13 0 3
50 0 1 10 0 3
100 135 2 14 6 3
120 210 5 14 12 2
90 0 2 15 6 3
110 240 0 23 2 1
110 290 0 22 3 1
90 0 3 20 0 1
80 0 3 16 0 1
90 0 4 19 0 1
110 70 1 9 15 2
110 230 1 16 3 1
90 15 3 15 5 2
110 200 0 21 3 3
140 190 4 15 14 3
100 200 3 16 3 3
110 140 0 13 12 2
100 230 3 17 3 1
100 200 3 17 3 1
110 200 1 18 8 1
[Carbs Residual Plot: residuals plotted against Carbs; no pattern evident.]


SUMMARY OUTPUT (full model, with confidence intervals)

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    16.50551       9.495272         1.738287   0.087206   -2.48146    35.49249
Sodium       0.027295       0.01619          1.68592    0.09692    -0.00508    0.05967
Fiber        0.429737       0.610575         0.703823   0.484221   -0.79118    1.650656
Carbs        3.468359       0.457335         7.583856   2.29E-10   2.553862    4.382856
Sugars       3.878465       0.361011         10.74334   1.07E-15   3.156578    4.600351
Shelf        2.454392       1.555381         1.578001   0.119738   -0.65579    5.564569

SUMMARY OUTPUT (Carbs and Sugars model, with confidence intervals)

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    28.7602        6.745266         4.26376    6.75E-05   15.28499    42.2354
Carbs        3.358327       0.36016          9.32453    1.54E-13   2.638824    4.077829
Sugars       3.905585       0.315296         12.38704   1.18E-18   3.275709    4.53546

RESIDUAL OUTPUT                                     PROBABILITY OUTPUT

Observation   Predicted Calories   Residuals        Percentile   Calories
1 68.98534 1.014662 0.746269 50
2 71.79641 -1.79641 2.238806 50
3 55.62681 -5.62681 3.731343 50
4 107.0214 2.978551 5.223881 70
5 103.0785 6.921526 6.716418 70
6 120.38 -10.38 8.208955 80
7 120.4548 9.545243 9.701493 90
8 102.5686 -12.5686 11.19403 90
9 91.94637 -1.94637 12.68657 90
10 115.9271 4.072866 14.1791 90
11 89.75734 20.24266 15.67164 90
12 107.5687 12.43129 17.16418 90
13 99.75754 10.24246 18.65672 100
14 119.8327 -9.83272 20.14925 100
15 114.3601 -4.36014 21.64179 100
16 107.0962 -7.09623 23.13433 100
17 119.2855 -9.28546 24.62687 100
18 119.8327 -9.83272 26.1194 100
19 89.68256 20.31744 27.61194 100
20 99.28506 0.714939 29.10448 100
21 111.0018 -1.00181 30.59701 100
22 104.7576 -4.75764 32.08955 100
23 108.738 -8.738 33.58209 100
24 116.4744 -6.47439 35.07463 100
25 118.7382 -8.7382 36.56716 100
26 103.1159 -3.11586 38.0597 110
27 108.116 11.88404 39.55224 110
28 122.6438 -2.64379 41.04478 110
29 119.2855 -9.28546 42.53731 110
30 114.2854 -4.28536 44.02985 110
31 98.66302 1.336978 45.52239 110
32 97.56851 12.43149 47.01493 110
33 88.04078 31.95922 48.50746 110
34 118.7382 -8.7382 50 110
35 112.0215 7.978451 51.49254 110
36 106.4368 3.563199 52.98507 110
37 109.2853 0.714739 54.47761 110
38 131.077 8.923005 55.97015 110
39 111.0018 -1.00181 57.46269 110
40 92.49363 7.506374 58.95522 110
41 115.9271 -5.92713 60.44776 110
42 136.6244 23.37565 61.9403 110
43 102.5686 -2.56861 63.43284 110
44 114.2854 5.714639 64.92537 110
45 113.1535 16.84655 66.41791 110
46 120.38 -0.37998 67.91045 110
47 107.6435 -7.64349 69.40299 110
48 72.41845 -22.4184 70.89552 110
49 62.34346 -12.3435 72.38806 110
50 99.21028 0.78972 73.8806 110
51 122.6438 -2.64379 75.37313 110
52 102.5686 -12.5686 76.86567 110
53 113.8129 -3.81288 78.35821 110
54 114.3601 -4.36014 79.85075 120
55 95.92673 -5.92673 81.34328 120
56 82.49343 -2.49343 82.83582 120
57 92.56841 -2.56841 84.32836 120
58 117.5689 -7.56891 85.8209 120
59 94.21018 15.78982 87.31343 120
60 98.66302 -8.66302 88.80597 120
61 111.0018 -1.00181 90.29851 120
62 133.8133 6.186716 91.79104 120
63 94.21018 5.78982 93.28358 130
64 119.2855 -9.28546 94.77612 130
65 97.56851 2.431493 96.26866 140
66 97.56851 2.431493 97.76119 140
67 120.4548 -10.4548 99.25373 160
[Normal Probability Plot: Calories vs. Sample Percentile]

19. The Excel file Infant Mortality.xls provides data on infant mortality rate (deaths per 1000 births), female
literacy (percentage who read), and population density (people per square kilometer) for 85 countries.
Develop simple and multiple regression models for the relationship between mortality, population density,
and literacy. Explain all statistical output.

Simple Regression: dependent variable: Infant Mortality, independent variable: Density

[Scatter plot, all data: Infant Mortality vs. Density. Fitted line: y = -0.0093x + 53.35, R² = 0.034.]

[Scatter plot, excluding high-Density "outliers": Infant Mortality vs. Density. Fitted line: y = -0.0201x + 54.14, R² = 0.002.]

Any way you look at it, there is no statistical relationship between Infant
Mortality and Density (very low R2).

Simple Regression: dependent variable: Infant Mortality, independent variable: Literacy

[Scatter plot: Infant Mortality vs. Literacy. Fitted line: y = -1.129x + 127.20, R² = 0.711.]

There is a clear statistical relationship between Infant Mortality and Literacy: as Literacy increases, Infant Mortality goes down (R2 = .71).

Multiple Regression: dependent variable: Infant Mortality, independent variables: Density and Literacy

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8586805697
R Square 0.7373323208
Adjusted R Square 0.730925792
Standard Error 19.863456131
Observations 85

ANOVA
             df   SS             MS             F              Significance F
Regression   2    90819.711535   45409.855767   115.09076886   1.569178E-24
Residual     82   32353.664936   394.55688946
Total        84   123173.37647

Coefficients Standard Error t Stat P-value


Intercept 128.59611399 5.553822295 23.154524427 1.013594E-37
Density -0.008112553 0.0028481048 -2.848404051 0.005552715
Literacy -1.122777641 0.0757904981 -14.81422696 6.73918E-25

Both Density and Literacy are significant in this model (P-values of .006 and 6.74E-25),
however R2 = .74 for this model is only a small increase over R2 = .71 for simple
regression with Literacy.
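The comparison above (R2 rises only from .71 to .74 when Density is added) can be sketched directly by fitting nested models and comparing R-squared. The data below are synthetic, built so the second predictor adds only a little, echoing the pattern in the actual output.

```python
import numpy as np

def r_squared(cols, y):
    """R-squared of an OLS fit with intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

rng = np.random.default_rng(2)
x1 = rng.normal(size=60)                 # stand-in for Literacy
x2 = rng.normal(size=60)                 # stand-in for Density
y = -1.1 * x1 + 0.05 * x2 + rng.normal(scale=0.3, size=60)

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, x2], y)          # never lower than r2_one
```

Adding a predictor can never lower R-squared for nested OLS models; the question is whether the increase is worth the extra variable, which is why adjusted R-squared (or the coefficient p-value) is the better guide.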
Country Mortalit Density Literacy
Australia 7.30 2.30 100
r 1000 births), female Botswana 39.30 2.40 16
ter) for 85 countries. Libya 63.00 2.80 50
ality, population density, Gabon 94.00 4.20 48
Cent. Afri. 137.00 5.00 15
Bolivia 75 6.90 71
Saudi Arab 52.00 7.70 48
Russia 27.00 8.80 100
Somalia 126.00 10.00 14
Paraguay 25.20 11.00 88
Zambia 85.00 11.00 65
Argentina 25.60 12.00 95
Brazil 66.00 18.00 80
Chile 14.60 18.00 93
Peru 54.00 18.00 79
Uruguay 17.00 18.00 96
Venezuela 28.00 22.00 87
Afghanistan 168.00 25.00 14
USA 8.10 26.00 97
Cameroon 77.00 27.00 45
Liberia 113.00 29.00 29
Tanzania 110.00 29.00 31
Colombia 28.00 31.00 86
U. Arab Emirates 22.00 32.00 63
Nicaragua 52.50 33.00 57
Panama 16.50 34.00 88
Burkina Faso 118.00 36.00 9
Estonia 19.00 36.00 100
Ecuador 39.00 39.00 86
Iran 60.00 39.00 43
Latvia 21.50 40.00 100
Jordan 34.00 42.00 70
Senegal 76.00 43.00 25
Iraq 67.00 44.00 49
Honduras 45.00 46.00 71
Mexico 35.00 46.00 85
Ethiopia 110.00 47.00 16
Kenya 74.00 49.00 58
Belarus 19.00 50.00 100
Uzbekistan 53.00 50.00 100
Cambodia 112.00 55.00 22
Egypt 76.40 57.00 34
Lithuania 17.00 58.00 98
Malaysia 25.60 58.00 70
Morocco 50.00 63.00 38
Costa Rica 11.00 64.00 93
Syria 43.00 74.00 51
Uganda 112.00 76.00 35
Spain 6.90 77.00 93
Turkey 49.00 79.00 71
Greece 8.20 80.00 89
Georgia 23.00 81.00 100
Azerbaijan 35.00 86.00 100
Gambia 124.00 86.00 16
Ukraine 20.70 87.00 100
Guatemala 57.00 97.00 47
Kuwait 12.50 97.00 67
Cuba 10.20 99.00 93
Indonesia 68.00 102.00 68
Nigeria 75.00 102.00 40
Portugal 9.20 108.00 82
Hungary 12.50 111.00 98
Thailand 37.00 115.00 90
Poland 13.80 123.00 98
China 52.00 124.00 68
Armenia 27.00 126.00 100
Pakistan 101.00 143.00 21
Dominican Rep. 51.50 159.00 82
Italy 7.60 188.00 96
N. Korea 27.70 189.00 99
Burundi 105.00 216.00 40
Vietnam 46.00 218.00 83
Philippines 51.00 221.00 90
Haiti 109.00 231.00 47
Israel 8.60 238.00 89
El Salvador 41.00 246.00 70
India 79.00 283.00 39
Rwanda 117.00 311.00 37
Lebanon 39.50 343.00 73
S. Korea 21.70 447.00 99
Barbados 20.30 605.00 99
Bangladesh 106.00 800.00 22
Bahrain 25.00 828.00 55
Singapore 5.70 4456.00 84
Hong Kong 5.80 5494.00 64
Chapter 5: Regression Analysis

20. A mental health agency measured the self-esteem score for randomly selected individuals with
disabilities who were involved in some work activity within the past year. The Excel file Self Esteem.xls
provides the data, including the individuals' marital status, length of work, type of support received (direct
support includes job-related services such as job coaching and counseling), education, and age.
a. Use simple linear regression to determine if there is a relationship between self-esteem
and length of work
b. Use multiple linear regression for predicting self-esteem as a function of the other
variables. Investigate possible interaction effects and determine the best model.

Note: Self Esteem omitted for line 50, removed that line from data.

a. Simple Regression: dependent variable: Self Esteem, independent variable: Length of Work

[Scatter chart: Self Esteem vs. Length of Work; f(x) = 0.053887746731874 x + 2.89776786177823, R² = 0.646263823385515]

The statistical relationship between Length of Work and Self Esteem is positive with R² = .65.

b. First create categorical variables for Support (none=0, direct=1) and Single, Married,
Separated, and Divorced (no=0, yes=1), then take a look at the correlation matrix:

Self Esteem Single Married Separated


Self Esteem 1
Single 0.1483281612 1
Married 0.1935237285 -0.30540095 1
Separated -0.095263601 -0.450225169 -0.231248645 1
Divorced -0.226835488 -0.430098808 -0.220911165 -0.325669474
Support Level 0.5059077796 0.2935115018 0.1057428087 -0.029032945
Length of Work 0.8039053572 0.1764148253 0.1184433775 -0.017693856
Education (years) 0.711212317 0.1149726042 0.0741579093 -0.037640785
Age -0.723603317 -0.071318189 -0.125948658 -0.059194643

Divorced Support Level Length of Work Education


Self Esteem
Single
Married
Separated
Divorced 1
Support Level -0.389011631 1
Length of Work -0.277735726 0.5458402689 1
Education (years) -0.151840019 0.2741699653 0.7387249131 1
Age 0.2430112068 -0.361226138 -0.722305866 -0.764994317
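The 0/1 recoding described in part b can be sketched in a few lines; the function below is illustrative only (the actual solution used Excel columns):

```python
def dummy_code(marital: str, support: str) -> dict:
    """0/1 indicator columns for marital status, plus Support Level (direct = 1)."""
    statuses = ["Single", "Married", "Separated", "Divorced"]
    coded = {s: int(marital == s) for s in statuses}
    coded["Support Level"] = int(support == "Direct")
    return coded

print(dummy_code("Married", "Direct"))
```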

Consider the last five independent variables (those with the highest correlations with Self Esteem).

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8424977386
R Square 0.7098024395
Adjusted R Square 0.6824253112
Standard Error 0.566821702
Observations 59

ANOVA
df SS MS F Significance F
Regression 5 41.649763485 8.3299526969 25.926840482 3.881066E-13
Residual 53 17.028202617 0.3212868418
Total 58 58.677966102

Coefficients Standard Error t Stat P-value


Intercept 3.3065141029 1.3105481903 2.5230007773 0.014671815
Divorced 0.0816266004 0.1900808541 0.4294309431 0.6693504016
Support Level 0.3023842294 0.1888710093 1.6010092313 0.1153193263
Length of Work 0.0290796175 0.009051352 3.212737462 0.0022379289
Education (years) 0.1022635802 0.0726743256 1.4071486631 0.1652208676
Age -0.031180568 0.0173163808 -1.800639975 0.0774509288

Divorced, Support Level, and Education are not significant (high P-values).
Education is also highly correlated with both Age and Length of Work.

Best Model: Length of Work and Age as independent variables.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8300496487
R Square 0.6889824193
Adjusted R Square 0.6778746485
Standard Error 0.5708683714
Observations 59

ANOVA
df SS MS F Significance F
Regression 2 40.428087043 20.214043522 62.027065144 6.28027E-15
Residual 56 18.249879058 0.3258906975
Total 58 58.677966102

Coefficients Standard Error t Stat P-value


Intercept 4.8597124264 0.7158173402 6.7890398201 7.718313E-09
Length of Work 0.0394174901 0.0072234565 5.4568737083 1.14115E-06
Age -0.041653631 0.015019063 -2.773384148 0.0075217849
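Using the best model's coefficients above, predictions are straightforward; the individual below (20 years of work, age 40) is purely a hypothetical illustration:

```python
def predicted_self_esteem(length_of_work: float, age: float) -> float:
    # Coefficients copied from the best-model regression output above
    return 4.8597124264 + 0.0394174901 * length_of_work - 0.041653631 * age

print(round(predicted_self_esteem(20, 40), 2))  # 3.98
```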
Marital Status Support Self Esteem Single Married Divorced Support Level
Single None 2 1 0 0 0
Single None 2 1 0 0 0
Single Direct 3 1 0 0 1
Single Direct 3 1 0 0 1
Single None 3 1 0 0 0
Single None 3 1 0 0 0
Single None 3 1 0 0 0
Single None 3 1 0 0 0
Married Direct 3 0 1 0 1
Married None 3 0 1 0 0
Married None 3 0 1 0 0
Separated Direct 3 0 0 0 1
Separated Direct 3 0 0 0 1
Separated Direct 3 0 0 0 1
Separated Direct 3 0 0 0 1
Separated None 3 0 0 0 0
Separated None 3 0 0 0 0
Separated None 3 0 0 0 0
Separated None 3 0 0 0 0
Separated None 3 0 0 0 0
Divorced Direct 3 0 0 1 1
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Divorced None 3 0 0 1 0
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Single Direct 4 1 0 0 1
Married None 4 0 1 0 0
Separated Direct 4 0 0 0 1
Separated None 4 0 0 0 0
Separated None 4 0 0 0 0
Separated None 4 0 0 0 0
Divorced None 4 0 0 1 0
Divorced None 4 0 0 1 0
Divorced None 4 0 0 1 0
Single Direct 5 1 0 0 1
Single Direct 5 1 0 0 1
Single Direct 5 1 0 0 1
Single Direct 5 1 0 0 1
Single Direct 5 1 0 0 1
Married Direct 5 0 1 0 1
Married Direct 5 0 1 0 1
Married Direct 5 0 1 0 1
Separated Direct 5 0 0 0 1
Divorced Direct 5 0 0 1 1
Single Direct 6 1 0 0 1
Single None 6 1 0 0 0
Married Direct 6 0 1 0 1
Separated Direct 6 0 0 0 1

Length of Work Age Separated Education (years)
4 52 0 9
4 52 0 9
14 40 0 11
10 46 0 9
12 40 0 12
6 47 0 10
4 50 0 10
5 44 0 10
9 46 0 9
4 47 0 9
4 51 0 10
11 47 1 9
10 51 1 9
12 42 1 10
9 48 1 9
3 46 1 9
14 37 1 12
13 35 1 12
3 43 1 9
4 45 1 10
10 50 0 10
8 46 0 9
2 49 0 11
7 48 0 12
8 45 0 10
9 47 0 9
6 46 0 9
4 47 0 9
3 45 0 10
2 47 0 10
12 28 0 13
37 42 0 12
23 40 0 10
23 38 0 12
10 45 0 9
10 47 0 10
9 47 0 9
23 39 0 12
12 43 1 10
21 39 1 11
10 33 1 12
11 35 1 12
14 45 0 12
12 41 0 13
5 43 0 9
51 28 0 12
40 32 0 14
29 38 0 14
29 39 0 14
21 39 0 13
7 48 0 9
24 29 0 12
32 26 0 13
37 30 1 14
31 26 0 13
58 27 0 12
17 38 0 12
61 28 0 16
64 38 1 14

21. Data collected in 1960 from the National Cancer Institute provides the per capita numbers of cigarettes
sold along with death rates for various forms of cancer (see the Excel file Smoking and Cancer.xls).
Use simple linear regression to determine if a significant relationship exists between the number of
cigarettes sold and each form of cancer.

x=cigarettes sold, y=bladder cancer


[Scatter chart; f(x) = 0.121820815664453 x + 1.08608148755257, R² = 0.495083721111977]

x=cigarettes sold, y=lung cancer


[Scatter chart; f(x) = 0.529077926985499 x + 6.47168624727175, R² = 0.486370253879193]

x=cigarettes sold, y=kidney cancer


[Scatter chart; f(x) = 0.045394075409748 x + 1.66359333305287, R² = 0.237548638792125]

x=cigarettes sold, y=leukemia


[Scatter chart; f(x) = − 0.007842545753796 x + 7.0251626251415, R² = 0.004689678790632]

Note: Insert decimal point in line 45 for leukemia.

Perhaps surprisingly, the number of cigarettes sold has a slightly higher statistical relationship with bladder cancer (R² = .495) than with lung cancer (R² = .486). The statistical relationship with kidney cancer is lower (R² = .238) and nonexistent with leukemia (R² = .005).
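For reference, the least squares quantities behind each chart can be computed by hand. The sketch below uses only the first five states' cigarette sales and bladder cancer rates from the table that follows, so its fit differs from the full 44-state regression:

```python
# Illustrative subset only (AL, AZ, AR, CA, CT) - not the full-data fit
x = [18.20, 25.82, 18.24, 28.60, 31.10]  # cigarettes sold per capita
y = [2.90, 3.52, 2.99, 4.46, 5.11]       # bladder cancer deaths per 100K

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
syy = sum((yi - my) ** 2 for yi in y)

slope = sxy / sxx                 # least squares slope
intercept = my - slope * mx       # least squares intercept
r2 = sxy ** 2 / (sxx * syy)       # coefficient of determination
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```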

Note R. A. Fisher's analysis.


State # Cigarettes Deaths per 100K: bladder cancer, lung cancer, kidney cancer, leukemia
AL 18.20 2.90 17.05 1.59 6.15
AZ 25.82 3.52 19.80 2.75 6.61
AR 18.24 2.99 15.98 2.02 6.94
CA 28.60 4.46 22.07 2.66 7.06
CT 31.10 5.11 22.83 3.35 7.20
DE 33.60 4.78 24.55 3.36 6.45
DC 40.46 5.60 27.27 3.13 7.08
FL 28.27 4.46 23.57 2.41 6.07
ID 20.10 3.08 13.58 2.46 6.62
IL 27.91 4.75 22.80 2.95 7.27
IN 26.18 4.09 20.30 2.81 7.00
IA 22.12 4.23 16.59 2.90 7.69
KS 21.84 2.91 16.84 2.88 7.42
KY 23.44 2.86 17.71 2.13 6.41
LA 21.58 4.65 25.45 2.30 6.71
ME 28.92 4.79 20.94 3.22 6.24
MD 25.91 5.21 26.48 2.85 6.81
MA 26.92 4.69 22.04 3.03 6.89
MI 24.96 5.27 22.72 2.97 6.91
MN 22.06 3.72 14.20 3.54 8.28
MS 16.08 3.06 15.60 1.77 6.08
MO 27.56 4.04 20.98 2.55 6.82
MT 23.75 3.95 19.50 3.43 6.90
NB 23.32 3.72 16.70 2.92 7.80
NE 42.40 6.54 23.03 2.85 6.67
NJ 28.64 5.98 25.95 3.12 7.12
NM 21.16 2.90 14.59 2.52 5.95
NY 29.14 5.30 25.02 3.10 7.23
ND 19.96 2.89 12.12 3.62 6.99
OH 26.38 4.47 21.89 2.95 7.38
OK 23.44 2.93 19.45 2.45 7.46
PE 23.78 4.89 12.11 2.75 6.83
RI 29.18 4.99 23.68 2.84 6.35
SC 18.06 3.25 17.45 2.05 5.82
SD 20.94 3.64 14.11 3.11 8.15
TE 20.08 2.94 17.60 2.18 6.59
TX 22.57 3.21 20.74 2.69 7.02
UT 14.00 3.31 12.01 2.20 6.71
VT 25.89 4.63 21.22 3.17 6.56
WA 21.17 4.04 20.34 2.78 7.48
WI 21.25 5.14 20.55 2.34 6.73
WV 22.86 4.78 15.53 3.28 7.38
WY 28.04 3.20 15.92 2.66 5.78
AK 30.34 3.46 25.88 4.32 4.90

Questions 22 - 30 relate to the following data.

The Excel file HATCO.xls (adapted from Hair, Anderson, Tatham, and Black in Multivariate Analysis, 5th
Edition, Prentice-Hall 1998) consists of data related to predicting the level of business (Usage Level)
obtained from a survey of purchasing managers of customers of an industrial supplier, HATCO. The
independent variables are

1. Delivery speed - amount of time it takes to deliver the product once an order is confirmed
2. Price level - perceived level of price charged by product suppliers
3. Price flexibility - perceived willingness of HATCO representatives to negotiate price on all
types of purchases
4. Manufacturing image - overall image of the manufacturer or supplier
5. Overall service - overall level of service necessary for maintaining a satisfactory
relationship between supplier and purchaser
6. Salesforce image - overall image of the manufacturer's salesforce
7. Product quality - perceived level of quality of a particular product
8. Size of firm relative to others in this market (0 = small; 1 = large)

Responses to the first seven variables were obtained using a graphic rating scale, where a 10-centimeter
line was drawn between endpoints labeled "poor" and "excellent." Respondents indicated their
perceptions using a mark on the line, which was measured from the left endpoint. The result was a scale
from 0 to 10 rounded to one decimal place.

22. Develop a correlation matrix, and interpret the ability of each independent variable to explain Usage Level.

Usage level Delivery speed Price level Price flexibility


Usage level 1
Delivery speed 0.6763544396 1
Price level 0.0829177168 -0.349225145 1
Price flexibility 0.5580497552 0.509295189 -0.487212589 1
Manuf image 0.2251849989 0.0504142018 0.2721867586 -0.116104083
Overall service 0.7013895004 0.6119006853 0.5129808224 0.0666172816
Sales image 0.2571190609 0.0771152167 0.1862432539 -0.034316102
Prod quality -0.192290179 -0.482630936 0.4697457827 -0.448112014
Size of firm -0.365005592 -0.630652268 0.4279233268 -0.646011135

Manuf. image Overall service Sales image Prod quality


Usage level
Delivery speed
Price level
Price flexibility
Manuf image 1
Overall service 0.2986773703 1
Sales image 0.7882245444 0.2408081806 1
Prod quality 0.1999811099 -0.055161302 0.1772939221 1
Size of firm 0.0377153969 -0.219555334 -0.042581815 0.684082777

Correlations with Usage level in descending order of absolute value:

Overall service 0.7013895004
Delivery speed 0.6763544396
Price flexibility 0.5580497552
Size of firm -0.365005592
Sales image 0.2571190609
Manuf image 0.2251849989
Prod quality -0.192290179
Price level 0.0829177168

23. Construct a simple linear regression model of Usage level as a function of Overall service and interpret
the results.

y=Usage level, x=Overall service


[Scatter chart; f(x) = 8.3927594748221 x + 21.6327133714188, R² = 0.491947231244597]

Usage level is positively correlated with Overall service: slope = 8.39, intercept = 21.63, and R² = .49.

24. Construct a simple linear regression model of Usage level as a function of Delivery Speed and interpret
the results.

y=Usage level, x=Delivery speed

[Scatter chart; f(x) = 4.60358161418747 x + 29.924410626131, R² = 0.457455327960286]
Usage level is positively correlated with Delivery speed: slope = 4.60, intercept = 29.92, and R² = .46.

25. Construct a multiple regression model with Usage Level as the dependent variable, and Delivery speed
and Overall Service as independent variables. Interpret the results.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.7678598462
R Square 0.5896087434
Adjusted R Square 0.581147068
Standard Error 5.8178850418
Observations 100

ANOVA
df SS MS F
Regression 2 4717.0211231 2358.5105616 69.679905696
Residual 97 3283.2352769 33.84778636
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept 20.615721809 2.3525081392 8.763294573 6.251575E-14
Delivery speed 2.6893210192 0.5597500056 4.804503783 5.652382E-06
Overall service 5.4997650233 0.9840521656 5.5888958082 2.095998E-07

Usage Level = 20.62 + 2.69 Delivery speed + 5.50 Overall service

Both Delivery speed and Overall service are significant (P-values 5.65E-06 and 2.10E-07); R² = .5896 and Adjusted R² = .5811.
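A short sketch (not from the text) of using this two-variable model for prediction; the input values 4.1 and 2.4 are the first respondent's ratings from the data listing:

```python
def predicted_usage(delivery_speed: float, overall_service: float) -> float:
    # Coefficients copied from the multiple regression output above
    return 20.615721809 + 2.6893210192 * delivery_speed + 5.4997650233 * overall_service

print(round(predicted_usage(4.1, 2.4), 2))  # 44.84
```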

26. Compare the models developed in Problems 23-25.

Independent Variable(s) R2 Adjusted R2


Overall service 0.4919472312 0.4867630193
Delivery speed 0.457455328 0.4519191578
Delivery speed and Overall service 0.5896087434 0.581147068

The multiple regression model with Delivery speed and Overall service explains the most variation in Usage level (R² = .59), followed by the simple regression with Overall service (R² = .49).

27. Develop a multiple regression model of Usage Level as a function of the first seven independent variables and interpret the results.
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8803126254
R Square 0.7749503185
Adjusted R Square 0.7578269731
Standard Error 4.4238178904
Observations 100

ANOVA
df SS MS F
Regression 7 6199.801245 885.68589215 45.256946197
Residual 92 1800.455155 19.570164728
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -10.16147769 4.9769645953 -2.041701824 0.0440445783
Delivery speed -0.043518892 2.0127296247 -0.021621827 0.9827964479
Price level -0.678907252 2.0902483366 -0.324797413 0.7460718055
Price flexibility 3.3619678619 0.4112489451 8.1750188094 1.555692E-12
Manuf image -0.041005558 0.6668294851 -0.061493319 0.9510997299
Overall service 8.3453720194 3.9182982016 2.1298460683 0.0358536302
Sales image 1.2914708389 0.9472025934 1.3634578788 0.1760654661
Prod quality 0.5629510281 0.3554441042 1.5837962185 0.1166717736

Only Price flexibility and Overall service are significant (P-values 1.5557E-12 and .0359); R² = .7750 and Adjusted R² = .7578.

Although R² has increased over the two-variable model, there are too many non-significant variables included in this model.

It is interesting to note that Delivery speed is no longer significant (P-value = .9828), probably because it has a fairly high correlation with both Price flexibility and Overall service.

28. Use best subsets and stepwise regression to find good models for Usage Level using the first seven
independent variables. What is your recommendation?

Best subsets results, "Models to Consider":

Model Cp k R Square Adj. R Square Std. Error
X1X2X3X4X5X6X7 8 8 0.7749503185 0.7578269731 4.4238178904
X1X2X3X5X6X7 6.0037814282 7 0.7749410684 0.7604211373 4.4000600657
X1X3X4X5X6X7 6.1054933593 7 0.7746922614 0.7601562783 4.402491569
X1X3X4X5X7 5.9287167941 6 0.7702323065 0.7580106207 4.422140211
X1X3X5X6 4.5190677838 5 0.7687881945 0.7590529606 4.4126060166
X1X3X5X6X7 4.1058242163 6 0.7746914521 0.7627069549 4.3790193115
X2X3X4X5X6X7 6.0004675034 7 0.7749491749 0.7604297668 4.3999808212
X2X3X4X5X7 5.862661768 6 0.7703938898 0.7581807988 4.4205850106
X2X3X5X6 4.5084262239 5 0.7688142258 0.759080088 4.4123576099
X2X3X5X6X7 4.0039408256 6 0.7749406785 0.7629694379 4.3765967026
X3X4X5X6 4.6693407144 5 0.7684205981 0.7586698864 4.4161123561
X3X4X5X6X7 5.0960762184 6 0.7722691054 0.7601557599 4.4024963266
X3X5X6 2.6697552106 4 0.7684195841 0.7611826961 4.3930611767
X3X5X6X7 3.1040618452 5 0.772249571 0.7626600793 4.3794518128

Stepwise Analysis
Table of Results for General Stepwise

Overall service entered.

df SS MS F
Regression 1 3935.7039852 3935.7039852 94.893348933
Residual 98 4064.5524148 41.475024641
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept 21.632713371 2.5935470229 8.340975961 4.72503E-13
Overall service 8.3927594748 0.8615627029 9.7413217241 4.40976E-16

Price flexibility entered.

df SS MS F
Regression 2 6036.7218583 3018.3609292 149.10917222
Residual 97 1963.5345417 20.242624141
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -3.460359569 3.0577071412 -1.131684432 0.2605571368
Overall service 7.9833469185 0.6032436841 13.234033159 1.869684E-23
Price flexibility 3.3299873264 0.3268595006 10.1878248 5.271925E-17

Salesforce image entered.

df SS MS F
Regression 3 6147.5536958 2049.1845653 106.18094194
Residual 96 1852.7027042 19.298986502
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -6.514097047 3.2461566384 -2.006710634 0.0475930303
Overall service 7.6285400735 0.6073382787 12.560611344 5.574379E-22
Price flexibility 3.369812246 0.3195824557 10.544421904 1.011392E-17
Salesforce image 1.4161261997 0.5909312296 2.3964314774 0.0184930132

No other variables could be entered into the model. Stepwise ends.


Stepwise regression has probably found the best model with Overall Service (X5), Price flexibility (X3),
and Salesforce Image (X6) as the independent variables. This model also has the smallest value for Cp
in the best subsets regression.

Independent Variable(s) R² Adjusted R²
Price flexibility (X3), Overall Service (X5), Salesforce image (X6) 0.7684 0.7612
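Mallows' Cp for this model can be reproduced from numbers already shown: the three-variable model's SSE (1852.7027 in the stepwise output), the full-model MSE (19.5702), n = 100 observations, and p = 4 estimated coefficients. This is a sketch of the standard formula, not part of the original solution:

```python
def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' Cp = SSE_p / MSE_full - n + 2p."""
    return sse_p / mse_full - n + 2 * p

cp = mallows_cp(1852.7027042, 19.570164728, n=100, p=4)
print(round(cp, 4))  # 2.6698, matching the best subsets table
```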

29. Include the categorical variable Size of Firm (coded as 0 for small firms, and 1 for large firms) in
identifying the best model for predicting Usage Level. Be sure to investigate possible interactions.

Stepwise Analysis
Table of Results for General Stepwise

Overall service entered.

df SS MS F
Regression 1 3935.7039852 3935.7039852 94.893348933
Residual 98 4064.5524148 41.475024641
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept 21.632713371 2.5935470229 8.340975961 4.72503E-13
Overall service 8.3927594748 0.8615627029 9.7413217241 4.40976E-16

Price flexibility entered.

df SS MS F
Regression 2 6036.7218583 3018.3609292 149.10917222
Residual 97 1963.5345417 20.242624141
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -3.460359569 3.0577071412 -1.131684432 0.2605571368
Overall service 7.9833469185 0.6032436841 13.234033159 1.869684E-23
Price flexibility 3.3299873264 0.3268595006 10.1878248 5.271925E-17

Size of firm entered.

df SS MS F
Regression 3 6222.8841647 2074.2947216 112.03747269
Residual 96 1777.3722353 18.514294118
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -12.89981365 4.1728693743 -3.091353332 0.0026079906
Overall service 8.419202306 0.5930645952 14.196096639 2.580978E-25
Price flexibility 4.1747048072 0.4107059132 10.164705871 6.588339E-17
Size of firm 3.7507499343 1.182839487 3.1709711888 0.0020398616
Salesforce image entered.

df SS MS F
Regression 4 6342.6338284 1585.6584571 90.875664943
Residual 95 1657.6225716 17.448658649
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -16.3032882 4.2542268473 -3.832256432 0.0002279877
Overall service 8.0607496504 0.5917796447 13.621201275 4.589671E-24
Price flexibility 4.2365069075 0.3994084777 10.606952891 8.393229E-18
Size of firm 3.8412741522 1.1488142294 3.3436860842 0.0011841986
Salesforce image 1.4726630496 0.5621435893 2.6197275531 0.0102448255

Delivery speed entered.

df SS MS F
Regression 5 6414.1504943 1282.8300989 76.026467625
Residual 94 1586.1059057 16.873467082
Total 99 8000.2564

Coefficients Standard Error t Stat P-value


Intercept -15.93116544 4.1874223482 -3.804527969 0.0002526877
Overall service 7.0164597618 0.7719834403 9.0888734074 1.591383E-14
Price flexibility 3.9834192684 0.4115591612 9.6788497108 8.851698E-16
Size of firm 4.9335675196 1.2481050706 3.9528463073 0.0001494803
Salesforce image 1.5838160087 0.5554307905 2.8515091993 0.0053493037
Delivery speed 1.1202732293 0.5441548351 2.0587398238 0.0422859171

No other variables could be entered into the model. Stepwise ends.

Independent Variable(s) R² Adjusted R²
Delivery speed (X1), Price flexibility (X3), Overall Service (X5),
Salesforce image (X6), Size of firm (X8) 0.8017 0.7912

It is interesting to note that Delivery speed re-enters as a significant variable in this model; something about firm size enhances the role of Delivery speed in explaining variance in Usage level.

30. Segregate the HATCO data by firm size. Run separate regressions on the data for small firms and the
data for large firms. Compare your results with Problem 29.

Regression model for small firms, X8 = 0

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8792684242
R Square 0.7731129619
Adjusted R Square 0.7566120864
Standard Error 4.339904019
Observations 60

ANOVA
df SS MS F
Regression 4 3529.8496541 882.46241354 46.85284498
Residual 55 1035.9121792 18.834766894
Total 59 4565.7618333

Coefficients Standard Error t Stat P-value


Intercept -19.85031901 5.5005150656 -3.608810951 0.0006649582
Delivery speed 1.9353531914 0.6904001498 2.8032340259 0.0069751459
Price flexibility 3.9377119596 0.5329367651 7.3887039094 8.70334E-10
Overall service 7.6544705644 1.1733160666 6.5237925077 2.267652E-08
Salesforce image 1.1940018748 0.6697457186 1.7827689548 0.0801431289

Salesforce image is no longer a significant variable for small firms - useful information.

R2 = .77, Adjusted R2 = .76

Regression model for large firms, X8 = 1

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.9310092849
R Square 0.8667782886
Adjusted R Square 0.8515529502
Standard Error 3.0026304776
Observations 40

ANOVA
df SS MS F
Regression 4 2053.0751075 513.26877688 56.929984963
Residual 35 315.55264247 9.0157897848
Total 39 2368.62775

Coefficients Standard Error t Stat P-value


Intercept -8.904844702 4.9773625805 -1.789068921 0.0822614063
Delivery speed -1.499068829 0.7747798068 -1.934832085 0.061125499
Price flexibility 3.6237241431 0.5612507088 6.456515932 1.942627E-07
Overall service 8.1379769175 0.8917982138 9.1253568253 8.788922E-11
Salesforce image 3.053343223 0.8712974512 3.5043637724 0.0012733453

Delivery speed is not a significant variable for large firms - also useful information.

R² = .86, Adjusted R² = .85 - this model explains more variation in Usage level than the
model for small firms.
Purchase Outcome Respondent Perceptions
X9 X1 X2 X3 X4 X5 X6
Usage level Delivery speed Price level Price flexibility Manuf image Overall service Sales image
32 4.1 0.6 6.9 4.7 2.4 2.3
43 1.8 3 6.3 6.6 2.5 4
48 3.4 5.2 5.7 6 4.3 2.7
32 2.7 1 7.1 5.9 1.8 2.3
58 6 0.9 9.6 7.8 3.4 4.6
45 1.9 3.3 7.9 4.8 2.6 1.9
46 4.6 2.4 9.5 6.6 3.5 4.5
44 1.3 4.2 6.2 5.1 2.8 2.2
63 5.5 1.6 9.4 4.7 3.5 3
54 4 3.5 6.5 6 3.7 3.2
32 2.4 1.6 8.8 4.8 2 2.8
47 3.9 2.2 9.1 4.6 3 2.5
39 2.8 1.4 8.1 3.8 2.1 1.4
38 3.7 1.5 8.6 5.7 2.7 3.7
54 4.7 1.3 9.9 6.7 3 2.6
49 3.4 2 9.7 4.7 2.7 1.7
38 3.2 4.1 5.7 5.1 3.6 2.9
40 4.9 1.8 7.7 4.3 3.4 1.5
54 5.3 1.4 9.7 6.1 3.3 3.9
55 4.7 1.3 9.9 6.7 3 2.6
41 3.3 0.9 8.6 4 2.1 1.8
35 3.4 0.4 8.3 2.5 1.2 1.7
55 3 4 9.1 7.1 3.5 3.4
36 2.4 1.5 6.7 4.8 1.9 2.5
49 5.1 1.4 8.7 4.8 3.3 2.6
49 4.6 2.1 7.9 5.8 3.4 2.8
36 2.4 1.5 6.6 4.8 1.9 2.5
54 5.2 1.3 9.7 6.1 3.2 3.9
49 3.5 2.8 9.9 3.5 3.1 1.7
46 4.1 3.7 5.9 5.5 3.9 3
43 3 3.2 6 5.3 3.1 3
53 2.8 3.8 8.9 6.9 3.3 3.2
60 5.2 2 9.3 5.9 3.7 2.4
47.3 3.4 3.7 6.4 5.7 3.5 3.4
35 2.4 1 7.7 3.4 1.7 1.1
39 1.8 3.3 7.5 4.5 2.5 2.4
44 3.6 4 5.8 5.8 3.7 2.5
46 4 0.9 9.1 5.4 2.4 2.6
29 0 2.1 6.9 5.4 1.1 2.6
28 2.4 2 6.4 4.5 2.1 2.2
40 1.9 3.4 7.6 4.6 2.6 2.5
58 5.9 0.9 9.6 7.8 3.4 4.6
53 4.9 2.3 9.3 4.5 3.6 1.3
48 5 1.3 8.6 4.7 3.1 2.5
38 2 2.6 6.5 3.7 2.4 1.7
54 5 2.5 9.4 4.6 3.7 1.4
55 3.1 1.9 10 4.5 2.6 3.2
43 3.4 3.9 5.6 5.6 3.6 2.3
57 5.8 0.2 8.8 4.5 3 2.4
53 5.4 2.1 8 3 3.8 1.4
41 3.7 0.7 8.2 6 2.1 2.5
53 2.6 4.8 8.2 5 3.6 2.5
50 4.5 4.1 6.3 5.9 4.3 3.4
32 2.8 2.4 6.7 4.9 2.5 2.6
39 3.8 0.8 8.7 2.9 1.6 2.1
47 2.9 2.6 7.7 7 2.8 3.6
62 4.9 4.4 7.4 6.9 4.6 4
65 5.4 2.5 9.6 5.5 4 3
46 4.3 1.8 7.6 5.4 3.1 2.5
50 2.3 4.5 8 4.7 3.3 2.2
54 3.1 1.9 9.9 4.5 2.6 3.1
60 5.1 1.9 9.2 5.8 3.6 2.3
47 4.1 1.1 9.3 5.5 2.5 2.7
36 3 3.8 5.5 4.9 3.4 2.6
40 1.1 2 7.2 4.7 1.6 3.2
45 3.7 1.4 9 4.5 2.6 2.3
59 4.2 2.5 9.2 6.2 3.3 3.9
46 1.6 4.5 6.4 5.3 3 2.5
58 5.3 1.7 8.5 3.7 3.5 1.9
49 2.3 3.7 8.3 5.2 3 2.3
50 3.6 5.4 5.9 6.2 4.5 2.9
55 5.6 2.2 8.2 3.1 4 1.6
51 3.6 2.2 9.9 4.8 2.9 1.9
60 5.2 1.3 9.1 4.5 3.3 2.7
41 3 2 6.6 6.6 2.4 2.7
49 4.2 2.4 9.4 4.9 3.2 2.7
42 3.8 0.8 8.3 6.1 2.2 2.6
47 3.3 2.6 9.7 3.3 2.9 1.5
39 1 1.9 7.1 4.5 1.5 3.1
56 4.5 1.6 8.7 4.6 3.1 2.1
59 5.5 1.8 8.7 3.8 3.6 2.1
47.3 3.4 4.6 5.5 8.2 4 4.4
41 1.6 2.8 6.1 6.4 2.3 3.8
37 2.3 3.7 7.6 5 3 2.5
53 2.6 3 8.5 6 2.8 2.8
43 2.5 3.1 7 4.2 2.8 2.2
51 2.4 2.9 8.4 5.9 2.7 2.7
36 2.1 3.5 7.4 4.8 2.8 2.3
34 2.9 1.2 7.3 6.1 2 2.5
60 4.3 2.5 9.3 6.3 3.4 4
49 3 2.8 7.8 7.1 3 3.8
39 4.8 1.7 7.6 4.2 3.3 1.4
43 3.1 4.2 5.1 7.8 3.6 4
36 1.9 2.7 5 4.9 2.2 2.5
31 4 0.5 6.7 4.5 2.2 2.1
25 0.6 1.6 6.4 5 0.7 2.1
60 6.1 0.5 9.2 4.8 3.3 2.8
38 2 2.8 5.2 5 2.4 2.7
42 3.1 2.2 6.7 6.8 2.6 2.9
33 2.5 1.8 9 5 2.2 3

Purchaser Characteristics
X7 X8
Prod quality Size of firm
5.2 0
8.4 1
8.2 1
7.8 1
4.5 0
9.7 1
7.6 0
6.9 1
7.6 0
8.7 1
5.8 0
8.3 0
6.6 1
6.7 0
6.8 0
4.8 0
6.2 0
5.9 0
6.8 0
6.8 0
6.3 0
5.2 0
8.4 0
7.2 1
3.8 0
4.7 0
7.2 1
6.7 0
5.4 0
8.4 1
8 1
8.2 0
4.6 0
8.4 1
6.2 1
7.6 1
9.3 1
7.3 0
8.9 1
8.8 1
7.7 1
4.5 0
6.2 0
3.7 0
8.5 1
6.3 0
3.8 0
9.1 1
6.7 0
5.2 0
5.2 0
9 1
8.8 1
9.2 1
5.6 0
7.7 0
9.6 1
7.7 0
4.4 0
8.7 1
3.8 0
4.5 0
7.4 0
6 0
10 1
6.8 0
7.3 0
7.1 1
4.8 0
9.1 1
8.4 1
5.3 0
4.9 0
7.3 0
8.2 1
8.5 0
5.3 0
5.2 0
9.9 1
6.8 0
4.9 0
6.3 0
8.2 1
7.4 0
6.8 1
9 1
6.7 1
7.2 0
8 1
7.4 0
7.9 0
5.8 0
5.9 0
8.2 1
5 0
8.4 1
7.1 0
8.4 1
8.4 1
6 0

31. (From Horngren, Foster, and Datar, Cost Accounting: A Managerial Emphasis, 9th ed., 1997, Prentice
Hall, 371.) The managing director of a consulting group has the following monthly data on total overhead
costs and professional labor-hours to bill to clients.

Total Overhead Costs Billable Hours


$340,000 3,000
$400,000 4,000
$435,000 5,000
$477,000 6,000
$529,000 7,000
$587,000 8,000

Generate a regression model to identify the fixed overhead costs to the consulting group.
a. What is the constant component of the consultant group's overhead?
b. If a special job requiring 1,000 billable hours that would contribute a margin of $38,000
before overhead was available, would the job be attractive?

y=Total Overhead Costs, x=Billable Hours


[Scatter chart; f(x) = 47.5428571428571 x + 199847.619047619, R² = 0.994027235957207]

a. The constant portion (intercept) is $199,848.

b. The variable overhead rate (the slope of the regression equation) is $47.543 per hour,
so 1,000 billable hours would add about 47.543 x 1,000 = $47,543 of overhead. A job
contributing only $38,000 before overhead would not cover this, so it is probably not attractive.
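The comparison in part b can be sketched directly from the fitted line; the coefficients below are copied from the chart above:

```python
FIXED_OVERHEAD = 199847.62          # intercept: constant monthly overhead
VARIABLE_RATE = 47.5428571428571    # slope: overhead per billable hour

def incremental_overhead(hours: float) -> float:
    """Additional overhead generated by the given billable hours."""
    return VARIABLE_RATE * hours

extra = incremental_overhead(1000)
print(round(extra))      # 47543
print(extra > 38000)     # True: the $38,000 margin does not cover it
```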

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.997009
R Square 0.994027
Adjusted R Square 0.992534
Standard Error 7708.375
Observations 6

ANOVA
              df        SS        MS        F         Significance F
Regression 1 3.96E+10 3.96E+10 665.7067 1.34E-05
Residual 4 2.38E+08 59419048
Total 5 3.98E+10

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     199847.6      10611.94        18.83234  4.68E-05  170384.1   229311.1
X Variable 1  47.54286      1.842654        25.80129  1.34E-05  42.42682   52.6589

32. (From Horngren, Foster, and Datar, Cost Accounting: A Managerial Emphasis, 9th ed., 1997, Prentice
Hall, 349.) Cost functions are often nonlinear with volume, as production facilities are often able to
produce larger quantities at lower rates than smaller quantities. Using the following data, plot the data
and use the Chart, Add Trendline feature. Compare a linear trendline with a logarithmic trendline.
X Y
Units Produced Costs
500 $12,500
1,000 $25,000
1,500 $32,500
2,000 $40,000
2,500 $45,000
3,000 $50,000

Linear Trendline:

y=Costs, x=Units Produced


[Scatter chart of costs ($0-$60,000) vs. units produced (0-3,500) with linear trendline:
f(x) = 14.5714x + 8,666.67, R² = 0.9693]

Logarithmic Trendline:

y=Costs, x=Units Produced


[Scatter chart of costs ($0-$60,000) vs. units produced (0-3,500) with logarithmic trendline:
f(x) = 20,818.64 ln(x) − 118,041.53, R² = 0.9929]

Although the linear trendline is a fairly good fit (R2 = 0.969), the logarithmic trendline is a
better fit (R2 = 0.993).
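The two R2 values can be verified without the chart trendlines. A minimal sketch, assuming both fits are ordinary least squares (the logarithmic trendline simply regresses y on ln x):

```python
# Compare linear and logarithmic least-squares fits of costs on volume.
import math

x = [500, 1000, 1500, 2000, 2500, 3000]
y = [12500, 25000, 32500, 40000, 45000, 50000]

def r_squared(xs, ys):
    """R^2 of the simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    sst = sum((b - my) ** 2 for b in ys)
    return sxy ** 2 / (sxx * sst)

r2_linear = r_squared(x, y)                      # fit y = b0 + b1*x
r2_log = r_squared([math.log(v) for v in x], y)  # fit y = b0 + b1*ln(x)

print(round(r2_linear, 3), round(r2_log, 3))  # 0.969 0.993
```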

33. (From Horngren, Foster, and Datar, Cost Accounting: A Managerial Emphasis, 9th ed., 1997, Prentice
Hall, 349.) The Helicopter Division of Aerospatiale is studying assembly costs at its Marseilles plant.
Past data indicate the following labor-hours per helicopter. Use linear regression, and compare the
results with a second-order polynomial regression model. Using the model with the best fit, predict
the hours required for a ninth helicopter. (Hint: a model often used in such situations is Y = aXb.)

Helicopter Number Labor-Hours


1 2,000
2 1,400
3 1,238
4 1,142
5 1,075
6 1,029
7 985
8 957

Linear Regression Model:

y=Labor-Hours, x=Helicopter Number


[Scatter chart of labor-hours (0-2,500) vs. helicopter number (0-9) with linear trendline:
f(x) = −119.881x + 1,767.71, R² = 0.7285]

Second-order Polynomial Regression Model:

y=Labor-Hours, x=Helicopter Number


[Scatter chart of labor-hours (0-2,500) vs. helicopter number (0-9) with second-order polynomial
trendline: f(x) = 30.9405x² − 398.3452x + 2,231.82, R² = 0.9226]
Using the second-order polynomial regression model (higher R2),

y = 30.94 (9)2 - 398.35 (9) + 2231.8 = 1,153

This is probably not a good estimate: the fitted parabola reaches its minimum near x = 6.4 and
turns upward, so it predicts that the ninth helicopter would take more hours than the eighth,
illustrating the problem with extrapolating beyond the range of the data.

Consider a Power Regression Model:

y=Labor-Hours, x=Helicopter Number


[Scatter chart of labor-hours (0-2,500) vs. helicopter number (0-9) with power trendline:
f(x) = 1,875.94 x^−0.3409, R² = 0.9726]

Using this power regression model, y = 1875.9 (9) -0.3409 = 887

This is probably a better estimate of the time for the ninth helicopter.
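The hinted model Y = aXb can be fitted by ordinary least squares after a log-log transformation, since ln Y = ln a + b ln X. A minimal sketch in Python (not part of the original Excel solution; names are my own):

```python
# Fit the power model Y = a * X**b by regressing ln(Y) on ln(X),
# then predict the labor-hours for the ninth helicopter.
import math

units = list(range(1, 9))
hours = [2000, 1400, 1238, 1142, 1075, 1029, 985, 957]

lx = [math.log(u) for u in units]
ly = [math.log(h) for h in hours]

n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(lx, ly)) / \
    sum((xi - mx) ** 2 for xi in lx)   # exponent b
a = math.exp(my - b * mx)              # coefficient a = exp(intercept)

pred9 = a * 9 ** b
print(round(a, 1), round(b, 4), round(pred9))  # 1875.9 -0.3409 887
```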

34. For the data in problem 33, use an exponential regression model and estimate the hours required for the
ninth helicopter.

Helicopter Number Labor-Hours


1 2,000
2 1,400
3 1,238
4 1,142
5 1,075
6 1,029
7 985
8 957

Exponential Regression Model:

y=Labor-Hours, x=Helicopter Number


[Scatter chart of labor-hours (0-2,500) vs. helicopter number (0-9) with exponential trendline:
f(x) = 1,787.40 exp(−0.0897x), R² = 0.8162]

Using this exponential model, y = 1787.4 e -0.0897(9) = 797.29

Note, however, that the exponential fit (R2 = 0.816) is weaker than the power model fit from
problem 33 (R2 = 0.973), so the power-model estimate of about 887 hours is probably more reliable.
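An exponential model Y = a e^(bX) can likewise be fitted by ordinary least squares after taking logs, since ln Y = ln a + bX. A minimal sketch in Python (not from the original Excel solution; names are my own):

```python
# Fit the exponential model Y = a * exp(b*X) by regressing ln(Y) on X,
# then estimate the labor-hours for the ninth helicopter.
import math

units = list(range(1, 9))
hours = [2000, 1400, 1238, 1142, 1075, 1029, 985, 957]
ly = [math.log(h) for h in hours]

n = len(units)
mx, my = sum(units) / n, sum(ly) / n
b = sum((x - mx) * (y - my) for x, y in zip(units, ly)) / \
    sum((x - mx) ** 2 for x in units)  # growth-rate exponent b
a = math.exp(my - b * mx)              # coefficient a = exp(intercept)

pred9 = a * math.exp(b * 9)
print(round(a, 1), round(b, 4), round(pred9))  # 1787.4 -0.0897 797
```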

35. (From Crask, Fox, and Stout, Marketing Research: Principles & Applications, Prentice Hall, 1995, 252.)
A real estate company hired a small market research firm to develop a model to calculate a ballpark
price for a home based only on square footage. The real estate company felt that this model would be
useful in helping customers set the list prices of their homes. The market research firm wants a linear
regression model relating price to square footage, based on the following sample data.

List Price Square Footage


$75,900 1,750
$61,000 1,590
$110,000 2,100
$83,500 1,800
$94,600 1,890
$54,500 1,360
$96,000 2,050
$70,700 1,760
$50,800 1,500
$69,400 1,650
$87,500 1,700
$105,000 1,920
$76,500 1,800
$103,200 2,150
$59,000 1,600

Linear Regression Model:

y=List Price, x=Square Footage


[Scatter chart of list price ($0-$120,000) vs. square footage (1,000-2,400) with linear trendline:
f(x) = 79.4024x − 61,072.73, R² = 0.8501]

Note that the slope of the model estimates the marginal price per additional square foot, $79.402, i.e., about $80 per sq. ft.
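The slope and intercept shown in the chart can be reproduced directly from the sample data. A minimal sketch in Python (not part of the original Excel solution; names are my own):

```python
# Least-squares fit of list price on square footage for the
# ballpark-pricing model.
sqft = [1750, 1590, 2100, 1800, 1890, 1360, 2050, 1760,
        1500, 1650, 1700, 1920, 1800, 2150, 1600]
price = [75900, 61000, 110000, 83500, 94600, 54500, 96000, 70700,
         50800, 69400, 87500, 105000, 76500, 103200, 59000]

n = len(sqft)
mx, my = sum(sqft) / n, sum(price) / n
slope = sum((x - mx) * (y - my) for x, y in zip(sqft, price)) / \
        sum((x - mx) ** 2 for x in sqft)
intercept = my - slope * mx

print(round(slope, 2), round(intercept))  # 79.4 -61073
# Ballpark price for a 2,000 sq ft home: slope*2000 + intercept, about $97,732.
```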