Instructor's Manual

Chapter 6

I Chapter Outline

6.1 Prediction based on Simple Linear Regression


 The simple linear regression model is introduced, including the assumptions
of independence and normality of the “noise” in the model
 Definition of regression coefficients
 Definition of predicted, observed, and residual values
 Definition of the residual sum of squares
 Statement of the formulas to determine regression coefficients

6.2 Prediction based on Multiple Linear Regression


 Definition of a multiple linear regression model
 Extension of the concepts for simple linear regression to the multiple
regression models

6.3 Using Spreadsheet Software for Linear Regression


 Description of how to use spreadsheet software to organize data in a
regression model, to run the regression, and to view the output

6.4 Interpretation of Computer Output of a Linear Regression Model


 Interpretation and analysis of the following elements in a computer output:
o Regression coefficients
o Standard error
o Degrees of freedom
o Standard errors of the regression coefficients
o Confidence intervals for the regression coefficients
o t-statistics
o Coefficient of determination R2

6.5 Sample Correlation and R2 in Simple Linear Regression

6.6 Validating the Regression Model


 Review of the main assumptions in linear regression:
o Linearity
o Normality of the noise
o Homoscedasticity (constant variance of the noise)
o No autocorrelation
 Discussion on how to use spreadsheet software to check for autocorrelation

6.7 Warnings and Issues in Linear Regression Modeling


 Over-specification by the addition of too many independent variables
 Extrapolation beyond the range of the data

Manual to accompany Data, Models & Decisions: The Fundamentals of Management Science by Bertsimas and Freund. Copyright
2000, South-Western College Publishing. Prepared by Manuel Nunez, Chapman University.

 Multicollinearity
 Checking for multicollinearity using spreadsheet software

6.8 Regression Modeling Techniques


 Discussion on advanced regression modeling techniques:
o Nonlinear relationships
o The use of “dummy” variables
o Stepwise multiple regression

6.9 Illustration of the Regression Modeling Process

6.10 Summary and Conclusions

II Teaching Tips

1. The correlation coefficient, as well as the coefficient of determination, can be
illustrated with the following in-class exercise. Perform a quick survey of
your students, collecting variables such as height, weight, age, number of siblings,
shirt size, gender, position among siblings (first born, second born…), etc.
Run a correlation of the data and ask the students to interpret the correlation
coefficients for different pairs of variables. Students usually find it very
interesting or surprising to discover correlations between these variables
in their own data, especially for pairings such as number of siblings and
position among siblings, or gender and shirt size.
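The survey exercise can be mirrored with a short computation. The sketch below uses made-up survey numbers (not from the text) for two of the suggested variables; `np.corrcoef` returns the matrix of pairwise correlation coefficients that students would then interpret.

```python
import numpy as np

# Hypothetical survey data for six students (made up for illustration):
# number of siblings and position among siblings (1 = first born, ...).
siblings = np.array([1, 2, 3, 2, 4, 1])
position = np.array([1, 2, 2, 1, 3, 1])

# Matrix of pairwise correlation coefficients (diagonal entries are 1).
corr = np.corrcoef(siblings, position)
r = corr[0, 1]  # correlation between the two variables
```

With a full class survey, passing a 2-D array of all variables to `np.corrcoef` produces the whole correlation matrix at once.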

2. Continuing with the exercise described in the previous item, you can also play
the following game with your students to illustrate regression. Run a regression
model that predicts weight in terms of height and/or age. Then ask for
volunteers of both genders to submit their height and/or age, and guess
their weight from the model. This is usually a very engaging exercise, and you
can enhance it by explaining why inaccurate predictions are sometimes the
result of not having a good model. You can also use the example to explain the
need for prediction intervals to cope with the uncertainty in the estimation.

3. An alternative way to verify that a regression model is valid is to compare
the significance F value provided by the Excel summary output report against a
chosen significance level. If this number is below the significance level, say
1%, then the model is valid at that level. This is a more precise statement than
just looking at the R2 value. Similarly, to test the significance of an
individual coefficient, use the P-values provided by the Excel summary output
report.
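The significance F that Excel reports can be reproduced from the ANOVA quantities with the standard F-test formula. A sketch, using the sums of squares from the exercise 6.2 output later in this chapter:

```python
from scipy.stats import f

# ANOVA quantities for a regression with n observations and k predictors
# (numbers taken from the exercise 6.2 summary output).
n, k = 10, 2
ss_regression = 112.38   # regression sum of squares (SSR)
ss_residual = 8.02       # residual sum of squares (SSE)

# F statistic = (SSR / k) / (SSE / (n - k - 1)).
F = (ss_regression / k) / (ss_residual / (n - k - 1))

# "Significance F" is the upper-tail probability of the F distribution.
significance_f = f.sf(F, k, n - k - 1)

# The model is valid at the 1% level if significance F < 0.01.
model_valid_at_1pct = significance_f < 0.01
```

This matches the test described in the tip: the model is declared valid at a given level exactly when the computed tail probability falls below that level.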


III Answers to Chapter Exercises

6.1
(a) The corresponding multiple regression model is predicted price = b0 + b1 x
area + b2 x neighborhood rating + b3 x general rating. This is the result from
running the regression:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.90
R Square 0.81
Adjusted R Square 0.77
Standard Error 49.07
Observations 20

ANOVA
  df SS MS F Significance F
Regression 3 163167.7802 54389.3 22.59 5.39018E-06
Residual 16 38525.96977 2407.9
Total 19 201693.75      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -166.69 65.24 -2.56 0.02 -305.00 -28.39
Area 0.09 0.01 7.16 0.00 0.06 0.11
Neighborhood rating 39.92 8.59 4.65 0.00 21.72 58.12
General rating 42.70 9.74 4.39 0.00 22.06 63.33

(b) Since the coefficient of determination R2 (0.81) is reasonably close to one,
the regression model is acceptable in general. Also notice that the 95%
confidence intervals for the regression coefficients do not contain zero, so
we are 95% confident that each of the coefficients is different from zero.
Therefore, we recommend this model.
(c) Predicted price = -166.69 + 0.09 x area + 39.92 x neighborhood rating + 42.70
x general rating.
(d) Predicted price = -166.69 + 0.09 x 3,000 + 39.92 x 5 + 42.70 x 4 = 473.71,
that is, $473,710 (prices are measured in thousands of dollars).
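The prediction in part (d) is a direct evaluation of the fitted equation from part (c); a quick check:

```python
# Fitted coefficients from the summary output above.
intercept, b_area, b_nbhd, b_general = -166.69, 0.09, 39.92, 42.70

# Evaluate the regression equation for a 3,000 sq. ft. house with
# neighborhood rating 5 and general rating 4.
predicted_price = intercept + b_area * 3000 + b_nbhd * 5 + b_general * 4
# predicted_price is 473.71 (in thousands of dollars)
```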

6.2
(a) We use the following model: predicted taxes = b0 + b1 x labor hours + b2 x
computer hours. After running the regression, we obtain the following output:


SUMMARY OUTPUT

Regression Statistics
Multiple R 0.97
R Square 0.93
Adjusted R Square 0.91
Standard Error 1.07
Observations 10

ANOVA
  df SS MS F Significance F
Regression 2 112.38 56.19 49.02 0.00
Residual 7 8.02 1.15
Total 9 120.40      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -101.82 13.32 -7.64 0.00 -133.32 -70.32
Labor hours 2.56 0.30 8.45 0.00 1.85 3.28
Computer hours 1.10 0.31 3.51 0.01 0.36 1.84

(b) Using 10 – 2 – 1 = 7 degrees of freedom, we find that the corresponding 95%
Student's t value is 2.365. Next, we notice that the absolute value of the
t-statistic of each regression coefficient is greater than 2.365. Therefore, we
conclude with 95% confidence that the coefficients are significant.
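The 2.365 cutoff used in part (b) can be reproduced with scipy; a sketch:

```python
from scipy.stats import t

df = 10 - 2 - 1  # degrees of freedom: n - k - 1 = 7
# Two-sided 95% critical value puts 2.5% in each tail.
t_crit = t.ppf(0.975, df)   # approximately 2.365

# A coefficient is significant at the 95% level if |t-stat| > t_crit.
t_stats = [-7.64, 8.45, 3.51]   # intercept, labor hours, computer hours
all_significant = all(abs(ts) > t_crit for ts in t_stats)
```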
(c) There is no apparent evidence of heteroscedasticity from looking at the
scatter plots of the residuals against each independent variable:


(d) The regression equation produced by the model is predicted taxes
= -101.82 + 2.56 x labor hours + 1.10 x computer hours. From comparing the
regression coefficients associated with labor hours and computer hours
(2.56 > 1.10), it is clear that increasing the field-audit time by one hour would
have a bigger impact on uncovering unpaid taxes than increasing the computer
time by one hour.

6.3
(a) We use the following model: predicted taxes = b0 + b1 x gross income + b2 x
schedule A + b3 x schedule C income + b4 x schedule C % + b5 x home office.
After running the regression model, we obtain the following summary report:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.94
R Square 0.88
Adjusted R Square 0.84
Standard Error 3572.44
Observations 24

ANOVA
df SS MS F Significance F
Regression 5 1653944963.89 330788992.78 25.92 0.00
Residual 18 229721457.07 12762303.17
Total 23 1883666420.96

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -8423.69 6237.38 -1.35 0.19 -21527.94 4680.56
Gross Income 0.29 0.03 10.04 0.00 0.23 0.35
Schedule A Deductions -0.01 0.16 -0.07 0.94 -0.35 0.33


Schedule C Income 0.19 0.17 1.12 0.28 -0.16 0.54


Schedule C % 104.66 43.09 2.43 0.03 14.13 195.20
Home Office -3786.20 1826.50 -2.07 0.05 -7623.53 51.13

Since the R2 value (0.88) is close to one, we can say that the model fits the
data well overall. However, a few of the coefficients are not statistically
significant. In particular, the 95% confidence intervals for the coefficients
of the variables Schedule A deductions, Schedule C income, and home office
contain zero, so these coefficients do not pass the t-test.
(b) The revised model initially eliminates the three variables with no statistical
significance from the previous model, that is, Schedule A deductions, Schedule
C income, and home office. After computing the corresponding regression, we
discover that the coefficient corresponding to the variable Schedule C % is not
statistically significant. After we also eliminate this variable, we obtain a
model where the intercept is not statistically significant either. The final
model is predicted taxes = 0.31 x gross income. The R2 value for this model is
0.82, which is close enough to 1 to validate the model. The coefficient
associated with gross income also passes the t-test.
(c) By looking at the residual plot as shown below, there are no signs of
heteroscedasticity in the model. A histogram of the residuals also shows that
the normality condition is approximately satisfied. A 95% confidence interval
of the coefficient associated with gross income is [0.24, 0.37].

(d) The prediction is 0.31 x $130,000 = $40,300 in taxes.

6.4


(a) We use the following model: predicted month number of next earthquake
= b0 + b1 x time since most recent earthquake + b2 x time since second
most recent earthquake. After running the regression model, we obtain the
following report:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.73
R Square 0.53
Adjusted R Square 0.42
Standard Error 16.73
Observations 12

ANOVA
df SS MS F Significance F
Regression 2 2797.75 1398.88 5.00 0.03
Residual 9 2519.16 279.91
Total 11 5316.92

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 166.92 39.16 4.26 0.00 78.33 255.50
Most recent 1.24 2.12 0.58 0.57 -3.56 6.03
Second most recent -10.08 3.19 -3.16 0.01 -17.30 -2.87

(b) The R2 value of this regression model is 0.53. Taking into consideration
the imperfect nature of earthquake prediction, this is a rather high value.
(c) By looking at the significance F value, this model is statistically valid at
the 5% and 10% levels of significance. However, it is not valid at the 1%
level of significance. The value of R2 is close to 0.5, so the model is
barely valid. The coefficient associated with the variable time since most
recent earthquake is not statistically significant, since its corresponding
95% confidence interval contains zero. The other coefficient is significant.
The chart below plots the residuals versus time (month of earthquake); there
is no clear pattern, so we conclude that there is no evidence of
autocorrelation in the model.


(d) In the revised model, we eliminate the variable corresponding to the time
since the most recent earthquake. The resulting regression formula is
predicted month number of next earthquake = 170 - 9.7 x time since
second most recent earthquake. The R2 value is 0.51, but the significance
F value is 0.009, indicating that the model is statistically valid. The
coefficients pass the t-test and there is no evidence of autocorrelation.

6.5
(a) We use the following regression model: Predicted number of defective
shafts = b0 + b1 x batch size. After running the regression on Excel, we get
the following summary report:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.98
R Square 0.95
Adjusted R Square 0.95
Standard Error 7.56
Observations 30

ANOVA
df SS MS F Significance F
Regression 1 32744.46 32744.46 572.90 0.00
Residual 28 1600.34 57.16
Total 29 34344.80


Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -47.90 4.11 -11.65 0.00 -56.32 -39.48
Batch size 0.37 0.02 23.94 0.00 0.34 0.40

(b) The R2 value is 0.95, which is very close to one, validating the
model. The t-statistics are also large, indicating that the coefficients are
statistically significant. The linear model is a good fit to the data, but it is
not the best fit: the scatter plot of the two variables, shown below,
suggests a quadratic relation between them.

(c) The residual plot is shown below. There is evidence of heteroscedasticity
in that there is a convex (quadratic) dependency of the residuals on the
batch size.


(d) The revised model is predicted number of defective shafts = b0 + b1 x
batch size + b2 x batch size2. After running the regression, we obtain the
formula predicted number of defective shafts = 6.9 - 0.12 x batch size +
0.0009 x batch size2. The R2 value is 0.995, the coefficients are all
statistically significant, and there is no indication of heteroscedasticity.
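The quadratic model in part (d) amounts to adding a batch size² column; for a single predictor, `np.polyfit` with degree 2 does this directly. A sketch on synthetic data generated from the reported fitted formula (not the original data set):

```python
import numpy as np

# Synthetic data following the reported fitted curve (illustration only;
# the original exercise data are not reproduced here).
batch_size = np.arange(100, 420, 20, dtype=float)
defective = 6.9 - 0.12 * batch_size + 0.0009 * batch_size**2

# Degree-2 polynomial least-squares fit; coefficients come back
# highest power first: b2 * x^2 + b1 * x + b0.
b2, b1, b0 = np.polyfit(batch_size, defective, 2)
```

In a spreadsheet, the equivalent step is adding a column of squared batch sizes and running a two-variable regression, exactly as the revised model describes.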

6.6
(a) The regression equation produced by Jack's regression model is predicted
sales = -13,707.14 + 37.34 x hops + 1,319.27 x malt + 0.05 x advertising -
63.17 x bitterness + 53.23 x investment.
(b) The degrees of freedom are 50 - 5 -1 = 44. Since this is more than 30, we
use the normal distribution to find a Z-factor of 1.96, corresponding to
95% confidence. The confidence intervals for each coefficient are shown
in the table below.

Coefficients Standard Error Lower 95% Upper 95%


Intercept -13707.139 1368.409 -16464.985 -10949.292
Hops 37.344 42.497 -48.303 122.992
Malt 1319.270 161.076 994.642 1643.897
Advertising 0.049 0.005 0.040 0.059
Bitterness -63.168 83.109 -230.662 104.327
Investment 53.230 133.142 -215.100 321.561

From the table it follows that the intervals corresponding to the variables hops,
bitterness, and initial investment contain zero, and so, those coefficients
are not statistically significant.
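The endpoints in the table above in fact match the exact Student's t multiplier with 44 degrees of freedom (about 2.015) rather than the normal approximation 1.96 described in the text; both lead to the same conclusions here. A sketch reproducing the hops interval:

```python
from scipy.stats import t

df = 50 - 5 - 1            # 44 degrees of freedom
t_crit = t.ppf(0.975, df)  # about 2.015; the normal approximation uses 1.96

coef, se = 37.344, 42.497  # hops coefficient and its standard error
lower, upper = coef - t_crit * se, coef + t_crit * se

# The interval contains zero, so the hops coefficient is not significant.
contains_zero = lower < 0 < upper
```

Swapping `t_crit` for 1.96 gives slightly narrower intervals but does not change which coefficients are judged significant in this exercise.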
(c) Since the R2 value is 0.88, we can safely say that the model is valid.
As indicated above, the coefficients for the variables hops, bitterness, and
initial investment are not significant. We can eliminate those variables to
obtain a better model.


(d) The new regression model only has the independent variables malt and
annual advertising. The new regression equation is predicted sales =
-14,162 + 1,401.13 x malt + 0.05 x advertising. The R2 value for the new
model is 0.88 and all of the coefficients are significant.
(e) The predictions of the annual sales of each new beer are summarized in
the table below.

New Beer Malt Advertising Sales Forecast


Great Ale 8.0 150,000.0 4,547.04
Fine Brau 6.0 155,000.0 1,994.78
HBC Porter 8.0 180,000.0 6,047.04
HBC Stout 7.0 150,000.0 3,145.91

(f) Since the amounts given for malt and annual advertising are within the
range of the data used to create the model, I would recommend the
regression model to predict sales of the new beer Final Excalibur. The
actual sales forecast is -14,162 + 1,401.13 x 7 + 0.05 x 150,000 = 3,145.91
thousands of dollars.

6.7
(a) The degrees of freedom are 67 - 4 - 1 = 62.
(b) Since the degrees of freedom are more than 30, we use the normal
distribution to find the Z-factor, which in this case is 1.96. The
corresponding 95% confidence intervals are shown below.

Coefficient Standard Error Lower limit Upper limit


Intercept -4.015 2.766 -9.43636 1.40636
Money Supply 0.368 0.064 0.24256 0.49344
Lending Rate 0.005 0.049 -0.09104 0.10104
Price Index 0.037 0.009 0.01936 0.05464
Exchange Rate 0.268 1.175 -2.035 2.571

Notice that the confidence intervals corresponding to the intercept, lending
rate, and exchange rate contain zero, so these coefficients are not
statistically significant. The other coefficients, money supply and price
index, are relevant for predicting U.S. exports to Singapore.
(c) The third regression is the best overall for several reasons. First, it has
the same R2 value as the other two models, but comprises only two
independent variables, making it simpler. Second, the third model eliminates
the two variables that are not statistically significant, as indicated in the
previous item. Third, in the third model all of the coefficients are
statistically significant, including the intercept (in the first and second
models, a 95% confidence interval for the intercept contains zero).
(d) Predicted U.S. exports to Singapore = -3.423 + 0.361 x 7.3 + 0.037 x 155
= $4.95 billion.


(e) I would plot the residuals as a function of time, from January 1989 to
July 1994. If the resulting chart shows an apparent pattern, then there is an
autocorrelation problem in the model.
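Besides eyeballing the time plot, a common numerical check (not used in the text) is the Durbin-Watson statistic, which is near 2 for uncorrelated residuals and moves toward 0 under positive autocorrelation or toward 4 under negative autocorrelation. A sketch with made-up residual sequences:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# A steadily trending residual sequence signals positive autocorrelation.
trending = [1.0, 2.0, 3.0, 4.0, 5.0]
# An alternating sequence signals negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]

dw_trend = durbin_watson(trending)   # well below 2
dw_alt = durbin_watson(alternating)  # well above 2
```

Applied to the monthly residuals of this exercise, a value far from 2 would corroborate whatever pattern the time plot suggests.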

6.8
(a) We use the following model: predicted market value = b0 + b1 x total
assets + b2 x total sales + b3 x number of employees. The following is the
summary output from running the regression model:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8
R Square 0.6
Adjusted R Square 0.5
Standard Error 637.1
Observations 15

ANOVA
df SS MS F Significance F
Regression 3 6488609.4 2162869.8 5.3 0.0
Residual 11 4464258.2 405841.7
Total 14 10952867.6

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 452.5 242.0 1.9 0.1 -80.1 985.1
Assets 0.0 0.9 0.0 1.0 -1.9 1.9
Sales 0.5 0.7 0.6 0.5 -1.2 2.1
Employees 0.0 0.0 -0.6 0.6 -0.1 0.0

(b) There is evidence of multicollinearity in that none of the coefficients is
statistically significant (all of them fail the t-test). To detect
multicollinearity, we use the correlation matrix:

Market value Assets Sales Employees


Market value 1
Assets 0.76 1
Sales 0.76 0.996 1
Employees 0.72 0.98 0.98 1

Notice that there is strong correlation between the three independent
variables. We can probably eliminate two independent variables.
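Flagging highly correlated predictor pairs, as done with the matrix above, can be automated. The sketch below uses made-up firm data chosen to mimic the near-collinear pattern (assets, sales, and employees moving together); the names and numbers are illustrative only:

```python
import numpy as np

# Hypothetical firm data (made up): the three predictors move together.
assets    = np.array([100.0, 220.0, 310.0, 450.0, 610.0])
sales     = np.array([ 90.0, 200.0, 300.0, 430.0, 580.0])
employees = np.array([ 12.0,  25.0,  33.0,  50.0,  66.0])

X = np.vstack([assets, sales, employees])
corr = np.corrcoef(X)   # 3x3 correlation matrix of the predictors

# Flag pairs whose absolute correlation exceeds a threshold, say 0.9.
names = ["assets", "sales", "employees"]
flagged = [(names[i], names[j])
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(corr[i, j]) > 0.9]
```

With all three pairs flagged, dropping all but one of the correlated predictors, as the answer does, is the natural remedy.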
(c) After experimenting with several combinations of independent variables,
we found that the model with the highest R2 (0.76) is predicted market
value = 493.4 + 0.26 x total sales.
(d) Predicted market value = 493.4 + 0.26 x 3,500 = $1,403.4.


6.9
(a) For given data x1, …, xn and y1, …, yn, let f(b0, b1) be the residual sum of
squares, f(b0, b1) = Σi (yi - b0 - b1 xi)². Then the gradient of f is

∇f(b0, b1) = ( -2 Σi (yi - b0 - b1 xi), -2 Σi xi (yi - b0 - b1 xi) ).

It is not difficult to show that the Hessian of f is positive definite, and so
the minimum is achieved when ∇f(b0, b1) = 0. By solving the equation
corresponding to the first coordinate in this system of equations, we get

Σi yi - n b0 - b1 Σi xi = 0,

and so, dividing by n,

b0 = ȳ - b1 x̄,

where x̄ and ȳ denote the sample means of the xi and the yi. By replacing this
value of b0 into the equation corresponding to the second coordinate in the
system of equations, we get

Σi xi (yi - ȳ - b1 (xi - x̄)) = 0.

Therefore, solving this equation for b1 (and using the identities
Σi xi (yi - ȳ) = Σi (xi - x̄)(yi - ȳ) and Σi xi (xi - x̄) = Σi (xi - x̄)²), we
get the formula for b1:

b1 = Σi (xi - x̄)(yi - ȳ) / Σi (xi - x̄)².
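The closed-form coefficients derived above can be checked numerically against a library least-squares fit, which minimizes the same residual sum of squares. A sketch on a small made-up data set:

```python
import numpy as np

# Small made-up data set for the check.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# Closed-form simple linear regression coefficients from the derivation:
# b1 = sum (xi - xbar)(yi - ybar) / sum (xi - xbar)^2,  b0 = ybar - b1 xbar.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# np.polyfit with degree 1 minimizes the same residual sum of squares.
b1_ref, b0_ref = np.polyfit(x, y, 1)
```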
(b) It follows from the following argument:

(c) It follows from the following argument:

IV Answers to Chapter Cases


PREDICTING HEATING OIL CONSUMPTION AT OILPLUS

(a) Using the data in the file OILPLUS.XLS, we ran the regression in Excel and
obtained the summary report shown below. Based on this report, we obtain
predicted heating oil consumption for next December = 109 - 1.24 x 35.2 =
65.352, that is, 65,352 gallons, since consumption is measured in thousands
of gallons. (A better model can be obtained by using regression to find a
formula for oil consumption in terms of temperature and temperature2.)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.83
R Square 0.69
Adjusted R Square 0.68
Standard Error 13.52
Observations 55

ANOVA
df SS MS F Significance F
Regression 1 21386.84 21386.84 117.02 5.01324E-15
Residual 53 9686.03 182.76
Total 54 31072.87

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 109.00 6.35 17.16 2.68072E-23 96.26 121.74
Temperature -1.24 0.11 -10.82 5.01324E-15 -1.46 -1.01
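The December prediction in part (a) is a direct evaluation of the fitted line from the output above; a quick check (consumption in thousands of gallons, per our reading of the units):

```python
# Coefficients from the summary output above.
intercept, b_temp = 109.00, -1.24
december_temp = 35.2   # forecast average December temperature

predicted_consumption = intercept + b_temp * december_temp
# about 65.352 thousand gallons, i.e., roughly 65,352 gallons
```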

(b) The forecast based on regression accounts for any trend in the data due to
variations in temperature. It is clear that oil consumption depends on
temperature, so it is hard to accept that consumption will always be the same
(75.60) regardless of the temperature.

(c) The value of R2 is 0.69, which is reasonably close to 1, indicating that the
model is empirically valid, though not strongly so. Using the standard error,
we find a 95% confidence interval for the temperature coefficient,
[-1.46, -1.01]. Since this interval does not contain zero, the coefficient
corresponding to temperature is significant. The scatter plot of the residuals
shows more dispersion for lower temperatures and less dispersion for higher
temperatures, so there might be a problem of heteroscedasticity.

(d) The R2 value is not very bad, but there may be a better model with a higher value.
I would recommend exploring other independent variables or trying nonlinear
models.


EXECUTIVE COMPENSATION

(a) The variables change in stock price and change in sales do not seem very
relevant for determining the compensation of a CEO. They may affect bonuses,
stock options, and indirect compensation, but the main portion of the salary is
probably not determined by changes in these two variables. On the other hand,
the model does not consider other variables, such as years of experience inside
and outside the company prior to the current position, education other than
whether the CEO has an MBA, knowledge or experience of the industry immediately
related to the company, the average CEO compensation in the industry, etc.

(b) After running the regression in Excel, we obtain the following summary output:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.87
R Square 0.75
Adjusted R Square 0.73
Standard Error 422.40
Observations 50

ANOVA
df SS MS F Significance F
Regression 4 23896133.23 5974033.31 33.48 5.79962E-13
Residual 45 8029082.79 178424.06
Total 49 31925216.02

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -1.59 155.91 -0.01 0.99 -315.62 312.43
Years in Position 190.59 33.51 5.69 0.00 123.09 258.09
Stock Change 1.59 2.67 0.59 0.56 -3.80 6.97
Sales Change 1.01 1.32 0.77 0.45 -1.64 3.66
MBA 304.72 139.87 2.18 0.03 23.01 586.43

(c) The R2 value is 0.75, indicating a good model. The coefficients corresponding to
the variables stock change, sales change, and the intercept are not statistically
significant. The following is the correlation matrix:

Compensation Years in Position Stock Change Sales Change MBA


Compensation 1.00
Years in Position 0.85 1.00
Stock Change 0.70 0.78 1.00
Sales Change 0.16 0.12 0.20 1.00
MBA 0.51 0.43 0.36 0.04 1.00


There is high correlation between the independent variables stock change and
years in position, indicating a possible multicollinearity problem.

(d) The best model we found has the variables years in current position and MBA
as independent variables. The R2 of this model is 0.74. Both variables are
significant at the 95% level. Since the correlation between these two variables
is small (0.43), we can safely say that there are no multicollinearity
problems. Furthermore, the residuals follow an approximately normal
distribution, and there are no apparent heteroscedasticity problems. Only the
intercept is not statistically significant in our model. To summarize, we
propose the model predicted CEO compensation =
207.5 x years in current position + 307.2 x MBA.
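The final model can be applied directly; the example CEO below is hypothetical, and MBA enters as a dummy variable (1 if the CEO holds an MBA, 0 otherwise):

```python
# Coefficients from the final model above.
b_years, b_mba = 207.5, 307.2

def predicted_compensation(years_in_position, has_mba):
    # MBA is a dummy variable: 1 if the CEO holds an MBA, else 0.
    return b_years * years_in_position + b_mba * (1 if has_mba else 0)

# Hypothetical CEO with 5 years in the position, with and without an MBA.
comp_with_mba = predicted_compensation(5, True)
comp_without = predicted_compensation(5, False)
# The difference is exactly the MBA coefficient, 307.2.
```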

(e) As mentioned in item (a), there are other factors that are critical in
determining the compensation of a CEO. According to the model presented in the
previous item, having an MBA represents an increment of 307.2 in CEO
compensation (in the units of the compensation data, presumably thousands of
dollars). Therefore, we think that having an MBA has an effect on CEO
compensation.

THE CONSTRUCTION DEPARTMENT AT CROQ’PAIN

(a) After analyzing the summary output provided in this case, we notice that the
variables EMPL, total, P25, P35, P45, P55, COMP, NCOMP, and CLI are not
statistically significant. Furthermore, by looking at the correlation matrix
provided below, we notice high correlation between pairs of variables taken from
total, P15, P25, P35, P45, and P55, indicating multicollinearity problems. Also
notice that even though the variable PRICE is statistically significant, it has a very
small correlation with the variable EARN.

EARN SIZE EMPL total P15 P25 P35 P45 P55 INC COMP NCOMP NREST PRICE CLI
EARN 1.00
SIZE 0.44 1.00
EMPL -0.11 0.05 1.00
total 0.59 -0.02 -0.10 1.00
P15 0.63 -0.05 -0.10 0.96 1.00
P25 0.23 -0.08 -0.02 0.58 0.42 1.00
P35 0.63 -0.03 -0.12 0.96 0.98 0.43 1.00
P45 0.63 -0.02 -0.11 0.96 0.98 0.41 0.99 1.00
P55 0.40 0.06 -0.09 0.77 0.68 0.29 0.67 0.65 1.00
INC 0.46 0.18 0.09 0.11 0.15 0.02 0.14 0.14 0.01 1.00
COMP -0.14 -0.17 0.12 -0.14 -0.11 -0.01 -0.12 -0.13 -0.20 -0.08 1.00
NCOMP 0.11 -0.02 0.11 0.07 0.07 0.10 0.07 0.08 0.01 0.17 0.16 1.00
NREST 0.34 -0.10 -0.16 0.05 0.07 0.01 0.10 0.09 -0.02 -0.06 0.11 0.01 1.00
PRICE -0.18 0.07 0.08 0.04 -0.03 0.08 -0.01 -0.01 0.15 0.00 -0.30 -0.20 -0.06 1.00
CLI 0.04 0.05 0.14 0.21 0.21 0.09 0.20 0.23 0.15 0.10 0.02 -0.01 -0.29 0.26 1.00


Combining all of these ideas, we end up with a regression model with only four
independent variables: SIZE, P15, INC, and NREST. After running the regression
based on this model, we obtain the following summary report.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.91
R Square 0.83
Adjusted R Square 0.81
Standard Error 39.47
Observations 60

ANOVA
df SS MS F Significance F
Regression 4 406495.50 101623.88 65.25 3.12098E-20
Residual 55 85666.22 1557.57
Total 59 492161.72

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept -399.01 50.63 -7.88 0.00 -500.47 -297.55
SIZE 0.75 0.10 7.73 0.00 0.56 0.95
P15 0.04 0.00 10.10 0.00 0.04 0.05
INC 8.81 1.62 5.44 0.00 5.57 12.06
NREST 1.45 0.23 6.34 0.00 0.99 1.91

Notice that the R2 of this model is 0.83, and all of the coefficients are significant.
There are no multicollinearity problems and no apparent heteroscedasticity
problems.

(b) By testing the model as indicated, we obtain the following result

STOR EARN K Ratio Predicted EARN Predicted Ratio Open?


51 216.3 776.0 27.87% 149.4 19.26% no
52 65.7 647.8 10.14% 135.5 20.91% no
53 67.6 689.8 9.80% 49.6 7.19% no
54 127.9 715.0 17.89% 108.2 15.13% no
55 82.9 650.1 12.76% 26.8 4.12% no
56 -2.9 788.4 -0.37% 70.0 8.88% no
57 247.7 782.0 31.68% 236.3 30.22% yes
58 343.0 1557.8 22.02% 269.9 17.33% no
59 193.1 935.6 20.64% 119.6 12.78% no
60 277.5 688.0 40.34% 253.2 36.81% yes

Using the target performance ratio of 26%, our model would recommend the
opening of two of the three stores that actually attained the target in 1994. Also,


our model would not recommend the opening of the stores that did not actually
attain the target performance ratio.
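The decision rule behind the table is simply predicted EARN divided by the capital K, compared with the 26% target. A sketch using store 51's numbers from the table:

```python
TARGET_RATIO = 0.26   # target performance ratio from the case

def recommend_open(predicted_earn, capital_k):
    """Open the store if predicted EARN / K reaches the target ratio."""
    ratio = predicted_earn / capital_k
    return ratio, ratio >= TARGET_RATIO

# Store 51 from the table above: predicted EARN 149.4, capital K 776.0.
ratio, open_store = recommend_open(149.4, 776.0)
# ratio is about 19.3% (the table shows 19.26%, computed from unrounded
# predictions), which is below the target, so the model says "no".
```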

(c) The results of applying our model to the new stores are shown below:

STORE Predicted EARN K Predicted Ratio Open?


Calais 32.9 660.1 4.99% no
Montchanin 55.5 733.0 7.58% no
Aubusson 75.4 1050.3 7.18% no
Toulouse 352.8 836.0 42.20% yes
Torcy 4.5 783.6 0.58% no
Marseilles-1 85.0 924.8 9.19% no
Marseilles-2 14.6 1089.6 1.34% no
Clermont 44.3 737.7 6.00% no
Montpellier 114.2 584.0 19.55% no
Dijon 173.4 681.0 25.46% no

According to these results, the recommendation would be to open only the store
located in Toulouse. Notice, however, that the store located in Dijon has a
predicted performance ratio very close to the target, so we might also consider
opening the store in Dijon.

(d) The relative strength of our model is that it is a very simple model in
terms of the number of independent variables required to predict operating
earnings. It is statistically sound, based on the data provided through 1994,
and it might be improved by adding the data from 1995. The major weakness of
our regression model is that it underestimates the real performance ratio when
this ratio is close to the target. It is also subject to the standard
criticisms of regression models; for instance, the model could produce poor
predictions if used to extrapolate.

SLOAN INVESTORS, PART I

Our analysis in this case considers the separate regression models for the GM data
and IBM data using the 12 independent variables under consideration. The
summary output for the regression model of the GM data is:

SUMMARY OUTPUT


Regression Statistics
Multiple R 0.38
R Square 0.15
Adjusted R Square -0.03
Standard Error 0.08
Observations 72

ANOVA
df SS MS F Significance F
Regression 12 0.07 0.01 0.85 0.60
Residual 59 0.41 0.01
Total 71 0.48

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 0.09 0.13 0.75 0.46 -0.16 0.35
E/P 0.05 0.06 0.99 0.33 -0.06 0.16
ROE -0.07 0.03 -2.26 0.03 -0.12 -0.01
BV/P -0.01 0.04 -0.24 0.81 -0.09 0.07
CF/P 0.50 0.31 1.62 0.11 -0.12 1.11
1 Month 0.20 0.19 1.10 0.28 -0.17 0.58
2 Month -0.08 0.14 -0.58 0.56 -0.37 0.20
6 Month -0.09 0.08 -1.23 0.22 -0.24 0.06
12 Month -0.02 0.06 -0.26 0.80 -0.14 0.11
S&P -0.17 0.32 -0.54 0.59 -0.80 0.46
SD Returns -1.60 1.25 -1.28 0.21 -4.10 0.90
SD CF/P 0.58 1.36 0.43 0.67 -2.14 3.30
V/MC 0.03 0.03 0.98 0.33 -0.03 0.08

For the IBM data we obtain:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.66
R Square 0.43
Adjusted R Square 0.31
Standard Error 0.06
Observations 72

ANOVA
df SS MS F Significance F
Regression 12 0.18 0.02 3.70 0.00
Residual 59 0.24 0.00
Total 71 0.42

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 0.49 0.15 3.39 0.00 0.20 0.79


E/P -0.48 0.14 -3.39 0.00 -0.76 -0.20


ROE 0.53 0.13 4.25 0.00 0.28 0.78
BV/P -0.45 0.18 -2.52 0.01 -0.81 -0.09
CF/P 0.37 0.47 0.80 0.43 -0.57 1.32
1 Month -0.03 0.14 -0.23 0.82 -0.32 0.26
2 Month -0.11 0.12 -0.91 0.37 -0.36 0.13
6 Month -0.44 0.10 -4.38 0.00 -0.64 -0.24
12 Month 0.17 0.10 1.71 0.09 -0.03 0.36
S&P -0.40 0.25 -1.61 0.11 -0.90 0.10
SD Returns -2.75 0.94 -2.93 0.00 -4.63 -0.87
SD CF/P -5.17 2.74 -1.89 0.06 -10.66 0.31
V/MC 0.06 0.02 2.32 0.02 0.01 0.10

We also take into consideration the correlation matrices for both sets of data (not
shown).

Based on our analysis, we conclude that the variables ROE, previous 6-month
return, S.D. of stock returns, and V/MC are statistically significant for at
least one of the two data sets. The other variables are not significant.
Furthermore, there is high correlation between S.D. of stock returns and V/MC;
to avoid multicollinearity, and since V/MC has a higher correlation with the
dependent variable (returns), we eliminate S.D. of stock returns. Therefore,
our final model has only three independent variables: ROE, previous 6-month
return, and V/MC.

To make the predictions, we use two regression equations, one for each set of
data. For the GM data we use the model predicted return = -0.01 - 0.02 x ROE -
0.06 x 6-month + 0.01 x V/MC. For the IBM data we use the model predicted
return = -0.04 + 0.36 x ROE - 0.06 x 6-month + 0.05 x V/MC.

The final return predictions for the two companies in the requested months are:

GM IBM
Date Return Date Return
960131 -0.53% 960131 4.78%
960228 0.01% 960228 7.38%
960331 0.27% 960331 3.68%
960428 -0.44% 960428 2.68%
960531 -1.09% 960531 4.05%
960630 -0.48% 960630 1.33%

