Business Statistics Canadian 3rd Edition Sharpe Solutions Manual Download

Full download at link:
Solution manual: https://testbankpack.com/p/solution-manual-for-business-statistics-canadian-3rd-edition-by-wright-sharpe-velleman-veaux-isbn-9780133899122-0133899128/
Test Bank: https://testbankpack.com/p/test-bank-for-business-statistics-canadian-3rd-edition-by-wright-sharpe-velleman-veaux-isbn-9780133899122-0133899128/
Chapter 7 – Introduction to Linear Regression
SECTION EXERCISES
SECTION 7.1
1.
a) False. We choose the best fitting line through the data points on a scatterplot. This is done by
minimizing the sum of the squared errors. The predicted values (not the data values) fall on
the line.
b) True
c) False. Least squares means that the sum of all squared residuals (difference between the observed
y-values and the y-values predicted by the line) is minimized.
2.
a) True
b) False. Least squares means that the sum of all squared residuals (difference between the observed
y-values and the y-values predicted by the line) is minimized.
c) True
SECTION 7.2
Copyright © 2018 Pearson Canada Inc.

3.
a) We would expect sales to be 2 x 0.965 = 1.93 SDs above the mean.
b) This corresponds to 17.6 + 1.93 x 5.34 = 27.906, or $27,906 in sales.
c) We would expect sales to be 0.965 SDs below the mean.
d) This corresponds to 17.6 - 0.965 x 5.34 = 12.447, or $12,447 in sales.
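Parts a) to d) use the standardized form of the regression line, \( \hat{z}_y = r z_x \). A minimal Python sketch of that calculation follows, assuming the summary values used above (mean Sales 17.6 and SD 5.34, both in $1000, with r = 0.965); the function name `predict_from_z` is hypothetical.

```python
def predict_from_z(z_x: float, r: float, y_mean: float, y_sd: float) -> float:
    """Predict y for a case z_x SDs from the mean of x, using z_y-hat = r * z_x."""
    z_y = r * z_x               # predicted number of SDs from the mean of y
    return y_mean + z_y * y_sd  # convert back to the units of y


# Exercise 3: Sales in $1000, r = 0.965, mean 17.6, SD 5.34
two_sd_above = predict_from_z(2, 0.965, 17.6, 5.34)
one_sd_below = predict_from_z(-1, 0.965, 17.6, 5.34)
print(round(two_sd_above, 3), round(one_sd_below, 3))
```

Multiplying the rounded results by $1000 gives the $27,906 and $12,447 found above.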
4.
a) b1 = 0.914
To find the slope, use \( b_1 = r\,\dfrac{s_y}{s_x} \):
\( b_1 = 0.965 \times \dfrac{5.34}{5.64} = 0.914 \)
b) The slope of 0.914 indicates that an additional 0.914 ($1000), or $914 in sales, is associated with
each additional salesperson working.
c) b0 = 8.09
To find the intercept, use \( b_0 = \bar{y} - b_1 \bar{x} \):
b0 = 17.6 − (0.914)(10.4) = 8.09
d) The intercept of 8.09 implies that, on average, we expect sales of 8.09 ($1000), or $8090, with
zero salespeople working. This value is not meaningful in this context.
e) The equation that predicts Sales from Number of Salespeople Working is \( \hat{y} = 8.09 + 0.914x \).
f) Using the equation to predict Sales if 18 people are working, we have

ŷ = 8.09 + 0.914(18) = 24.542 or 24.542($1000) = $24,542.
g) The value of the residual is 25 – 24.542 = 0.458 ($1000), or $458.
h) Because the actual sales value is more than the predicted value, the estimated regression equation
has underestimated sales when x = 18.
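The calculations in parts a) to h) can be reproduced from the summary statistics with a short Python sketch. It follows the solution in rounding the slope to 0.914 before computing the intercept (with the unrounded slope the intercept comes out at about 8.10); the helper name `fit_from_summary` is hypothetical.

```python
def fit_from_summary(r, x_mean, x_sd, y_mean, y_sd):
    """Return (b0, b1) for the least squares line, rounded as in the solution."""
    b1 = round(r * y_sd / x_sd, 3)        # slope b1 = r * s_y / s_x
    b0 = round(y_mean - b1 * x_mean, 2)   # intercept b0 = y-bar - b1 * x-bar
    return b0, b1


# Exercise 4: x = Number of Salespeople Working, y = Sales ($1000)
b0, b1 = fit_from_summary(r=0.965, x_mean=10.4, x_sd=5.64, y_mean=17.6, y_sd=5.34)
y_hat = b0 + b1 * 18        # predicted Sales with 18 salespeople working
residual = 25 - y_hat       # observed minus predicted
print(b0, b1, round(y_hat, 3), round(residual, 3))
```

A positive residual, as here, means the line underestimated the observed Sales.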
SECTION 7.3
5. The winners may be suffering from regression to the mean. Regression toward the mean implies that a
variable that yields an extreme value on the first measurement will tend to be closer to the mean on a later
measurement. So the winners of the “rookie junior executive of the year award” had an extreme
measurement their first year. This extremely good performance may have been due, in part, to just an
unusually lucky year. Their next year’s performance moved closer to the mean.
6. If the poor performance of mutual funds in the previous year is not the result of extreme values, then
regression to the mean won’t ensure better performance the next year. Although, on average, the
performance of funds will cluster around the mean, we can’t predict how any particular fund will do.
SECTIONS 7.4 and 7.5

7.
a) The units of the residuals are the same as the units of the response variable, Sales. Therefore the
units are thousands of dollars.

b) The residual with the largest absolute value contributes the most to the sum being minimized by
the least squares criterion. In this case it is 2.77.
c) The residual with the smallest absolute value contributes the least to the sum being minimized by
the least squares criterion. In this case it is 0.07.
8.
a) The curved pattern in the residuals violates the Linearity Assumption.
b) The unusual residual value that is extremely different from the others violates the Outlier
Condition.
c) The increasing spread of the residual values resulting in a cone-shaped pattern violates the Equal
Spread Condition.
SECTIONS 7.6 and 7.7

9. R² = 93.12%. R² is the squared correlation, 0.965² = 0.9312. It means that about 93% of the variance in Sales can be accounted for by the regression model of Sales on Number of Salespeople Working.
10. R² = 98.8%. R² is the squared correlation, 0.994² = 0.988. It means that about 98.8% of the variance in the price of disk drives can be accounted for by the regression model of Price on Capacity.
SECTION 7.8
11.
a) The original values are the squared resulting values: 16, 16, 36, 49, 49, 64, and 100.
b) The values are not symmetric but instead are skewed to the right (high end).
12. The average is $370,000, a value not representative of any of the three individual salaries because they are very unequally spaced. Taking the logarithm to base 10 of the three individual salaries gives 4, 5, and 6, values that are equally spaced.
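A quick Python check of Exercise 12. The three salaries ($10,000, $100,000 and $1,000,000) are inferred from the stated base-10 logs of 4, 5 and 6; they are consistent with the $370,000 average.

```python
import math
from statistics import mean

# Salaries implied by base-10 logs of 4, 5 and 6
salaries = [10_000, 100_000, 1_000_000]

average = mean(salaries)                        # pulled far toward the largest salary
logs = [math.log10(s) for s in salaries]        # base-10 logarithms
gaps = [b - a for a, b in zip(logs, logs[1:])]  # spacing between consecutive logs
print(average, logs, gaps)
```

The raw salaries differ by $90,000 and $900,000, while the logs are evenly spaced one unit apart.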
CHAPTER EXERCISES
13. Pizza sales and price, part 1.

a) The problem states that the weekly Sales (in kg) of frozen pizza are being predicted from the average Price/unit ($). The variable being predicted is the response variable and the variable doing the predicting or explaining is the explanatory variable. In this case, the explanatory variable is the unit Price (pizza Sales predicted from unit Price). In addition, the prediction equation is stated as \( \widehat{Sales} = 141{,}865.53 - 24{,}369.49\,Price \), which is of the form \( \hat{y} = b_0 + b_1 x \).
Price is x, or the explanatory variable.
b) The problem states that the weekly Sales (in kg) of frozen pizza are being predicted from the average Price/unit ($). Whatever is being predicted is the response variable. Sales of frozen pizza are being predicted, and therefore Sales is y, or the response variable. In addition, the prediction equation is stated as \( \widehat{Sales} = 141{,}865.53 - 24{,}369.49\,Price \), which is of the form \( \hat{y} = b_0 + b_1 x \).
Sales is y, or the response variable.
c) The prediction equation is stated as \( \widehat{Sales} = 141{,}865.53 - 24{,}369.49\,Price \), which is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_1 \) is the slope. The slope for this equation is –24,369.49 kg per dollar, which means that for every $1 increase in the price of pizza, weekly sales of frozen pizza are predicted to decrease by 24,369.49 kg.

d) The y-intercept is the value of the line when the x-variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Sales} = 141{,}865.53 - 24{,}369.49\,Price \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y-intercept. The y-intercept for this equation is 141,865.53 kg. This number is not meaningful except as a base or starting value for the line, because it is obviously not realistic to set the Price at zero dollars.
e) A prediction can be made by substituting the given value for the Price of pizza into the equation and solving for weekly Sales. If the Price/unit of the pizza is $3.50, the equation is:
\( \widehat{Sales} = 141{,}865.53 - 24{,}369.49(3.50) = 141{,}865.53 - 85{,}293.215 \approx 56{,}572.32 \) kg.
f) If a Price of $3.50 yields 60,000 kg, the residual is calculated by subtracting the predicted value (calculated from the linear model equation) from the observed or measured value (Residual = Data – Predicted, or \( e = y - \hat{y} \)). The predicted value of y at an x-value of $3.50 was calculated in part e) as 56,572.32 kg. The Residual = 60,000 (Data) – 56,572.32 (Predicted) = 3,427.68 kg.
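Parts e) and f) apply two small formulas: substitute x into \( \hat{y} = b_0 + b_1 x \), then subtract the prediction from the observed value. A minimal Python sketch, using the coefficients of the pizza model (Sales in kg); the helper names `predict` and `residual` are hypothetical. The same helpers also reproduce the Honda calculation in Exercise 14.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Value of the least squares line y-hat = b0 + b1 * x at the given x."""
    return b0 + b1 * x


def residual(y: float, y_hat: float) -> float:
    """Residual e = y - y-hat (observed minus predicted)."""
    return y - y_hat


# Exercise 13: weekly Sales (kg) from Price ($)
sales_hat = predict(141_865.53, -24_369.49, 3.50)
sales_resid = residual(60_000, sales_hat)

# Exercise 14: Price ($) of a used Honda from Mileage (miles)
price_hat = predict(21_253.58, -0.11097, 50_000)
price_resid = residual(14_000, price_hat)

print(round(sales_hat, 2), round(sales_resid, 2))
print(round(price_hat, 2), round(price_resid, 2))
```

A positive residual (the pizza week) means the model underestimated the observed value; a negative residual (the Honda) means the observed value fell below the prediction.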
14. Used Honda prices 2012, part 1.

a) The problem states that the Price ($) of a used Honda is being predicted from its Mileage (in miles). The variable being predicted is the response variable and the variable doing the predicting or explaining is the explanatory variable. In this case, the explanatory variable is the Mileage (Price predicted from Mileage). In addition, the prediction equation is stated as \( \widehat{Price} = 21{,}253.58 - 0.11097\,Mileage \), which is of the form \( \hat{y} = b_0 + b_1 x \). Mileage is x, or the explanatory variable.
b) The problem states that the Price ($) of a used Honda is being predicted from its Mileage (in miles). The variable being predicted is the response variable and the variable doing the predicting or explaining is the explanatory variable. Price of a used Honda is being predicted and, therefore, Price is y, or the response variable. In addition, the prediction equation is stated as \( \widehat{Price} = 21{,}253.58 - 0.11097\,Mileage \), which is of the form \( \hat{y} = b_0 + b_1 x \). Price is y, or the response variable.
c) The prediction equation is stated as \( \widehat{Price} = 21{,}253.58 - 0.11097\,Mileage \), which is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_1 \) is the slope. The slope for this equation is –0.11097, which means that for every extra mile driven, the price of the 2012 Honda is predicted to decrease by $0.11097 (about $111 per extra 1000 miles driven).
d) The y intercept is the value of the line when the x variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Price} = 21{,}253.58 - 0.11097\,Mileage \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y intercept. The y intercept for this equation is $21,253.58. This number represents the base or starting value for the line, which means the average Price for a used car with no miles on the odometer. This is not very realistic.
e) A prediction can be made by substituting the given value for Mileage into the equation and solving for Price. If the mileage is 50,000 miles, the equation is:
\( \widehat{Price} = 21{,}253.58 - 0.11097 \times 50{,}000 = 21{,}253.58 - 5548.50 = \$15{,}705.08 \).
f) If a car with 50,000 miles on it costs $14,000, the residual is calculated by subtracting the predicted value (calculated from the linear model equation) from the observed or measured value (Residual = Data – Predicted, or \( e = y - \hat{y} \)). The predicted value of y at an x value of 50,000 was calculated in part e) to be $15,705.08. The Residual = $14,000 (Data) – $15,705.08 (Predicted) = –$1705.08.
g) Yes, this would be a good deal: the asking price of $14,000 is $1705.08 below the predicted price of $15,705.08 for a car with this mileage (a negative residual).

15. Pizza sales and price, part 2.

Average Sales = 141,865.53 – 24,369.49*(Average Price)

52,697 = 141,865.53 – 24,369.49*(Average Price)
Thus, Average Price = (52,697 – 141,865.53)/(–24,369.49) = $3.66
Now, \( b_1 = r\,\dfrac{s_y}{s_x} \), or \( -24{,}369.49 = (-0.547)\,\dfrac{10{,}261}{s_x} \), so
\( s_x = \dfrac{(-0.547)(10{,}261)}{-24{,}369.49} = \$0.23 \)
So the price at one SD above the average = $3.66 + $0.23 = $3.89.

At that price, the predicted Sales = 141,865.53 – 24,369.49*(3.89) = 47,068.21 kg.
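The backward reasoning above (recovering the mean and SD of Price from the fitted line) can be checked with a short Python sketch. It relies on two facts: the least squares line passes through \( (\bar{x}, \bar{y}) \), and \( b_1 = r s_y / s_x \). The inputs (mean Sales 52,697 kg, \( s_y = 10{,}261 \), r = –0.547) are the values used in the solution; the variable names are illustrative.

```python
# Fitted line from Exercise 13: Sales (kg) = b0 + b1 * Price ($)
b0, b1 = 141_865.53, -24_369.49
mean_sales = 52_697           # average weekly Sales (kg)
r, sd_sales = -0.547, 10_261  # correlation and SD of Sales (kg)

# The line passes through (x-bar, y-bar), so y-bar = b0 + b1 * x-bar
mean_price = (mean_sales - b0) / b1
# Rearranging b1 = r * s_y / s_x gives s_x = r * s_y / b1
sd_price = r * sd_sales / b1

# One SD above the mean price, rounded to cents as in the solution
price_up = round(mean_price, 2) + round(sd_price, 2)
sales_hat = b0 + b1 * price_up

print(round(mean_price, 2), round(sd_price, 2), round(price_up, 2), round(sales_hat, 2))
```

Working directly in standardized units instead, \( \bar{y} + r s_y = 52{,}697 - 0.547 \times 10{,}261 \approx 47{,}084 \) kg; the small gap from 47,068 kg comes from rounding the mean price and its SD to cents.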
16. Used Honda prices 2012, part 2.

With r = –0.889, a car 1 SD below the mean Mileage is predicted to be 0.889 SD above the mean Price, so its predicted price is:
$19,843.50 + 0.889($1853.59) = $21,491.34
17. Sales by region. The model is meaningless because the variable Region is categorical, not quantitative.
Although each region is denoted by a number, the variable is still categorical. The slope makes no sense
because Region has no units. The boxplot comparisons are informative, but the regression is meaningless.
18. Salary by job type. The model is meaningless because the variable Job Type is categorical, not
quantitative. It has no units, so the linear model and slope make no sense. A bar chart of average salary for
the different job types would be a good display of the data.
19. GDP growth 2012, part 1.

a) The variables are both quantitative, the trend is positive and somewhat straight, there are a couple of outliers that influence the fit (especially 2009), and the spread is roughly consistent, although large. We should be cautious about interpreting the model too strictly.
b) About 31.6% of the variation in the growth rates of developing countries is accounted for by the
growth rates of developed countries.
c) Each point represents one of the years 1970–2011, which are the cases in the model.
20. European GDP growth 2012.

a) The variables are both quantitative (with units % of GDP), the trend is positive and reasonably straight,
there is one outlier in the lower left corner of the plot (2009) and the spread is roughly consistent.
b) About 44.9% of the variation in the growth rates of the 25 European countries is accounted for by the
growth rates of the United States.
21. GDP growth 2012, part 2

a) The output of regression analysis gives the coefficients of the linear model equation. The y intercept is given as 3.38. The slope, which is given as the coefficient of the explanatory variable (Annual GDP Growth of Developed Countries), is 0.468. Therefore, the linear model is:
\( \widehat{Growth}(\text{Developing Countries}) = 3.38 + 0.468\,Growth(\text{Developed Countries}) \)
b) The y intercept is the value of the line when the x variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Growth}(\text{Developing Countries}) = 3.38 + 0.468\,Growth(\text{Developed Countries}) \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y intercept. The y intercept for this equation is 3.38. This number represents the base or starting value for the line, which means the Annual GDP Growth Rate of Developing Countries when the Annual GDP Growth Rate of Developed Countries is zero percent. This value is 3.38%, and the concept makes sense.
c) The slope represents the change in y or the response variable for every x unit or one unit step in the predictor
variable. The slope for this equation is 0.468, which means that for every 1% increase in Annual GDP Growth
of Developed Countries, the Annual GDP Growth of Developing Countries increases by 0.468%.
d) A prediction can be made by substituting the given value of 4% for Annual GDP Growth of Developed Countries into the equation and solving for Annual GDP Growth of Developing Countries. The equation is:
\( \widehat{Growth}(\text{Developing Countries}) = 3.38 + 0.468 \times 4 = 5.25\% \).
e) If developed countries experience a 2.65% growth while developing countries grew at a rate of 6.09%, the predicted value of y at an x value of 2.65% is:
\( \widehat{Growth}(\text{Developing Countries}) = 3.38 + 0.468 \times 2.65 = 4.62\% \).
The predicted value using the linear model is less than the actual percentage; developing countries performed better than expected.
f) The residual is calculated by subtracting the predicted value (calculated from the linear model equation) from the observed or measured value (Residual = Data – Predicted, or \( e = y - \hat{y} \)). The predicted value of 4.62% is compared to the actual value of 6.09%. The Residual = 6.09% (Data) – 4.62% (Predicted) = 1.47%.
22. European GDP growth 2012, part 2.

a) The output of regression analysis gives the coefficients of the linear model equation. The y intercept is given as 0.693. The slope, which is given as the coefficient of the explanatory variable (Annual GDP Growth of United States), is 0.534. Therefore, the linear model is:
\( \widehat{Growth}(\text{25 European Countries}) = 0.693 + 0.534\,Growth(\text{United States}) \)
b) The y intercept is the value of the line when the x variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Growth}(\text{25 European Countries}) = 0.693 + 0.534\,Growth(\text{United States}) \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y intercept. The y intercept for this equation is 0.693. This number represents the base or starting value for the line, which means the Annual GDP Growth Rate of 25 European Countries when the Annual GDP Growth Rate of the United States is 0%. This value is 0.693%, and the concept makes sense: there are years of no growth and also of negative growth.
c) The slope represents the change in y or the response variable for every x unit or one unit step in the predictor
variable. The slope for this equation is 0.534, which means that for every 1% increase in Annual GDP Growth
of US, the Annual GDP Growth of 25 European Countries increases by 0.534%.
d) When the US has 0% GDP Growth, the value of European GDP Growth calculated from the regression
equation is 0.693% which is the y intercept.
e) If the US experiences 3.0% growth while European countries grew at a rate of 1.78%, the predicted value of y at an x value of 3.0% is:
\( \widehat{Growth}(\text{25 European Countries}) = 0.693 + 0.534 \times 3.0 = 2.295\% \), or about 2.30%. The predicted value using the linear model is higher than the actual percentage.
f) The residual is calculated by subtracting the predicted value (calculated from the linear model equation) from the observed or measured value (Residual = Data – Predicted, or \( e = y - \hat{y} \)). The predicted value of y at an x value of 3.0% was calculated in e) as 2.30%. The Residual = 1.78% (Data) – 2.30% (Predicted) = –0.52%.
23. Mutual funds.

a. The y-intercept is the value of the line when the x-variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Flow} = 9747 + 771\,Return \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y-intercept. The y-intercept for this equation is 9747 ($M). This represents the predicted amount of money flowing into mutual funds when the mutual fund Return is zero.
b. The slope represents the change in y , or the response variable, for every x -unit or one-unit step in the
predictor variable. The slope for this equation is 771, which means that for every 1% increase in
mutual fund Return, the Flow into mutual funds increases by 771 ($M).

c. The predicted fund Flow for a month that had a market Return of 0% is the y -intercept, which has a
value of 9747 ($M).
d. If the recorded fund Flow was $5 billion during a month when the Return was 0%, the residual is
calculated by subtracting the predicted value (calculated from the linear model equation) from the
observed or measured value (Residual = Data – Predicted or e = y − yˆ ). The predicted value of y at
an x -value of 0% was calculated in c) as 9747 ($M). $5 billion = 5000 $Million. The Residual = 5000
($M) (Data) – 9747 ($M) (Predicted) = –4747 ($M). This model overestimated the Flow value.
24. Online clothing purchases.

a. The y-intercept is the value of the line when the x-variable is zero. The intercept can be used as a starting point for predictions, but it is not meaningful in all circumstances. In this equation, \( \widehat{Purchases} = -31.6 + 0.012\,Income \) is of the form \( \hat{y} = b_0 + b_1 x \), where \( b_0 \) is the y-intercept. The y-intercept for this equation is –31.6 ($). This represents the predicted yearly Purchases ($) when Income is zero. It is a starting value but meaningless as a prediction.
b. The slope represents the change in y , or the response variable, for every x -unit or one-unit step in the
predictor variable. The slope for this equation is 0.012, which means that for every 1($) increase in
Income, the Purchases increase by 0.012 ($), or just over one cent. If you multiply both by $1000, for
every $1000 increase in Income, Purchases increase by $12.
c. The predicted Purchases for an Income of $20,000 is calculated by substituting $20,000 for Income in the regression equation: \( \widehat{Purchases} = -31.6 + 0.012 \times 20{,}000 = \$208.40 \).
d. The actual Purchases were $100, so the model overestimated the prediction. The residual is calculated
by subtracting the predicted value (calculated from the linear model equation) from the observed or
measured value (Residual = Data – Predicted or e = y − yˆ ). The predicted value of y at an x -value of
$20,000 was calculated in c) as $208.40. The Residual = $100 (Data) – $208.40 (Predicted) = –
$108.40.
25. The Home Depot, part 1.
a. The slope has units corresponding to a change in y / x , which in this case is quarterly Sales ($B)/U.S.
Housing Starts (thousands), translating into billions of dollars per thousand Housing Starts.
b. The R² value is r*r. Correlation r is given as 0.70, therefore r*r = 0.70*0.70 = 0.49, or 49%.
c. For one standard deviation below average in Housing Starts, we can use the standardized equation to find a solution: \( \hat{z}_y = r z_x \). One standard deviation below the mean in x represents a z-score of –1, which results in a z-score for y of –r = –0.70, representing 0.70 standard deviations below the mean in Sales.
26. House prices.

a. The slope has units corresponding to a change in y/x, which in this case is House Price ($)/Living Area (sq. ft.), translating into dollars per square foot.
b. The R² value is r*r. Correlation r is given as 0.79, therefore r*r = 0.79*0.79 = 0.6241, or 62.41%.
c. For a house two standard deviations above average in Living Area, we can use the standardized equation to find a solution: \( \hat{z}_y = r z_x \). Two standard deviations above the mean in x represents a z-score of +2, which results in a z-score for y of 2r = 2*0.79 = 1.58, representing 1.58 standard deviations above the mean in Price.

27. Retail sales, part 1.

a. The R² value is given as 88.3%. This means that 88.3% of the variation in quarterly Sales can be accounted for by the regression on the U.S. unemployment Rate. This is a high number and reflects a strong relationship between these variables.
b. The R² value is r*r. Correlation r can be calculated by finding the square root of R². In this case, \( \sqrt{0.883} = 0.94 \). We expect a negative correlation between unemployment Rate and Sales, so r = –0.94.
c. The slope is –2.99, which means that for every 1% increase in unemployment Rate, Sales decrease by 2.99 ($B).

28. Pizza sales and price, part 3.

a. The R² value is given as 32.8%. This means that 32.8% of the variation in pizza Sales (in kg) can be accounted for by the regression on Price per unit ($). This is a low number and reflects a fairly weak relationship between these variables.
b. The R² value is r*r. Correlation r can be calculated by finding the square root of R². In this case, \( \sqrt{0.328} = 0.573 \), so r = –0.573. The slope is negative; therefore the correlation is also negative.
c. The slope is –24,369.49, which means that for every $1 increase in Price, Sales are predicted to decrease by 24,369.49 kg. For a $0.50 increase, the predicted decrease is 12,184.75 kg.
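Part b) of Exercises 27 and 28 recovers r from R². The square root gives only the size of the correlation; its sign comes from the slope. A minimal Python sketch (the function name `r_from_r_squared` is hypothetical):

```python
import math


def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Correlation from R^2, taking the sign of the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


r_retail = r_from_r_squared(0.883, -2.99)      # Exercise 27: Sales on unemployment Rate
r_pizza = r_from_r_squared(0.328, -24_369.49)  # Exercise 28: Sales on Price
print(round(r_retail, 2), round(r_pizza, 3))
```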
29. Residual plots, part 1.

a. The linear model seems appropriate. The residual plot has appropriate scatter of points and nothing
remarkable.
b. The linear model is not appropriate for this data set. The data points are curved, indicating a nonlinear
relationship.
c. The linear model is not appropriate for this data set. The data points start out close together and then
the spread increases as x increases.
30. Residual plots, part 2.
a. The linear model is not appropriate for this data set. The data points are curved, indicating a nonlinear
relationship.
b. The linear model is not appropriate for this data set. The data points are spread out at the start and then
the spread decreases as x increases.
c. The linear model seems appropriate. The residual plot has appropriate scatter of points and nothing
remarkable.
31. The Home Depot, part 2.

a. The slope is 0.0535, which means that for every 1000 additional Housing Starts, Sales increase by 0.0535 ($B), or $53.5 million.
b. Calculate the prediction by substituting 500,000 units (500 thousand) for Housing Starts and solve for Sales:
\( \widehat{Sales} = -11.5 + 0.0535 \times 500 = 15.25 \) ($B)
c. If Sales are $3 billion higher than predicted, that difference is the residual (the difference in the y-values at a particular x-value), so the residual is +3 ($B).
32. Retail sales, part 2.

a. The slope is –2.994, which means that for every 1% increase in unemployment Rate, Sales decrease by
2.994 ($B).
b. Calculate the prediction by substituting 6% for unemployment Rate and solve for Sales:
\( \widehat{Sales} = 20.91 - 2.994 \times 6 = 2.946 \) ($B)
c. Calculate the prediction by substituting 4% for unemployment Rate and solve for Sales:
\( \widehat{Sales} = 20.91 - 2.994 \times 4 = 8.934 \) ($B). The actual Sales value is $8.5 billion. The residual is the difference between the actual data and the predicted value: 8.5 – 8.934 = –$0.434 billion.
33. Consumer spending. There are two influential outliers that strongly affect the linear regression (slope and intercept) and inflate R² to 79%. The predictions from this regression will not be accurate, as the scatterplot shows. Without these two data points, R² drops to about 31%. The analyst should identify these two customers and examine why they are outliers. For analysis of the rest of the data points, these two customers should be set aside and the model refit to the rest of the data.
34. Insurance policies. There is one very influential outlier that strongly affects the linear regression (slope and intercept) and inflates R² to 99.9%. The predictions from this regression will not be accurate, as the scatterplot shows: most of the data are clustered at the low end, with the exception of the high outlier. Without this data point, R² drops to 23.5%. The analyst should identify this salesperson and examine why he or she is an outlier. For analysis of the rest of the data points, the salesperson should be set aside and the model refit to the rest of the data.

35. Supermarket sales, part 1.

a. A linear model is somewhat appropriate here. The variables are quantitative and the relationship is roughly linear with one or two possible outliers. The model may be influenced by at least one of the outliers. R² is 56.9%, which makes the value of r close to 0.75, denoting at least a moderate association. However, there are only 10 data points, and more data for more store locations would provide information that could alter our conclusion.
b. The correlation r is the square root of R², \( \sqrt{0.569} = 0.75 \).
c. The meaning of R² in this problem is that 56.9% of the variability in annual Sales in 2000 can be accounted for by the regression on the Population of the town where the store is located.

36. Supermarket sales, part 2.

a. Assuming population is in thousands and Sales are in millions of dollars in the regression:
Predicted Sales = 2.924 + 0.0703*Population = 2.924 + 0.0703*80 = $8.548M or $8,548,000.
b. Assuming population is in thousands and Sales are in millions of dollars, the slope of 0.0703 means
that for every increase in population of 1000 residents, sales increase by $0.0703M or $70,300.
c. The y -intercept is $2,924,068, which means that for a town with no Population, Sales is nearly $3
million. This does not make sense.
37. Misinterpretations, part 1.

a. R 2 is an indication of the strength of a model but not the appropriateness of the model. The model may
be influenced by other factors such as outliers or unusual data patterns. It is important to look at the
scatterplot for these other factors.
b. The statement should not be declared an absolute fact. The sales figure is a prediction. The statement should be rephrased as “The model predicts the quarterly sales will be $10M when $1.5 million is spent on advertising.”

38. Misinterpretations, part 2.
a. R 2 measures the amount of variation explained by the linear model. Literacy Rate accounts for 64% of
the variation in GDP.
b. The slope of the line is a trend. In this case, it reflects a prediction of how GDP changes with a change
in Literacy Rate. Absolute statements should not be made with regard to specific values in the
interpretation of a regression.
39. Used BMW prices 2013, part 1.

a)
b) There is a fairly strong positive association between used BMW 850 CSI prices and their model year.
There seems to be a decreasing spread of data points as years increase (1994-1996 seem about the same).
c) A linear model is reasonably appropriate for this relationship: both variables are quantitative and there are no outliers or other unusual patterns, but there is some upward curvature, so use the model with caution.
d) Correlation r can be calculated by finding the square root of R². In this case, \( \sqrt{0.573} = 0.757 \), so r = 0.757 (positive, because the association is positive).
e) 57.3% of the variation in Price of a used BMW 850 CSI can be accounted for by the Year the car was
made.
f) This relationship is not perfect. Other factors contribute to the variability of the price, such as options,
condition of car, and mileage.
40. Used BMW prices 2013, part 2.

a) Using technology, the regression equation is: \( \widehat{Price} = -4{,}291{,}691 + 2160.28\,Year \)
b) The slope represents the change in y or the response variable for every x unit or one unit step in the predictor
variable. The slope for this equation is $2160.28, which means that for every one Year increase or one Year newer
model, the Price increases by $2160.28.
c) The y intercept does not make sense here as the value is negative and it represents the year zero.
d) To predict a sale Price for a 1997 model, substitute the year into the equation and solve:
\( \widehat{Price} = -4{,}291{,}691 + 2160.28 \times 1997 = \$22{,}388.16 \)
e) It is to your advantage to buy a car with a negative residual. This means that the car’s Price is below the predicted
value for a particular Year using the linear model.

a. The scatterplot shows a general decreasing trend of Sales with Median Age. There is a weak
negative association between these variables. There are only 10 data points, and there is one low
outlier and possibly two high ones.
b. A linear model is possible for this data set but there are only 10 data points with a lot of scatter
and three possible outliers—one low value and two high values. Both variables are quantitative
and there are no obvious differences in the spread. However, you would be right to be cautious in
using any predictive equation.
c. The regression equation is determined from technology as \( \widehat{Sales} = 24.116 - 0.4829 \times \text{Median Age} \)
d. The predicted Sales for a similar supermarket in a town with a Median Age of 32 are \( \widehat{Sales} = 24.1155 - 0.48285 \times 32 = 8.664 \) ($M).
e. The same reservations apply as defined in part b); the linear model is probably not very accurate. Specifically, in the scatterplot near a Median Age of 32, the only data are the two high outliers.

a. The scatterplot shows the relationship as being positive and fairly straight with at least a moderate
association. There does seem to be one high outlier and only 10 data points.
b. A linear model is appropriate because the conditions are satisfied for quantitative variables with an even spread. However, there is possibly one high outlier and there are only 10 data points. This is reason to be cautious about using a linear prediction.
c. The regression equation from technology is \( \widehat{Sales} = 2.956 + 0.168 \times \text{Total Housing Units} \), with Sales in $M.
d. \( \widehat{Sales} = 2.956 + 0.168 \times 100 = 19.76 \) ($M)
e. This is an extrapolation—100,000 units is more than twice the number of units in the largest town
in this data set. We should not be confident of this prediction.
43. Expensive cities

a) The association between cost of living in 2013 and 2007 is very weak. There are no outliers. The scatterplot
indicates that the linear model is appropriate but not likely to reveal much.
b) 7.0% of the variability in cost of living in 2013 can be explained by variability in cost of living in 2007.
c) r = –0.26 (negative because the slope is negative)
d) \( \widehat{Index}_{2013} = 191.6 - 0.683\,Index_{2007} \)
e) Moscow was predicted to have an index of 191.6 – 0.683(134.4) = 99.80, but the 2013 index was only 81.58, so the residual is 81.58 – 99.80 = –18.22 index points.
44. El Niño.
a. The correlation r is the square root of R² (33.4%), which equals 0.578.
b. The meaning of R² in this context is that the regression on CO2 levels accounts for 33.4% of the variation in Mean Temperature.
c. The regression equation can be developed from the regression output:

\( \widehat{MeanTemperature} = 15.3066 + 0.004\,CO_2 \)
d. The slope of 0.004 means that the predicted Mean Temperature increases by an average of 0.004 °C for each additional part per million of CO2.
e. The intercept would be the value for the Mean Temperature, 15.3066 °C, if the parts per million of CO2 were zero, but this doesn’t make any sense.
f. The residual scatterplot does not show anything remarkable, just a scattering about the zero point. There is no evidence of violation of the assumptions of regression.
g. The Mean Temperature (°C) prediction for 364 parts per million CO2 levels is:
\( \widehat{MeanTemperature} = 15.3066 + 0.004 \times 364 = 16.7626 \) °C.
45. Global fertility rate.
a)

b) Linear regression should not be used directly as the data are not linear.
The data would not become linear using logarithms, squares, or square roots, since the shape of the graph in a)
is different from the shapes of graphs of logarithms, squares, or square roots. Therefore linear regression cannot
be used on transformed data, either.
c)
[Scatterplot: log10(global fertility rate - 2) against year, 1950 to 2020]
The variables (year and fertility rate) are quantitative
The scatterplot is linear from 1970 onwards after doing the transformation
There are no outliers
The spread is uniform
The residuals show no pattern

[Residual plot: residuals against predicted values]
The regression is: log10(fertility - 2) = 38.35 - 0.0192 * year

The forecast for 2020 is log10(fertility - 2) = 38.35 - 0.0192 * 2020 = -0.499 (this value appears to come from the
unrounded coefficients; the rounded coefficients shown give about -0.434)
The forecast for 2020 is fertility = 2 + 10^(-0.499) = 2.32
The original data look as though they trend towards 2 and will not go lower.
Hence we subtract 2 from the data to get a number that trends towards 0 but stays positive, as required for taking
logarithms.
The data are linear after 1970, but not if we include the period 1955-65.
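Undoing the transformation is the step that is easiest to get wrong: predict on the log scale, raise 10 to that power, then add back the 2. A minimal Python sketch, assuming base-10 logarithms as in the model above (function names are hypothetical):

```python
def back_transform(log_value: float) -> float:
    """Invert log10(fertility - 2): fertility = 2 + 10 ** log_value."""
    return 2 + 10 ** log_value


def forecast_fertility(year: int, b0: float = 38.35, b1: float = -0.0192) -> float:
    """Forecast from log10(fertility - 2) = b0 + b1 * year (rounded coefficients)."""
    return back_transform(b0 + b1 * year)


print(round(back_transform(-0.499), 2))    # log-scale value reported above
print(round(forecast_fertility(2020), 2))  # rounded coefficients as printed
```

The first line reproduces 2.32; the second gives about 2.37. Because the slope multiplies a four-digit year, rounding it to four decimal places can move the log-scale value by up to about 0.1, so forecasts from this model are sensitive to how the coefficients are rounded.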
46. Solar Power.

a) [Scatterplot: cost ($/W) against cumulative volume (MW)]
The data are quantitative

The scatterplot is non-linear
There are some high outliers at the top left
The spread is not uniform
A linear model should not be used
b) Using logarithms to the base 10

[Scatterplot: log(cost) against log(cumulative volume)]
The data are quantitative

The trend is linear
There are some outliers
[Residual plot: residuals against predicted values]

The linear model is Log(cost) = 1.12 - 0.288 * log(cumulative volume)
c) Log(cost) = 0.373039
Cost = 10^0.373039 = 2.360692 $/W, or about $2.36/W
d) The data are quantitative
The trend is linear
There are some outliers

[Residual plot: residuals against predicted values]

The linear model is Log(cumulative volume) = 3.20 - 2.49 * log(cost)
e) log(cumulative volume) = 2.762762
Cumulative volume = 10^2.762762 = 579.1108 MW, or about 579 MW
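Parts c) and e) both end by undoing a base-10 logarithm. A short Python sketch of that step, plus a helper that evaluates a log-log model at any input; the function names are illustrative, and the 100 MW input in the last line is hypothetical:

```python
import math


def undo_log10(log_value: float) -> float:
    """Convert a prediction on the log10 scale back to the original units."""
    return 10 ** log_value


def loglog_predict(x: float, b0: float, b1: float) -> float:
    """Predict y from log10(y) = b0 + b1 * log10(x)."""
    return undo_log10(b0 + b1 * math.log10(x))


print(round(undo_log10(0.373039), 2))  # part c): cost, $/W
print(round(undo_log10(2.762762), 1))  # part e): cumulative volume, MW
# Hypothetical input: cost at a cumulative volume of 100 MW from the part b) model
print(round(loglog_predict(100, 1.12, -0.288), 2))
```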
f) R squared = 0.72 for both models.
It is the same for each model because R squared is the square of the correlation coefficient, and there is only one
correlation coefficient between log(cost) and log(cumulative volume).
47. Commercial Bakery.

a) [Scatterplot: spoilage (%) against volume shipped (tons)]
The variables are quantitative

The scatterplot is linear
The spread is fairly uniform
The linear model is spoilage = 0.144 - 0.0215 * volume
[Residual plot: residuals against predicted spoilage]
b)
Spoilage = 0.144 - 0.0215 * 4 = 5.84%
c)
The spread is fairly uniform
The linear model is volume = 6.09 - 39.1 * spoilage
[Residual plot: residuals against predicted volume shipped]

d) Volume = 6.09 - 39.1 * 0.05 = 4.13 tons
e) R squared is the same for both models: 0.84.

R squared is the square of the correlation coefficient
There is only one correlation coefficient
Therefore the R squared must be the same for each model
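Another way to see why R squared is shared: using the slope formula b1 = r(sy/sx) from earlier in this chapter, the slope of y on x is r(sy/sx) and the slope of x on y is r(sx/sy), so the two slopes multiply to r². A quick Python check using the slopes reported in this exercise and in Solar Power (the function name is illustrative):

```python
def r_squared_from_slopes(slope_y_on_x: float, slope_x_on_y: float) -> float:
    """Product of the two simple-regression slopes.

    slope(y on x) = r * sy / sx and slope(x on y) = r * sx / sy,
    so the standard deviations cancel and the product equals r ** 2.
    """
    return slope_y_on_x * slope_x_on_y


print(round(r_squared_from_slopes(-0.0215, -39.1), 2))  # Commercial Bakery: reported R-sq 0.84
print(round(r_squared_from_slopes(-0.288, -2.49), 2))   # Solar Power (log scale): reported R-sq 0.72
```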
f) For every additional ton shipped, we estimate a reduction of 2.15 percentage points in spoilage.

For every percentage-point reduction in spoilage, we estimate an increase of 0.391 tons in volume shipped.
48. LEED Certified Condos in Toronto.

a) The variables are quantitative
The scatterplot is non-linear
b) The explanatory variable is the additional cost.

The reduction in utility bills comes as a result of spending more on the condo

c)
The linear model is: Reduction = 1.18 + 47.4 * Log(additional cost)
There is no pattern in the residuals, except that they grow larger as additional cost increases.
d) Reduction = 1.18 + 47.4 * Log(5.2) = 35.0854, or about $35.09 per month
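The prediction in part d) can be reproduced with a short Python function. This sketch assumes base-10 logarithms: the value 35.0854 is consistent with log10(5.2) ≈ 0.716, whereas a natural log would give about 79.3. With the rounded coefficients the sketch gives 35.12, close to the value above.

```python
import math


def predicted_reduction(additional_cost: float) -> float:
    """Reduction = 1.18 + 47.4 * log10(additional cost), rounded coefficients from part c)."""
    return 1.18 + 47.4 * math.log10(additional_cost)


print(round(predicted_reduction(5.2), 2))
```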

e) R squared = 0.97
f) For every additional unit of log(additional cost), we forecast an additional reduction of $47.40 in monthly utility bills.
49. High Quality Paper

[Line chart: monthly sales ($m) for British Columbia, Ontario, and Quebec, months 1 to 12]
a)
Province           Quantitative variables   Linear trend   Outliers   Spread
British Columbia   Yes                      Yes            No         Uniform
Ontario            Yes                      No             No         Uniform
Quebec             Yes                      No             Yes        Uniform
b) A linear model can be used for British Columbia. The residuals show no pattern.

Sales = 5.28 + 0.0919 * Month
c) The increase in sales is 0.0919 $m per month.

This is consistent in the sense that the linear model fits the data throughout the whole year.
The R squared is 0.944, showing that the model explains 94.4% of the variability in British Columbia sales.
d) British Columbia
Amount = 0.0919 * 0.5 = 0.0459 $m
50. Racing Cars

a) Agreed. There is no evidence of any linear trend, and it is not possible to fit a linear model.
b) Agreed. There is no evidence of any linear trend, and it is not possible to fit a linear model.
c) The drivers' cost is shifted back a season, since it is based on winnings in the previous season.
Agreed. The trend is linear, so a linear model can be fitted.

Driver = 1.05 + 0.983 * Winnings

The linear model explains 80% of the variability in the data, as given by R squared.
d) Agreed. The trend is linear and the spread is uniform, so a linear model can be fitted.

Mechanics = 0.331 + 0.33 * Winnings
When winnings = $0.5m, mechanics should get 0.331 + 0.33 * 0.5 = $0.496m.
Developers = 0.131 + 0.33 * Winnings
When winnings = $0.5m, developers should get 0.131 + 0.33 * 0.5 = $0.296m.
51. Bricks.

[Scatterplot: sales revenue ($) against price per brick ($)]
The data provided show a non-linear trend, with sales rising to a peak and declining at the higher prices. We do not
have a simple way to transform the variables to deal with this shape.
Although some students may have knowledge beyond this introductory chapter on linear regression and may be able
to fit a quadratic model to these data, the key to the exercise is a careful reading of the question. The question
specifically asks for a linear model, and it also asks us to estimate the number of bricks, not the sales revenue. We
should therefore use data on the number of bricks, which can easily be calculated by dividing the sales revenue by the
price per brick. We obtain:
[Scatterplot: number of bricks sold (millions) against price per brick ($)]
Checking the conditions for linear regression:

• Data are quantitative
• Trend is linear
• No outliers
• Uniform spread throughout the data.
The linear model is: # bricks = 24.1 – 9.13 * price

[Residual plot: residuals against predicted values]
The residuals show no pattern.

The numbers of bricks the company could forecast to sell are:
18.0m at $0.67/brick
16.2m at $0.87/brick
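The two steps used here, converting revenue to a count of bricks and then forecasting from the fitted line, can be sketched in Python. The function names are hypothetical; the sketch assumes revenue and price are in the same currency units, so that revenue divided by price gives the number of bricks.

```python
def bricks_sold(revenue: float, price_per_brick: float) -> float:
    """Number of bricks = sales revenue / price per brick."""
    return revenue / price_per_brick


def forecast_bricks_millions(price: float) -> float:
    """Fitted model above: # bricks (millions) = 24.1 - 9.13 * price."""
    return 24.1 - 9.13 * price


for price in (0.67, 0.87):
    print(f"${price:.2f}/brick -> {forecast_bricks_millions(price):.1f}m bricks")
```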
52. Gas Pipeline Costs

a) [Scatterplot: materials cost ($K/km) against diameter (in), with trendline y = 17.547x + 33.124, R² = 0.6159]
b) [Scatterplot: materials cost ($K/km) against diameter² (sq in), with trendline y = 0.4201x + 180.89, R² = 0.707]
c) The regression against the square of diameter is preferable to the regression against diameter for two reasons. (i)
The R-squared is higher (0.707 versus 0.6159), indicating that the regression explains more of the variability in the
data. (ii) The residuals are positive at the start and at the end and negative in the middle, which indicates that the
trend is curved. This effect is more pronounced in the regression against diameter than in the regression against
diameter squared.
53. Piston Ring Entrepreneur

From the graph of sales, we can see that Larry did not have enough production capacity in quarters 2, 4, 7, 10. We
need to remove these data points from our forecast of demand, since we don’t know how much higher demand was
than sales during those quarters.
[Line chart: sales by quarter]
a) [Scatterplot: demand against quarter, with trendline y = 0.2426x + 0.2424, R² = 0.9384]
b) [Scatterplot: demand against log(quarter), with trendline y = 2.4294x + 0.0486, R² = 0.9916]
c) The regression in b) is preferable for two reasons. (i) The R-squared is higher (0.9916 versus 0.9384), indicating
that the regression explains more of the variability in the data. (ii) In a), the residuals are negative at the start and
at the end and positive in the middle. This indicates that the trend is curved and is better represented by the log
transformation in b).
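The two fitted trendlines can be compared by evaluating them side by side. A Python sketch, assuming the log in part b) is base 10 (its x-axis spans 0 to 1.2, consistent with log10 of quarters 1 to 12); quarter 13 is a hypothetical next quarter, not part of the data.

```python
import math


def linear_demand(quarter: float) -> float:
    """Trendline from part a): demand = 0.2426 * quarter + 0.2424."""
    return 0.2426 * quarter + 0.2424


def log_demand(quarter: float) -> float:
    """Trendline from part b): demand = 2.4294 * log10(quarter) + 0.0486."""
    return 2.4294 * math.log10(quarter) + 0.0486


next_quarter = 13  # hypothetical forecast horizon, one quarter past the data
print(f"linear model: {linear_demand(next_quarter):.2f}")
print(f"log model:    {log_demand(next_quarter):.2f}")
```

The straight line keeps rising at a constant rate while the log model flattens out, so the choice of model matters most when forecasting beyond the observed quarters.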
Mini Case: Cost of Living
PLAN
Setup: State the objective.
Examine the relationship between the overall cost of living and the cost of a luxury apartment (per month), the
price of a bus or subway ride, the price of a CD, the price of an international newspaper, the price of a cup of
coffee (including service), and the price of a fast-food hamburger meal.
DO
Mechanics: Large-format tables and graphs (if any) are placed below this PLAN/DO/REPORT summary.
REPORT
Conclusion: State the conclusion in the context of the original objective.
Among the variables considered to be related to cost of living, rent has the strongest positive relationship
(r = 0.874) and would be the best predictor of overall cost. The price of a cup of coffee has the weakest relationship
(r = 0.225, positive) and would therefore be the worst predictor. A surprising relationship is the strong negative
association (r = -0.834) between the cost of living index and the price of an international newspaper.
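The ranking in the conclusion can be checked directly from the Pearson correlations in the Minitab output for this mini case: the strength of a linear association is the size of r, and its sign gives only the direction. A minimal Python sketch (the dictionary keys are descriptive labels chosen here):

```python
# Pearson correlations with Cost of Living, as reported in the Minitab output
correlations = {
    "Rent": 0.874,
    "Bus/subway ride": 0.696,
    "CD": 0.243,
    "International newspaper": -0.834,
    "Coffee": 0.225,
    "Fast-food meal": 0.358,
}

# Rank by |r|: strongest linear association first, regardless of direction
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name:<24} r = {r:+.3f}")
```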
[Scatterplot of Cost of Living vs Rent]
Correlations: Cost of Living, Rent

Pearson correlation of Cost of Living and Rent = 0.874
P-Value = 0.000
Regression Analysis: Cost of Living versus Rent

The regression equation is
Cost of Living = 61.4 + 0.0245 Rent

Predictor   Coef       SE Coef    T      P
Constant    61.385     4.286      14.32  0.000
Rent        0.024518   0.003648   6.72   0.000
S = 7.32085   R-Sq = 76.3%
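In the coefficient table, each T value is the coefficient divided by its standard error. A quick Python check against the Rent regression (the helper name is illustrative):

```python
def t_ratio(coef: float, se_coef: float) -> float:
    """t-statistic reported by regression software: coefficient / standard error."""
    return coef / se_coef


print(round(t_ratio(61.385, 4.286), 2))       # Constant: output shows 14.32
print(round(t_ratio(0.024518, 0.003648), 2))  # Rent: output shows 6.72
```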

[Residuals Versus Fits (response is Cost of Living)]
[Scatterplot of Cost of Living vs Public Trans]
Correlations: Cost of Living, Public Trans

Pearson correlation of Cost of Living and Public Trans = 0.696
P-Value = 0.003
Regression Analysis: Cost of Living versus Public Trans

The regression equation is
Cost of Living = 66.2 + 22.4 Public Trans

Predictor      Coef     SE Coef   T      P
Constant       66.192   6.457     10.25  0.000
Public Trans   22.379   6.176     3.62   0.003
S = 10.8131   R-Sq = 48.4%

[Residuals Versus Fits]
[Scatterplot of Cost of Living vs CD]
Correlations: Cost of Living, CD

Pearson correlation of Cost of Living and CD = 0.243
P-Value = 0.365
Regression Analysis: Cost of Living versus CD

The regression equation is
Cost of Living = 65.9 + 1.73 CD

Predictor   Coef    SE Coef   T     P
Constant    65.91   23.28     2.83  0.013
CD          1.725   1.843     0.94  0.365
S = 14.6020   R-Sq = 5.9%

[Residuals Versus Fits]
[Scatterplot of Cost of Living vs News]
Correlations: Cost of Living, News

Pearson correlation of Cost of Living and News = -0.834
P-Value = 0.000
Regression Analysis: Cost of Living versus News

The regression equation is
Cost of Living = 128 - 27.7 News

Predictor   Coef      SE Coef   T      P
Constant    128.213   7.509     17.07  0.000
News        -27.738   4.909     -5.65  0.000
S = 8.31014   R-Sq = 69.5%

[Residuals Versus Fits]
[Scatterplot of Cost of Living vs Coffee]
Correlations: Cost of Living, Coffee

Pearson correlation of Cost of Living and Coffee = 0.225
P-Value = 0.401
Regression Analysis: Cost of Living versus Coffee

The regression equation is
Cost of Living = 74.7 + 7.24 Coffee

Predictor   Coef    SE Coef   T     P
Constant    74.68   15.19     4.92  0.000
Coffee      7.236   8.362     0.87  0.401
S = 14.6652   R-Sq = 5.1%

Correlations: Cost of Living, Fast Food

Pearson correlation of Cost of Living and Fast Food = 0.358
P-Value = 0.174
Regression Analysis: Cost of Living versus Fast Food

The regression equation is
Cost of Living = 66.4 + 5.96 Fast Food

Predictor   Coef    SE Coef   T     P
Constant    66.42   15.08     4.40  0.001
Fast Food   5.959   4.158     1.43  0.174
S = 14.0565   R-Sq = 12.8%

Summary for Sale Price (Anderson-Darling Normality Test and descriptive statistics)

Statistic            Bedrooms = 4       Bedrooms = 2       Central Air = Yes
A-Squared            9.94               8.82               11.64
P-Value              < 0.005            < 0.005            < 0.005
Mean                 288603             152722             280605
StDev                133759             67130              133721
Variance             17891502728        4506414630         17881406429
Skewness             2.15002            1.80255            2.18037
Kurtosis             8.26042            4.57971            8.75416
N                    362                223                485
Minimum              51000              30000              82200
1st Quartile         204400             (not shown)        (not shown)
Median               265000             136500             257500
3rd Quartile         342774             173000             336000
Maximum              1155000            501500             1155000
95% CI for Mean      274778 to 302429   143863 to 161581   268675 to 292536
95% CI for Median    251149 to 279000   130000 to 147057   244922 to 270000
95% CI for StDev     124674 to 144284   61424 to 74013     125802 to 142712
Mini Case: Canadian Retail Sales
PLAN
Setup. State the objectives of the study.
We want to find whether there is a relationship between buying power indices and retail sales in […]
Mechanics. Identify the variables.
Sales per household
Income per household
Sales per capita
Income per capita
Clothing sales per capita
Model. Check the conditions.
The scatterplots given below confirm the conditions.
Quantitative Variables Condition: yes.
Linearity Condition: There are not many data points, but there is no indication that the relation […]
Outlier Condition: There are two cities that could be regarded as outliers. We will do the analysis […]
Equal Spread Condition: We do not have enough data to check this condition, but there is no e[…]
DO
Mechanics. Find the regression equations and associated R².
Without the two cities at the top right, we obtain:
(b)
Without the two cities at the top right, we get:
REPORT
Conclusion. Interpret the results.
The value of R² of around 0.5 in the regressions before removing the two cities at the top right […]
When we remove those two cities, the value of R² becomes too small to indicate any relationship […]
It is marginal as to whether these two cities are outliers or not.
The data provided are inconclusive as to whether the buying power indices are related to retail sales […]
