
Self-Test Solutions and Answers to

Selected Even-Numbered Problems


The following sections present worked-out solutions to Self-Test Problems and brief answers to most of the even-numbered problems in the text. For more
detailed solutions, including explanations, interpretations, and Excel, JMP, and Minitab results, see the Student Solutions Manual.

CHAPTER 1 1.24 (a)


Row 16: 2323 6737 5131 8888 1718 0654 6832 4647 6510 4877
1.2 Small, medium, and large sizes imply order but do not specify how Row 17: 4579 4269 2615 1308 2455 7830 5550 5852 5514 7182
much the size of the business increases at each level. Row 18: 0989 3205 0514 2256 8514 4642 7567 8896 2977 8822
1.4 (a) The number of cellphones is a numerical variable that is discrete Row 19: 5438 2745 9891 4991 4523 6847 9276 8646 1628 3554
because the outcome is a count. It is ratio scaled because it has a true zero Row 20: 9475 0899 2337 0892 0048 8033 6945 9826 9403 6858
point. (b) Monthly data usage is a numerical variable that is continuous Row 21: 7029 7341 3553 1403 3340 4205 0823 4144 1048 2949
because any value within a range of values can occur. It is ratio scaled Row 22: 8515 7479 5432 9792 6575 5760 0408 8112 2507 3742
because it has a true zero point. (c) Number of text messages exchanged Row 23: 1110 0023 4012 8607 4697 9664 4894 3928 7072 5815
per month is a numerical variable that is discrete because the outcome is Row 24: 3687 1507 7530 5925 7143 1738 1688 5625 8533 5041
a count. It is ratio scaled because it has a true zero point. (d) Voice usage Row 25: 2391 3483 5763 3081 6090 5169 0546
per month is a numerical variable that is continuous because any value Note: All sequences above 5,000 are discarded. There were no repeating
within a range of values can occur. It is ratio scaled because it has a true sequences.
zero point. (e) Whether a cellphone is used for email is a categorical (b)
variable because the answer can be only yes or no. This also makes it a 089  189  289  389  489  589  689  789  889  989
nominal-scaled variable. 1089 1189 1289 1389 1489 1589 1689 1789 1889 1989
2089 2189 2289 2389 2489 2589 2689 2789 2889 2989
1.6 (a) Categorical, nominal scale (b) Numerical, continuous, ratio scale
3089 3189 3289 3389 3489 3589 3689 3789 3889 3989
(c) Categorical, nominal scale (d) Numerical, discrete, ratio scale
4089 4189 4289 4389 4489 4589 4689 4789 4889 4989
(e) Categorical, nominal scale.
1.8 Type of data: (a) Numerical, continuous (b) Numerical, discrete
(c) Numerical, continuous (d) Categorical scale. Measurement
scale: (a) ratio scale (b) ratio scale (c) ratio scale (d) nominal scale.
1.10 The underlying variable, ability of the students, may be continuous,
but the measuring device, the test, does not have enough precision to
distinguish between the two students.
1.12 (a) Data distributed by an organization or individual (b) sample.
1.24 (c) With the single exception of invoice 0989, the invoices selected
in the simple random sample are not the same as those selected in
the systematic sample. It would be highly unlikely that a simple
random sample would select the same units as a systematic sample.
1.26 (a) For the third value, Apple is spelled incorrectly. The twelfth
value should be Blackberry not Blueberry. The fifteenth value, APPLE,
may lead to an irregularity. The eighteenth value should be Samsung not
Samsun. (b) The eighth value is a missing value.

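The systematic sample listed in 1.24 (b) follows a fixed pattern that a few lines of Python can regenerate. This is only a sketch, not the textbook's procedure: it assumes a frame of 5,000 invoices numbered 0001 to 5000 (suggested by the note that sequences above 5,000 are discarded), a sample of 50, and a first selected invoice of 0089 read off the listing. The names `frame_size`, `sample_size`, and `start` are illustrative.

```python
# Sketch of the systematic sample in 1.24 (b).
# Assumptions (not stated explicitly in the answer key): N = 5,000 invoices,
# n = 50 selections, and a first selected invoice of 0089 taken from the listing.
frame_size = 5000
sample_size = 50
start = 89

# Sampling interval k = N / n; every k-th invoice after the start is selected.
interval = frame_size // sample_size
systematic_sample = list(range(start, frame_size + 1, interval))

# Print ten zero-padded invoice numbers per row, as in the answer key.
for i in range(0, len(systematic_sample), 10):
    print(" ".join(f"{inv:04d}" for inv in systematic_sample[i:i + 10]))
```

Invoice 0989 appears in this list, which is the single overlap with the simple random sample noted in 1.24 (c).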
1.18 Sample without replacement: Read from left to right in three-digit 1.28 (a) The times for each of the hotels would be arranged in separate
sequences and continue unfinished sequences from the end of the row to columns. (b) The hotel names would be in one column and the times
the beginning of the next row: would be in a second column.
Row 05: 338 505 855 551 438 855 077 186 579 488 767 833 170 1.30 Before accepting the results of a survey of college students, you
Rows 05–06: 897 might want to know, for example: Who funded the survey? Why was
Row 06: 340 033 648 847 204 334 639 193 639 411 095 924 it conducted? What was the population from which the sample was
Rows 06–07: 707 selected? What sampling design was used? What mode of response was
Row 07: 054 329 776 100 871 007 255 980 646 886 823 920 461 used: a personal interview, a telephone interview, or a mail survey? Were
Row 08: 893 829 380 900 796 959 453 410 181 277 660 908 887 interviewers trained? Were survey questions field-tested? What questions
Rows 08–09: 237 were asked? Were the questions clear, accurate, unbiased, and valid?
Row 09: 818 721 426 714 050 785 223 801 670 353 362 449 What operational definition of immediately and effortlessly was used?
Rows 09–10: 406 What was the response rate?
Note: All sequences above 902 and duplicates are discarded.
1.32 The results are based on a survey of bank executives. If the frame
1.20 A simple random sample would be less practical for personal is supposed to be banking institutions, how is the population defined?
interviews because of travel costs (unless interviewees are paid to go to a There is no information about the response rate, so there is an undefined
central interviewing location). nonresponse error.
1.22 Here all members of the population are equally likely to be selected, 1.34 Before accepting the results of the survey, you might want to know,
and the sample selection mechanism is based on chance. But selection of for example: Who funded the study? Why was it conducted? What was
two elements is not independent; for example, if A is in the sample, we the population from which the sample was selected? What sampling
know that B is also and that C and D are not. design was used? What mode of response was used: a personal interview,


a telephone interview, or a mail survey? Were interviewers trained? Were (c) The percentage of complaints for each company:
survey questions field-tested? What other questions were asked? Were
the questions clear, accurate, unbiased, and valid? What was the response Company Total Percentage
rate? What was the margin of error? What was the sample size? What Bank of America 42 3.64%
frame was used? Capital One 93 8.07%
1.52 (a) All benefitted employees at the university. (b) The 3,095 employees Citibank 59 5.12%
who responded to the survey. (c) Gender, marital status, and employment Ditech Financial 31 2.69%
are categorical. Age (years), education level (years completed), and household Equifax 217 18.82%
income ($) are numerical. Experian 177 15.35%
JPMorgan 128 11.10%
Nationstar Mortgage 39 3.38%
CHAPTER 2 Navient 38 3.30%
Ocwen 41 3.56%
2.2 (a) Table of frequencies for all student responses:
Synchrony 43 3.73%
Trans-Union 168 14.57%
Student Major Categories
Wells Fargo 77 6.68%
Gender A C M Totals
Grand Total 1,153
Male 14  9 2 25
Female  6  6 3 15 (d) Equifax, Trans-Union, and Experian, all of which are credit score
Totals 20 15 5 40 companies, have the most complaints.

(b) Table based on total percentages: 2.6 The largest sources of summer power-generating capacity in the
United States are natural gas followed by coal. Nuclear, hydro, wind,
Student Major Categories and other generate about the same, and solar generates very little.
Gender A C M Totals
2.8 (a) Table of row percentages:
Male 35.0% 22.5%  5.0%  62.5%
Female 15.0% 15.0%   7.5%   37.5% Gender
Totals 50.0% 37.5% 12.5% 100.0% Overloaded Male Female Total
Table based on row percentages: Yes 44.08% 55.92% 100%
No 53.54% 46.46% 100%
Student Major Categories Total 51.64% 48.36% 100%
Gender A C M Totals
Male 56.0% 36.0%  8.0% 100.0% Table of column percentages:
Female 40.0% 40.0% 20.0% 100.0%
Totals 50.0% 37.5% 12.5% 100.0% Gender
Overloaded Male Female Total
Table based on column percentages: Yes 17.07% 23.13% 20.00%
Student Major Categories No 82.93% 76.87% 80.00%
Gender A C M Totals Total 100% 100% 100%
Male  70.0%  60.0%  40.0%  62.5%
Table of total percentages:
Female   30.0%   40.0%   60.0%  37.5%
Totals 100.0% 100.0% 100.0% 100.0% Gender
Overloaded Male Female Total
2.4 (a) The percentage of complaints for each category:
Yes 8.82% 11.18% 20.00%
Category Total Percentage No 42.82% 37.18% 80.00%
Bank Account or Service 202 9.330% Total 51.64% 48.36% 100%
Consumer Loan 132 6.097%
(b) A higher percentage of females feel information overload.
Credit Card 175 8.083%
Credit Reporting 581 26.836% 2.10 Social recommendations had very little impact on correct recall.
Debt Collection 486 22.448% Those who arrived at the link from a recommendation had a correct recall
Mortgage 442 20.416% of 73.07% as compared to those who arrived at the link from browsing
Other 72 3.326% who had a correct recall of 67.96%.
Student Loan 75 3.464% 2.12 73 78 78 78 85 88 91.
Grand Total 2,165
2.14 (a) $60,000 – under $100,000, $100,000 – under $140,000,
(b) There are more complaints for credit reporting, debt collection, and $140,000 – under $180,000, $180,000 – under $220,000,
mortgage than the other categories. These categories account for about $220,000 – under $260,000, $260,000 – under $300,000 (b) $40,000
70% of all the complaints. (c) $80,000, $120,000, $160,000, $200,000, $240,000, $280,000

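The total, row, and column percentage tables in 2.2 (b) all derive from the frequency table in 2.2 (a). The sketch below recomputes them from those counts; the dictionary layout and variable names are illustrative choices, not something the text prescribes.

```python
# Frequencies from the 2.2 (a) table: rows are gender, columns are major category.
counts = {
    "Male": {"A": 14, "C": 9, "M": 2},
    "Female": {"A": 6, "C": 6, "M": 3},
}
majors = ["A", "C", "M"]

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {m: sum(counts[g][m] for g in counts) for m in majors}

# Each cell as a percentage of the grand total, its row total, or its column total.
total_pct = {g: {m: 100 * counts[g][m] / grand_total for m in majors} for g in counts}
row_pct = {g: {m: 100 * counts[g][m] / row_totals[g] for m in majors} for g in counts}
col_pct = {g: {m: 100 * counts[g][m] / col_totals[m] for m in majors} for g in counts}

for name, table in [("total", total_pct), ("row", row_pct), ("column", col_pct)]:
    print(f"Table based on {name} percentages:")
    for g in counts:
        print(f"  {g:<7}" + "".join(f"{table[g][m]:7.1f}%" for m in majors))
```

Which denominator to use depends on the question being asked: row percentages compare majors within a gender, while column percentages compare genders within a major.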
2.16 (a) 2.22 (a)


Electricity Costs Frequency Percentage Percentage, Percentage,
$80 but less than $100 4 8% Bulb Life (hours) Mftr A Mftr B
$100 but less than $120 7 14% 46,500 but less than 47,500 7.5% 0.0%
$120 but less than $140 9 18% 47,500 but less than 48,500 12.5% 5.0%
$140 but less than $160 13 26% 48,500 but less than 49,500 50.0% 20.0%
$160 but less than $180 9 18% 49,500 but less than 50,500 22.5% 40.0%
$180 but less than $200 5 10% 50,500 but less than 51,500 7.5% 22.5%
$200 but less than $220 3 6% 51,500 but less than 52,500 0.0% 12.5%
(b)
(b)
Electricity Cumulative
Costs Frequency Percentage % Percentage Less Percentage Less
$ 99 4 8.00% 8.00% % Less Than Than, Mftr A Than, Mftr B
$119 7 14.00% 22.00% 47,500 7.5% 0.0%
$139 9 18.00% 40.00% 48,500 20.0% 5.0%
$159 13 26.00% 66.00% 49,500 70.0% 25.0%
$179 9 18.00% 84.00% 50,500 92.5% 65.0%
$199 5 10.00% 94.00% 51,500 100.0% 87.5%
$219 3 6.00% 100.00% 52,500 100.0% 100.0%

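The percentage and cumulative percentage columns for 2.16 can be rebuilt from the class frequencies alone. The following is a minimal sketch using the frequencies listed in 2.16 (a); the list names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the 2.16 (a) table, lowest cost class first.
classes = [
    "$80 but less than $100",
    "$100 but less than $120",
    "$120 but less than $140",
    "$140 but less than $160",
    "$160 but less than $180",
    "$180 but less than $200",
    "$200 but less than $220",
]
freqs = [4, 7, 9, 13, 9, 5, 3]

n = sum(freqs)
# Percentage of observations in each class.
pct = [100 * f / n for f in freqs]
# Running total of frequencies, expressed as a percentage of n.
cum_pct = [100 * c / n for c in accumulate(freqs)]

for label, f, p, c in zip(classes, freqs, pct, cum_pct):
    print(f"{label:<26}{f:>4}{p:>8.2f}%{c:>9.2f}%")
```

The cumulative column reaches 22% at the $120 boundary and 84% at the $180 boundary, so 62% of the charges fall between $120 and $180, consistent with the conclusion in 2.16 (c).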
(c) The majority of utility charges are clustered between $120 and $180. (c) Manufacturer B produces bulbs with longer lives than Manufacturer
A. The cumulative percentage for Manufacturer B shows that 65% of its
2.18 (a), (b)
bulbs lasted less than 50,500 hours, contrasted with 92.5% of Manufacturer
Percent Cumulative A’s bulbs. None of Manufacturer A’s bulbs lasted at least 51,500 hours,
Credit Score Frequency (%) Percent (%) but 12.5% of Manufacturer B’s bulbs lasted at least 51,500 hours. At the
same time, 7.5% of Manufacturer A’s bulbs lasted less than 47,500 hours,
560 – under 580 4 0.16 0.16
whereas none of Manufacturer B’s bulbs lasted less than 47,500 hours.
580 – under 600 24 0.93 1.09
600 – under 620 68 2.65 3.74 2.24 (b) The Pareto chart is best for portraying these data because it not
620 – under 640 290 11.28 15.02 only sorts the frequencies in descending order but also provides the cumulative
640 – under 660 548 21.32 36.34 line on the same chart. (c) You can conclude that searching and
660 – under 680 560 21.79 58.13 buying online was the highest category and the other three were equally
likely.
680 – under 700 507 19.73 77.86
700 – under 720 378 14.71 92.57 2.26 (b) 84%. (d) The Pareto chart allows you to see which sources
720 – under 740 168 6.54 99.11 account for most of the electricity.
740 – under 760 22 0.86 99.96
2.28 (b) Since energy use is spread over many types of appliances, a bar
760 – under 780 1 0.04 100.00
chart may be best in showing which types of appliances used the most
(c) The average credit scores are concentrated between 620 and 720. energy. (c) Heating, water heating, and cooling accounted for 40% of the
­residential energy use in the United States.
2.20 (a)
2.30 (b) Females are more likely to be overloaded with information.
Time in Seconds Frequency Percentage
2.32 (b) Social recommendations had very little impact on correct recall.
  5 – under 10 8 16%
10 – under 15 15 30% 2.34 50 74 74 76 81 89 92.
15 – under 20 18 36% 2.36 (a)
20 – under 25 6 12%
25 – under 30 3 6% Stem Unit 100
(b) 2 266889
3 0223558
Time in Seconds Percentage Less Than
4 0222349
5 0 5 1479
10 16 6
15 46 7 0239
20 82 8 8
25 94
30 100 (b) The results are concentrated between $220 and $490.
(c) The target is being met since 82% of the calls are being answered 2.38 (c) The majority of utility charges are clustered between
in less than 20 seconds. $120 and $180.

2.40 Property taxes on a $176K home seem concentrated between 2.60 Pivot table of tallies in terms of %:
$700 and $2,200 and also between $3,200 and $3,700.
Count of Type Star Rating
2.42 The average credit scores are concentrated between 620 and 720. Type One Two Three Four Five Grand Total
2.44 The target is being met since 82% of the calls are being answered in Growth 5.43% 17.12% 27.35% 11.27% 2.71% 63.88%
less than 20 seconds.  Low 1.25% 2.09% 4.80% 3.55% 1.46% 13.15%
2.46 (c) Manufacturer B produces bulbs with longer lives than  Average 1.67% 7.72% 15.87% 6.05% 0.42% 31.73%
­Manufacturer A.  High 2.51% 7.31% 6.68% 1.67% 0.84% 19.00%
Value 2.92% 10.65% 13.99% 7.31% 1.25% 36.12%
2.48 (b) Yes, there is a strong positive relationship between X and Y.
As X increases, so does Y.  Low 0.84% 4.38% 7.10% 4.38% 0.84% 17.54%
 Average 1.25% 4.80% 5.85% 2.71% 0.42% 15.03%
2.50 (c) There appears to be a linear relationship between the first  High 0.84% 1.46% 1.04% 0.21% 0.00% 3.55%
­weekend gross and either the U.S. gross or the worldwide gross of Harry
Grand Total 8.35% 27.77% 41.34% 18.58% 3.96% 100.00%
Potter movies. However, this relationship is greatly affected by the results
of the last movie, Deathly Hallows, Part II. (b) Patterns of star rating conditioned on risk:
2.52 (a), (c) There appears to be a positive relationship between the For the growth funds as a group, most are rated as three-star,
­download speed and the upload speed. Yes, this is borne out by the data. followed by two-star, four-star, one-star, and five-star. The pattern of star
rating is different among the various risk growth funds.
2.54 (b) There is a great deal of variation in the returns from decade to For the value funds as a group, most are rated as three-star, followed
decade. Most of the returns are between 5% and 15%. The 1950s, 1980s, by two-star, four-star, one-star and five-star. Among the high-risk value
and 1990s had exceptionally high returns, and only the 1930s and 2000s funds, more are two-star than three-star.
had negative returns. Most of the growth funds are rated as average-risk, followed by
2.56 (b) There was a decline in movie attendance between 2001 and high-risk and then low-risk. The pattern is not the same among all the
2016. During that time, movie attendance increased from 2002 to 2004 rating categories.
but then decreased to a level below that in 2001. Most of the value funds are rated as low-risk, followed by
­
average-risk and then high-risk. The pattern is the same among the
2.58 Pivot Table in terms of % ­three-star, four-star, and five-star value funds. Among the one-star and
two-star funds, there are more average risk funds than low risk funds.
Count of Type Star Rating (c)
Type One Two Three Four Five Grand Total
Growth 5.43% 17.12% 27.35% 11.27% 2.71% 63.88% Average of
3YrReturn% Star Rating
 Large 3.76% 7.72% 13.57% 5.43% 1.67% 32.15%
 Mid-Cap 1.25% 5.43% 7.52% 3.13% 0.63% 17.96% Type One Two Three Four Five Grand Total
 Small 0.42% 3.97% 6.26% 2.71% 0.42% 13.78% Growth 5.41 7.04 8.94 10.14 12.83 8.51
Value 2.92% 10.65% 13.99% 7.31% 1.25% 36.12%  Low 7.53 8.60 9.89 10.29 12.64 9.87
 Large 2.09% 6.68% 9.19% 3.97% 1.25% 23.18%  Average 6.17 7.99 9.28 10.43 11.96 9.06
 Mid-Cap 0.63% 2.09% 2.71% 1.04% 0.00% 6.47%  High 3.83 5.59 7.45 8.76 13.59 6.64
 Small 0.21% 1.88% 2.09% 2.30% 0.00% 6.48% Value 4.43 5.49 7.29 8.34 10.23 6.84
Grand Total 8.35% 27.77% 41.34% 18.58% 3.97% 100.00%  Low 5.29 7.00 7.66 8.57 10.74 7.76
 Average 5.01 4.98 6.97 7.96 9.23 6.41
(b) The growth and value funds have similar patterns in terms of star
 High 2.71 2.63 6.53 8.39 4.13
­rating and type. Both growth and value funds have more funds with a
Grand Total 5.07 6.45 8.38 9.43 12.01 7.91
rating of three. Very few funds have ratings of five.
(c) Pivot Table in terms of Average Three-Year Return
The three-year returns for growth funds are higher than for value funds.
The return is higher for funds with higher ratings than lower ratings. This
Count of Type Star Rating
pattern holds for the growth funds for each risk level. For the average risk and
Type One Two Three Four Five Grand Total high risk value funds, the return is lowest for the funds with a two-star
Growth 5.41 7.04 8.94 10.14 12.83 8.51 rating.
 Large 6.97 9.43 10.62 11.83 14.25 10.30 (d) There are 32 growth funds with high risk with a rating of three.
 Mid-Cap 2.27 5.07 7.93 8.77 11.22 6.93 These funds have an average three-year return of 7.45.
 Small 0.78 5.09 6.52 8.35 9.53 6.39 2.62 The fund with the highest five-year return of 15.72 is a large cap
Value 4.43 5.49 7.29 8.34 10.23 6.84 growth fund that has a four-star rating and low risk.
 Large 5.23 6.05 7.58 8.85 10.23 7.29
2.64 Funds 479, 471, 347, 443, and 477 have the lowest five-year return.
 Mid-Cap 2.79 5.77 7.32 9.26 - 6.69
 Small 1.33 3.20 5.93 7.04 - 5.39 2.66 The five funds with the lowest five-year return have (1) midcap
Grand Total 5.07 6.45 8.38 9.43 12.01 7.91 growth, average risk, one-star rating, (2) midcap growth, high risk,
two-star rating, (3) large value, average risk, two-star rating, (4) midcap
(d) There are 65 large cap growth funds with a rating of three. growth, high risk, one-star rating, and (5) small value, average risk,
Their average three year return is 10.62. ­two-star rating.

2.68 There has been a decline in the price of natural gas over time.
However, there is no pattern within the years. For some years, the price is
higher in the beginning of the year. For other years, the price is higher in the
latter part of the year. Sometimes, there is little variation within the year.

2.88 (c) The publisher gets the largest portion (66.06%) of the revenue.
24.93% is editorial production manufacturing costs. The publisher’s
marketing accounts for the next largest share of the revenue, at 11.6%.
Author and bookstore personnel each account for around 11 to 12% of the
revenue, whereas the publisher and bookstore profit and income account
for more than 26% of the revenue. Yes, the bookstore gets almost twice
the revenue of the authors.

2.90 (b) The pie chart or the Pareto chart would be best. The pie chart
would allow you to see each category as part of the whole, while the Pareto
chart would enable you to see that Small marketing/content marketing
team is the dominant category. (d) The pie chart or the Pareto chart would
be best. The pie chart would allow you to see each category as part of the
whole while the Pareto chart would enable you to see that very committed
to content marketing is the dominant category. (e) Most organizations
have a small marketing/content marketing team and are very committed to
content marketing.

2.92 (a)
Table of row percentages:

Dessert            Gender
Ordered     Male   Female   Total
Yes          66%     34%     100%
No           48%     52%     100%
Total        52%     48%     100%

Table of column percentages:

Dessert            Gender
Ordered     Male   Female   Total
Yes          29%     17%      23%
No           71%     83%      77%
Total       100%    100%     100%

Table of total percentages:

Dessert            Gender
Ordered     Male   Female   Total
Yes          15%      8%      23%
No           37%     40%      77%
Total        52%     48%     100%

Table of row percentages:

Dessert         Beef Entrée
Ordered      Yes      No     Total
Yes          52%     48%     100%
No           25%     75%     100%
Total        31%     69%     100%

Table of column percentages:

Dessert         Beef Entrée
Ordered      Yes      No     Total
Yes          38%     16%      23%
No           62%     84%      77%
Total       100%    100%     100%

Table of total percentages:

Dessert         Beef Entrée
Ordered      Yes       No      Total
Yes        11.75%   10.79%    22.54%
No         19.52%   57.94%    77.46%
Total      31.27%   68.73%      100%

(b) If the owner is interested in finding out the percentage of males
and females who order dessert or the percentage of those who order
a beef entrée and a dessert among all patrons, the table of total
percentages is most informative. If the owner is interested in the effect
of gender on ordering of dessert or the effect of ordering a beef entrée
on the ordering of dessert, the table of column percentages will
be most informative. Because dessert is usually ordered after the main
entrée, and the owner has no direct control over the gender of patrons,
the table of row percentages is not very useful here. (c) 29% of the
men ordered desserts, compared to 17% of the women; men are almost
twice as likely to order dessert as women. Almost 38% of the patrons
ordering a beef entrée ordered dessert, compared to 16% of patrons
ordering all other entrées. Patrons ordering beef are more than 2.3
times as likely to order dessert as patrons ordering any other entrée.

2.94 (a) Most of the complaints were against U.S. airlines.
(b) More of the complaints were due to flight problems.

2.96 (c) The alcohol percentage is concentrated between 4% and 6%,
with more between 4% and 5%. The calories are concentrated between
140 and 160. The carbohydrates are concentrated between 12 and 15.
There are outliers in the percentage of alcohol in both tails. There are a
few beers with alcohol content as high as around 11.5%. There are a few
beers with calorie content as high as around 313 and carbohydrates as
high as 32.1. There is a strong positive relationship between percentage
of alcohol and calories and between calories and carbohydrates, and there
is a moderately positive relationship between percentage alcohol and
carbohydrates.

2.98 (c) There appears to be a strong positive relationship between the
yield of the one-year CD and the five-year CD.

2.100 (a)
Frequency (Boston)
Weight (Boston)               Frequency   Percentage
3,015 but less than 3,050          2         0.54%
3,050 but less than 3,085         44        11.96%
3,085 but less than 3,120        122        33.15%
3,120 but less than 3,155        131        35.60%
3,155 but less than 3,190         58        15.76%
3,190 but less than 3,225          7         1.90%
3,225 but less than 3,260          3         0.82%
3,260 but less than 3,295          1         0.27%

(b)
Frequency (Vermont)
Weight (Vermont)              Frequency   Percentage
3,550 but less than 3,600          4         1.21%
3,600 but less than 3,650         31         9.39%
3,650 but less than 3,700        115        34.85%
3,700 but less than 3,750        131        39.70%
3,750 but less than 3,800         36        10.91%
3,800 but less than 3,850         12         3.64%
3,850 but less than 3,900          1         0.30%

(d) 0.54% of the Boston shingles pallets are underweight and 0.27% are
overweight. 1.21% of the Vermont shingles pallets are underweight
and 3.94% are overweight.

2.102 (a) of the New Call to Action Button and the New web design results in more
Percentage than twice as high a percentage of downloads than the combination of the
Calories Frequency Percentage Limit Less Than Original Call to Action Button and the Original web design.
50 but less than 100 3 12% 100 12%
100 but less than 150 3 12% 150 24%
CHAPTER 3
150 but less than 200 9 36% 200 60%
200 but less than 250 6 24% 250 84% 3.2 (a) Mean = 7, median = 7, mode = 7. (b) Range = 9, S² = 10.8,
250 but less than 300 3 12% 300 96% S = 3.286, CV = 46.948%. (c) Z scores: 0, -0.913, 0.609, 0,
300 but less than 350 0 0% 350 96% -1.217, 1.522. None of the Z scores are larger than 3.0 or smaller than
350 but less than 400 1 4% 400 100% -3.0. There is no outlier. (d) Symmetric because mean = median.
(b) 3.4 (a) Mean = 2, median = 7, mode = 7. (b) Range = 17, S² = 62,
Percentage S = 7.874, CV = 393.7%. (c) 0.635, -0.889, -1.270, 0.635, 0.889.
Cholesterol Frequency Percentage Limit Less Than There are no outliers. (d) Left-skewed because mean < median.
0 but less than 50 2 8% 50 8% 3.6 -0.0835
50 but less than 100 17 68% 100 76%
3.8 (a)
100 but less than 150 4 16% 150 92%
Grade X Grade Y
150 but less than 200 1 4% 200 96%
Mean 575 575.4
200 but less than 250 0 0% 250 96%
Median 575 575
250 but less than 300 0 0% 300 96% Standard deviation 6.40 2.07
300 but less than 350 0 0% 350 96%
(b) If quality is measured by central tendency, Grade X tires provide
350 but less than 400 0 0% 400 96% slightly better quality because X’s mean and median are both equal to the
400 but less than 450 0 0% 450 96% expected value, 575 mm. If, however, quality is measured by consistency,
450 but less than 500 1 4% 500 100% Grade Y provides better quality because, even though Y’s mean is only
slightly larger than the mean for Grade X, Y’s standard deviation is much
(e) There is very little relationship between calories and cholesterol. smaller. The range in values for Grade Y is 5 mm compared to the range
(f) The sampled fresh red meats, poultry, and fish vary from 98 to in values for Grade X, which is 16 mm.
397 ­calories per serving, with the highest concentration between (c)
150 and 200 calories. One protein source, spareribs, with 397 calories, Grade X Grade Y, Altered
is more than 100 calories above the next-highest-caloric food. Spareribs
Mean 575 577.4
and fried liver are both very different from other foods sampled—the
former on calories and the latter on cholesterol content. Median 575 575
Standard deviation 6.40 6.11
2.104 (b) There is a downward trend in the amount filled. (c) The amount
When the fifth Y tire measures 588 mm rather than 578 mm, Y’s mean
filled in the next bottle will most likely be below 1.894 liters. (d) The
inner diameter becomes 577.4 mm, which is larger than X’s mean inner
scatter plot of the amount of soft drink filled against time reveals the
diameter, and Y’s standard deviation increases from 2.07 mm to 6.11 mm.
trend of the data, whereas a histogram only provides information on
In this case, X’s tires are providing better quality in terms of the mean
the distribution of the data.
inner diameter, with only slightly more variation among the tires than Y’s.
2.106 (a) The percentage of downloads is 9.64% for the Original Call to
3.10 (a), (b)
Action Button and 13.64% for the New Call to Action Button. (c) The New
Call to Action Button has a higher percentage of downloads at 13.64% when Download Upload
compared to the Original Call to Action Button with a 9.64% of downloads. Speed (Mbps) Speed (Mbps)
(d) The percentage of downloads is 8.90% for the Original web design and Mean 14.2333 8.1222
9.41% for the New web design. (f) The New web design has only a slightly Median 11.2 6.4
higher percentage of downloads at 9.41% when compared to the Original Minimum 4.5 3
web design with an 8.90% of downloads. (g) The New web design is only
Maximum 24 14.3
slightly more successful than the Original web design while the New Call
Range 19.5 11.3
to Action Button is much more successful than the Original Call to Action
Variance 49.7950 16.2319
Button with about 41% higher percentage of downloads.
(h) Standard deviation 7.0566 4.0289
Coefficient of variation 49.58% 49.60%
Call to Action Percentage of
Skewness 0.1932 0.3862
Button Web Design Downloads
Kurtosis -1.5292 -1.2358
Old Old 8.30%
Sample size 9 9
New Old 13.70%
Old New 9.50% (c) The mean is greater than the median for both the download speed and
New New 17.00% the upload speed indicating a right or positive skewed distribution (the
(i) The new Call to Action Button and the New web design together had a skewness statistic is also positive). The kurtosis statistic is negative for
higher percentage of downloads. (j) The New web design is only slightly both the download speed and the upload speed indicating distributions
more successful than the Original web design while the New Call to Action that are less peaked than a normal (bell-shaped) distribution.
Button is much more successful than the Original Call to Action Button (d) The mean download speed is much higher than the mean ­upload
with about 41% higher percentage of downloads. However, the combination speed. The median download speed indicates that half the carriers have

a download speed of at least 11.2 Mbps as compared to a median Mobile Commerce


upload speed of 6.4 Mbps that indicates that half the carriers Country Penetration (%) Z Score
have an upload speed of at least 6.4 Mbps. There is much more
variation in the download speed than the upload speed because France 19 -1.09664
the standard deviation is 7.0566 as compared to 4.0289. Germany 26 -0.37777

3.12 (a), (b) Hong Kong 36 0.649184


60-Second Ads 30-Second Ads India 23 -0.68586
Mean 5.10 4.90 Indonesia 33 0.341097
Median 5.30 4.81 Italy 23 -0.68586
Minimum 3.22 3.55 Japan 11 -1.91821
Maximum 6.91 6.64 Malaysia 38 0.854576
Range 3.69 3.09 Mexico 21 -0.89125
Variance 1.1088 0.6745 Philippines 26 -0.37777
Standard deviation 1.0530 0.8213 Poland 23 -0.68586
Coefficient of variation 20.63% 16.76% Russia 21 -0.89125
Skewness -0.5268 0.4382 Saudi Arabia 33 0.341097
Kurtosis -0.3289 -0.2371 Singapore 40 1.059968
Sample size 17 40 South Africa 15 -1.50743
South Korea 55 2.600405
(c) The mean score is less than the median for the 60-second ads
Spain 30 0.033009
­indicating a left- or negative-skewed distribution (the skewness statistic
is also negative). The mean score is slightly greater than the median for Thailand 41 1.162664
the 30-second ads indicating a right- or positive-skewed distribution Turkey 31 0.135705
(the skewness statistic is also positive). The kurtosis statistic is slightly United Arab Republic 47 1.778838
negative for both the 60- and 30-second ads indicating distributions that
United Kingdom 37 0.75188
are less peaked than a normal (bell-shaped) distribution. (d) The mean ad
score is higher for the 60-second ads than for the 30-second ads. The United States 33 0.341097
median ad score for the 60-second ads indicates that half the scores Vietnam 28 -0.17238
are at least 5.30 as compared to a median ad score for the 30-second
ads that indicates that half the scores are at least 4.81. There is much more Because there are no Z values below -3.0 or above 3.0, there are no
variation in the scores of the 60-second ads than the 30-second ads outliers. (c) The mean is greater than the median, so Mobile Commerce
because the standard deviation is 1.0530 as compared to 0.8213. Penetration is right-skewed. (d) The mean Mobile Commerce Penetration
is 29.6786% and half the countries have values greater than or equal to
3.14 (a), (b) 27.5%. The average scatter around the mean is 9.7375%. The lowest value
Mobile Commerce Penetration (%) is 11% (Japan) and the highest value is 55% (South Korea).
Mean 29.6786 3.16 (a), (b)
Median 27.5 Price (USD)
Mode 23 Mean 117.4615
Minimum 11 Median 116
Maximum 55 Mode 138
Range 44 Range 53
Variance 94.8188 Variance 263.6025
Standard Deviation 9.7375 Standard Deviation 16.2358
Coefficient of Variation 32.81%
(c) The mean room price is $117.4615 and half the room prices are
Skewness 0.5506 greater than or equal to $116, so room price is slightly right-skewed. The
Kurtosis 0.5024 average scatter around the mean is 16.2358. The lowest room price is
Count 28 $85 in Mexico and the highest room price is $138 in Japan. (d) The mean
increases to 120.7692, while the median and the mode remain the same.
Standard Error 1.8402
The data is now slightly more right-skewed. The average scatter around
the mean increases to 22.5876. The range is now 90.
Mobile Commerce
Country Penetration (%) Z Score 3.18 (a) Mean = 7.11, median = 6.68. (b) Variance = 4.336,
Argentina 23 -0.68586
Australia 27 -0.27508
Waiting Time Z Score Waiting Time Z Score
Brazil 26 -0.37777
9.66 1.222431 10.49 1.62105
Canada 25 -0.48047
5.90 -0.58336 6.68 -0.20875
China 40 1.059968 8.02 0.434799 5.64 -0.70823

Waiting Time Z Score Waiting Time Z Score
5.79 -0.63619 4.08 -1.45744
8.73 0.775786 6.17 -0.45369
3.82 -1.58231 9.91 1.342497
8.01 0.429996 5.47 -0.78987
8.35 0.593286
Since there are no Z values below -3.0 or above 3.0, there are no outliers.
(c) Because the mean is greater than the median, the distribution is right-skewed. (d) The mean and median are both greater than five minutes. The distribution is right-skewed, meaning that there are some unusually high values. Further, 13 of the 15 bank customers sampled (or 86.7%) had waiting times greater than five minutes. So the customer is likely to experience a waiting time in excess of five minutes. The manager overstated the bank’s service record in responding that the customer would “almost certainly” not wait longer than five minutes for service.

3.20 (a) [(1 + 0.3415)(1 + 0.0993)]^(1/2) - 1 = 0.2144 or 21.44%. (b) ($1,000)(1 + 0.2144)(1 + 0.2144) = $1,474.77. (c) The result for Facebook was better than the result for GE, which was worth $1,250.37.

3.22 (a) Platinum = -10.09%, gold = -9.33%, silver = -10.48%. (b) All the metals had about the same negative return of approximately 10%. (c) All the metals had negative returns, whereas the three stock indices all had positive returns.
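
The geometric mean rate of return reported in Problem 3.20 can be reproduced with a short sketch. The function name `geometric_mean_return` is illustrative and not from the text; the two annual returns are the ones shown in the answer to 3.20 (a).

```python
def geometric_mean_return(returns):
    """Geometric mean rate of return for a list of per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound each period's growth factor
    return growth ** (1.0 / len(returns)) - 1.0


# Annual returns as decimals, taken from the answer to 3.20 (a)
annual_returns = [0.3415, 0.0993]
rg = geometric_mean_return(annual_returns)

# Value of $1,000 after compounding at the geometric mean rate
final_value = 1000 * (1 + rg) ** len(annual_returns)

print(f"geometric mean return = {rg:.4f}")
print(f"value of $1,000 = ${final_value:,.2f}")
```

Carrying the unrounded rate through gives about $1,474.71. The $1,474.77 in 3.20 (b) comes from compounding the rounded rate of 21.44%.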

3.24 (a)
Mean of 3Yr Return% Rating
Type One Two Three Four Five Grand Total
Growth 5.41 7.04 8.94 10.14 12.83 8.51
 Large 6.97 9.43 10.62 11.83 14.25 10.30
 Mid-Cap 2.27 5.07 7.93 8.77 11.22 6.93
 Small 0.78 5.09 6.52 8.35 9.53 6.39
Value 4.43 5.49 7.29 8.34 10.23 6.84
 Large 5.23 6.05 7.58 8.85 10.23 7.29
 Mid-Cap 2.79 5.77 7.32 9.26 – 6.69
 Small 1.33 3.20 5.93 7.04 – 5.39

(b)
StdDev of 3Yr Return% Rating
Type One Two Three Four Five Grand Total
Growth 3.72 2.85 2.71 2.23 2.12 3.19
 Large 2.86 1.34 2.23 1.43 0.89 2.56
 Mid-Cap 3.49 2.04 2.08 1.03 1.02 2.86
 Small 0.84 2.40 2.08 2.11 0.62 2.52
Value 2.07 2.40 1.20 2.09 1.32 2.33
 Large 1.81 1.68 0.98 1.63 1.32 1.93
 Mid-Cap 1.00 2.90 1.13 0.99 – 2.51
 Small – 2.88 1.36 2.62 – 2.35
Grand Total 3.24 2.78 2.44 2.34 2.24 3.02

(c) The mean three-year return of small-cap funds is much lower than that of mid-cap and large funds. Five-star funds for all market cap categories show the highest mean three-year returns. The mean three-year returns for all combinations of type and market cap rise as the star rating rises, consistent with the mean three-year returns for all growth and value funds.
The standard deviations of the three-year return for large-cap and mid-cap value funds vary greatly among star rating categories.
3.26 (a)
Mean of 3Yr Return% Rating
Type One Two Three Four Five Grand Total
Growth 5.41 7.04 8.94 10.14 12.83 8.51
 Low 7.53 8.60 9.89 10.29 12.64 9.87
 Average 6.17 7.99 9.28 10.43 11.96 9.06
 High 3.83 5.59 7.45 8.76 13.59 6.64
Value 4.43 5.49 7.29 8.34 10.23 6.84
 Low 5.29 7.00 7.66 8.57 10.74 7.76
 Average 5.01 4.98 6.97 7.96 9.23 6.41
 High 2.71 2.63 6.53 8.39 – 4.13
Grand Total 5.07 6.45 8.38 9.43 12.01 7.91

(b)
StdDev of 3Yr Return% Rating
Type One Two Three Four Five Grand Total
Growth 3.72 2.85 2.71 2.23 2.12 3.19
 Low 3.27 1.57 2.02 2.05 2.04 2.42
 Average 4.37 2.43 2.67 2.42 2.51 2.86
 High 2.98 2.92 2.73 1.43 2.47 3.39
Value 2.07 2.40 1.20 2.09 1.32 2.33
 Low 1.46 1.12 1.00 2.15 0.85 1.72
 Average 2.11 2.43 1.25 2.09 1.87 2.27
 High – 2.88 1.36 2.62 – 2.35
Grand Total 3.24 2.78 2.44 2.34 2.24 3.02

(c) The mean three-year return of high-risk funds is much lower than the other risk categories except for five-star funds. In all risk categories, five-star funds have the highest mean three-year return. The mean three-year returns for high-risk growth and value funds for one-, two-, and three-star rating funds are lower than the means for the other risk categories.
The standard deviations of the three-year return for low-risk funds show the most consistency across star rating categories and the standard deviations of the three-year return for low-risk funds are the lowest across categories. They also vary greatly among star rating categories.

3.28 (a) 4, 9, 5. (b) 3, 4, 7, 9, 12. (c) The distances between the median and the extremes are close, 4 and 5, but the differences in the tails are different (1 on the left and 3 on the right), so this distribution is slightly right-skewed. (d) In Problem 3.2 (d), because mean = median, the distribution is symmetric. The box part of the graph is slightly left-skewed, but the tails show right-skewness.

3.30 (a) -6.5, 8, 14.5. (b) -8, -6.5, 7, 8, 9. (c) The shape is left-skewed. (d) This is consistent with the answer in Problem 3.4 (d).

3.32 (a), (b) Minimum = 11, Q1 = 23, Median = 27.5, Q3 = 37, Maximum = 55, Interquartile range = 14. (c) The boxplot is right-skewed.

3.34 (a), (b) 60 Seconds: Q1 = 4.46, Q3 = 5.88, Interquartile range = 1.42; 30 Seconds: Q1 = 4.37, Q3 = 5.31, Interquartile range = 0.94. (c) The boxplot for 60 seconds is approximately symmetrical while the boxplot for 30 seconds is right-skewed.

3.36 (a) Commercial district five-number summary: 0.38 3.2 4.5 5.55 6.46. Residential area five-number summary: 3.82 5.64 6.68 8.73 10.49. (b) Commercial district: The distribution is left-skewed. Residential area: The distribution is slightly right-skewed. (c) The central tendency of the waiting times for the bank branch located in the commercial district of a city is lower than that of the branch located in the residential area. There are a few long waiting times for the branch located in the residential area, whereas there are a few exceptionally short waiting times for the branch located in the commercial area.

3.38 (a) Population mean, μ = 6. (b) Population standard deviation, σ = 1.673, population variance, σ² = 2.8.

3.40 (a) 68%. (b) 95%. (c) At least 0%, 75%, 88.89%. (d) μ - 4σ to μ + 4σ or -2.8 to 19.2.

3.42 (a) Mean = 687.33/51 = 13.4771, variance = 11.6792, standard deviation = √11.6792 = 3.4175. (b) 74.51%, 96.08%, and 98.04% of these locations have mean per capita energy consumption within 1, 2, and 3 standard deviations of the mean, respectively. (c) This is slightly different from 68%, 95%, and 99.7%, according to the empirical rule.

3.44 (a) Covariance = 65.2909. (b) r = +1.0. (c) There is a perfect positive relationship.

3.46 (a) cov(X, Y) = Σ(i=1 to n) (Xi - X̄)(Yi - Ȳ) / (n - 1) = 800/6 = 133.3333.
(b) r = cov(X, Y) / (SX SY) = 133.3333 / [(46.9042)(3.3877)] = 0.8391.
(c) The correlation coefficient is more valuable for expressing the relationship between calories and sugar because it does not depend on the units used to measure calories and sugar. (d) There is a strong positive linear relationship between calories and sugar.

3.48 (a) cov(X, Y) = 26.9842. (b) r = 0.9491. (c) There is a positive linear relationship between download and upload speed.

3.64 (a) Mean = 45.22, median = 45, 1st quartile = 25, 3rd quartile = 63. (b) Range = 83, interquartile range = 38, variance = 535.7949, standard deviation = 23.1472, CV = 51.19%. (c) The distribution is approximately symmetric. (d) The mean approval process takes 45.22 days, with 50% of the policies being approved in less than 45 days. 50% of the applications are approved between 25 and 63 days. About 25% of the applications are approved in no more than 25 days.

3.66 (a) Mean = 14.98, median = 15, range = 23, S = 5.5567. The mean and median are virtually equal. The range of the answer time is 23 seconds, and the average scatter around the mean is 5.5567 seconds. (b) 5 12 15 18 28. (c) Even though the mean = median, the right tail is longer, so the distribution is right-skewed. (d) The service level is being met because 75% of the calls are answered in less than 18 seconds.

3.68 (a), (b)
                     Bundle Score  Typical Cost ($)
Mean 54.775 24.175
Median 62 20
Mode 75 8
Standard Deviation 27.6215 18.1276
Sample Variance 762.9481 328.6096
Range 98 83
Minimum 2 5
Maximum 100 88
First Quartile 34 9
Third Quartile 75 31
Interquartile Range 41 22
CV 50.43% 74.98%
(c) The typical cost is right-skewed, while the bundle score is left-skewed. (d) r = 0.3465. (e) The mean typical cost is $24.18, with an average

spread around the mean equaling $18.13. The spread between the lowest and highest costs is $83. The middle 50% of the typical cost fall over a range of $22 from $9 to $31, while half of the typical cost is below $20. The mean bundle score is 54.775, with an average spread around the mean equaling 27.6215. The spread between the lowest and highest scores is 98. The middle 50% of the scores fall over a range of 41 from 34 to 75, while half of the scores are below 62. The typical cost is right-skewed, while the bundle score is left-skewed. There is a weak positive linear relationship between typical cost and bundle score.

3.70 (a) Boston: 0.04, 0.17, 0.23, 0.32, 0.98; Vermont: 0.02, 0.13, 0.20, 0.28, 0.83. (b) Both distributions are right-skewed. (c) Both sets of shingles did well in achieving a granule loss of 0.8 gram or less. Only two Boston shingles had a granule loss greater than 0.8 gram. The next highest to these was 0.6 gram. These two values can be considered outliers. Only 1.176% of the shingles failed the specification. Only one of the Vermont shingles had a granule loss greater than 0.8 gram. The next highest was 0.58 gram. Thus, only 0.714% of the shingles failed to meet the specification.

3.72 (a) The correlation between calories and protein is 0.4644. (b) The correlation between calories and cholesterol is 0.1777. (c) The correlation between protein and cholesterol is 0.1417. (d) There is a weak positive linear relationship between calories and protein, with a correlation coefficient of 0.46. The positive linear relationships between calories and cholesterol and between protein and cholesterol are very weak.

3.74 (a), (b)
                     Annual Taxes on $176K Home  Median Home Value ($000)
Mean 1,979.490196 195.6509804
Median 1,763 165.9
Mode #N/A #N/A
Minimum 489 100.2
Maximum 4,029 504.5
Range 3,540 404.3
Variance 811,065.8549 7,418.7265
Standard Deviation 900.5919 86.1320
Coeff. of Variation 45.50% 44.02%
Skewness 0.6423 1.6988
Kurtosis -0.5014 3.3069
Count 51 51
Standard Error 126.1081 12.0609
(c) The box plot shows that taxes are right-skewed and the median value of homes is highly right-skewed. (d) The coefficient of correlation is -0.041. (e) There is a large variation in taxes and the median value of homes from state to state.

3.76 (a), (b)
Abandonment Rate in % (7:00 am–3:00 pm)
Mean 13.8636
Median 10
Mode 9
Standard Deviation 7.6239
Sample Variance 58.1233
Range 29
Minimum 5
Maximum 34
First Quartile 9
Third Quartile 20
Interquartile Range 11
CV 54.99%
(c) The data are right-skewed. (d) r = 0.7575. (e) The mean abandonment rate is 13.86%. Half of the abandonment rates are less than 10%. One-quarter of the abandonment rates are less than 9% while another one-quarter are more than 20%. The overall spread of the abandonment rates is 29%. The middle 50% of the abandonment rates are spread over 11%. The average spread of abandonment rates around the mean is 7.62%. The abandonment rates are right-skewed.

3.78 (a), (b)
Average Credit Score
Mean 673.24
Median 672.02
Mode 684.52
Standard Deviation 31.7156
Sample Variance 1,005.8784
Range 214.51
Minimum 565.00
Maximum 779.51
Count 2,570
First Quartile 649.82
Third Quartile 697.21
Interquartile Range 47.39
Skewness -0.0071
Kurtosis -0.3710
CV 4.71%
(c) The data are symmetrical. (d) The mean of the average credit scores is 673.24. Half of the average credit scores are less than 672.02. One-quarter of the average credit scores are less than 649.82 while another one-quarter is more than 697.21. The overall spread of average credit scores is 214.51. The middle 50% of the average credit scores spread over 47.39. The average spread of average credit scores around the mean is 31.7156.

CHAPTER 4

4.2 (a) Simple events include selecting a red ball. (b) Selecting a white ball. (c) The sample space consists of the 12 red balls and the 8 white balls.

4.4 (a) 0.6. (b) 0.10. (c) 0.35. (d) 0.90.

4.6 (a) Mutually exclusive, not collectively exhaustive. (b) Not mutually exclusive, not collectively exhaustive. (c) Mutually exclusive, not collectively exhaustive. (d) Mutually exclusive, collectively exhaustive.

4.8 (a) Is a millennial. (b) Is a millennial and feels tense or stressed out at work. (c) Does not feel tense or stressed out at work. (d) Is a millennial and feels tense or stressed out at work is a joint event because it consists of two characteristics.

4.10 (a) A marketer who plans to increase use of LinkedIn. (b) A B2B marketer who plans to increase use of LinkedIn. (c) A marketer who does not plan to increase use of LinkedIn. (d) A marketer who plans to increase use of LinkedIn and is a B2C marketer is a joint event because it consists of two characteristics, plans to increase use of LinkedIn and is a B2C marketer.

4.12 (a) 1,010/1,740 = 0.5805. (b) 69/1,740 = 0.0397. (c) 1,021/1,740 = 0.5868. (d) The probability in (c) includes the probability that gains in students’ learning attributable to education technology have justified colleges’ spending in this area plus the probability that the person is a technology leader.

4.14 (a) 304/1,520 = 0.20. (b) 170/1,520 = 0.1118. (c) 869/1,520 = 0.5717. (d) 1.00.

4.16 (a) 0.33. (b) 0.33. (c) 0.67. (d) Because P(A | B) = P(A) = 1/3, events A and B are independent.
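
The independence checks used in Problems 4.16, 4.20, and 4.26 all compare an observed probability with the value it would take under independence. A minimal sketch follows, assuming a hypothetical helper `is_independent` and a small tolerance for rounding; the probabilities are the ones reported in those answers.

```python
import math


def is_independent(observed, expected_if_independent, tol=1e-4):
    """Compare P(A | B) with P(A), or P(A and B) with P(A)P(B)."""
    return math.isclose(observed, expected_if_independent, abs_tol=tol)


# 4.16 (d): P(A | B) = 1/3 and P(A) = 1/3
check_4_16 = is_independent(1 / 3, 1 / 3)

# 4.20: P(A and B) = 0.20 and P(A)P(B) = 0.12
check_4_20 = is_independent(0.20, 0.12)

# 4.26 (c): P(needs repair | U.S. manufacturer) = 0.0417, P(needs repair) = 0.04
check_4_26 = is_independent(0.0417, 0.04)

print(check_4_16, check_4_20, check_4_26)
```

The tolerance is an assumption. With probabilities estimated from sample data, a small difference may reflect sampling variation rather than dependence.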

4.18 0.50.

4.20 Because P(A and B) = 0.20 and P(A)P(B) = 0.12, events A and B are not independent.

4.22 (a) 0.7601. (b) 0.5200. (c) P(Increased use of LinkedIn) = 0.6040, which is not equal to P(Increased use of LinkedIn | B2B) = 0.7601. Therefore, increased use of LinkedIn and business focus are not independent.

4.24 (a) 952/1,671 = 0.5697. (b) 719/1,671 = 0.4303. (c) 58/69 = 0.8406. (d) 11/69 = 0.1594.

4.26 (a) 0.0417. (b) 0.0375. (c) Because P(Needs warranty repair | Manufacturer based in the United States) = 0.0417 and P(Needs warranty repair) = 0.04, the two events are not independent.

4.28 (a) 0.0045. (b) 0.012. (c) 0.0059. (d) 0.0483.

4.30 0.095.

4.32 (a) 0.736. (b) 0.997.

4.34 (a) P(B′ | O) = (0.5)(0.3) / [(0.5)(0.3) + (0.25)(0.7)] = 0.4615. (b) P(O) = 0.175 + 0.15 = 0.325.

4.36 (a) P(Huge success | Favorable review) = 0.099/0.459 = 0.2157; P(Moderate success | Favorable review) = 0.14/0.459 = 0.3050; P(Break even | Favorable review) = 0.16/0.459 = 0.3486; P(Loser | Favorable review) = 0.06/0.459 = 0.1307. (b) P(Favorable review) = 0.459.

4.38 3^10 = 59,049.

4.40 (a) 2^7 = 128. (b) 6^7 = 279,936. (c) There are two mutually exclusive and collectively exhaustive outcomes in (a) and six in (b).

4.42 (5)(7)(4)(5) = 700.

4.44 5! = (5)(4)(3)(2)(1) = 120. Not all the orders are equally likely because the teams have a different probability of finishing first through fifth.

4.46 6! = 720.

4.48 210.

4.50 4,950.

4.62 (a)
                        Generation
Prefer Hybrid Advice  Baby Boomers  Millennials  Total
Yes  140  320  460
No  360  180  540
Total  500  500  1,000
(b) Preferring hybrid investment advice; being a baby boomer and preferring hybrid investment advice. (c) 0.46. (d) 0.14. (e) They are not independent because baby boomers and millennials have different probabilities of preferring hybrid investment advice.

4.64 (a) 82/276 = 0.2971. (b) 115/276 = 0.4167. (c) 142/276 = 0.5145. (d) 32/276 = 0.1159. (e) 4/147 = 0.0272.

4.66 (a) 125/386 = 0.3238. (b) 90/272 = 0.3309. (c) 35/114 = 0.3070. (d) 111/386 = 0.2876. (e) 75/272 = 0.2757. (f) 36/114 = 0.3158. (g) There is very little difference between B2B and B2C firms.

CHAPTER 5

5.2 (a) μ = 0(0.10) + 1(0.20) + 2(0.45) + 3(0.15) + 4(0.05) + 5(0.05) = 2.0.
(b) σ = √[(0 - 2)²(0.10) + (1 - 2)²(0.20) + (2 - 2)²(0.45) + (3 - 2)²(0.15) + (4 - 2)²(0.05) + (5 - 2)²(0.05)] = 1.183.
(c) 0.45 + 0.15 + 0.05 + 0.05 = 0.70.

5.4 (a) X  P(X)
$-1  21/36
$+1  15/36
(b) X  P(X)
$-1  21/36
$+1  15/36
(c) X  P(X)
$-1  30/36
$+4  6/36
(d) -$0.167 for each method of play.

5.6 (a) 2.1058. (b) 1.4671. (c) 66/104 = 0.6346.

5.8 (a) E(Bond Fund) = $58.20; E(Common Stock Fund) = $63.01. (b) σX = $61.55; σY = $195.22. (c) Based on the expected value criteria, you would choose the common stock fund. However, the common stock fund also has a standard deviation more than three times higher than that for the corporate bond fund. An investor should carefully weigh the increased risk. (d) If you chose the common stock fund, you would need to assess your reaction to the small possibility that you could lose virtually all of your investment.

5.10 (a) 0.40, 0.60. (b) 1.60, 0.98. (c) 4.0, 0.894. (d) 1.50, 0.866.

5.12 (a) 0.2436. (b) 0.0176. (c) 0.3627. (d) μ = 3.06, σ = 1.2245. (e) That each American adult owns a tablet or does not own a tablet and that each person is independent of all other persons.

5.14 (a) 0.7374. (b) 0.2281. (c) 0.9972. (d) 0.0028.

5.16 (a) 0.7412. (b) 0.0009. (c) 0.9746. (d) μ = 2.715, σ = 0.5079. (e) McDonald’s has a slightly higher probability of filling orders correctly.

5.18 (a) 0.2565. (b) 0.1396. (c) 0.3033. (d) 0.0247.

5.20 (a) 0.0337. (b) 0.0067. (c) 0.9596. (d) 0.0404.

5.22 (a) P(X < 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= e^(-6)(6)^0/0! + e^(-6)(6)^1/1! + e^(-6)(6)^2/2! + e^(-6)(6)^3/3! + e^(-6)(6)^4/4!
= 0.002479 + 0.014873 + 0.044618 + 0.089235 + 0.133853 = 0.2851.
(b) P(X = 5) = e^(-6)(6)^5/5! = 0.1606.
(c) P(X ≥ 5) = 1 - P(X < 5) = 1 - 0.2851 = 0.7149.
(d) P(X = 4 or X = 5) = P(X = 4) + P(X = 5) = e^(-6)(6)^4/4! + e^(-6)(6)^5/5! = 0.2945.

5.24 (a) 0.2592. (b) 0.7408. (c) 0.3908.

5.26 (a) 0.0302. (b) 0.1057. (c) 0.8641. (d) 0.1359.
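
The Bayes' theorem calculation in Problem 4.34 can be sketched as follows. Reading the printed fraction, the sketch assumes P(B′) = 0.3, P(O | B′) = 0.5, and P(O | B) = 0.25; this labeling is inferred from the numbers in the answer and is not stated there. The function name `posterior` is illustrative.

```python
def posterior(prior, likelihood, likelihood_given_complement):
    """Return (P(event | evidence), P(evidence)) for a two-event partition."""
    numerator = prior * likelihood
    evidence = numerator + (1 - prior) * likelihood_given_complement
    return numerator / evidence, evidence


# Assumed labeling: prior P(B') = 0.3, P(O | B') = 0.5, P(O | B) = 0.25
p_b_prime_given_o, p_o = posterior(0.3, 0.5, 0.25)

print(f"P(B' | O) = {p_b_prime_given_o:.4f}")
print(f"P(O) = {p_o:.3f}")
```

The denominator is the total probability of O, which is why 4.34 (b) can be read directly from the same calculation.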
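
The Poisson probabilities in Problem 5.22 can be checked with the standard library alone. This is a minimal sketch, assuming a mean of λ = 6 as shown in the printed formulas; `poisson_pmf` is an illustrative helper name.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 6

p_less_than_5 = sum(poisson_pmf(x, lam) for x in range(5))  # X = 0..4
p_exactly_5 = poisson_pmf(5, lam)
p_at_least_5 = 1 - p_less_than_5                            # complement rule
p_4_or_5 = poisson_pmf(4, lam) + poisson_pmf(5, lam)

print(f"P(X < 5) = {p_less_than_5:.4f}")
print(f"P(X = 5) = {p_exactly_5:.4f}")
print(f"P(X >= 5) = {p_at_least_5:.4f}")
print(f"P(X = 4 or X = 5) = {p_4_or_5:.4f}")
```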

5.28 (a) 0.3946. (b) 0.9321. (c) Because Ford had a higher mean rate of problems per car than Toyota, the probability of a randomly selected Ford having zero problems and the probability of no more than two problems are both lower than for Toyota.

5.34 (a) 0.67. (b) 0.67. (c) 0.3325. (d) 0.0039. (e) The assumption of independence may not be true.

5.36 (a) 0.0287. (b) 0.5213.

5.38 (a) 0.0060. (b) 0.2007. (c) 0.1662. (d) Mean = 4.0, standard deviation = 1.5492. (e) Since the percentage of bills containing an error is lower in this problem, the probability is higher in (a) and (b) of this problem and lower in (c).

5.40 (a) 9.2. (b) 2.2289. (c) 0.1652. (d) 0.0461. (e) 0.9848.

5.42 (a) 0.0000. (b) 0.0054. (c) 0.7604. (d) Based on the results in (a)–(c), the probability that the Standard & Poor’s 500 Index will increase if there is an early gain in the first five trading days of the year is very likely to be close to 0.90 because that yields a probability of 76.04% that at least 37 of the 42 years the Standard & Poor’s 500 Index will increase the entire year.

5.44 (a) The assumptions needed are (i) the probability that a questionable claim is referred by an investigator is constant, (ii) the probability that a questionable claim is referred by an investigator approaches 0 as the interval gets smaller, and (iii) the probability that a questionable claim is referred by an investigator is independent from interval to interval. (b) 0.1277. (c) 0.9015. (d) 0.0985.

CHAPTER 6

6.2 (a) 0.9089. (b) 0.0911. (c) +1.96. (d) -1.00 and +1.00.

6.4 (a) 0.1401. (b) 0.4168. (c) 0.3918. (d) +1.00.

6.6 (a) 0.9599. (b) 0.0228. (c) 43.42. (d) 46.64 and 53.36.

6.8 (a) P(34 < X < 50) = P(-1.33 < Z < 0) = 0.4082. (b) P(X < 30) + P(X > 60) = P(Z < -1.67) + P(Z > 0.83) = 0.0475 + (1.0 - 0.7967) = 0.2508. (c) P(Z < -0.84) ≅ 0.20, Z = -0.84 = (X - 50)/12, X = 50 - 0.84(12) = 39.92 thousand miles, or 39,920 miles. (d) The smaller standard deviation makes the absolute Z values larger. (a) P(34 < X < 50) = P(-1.60 < Z < 0) = 0.4452. (b) P(X < 30) + P(X > 60) = P(Z < -2.00) + P(Z > 1.00) = 0.0228 + (1.0 - 0.8413) = 0.1815. (c) X = 50 - 0.84(10) = 41.6 thousand miles, or 41,600 miles.

6.10 (a) 0.9878. (b) 0.8185. (c) 86.16%. (d) Option 1: Because your score of 81% on this exam represents a Z score of 1.00, which is below the minimum Z score of 1.28, you will not earn an A grade on the exam under this grading option. Option 2: Because your score of 68% on this exam represents a Z score of 2.00, which is well above the minimum Z score of 1.28, you will earn an A grade on the exam under this grading option. You should prefer Option 2.

6.12 (a) 0.1587. (b) 0.0441. (c) 0.0228. (d) 882.6348.

6.14 With 39 values, the smallest of the standard normal quantile values covers an area under the normal curve of 0.025. The corresponding Z value is -1.96. The middle (20th) value has a cumulative area of 0.50 and a corresponding Z value of 0.0. The largest of the standard normal quantile values covers an area under the normal curve of 0.975, and its corresponding Z value is +1.96.

6.16 (a) Mean = 4.96, median = 4.94, S = 0.892, range = 3.69, 6S = 5.352, interquartile range = 1.22, 1.33(0.892) = 1.1864. The mean is approximately the same as the median. The range is much less than 6S, and the interquartile range is approximately the same as 1.33S. (b) The normal probability plot appears to be a straight line, indicating a normal distribution. The skewness statistic is 0.0834. The kurtosis is -0.4578, indicating some departure from a normal distribution.

6.18 (a) Mean = $1,979.49, median = $1,763, S = $900.5919, range = $3,540, 6S = 6(900.5919) = $5,403.5514, interquartile range = $1,333, 1.33(900.5919) = 1,197.7872. The mean is greater than the median. The range is much less than 6S, and the interquartile range is greater than 1.33S. (b) The normal probability plot appears to be right-skewed. The skewness statistic is 0.6423. The kurtosis is -0.5014, indicating some departure from a normal distribution.

6.20 (a) Interquartile range = 0.0025, S = 0.0017, range = 0.008, 1.33(S) = 0.0023, 6(S) = 0.0102. Because the interquartile range is close to 1.33S and the range is also close to 6S, the data appear to be approximately normally distributed. (b) The normal probability plot suggests that the data appear to be approximately normally distributed.

6.22 (a) Five-number summary: 82 127 148.5 168 213; mean = 147.06, mode = 130, range = 131, interquartile range = 41, standard deviation = 31.69. The mean is very close to the median. The five-number summary suggests that the distribution is approximately symmetric around the median. The interquartile range is very close to 1.33S. The range is about $50 below 6S. In general, the distribution of the data appears to closely resemble a normal distribution. (b) The normal probability plot confirms that the data appear to be approximately normally distributed.

6.24 (a) 0.1667. (b) 0.1667. (c) 0.7083. (d) Mean = 60, standard deviation = 34.641.

6.26 (a) 0.0714. (b) 0.5000. (c) 0.7143. (d) Mean = 36, standard deviation = 4.0415.

6.34 (a) 0.4772. (b) 0.9544. (c) 0.0456. (d) 1.8835. (e) 1.8710 and 2.1290.

6.36 (a) 0.0228. (b) 0.1524. (c) $275.63. (d) $224.37 to $275.63. (e) 0.10. (f) 0.30. (g) The uniform distribution results are much higher because these values are close to the extremes of the range of possible values.

6.38 (a) Waiting time will more closely resemble an exponential distribution. (b) Seating time will more closely resemble a normal distribution. (c) Both the histogram and normal probability plot suggest that waiting time more closely resembles an exponential distribution. (d) Both the histogram and normal probability plot suggest that seating time more closely resembles a normal distribution.

6.40 (a) 0.4602. (b) 0.3812. (c) 0.0808. (d) $5,009.46. (e) $5,156.01 and $6,723.99.

CHAPTER 7

7.2 (a) Virtually 0. (b) 0.1587. (c) 0.0139. (d) 50.195.

7.4 (a) Both means are equal to 6. This property is called unbiasedness. (c) The distribution for n = 3 has less variability. The larger sample size has resulted in sample means being closer to μ. (d) Same answer as in (c).

7.6 (a) The probability that an individual energy bar has a weight below 42.05 grams is 0.2743. (b) The probability that the mean of a sample of four energy bars has a weight below 42.05 grams is 0.1151. (c) The probability that the mean of a sample of 25 energy bars has a weight below 42.05 grams is 0.00135. (d) (a) refers to an individual energy bar while (c) refers to the mean of a sample of 25 energy bars. There is a 27.43% chance that an individual energy bar will have a weight below 42.05 grams but only a chance of 0.135% that a mean of 25 energy bars will have a weight below 42.05 grams. (e) Increasing the sample size from

four to 25 reduced the probability the mean will have a weight below 42.05 grams from 11.51% to 0.135%.

7.8 (a) When n = 4, because the mean is larger than the median, the distribution of the sales price of new houses is skewed to the right, and so is the sampling distribution of X̄ although it will be less skewed than the population. (b) If you select samples of n = 100, the shape of the sampling distribution of the sample mean will be very close to a normal distribution, with a mean of $370,800 and a standard error of the mean of $9,000. (c) 0.4646. (d) 0.1047.

7.10 (a) 0.8413. (b) 16.0364. (c) To be able to use the standardized normal distribution as an approximation for the area under the curve, you must assume that the population is approximately symmetrical. (d) 15.5182.

7.12 (a) 0.40. (b) 0.0704.

7.14 (a) π = 0.501, σp = √[π(1 - π)/n] = √[0.501(1 - 0.501)/100] = 0.05. P(p > 0.55) = P(Z > 0.98) = 1.0 - 0.8365 = 0.1635.
(b) π = 0.60, σp = √[π(1 - π)/n] = √[0.6(1 - 0.6)/100] = 0.04899. P(p > 0.55) = P(Z > -1.021) = 1.0 - 0.1539 = 0.8461.
(c) π = 0.49, σp = √[π(1 - π)/n] = √[0.49(1 - 0.49)/100] = 0.05. P(p > 0.55) = P(Z > 1.20) = 1.0 - 0.8849 = 0.1151.
(d) Increasing the sample size by a factor of 4 decreases the standard error by a factor of 2.
(a) P(p > 0.55) = P(Z > 1.96) = 1.0 - 0.9750 = 0.0250.
(b) P(p > 0.55) = P(Z > -2.04) = 1.0 - 0.0207 = 0.9793.
(c) P(p > 0.55) = P(Z > 2.40) = 1.0 - 0.9918 = 0.0082.

7.16 (a) 0.8522. (b) 0.7045. (c) 0.1478. (d) (a) 0.9820. (b) 0.9640. (c) 0.0180.

7.18 (a) 0.7676. (b) The probability is 90% that the sample percentage will be contained between 0.2840 and 0.4000. (c) The probability is 95% that the sample percentage will be contained between 0.27 and 0.41.

7.20 (a) 0.1098. (b) 0.0030. (c) Increasing the sample size by a factor of 5 decreases the standard error by a factor of more than 2. The sampling distribution of the proportion becomes more concentrated around the true proportion of 0.326 and, hence, the probability in (b) becomes smaller than that in (a).

7.26 (a) 0.4999. (b) 0.00009. (c) 0. (d) 0. (e) 0.7518.

7.28 (a) 0.8944. (b) 4.617; 4.783. (c) 4.641.

7.30 (a) 0.00023. (b) 0.0645. (c) 0.9332.

CHAPTER 8

8.2 114.68 ≤ μ ≤ 135.32.

8.4 Yes, it is true because 5% of intervals will not include the population mean.

8.6 (a) You would compute the mean first because you need the mean to compute the standard deviation. If you had a sample, you would compute the sample mean. If you had the population mean, you would compute the population standard deviation. (b) If you have a sample, you are computing the sample standard deviation, not the population standard deviation needed in Equation (8.1). If you have a population and have computed the population mean and population standard deviation, you don’t need a confidence interval estimate of the population mean because you already know the mean.

8.8 Equation (8.1) assumes that you know the population standard deviation. Because you are selecting a sample of 100 from the population, you are computing a sample standard deviation, not the population standard deviation.

8.10 (a) X̄ ± Z · σ/√n = 49,875 ± 1.96 · 1,500/√64; 49,507.51 ≤ μ ≤ 50,242.49. (b) Yes, because the confidence interval includes 50,000 hours the manufacturer can support a claim that the bulbs have a mean of 50,000 hours. (c) No. Because σ is known and n = 64, from the Central Limit Theorem, you know that the sampling distribution of X̄ is approximately normal. (d) The confidence interval is narrower, based on a population standard deviation of 500 hours rather than the original standard deviation: 49,752.50 ≤ μ ≤ 49,997.50. No, because the confidence interval does not include 50,000 hours.

8.12 (a) 2.2622. (b) 3.2498. (c) 2.0395. (d) 1.9977. (e) 1.7531.

8.14 -0.12 ≤ μ ≤ 11.84, 2.00 ≤ μ ≤ 6.00. The presence of the outlier increases the sample mean and greatly inflates the sample standard deviation.

8.16 (a) 87 ± (1.9781)(9)/√87; 85.46 ≤ μ ≤ 88.54. (b) You can be 95% confident that the population mean amount of one-time gift is between $85.46 and $88.54.

8.18 (a) 6.31 ≤ μ ≤ 7.87. (b) You can be 95% confident that the population mean amount spent for lunch at a fast-food restaurant is between $6.31 and $7.87. (c) That the population distribution is normally distributed. (d) The assumption of normality is not seriously violated and with a sample of 15, the validity of the confidence interval is not seriously impacted.

8.20 (a) For 30-second ads: 4.64 ≤ μ ≤ 5.16. For 60-second ads: 4.56 ≤ μ ≤ 5.65. (b) You are 95% confident that the mean rating for 30-second ads is between 4.64 and 5.16. You are 95% confident that the mean rating for 60-second ads is between 4.56 and 5.65. (c) The confidence intervals for 30-second ads and 60-second ads are very similar. (d) You need to assume that the distributions of the rating for 30-second ads and 60-second ads are normally distributed. (e) The distribution of the 30-second ads is slightly right-skewed. With a sample of 40, the validity of the confidence interval is not in question. The distribution of the 60-second ads is slightly left-skewed. With a sample of 17, the validity of the confidence interval is not seriously in question.

8.22 (a) 31.12 ≤ μ ≤ 54.96. (b) The number of days is approximately normally distributed. (c) No, the outliers skew the data. (d) Because the sample size is fairly large, at n = 50, the use of the t distribution is appropriate.

8.24 (a) 25.90 ≤ μ ≤ 33.45. (b) That the population distribution is normally distributed. (c) The boxplot and the skewness and kurtosis statistics indicate a right-skewed distribution. However, the validity of the results should not be greatly affected.

8.26 0.19 ≤ π ≤ 0.31.

8.28 (a) p = X/n = 135/500 = 0.27, p ± Z√[p(1 - p)/n] = 0.27 ± 2.58√[0.27(0.73)/500]; 0.2189 ≤ π ≤ 0.3211. (b) The manager in charge of promotional programs can infer that the proportion of households that would upgrade to an improved cellphone if it were made available at a substantially reduced cost is somewhere between 0.22 and 0.32, with 99% confidence.
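
The two interval formulas used in Problems 8.10 and 8.28 can be sketched in a few lines. The function names `mean_interval` and `proportion_interval` are illustrative, and the critical values 1.96 and 2.58 are the rounded ones printed in the answers.

```python
import math


def mean_interval(xbar, sigma, n, z):
    """Confidence interval for the mean when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_interval(x, n, z):
    """Confidence interval for a proportion from x successes in n trials."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 8.10 (a): sample mean 49,875, sigma 1,500, n = 64, 95% confidence
mean_lo, mean_hi = mean_interval(49875, 1500, 64, 1.96)

# 8.28 (a): 135 of 500 households, 99% confidence
prop_lo, prop_hi = proportion_interval(135, 500, 2.58)

print(f"{mean_lo:,.2f} <= mu <= {mean_hi:,.2f}")
print(f"{prop_lo:.4f} <= pi <= {prop_hi:.4f}")
```

Results may differ from the printed limits in the last decimal place because the printed answers appear to use a more precise critical value than the rounded Z shown.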

8.30  (a) 0.2328 … p … 0.2872. (b) No, you cannot because the interval CHAPTER 9
estimate includes 0.25 (25%). (c) 0.2514 … p … 0.2686. Yes, you can,
because the interval is above 0.25 (25%). (d) The larger the sample size, 9.2 Because ZSTAT = +2.21 7 1.96, reject H0.
the narrower the confidence interval, holding everything else constant. 9.4 Reject H0 if ZSTAT 6 -2.58 or if ZSTAT 7 2.58.
8.32  (a) 0.8632 … p … 0.8822. (b) 0.1770 … p … 0.2007. (c) Because 9.6 p@value = 0.0456.
almost 90% of adults have purchased something online, but only about
20% are weekly online shoppers, the director of e-commerce sales may 9.8 p@value = 0.1676.
want to focus on those adults who are weekly online shoppers.
8.34 n = 35.

8.36 n = 1,041.

8.38 (a) n = Z²σ²/e² = (1.96)²(400)²/50² = 245.86. Use n = 246.
(b) n = Z²σ²/e² = (1.96)²(400)²/25² = 983.41. Use n = 984.

8.40 n = 55.

8.42 (a) n = 107. (b) n = 62.

8.44 (a) n = 246. (b) n = 385. (c) n = 554. (d) When there is more variability in the population, a larger sample is needed to accurately estimate the mean.

8.46 (a) 0.6209 ≤ π ≤ 0.7878. (b) 0.5015 ≤ π ≤ 0.6812. (c) 0.0759 ≤ π ≤ 0.2024. (d) (a) n = 2,017, (b) n = 2,324, (c) n = 1,157.

8.48 (a) If you conducted a follow-up study, you would use π = 0.38 in the sample size formula because it is based on past information on the proportion. (b) n = 1,006.

9.10 H0: Defendant is guilty; H1: Defendant is innocent. A Type I error would be not convicting a guilty person. A Type II error would be convicting an innocent person.

9.12 H0: μ = 20 minutes. 20 minutes is adequate travel time between classes. H1: μ ≠ 20 minutes. 20 minutes is not adequate travel time between classes.

9.14 (a) ZSTAT = (49,875 − 50,000)/(1,500/√64) = −0.6667. Because −1.96 < ZSTAT = −0.6667 < 1.96, do not reject H0. (b) p-value = 0.5050. (c) 49,507.51 ≤ μ ≤ 50,242.49. (d) The conclusions are the same.

9.16 (a) Because −2.58 < ZSTAT = −1.7678 < 2.58, do not reject H0. (b) p-value = 0.0771. (c) 0.9877 ≤ μ ≤ 1.0023. (d) The conclusions are the same.

9.18 tSTAT = 2.00.

9.20 ±2.1315.

9.22 No, you should not use a t test because the original population is left-skewed, and the sample size is not large enough for the t test to be valid.
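
The sample-size formula in 8.38 and the Z test in 9.14 can be checked numerically. The following is a minimal sketch, not the text's own software output. The function names are hypothetical, and σ = 1,500 is an assumption inferred from the reported ZSTAT of −0.6667 and the confidence interval in 9.14 (c).

```python
import math
from statistics import NormalDist


def sample_size_for_mean(z, sigma, e):
    """Sample size for estimating a mean: n = Z^2 * sigma^2 / e^2, rounded up."""
    return math.ceil(z**2 * sigma**2 / e**2)


def z_test_mean(xbar, mu0, sigma, n):
    """Two-tail Z test for a mean with known sigma; returns (ZSTAT, p-value)."""
    z_stat = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Problem 8.38: Z = 1.96, sigma = 400, sampling error e = 50 and e = 25.
n_a = sample_size_for_mean(1.96, 400, 50)
n_b = sample_size_for_mean(1.96, 400, 25)
print(n_a, n_b)

# Problem 9.14: sigma = 1,500 is an assumption inferred from the reported results.
z_stat, p_value = z_test_mean(49_875, 50_000, 1_500, 64)
print(round(z_stat, 4), round(p_value, 4))
```

Rounding up rather than to the nearest integer is what turns 245.86 into n = 246: a sample of 245 would not quite achieve the desired sampling error.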
8.54 (a) PC/laptop: 0.8173 ≤ π ≤ 0.8628.
Smartphone: 0.8923 ≤ π ≤ 0.9277.
Tablet: 0.4690 ≤ π ≤ 0.5310.
Smart watch: 0.0814 ≤ π ≤ 0.1186.
(b) Most adults have a PC/laptop and a smartphone. Some adults have a tablet computer and very few have a smart watch.

9.24 (a) tSTAT = (3.57 − 3.70)/(0.8/√64) = −1.30. Because −1.9983 < tSTAT = −1.30 < 1.9983 and p-value = 0.1984 > 0.05, do not reject H0. There is insufficient evidence that the population mean waiting time is different from 3.7 minutes. (b) Because n = 64, the sampling distribution of the t test statistic is approximately normal. In general, the t test is appropriate for this sample size except for the case where the population is extremely skewed or bimodal.
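
The t statistic in 9.24 (a) follows directly from the sample summary. A minimal sketch, assuming the critical value 1.9983 is taken from the answer above rather than computed; the function name is hypothetical.

```python
import math


def t_stat_one_sample(xbar, mu0, s, n):
    """One-sample t statistic: (sample mean - hypothesized mean) / (S / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Problem 9.24: sample mean 3.57, hypothesized mean 3.70, S = 0.8, n = 64.
t_stat = t_stat_one_sample(3.57, 3.70, 0.8, 64)

# Two-tail critical value quoted in the answer (63 degrees of freedom).
critical = 1.9983
reject_h0 = abs(t_stat) > critical
print(round(t_stat, 2), reject_h0)
```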

8.56 (a) 49.88 ≤ μ ≤ 52.12. (b) 0.6760 ≤ π ≤ 0.9240. (c) n = 25. (d) n = 267. (e) If a single sample were to be selected for both purposes, the larger of the two sample sizes (n = 267) should be used.

8.58 (a) 3.19 ≤ μ ≤ 9.21. (b) 0.3242 ≤ π ≤ 0.7158. (c) n = 110. (d) n = 121. (e) If a single sample were to be selected for both purposes, the larger of the two sample sizes (n = 121) should be used.

8.60 (a) 0.2562 ≤ π ≤ 0.3638. (b) $3.22 ≤ μ ≤ $3.78. (c) $17,581.68 ≤ μ ≤ $18,418.32.

8.62 (a) $36.66 ≤ μ ≤ $40.42. (b) 0.2027 ≤ π ≤ 0.3973. (c) n = 110. (d) n = 423. (e) If a single sample were to be selected for both purposes, the larger of the two sample sizes (n = 423) should be used.

8.64 (a) 0.4643 ≤ π ≤ 0.6690. (b) $136.28 ≤ μ ≤ $502.21.

8.66 (a) 13.40 ≤ μ ≤ 16.56. (b) With 95% confidence, the population mean answer time is somewhere between 13.40 and 16.56 seconds. (c) The assumption is valid as the answer time is approximately normally distributed.

8.68 (a) 0.2425 ≤ μ ≤ 0.2856. (b) 0.1975 ≤ μ ≤ 0.2385. (c) The amounts of granule loss for both brands are skewed to the right, but the sample sizes are large enough. (d) Because the two confidence intervals do not overlap, it appears that the mean granule loss of Boston shingles is higher than that of Vermont shingles.

9.26 (a) −1.9842 < tSTAT = 1.25 < 1.9842, do not reject H0. There is insufficient evidence that the population mean spent by Amazon Prime customers is different from $1,475. (b) p-value = 0.2142 > 0.05. The probability of getting a tSTAT statistic greater than +1.25 or less than −1.25, given that the null hypothesis is true, is 0.2142.

9.28 (a) Because −2.1448 < tSTAT = 1.6344 < 2.1448, do not reject H0. There is not enough evidence to conclude that the mean amount spent for lunch at a fast-food restaurant is different from $6.50. (b) The p-value is 0.1245. If the population mean is $6.50, the probability of observing a sample of fifteen customers that will result in a sample mean farther away from the hypothesized value than this sample is 0.1245. (c) The distribution of the amount spent is normally distributed. (d) With a sample size of 15, it is difficult to evaluate the assumption of normality. However, the distribution may be fairly symmetric because the mean and the median are close in value. Also, the boxplot appears only slightly skewed so the normality assumption does not appear to be seriously violated.

9.30 (a) Because −2.0096 < tSTAT = 0.114 < 2.0096, do not reject H0. There is no evidence that the mean amount is different from 2 liters. (b) p-value = 0.9095. (d) Yes, the data appear to have met the normality assumption. (e) The amount of fill is decreasing over time so the values are not independent. Therefore, the t test is invalid.

9.32 (a) Because tSTAT = −5.9355 < −2.0106, reject H0. There is enough evidence to conclude that the mean width of the troughs is different

from 8.46 inches. (b) The population distribution is normal. (c) Although the distribution of the widths is left-skewed, the large sample size means that the validity of the t test is not seriously affected. The large sample size allows you to use the t distribution.

9.34 (a) Because −2.68 < tSTAT = 0.094 < 2.68, do not reject H0. There is no evidence that the mean amount is different from 5.5 grams. (b) 5.462 ≤ μ ≤ 5.542. (c) The conclusions are the same.

9.36 p-value = 0.0228.

9.38 p-value = 0.0838.

9.40 p-value = 0.9162.

9.42 2.7638.

9.44 −2.5280.

9.46 (a) tSTAT = 2.6880 > 1.6694, reject H0. There is evidence that the population mean bus miles is greater than 8,000 miles. (b) p-value = 0.0046 < 0.05. The probability of getting a tSTAT statistic greater than 2.6880, given that the null hypothesis is true, is 0.0046.

9.48 (a) tSTAT = (24.05 − 30)/(16.5/√860) = −10.5750. Because tSTAT = −10.5750 < −2.3307, reject H0. p-value = 0.0000 < 0.01, reject H0. (b) The probability of getting a sample mean of 24 minutes or less if the population mean is 30 minutes is 0.000.

9.50 (a) tSTAT = 1.9221 < 2.3549, do not reject H0. There is insufficient evidence that the population mean one-time gift donation is greater than $85.50. (b) The probability of getting a sample mean of $87 or more if the population mean is $85.50 is 0.0284.

9.52 p = 0.22.

9.54 Do not reject H0.

9.56 (a) ZSTAT = 0.7200, p-value = 0.2358. Because ZSTAT = 0.7200 < 1.645 or p-value = 0.2358 > 0.05, do not reject H0. There is no evidence to show that more than 56.43% of students at your university use the Chrome web browser. (b) ZSTAT = 1.7636, p-value = 0.0389. Because ZSTAT = 1.7636 > 1.645, or p-value = 0.0389 < 0.05, reject H0. There is evidence to show that more than 56.43% of students at your university use the Chrome web browser. (c) The sample size had a major effect on being able to reject the null hypothesis. (d) You would be very unlikely to reject the null hypothesis with a sample of 20.

9.58 H0: π = 0.60; H1: π ≠ 0.60. Decision rule: If ZSTAT > 1.96 or ZSTAT < −1.96, reject H0.
p = 464/703 = 0.6600
Test statistic: ZSTAT = (p − π)/√[π(1 − π)/n] = (0.6600 − 0.60)/√[0.60(1 − 0.60)/703] = 3.2488.
Because ZSTAT = 3.2488 > 1.96 or p-value = 0.0012 < 0.05, reject H0 and conclude that there is evidence that the proportion of all talent acquisition professionals who report competition is the biggest obstacle to attracting the best talent at their company is different from 60%.

9.60 (a) H0: π ≥ 0.294. H1: π < 0.294. (b) ZSTAT = −0.5268 > −1.645; p-value = 0.2992. Because ZSTAT = −0.5268 > −1.645 or p-value = 0.2992 > 0.05, do not reject H0. There is insufficient evidence that the percentage is less than 29.4%.

9.70 (a) Concluding that a firm will go bankrupt when it will not. (b) Concluding that a firm will not go bankrupt when it will go bankrupt. (c) Type I. (d) If the revised model results in more moderate or large Z scores, the probability of committing a Type I error will increase. Many more of the firms will be predicted to go bankrupt than will go bankrupt. On the other hand, the revised model that results in more moderate or large Z scores will lower the probability of committing a Type II error because fewer firms will be predicted to go bankrupt than will actually go bankrupt.

9.72 (a) Because tSTAT = 3.3197 > 2.0010, reject H0. (b) p-value = 0.0015. (c) Because ZSTAT = 0.2582 < 1.645, do not reject H0. (d) Because −2.0010 < tSTAT = −1.1066 < 2.0010, do not reject H0. (e) Because ZSTAT = 2.3238 > 1.645, reject H0.

9.74 (a) Because tSTAT = −1.69 > −1.7613, do not reject H0. (b) The data are from a population that is normally distributed. (d) With the exception of one extreme value, the data are approximately normally distributed. (e) There is insufficient evidence to state that the waiting time is less than five minutes.

9.76 (a) Because tSTAT = −1.47 > −1.6896, do not reject H0. (b) p-value = 0.0748. If the null hypothesis is true, the probability of obtaining a tSTAT of −1.47 or more extreme is 0.0748. (c) Because tSTAT = −3.10 < −1.6973, reject H0. (d) p-value = 0.0021. If the null hypothesis is true, the probability of obtaining a tSTAT of −3.10 or more extreme is 0.0021. (e) The data in the population are assumed to be normally distributed. (g) Both boxplots suggest that the data are skewed slightly to the right, more so for the Boston shingles. However, the very large sample sizes mean that the results of the t test are relatively insensitive to the departure from normality.

9.78 (a) tSTAT = −3.2912, reject H0. (b) p-value = 0.0012. The probability of getting a tSTAT value below −3.2912 or above +3.2912 is 0.0012. (c) tSTAT = −7.9075, reject H0. (d) p-value = 0.0000. The probability of getting a tSTAT value below −7.9075 or above +7.9075 is 0.0000. (e) Because of the large sample sizes, you do not need to be concerned with the normality assumption.

CHAPTER 10

10.2 (a) t = 3.8959. (b) df = 21. (c) 2.5177. (d) Because tSTAT = 3.8959 > 2.5177, reject H0.

10.4 3.73 ≤ μ1 − μ2 ≤ 12.27.

10.6 Because tSTAT = 2.6762 < 2.9979 or p-value = 0.0158 > 0.01, do not reject H0. There is no evidence that the mean of population 1 is greater than the mean of population 2.

10.8 (a) Because tSTAT = 2.8990 > 1.6620 or p-value = 0.0024 < 0.05, reject H0. There is evidence that the mean amount of Walker Crisps eaten by children who watched a commercial featuring a long-standing sports celebrity endorser is higher than for those who watched a commercial for an alternative food snack. (b) 3.4616 ≤ μ1 − μ2 ≤ 18.5384. (c) The results cannot be compared because (a) is a one-tail test and (b) is a confidence interval that is comparable only to the results of a two-tail test. (d) You would choose the commercial featuring a long-standing celebrity endorser.

10.10 (a) H0: μ1 = μ2, where Populations: 1 = Southeast, 2 = Gulf Coast. H1: μ1 ≠ μ2. Decision rule: df = 33. If tSTAT < −2.0345 or tSTAT > 2.0345, reject H0.
Test statistic:
Sp² = [(n1 − 1)(S1²) + (n2 − 1)(S2²)]/[(n1 − 1) + (n2 − 1)] = [(16)(37.3563²) + (17)(47.2901²)]/(16 + 17) = 1,828.6631
tSTAT = [(X̄1 − X̄2) − (μ1 − μ2)]/√[Sp²(1/n1 + 1/n2)] = [(36.3529 − 33.3333) − 0]/√[1,828.6631(1/17 + 1/18)] = 0.2088.
Decision: Because −2.0345 < tSTAT = 0.2088 < 2.0345, do not reject H0. There is not enough evidence to conclude that the mean number of partners between the Southeast and Gulf Coast is different. (b) p-value = 0.83589. (c) In order to use the pooled-variance t test, you need to assume that the populations are normally distributed with equal variances.

10.26 (a) Because tSTAT = −9.3721 < −2.4258, reject H0. There is evidence that the mean strength is lower at two days than at seven days. (b) The population of differences in strength is approximately normally distributed. (c) p-value = 0.000.

10.28 (a) Because −2.58 ≤ ZSTAT = −0.58 ≤ 2.58, do not reject H0. (b) −0.273 ≤ π1 − π2 ≤ 0.173.

10.30 (a) H0: π1 ≤ π2. H1: π1 > π2. Populations: 1 = VOD D4+, 2 = general TV. (b) Because ZSTAT = 8.9045 > 1.6449 or p-value = 0.0000 < 0.05, reject H0. There is evidence to conclude that those who viewed the brand on VOD D4 were more likely to visit the brand website. (c) Yes, the result in (b) makes it appropriate to claim that those who viewed the brand on VOD D4 were more likely to visit the brand website than those who viewed the brand on general TV.
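
The pooled-variance computation in 10.10 can be reproduced from the summary statistics. A minimal sketch: the function name is hypothetical, and the standard deviations 37.3563 and 47.2901 are assumptions taken as the square roots of the sample variances 1,395.4926 and 2,236.3529 reported in 10.46.

```python
import math


def pooled_variance_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled-variance t test statistic under H0: mu1 = mu2.

    Returns (pooled variance, t statistic, degrees of freedom).
    """
    df = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t_stat = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t_stat, df


# Problem 10.10: Southeast (n1 = 17) versus Gulf Coast (n2 = 18).
# Standard deviations are assumed values, inferred from the variances in 10.46.
sp2, t_stat, df = pooled_variance_t(36.3529, 37.3563, 17, 33.3333, 47.2901, 18)
print(round(sp2, 2), round(t_stat, 4), df)
```

The degrees of freedom come out as 16 + 17 = 33, which is the denominator of the pooled variance and the df used for the critical value.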
10.12 (a) Because tSTAT = −4.1343 < −2.0484, reject H0. (b) p-value = 0.0003. (c) The populations of waiting times are approximately normally distributed. (d) −4.2292 ≤ μ1 − μ2 ≤ −1.4268.

10.14 (a) Because tSTAT = 2.7349 > 2.0484, reject H0. There is evidence of a difference in the mean time to start a business between developed and emerging countries. (b) p-value = 0.0107. The probability that two samples have a mean difference of 14.62 or more is 0.0107 if there is no difference in the mean time to start a business between developed and emerging countries. (c) You need to assume that the population distribution of the time to start a business of both developed and emerging countries is normally distributed. (d) 3.6700 ≤ μ1 − μ2 ≤ 25.5700.

10.16 (a) Because tSTAT = −2.1554 < −2.0017 or p-value = 0.03535 < 0.05, reject H0. There is evidence of a difference in the mean time per day accessing the Internet via a mobile device between males and females. (b) You must assume that each of the two independent populations is normally distributed.

10.18 df = 19.

10.32 (a) H0: π1 = π2. H1: π1 ≠ π2. Decision rule: If |ZSTAT| > 2.58, reject H0.
Test statistic: p̄ = (X1 + X2)/(n1 + n2) = (326 + 167)/(423 + 192) = 0.8016
ZSTAT = [(p1 − p2) − (π1 − π2)]/√[p̄(1 − p̄)(1/n1 + 1/n2)] = [(0.7707 − 0.8698) − 0]/√[0.8016(1 − 0.8016)(1/423 + 1/192)].
ZSTAT = −2.8560 < −2.58, reject H0. There is evidence of a difference in the proportion of organizations with recognition programs between organizations that have between 500 and 2,499 employees and organizations that have 2,500+ employees. (b) p-value = 0.0043. The probability of obtaining a difference in proportions that gives rise to a test statistic below −2.8560 or above +2.8560 is 0.0043 if there is no difference in the proportion based on the size of the organization. (c) −0.1809 ≤ (π1 − π2) ≤ −0.0173. You are 99% confident that the difference in the proportion based on the size of the organization is between 1.73% and 18.09%.
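
The Z test for the difference between two proportions in 10.32 can be evaluated from the counts alone. A minimal sketch with a hypothetical function name; evaluating it gives a ZSTAT of roughly −2.856, the value whose square Problem 12.8 (c) equates to the chi-square statistic 8.1566.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-tail Z test for H0: pi1 = pi2 using the pooled proportion.

    Returns (pooled proportion, ZSTAT, p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z_stat = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return p_bar, z_stat, p_value


# Problem 10.32: 326 of 423 organizations versus 167 of 192 organizations.
p_bar, z_stat, p_value = two_proportion_z(326, 423, 167, 192)
print(round(p_bar, 4), round(z_stat, 3), round(p_value, 4))
```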
10.20 (a) tSTAT = (−1.5556)/(1.424/√9) = −3.2772. Because tSTAT = −3.2772 < −2.306 or p-value = 0.0112 < 0.05, reject H0. There is enough evidence of a difference in the mean summated ratings between the two brands. (b) You must assume that the distribution of the differences between the two ratings is approximately normal. (c) p-value = 0.0112. The probability of obtaining a mean difference in ratings that results in a test statistic that deviates from 0 by 3.2772 or more in either direction is 0.0112 if there is no difference in the mean summated ratings between the two brands. (d) −2.6501 ≤ μD ≤ −0.4610. You are 95% confident that the mean difference in summated ratings between brand A and brand B is somewhere between −2.6501 and −0.4610.

10.22 (a) Because tSTAT = −6.9984 < −2.0423, reject H0. There is evidence to conclude that the mean download speed at AT&T is lower than at Verizon Wireless. (b) You must assume that the distribution of the differences between the ratings is approximately normal. (d) The confidence interval is from −5.2767 to −4.7511.

10.24 (a) Because tSTAT = 1.8425 < 1.943, do not reject H0. There is not enough evidence to conclude that the mean bone marrow microvessel density is higher before the stem cell transplant than after the stem cell transplant. (b) p-value = 0.0575. The probability that the t statistic for the mean difference in microvessel density is 1.8425 or more is 5.75% if the mean density is not higher before the stem cell transplant than after the stem cell transplant. (c) −28.26 ≤ μD ≤ 200.55. You are 95% confident that the mean difference in bone marrow microvessel density before and after the stem cell transplant is somewhere between −28.26 and 200.55. (d) That the distribution of the difference before and after the stem cell transplant is normally distributed.

10.34 (a) Because ZSTAT = 4.4662 > 1.96, reject H0. There is evidence of a difference in the proportion of co-browsing organizations and non-co-browsing organizations that use skills-based routing to match the caller with the right agent. (b) p-value = 0.0000. The probability of obtaining a difference in proportions that is 0.2586 or more in either direction is 0.0000 if there is no difference between the proportion of co-browsing organizations and non-co-browsing organizations that use skills-based routing to match the caller with the right agent.

10.36 (a) 2.20. (b) 2.57. (c) 3.50.

10.38 (a) Population B: S² = 25. (b) 1.5625.

10.40 dfnumerator = 24, dfdenominator = 24.

10.42 Because FSTAT = 1.2109 < 2.27, do not reject H0.

10.44 (a) Because FSTAT = 1.2995 < 3.18, do not reject H0. (b) Because FSTAT = 1.2995 < 2.62, do not reject H0.

10.46 (a) H0: σ1² = σ2². H1: σ1² ≠ σ2².
Decision rule: If FSTAT > 2.7380, reject H0.
Test statistic: FSTAT = S1²/S2² = 2,236.3529/1,395.4926 = 1.6026.
Decision: Because FSTAT = 1.6026 < 2.7380, do not reject H0. There is insufficient evidence to conclude that the two population variances are different. (b) p-value = 0.3516. (c) The test assumes that each of the two populations is normally distributed. (d) Based on (a) and (b), a pooled-variance t test should be used.
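
The F test for the ratio of two variances in 10.46 puts the larger sample variance in the numerator. A minimal sketch, assuming the critical value 2.7380 is read from the answer rather than computed; the function name is hypothetical.

```python
def f_stat_variances(var_a, var_b):
    """F statistic for H0: equal variances, larger sample variance on top."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller


# Problem 10.46: sample variances 2,236.3529 and 1,395.4926.
f_stat = f_stat_variances(2236.3529, 1395.4926)

# Upper-tail critical value quoted in the answer.
critical = 2.7380
reject_h0 = f_stat > critical
print(round(f_stat, 4), reject_h0)
```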

10.48 (a) Because FSTAT = 1.3805 < 2.1914 or p-value = 0.4102 > 0.05, do not reject H0. There is insufficient evidence of a difference in the variability of the scores between the two types of ads. (b) p-value = 0.4102. The probability of obtaining a sample that yields a test statistic more extreme than 1.3805 is 0.4102 if there is no difference in the two population variances. (c) The test assumes that each of the two populations is normally distributed. The boxplot for 60-second ads appears slightly left-skewed and the boxplot for 30-second ads appears slightly right-skewed. (d) Based on (a) and (b), a pooled-variance t test should be used.

10.50 (a) Because FSTAT = 69.50001 > 1.9811 or p-value = 0.0000 < 0.05, reject H0. There is evidence of a difference in the variance of the delay times between the two drivers. (b) You assume that the delay times are normally distributed. (c) From the boxplot and the normal probability plots, the delay times appear to be approximately normally distributed. (d) Because there is a difference in the variance of the delay times between the two drivers, you should use the separate-variance t test to determine whether there is evidence of a difference in the mean delay time between the two drivers.

10.58 (a) Because FSTAT = 1.3559 < 1.6409, or p-value = 0.2277 > 0.05, do not reject H0. There is not enough evidence of a difference in the variance of the salary of Black Belts and Green Belts. (b) The pooled-variance t test. (c) Because tSTAT = 3.9742 > 1.6554 or p-value = 0.0001 < 0.05, reject H0. There is evidence that the mean salary of Black Belts is greater than the mean salary of Green Belts.

10.60 (a) Because FSTAT = 1.3611 < 1.6854, do not reject H0. There is insufficient evidence to conclude that there is a difference between the variances in the online time per week between women and men. (b) It is more appropriate to use a pooled-variance t test. Using the pooled-variance t test, because tSTAT = −9.7619 < −2.0609, reject H0. There is evidence of a difference in the mean online time per week between women and men. (c) Because FSTAT = 1.7778 > 1.6854, reject H0. There is evidence to conclude that there is a difference between the variances in the time spent playing games between women and men. (d) Using the separate-variance t test, because tSTAT = -.26.4 < −2.603, reject H0. There is evidence of a difference in the mean time spent playing games between women and men.

10.62 (a) Because tSTAT = 3.3282 > 1.8595, or the p-value = 0.0052 < 0.05, reject H0. There is enough evidence to conclude that the introductory computer students required more than a mean of 10 minutes to write and run a program in VB.NET. (b) Because tSTAT = 1.3636 < 1.8595, do not reject H0. There is not enough evidence to conclude that the introductory computer students required more than a mean of 10 minutes to write and run a program in VB.NET. (c) Although the mean time necessary to complete the assignment increased from 12 to 16 minutes as a result of the increase in one data value, the standard deviation went from 1.8 to 13.2, which reduced the value of the t statistic. (d) Because FSTAT = 1.2308 < 3.8549, do not reject H0. There is not enough evidence to conclude that the population variances are different for the Introduction to Computers students and computer majors. Hence, the pooled-variance t test is a valid test to determine whether computer majors can write a VB.NET program in less time than introductory students, assuming that the distributions of the time needed to write a VB.NET program for both the Introduction to Computers students and the computer majors are approximately normally distributed. Because tSTAT = 4.0666 > 1.7341, reject H0. There is enough evidence that the mean time is higher for Introduction to Computers students than for computer majors. (e) p-value = 0.0052. If the true population mean amount of time needed for Introduction to Computer students to write a VB.NET program is no more than 10 minutes, the probability of observing a sample mean greater than the 12 minutes in the current sample is 0.52%. Hence, at a 5% level of significance, you can conclude that the population mean amount of time needed for Introduction to Computer students to write a VB.NET program is more than 10 minutes. As illustrated in (d), in which there is not enough evidence to conclude that the population variances are different for the Introduction to Computers students and computer majors, the pooled-variance t test performed is a valid test to determine whether computer majors can write a VB.NET program in less time than introductory students, assuming that the distributions of the time needed to write a VB.NET program for both the Introduction to Computers students and the computer majors are approximately normally distributed.

10.64 From the boxplot and the summary statistics, both distributions are approximately normally distributed. FSTAT = 1.056 < 1.89. There is insufficient evidence to conclude that the two population variances are significantly different at the 5% level of significance. tSTAT = −5.084 < −1.99. At the 5% level of significance, there is sufficient evidence to reject the null hypothesis of no difference in the mean life of the bulbs between the two manufacturers. You can conclude that there is a significant difference in the mean life of the bulbs between the two manufacturers.

10.66 (a) Because ZSTAT = 3.6911 > 1.96, reject H0. There is enough evidence to conclude that there is a difference in the proportion of men and women who order dessert. (b) Because ZSTAT = 6.0873 > 1.96, reject H0. There is enough evidence to conclude that there is a difference in the proportion of people who order dessert based on whether they ordered a beef entree.

10.68 The normal probability plots suggest that the two populations are not normally distributed. An F test is inappropriate for testing the difference in the two variances. The sample variances for Boston and Vermont shingles are 0.0203 and 0.015, respectively. Because tSTAT = 3.015 > 1.967 or p-value = 0.0028 < α = 0.05, reject H0. There is sufficient evidence to conclude that there is a difference in the mean granule loss of Boston and Vermont shingles.

CHAPTER 11

11.2 (a) SSW = 150. (b) MSA = 15. (c) MSW = 5. (d) FSTAT = 3.

11.4 (a) 2. (b) 18. (c) 20.

11.6 (a) Reject H0 if FSTAT > 2.95; otherwise, do not reject H0. (b) Because FSTAT = 4 > 2.95, reject H0. (c) The table does not have 28 degrees of freedom in the denominator, so use the next larger critical value, Qα = 3.90. (d) Critical range = 6.166.

11.8 (a) H0: μA = μB = μC = μD and H1: At least one mean is different.
MSA = SSA/(c − 1) = 1,151,016.4750/3 = 383,672.1583.
MSW = SSW/(n − c) = 2,961,835.3000/36 = 82,273.2028.
FSTAT = MSA/MSW = 383,672.1583/82,273.2028 = 4.6634.
Because the p-value is 0.0075 < 0.05 and FSTAT = 4.6634 > 2.8663, reject H0. There is sufficient evidence of a difference in the mean import cost across the four global regions. (b) Critical range = Qα √[(MSW/2)(1/nj + 1/nj′)] = 3.81 √[(82,273.2028/2)(1/10 + 1/10)] = 3.81(90.7046) = 345.5845.
From the Tukey-Kramer procedure, there is a difference in the mean import cost between the East Asia and Pacific region and Latin America and the Caribbean, and between Eastern Europe and Central Asia and Latin America and the Caribbean. None of the other regions are different. (c) ANOVA output for Levene's test for homogeneity of variance:
MSA = SSA/(c − 1) = 190,890.4750/3 = 63,630.1583
MSW = SSW/(n − c) = 1,469,223.4/36 = 40,811.7611
FSTAT = MSA/MSW = 63,630.1583/40,811.7611 = 1.5591
Because p-value = 0.2161 > 0.05 and FSTAT = 1.5591 < 2.8663, do not reject H0. There is insufficient evidence to conclude that the variances in the import cost are different. (d) From the results in (a) and (b), the mean import cost for the East Asia and Pacific region and Eastern Europe and Central Asia is lower than for Latin America and the Caribbean.

11.10 (a) Because FSTAT = 12.56 > 2.76, reject H0. (b) Critical range = 4.67. Advertisements A and B are different from Advertisements C and D. Advertisement E is only different from Advertisement D. (c) Because FSTAT = 1.927 < 2.76, do not reject H0. There is no evidence of a significant difference in the variation in the ratings among the five advertisements. (d) The advertisements underselling the pen's characteristics had the highest mean ratings, and the advertisements overselling the pen's characteristics had the lowest mean ratings. Therefore, use an advertisement that undersells the pen's characteristics and avoid advertisements that oversell the pen's characteristics.

11.12 (a)
Source          Degrees of Freedom   Sum of Squares     Mean Squares     F
Among groups     2                    12,463,043,330    6,231,521,665    2.784
Within groups   46                   102,945,347,500    2,237,942,337
Total           48                   115,408,390,800
(b) Because FSTAT = 2.784 < 3.23, do not reject H0. There is insufficient evidence of a difference in the mean brand value of the different groups. (c) Because there was no significant difference among the groups, none of the critical ranges were significant.

11.14 (a) Because FSTAT = 6.2275 > 2.8663; p-value = 0.0016 < 0.05, reject H0. (b) Critical range = 9.5447 (using 36 degrees of freedom and interpolating). Asia is different from North America and South America. (c) The assumptions are that the samples are randomly and independently selected (or randomly assigned), the original populations of congestion are approximately normally distributed, and the variances are equal. (d) Because FSTAT = 1.5190 < 2.8663; p-value = 0.2263 > 0.05, do not reject H0. There is insufficient evidence of a difference in the variation in the mean congestion level among the continents.

11.16 (a) 40. (b) 60 and 55. (c) 10. (d) 10.

11.18 (a) Because FSTAT = 6.00 > 3.35, reject H0. (b) Because FSTAT = 5.50 > 3.35, reject H0. (c) Because FSTAT = 1.00 < 2.73, do not reject H0.

11.20 dfB = 4, dfTOTAL = 44, SSA = 160, SSAB = 80, SSE = 150, SST = 610, MSB = 55, MSE = 5. For A: FSTAT = 16. For B: FSTAT = 11. For AB: FSTAT = 2. (a) Because FSTAT = 16 > 3.32, reject H0. Factor A is significant. (b) Because FSTAT = 11 > 2.69, reject H0. Factor B is significant. (c) Because FSTAT = 2.0 < 2.27, do not reject H0. The AB interaction is not significant.

11.22 (a) Because FSTAT = 3.4032 < 4.3512, do not reject H0. (b) Because FSTAT = 1.8496 < 4.3512, do not reject H0. (c) Because FSTAT = 9.4549 > 4.3512, reject H0. (e) Die diameter has a significant effect on density, but die temperature does not. However, the cell means plot shows that the density seems higher with a 3 mm die diameter at 155°C but that there is little difference in density with a 4 mm die diameter. This interaction is not significant at the 0.05 level of significance.

11.24 (a) H0: There is no interaction between filling time and mold temperature. H1: There is an interaction between filling time and mold temperature.
Because FSTAT = 0.1136/0.05 = 2.27 < 2.9277 or the p-value = 0.1018 > 0.05, do not reject H0. There is insufficient evidence of interaction between filling time and mold temperature. (b) FSTAT = 9.0222 > 3.5546, reject H0. There is evidence of a difference in the warpage due to the filling time. (c) FSTAT = 4.2305 > 3.5546, reject H0. There is evidence of a difference in the warpage due to the mold temperature. (e) The warpage for a three-second filling time seems to be much higher at 60°C and 72.5°C but not at 85°C.

11.26 (a) FSTAT = 0.8325, p-value = 0.3725 > 0.05, do not reject H0. There is not enough evidence to conclude that there is an interaction between zone 1 lower and zone 3 upper. (b) FSTAT = 0.3820, p-value is 0.5481 > 0.05, do not reject H0. There is insufficient evidence to conclude that there is an effect due to zone 1 lower. (c) FSTAT = 0.1048, p-value = 0.7517 > 0.05, do not reject H0. There is inadequate evidence to conclude that there is an effect due to zone 3 upper. (d) A large difference at a zone 3 upper of 695°C but only a small difference at zone 3 upper of 715°C. (e) Because this difference appeared on the cell means plot but the interaction was not statistically significant because of the large MSE, further testing should be done with larger sample sizes.

11.36 (a) Because FSTAT = 0.0111 < 2.9011, do not reject H0. (b) Because FSTAT = 0.8096 < 4.1491, do not reject H0. (c) Because FSTAT = 5.1999 > 2.9011, reject H0. (e) Critical range = 3.56. Only the means of Suppliers 1 and 2 are different. You can conclude that the mean strength is lower for Supplier 1 than for Supplier 2, but there are no statistically significant differences between Suppliers 1 and 3, Suppliers 1 and 4, Suppliers 2 and 3, Suppliers 2 and 4, and Suppliers 3 and 4. (f) FSTAT = 5.6998 > 2.8663 (p-value = 0.0027 < 0.05). There is evidence that the mean strength of suppliers is different. Critical range = 3.359. Supplier 1 has a mean strength that is less than suppliers 2 and 3.

11.38 (a) Because FSTAT = 0.075 < 3.68, do not reject H0. (b) Because FSTAT = 4.09 > 3.68, reject H0. (c) Critical range = 1.489. Breaking strength is significantly different between 30 and 50 psi.

11.40 (a) Because FSTAT = 0.1899 < 4.1132, do not reject H0. There is insufficient evidence to conclude that there is any interaction between type of breakfast and desired time. (b) Because FSTAT = 30.4434 > 4.1132, reject H0. There is sufficient evidence to conclude that there is an effect due to type of breakfast. (c) Because FSTAT = 12.4441 > 4.1132, reject H0. There is sufficient evidence to conclude that there is an effect due to desired time. (e) At the 5% level of significance, both the type of breakfast ordered and the desired time have an effect on delivery time difference. There is no interaction between the type of breakfast ordered and the desired time.

11.42 Interaction: FSTAT = 0.2169 < 3.9668 or p-value = 0.6428 > 0.05. There is insufficient evidence of an interaction between piece size and fill height. Piece size: FSTAT = 842.2242 > 3.9668 or p-value = 0.0000 < 0.05. There is evidence of an effect due to piece size. The fine piece size has a lower difference in coded weight. Fill height: FSTAT = 217.0816 > 3.9668 or p-value = 0.0000 < 0.05. There is evidence of an effect due to fill height. The low fill height has a lower difference in coded weight.
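
The one-way ANOVA arithmetic and the Tukey-Kramer critical range in 11.8 can be checked from the sums of squares. A minimal sketch with hypothetical function names: c = 4 groups and n = 40 observations are assumptions consistent with the 3 and 36 degrees of freedom shown, and Qα = 3.81 is taken from the answer. The square-root term alone is about 90.70, so multiplying by Qα gives a critical range of about 345.58.

```python
import math


def one_way_anova_f(ssa, ssw, c, n):
    """Return (MSA, MSW, FSTAT) for c groups and n total observations."""
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return msa, msw, msa / msw


def tukey_kramer_range(q_alpha, msw, n_j, n_k):
    """Critical range = Q * sqrt((MSW / 2) * (1/n_j + 1/n_k))."""
    return q_alpha * math.sqrt((msw / 2) * (1 / n_j + 1 / n_k))


# Problem 11.8 (a): sums of squares from the answer; c and n are assumed.
msa, msw, f_stat = one_way_anova_f(1_151_016.4750, 2_961_835.3000, c=4, n=40)

# Problem 11.8 (b): Q = 3.81 from the answer, 10 observations per region.
crit_range = tukey_kramer_range(3.81, msw, 10, 10)
print(round(f_stat, 4), round(crit_range, 2))
```

Two group means are declared different when the absolute difference between their sample means exceeds the critical range.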

CHAPTER 12

12.2 (a) For df = 1 and α = 0.05, χ²α = 3.841. (b) For df = 1 and α = 0.025, χ²α = 5.024.
(c) For df = 1 and α = 0.01, χ²α = 6.635.

12.4 (a) All fe = 25. (b) Because χ²STAT = 4.00 > 3.841, reject H0.

12.6 (a) H0: π1 = π2. H1: π1 ≠ π2. (b) Because χ²STAT = 79.29 > 3.841, reject H0. There is
evidence to conclude that the population proportion of those who viewed the brand on general TV
was different from that of those who viewed the brand on VOD D4+. p-value = 0.0000. The
probability of obtaining a test statistic of 79.29 or larger when the null hypothesis is true is
0.0000. (c) You should not compare the results in (a) to those of Problem 10.30 (b) because that
was a one-tail test.

12.8 (a) H0: π1 = π2. H1: π1 ≠ π2. Because χ²STAT = (326 - 339.0878)²/339.0878 +
(97 - 83.9122)²/83.9122 + (167 - 153.9122)²/153.9122 + (25 - 38.0878)²/38.0878 = 8.1566 > 6.635,
reject H0. There is evidence of a difference between organizations with 500 to 2,499 employees
and organizations with 2,500+ employees with respect to the proportion that have employee
recognition programs. (b) p-value = 0.0043. The probability of obtaining a difference in
proportions that gives rise to a test statistic above 8.1566 is 0.0043 if there is no difference
in the proportion in the two groups. (c) The results of (a) and (b) are exactly the same as those
of Problem 10.32. The χ² in (a) and the Z in Problem 10.32 (a) satisfy the relationship that
χ² = 8.1566 = Z² = (-2.856)², and the p-value in (b) is exactly the same as the p-value computed
in Problem 10.32 (b).

12.10 (b) Because χ²STAT = 19.9467 > 3.841, reject H0. There is evidence of a significant
difference between the proportion of co-browsing organizations and non-co-browsing organizations
that use skills-based routing to match the caller with the right agent. (c) The p-value is
virtually zero. The probability of obtaining a test statistic of 19.9467 or larger when the null
hypothesis is true is 0.0000. (d) The results are identical because (4.4662)² = 19.9467.

12.12 (a) The expected frequencies for the first row are 20, 30, and 40. The expected frequencies
for the second row are 30, 45, and 60. (b) Because χ²STAT = 12.5 > 5.991, reject H0.

12.14 (a) Because the calculated test statistic 46.4046 is greater than the critical value of
7.8147, you reject H0 and conclude that there is evidence of a difference among the age groups in
the proportion of smartphone owners who have reached the maximum amount of data they are allowed
to use as part of their plan, at least on occasion. (b) p-value = 0.0000. The probability of
obtaining a data set that gives rise to a test statistic of 46.4046 or more is 0.0000 if there is
no difference in the proportion who have reached the maximum amount of data they are allowed to
use as part of their plan, at least on occasion. (c) There is a significant difference between
18- to 29-year-olds and both 50- to 64-year-olds and those 65 and older. There is a significant
difference between 30- to 49-year-olds and both 50- to 64-year-olds and those 65 and older.

12.16 (a) H0: π1 = π2 = π3. H1: At least one proportion differs.

Observed Frequencies
                              Group
Compensation value     BE        HR        Employees   Total
Yes                    28        76        66          170
No                     172       124       134         430
Total                  200       200       200         600

Expected Frequencies
                              Group
Compensation value     BE        HR        Employees   Total
Yes                    56.6667   56.6667   56.6667     170
No                     143.3333  143.3333  143.3333    430
Total                  200       200       200         600

Data
Level of Significance        0.05
Number of Rows               2
Number of Columns            3
Degrees of Freedom           2

Results
Critical Value               5.9915
Chi-Square Test Statistic    31.5841
p-Value                      0.0000
Reject the null hypothesis

Because 31.5841 > 5.9915, reject H0. There is a significant difference among business groups with
respect to the proportion that say compensation (pay and rewards) makes for a unique and
compelling EVP. (b) p-value = 0.0000. The probability of a test statistic greater than 31.5841 is
0.0000. (c)

Level of Significance            0.05
Square Root of Critical Value    2.4477

Sample Proportions
Group 1    0.14
Group 2    0.38
Group 3    0.33

Marascuilo Table
Proportions            Absolute Differences   Critical Range
|Group 1 - Group 2|    0.24                   0.1033           Significant
|Group 1 - Group 3|    0.19                   0.1011           Significant
|Group 2 - Group 3|    0.05                   0.1170           Not significant

Business executives are different from HR leaders and from employees.

12.18 (a) Because χ²STAT = 31.6888 > 5.9915, reject H0. There is evidence of a difference between
the groups in the percentage who use their device to check social media while watching TV.
(b) p-value = 0.0000. (c) Cellphone versus computer: 0.1616 > 0.0835. Significant. Cellphone
versus tablet: 0.1805 > 0.0917. Significant. Computer versus tablet: 0.0188 < 0.0998. Not
significant. The cellphone group is different from the computer and tablet groups.

12.20 df = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6.

12.22 χ²STAT = 92.1028 > 16.919, reject H0 and conclude that there is evidence of a relationship
between the type of dessert ordered and the type of entrée ordered.
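
The chi-square statistic and the Marascuilo critical ranges in Answer 12.16 can be checked directly from the observed frequencies. The following is a minimal Python sketch under stated assumptions: the helper names `chi_square_stat` and `marascuilo` are hypothetical, and the critical value 5.9915 is taken from the Results table above rather than computed.

```python
import math


def chi_square_stat(observed):
    """Chi-square statistic for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # expected frequency
            stat += (f_o - f_e) ** 2 / f_e
    return stat


def marascuilo(successes, sizes, chi2_critical):
    """Pairwise absolute differences and critical ranges for c proportions."""
    p = [x / n for x, n in zip(successes, sizes)]
    root = math.sqrt(chi2_critical)
    rows = []
    for j in range(len(p)):
        for k in range(j + 1, len(p)):
            diff = abs(p[j] - p[k])
            crit = root * math.sqrt(p[j] * (1 - p[j]) / sizes[j]
                                    + p[k] * (1 - p[k]) / sizes[k])
            rows.append((j + 1, k + 1, diff, crit, diff > crit))
    return rows


observed = [[28, 76, 66], [172, 124, 134]]   # Yes / No rows from Answer 12.16
chi2 = chi_square_stat(observed)
pairs = marascuilo(observed[0], [200, 200, 200], chi2_critical=5.9915)

print(round(chi2, 4))
for g1, g2, diff, crit, significant in pairs:
    print(f"|Group {g1} - Group {g2}|  {diff:.2f}  {crit:.4f}  "
          f"{'Significant' if significant else 'Not significant'}")
```

Computing the sample proportions from the counts (28/200, 76/200, 66/200) gives 0.14, 0.38, and 0.33, and the critical ranges agree with the table to four decimal places.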

12.24 H0: There is no relationship between the frequency of posting on Facebook and age.
H1: There is a relationship between the frequency of posting on Facebook and age.

Chi-Square Test

Observed Frequencies
                                   Age Group
Frequency            16–17     18–29     30–49     50–64     65+       Total
Several              36        322       353       147       64        922
Once a day           4         69        135       100       48        356
A few times a week   20        55        90        74        27        266
Every few weeks      4         11        8         25        7         55
Less often           4         14        21        25        11        75
Total                68        471       607       371       157       1,674

Expected Frequencies
                                   Age Group
Frequency            16–17     18–29     30–49     50–64     65+       Total
Several              37.453    259.416   334.321   204.338   86.472    922
Once a day           14.461    100.165   129.087   78.898    33.388    356
A few times a week   10.805    74.84     96.453    58.952    24.947    266
Every few weeks      2.234     15.475    19.943    12.189    5.1583    55
Less often           3.0466    21.102    27.195    16.622    7.034     75
Total                68        471       607       371       157       1,674

Data
Level of Significance        0.01
Number of Rows               5
Number of Columns            5
Degrees of Freedom           16

Results
Critical Value               31.99993
Chi-Square Test Statistic    119.7494
p-Value                      6.14E-18
Reject the null hypothesis
Expected frequency assumption is met.

Decision: Because χ²STAT = 119.7494 > 31.9999, reject H0. There is evidence to conclude that
there is a relationship between the frequency of Facebook posts and age.

12.26 Because χ²STAT = 81.6061 > 47.3999, reject H0. There is evidence of a relationship between
identified main opportunity and geographic region.

12.28 (a) 31. (b) 29. (c) 27. (d) 25.

12.30 40 and 79.

12.32 (a) The ranks for Sample 1 are 1, 2, 4, 5, and 10. The ranks for Sample 2 are 3, 6.5, 6.5,
8, 9, and 11. (b) 22. (c) 44.

12.34 Because T1 = 22 > 20, do not reject H0.

12.36 (a) The data are ordinal. (b) The two-sample t test is inappropriate because the data can
only be placed in ranked order. (c) Because ZSTAT = -2.2054 < -1.96, reject H0. There is evidence
of a significant difference in the median rating of California Cabernets and Washington Cabernets.

12.38 (a) H0: M1 = M2, where Populations: 1 = Wing A, 2 = Wing B. H1: M1 ≠ M2.
Population 1 sample: Sample size 20, sum of ranks 561
Population 2 sample: Sample size 20, sum of ranks 259

$$\mu_{T_1} = \frac{n_1(n + 1)}{2} = \frac{20(40 + 1)}{2} = 410$$

$$\sigma_{T_1} = \sqrt{\frac{n_1 n_2 (n + 1)}{12}} = \sqrt{\frac{20(20)(40 + 1)}{12}} = 36.9685$$

$$Z_{STAT} = \frac{T_1 - \mu_{T_1}}{\sigma_{T_1}} = \frac{561 - 410}{36.9685} = 4.0846$$

Decision: Because ZSTAT = 4.0846 > 1.96 (or p-value = 0.0000 < 0.05), reject H0. There is
sufficient evidence of a difference in the median delivery time in the two wings of the hotel.
(b) The results of (a) are consistent with the results of Problem 10.65.

12.40 (a) Because ZSTAT = 2.1342 > 1.96, reject H0. There is evidence to conclude that there is a
difference in the median brand value between the two sectors. (b) You must assume approximately
equal variability in the two populations. (c) Using both the pooled-variance t test and the
separate-variance t test in Problem 10.17, you rejected the null hypothesis and concluded that
the mean brand value is different between the two sectors. In this test, using the Wilcoxon rank
sum test with the large-sample Z approximation, you also rejected the null hypothesis and
concluded that the median brand value differs between the two sectors.

12.42 (a) Because -1.96 < ZSTAT = 1.1687 < 1.96 (or the p-value = 0.2425 > 0.05), do not reject
H0. There is not enough evidence to conclude that there is a difference in the median rating of
60-second and 30-second ads. (b) You must assume approximately equal variability in the two
populations. (c) Using the pooled-variance t test in Problem 10.11 (a), you do not reject the
null hypothesis (-2.0040 < tSTAT = 0.7949 < 2.0040; p-value = 0.4301 > 0.05) and conclude that
there is insufficient evidence of a difference in the mean rating of 60-second and 30-second ads.

12.44 (a) Decision rule: If H > χ²U = 15.086, reject H0. (b) Because H = 13.77 < 15.086, do not
reject H0.

12.46 (a) H = 13.517 > 7.815, p-value = 0.0036 < 0.05, reject H0. There is sufficient evidence of
a difference in the median waiting time in the four locations. (b) The results are consistent
with those of Problem 11.9.

12.48 (a) H = 19.3269 > 9.488, reject H0. There is evidence of a difference in the median ratings
of the ads. (b) The results are consistent with those of Problem 11.10. (c) Because the combined
scores are not true continuous variables, the nonparametric Kruskal-Wallis rank test is more
appropriate because it does not require that the scores are normally distributed.

12.50 (a) Because H = 13.0522 > 7.815 or the p-value is 0.0045, reject H0. There is sufficient
evidence of a difference in the median cost associated with importing a standardized cargo of
goods by sea transport across the global regions. (b) The results are the same.

12.56 (a) Because χ²STAT = 0.412 < 3.841, do not reject H0. There is insufficient evidence to
conclude that there is a relationship between a student’s gender and pizzeria selection.
(b) Because χ²STAT = 2.624 < 3.841, do not reject H0. There is insufficient evidence to conclude
that there is a relationship between a student’s gender and pizzeria selection. (c) Because
χ²STAT = 4.956 < 5.991, do not reject H0. There is insufficient evidence to conclude that there
is a relationship between price and pizzeria selection. (d) p-value = 0.0839. The probability of
a sample that gives a test statistic equal to or greater than 4.956 is 8.39% if the null
hypothesis of no relationship between price and pizzeria selection is true.
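
The large-sample Z approximation used in Answer 12.38 depends only on the two sample sizes and the rank sum T1, so it is easy to verify. A minimal Python sketch follows; the function name `wilcoxon_rank_sum_z` is a hypothetical helper introduced for illustration.

```python
import math


def wilcoxon_rank_sum_z(t1, n1, n2):
    """Large-sample Z approximation for the Wilcoxon rank sum statistic T1."""
    n = n1 + n2
    mu_t1 = n1 * (n + 1) / 2                       # mean of T1 under H0
    sigma_t1 = math.sqrt(n1 * n2 * (n + 1) / 12)   # standard deviation of T1 under H0
    return mu_t1, sigma_t1, (t1 - mu_t1) / sigma_t1


# Values from Answer 12.38: n1 = n2 = 20, sum of ranks for sample 1 = 561
mu_t1, sigma_t1, z_stat = wilcoxon_rank_sum_z(t1=561, n1=20, n2=20)
print(mu_t1, round(sigma_t1, 4), round(z_stat, 4))
```

As a sanity check on the inputs, the two rank sums in the answer (561 and 259) add up to n(n + 1)/2 = 40(41)/2 = 820, as they must.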

12.58 (a) Because χ²STAT = 7.4298 < 9.4877; p-value = 0.1148 > 0.05, do not reject H0. There is
not enough evidence to conclude that there is a difference in the proportion of organizations
that have embarked on digital transformation on the basis of industry sector. (b) Because
χ²STAT = 38.09 > 21.0261; p-value = 0.0001 < 0.05, reject H0. There is evidence of a relationship
between digital transformation progress and industry sector.

CHAPTER 13

13.2 (a) Yes. (b) No. (c) No. (d) Yes.

13.4 (a) The scatter plot shows a positive linear relationship. (b) For each increase of 1.0 in
alcohol percentage, the predicted mean wine quality is estimated to increase by 0.5624.
(c) Ŷ = 5.2715. (d) Wine quality appears to be affected by the alcohol percentage. Each increase
of 1% in alcohol leads to a mean increase in wine quality of a little more than half a unit.

13.6 (b) b0 = -13,130.6592, b1 = 2.4218. (c) For each increase of $1,000 in tuition, the mean
starting salary is predicted to increase by $2,421.80. (d) $109,047.01. (e) Starting salary seems
higher for those schools that have a higher tuition.

13.8 (b) b0 = -1,039.5317, b1 = 8.5816. (c) For each additional million-dollar increase in
revenue, the mean value is predicted to increase by an estimated $8.5816 million. Literal
interpretation of b0 is not meaningful because an operating franchise cannot have zero revenue.
(d) $1,105.864 million. (e) The value of the franchise can be expected to increase as revenue
increases.

13.10 (b) b0 = -0.7744, b1 = 1.4030. (c) For each increase of one million YouTube trailer views,
the predicted weekend box office gross is estimated to increase by $1.4030 million.
(d) $27.2847 million. (e) You can conclude that the mean predicted increase in weekend box office
gross is $1.4030 million for each million increase in YouTube trailer views.

13.12 SST = 40, r² = 0.90. 90% of the variation in the dependent variable can be explained by the
variation in the independent variable.

13.14 r² = 0.75. 75% of the variation in the dependent variable can be explained by the variation
in the independent variable.

13.16 (a)

$$r^2 = \frac{SSR}{SST} = \frac{21.8677}{64.0000} = 0.3417$$

34.17% of the variation in wine quality can be explained by the variation in the percentage of
alcohol.
(b)

$$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - 2}} = \sqrt{\frac{42.1323}{48}} = 0.9369$$

(c) Based on (a) and (b), the model should be somewhat useful for predicting wine quality.

13.18 (a) r² = 0.7665. 76.65% of the variation in starting salary can be explained by the
variation in tuition. (b) SYX = 15,944.3807. (c) Based on (a) and (b), the model should be very
useful for predicting the starting salary.

13.20 (a) r² = 0.9612. 96.12% of the variation in the value of a baseball franchise can be
explained by the variation in its annual revenue. (b) SYX = 140.8188. (c) Based on (a) and (b),
the model should be very useful for predicting the value of a baseball franchise.

13.22 (a) r² = 0.6676. 66.76% of the variation in weekend box office gross can be explained by
the variation in YouTube trailer views. (b) SYX = 19.4447. (c) Based on (a) and (b), the model
should be useful for predicting weekend box office gross. (d) Other variables that might explain
the variation in weekend box office gross could be the amount spent on advertising, the timing of
the release of the movie, and the type of movie.

13.24 A residual analysis of the data indicates a pattern, with sizable clusters of consecutive
residuals that are either all positive or all negative. This pattern indicates a violation of the
assumption of linearity. A curvilinear model should be investigated.

13.26 There does not appear to be a pattern in the residual plot. The assumptions of regression
do not appear to be seriously violated.

13.28 Based on the residual plot, the assumption of equal variance may be violated.

13.30 Based on the residual plot, there is no evidence of a pattern.

13.32 (a) An increasing linear relationship exists. (b) There is evidence of a strong positive
autocorrelation among the residuals.

13.34 (a) No, because the data were not collected over time. (b) If a single store had been
selected and studied over a period of time, you would compute the Durbin-Watson statistic.

13.36 (a)

$$b_1 = \frac{SSXY}{SSX} = \frac{201{,}399.05}{12{,}495{,}626} = 0.0161$$

$$b_0 = \bar{Y} - b_1\bar{X} = 71.2621 - 0.0161(4{,}393) = 0.4576$$

(b) Ŷ = 0.4576 + 0.0161X = 0.4576 + 0.0161(4,500) = 72.9867, or $72,987. (c) There is no evidence
of a pattern in the residuals over time.
(d)

$$D = \frac{\sum_{i=2}^{n}(e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{1{,}243.2244}{599.0683} = 2.08 > 1.45$$

There is no evidence of positive autocorrelation among the residuals. (e) Based on a residual
analysis, the model appears to be adequate.

13.38 (a) b0 = -2.535, b1 = 0.06073. (b) $2,505.40. (d) D = 1.64 > dU = 1.42, so there is no
evidence of positive autocorrelation among the residuals. (e) The plot shows some nonlinear
pattern, suggesting that a nonlinear model might be better. Otherwise, the model appears to be
adequate.

13.40 (a) 3.00. (b) ±2.1199. (c) Reject H0. There is evidence that the fitted linear regression
model is useful. (d) 1.32 ≤ β1 ≤ 7.68.

13.42 (a)

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.5624}{0.1127} = 4.9913 > 2.0106$$

Reject H0. There is evidence of a linear relationship between the percentage of alcohol and wine
quality.
(b)

$$b_1 \pm t_{\alpha/2} S_{b_1} = 0.5624 \pm 2.0106(0.1127)$$

0.3359 ≤ β1 ≤ 0.7890.

13.44 (a) tSTAT = 10.7174 > 2.0301; p-value = 0.0000 < 0.05, reject H0. There is evidence of a
linear relationship between tuition and starting salary. (b) 1.963 ≤ β1 ≤ 2.8805.

13.46 (a) tSTAT = 26.3347 > 2.0484 or because the p-value is 0.0000, reject H0 at the 5% level of
significance. There is evidence of a linear relationship between annual revenue and franchise
value. (b) 7.9141 ≤ β1 ≤ 9.2491.

13.48 (a) tSTAT = 11.3381 > 1.9977 or because the p-value = 0.0000 < 0.05, reject H0. There is
evidence of a linear relationship between YouTube trailer views and weekend box office gross.
(b) 1.1558 ≤ β1 ≤ 1.6501.

13.50 (a) (% daily change in SPUU) = b0 + 2.0 (% daily change in S&P 500 index). (b) If the
S&P 500 gains 10% in a year, SPUU is expected

to gain an estimated 20%. (c) If the S&P 500 loses 20% in a year, SPUU is expected to lose an
estimated 40%. (d) Risk takers will be attracted to leveraged funds, and risk-averse investors
will stay away.

13.52 (a), (b) First weekend and U.S. gross: r = 0.7284, tSTAT = 2.6042 > 2.4469,
p-value = 0.0404 < 0.05. Reject H0. At the 0.05 level of significance, there is evidence of a
linear relationship between first weekend sales and U.S. gross. First weekend and worldwide
gross: r = 0.8233, tSTAT = 3.5532 > 2.4469, p-value = 0.0120 < 0.05. Reject H0. At the 0.05 level
of significance, there is evidence of a linear relationship between first weekend sales and
worldwide gross. U.S. gross and worldwide gross: r = 0.9642, tSTAT = 8.9061 > 2.4469,
p-value = 0.0001 < 0.05. Reject H0. At the 0.05 level of significance, there is evidence of a
linear relationship between U.S. gross and worldwide gross.

13.54 (a) r = 0.3002. There is an insignificant linear relationship between social media
networking and the GDP per capita. (b) tSTAT = 1.6048, p-value = 0.1206 > 0.05. Do not reject H0.
At the 0.05 level of significance, there is insufficient evidence of a linear relationship
between social media networking and the GDP per capita. (c) There does not appear to be a linear
relationship.

13.56 (a) 15.95 ≤ μY|X=4 ≤ 18.05. (b) 14.651 ≤ YX=4 ≤ 19.349. (c) The intervals in this problem
are wider than in Problem 13.55 because they involve X values that are different from the mean.

13.58 (a)

$$\hat{Y} = -0.3529 + (0.5624)(10) = 5.2715$$

$$\hat{Y} \pm t_{\alpha/2} S_{YX}\sqrt{h_i} = 5.2715 \pm 2.0106(0.9369)\sqrt{0.0249}$$

4.9741 ≤ μY|X=10 ≤ 5.5690.
(b)

$$\hat{Y} \pm t_{\alpha/2} S_{YX}\sqrt{1 + h_i} = 5.2715 \pm 2.0106(0.9369)\sqrt{1 + 0.0249}$$

3.3645 ≤ YX=10 ≤ 7.1786.
(c) Part (b) provides a prediction interval for the individual response given a specific value of
the independent variable, and part (a) provides a confidence interval estimate for the mean
value, given a specific value of the independent variable. Because there is much more variation
in predicting an individual value than in estimating a mean value, a prediction interval is wider
than a confidence interval estimate.

13.60 (a) $103,638.95 ≤ μY|X=50,450 ≤ $114,455.06. (b) $76,229.52 ≤ YX=50,450 ≤ $141,864.49.
(c) You can estimate a mean more precisely than you can predict a single observation.

13.62 (a) 1,043.1911 ≤ μY|X=250 ≤ 1,168.5370. (b) 810.6799 ≤ YX=250 ≤ 1,401.0480. (c) Because
there is much more variation in predicting an individual value than in estimating a mean, the
prediction interval is wider than the confidence interval.

13.74 (a) b0 = 24.84, b1 = 0.14. (b) For each additional case, the predicted delivery time is
estimated to increase by 0.14 minute. The interpretation of the Y intercept is not meaningful
because the number of cases delivered cannot be 0. (c) 45.84. (d) No, 500 is outside the relevant
range of the data used to fit the regression equation. (e) r² = 0.972. (f) There is no obvious
pattern in the residuals, so the assumptions of regression are met. The model appears to be
adequate. (g) tSTAT = 24.88 > 2.1009; reject H0. (h) 44.88 ≤ μY|X=150 ≤ 46.80.
41.56 ≤ YX=150 ≤ 50.12. (i) The number of cases explains almost all of the variation in delivery
time.

13.76 (a) b0 = 326.5935, b1 = 0.0835. (b) For each additional square foot of living space in the
house, the mean asking price is predicted to increase by $83.50. The estimated asking price of a
house with 0 living space is 326.5935 thousand dollars. However, this interpretation is not
meaningful because the living space of the house cannot be 0. (c) Ŷ = 493.6769 thousand dollars.
(d) r² = 0.3979. So 39.79% of the variation in asking price is explained by the variation in
living space. (e) Neither the residual plot nor the normal probability plot reveals any potential
violation of the linearity, equal variance, and normality assumptions. (f) tSTAT = 6.2436 > 2.0010,
p-value is 0.0000. Because p-value < 0.05, reject H0. There is evidence of a linear relationship
between asking price and living space. (g) 0.0568 ≤ β1 ≤ 0.1103. (h) The living space in the
house is somewhat useful in predicting the asking price, but because only 39.79% of the variation
in asking price is explained by variation in living space, other variables should be considered.

13.78 (a) b0 = 21.2034, b1 = -0.1517. (b) For each additional point on the efficiency ratio, the
predicted mean tangible common equity (ROATCE) is estimated to decrease by 0.1517. For an
efficiency of 0, the predicted mean tangible common equity (ROATCE) is 21.2034. (c) 12.0989.
(d) r² = 0.1882. (e) There is no obvious pattern in the residuals, so the assumptions of
regression are met. The model appears to be adequate. (f) tSTAT = -4.7662 < -1.9845; reject H0.
There is evidence of a linear relationship between efficiency ratio and tangible common equity
(ROATCE). (g) 11.4060 ≤ μY|X=60 ≤ 12.7918, 5.1534 ≤ YX=60 ≤ 19.0444. (h) -0.2149 ≤ β1 ≤ -0.0886.
(i) There is a small relationship between efficiency ratio and tangible common equity (ROATCE).

13.80 (a) There is no clear relationship shown on the scatter plot. (c) Looking at all 23
flights, when the temperature is lower, there is likely to be some O-ring damage, particularly if
the temperature is below 60 degrees. (d) 31 degrees is outside the relevant range, so a
prediction should not be made. (e) Predicted Y = 18.036 - 0.240X, where X = temperature and
Y = O-ring damage. (g) A nonlinear model would be more appropriate. (h) The appearance on the
residual plot of a nonlinear pattern indicates that a nonlinear model would be better. It also
appears that the normality assumption is invalid.

13.82 (a) b0 = -893.4994, b1 = 12.3871. (b) For each additional million-dollar increase in
revenue, the franchise value will increase by an estimated $12.3871 million. Literal
interpretation of b0 is not meaningful because an operating franchise cannot have zero revenue.
(c) $964.5599 million. (d) r² = 0.8251. 82.51% of the variation in the value of an NBA franchise
can be explained by the variation in its annual revenue. (e) There does not appear to be a
pattern in the residual plot. The assumptions of regression do not appear to be seriously
violated. (f) tSTAT = 11.493 > 2.0484 or because the p-value is 0.0000, reject H0 at the 5% level
of significance. There is evidence of a linear relationship between annual revenue and franchise
value. (g) 852.6812 ≤ μY|X=150 ≤ 1,076.439. (h) 405.1897 ≤ YX=150 ≤ 1,523.93. (i) The strength of
the relationship between revenue and value is approximately the same for NBA franchises and for
European soccer teams but lower than for Major League Baseball teams.

13.84 (a) b0 = -2,629.222, b1 = 82.472. (b) For each additional centimeter in circumference, the
weight is estimated to increase by 82.472 grams. (c) 2,319.08 grams. (d) Yes, because
circumference is a very strong predictor of weight. (e) r² = 0.937. (f) There appears to be a
nonlinear relationship between circumference and weight. (g) p-value is virtually 0 < 0.05;
reject H0. (h) 72.7875 ≤ β1 ≤ 92.156.

13.86 (a) The correlation between compensation and stock performance is 0.0550.
(b) tSTAT = 0.7757; p-value = 0.4388 > 0.05. The correlation between compensation and stock
performance is not significant; only 0.3% of the variation in compensation can be explained by
return. (c) The small correlation between compensation and stock performance was surprising (or
maybe it shouldn’t have been!).

CHAPTER 14

14.2 (a) For each one-unit increase in X1, you estimate that the mean of Y will decrease 2 units,
holding X2 constant. For each one-unit increase in X2, you estimate that the mean of Y will
increase 7 units, holding X1 constant. (b) The Y intercept, equal to 50, estimates the value of Y
when both X1 and X2 are 0.
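
The confidence interval and prediction interval in Answer 13.58 differ only in the term under the square root (h_i versus 1 + h_i). The Python sketch below recomputes both from the rounded values printed in that answer; the function name `interval_estimates` is hypothetical, and because the inputs are rounded, the bounds agree with the printed ones only to about three decimal places.

```python
import math


def interval_estimates(y_hat, t_crit, s_yx, h_i):
    """Return (confidence interval for the mean, prediction interval for an individual)."""
    ci_half = t_crit * s_yx * math.sqrt(h_i)        # half-width for the mean response
    pi_half = t_crit * s_yx * math.sqrt(1 + h_i)    # half-width for an individual response
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Rounded values from Answer 13.58: Y-hat = 5.2715, t = 2.0106, S_YX = 0.9369, h_i = 0.0249
ci, pi = interval_estimates(y_hat=5.2715, t_crit=2.0106, s_yx=0.9369, h_i=0.0249)
print([round(v, 3) for v in ci])
print([round(v, 3) for v in pi])
```

Because 1 + h_i is always larger than h_i, the prediction interval is necessarily wider than the confidence interval at the same X value, which is the point made in part (c) of that answer.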

14.4 (a) Ŷ = 1.3960 - 0.0117X1 + 0.0286X2. (b) For a given capital adequacy, for each increase of
1% in efficiency ratio, ROAA decreases by 0.0117%. For a given efficiency ratio, for each
increase of 1% in capital adequacy, ROAA increases by 0.0286%. (c) Ŷ = 1.1214.
(d) 1.0798 ≤ μY|X ≤ 1.1629. (e) 0.5679 ≤ YX ≤ 1.6749. (f) The interval in (d) is narrower because
it is estimating the mean value, not an individual value. (g) The model uses both the efficiency
ratio and capital adequacy to predict ROAA. This may produce a better model than if only one of
these independent variables is included.

14.6 (a) Ŷ = 301.78 + 3.4771X1 + 41.041X2. (b) For a given amount of voluntary turnover, for each
increase of $1 billion in worldwide revenue, the mean number of full-time jobs added is predicted
to increase by 3.4771. For a given worldwide revenue, for each increase of 1% in voluntary
turnover, the mean number of full-time jobs added is predicted to increase by 41.041. (c) The Y
intercept has no meaning in this problem. (d) Holding the other independent variable constant,
voluntary turnover has a higher slope than worldwide revenue.

14.8 (a) Ŷ = 532.2883 + 407.1346X1 - 2.8257X2, where X1 = land area, X2 = age. (b) For a given
age, each increase by one acre in land area is estimated to result in an increase in the mean
fair market value by $407.1346 thousands. For a given land area, each increase of one year in age
is estimated to result in a decrease in the mean fair market value by $2.8257 thousands. (c) The
interpretation of b0 has no practical meaning here because it would represent the estimated fair
market value of a new house that has no land area. (d) Ŷ = $478.6577 thousands.
(e) 446.8367 ≤ μY|X ≤ 510.4788. (f) 307.2577 ≤ YX ≤ 650.0577.

14.10 (a) MSR = 15, MSE = 12. (b) 1.25. (c) FSTAT = 1.25 < 4.10; do not reject H0. (d) 0.20. 20%
of the variation in Y is explained by variation in X. (e) 0.04.

14.12 The p-value for revenue is 0.0395 < 0.05 and the p-value for efficiency is less than
0.0001 < 0.05. Reject H0 for each of the independent variables. There is evidence of a
significant linear relationship with each of the independent variables.

14.14 (a) FSTAT = 37.8384 > 3.00; reject H0. (b) p-value = 0.0000. The probability of obtaining
an FSTAT value > 37.8384 if the null hypothesis is true is 0.0000. (c) r² = 0.2785. 27.85% of the
variation in ROAA can be explained by variation in efficiency ratio and variation in risk-based
capital. (d) r²adj = 0.2712.

14.16 (a) FSTAT = 1.95 < 3.15; do not reject H0. There is insufficient evidence of a significant
linear relationship. (b) p-value = 0.1512. The probability of obtaining an FSTAT value > 1.95 if
the null hypothesis is true is 0.1512. (c) r² = 0.0610. 6.10% of the variation in full-time jobs
added can be explained by variation in worldwide revenue and variation in full-time voluntary
turnover. (d) r²adj = 0.0297.

14.18 (a)–(e) Based on a residual analysis, there is no evidence of a violation of the
assumptions of regression.

14.20 (a) There is no evidence of a violation of the assumptions. (b) Because the data are not
collected over time, the Durbin-Watson test is not appropriate. (c) They are valid.

14.22 (a) The residual analysis reveals no patterns. (b) Because the data are not collected over
time, the Durbin-Watson test is not appropriate. (c) There are no apparent violations in the
assumptions.

14.24 (a) Variable X2 has a larger slope in terms of the t statistic of 3.75 than variable X1,
which has a smaller slope in terms of the t statistic of 3.33. (b) 1.46824 ≤ β1 ≤ 6.53176.
(c) For X1: tSTAT = 3.33 > 2.1098. Reject H0. There is evidence that X1 contributes to a model
already containing X2. For X2: tSTAT = 3.75 > 2.1098. Reject H0. There is evidence that X2
contributes to a model already containing X1. Both X1 and X2 should be included in the model.

14.26 (a) 95% confidence interval on β1: b1 ± tSb1, -0.0117 ± 1.98(0.0022),
-0.0161 ≤ β1 ≤ -0.0074. (b) For X1: tSTAT = b1/Sb1 = -0.0117/0.0022 = -5.3415 < -1.98. Reject H0.
There is evidence that X1 contributes to a model already containing X2. For X2:
tSTAT = b2/Sb2 = 0.0286/0.0054 = 5.2992 > 1.98. Reject H0. There is evidence that X2 contributes
to a model already containing X1. Both X1 (efficiency ratio) and X2 (total risk-based capital)
should be included in the model.

14.28 (a) -5.8682 ≤ β1 ≤ 12.8225. (b) For X1: tSTAT = 0.7443 < 2.0003. Do not reject H0. There is
insufficient evidence that X1 contributes to a model already containing X2. For X2:
tSTAT = 1.8835 < 2.0003. Do not reject H0. There is insufficient evidence that X2 contributes to
a model already containing X1. Neither variable contributes to a model that includes the other
variable. You should consider using only a simple linear regression model.

14.30 (a) 274.1702 ≤ β1 ≤ 540.0990. (b) For X1: tSTAT = 6.2827 and p-value = 0.0000. Because
p-value < 0.05, reject H0. There is evidence that X1 contributes to a model already containing
X2. For X2: tSTAT = -4.1475 and p-value = 0.0003. Because p-value < 0.05, reject H0. There is
evidence that X2 contributes to a model already containing X1. Both X1 (land area) and X2 (age)
should be included in the model.

14.32 (a) For X1: FSTAT = 1.25 < 4.96; do not reject H0. For X2: FSTAT = 0.833 < 4.96; do not
reject H0. (b) 0.1111, 0.0769.

14.34 (a) For X1: SSR(X1 | X2) = SSR(X1 and X2) - SSR(X2) = 5.9271 - 3.6923 = 2.2348.

$$F_{STAT} = \frac{SSR(X_1 \mid X_2)}{MSE} = \frac{2.2348}{15.3521/196} = 28.5227 > 3.897$$

Reject H0. There is evidence that X1 contributes to a model already containing X2.
For X2: SSR(X2 | X1) = SSR(X1 and X2) - SSR(X1) = 5.9271 - 3.7275 = 2.1996.

$$F_{STAT} = \frac{SSR(X_2 \mid X_1)}{MSE} = \frac{2.1996}{15.3521/196} = 28.0823 > 3.897$$

Reject H0. There is evidence that X2 contributes to a model already containing X1. Because both
X1 and X2 make a significant contribution to the model in the presence of the other variable,
both variables should be included in the model.
(b)

$$r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} = \frac{2.2348}{21.2791 - 5.9271 + 2.2348} = 0.1271$$

Holding constant the effect of the total risk-based capital, 12.71% of the variation in ROAA can
be explained by the variation in efficiency ratio.

$$r^2_{Y2.1} = \frac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)} = \frac{2.1996}{21.2791 - 5.9271 + 2.1996} = 0.1253$$

Holding constant the effect of efficiency ratio, 12.53% of the variation in ROAA can be explained
by the variation in the total risk-based capital.

14.36 (a) For X1: FSTAT = 0.554 < 4.00; do not reject H0. There is insufficient evidence that X1
contributes to a model already containing X2. For X2: FSTAT = 3.5476 < 4.00. Do not reject H0.
There is insufficient evidence that X2 contributes to a model already containing X1. Because
neither variable makes a significant contribution to the model in the presence of the other
variable, you should consider using only a simple linear regression model. (b) r²Y1.2 = 0.0091.
Holding constant the effect of full-time voluntary turnover, 0.91% of the variation in full-time
jobs added can be explained by the variation in total worldwide revenue. r²Y2.1 = 0.0558. Holding
constant the effect of total worldwide revenue, 5.58% of the variation in full-time jobs added
can be explained by the variation in full-time voluntary turnover.
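
The partial F statistics and coefficients of partial determination in Answer 14.34 follow from four sums of squares and the error degrees of freedom. The Python sketch below recomputes them from the rounded values printed in that answer; the helper names `partial_f` and `partial_r2` are hypothetical. With these rounded inputs the first F statistic comes out near 28.53 rather than the printed 28.5227, presumably because the printed figure was computed from unrounded sums of squares.

```python
def partial_f(ssr_full, ssr_other, sse, df_error):
    """Contribution of one variable given the other, and its partial F statistic."""
    contribution = ssr_full - ssr_other   # SSR(Xj | Xk) = SSR(X1 and X2) - SSR(Xk)
    mse = sse / df_error
    return contribution, contribution / mse


def partial_r2(sst, ssr_full, contribution):
    """Coefficient of partial determination for one variable given the other."""
    return contribution / (sst - ssr_full + contribution)


# Rounded values from Answer 14.34
SST, SSR_FULL, SSE, DF_ERROR = 21.2791, 5.9271, 15.3521, 196
SSR_X1_ALONE, SSR_X2_ALONE = 3.7275, 3.6923

contrib_x1, f_x1 = partial_f(SSR_FULL, SSR_X2_ALONE, SSE, DF_ERROR)   # X1 given X2
contrib_x2, f_x2 = partial_f(SSR_FULL, SSR_X1_ALONE, SSE, DF_ERROR)   # X2 given X1
r2_y1_2 = partial_r2(SST, SSR_FULL, contrib_x1)
r2_y2_1 = partial_r2(SST, SSR_FULL, contrib_x2)

print(round(f_x1, 2), round(f_x2, 2), round(r2_y1_2, 4), round(r2_y2_1, 4))
```

Both partial F statistics are far above the critical value of 3.897 quoted in the answer, so the small rounding difference does not affect the decision.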

14.38 (a) Holding constant the effect of X2, for each increase of one unit of X1, Y
increases by 4 units. (b) Holding constant the effect of X1, for each increase of one
unit of X2, Y increases by 2 units. (c) Because tSTAT = 3.27 > 2.1098, reject H0.
Variable X2 makes a significant contribution to the model.
14.40 (a) Ŷ = 243.7371 + 9.2189X1 + 12.6967X2, where X1 = number of rooms and
X2 = neighborhood (east = 0). (b) Holding constant the effect of neighborhood, for each
additional room, the mean selling price is estimated to increase by 9.2189 thousands of
dollars, or $9,218.9. For a given number of rooms, a west neighborhood is estimated to
increase the mean selling price over an east neighborhood by 12.6967 thousands of
dollars, or $12,696.7. (c) Ŷ = 326.7076, or $326,707.6. $309,560.04 ≤ YX ≤ $343,855.1.
$321,471.44 ≤ μY|X ≤ $331,943.71. (d) Based on a residual analysis, the model appears
to be adequate. (e) FSTAT = 55.39, the p-value is virtually 0. Because p-value < 0.05,
reject H0. There is evidence of a significant relationship between selling price and
the two independent variables (rooms and neighborhood). (f) For X1: tSTAT = 8.9537, the
p-value is virtually 0. Reject H0. Number of rooms makes a significant contribution and
should be included in the model. For X2: tSTAT = 3.5913, p-value = 0.0023 < 0.05.
Reject H0. Neighborhood makes a significant contribution and should be included in the
model. Based on these results, the regression model with the two independent variables
should be used. (g) 7.0466 ≤ β1 ≤ 11.3913. (h) 5.2378 ≤ β2 ≤ 20.1557. (i) r²adj = 0.851.
(j) r²Y1.2 = 0.825. Holding constant the effect of neighborhood, 82.5% of the variation
in selling price can be explained by variation in number of rooms. r²Y2.1 = 0.431.
Holding constant the effect of number of rooms, 43.1% of the variation in selling price
can be explained by variation in neighborhood. (k) The slope of selling price with
number of rooms is the same, regardless of whether the house is located in an east or
west neighborhood. (l) Ŷ = 253.95 + 8.032X1 - 5.90X2 + 2.089X1X2. For X1X2,
p-value = 0.330. Do not reject H0. There is no evidence that the interaction term makes
a contribution to the model. (m) The model in (b) should be used. (n) The number of
rooms and the neighborhood both significantly affect the selling price, but the number
of rooms has a greater effect.
14.42 (a) Predicted time = 8.01 + 0.00523 Depth - 2.105 Dry. (b) Holding constant the
effect of type of drilling, for each foot increase in depth of the hole, the mean
drilling time is estimated to increase by 0.00523 minutes. For a given depth, a dry
drilling hole is estimated to reduce the drilling time over wet drilling by a mean of
2.1052 minutes. (c) 6.428 minutes, 6.210 ≤ μY|X ≤ 6.646, 4.923 ≤ YX ≤ 7.932. (d) The
model appears to be adequate. (e) FSTAT = 111.11 > 3.09; reject H0.
(f) tSTAT = 5.03 > 1.9847; reject H0. tSTAT = -14.03 < -1.9847; reject H0. Include both
variables. (g) 0.0032 ≤ β1 ≤ 0.0073. (h) -2.403 ≤ β2 ≤ -1.808. (i) 69.0%.
(j) 0.207, 0.670. (k) The slope of the additional drilling time with the depth of the
hole is the same, regardless of the type of drilling method used. (l) The p-value of
the interaction term = 0.462 > 0.05, so the term is not significant and should not be
included in the model. (m) The model in part (b) should be used. Both variables affect
the drilling time. Dry drilling holes should be used to reduce the drilling time.
14.44 (a) Ŷ = 1.1079 - 0.0070X1 + 0.0448X2 - 0.0003X1X2, where X1 = efficiency ratio,
X2 = total risk-based capital, p-value = 0.4593 > 0.05. Do not reject H0. There is not
enough evidence that the interaction term makes a contribution to the model. (b) Because
there is insufficient evidence of any interaction effect between efficiency ratio and
total risk-based capital, the model in Problem 14.4 should be used.
14.46 (a) The p-value of the interaction term = 0.1650 > 0.05, so the term is not
significant and should not be included in the model. (b) Use the model developed in
Problem 14.6.
14.48 (a) For X1X2, p-value = 0.2353 > 0.05. Do not reject H0. There is insufficient
evidence that the interaction term makes a contribution to the model. (b) Because there
is not enough evidence of an interaction effect between total staff present and remote
hours, the model in Problem 14.7 should be used.
14.50 Holding constant the effect of other variables, the natural logarithm of the
estimated odds ratio for the dependent categorical response will increase by 2.2 for
each unit increase in the particular independent variable.
14.52 0.4286.
14.54 (a) ln(estimated odds ratio) = -6.9394 + 0.1395X1 + 2.7743X2 =
-6.9394 + 0.1395(36) + 2.7743(0) = -1.91908. Estimated odds ratio = 0.1470. Estimated
Probability of Success = Odds Ratio/(1 + Odds Ratio) = 0.1470/(1 + 0.1470) = 0.1260.
(b) From the text discussion of the example, 70.2% of the individuals who charge
$36,000 per annum and possess additional cards can be expected to purchase the premium
card. Only 12.60% of the individuals who charge $36,000 per annum and do not possess
additional cards can be expected to purchase the premium card. For a given amount of
money charged per annum, the likelihood of purchasing a premium card is substantially
higher among individuals who already possess additional cards than for those who do not
possess additional cards. (c) ln(estimated odds ratio) = -6.9394 + 0.1395X1 + 2.7743X2 =
-6.9394 + 0.1395(18) + 2.7743(0) = -4.4298. Estimated odds ratio = e^(-4.4298) = 0.0119.
Estimated Probability of Success = Odds Ratio/(1 + Odds Ratio) =
0.0119/(1 + 0.0119) = 0.01178. (d) Among individuals who do not purchase additional
cards, the likelihood of purchasing a premium card diminishes dramatically with a
substantial decrease in the amount charged per annum.
14.56 (a) ln(estimated odds) = -47.4723 + 1.3099 fixed acidity + 90.5722 chlorides +
9.777 pH. (b) Holding constant the effect of chlorides and pH, for each increase of one
point in fixed acidity, ln(estimated odds) increases by an estimate of 1.3099. Holding
constant the effect of fixed acidity and pH, for each increase of one point in
chlorides, ln(estimated odds) increases by an estimate of 90.5722. Holding constant the
effect of fixed acidity and chlorides, for each increase of one point in pH,
ln(estimated odds) increases by an estimate of 9.777. (c) 0.3686.
(d) Deviance = 54.456, p-value = 1.0000, do not reject H0, so model is adequate.
(e) For fixed acidity: ZSTAT = 3.17 > 1.96, reject H0. For chlorides:
ZSTAT = 4.00 > 1.96, reject H0. For pH: ZSTAT = 3.29 > 1.96, reject H0. Each variable
makes a significant contribution to the model. (f) Fixed acidity, chlorides, and pH are
all important factors in distinguishing between white and red wines.
14.58 (a) ln(estimated odds) = -0.6048 + 0.0938 claims/year + 1.8108 new business.
(b) Holding constant the effects of whether the policy is new, for each increase of the
number of claims submitted per year by the policy holder, ln(odds) increases by an
estimate of 0.0938. Holding constant the number of claims submitted per year by the
policy holder, ln(odds) is estimated to be 1.8108 higher when the policy is new as
compared to when the policy is not new. (c) ln(estimated odds ratio) = 1.2998.
Estimated odds ratio = 3.6684. Estimated probability of a fraudulent claim = 0.7858.
(d) The deviance statistic is 119.4353 with a p-value = 0.0457 < 0.05. Reject H0. The
model is not a good fitting model. (e) For claims/year: ZSTAT = 0.1865,
p-value = 0.8521 > 0.05. Do not reject H0. There is insufficient evidence that the
number of claims submitted per year by the policy holder makes a significant
contribution to the logistic regression model. For new business: ZSTAT = 2.2261,
p-value = 0.0260 < 0.05. Reject H0. There is sufficient evidence that whether the
policy is new makes a significant contribution to the logistic regression model.
(f) ln(estimated odds) = -1.0125 + 0.9927 claims/year.
(g) ln(estimated odds) = -0.5423 + 1.9286 new business. (h) The deviance statistic for
(f) is 125.0102 with a p-value = 0.0250 < 0.05. Reject H0. The model is not a good
fitting model. The deviance statistic for (g) is 119.4702 with a
p-value = 0.0526 > 0.05. Do not reject H0. The model is a good fitting model. The model
in (g) should be used to predict a fraudulent claim.
14.60 (a) ln(estimated odds) = 1.252 - 0.0323 Age + 2.2165 subscribes to the wellness
newsletters. (b) Holding constant the effect of subscribes to the wellness newsletters,
for each increase of one year in age, ln(estimated odds) decreases by an estimate of
0.0323. Holding constant the effect of age, for a customer who subscribes to the
wellness newsletters, ln(estimated odds) increases by an estimate of 2.2165. (c) 0.912.
(d) Deviance = 102.8762, p-value = 0.3264. Do not reject H0, so model is adequate.
(e) For Age: Z = -1.8053 > -1.96. Do not reject H0. For subscribes to the wellness
newsletters: Z = 4.3286 > 1.96. Reject H0. (f) Only subscribes to wellness newsletters
is useful in predicting whether a customer will purchase organic food.
14.72 (a) Ŷ = -3.9152 + 0.0319X1 + 4.2228X2, where X1 = number of cubic feet moved and
X2 = number of pieces of large furniture. (b) Holding constant the number of pieces of
large furniture, for each additional cubic foot moved, the mean labor hours are
estimated to increase by 0.0319. Holding constant the amount of cubic feet moved, for
each additional piece of large furniture, the mean labor hours are estimated to
increase by 4.2228. (c) Ŷ = 20.4926. (d) Based on a residual analysis, the errors
appear to be normally distributed. The equal-variance assumption might be violated
because the variances appear to be larger around the center region of both independent
variables. There might also be violation of the linearity assumption. A model with
quadratic terms for both independent variables might be fitted. (e) FSTAT = 228.80,
p-value is virtually 0 < 0.05, reject H0. There is evidence of a significant
relationship between labor hours and the two independent variables (the amount of cubic
feet moved and the number of pieces of large furniture). (f) The p-value is virtually
0. The probability of obtaining a test statistic of 228.80 or greater is virtually 0 if
there is no significant relationship between labor hours and the two independent
variables (the amount of cubic feet moved and the number of pieces of large furniture).
(g) r² = 0.9327. 93.27% of the variation in labor hours can be explained by variation
in the number of cubic feet moved and the number of pieces of large furniture.
(h) r²adj = 0.9287. (i) For X1: tSTAT = 6.9339, the p-value is virtually 0. Reject H0.
The number of cubic feet moved makes a significant contribution and should be included
in the model. For X2: tSTAT = 4.6192, the p-value is virtually 0. Reject H0. The number
of pieces of large furniture makes a significant contribution and should be included in
the model. Based on these results, the regression model with the two independent
variables should be used. (j) For X1: tSTAT = 6.9339, the p-value is virtually 0. The
probability of obtaining a sample that will yield a test statistic greater than 6.9339
is virtually 0 if the number of cubic feet moved does not make a significant
contribution, holding the effect of the number of pieces of large furniture constant.
For X2: tSTAT = 4.6192, the p-value is virtually 0. The probability of obtaining a
sample that will yield a test statistic greater than 4.6192 is virtually 0 if the
number of pieces of large furniture does not make a significant contribution, holding
the effect of the amount of cubic feet moved constant. (k) 0.0226 ≤ β1 ≤ 0.0413.
(l) r²Y1.2 = 0.5930. Holding constant the effect of the number of pieces of large
furniture, 59.3% of the variation in labor hours can be explained by variation in the
amount of cubic feet moved. r²Y2.1 = 0.3927. Holding constant the effect of the number
of cubic feet moved, 39.27% of the variation in labor hours can be explained by
variation in the number of pieces of large furniture. (m) Both the number of cubic feet
moved and the number of large pieces of furniture are useful in predicting the labor
hours, but the cubic feet moved is more important.
14.74 (a) Ŷ = 360.2158 + 0.0775X1 - 0.4122X2, where X1 = house size and X2 = age.
(b) Holding constant the age, for each additional thousand square feet in the size of
the house, the mean asking price is estimated to increase by 77.50 thousand dollars.
Holding constant the living space of the house, for each additional year in age, the
asking price is estimated to decrease by 0.4122 thousand dollars.
(c) Ŷ = 492.5316 thousand dollars. (d) Based on a residual analysis, the model appears
to be adequate. (e) FSTAT = 19.4909, the p-value = 0.0000 < 0.05, reject H0. There is
evidence of a significant relationship between asking price and the two independent
variables (size of the house and age). (f) The p-value is 0.0000. The probability of
obtaining a test statistic of 19.4909 or greater is virtually 0 if there is no
significant relationship between asking price and the two independent variables (living
space of the house and age). (g) r² = 0.4019. 40.19% of the variation in asking price
can be explained by variation in the size of the house and age. (h) r²adj = 0.3813.
(i) For X1: tSTAT = 4.6904, the p-value is 0.0000. Reject H0. The living space of the
house makes a significant contribution and should be included in the model. For X2:
tSTAT = -0.6304, p-value = 0.5309 > 0.05. Do not reject H0. Age does not make a
significant contribution and should not be included in the model. Based on these
results, the regression model with only the size of the house should be used. (j) For
X1: tSTAT = 4.6904. The probability of obtaining a sample that will yield a test
statistic farther away than 4.6904 is 0.0000 if the living space does not make a
significant contribution, holding age constant. For X2: tSTAT = -0.6304. The
probability of obtaining a sample that will yield a test statistic farther away than
0.6304 is 0.5309 if the age does not make a significant contribution, holding the
effect of the living space constant. (k) 0.0444 ≤ β1 ≤ 0.1106. You are 95% confident
that the asking price will increase by an amount somewhere between $44.40 thousand and
$110.60 thousand for each additional thousand square foot increase in living space,
holding constant the age of the house. In Problem 13.76, you are 95% confident that the
assessed value will increase by an amount somewhere between $56.8 thousand and $110.30
thousand for each additional 1,000 square foot increase in living space, regardless of
the age of the house. (l) r²Y1.2 = 0.2750. Holding constant the effect of the age of
the house, 27.50% of the variation in asking price can be explained by variation in the
living space of the house. r²Y2.1 = 0.0068. Holding constant the effect of the size of
the house, 0.68% of the variation in asking price can be explained by variation in the
age of the house. (m) Only the living space of the house should be used to predict
asking price.
14.76 (a) Ŷ = -90.2166 + 9.2169X1 + 2.5069X2, where X1 = asking price and X2 = age.
(b) Holding age constant, for each additional $1,000 in asking price, the taxes are
estimated to increase by a mean of $9.2169. Holding asking price constant, for each
additional year, the taxes are estimated to increase by $2.5069. (c) Ŷ = $3,721.90.
(d) Based on a residual analysis, the errors appear to be normally distributed. The
equal-variance assumption appears to be valid. However, there is one very large
residual that is from the house that is 107 years old. Removing this point still leaves
a residual for the house that has an asking price of $550,000 and is 52 years old.
However, because this model is an almost perfect fit, you may want to use this model.
In this model, age is no longer significant. (e) FSTAT = 1,677.8619,
p-value = 0.0000 < 0.05, reject H0. There is evidence of a significant relationship
between taxes and the two independent variables (asking price and age).
(f) p-value = 0.0000. The probability of obtaining an FSTAT test statistic of
1,677.8619 or greater is virtually 0 if there is no significant relationship between
taxes and the two independent variables (asking price and age). (g) r² = 0.9830. 98.30%
of the variation in taxes can be explained by variation in asking price and age.
(h) r²adj = 0.9824. (i) For X1: tSTAT = 53.7184, p-value = 0.0000 < 0.05. Reject H0.
The asking price makes a significant contribution and should be included in the model.
For X2: tSTAT = 2.7873, p-value = 0.0072 < 0.05. Reject H0. The age of a house makes a
significant contribution and should be included in the model. Based on these results,
the regression model with asking price and age should be used. (j) For X1:
p-value = 0.0000. The probability of obtaining a sample that will yield a test
statistic greater than 53.7184 is 0.0000 if the asking price does not make a significant
contribution, holding age constant. For X2: p-value = 0.0072. The probability of
obtaining a sample that will yield a test statistic greater than 2.7873 is 0.0072 if
the age of a house does not make a significant contribution, holding the effect of the
asking price constant. (k) 8.8735 ≤ β1 ≤ 9.5604. You are 95% confident that the mean
taxes will increase by an amount somewhere between $8.87 and $9.56 for each additional
$1,000 increase in the asking price, holding constant the age. In Problem 13.77, you
are 95% confident that the mean taxes will increase by an amount somewhere between
$5.968 and $11.03 for each additional $1,000 increase in asking price, regardless of
the age. (l) r²Y1.2 = 0.9803. Holding constant the effect of age, 98.03% of the
variation in taxes can be explained by variation in the asking price. r²Y2.1 = 0.1181.
Holding constant the effect of the asking price, 11.81% of the variation in taxes can
be explained by variation in the age. (m) Based on your answers to (b) through (k), the
age of a house has an effect on its taxes. However, given the results when the
107-year-old house is not included, the assessor can state that for houses that are not
that old, age does not have an effect on taxes.
14.78 (a) Ŷ = 160.6120 - 18.7181X1 - 2.8903X2, where X1 = ERA and X2 = league
(American = 0, National = 1). (b) Holding constant the effect of the league, for each
additional earned run, the number of wins is estimated to decrease by 18.7181. For a
given ERA, a team in the National League is estimated to have 2.8903 fewer wins than a
team in the American League. (c) 76.3803 wins. (d) Based on a residual analysis, there
is no pattern in the errors. There is no apparent violation of other assumptions.
(e) FSTAT = 24.306 > 3.35, p-value = 0.0000 < 0.05, reject H0. There is evidence of a
significant relationship between wins and the two independent variables (ERA and
league). (f) For X1: tSTAT = -6.9184 < -2.0518, the p-value = 0.0000. Reject H0. ERA
makes a significant contribution and should be included in the model. For X2:
tSTAT = -1.1966 > -2.0518, p-value = 0.2419 > 0.05. Do not reject H0. The league does
not make a significant contribution and should not be included in the model. Based on
these results, the regression model with only the ERA as the independent variable
should be used. (g) -24.2687 ≤ β1 ≤ -13.1676. (h) -7.8464 ≤ β2 ≤ 2.0639.
(i) r²adj = 0.6165. 61.65% of the variation in wins can be explained by the variation
in ERA and league after adjusting for number of independent variables and sample size.
(j) r²Y1.2 = 0.6394. Holding constant the effect of league, 63.94% of the variation in
number of wins can be explained by the variation in ERA. r²Y2.1 = 0.0504. Holding
constant the effect of ERA, 5.04% of the variation in number of wins can be explained
by the variation in league. (k) The slope of the number of wins with ERA is the same,
regardless of whether the team belongs to the American League or the National League.
(l) For X1X2: tSTAT = 1.175 < 2.0555, the p-value is 0.2506 > 0.05. Do not reject H0.
There is no evidence that the interaction term makes a contribution to the model.
(m) The model with one independent variable (ERA) should be used.
14.80 The multiple regression model is Predicted base salary = 48,091.7853 +
8,249.2156 (gender) + 1,061.4521 (age). Holding constant the age of the person, the
mean base salary is predicted to be $8,249.22 higher for males than for females.
Holding constant the gender of the person, for each additional year of age, the mean
base salary is predicted to be $1,061.45 higher. The regression model with the two
independent variables has F = 118.0925 and a p-value = 0.0000. So, you can conclude
that at least one of the independent variables makes a significant contribution to the
model to predict base pay. Each independent variable makes a significant contribution
to the regression model given that the other variable is included (tSTAT = 3.9937,
p-value = 0.0001 for gender and tSTAT = 14.8592, p-value = 0.0000 for age). Both
independent variables should be included in the model. 37.01% of the variation in base
salary can be explained by gender and age. There is no pattern in the residuals and no
other violations of the assumptions, so the model appears to be appropriate. Including
an interaction term of gender and age does not significantly improve the model
(tSTAT = -0.2371, p-value = 0.8127 > 0.05). You can conclude that females are paid less
than males holding constant the age of the person. Perhaps other variables such as
department, seniority, and score on a performance evaluation can be included in the
model to see if the model is improved.
14.82 b0 = 18.2892, b1 (die temperature) = 0.5976, b2 (die diameter) = -13.5108. The
r² of the multiple regression model is 0.3257, so 32.57% of the variation in unit
density can be explained by the variation of die temperature and die diameter. The F
test statistic for the combined significance of die temperature and die diameter is
5.0718 with a p-value of 0.0160. Hence, at a 5% level of significance, there is enough
evidence to conclude that die temperature and die diameter affect unit density. The
p-value of the t test for the significance of die temperature is 0.2117, which is
greater than 5%. Hence, there is insufficient evidence to conclude that die temperature
affects unit density holding constant the effect of die diameter. The p-value of the t
test for the significance of die diameter is 0.0083, which is less than 5%. There is
enough evidence to conclude that die diameter affects unit density at the 5% level of
significance holding constant the effect of die temperature. After removing die
temperature from the model, b0 = 107.9267, b1 (die diameter) = -13.5108. The r² of the
regression is 0.2724. So 27.24% of the variation in unit density can be explained by
the variation of die diameter. The p-value of the t test for the significance of die
diameter is 0.0087, which is less than 5%. There is enough evidence to conclude that
die diameter affects unit density at the 5% level of significance. There is some lack
of equality in the residuals and some departure from normality.

CHAPTER 15
15.2 (a) Predicted HOCS is 2.8600, 3.0342, 3.1948, 3.3418, 3.4752, 3.5950, 3.7012,
3.7938, 3.8728, 3.9382, 3.99, 4.0282, 4.0528, 4.0638, 4.0612, 4.045, 4.0152, 3.9718,
3.9148, 3.8442, and 3.76. (c) The curvilinear relationship suggests that HOCS increases
at a decreasing rate. It reaches its maximum value of 4.0638 at GPA = 3.3 and declines
after that as GPA continues to increase. (d) An r² of 0.07 and an adjusted r² of 0.06
tell you that GPA has very low explanatory power in identifying the variation in HOCS.
You can tell that the individual HOCS scores are scattered widely around the
curvilinear relationship.
15.4 (a) Ŷ = -5.48730 - 21.5105X1 + 3.9633X2, where X1 = alcohol % and
X2 = carbohydrates. FSTAT = 2,258.7579, p-value = 0.0000 < 0.05, so reject H0. At the
5% level of significance, the linear terms are significant together.
(b) Ŷ = 10.0421 + 15.0776X1 + 4.5851X2 + 0.4874X1² - 0.0209X2², where X1 = alcohol %
and X2 = carbohydrates. (c) FSTAT = 1,154.1043, p-value = 0.0000 < 0.05, so reject H0.
At the 5% level of significance, the model with quadratic terms is significant.
tSTAT = 2.2414, and the p-value = 0.0264. Reject H0. There is enough evidence that the
quadratic term for alcohol % is significant at the 5% level of significance.
tSTAT = -1.2313, p-value = 0.2201. Do not reject H0. There is insufficient evidence
that the quadratic term for carbohydrates is significant at the 5% level of
significance. Hence, because the quadratic term for alcohol is significant, the model
in (b) that includes this term is better. (d) The number of calories in a beer depends
quadratically on the alcohol percentage but linearly on the number of carbohydrates.
The alcohol percentage and number of carbohydrates explain about 96.79% of the
variation in the number of calories in a beer.
15.6 (b) price = 18,029.9837 - 1,812.9389 age + 63.2116 age².
(c) 18,029.9837 - 1,812.9389(5) + 63.2116(5)² = $10,545.58. (d) There are no patterns
in any of the residual plots. (e) FSTAT = 243.5061 > 3.27. Reject H0. There is a
significant quadratic relationship between age and price. (f) p-value = 0.0000. The
probability of FSTAT = 243.5061 or higher is 0.0000, given the null hypothesis is true.
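The prediction in 15.6(c) is a direct evaluation of the fitted quadratic at age = 5. A minimal sketch, using the coefficients reported in 15.6(b); the function name is illustrative and not part of the text.

```python
def predict_price(age_years: float) -> float:
    """Evaluate the fitted quadratic price = b0 + b1*age + b2*age^2 from 15.6(b)."""
    b0, b1, b2 = 18029.9837, -1812.9389, 63.2116  # coefficients as reported in the answer
    return b0 + b1 * age_years + b2 * age_years ** 2


price_at_5 = predict_price(5)
print(f"${price_at_5:,.2f}")  # $10,545.58
```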
(g) tSTAT = 4.8631 > 2.0281. Reject H0. (h) The probability of tSTAT < -4.8631 or
> 4.8631 is 0.0000, given the null hypothesis is true. (i) r² = 0.9312. 93.12% of the
variation in price can be explained by the quadratic relationship between age and
price. (j) adjusted r² = 0.9273. (k) There is a strong quadratic relationship between
age and price.
15.8 (a) 215.37. (b) For each additional unit of the logarithm of X1, the logarithm of
Y is estimated to increase by 0.9 unit, holding all other variables constant. For each
additional unit of the logarithm of X2, the logarithm of Y is estimated to increase by
1.41 units, holding all other variables constant.
15.10 (a) √Ŷ = 6.2417 + 0.7768X1 + 0.1683X2, where X1 = alcohol % and
X2 = carbohydrates. (b) The normal probability plot of the linear model showed
departure from a normal distribution, so a square-root transformation of calories was
done. FSTAT = 1,720.6801. Because the p-value is 0.0000, reject H0 at the 5% level of
significance. There is evidence of a significant linear relationship between the square
root of calories and the percentage of alcohol and the number of carbohydrates.
(d) r² = 0.9569. So 95.69% of the variation in the square root of calories can be
explained by the variation in the percentage of alcohol and the number of
carbohydrates. (e) Adjusted r² = 0.9563. (f) The model in Problem 15.4 is slightly
better because it has a higher r².
15.12 (a) Predicted ln(Price) = 9.7771 - 0.10622 Age. (b) $10,573.4350. (c) The model
is adequate. (d) tSTAT = -19.4814 < -2.0262; reject H0. (e) 91.12%. 91.12% of the
variation in the natural log of price can be explained by the age of the auto.
(f) 90.88%. (g) Choose the model from Problem 15.6. That model has a higher adjusted r²
of 92.73%.
15.14 1.25.
15.16 R1² = 0.0634, VIF1 = 1/(1 - 0.0634) = 1.0677. R2² = 0.0634,
VIF2 = 1/(1 - 0.0634) = 1.0677. There is no evidence of collinearity because both VIFs
are < 5.
15.18 VIF = 1.0066 < 5. There is no evidence of collinearity.
15.20 VIF = 1.0105. There is no evidence of collinearity.
15.22 (a) 35.04. (b) Cp > 3. This does not meet the criterion for consideration of a
good model.
15.24 Let Y = asking price, X1 = lot size, X2 = living space, X3 = number of bedrooms,
X4 = number of bathrooms, X5 = age, and X6 = fireplace (0 = No, 1 = Yes). Based on a
full regression model involving all of the variables, all the VIF values (1.3953,
2.1175, 2.0878, 2.3537, 1.7807, and 1.0939, respectively) are less than 5. There is no
reason to suspect the existence of collinearity. Based on a best-subsets regression and
examination of the resulting Cp values, the best model appears to be a model with
variables X2 and X6, which has Cp = 0.8701. Models that add other variables do not
change the results very much. Based on a stepwise regression analysis with all the
original variables, only variables X2 and X6 make a significant contribution to the
model at the 0.05 level. Thus, the best model is the model using the living area of the
house (X2) and fireplace (X6). This was the model developed in Section 14.6.
15.30 (a) An analysis of the linear regression model with all of the three possible
independent variables reveals that the highest VIF is only 1.06. A stepwise regression
model selects only the supplier dummy variable for inclusion in the model. A
best-subsets regression produces only one model that has a Cp value less than or equal
to k + 1, which is the model that includes pressure and the supplier dummy variable.
This model is Ŷ = -31.5929 + 0.7879X2 + 13.1029X3. This model has F = 5.1088 with a
p-value = 0.027. r² = 0.4816, r²adj = 0.3873. A residual analysis does not reveal any
strong patterns. The errors appear to be normally distributed.
15.32 (a) Best model: Cp = 2.1558, predicted fair market value = 260.6791 +
362.8318 land + 0.1109 house size (sq ft) - 1.7543 age. (b) The adjusted r² for the
best model in 15.32(a), 15.33(a), and 15.34(a) are, respectively, 0.8242, 0.9047, and
0.8481. The model in 15.33(a) has the highest explanatory power after adjusting for the
number of independent variables and sample size.
15.34 (a) Predicted fair market value = 145.1217 + 149.9337 land +
0.0913 house size (sq. ft.). (b) The adjusted r² for the best model in 15.32(a),
15.33(a), and 15.34(a) are, respectively, 0.8242, 0.9047, and 0.8481. The model in
15.33(a) has the highest explanatory power after adjusting for the number of
independent variables and sample size.
15.36 Let Y = fair market value, X1 = land area, X2 = interior size, X3 = age,
X4 = number of rooms, X5 = number of bathrooms, X6 = garage size, X7 = 1 if Glen Cove
and 0 otherwise, and X8 = 1 if Roslyn and 0 otherwise. (a) The VIFs of X2, X3, and X7
are greater than 5. Dropping X2 with the largest VIF, X3 still has a VIF greater than
5. After dropping X2 and X3, all remaining VIFs are less than 5, so there is no reason
to suspect collinearity between any pair of variables. The following is the multiple
regression model that has the smallest Cp (4.3211) and the highest adjusted r² (0.6815):
Fair Market Value = 49.2379 + 579.0105 Land + 109.5767 Baths + 48.2282 Garage + 213.2326 Roslyn
The individual t test for the significance of each independent variable at the 5% level
of significance concludes that only property size, baths, and the dummy variable Roslyn
are significant given that the others are in the model. The following is the multiple
regression result for the model chosen by stepwise regression:
Fair Market Value = 30.3016 + 611.6910 Land + 130.7788 Baths + 214.2567 Roslyn
All the variables are significant individually at the 5% level of significance.
Combining the stepwise regression and the best-subsets regression results along with
the individual t test results, the most appropriate multiple regression model for
predicting the fair market value is the stepwise regression model. (b) The estimated
fair market value in Roslyn is $214.2567 thousand above Glen Cove or Freeport for two
otherwise identical properties.
15.38 In the multiple regression model with catalyst, pH, pressure, temperature, and
voltage as independent variables, none of the variables has a VIF value of 5 or larger.
The best-subsets approach showed that only the model containing X1, X2, X3, X4, and X5
should be considered, where X1 = catalyst, X2 = pH, X3 = pressure, X4 = temp, and
X5 = voltage. Looking at the p-values of the t statistics for each slope coefficient of
the model that includes X1 through X5 reveals that pH level is not significant at the
5% level of significance (p-value = 0.2862). The multiple regression model with pH
level deleted shows that all coefficients are significant individually at the 5% level
of significance. The best linear model is determined to be
Ŷ = 3.6833 + 0.1548X1 - 0.04197X3 - 0.4036X4 + 0.4288X5. The overall model has
F = 77.0793, with a p-value that is virtually 0. r² = 0.8726, r²adj = 0.8613. The
normal probability plot does not suggest possible violation of the normality
assumption. A residual analysis reveals a potential nonlinear relationship in
temperature. The p-value of the squared term for temperature (0.1273) in the following
quadratic transformation of temperature does not support the need for a quadratic
transformation at the 5% level of significance. The p-value of the interaction term
between pressure and temperature (0.0780) indicates that there is not enough evidence
of an interaction at the 5% level of

significance. The best model is the one that includes catalyst, pressure,
temperature, and voltage, which explains 87.26% of the variation in
thickness.

15.40 Best-subsets regression produced several models that had
Cp ≤ k + 1. They were X2X3 (Cp = 3.9), X2X3X4 (Cp = 3.3), and
X1X2X3X4 (Cp = 4.7). Stepwise regression produced a model that included
only X2 (median home value) and X4 (average commuting time). Because
the model with X2 (median home value), X3 (violent crime rate), and
X4 (average commuting time) had a low Cp, this model was chosen for
further analysis. The residual plots for all the independent variables
showed only random patterns and no violations in the assumptions.
The model is

Median Average Annual Salary = 16,830 + 38.256 Median home value ($000)
    - 9.534 Violent crime/100,000 residents
    + 1,053 Average commuting time in minutes

The r^2 of this model is 0.847, meaning that 84.7% of the variation in
Average Annual Salary can be explained by variation in median home value,
variation in violent crime, and variation in average commuting time.

CHAPTER 16

16.2 (a) 1988. (b) The first four years and the last four years.

16.4 (b), (c), (e)

Year   Hours Per Day   MA(3)    ES(W = 0.5)   ES(W = 0.25)
2008   2.2             #N/A     2.2000        2.2000
2009   2.3             2.3000   2.2500        2.2250
2010   2.4             2.4333   2.3250        2.2688
2011   2.6             2.5000   2.4625        2.3516
2012   2.5             2.4667   2.4813        2.3887
2013   2.3             2.3333   2.3906        2.3665
2014   2.2             2.2333   2.2953        2.3249
2015   2.2             2.2000   2.2477        2.2937
2016   2.2             #N/A     2.2238        2.2702

(d) W = 0.5: Ŷ2017 = E2016 = 2.2238; W = 0.25: Ŷ2017 = E2016 =
2.2702. (f) The exponentially smoothed forecast for 2017 with W = 0.5
is slightly lower than that with W = 0.25. A smoothing coefficient of
W = 0.25 smooths out the hours more than W = 0.50.

16.6 (b), (c), (e)

Decade   Performance (%)   MA(3)     ES(W = 0.5)   ES(W = 0.25)
1830s     2.8              #N/A       2.8000        2.8000
1840s    12.8               7.4000    7.8000        5.3000
1850s     6.6              10.6333    7.2000        5.6250
1860s    12.5               8.8667    9.8500        7.3438
1870s     7.5               8.6667    8.6750        7.3828
1880s     6.0               6.3333    7.3375        7.0371
1890s     5.5               7.4667    6.4188        6.6528
1900s    10.9               6.2000    8.6594        7.7146
1910s     2.2               8.8000    5.4297        6.3360
1920s    13.3               4.4333    9.3648        8.0770
1930s    -2.2               6.9000    3.5824        5.5077
1940s     9.6               8.5333    6.5912        6.5308
1950s    18.2              12.0333   12.3956        9.4481
1960s     8.3              11.0333   10.3478        9.1611
1970s     6.6              10.5000    8.4739        8.5208
1980s    16.6              13.6000   12.5370       10.5406
1990s    17.6              11.2333   15.0685       12.3055
2000s    -0.5              #N/A       7.2842        9.1041

(d) Ŷ2010 = E2000 = 7.2842. (e) Ŷ2010 = E2000 = 9.1041. (f) The
exponentially smoothed forecast for the 2010s with W = 0.5 is lower than
that with W = 0.25. (g) According to the exponential smoothing with
W = 0.25, there appears to be a general upward trend in the performance
of the stocks in the past.

16.8 (b), (c), (e)

Year   IPOs   MA 3-Yr    ES(W = 0.50)   ES(W = 0.25)
2001    79    #N/A        79.0000        79.0000
2002    66     69.3333    72.5000        75.7500
2003    63    100.6667    67.7500        72.5625
2004   173    131.6667   120.3750        97.6719
2005   159    163.0000   139.6875       113.0039
2006   157    158.3333   148.3438       124.0029
2007   159    112.3333   153.6719       132.7522
2008    21     73.6667    87.3359       104.8141
2009    41     51.0000    64.1680        88.8606
2010    91     71.0000    77.5840        89.3955
2011    81     88.3333    79.2920        87.2966
2012    93    110.3333    86.1460        88.7224
2013   157    152.3333   121.5730       105.7918
2014   207    160.3333   164.2865       131.0939
2015   117    132.3333   140.6432       127.5704
2016    73    #N/A       106.8216       113.9278

(d) W = 0.5: Ŷ2017 = E2016 = 106.8216; W = 0.25: Ŷ2017 = E2016 =
113.9278. (f) The exponentially smoothed forecast for 2017 with
W = 0.5 is lower than that with W = 0.25.

16.10 (a) The Y intercept b0 = 4.0 is the fitted trend value reflecting the
real total revenues (in millions of dollars) during the origin, or base year,
1994. (b) The slope b1 = 1.5 indicates that the real total revenues are
increasing at an estimated rate of $1.5 million per year. (c) Year is 2000,
X = 2000 - 1996 = 4, Ŷ5 = 4.0 + 1.5(4) = 10.0 million dollars.
(d) Year is 2017, X = 2017 - 1996 = 21, Ŷ20 = 4.0 + 1.5(21) = 35.5
million dollars. (e) Year is 2020, X = 2020 - 1996 = 24, Ŷ23 = 4.0 +
1.5(24) = 40 million dollars.

16.12 (b) Linear trend: Ŷ = 99.5412 + 3.7912X, where X is relative to
2000. (c) Quadratic trend: Ŷ = 75.922 + 13.2389X - 0.5905X^2, where
X is relative to 2000. (d) Exponential trend: log10 Ŷ = 1.9726 + 0.0154X,
where X is relative to 2000.
(e) Linear trend: Ŷ2017 = 99.5412 + 3.7912(17) = 163.9912
Ŷ2018 = 99.5412 + 3.7912(18) = 167.7824
Quadratic trend: Ŷ2017 = 75.9220 + 13.2389(17) - 0.5905(17)^2
= 130.1338
Ŷ2018 = 75.9220 + 13.2389(18) - 0.5905(18)^2 = 122.9059
Exponential trend: Ŷ2017 = 10^(1.9726 + 0.0154(17)) = 171.4728
Ŷ2018 = 10^(1.9726 + 0.0154(18)) = 177.6593
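
The MA(3) and ES columns in the tables for 16.4 through 16.8 can be reproduced with a few lines of code. The following is a minimal sketch, assuming a centered three-period moving average and the recursion E_1 = Y_1, E_i = W·Y_i + (1 - W)·E_(i-1); the function names are illustrative and not from the text. The data are the hours-per-day values from 16.4.

```python
def moving_average_3(values):
    """Centered 3-period moving average; None where no average exists."""
    n = len(values)
    out = [None] * n
    for i in range(1, n - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return out


def exponential_smoothing(values, w):
    """E_1 = Y_1, then E_i = w * Y_i + (1 - w) * E_(i-1)."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


# Hours per day, 2008 through 2016 (problem 16.4)
hours = [2.2, 2.3, 2.4, 2.6, 2.5, 2.3, 2.2, 2.2, 2.2]

ma3 = moving_average_3(hours)
es_50 = exponential_smoothing(hours, 0.5)
es_25 = exponential_smoothing(hours, 0.25)

# The forecast for 2017 is the last smoothed value, E_2016.
print("W = 0.50 forecast:", round(es_50[-1], 4))
print("W = 0.25 forecast:", round(es_25[-1], 4))
```

The smaller coefficient gives less weight to each new observation, which is why the W = 0.25 column varies less from year to year than the W = 0.5 column.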

(f) The quadratic trend model fits the data better than the linear trend 16.30 (a) Because the p-value = 0.515 > 0.05 level of significance, the
or exponential trend models and, hence, that forecast should be used. third-order term can be dropped. (b) Because the p-value = 0.594 > 0.05
level of significance, the second-order term can be dropped. (c) Because the
16.14 (b) Yn = 246.7986 + 71.0028X where X = years relative to 1978.
p-value is 0.0000, the first-order term is significant. (d) The most appropriate
(c) X = 39, Yn = 3,015.907 billion X = 40 Yn = 3,086.91 billion
model for forecasting is the first-order autoregressive model:
(d) There is an upward trend in federal receipts between 1978 and 2016.
Yn2018 = 0.0132 + 1.0473Y2017 = $4.9355 million.
The trend appears to be linear.
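
The trend forecasts in 16.12 and 16.14 are direct evaluations of the fitted equations at the coded value of X. The sketch below assumes the coefficients exactly as printed; the function names are illustrative. Because the printed coefficients are rounded, the results can differ from the printed forecasts in the last digits. Note that the exponential trend is fitted to log10(Y), so the forecast is 10 raised to the fitted value.

```python
def linear_trend(b0, b1, x):
    """Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


def quadratic_trend(b0, b1, b2, x):
    """Y-hat = b0 + b1 * X + b2 * X^2."""
    return b0 + b1 * x + b2 * x ** 2


def exponential_trend(log_b0, log_b1, x):
    """log10(Y-hat) = log_b0 + log_b1 * X, converted back to Y units."""
    return 10 ** (log_b0 + log_b1 * x)


# 16.14: X = years relative to 1978
for year in (2017, 2018):
    x = year - 1978
    print(year, round(linear_trend(246.7986, 71.0028, x), 3))

# 16.12: X = years relative to 2000
x = 2017 - 2000
print(round(linear_trend(99.5412, 3.7912, x), 4))
print(round(quadratic_trend(75.922, 13.2389, -0.5905, x), 4))
print(round(exponential_trend(1.9726, 0.0154, x), 4))
```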
16.32 (a) 2.121. (b) 1.50.
16.16 (b) Linear trend: Ŷ = -6,786.2833 + 1,952X, where X is relative
to 2002. (c) Quadratic trend: Ŷ = 4,667.05 - 3,333.361X + 377.5824X^2, 16.34 (a) The residuals in the linear, quadratic, and exponential trend
where X is relative to 2002. (d) Exponential trend: log10 Ŷ = 2.3228 + models show strings of consecutive positive and negative values. (b), (c)
0.1401X, where X is relative to 2002. (e) Linear trend: Ŷ2017 = 22,505.61
million KWh Ŷ2018 = 24,458.402 million KWh Linear Quadratic Exponential AR2
Quadratic trend: Ŷ2017 = 39,622.68 million KWh Syx 7,449.3680 3,332.5112 6,481.891 1,484.3969
Ŷ2018 = 47,994.17 million KWh MAD 5,785.5073 2,612.3410 3,199.24 977.5796
Exponential trend: Ŷ2017 = 26,533.8946 million KWh (d) The residuals in the three trend models show strings of consecutive
Ŷ2018 = 36,632.706 million KWh. positive and negative values. The autoregressive model performs well for
16.18 (b) Linear trend: Yn = 1.9998 + 0.1389X, where X is relative to the historical data and has a fairly random pattern of residuals. It has
2000. (c) Quadratic trend: Yn = 2.1902 + 0.0675X - 0.0042X 2, where X the smallest values in MAD and SYX. The autoregressive model would
is relative to 2000. (d) Exponential trend: log10Yn = 0.3289 + 0.0191X, be the best model for forecasting.
where X is relative to 2000. 16.36 (b), (c)
The quadratic and exponential models appear to fit the data equally, so
choose the quadratic model because it is simplest. Linear Quadratic Exponential AR1
(f) The forecast using the quadratic model is: Syx 31.3035 29.2719 29.6198 30.1604
Quadratic trend: Yn2018 = 4.7663 millions
MAD 22.3806 21.2215 23.0246 22.3010
16.20 (b) There has been an upward trend in the CPI in the United States
(d) The residuals in the linear and exponential trend models show strings of
over the 52-year period.
consecutive positive and negative values. The quadratic and autoregressive
(c) Linear trend: Yn = 16.5346 + 4.4751X. (d) Quadratic trend:
models have a fairly random pattern of residuals. There is very little dif-
Yn = 18.6219 + 4.2246X + 0.0049X 2. (e) Exponential trend:
ference in MAD and SYX between the quadratic and autoregressive models.
log10Yn = 1.5764 + 0.0180X. (f) Choose the linear model because it is
Either the quadratic or autoregressive model can be chosen for forecasting.
simplest. (g) Linear trend:
For 2017: Yn2017 = 249.2405 16.38 (b), (c)
For 2018: Yn2018 = 253.7156
Linear Quadratic Exponential AR1
16.22 (a) For Time Series I, the graph of Y versus X appears to be more
Syx 0.1654 0.1305 0.1227 0.1245
linear than the graph of log Y versus X, so a linear model appears to be
more appropriate. For Time Series II, the graph of log Y versus X appears MAD 0.1254 0.1009 0.1039 0.0935
to be more linear than the graph of Y versus X, so an exponential model (d) The residuals in the linear and exponential trend models show strings
appears to be more appropriate. of consecutive positive and negative values. The quadratic and autoregressive
(b) Time Series I: Ŷ = 100.0731 + 14.9776X, where X = years models have a fairly random pattern of residuals. The MAD and SYX
relative to 2005. values are similar in the quadratic, exponential, and autoregressive
Time Series II: Ŷ = 10^(1.9982 + 0.0609X), where X = years relative to 2005. models. The quadratic or autoregressive model would be the best
model for forecasting due to their fairly random pattern of residuals.
(c) X = 12 for year 2017 in all models. Forecasts for the year 2017:
Time Series I: Ŷ = 100.0731 + 14.9776(12) = 279.8045 16.40 (a) log bn0 = 2, bn0 = 100. This is the fitted value for January 2011
Time Series II: Ŷ = 10^(1.9982 + 0.0609(12)) = 535.6886. prior to adjustment with the January multiplier.
16.24 tSTAT = 2.40 > 2.2281; reject H0. (b) log bn1 = 0.01, bn1 = 1.0233. The estimated monthly compound
growth rate is 2.33%.
16.26 (a) tSTAT = 1.60 < 2.2281; do not reject H0. (c) log bn2 = 0.1, bn2 = 1.2589. The January values in the time series are
16.28 (a) Because the p-value = 0.7509 > 0.05 level of significance, estimated to have a mean 25.89% higher than the December values.
the third-order term can be dropped. (b) Because the
16.42 (a) log bn0 = 3.0, bn0 = 1,000. This is the fitted value for the first
p-value = 0.3448 > 0.05, the second-order term can be dropped.
quarter of 2013 prior to adjustment by the quarterly multiplier.
(c)
(b) log bn1 = 0.1, bn1 = 1.2589. The estimated quarterly compound
Coefficients Standard Error t Stat p-value growth rate is (bn1 - 1)100% = 25.89%.
Intercept 56.5818 28.2785 2.0009 0.0652 (c) log bn3 = 0.2, bn3 = 1.5849.
YLag1 0.5808 0.2107 2.7564 0.0155 16.44 (a) The retail industry is heavily subject to seasonal variation due to
Because the p-value = 0.0155 < 0.05, the first-order term cannot be the holiday seasons and so are the revenues for Toys R Us.
dropped. (b) There is obvious seasonal effect in the time series.
(d) The most appropriate model for forecasting is the first-order auto­ (c) log10Yn = 3.6522 + 0.0014X - 0.3600Q1 - 0.3604Q2 - 0.3390Q3.
regressive model: (d) log10 bn1 = 0.0014. bn1 = 1.0032. The estimated quarterly compound
growth rate is (bn1 - 1)100% = 0.32%.
Yn2017 = 56.5818 + 0.5808Y2016 = 136.8484.
(e) log10 bn2 = -0.3600. bn2 = 0.4365. (bn2 - 1)100% = -56.35%. The
Yn2018 = 56.5818 + 0.5808Yn2017 = 136.0635. 1st quarter values in the time series are estimated to have a mean 56.35%

below the 4th quarter values. log10bn3 = -0.3604. bn3 = 0.4361. mean tear is 0.45. Thus, you would recommend that a plate gap of less
(bn3 - 1)100% = -56.39%. than 0 be used to minimize tears in the bag.

The 2nd quarter values in the time series are estimated to have a mean 17.4 The r 2 for the regression tree model is 0.789. The first split is based
56.39% below the 4th quarter values. on 831 square feet. Moves of at least 831 sq. ft. have a mean moving time
log10 bn4 = -0.3390. bn 4 = 0.4581. (bn4 - 1)100% = -54.19%. of 51.1875 hours. Moves of less than 831 square feet have a mean moving
time of 22.6071 hours. Among moves of less than 831 sq. ft., moves of
The 3rd quarter values in the time series are estimated to have a mean
less than 486 sq. ft., have a mean moving time of 15.7955 hours. Moves
54.19% below the 4th quarter values. (f) Forecasts for the last three
of less than 344 sq. ft. have a mean moving time of 12.75 hours. Moves of
quarters of 2017 and all of 2018 are 2,577.9471, 2,750.4706, 6,018.1637,
between 344 and 486 sq. ft. have a mean moving time of 18.3333 hours.
2,605.5535, 2,611.5299, 2,752.2508, and 6,026.7614 millions.
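
The multipliers and percentages in 16.40 through 16.44 come from taking the antilog of each base-10 coefficient and then expressing the result as a percentage change, (b - 1)100%. A minimal sketch of that conversion follows, using the printed coefficients from 16.44; the function names and dictionary labels are illustrative.

```python
def multiplier(log10_coef):
    """Antilog of a base-10 coefficient: b = 10^(log10 b)."""
    return 10 ** log10_coef


def percent_change(log10_coef):
    """(b - 1) * 100%, the growth rate or seasonal effect in percent."""
    return (multiplier(log10_coef) - 1) * 100


# Coefficients of log10(Y-hat) from 16.44 (c)
coefs = {
    "X (quarterly growth)": 0.0014,
    "Q1 vs. 4th quarter": -0.3600,
    "Q2 vs. 4th quarter": -0.3604,
    "Q3 vs. 4th quarter": -0.3390,
}

for name, c in coefs.items():
    print(f"{name}: multiplier = {multiplier(c):.4f}, "
          f"change = {percent_change(c):.2f}%")
```

The same two functions reproduce 16.40 (b): a coefficient of 0.01 gives a multiplier of about 1.0233, a monthly compound growth rate of about 2.33%.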
Moves of between 486 and 830 sq. ft. have a mean moving time of
16.46 (b) log10(Predicted Y) = 2.2313 + 0.0007X - 0.1871M1 - 27.0147 hours. Moves between 486 and 599 sq. ft. have a mean moving
0.1241M2 - 0.0144M3 - 0.1196M4 - 0.0902M5 + 0.0560M6 - time of 24.825 hours. Moves between 600 and 830 have a mean moving
0.0725M7 - 0.0207M8 + 0.0677M9 - 0.0056M10 - 0.0802M11. time of 30.1429 hours. Moves between 557 and 599 sq. ft. have a mean
(c) 216.5938, 183.2386, 154.5297, 186.1607. (e) 0.1613% moving time of 24.05 hours. Moves between 486 and 557 sq. ft have a
(f) bn8 = 0.8463. (bn8 - 1)100% = -15.37%. The July values in the time series mean moving time of 25.6 hours.
are estimated to have a mean 15.37% below the December values.
17.6 (b) The r 2 for the classification tree model is 0.434. The first split
16.48 (b) log10(Predicted Y) = 0.9640 + 0.0087X + 0.045Q1 + is for the 8 customers who called 50 or more times. Among customers
0.0083Q2 + 0.0130Q3 who called fewer than 50 times, those who called at least seven times and
(c) 2.0234%, after adjusting for the seasonal component. visited two or more times are more likely to churn.
(d) 10.92% above the fourth-quarter values.
17.8 Because half the data will be used for a validation sample, the results
(e) Last quarter, 2016: Y = $25.4407.
will differ depending on which values are in the training sample and
(f) 2017: 28.7858, 26.9885, 27.8299, 27.5524.
which are in the validation sample.
16.60 (b) Linear trend: Yn = 173,789.3351 + 2,459.3332X where X is
17.10  (b) The first two cereals to cluster are Wheaties and Nature’s Path
relative to 1984.
Organic Multigrain Flakes followed by Post Shredded Wheat Vanilla
(c) 2017: Yn2017 = 256,024.235 thousands
Almond and Kellogg’s Mini Wheats. At the two cluster level, one cluster
2018: Yn2018 = 257,406.665.
contains Post Shredded Wheat Vanilla Almond and Kellogg’s Mini
(d) (b) Linear trend: Yn = 116,723.2674 + 1,435.6197X,
Wheats and the other cluster contains the other five cereals.
where X is­­relative to 1984.
(c) 2017: Yn2017 = 164,098.716 thousands. 2018: Yn2018 = 165,534.336 17.12 The optimal number of clusters in the range between three and
thousands. five is 3 (CCC = -1.4223). The first cluster consists of Russia, Poland,
­Lebanon, Malaysia, Argentina, Chile, Venezuela, Turkey, Brazil, and
16.62 (b) Linear trend: Yn = -2.6364 + 0.7247X, where X is relative to
­Mexico. The mean GDP per capita of this cluster is 21,085.4 and the social
1975.
media usage % is 81.4. The second cluster consists of Ukraine, Jordan,
(c) Quadratic trend: Yn = 0.2377 + 0.2935X + 0.0105X 2, where X is
Philippines, Vietnam, Peru, South Africa, Indonesia, Ghana, Kenya,
relative to 1975.
­Senegal, Tanzania, Uganda, Nigeria, and Ethiopia. The mean GDP per
Exponential trend: log10Yn = 0.2115 + 0.0345X, where X is relative to
capita of this cluster is 6,613.36 and the social media usage % is 80.14.
1975.
The third cluster consists of China, India, Pakistan, and Burkina Faso. The
mean GDP per capita of this cluster is 6,768.75 and the social media usage
% is 60. Thus, cluster 1 is characterized by high GDP and high social
media usage. Cluster 2 is characterized by low GDP and high social media
usage. Cluster 3 is characterized by low GDP and low social media usage.
Test of A3: p-value = 0.14 > 0.05. Do not reject H0 that A3 = 0. The
third-order term can be deleted. A third-order autoregressive model is not
appropriate. Test of A2: p-value = 0.0042 < 0.05. Reject H0. The
second-order term cannot be deleted. The second-order model is appropriate.
AR(2): Ŷ_i = 0.3681 + 1.5164Y_{i-1} - 0.5249Y_{i-2}
17.14 The optimal number of clusters in the range between three and
Linear Quadratic Exponential AR2
eight is 6 (CCC = 2.4411). The first cluster consists of Austria, Canada,
Syx 1.9955 1.4253 3.7962 0.7264 Germany, Hungary, India, Ireland, Israel, New ­Zealand, Poland, Portugal,
MAD 1.7133 0.8943 2.1893 0.4681 Russia, Slovakia, Spain, Taiwan, and Thailand. The mean connection
(h) The residuals in the first three models show strings of consecutive speed is 10.44 Mbps, the mean peak connection speed is 54.9067 Mbps,
positive and negative values. The autoregressive model performs well for 86% are above 4 Mbps, and 36.0667% are above 18.15 Mbps. The second
the historical data and has a fairly random pattern of residuals. It also has cluster consists of Hong Kong and South Korea. The mean connection
the smallest values in the standard error of the estimate and MAD. Based speed is 10.44 Mbps, the mean peak connection speed is 93.85 Mbps,
on the principle of parsimony, the autoregressive model would probably 94% are above 4 Mbps, and 63.5% are above 10 Mbps. The third cluster
be the best model for forecasting. consists only of Singapore. The mean connection speed is 12.5 Mbps, the
mean peak connection speed is 13.5 Mbps, 87 % are above 4 Mbps, and
(i) Yn2015 = $28.3149 billions.
51% are above 10 Mbps.
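
The autoregressive forecasts in 16.28 (d) and the AR(2) equation in 16.62 follow the same recursion: each forecast is the intercept plus the coefficients times the most recent values, and a forecast two periods ahead reuses the one-period forecast as an input. The sketch below is illustrative only. The function name is hypothetical, the 2016 value of 138.2 is an assumption back-solved from the printed 2017 forecast in 16.28 rather than a value given in the text, and the two starting values used for the AR(2) model are made up.

```python
def ar_forecast(intercept, coefs, history, steps):
    """Recursive autoregressive forecast.

    coefs[0] multiplies the most recent value, coefs[1] the value before
    that, and so on. history must hold at least len(coefs) values.
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        # Most recent values first, one per coefficient
        lags = values[-1:-len(coefs) - 1:-1]
        nxt = intercept + sum(a * y for a, y in zip(coefs, lags))
        forecasts.append(nxt)
        values.append(nxt)
    return forecasts


# 16.28 (d): first-order model; 138.2 is an assumed value for 2016
ar1 = ar_forecast(56.5818, [0.5808], [138.2], steps=2)
print([round(f, 4) for f in ar1])

# 16.62: second-order model with hypothetical last two observations
ar2 = ar_forecast(0.3681, [1.5164, -0.5249], [26.0, 27.0], steps=1)
print([round(f, 4) for f in ar2])
```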
The fourth cluster consists of Belgium, Czech Republic, Denmark,
Finland, Japan, Netherlands, Norway, Rumania, Sweden, Switzerland,
CHAPTER 17 United Kingdom, and the United States. The mean connection speed
is 14.6167 Mbps, the mean peak connection speed is 60.975 Mbps,
17.2 The r 2 for the regression tree model is 0.373. The first split is based
90.0833% are above 4 Mbps, and 52.75% are above 10 Mbps. The fifth
on a plate gap of 1.8. For those bags with a plate gap less than 1.8, the
cluster consists of Argentina, Bolivia, Brazil, China, Costa Rica, Ecuador,
mean tear is 0.3107. For those bags with a plate gap at least 1.8, the mean
Philippines, South Africa, Venezuela, and VietNam. The mean connection
tear is 1.98. For those bags with a plate gap less than 0.0, the mean tear is
speed is 3.4269 Mbps, the mean peak connection speed is 21.3538 Mbps,
0.06. For those bags with a plate gap less than 1.8 but greater than 0, the

66.25% are above 4 Mbps, and 1.3692% are above 10 Mbps. The sixth
cluster consists of Australia, Chile, Colombia, France, Italy, Malaysia,
Mexico, Peru, Sri Lanka, Turkey, United Arab Republic, and Uruguay.
The mean connection speed is 5.9333 Mbps, the mean peak connection
speed is 37.9167 Mbps, 66.25% are above 4 Mbps, and 8.15% are above
10 Mbps.
Cluster 1 is characterized by moderate mean connection speed,
moderately high mean peak connection speed, high % are above
4 Mbps, and moderate % are above 10 Mbps. Cluster 2 is characterized by
very high mean connection speed, very high mean peak connection speed,
very high % are above 4 Mbps, and very high % are above 10 Mbps.
Cluster 3 (Singapore) is characterized by high mean connection speed,
extremely high mean peak connection speed, high % are above 4 Mbps,
and high % are above 10 Mbps.
Cluster 4 is characterized by high mean connection speed,
moderately high mean peak connection speed, high % are above
4 Mbps, and moderate % are above 10 Mbps. Cluster 5 is characterized
by very low mean connection speed, low mean peak connection speed,
low % are above 4 Mbps, and very low % are above 10 Mbps. Cluster
6 is characterized by low mean connection speed, moderately low
mean peak connection speed, moderate % are above 4 Mbps, and very
low % are above 10 Mbps.

17.16 The correspondence analysis plot shows that online guests are
associated with purchasing household items while online members are
strongly associated with grocery items and in-store customers more
associated with hardlines and apparel than the two other categories.
Positive comments are most associated with apparel items, while
household items are associated with negative comments. Those that post most
frequently tend to post positive comments. Managers may want to further
examine the experience of online guests purchasing household items as
such customers may be among the most disappointed by their shopping
experience.

17.18 (b) Because the stress statistic is 0.0973 in three dimensions,
0.1308 in two dimensions, and 0.3147 in one dimension, it is reasonable
to try to interpret a two-dimensional mapping of the cereals. Looking at a
45° rotation, one dimension separates Post Shredded Wheat Vanilla
Almond and Kellogg’s Mini Wheats based on their higher calorie and
sugar content. A second dimension does not seem to be interpretable. In
addition, All Bran, which has lower calories and higher sugar, is separated
from the other cereals.

17.20 The two-dimensional plot has a stress value of virtually 0.0000.
One of the dimensions appears to separate countries with high GDP
from those with low GDP. Many of the sub-Sahara African countries are
grouped together.

17.22 The two-dimensional plot has a stress value of 0.0524. Singapore is
separated from the other countries. The countries opposite Singapore have low
mean connection speed, moderately low mean peak connection speed,
moderate % are above 4 Mbps, and very low % are above 10 Mbps. There appears
to be a grouping of countries that have low mean connection speed, low
mean peak connection speed, low % are above 4 Mbps, and very low % are
above 10 Mbps. Many of the countries that are opposite these have moderate
mean connection speed, moderately high mean peak connection speed, high
% are above 4 Mbps, and moderate % are above 10 Mbps.

17.28 Because half the data will be used for a validation sample, the
results will differ depending on which values are in the training sample
and which are in the validation sample.

17.30 The r^2 of the regression tree model is 0.731. The prime determinant
of wins is the ERA. Teams with an ERA below 4.05 had a mean of 91.6667
wins while teams with an ERA above 4.05 had a mean of 76.2857 wins.
Teams with an ERA above 4.05 who had at least 44 saves had a mean of
83.5 wins while teams with fewer than 44 saves had a mean of 71.8461
wins. Teams with an ERA above 4.05 who had fewer than 44 saves and an
ERA below 4.33 had a mean of 77.4 wins. (Those that had an ERA above
4.33 had a mean of 68.375 wins.)

17.32 (c) The first two foods to cluster are Cantonese and American,
followed by French and Mandarin, followed by Spanish and Greek. At the
two cluster level, the first cluster includes Japanese, French, Mandarin,
Szechuan, and Mexican. The second cluster includes Cantonese,
American, Spanish, Greek, and Italian. Because the stress statistic is
0.0468 in four dimensions, 0.1164 in three dimensions, 0.2339 in two
dimensions, and 0.4079 in one dimension, it is reasonable to first try to
interpret a two-dimensional mapping of the foods. There does not seem
to be a clear interpretation of the dimensions along the lines of the three
scales. The two spicy foods, Mexican and Szechuan, are close to each other
as are French and Greek, and Japanese and American. Italian is separated
by itself as is Spanish.

17.34 The r^2 of the regression tree model is 0.727. The first split is
based on the living space of 2,220 square feet. Houses with a living
space < 2,220 square feet have a mean asking price of $446,854.76 while
houses with a living space > 2,220 square feet have a mean asking price
of $558,678.95. Houses with a living space > 2,220 square feet that
have a brick exterior have a mean asking price of $499,616.67 while those
without a brick exterior have a mean asking price of $585,938.46. Houses
without a brick exterior that have a lot size greater than 0.46 (acres) have a
mean asking price of $634,760 while houses that have a lot size less than
0.46 (acres) have a mean asking price of $555,425.
Houses with a living space < 2,220 square feet that have a living
space > 1,197 square feet have a mean asking price of $463,741.18
while those with a living space below 1,197 square feet have a mean
asking price of $375,087.50. Houses with a living space < 2,220 square feet
that have a living space > 1,197 square feet that have a fireplace have a
mean asking price of $484,643.48 while those that do not have a fireplace
have a mean asking price of $420,036.36.
Houses with a living space < 2,220 square feet that have a living
space > 1,197 square feet that have a fireplace and a lot size > 0.19
acres have a mean asking price of $448,128.57 (those with a brick
exterior have a mean asking price of $463,928.57 while those without a
brick exterior have a mean asking price of $432,328.57) while those that
have a lot size < 0.19 acres have a mean asking price of $541,444.44.
Houses with a living space < 2,220 square feet that have a living
space > 1,197 square feet that do not have a fireplace and have at least
three bathrooms have a mean asking price of $444,100 while those that
have fewer than three bathrooms have a mean asking price of $391,160.

17.36 The optimal number of clusters in the range between three
and eight is 6 (CCC = 3.97273). The first cluster consists of
Stonyfield Organic Greek. The second cluster consists of the six
regular yogurts. The third cluster consists of the Great Value Greek
only. The fourth cluster consists of the Organic Valley Greek only. The
fifth cluster consists of the Trader Joe’s Plain Whole Greek only. The
sixth cluster consists of Dannon Oikos, Wallaby Organic, and Chobani
Greek. You can conclude that the regular yogurts are different from
the Greek yogurts and that many of the Greek yogurts are different
from each other.
