Applied Statistics in Business and Economics 7E Ise 7Th Edition David Doane Full Chapter
David P. Doane
Oakland University
Lori E. Seward
University of Colorado
APPLIED STATISTICS
Published by McGraw Hill LLC, 1325 Avenue of the Americas, New York, NY 10121. Copyright © 2022 by
McGraw Hill LLC. All rights reserved. Printed in the United States of America. No part of this publication may
be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without
the prior written consent of McGraw Hill LLC, including, but not limited to, in any network or other electronic
storage or transmission, or broadcast for distance learning.
Some ancillaries, including electronic and print components, may not be available to customers outside the
United States.
This book is printed on acid-free paper.
1 2 3 4 5 6 7 8 9 LWI 24 23 22 21
ISBN 978-1-260-59764-6
MHID 1-260-59764-4
All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.
The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website does
not indicate an endorsement by the authors or McGraw Hill LLC, and McGraw Hill LLC does not guarantee the
accuracy of the information presented at these sites.
mheducation.com/highered
Lori E. Seward
Lori E. Seward is a teaching professor in The Leeds School of Business at the University
of Colorado in Boulder. She earned her Bachelor of Science and Master of Science degrees
in Industrial Engineering at Virginia Tech. After several years working as a reliability and
quality engineer in the paper and automotive industries, she earned her PhD from Virginia
Tech and joined the faculty at The Leeds School in 1998. Professor Seward has served as the
faculty director of Leeds’ MBA programs since 2017. She currently teaches as well as coordinates
the core statistics course for the Leeds full-time, Professional, and Executive MBA
programs. She served as the chair of the INFORMS Teachers’ Workshop for the annual 2004
meeting. Her teaching interests focus on developing pedagogy that uses technology to create
a collaborative learning environment in large undergraduate and MBA statistics courses. Her
most recent article, co-authored with David Doane, was published in the Journal of Statistics
Education (2011).
Courtesy of Lori Seward
Dedication
To Robert Hamilton Doane-Solomon
David
From the Authors
“How often have you heard people/students say about a particular subject, ‘I’ll never
use this in the real world’? I thought statistics was a bit on the ‘math-geeky’ side at
first. Imagine my horror when I saw α, R², and correlations on several financial reports
at my current job (an intern position at a financial services company). I realized then
that I had better try to understand some of this stuff.”
—Jill Odette (an introductory statistics student)
As recently as a decade ago our students used to ask us, “How do I use statistics?” Today we
more often hear, “Why should I use statistics?” Applied Statistics in Business and Economics
has attempted to provide real meaning to the use of statistics in our world by using real business
situations and real data and appealing to your need to know why rather than just how.
With over 50 years of teaching statistics between the two of us, we feel we have something
to offer. Seeing how students have changed as the new century unfolds has required us to
adapt and seek out better ways of instruction. So we wrote Applied Statistics in Business and
Economics to meet four distinct objectives.
Objective 1: Communicate the Meaning of Variation in a Business Context. Variation
exists everywhere in the world around us. Successful businesses know how to measure variation.
They also know how to tell when variation should be responded to and when it should be
left alone. We’ll show how businesses do this.
Objective 2: Use Real Data and Real Business Applications. Examples, case studies, and
problems are taken from published research or real applications whenever possible. Hypothetical
data are used when it seems the best way to illustrate a concept.
Objective 3: Incorporate Current Statistical Practices and Offer Practical Advice.
With the increased reliance on computers, statistics practitioners have changed the way they
use statistical tools. We’ll show the current practices and explain why they are used the way
they are. We also will tell you when each technique should not be used.
Objective 4: Provide More In-Depth Explanation of the Why and Let the Software
Take Care of the How. It is critical to understand the importance of communicating with
data. Today’s computer capabilities make it much easier to summarize and display data than
ever before. We demonstrate easily mastered software techniques using the common software
available. We also spend a great deal of time on the idea that there are risks in decision making
and those risks should be quantified and directly considered in every business decision.
Our experience tells us that students want to be given credit for the experience they bring
to the college classroom. We have tried to honor this by choosing examples and exercises set
in situations that will draw on students’ already vast knowledge of the world and knowledge
gained from other classes. Emphasis is on thinking about data, choosing appropriate analytic
tools, using computers effectively, and recognizing limitations of statistics.
∙ Chapter-end Software Supplements showing how to use R for applications in that chapter.
∙ Updated exercises with emphasis on compatibility with Connect®.
∙ Updated test bank questions matched with topics and learning objectives.
∙ New and updated Mini Cases for economics and business.
∙ New and updated exercise data sets, web links, Big Data Sets, and Related Reading.
∙ Many new guided examples on Connect. Students can watch 90 guided examples to aid
their learning.
∙ Connect® supplements including LearningStats demonstrations, illustrations of R calculations
for common tasks, and video tutorials (both PC and Mac).
Software
Excel is used throughout this book because it is available everywhere. Some calculations are
illustrated using MegaStat and Minitab because they offer more capability than Excel’s Data
Analysis Tools. In recognition of growing interest in analytics training beyond Excel, our
textbook now provides an optional introduction to R with illustrations of topics in each chapter.
Our support for R is further enhanced with LearningStats modules, tables of R functions,
and R-compatible Excel data sets. To further assist students we provide Connect® tutorials or
demonstrations on using Excel, Minitab, MegaStat, and R. At the end of each chapter is a list
of LearningStats demonstrations that illustrate the concepts from the chapter.
Math Level
The assumed level of mathematics is pre-calculus, though there are rare references to calculus
where it might help the better-trained reader. All but the simplest proofs and derivations are
omitted, though key assumptions are stated clearly. The learner is advised what to do when
these assumptions are not fulfilled. Worked examples are included for basic calculations, but
the textbook does assume that computers will do the calculations after the statistics class is
over, so interpretation is paramount. End-of-chapter references and suggested websites are
given so that interested readers can deepen their understanding.
Exercises
Simple practice exercises are placed within each section. End-of-chapter exercises tend to be
more integrative or to be embedded in more realistic contexts. Attention has been given to
revising exercises so that they have clear-cut answers that are matched to specific learning
objectives. A few exercises invite short answers rather than just quoting a formula. Answers to
odd-numbered exercises are in the back of the book (all of the answers are in the instructor’s
manual).
LearningStats
Connect users can access LearningStats, a collection of Excel spreadsheets, Word documents,
and PowerPoints for each chapter. It is intended to let students explore data and concepts at
their own pace, ignoring material they already know and focusing on things that interest them.
LearningStats includes deeper explanations on topics such as how to write effective reports,
how to perform calculations, or how to make effective charts. It also includes topics that did
not appear prominently in the textbook (e.g., partial F test, Durbin–Watson test, sign test,
bootstrap simulation, and logistic regression). Instructors can use LearningStats PowerPoint
presentations in the classroom, but Connect users also can use them for self-instruction. No
instructor can “cover everything,” but students can be encouraged to explore LearningStats
data sets and/or demonstrations, perhaps with an instructor’s guidance.
David P. Doane
Lori E. Seward
Chapter Contents
Each chapter begins with a short list of section topics that are covered in the chapter.
CHAPTER CONTENTS
1.1 What Is Statistics?
1.2 Why Study Statistics?
1.3 Applying Statistics in Business
1.4 Statistical Challenges
1.5 Critical Thinking
can focus on material just learned. Despite a low score on the midterm exam, you are right at the borderline for an 80 (if your instructor rounds up). The weighted mean is widely used in cost accounting (weights for cost categories), finance (asset weights in investment portfolios), and other business applications.
3.2 (a) Make a stem-and-leaf plot for the number of defects per 100 vehicles for these 32 brands. (b) Make a dot plot of the defects data. (c) Describe these two displays. (Hint: Refer to center, variability, and shape.)
Defects per 100 Vehicles (alphabetical by brand) JDPower
Analytics in Action
These NEW features bring in real-world examples to illustrate data ana-
Walmart, Big Data, and Retail Analytics
Walmart processes over a million customer transactions each hour, which translates into two to three petabytes of data each hour. (A petabyte is a million gigabytes!) What to do
not making it to the shelf. Identifying the mistake and fixing it in time for customers to purchase before the big night prevents a loss in profit. Big data mean that real-time
cycles. The length of a contraction is measured from the peak of the previous expansion to the beginning of the next expansion based on the real gross domestic product (GDP). Table 3.3 shows the durations, in months, of 33 U.S. recessions. From the dot plot in Figure 3.1, we see that the 65-month contraction (1873–1879) was quite unusual, although four recessions did exceed 30 months. Most recessions have lasted less than 20 months. Only 7 of 33 lasted less than 10 months. The 8-month 2001
January 1893 (I) June 1894 (II) 17 April 1960 (II) February 1961 (I) 10
December 1895 (IV) June 1897 (II) 18 December 1969 (IV) November 1970 (IV) 11
June 1899 (III) December 1900 (IV) 18 November 1973 (IV) March 1975 (I) 16
September 1902 (IV) August 1904 (III) 23 January 1980 (I) July 1980 (III) 6
Figure 3.1
Dot Plot of Business Cycle Duration (n = 33)
[Horizontal axis: Number of Months, 0 to 70]
Examples
Examples of interest to students are taken from published research or real applications to illustrate the statistics concept. For the most part, examples are focused on business, but there are also some that are more general and don’t require any prerequisite knowledge. And there are some that are based on student projects.
Data Set Icon
A data set icon is used throughout the text to identify data sets used in the figures, examples, and exercises that are included in Connect for the text.
3.2 FREQUENCY DISTRIBUTIONS AND HISTOGRAMS
Frequency Distributions
LO 3-2 Create a frequency distribution for a data set.
A frequency distribution is a table formed by classifying n numerical data values into k classes called bins. The table shows the frequency of data values that fall within each bin. Frequencies also can be expressed as relative frequencies or percentages of the total number of observations.
3.7 SCATTER PLOTS
LO 3-8 Make and interpret a scatter plot.
A scatter plot shows n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as dots (or some other symbol) on an X-Y graph. This type of display is so important in statistics that it deserves careful attention. A scatter plot is a starting point for bivariate data analysis. We create scatter plots to investigate the relationship between two variables. Typically, we would like to know if there is an association between two variables and, if so, what kind of association exists. As we did with univariate data analysis, let’s look at a scatter plot to see what we can observe.
Example 3.3 Birth Rates and Life Expectancy
Figure 3.16 shows a scatter plot with life expectancy on the X-axis and birth rates on the Y-axis. In this illustration, there seems to be an association between X and Y. That is, nations with higher birth rates tend to have lower life expectancy (and vice versa). No cause-and-effect relationship is implied because, in this example, both variables could be influenced by a third variable that is not mentioned (e.g., GDP per capita).
Source: The World Factbook 2003. Central Intelligence Agency, 2003. www.cia.gov.
Figure 3.16 Scatter Plot of Birth Rates and Life Expectancy (n = 153 nations) BirthLife
[Axes: Life Expectancy (years) on the X-axis, Birth Rate per 1,000 on the Y-axis]
Figure 3.17 shows some scatter plot patterns similar to those that you might observe when you have a sample of (X, Y) data pairs. A scatter plot can convey patterns in data pairs that would not be apparent from a table. Compare the scatter plots in Figure 3.18 with the prototypes and use your own words to describe the patterns that you see.
Figure 3.17 Prototype Scatter Plot Patterns USTrade
[Panels: Strong Positive, Weak Positive, No Pattern, Strong Negative, Weak Negative, Nonlinear Pattern]
Random Sampling Methods
We will first discuss the four random sampling techniques shown in Table 2.5 and then describe three commonly used non-random sampling techniques, summarized in Table 2.8.
Table 2.5 Random Sampling Methods
Simple Random Sample: Use random numbers to select items from a list (e.g., Visa cardholders).
Systematic Sample: Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified Sample: Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster Sample: Select random geographical regions (e.g., zip codes) that represent the population.
We denote the population size by N and the sample size by n. In a simple random sample, every item in the population of N items has the same chance of being chosen in the sample of n items. A physical experiment to accomplish this would be to write each of the N data values on a poker chip and then to draw n chips from a bowl after stirring it thoroughly. But we can accomplish the same thing if the N population items appear on a numbered list, by choosing n integers between 1 and N that we match up against the numbered list of the population items.
For example, suppose we want to select one student at random from a list of 15 students (see Figure 2.5). If you were asked to “use your judgment,” you would probably pick a name in the middle, thereby biasing the draw against those individuals at either end of the list. Instead we rely on a random number to “pick” the name. How do we determine the random number? Before computers, statisticians relied on published tables of random numbers. The process is simpler today. Most pocket calculators have a key to produce a random decimal in the interval [0, 1] that can be converted to a random integer. In this example, we used Excel’s function =RANDBETWEEN(1,15) to pick a random integer between 1 and 15. The number was 12, so Stephanie was selected. There is no bias because all values from 1 to 15 are equiprobable (i.e., equally likely to occur). An equivalent R function for choosing a single random integer between 1 and 15 is sample(1:15,1,1).
Figure 2.5 Picking on Stephanie
Random person: 12
1 Adam 6 Haitham 11 Moira
2 Addie 7 Jackie 12 Stephanie
3 Don 8 Judy 13 Stephen
4 Floyd 9 Lindsay 14 Tara
5 Gadis 10 Majda 15 Xander
Sampling without replacement means that once an item has been selected to be included in the sample, it cannot be considered for the sample again. The Excel function =RANDBETWEEN(a,b) uses sampling with replacement. This means that the same random number could show up more than once. Using the bowl analogy, if we throw each chip back in the bowl and stir the contents before the next draw, an item can be chosen again. Instinctively most people believe that sampling without replacement is preferred over sampling with replacement because allowing duplicates in our sample seems odd. In reality, sampling without replacement can be a problem when our sample size n is close to our population size N. At some point in the sampling process, the remaining items in the population will no longer have the same probability of being selected as the items we chose at the beginning of the sampling process. This could lead to a bias (a tendency to overestimate or underestimate the parameter we are trying to measure) in our sample results. Sampling with replacement does not lead to bias. In a list of items to be sampled (a vector x), the R function sample(x, n, 1) will choose a random sample of n items with replacement (or use sample(x, n, 0) to sample without replacement).
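The selection steps described above can be sketched in Python as a rough counterpart to Excel's =RANDBETWEEN(1,15) and R's sample(). This is only an illustration: the roster repeats the 15 names from Figure 2.5, while the seed value and the sample size n = 5 are arbitrary assumptions made so the run is repeatable.

```python
import random

# The 15 names from Figure 2.5, in list order (position 1 = Adam, ..., 15 = Xander).
students = ["Adam", "Addie", "Don", "Floyd", "Gadis",
            "Haitham", "Jackie", "Judy", "Lindsay", "Majda",
            "Moira", "Stephanie", "Stephen", "Tara", "Xander"]

rng = random.Random(2024)  # assumed seed, chosen only to make the run repeatable

# One random integer between 1 and 15 (both ends included).
pick = rng.randint(1, len(students))
chosen = students[pick - 1]  # Python lists start at 0, the numbered list starts at 1

n = 5  # assumed sample size for illustration

# Sampling with replacement: the same name may be drawn more than once.
with_replacement = rng.choices(students, k=n)

# Sampling without replacement: each name can appear at most once.
without_replacement = rng.sample(students, k=n)

print("Random person", pick, "is", chosen)
print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

With a different seed the selected names will differ. The point is only that `choices` can repeat a name while `sample` cannot.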
When should we worry about sampling without replacement? Only when the population is
finite and the sample size is close to the population size. Consider the Russell 3000® Index,
How Does This Text Reinforce Student Learning?
Chapter Summary
Chapter summaries provide an overview of the material covered in the chapter.
The mean and median describe a sample’s center and also indicate skewness. The mode is useful for discrete data with a small range. The trimmed mean eliminates extreme values. The geometric mean mitigates high extremes but cannot be used when zeros or negative values are present. The midrange is easy to calculate but is sensitive to extremes. Variability is typically measured by the standard deviation, while relative dispersion is given by the coefficient of variation for nonnegative data. Standardized data reveal outliers or unusual data values, and the Empirical Rule offers a comparison with a normal distribution. In measuring dispersion, the mean absolute deviation or MAD is easy to understand but lacks nice mathematical properties. Quartiles are meaningful even for fairly small data sets, while percentiles are used only for large data sets. Box plots show the quartiles and data range. The correlation coefficient measures the degree of linearity between two variables. The covariance measures the degree to which two variables move together. We can estimate many common descriptive statistics from grouped data. Sample coefficients of skewness and kurtosis allow more precise inferences about the shape of the population being sampled instead of relying on histograms.
Key Terms
Key terms are highlighted and defined within the text. They are also listed at the ends of chapters to aid in reviewing.
Center: geometric mean, mean, median, midhinge, midrange, mode, trimmed mean, weighted mean
Variability: Chebyshev’s Theorem, coefficient of variation, Empirical Rule, mean absolute deviation, outliers, population variance, range, sample variance, standard deviation, standardized data, two-sum formula, z-score
Shape: bimodal distribution, kurtosis, kurtosis coefficient, leptokurtic, mesokurtic, multimodal distribution, negatively skewed, Pearson 2 skewness coefficient, platykurtic, positively skewed, Schield’s Rule, skewed left, skewed right, skewness, skewness coefficient, symmetric data
Other: box plot, covariance, five-number summary, interquartile range, method of medians, quartiles, sample correlation coefficient
Choosing the Appropriate Statistic or Visual Display
Data Type?
Nominal: Mode, Bar Chart, Column Chart
Ordinal: Median, Mode, Bar Chart, Column Chart
Interval or Ratio: Describe What?
Center: Mean, Median, Mode, Midrange, Geometric Mean, Midhinge, Histogram, Box Plot
Variability: Range, Interquartile Range, Standard Deviation, Coefficient of Variation, Standardized z-Values, Histogram, Box Plot
Shape: Mean vs. Median, Skewness Coefficient, Kurtosis Coefficient, Histogram, Box Plot
Commonly Used Formulas
Some chapters provide a listing of commonly used formulas for the topic under discussion.
Commonly Used Formulas in Descriptive Statistics
Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Geometric mean: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
Growth rate: $GR = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$
Range: $\text{Range} = x_{\max} - x_{\min}$
Midrange: $\text{Midrange} = \frac{x_{\max} + x_{\min}}{2}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Coefficient of variation: Population $CV = 100 \times \frac{\sigma}{\mu}$; Sample $CV = 100 \times \frac{s}{\bar{x}}$
Standardized variable: Population $z_i = \frac{x_i - \mu}{\sigma}$; Sample $z_i = \frac{x_i - \bar{x}}{s}$
Midhinge: $\text{Midhinge} = \frac{Q_1 + Q_3}{2}$
Sample correlation coefficient: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$ or $r = \frac{s_{XY}}{s_X s_Y}$
Weighted mean: $\bar{x} = \sum_{j=1}^{k} w_j x_j$ where $\sum_{j=1}^{k} w_j = 1.00$
Grouped mean: $\bar{x} = \sum_{j=1}^{k} \frac{f_j m_j}{n}$
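As a quick check on the commonly used formulas listed in this part of the walkthrough, the sketch below codes several of them directly in Python using only the standard library. The function names and the small data lists are illustrative assumptions, not taken from the textbook or its data sets.

```python
import math
import statistics

def geometric_mean(x):
    # n-th root of the product x1 * x2 * ... * xn (values must be positive)
    return math.prod(x) ** (1 / len(x))

def growth_rate(x):
    # (n-1)-th root of xn / x1, minus 1
    return (x[-1] / x[0]) ** (1 / (len(x) - 1)) - 1

def midrange(x):
    return (max(x) + min(x)) / 2

def coefficient_of_variation(x):
    # sample version: 100 * s / x-bar
    return 100 * statistics.stdev(x) / statistics.mean(x)

def z_scores(x):
    # sample standardized values: (xi - x-bar) / s
    xbar, s = statistics.mean(x), statistics.stdev(x)
    return [(xi - xbar) / s for xi in x]

def correlation(x, y):
    # sample correlation coefficient r, written out term by term
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def weighted_mean(w, x):
    # weights w are assumed to sum to 1.00
    return sum(wj * xj for wj, xj in zip(w, x))

def grouped_mean(f, m):
    # f = class frequencies, m = class midpoints, n = total frequency
    return sum(fj * mj for fj, mj in zip(f, m)) / sum(f)

# Made-up illustration data (not from the textbook).
x = [2, 4, 8]
y = [1, 3, 7]
print("mean:", statistics.mean(x))
print("geometric mean:", geometric_mean(x))
print("growth rate:", growth_rate(x))
print("midrange:", midrange(x))
print("sample std dev:", statistics.stdev(x))
print("CV (%):", coefficient_of_variation(x))
print("z-scores:", z_scores(x))
print("correlation:", correlation(x, y))
print("weighted mean:", weighted_mean([0.2, 0.3, 0.5], [70, 80, 90]))
print("grouped mean:", grouped_mean([2, 3, 5], [10, 20, 30]))
```

Here `statistics.stdev` uses the n − 1 divisor, so it corresponds to the sample standard deviation formula rather than the population one.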
Chapter Review
Each chapter has a list of questions for student self-review or for discussion.
1. What are descriptive statistics? How do they differ from visual displays of data?
2. Explain each concept: (a) center, (b) variability, and (c) shape.
3. (a) Why is sorting usually the first step in data analysis? (b) Why is it useful to begin a data analysis by thinking about how the data were collected?
4. List strengths and weaknesses of each measure of center and write its Excel function: (a) mean, (b) median, and (c) mode.
5. (a) Why must the deviations around the mean sum to zero? (b) What is the position of the median in the data array when n is even? When n is odd? (c) Why is the mode of little use in continuous data? (d) For what type of data is the mode most useful?
6. (a) What is a bimodal distribution? (b) Explain two ways to detect skewness.
7. List strengths and weaknesses of each measure of center and give its Excel function (if any): (a) midrange, (b) geometric mean, and (c) 10 percent trimmed mean.
8. (a) What is variability? (b) Name five measures of variability. List the main characteristics (strengths, weaknesses) of each measure.
9. (a) Which standard deviation formula (population, sample) is used most often? Why? (b) When is the coefficient of variation useful?
10. (a) To what kind of data does Chebyshev’s Theorem apply? (b) To what kind of data does the Empirical Rule apply? (c) What is an outlier? An unusual data value?
11. (a) In a normal distribution, approximately what percent of observations are within 1, 2, and 3 standard deviations of the mean? (b) In a sample of 10,000 observations, about how many observations would you expect beyond 3 standard deviations of the mean?
12. (a) Write the mathematical formula for a standardized variable. (b) Write the Excel formula for standardizing a data value in cell F17 from an array with mean Mu and standard deviation Sigma.
13. (a) Why is it dangerous to delete an outlier? (b) When might it be acceptable to delete an outlier?
14. (a) Explain how quartiles can measure both center and variability. (b) Why don’t we calculate percentiles for small samples?
15. (a) Explain the method of medians for calculating quartiles. (b) Write the Excel formula for the first quartile of an array named XData.
16. (a) What is a box plot? What does it tell us? (b) What is the role of fences in a box plot? (c) Define the midhinge and interquartile range.
17. What does a correlation coefficient measure? What is its range? Why is a correlation coefficient easier to interpret than a covariance?
18. (a) Why is some accuracy lost when we estimate the mean or standard deviation from grouped data? (b) Why do open-ended classes in a frequency distribution make it impossible to estimate the mean and standard deviation? (c) When would grouped data be presented instead of the entire sample of raw data?
19. (a) What is the skewness coefficient of a normal distribution? A uniform distribution? (b) Why do we need a table for sample skewness coefficients that is based on sample size?
20. (a) What is kurtosis? (b) Sketch a platykurtic population, a leptokurtic population, and a mesokurtic population. (c) Why can’t we rely on a histogram to assess kurtosis?
More Learning Resources
LearningStats provides a means for Connect users to explore data and concepts at their own pace. Applications that relate to the material in the chapter are identified by topic at the end of each chapter.
CHAPTER 4 More Learning Resources
You can access these LearningStats demonstrations through McGraw-Hill’s Connect® to help you understand descriptive statistics.
Topic: LearningStats Demonstrations
Overview: Describing Data; Using MegaStat; Using Minitab; Using R
Descriptive statistics: Basic Statistic; Quartiles; Box Plot Simulation; Grouped Data; Significant Digits
ScreenCam Tutorials: Using MegaStat; Excel Descriptive Statistics; Excel Scatter Plots
Key: PowerPoint, Excel, PDF, ScreenCam Tutorials
4.58 Bags of jelly beans have a mean weight of 396 gm with a standard deviation of 5 gm. Use Chebyshev’s Theorem to find a lower bound for the number of bags in a sample of 200 that weigh between 386 and 406 gm.
4.59 Based on experience, the Ball Corporation’s aluminum can manufacturing facility in Ft. Atkinson, Wisconsin, knows that the metal thickness of incoming shipments has a mean of 0.2731 mm with a standard deviation of 0.000959 mm. (a) A certain shipment has a diameter of 0.2761. Find the standardized z-score for this shipment. (b) Is this an outlier?
4.60 SAT scores for the entering class of 2010 at Oxnard University were normally distributed with a mean of 1340 and a standard deviation of 90. Bob’s SAT score was 1430. (a)
4.64 Below are monthly rents paid by 30 students who live off campus. (a) Find the mean, median, and mode. (b) Do the measures of central tendency agree? Explain. (c) Calculate the standard deviation. (d) Sort and standardize the data. (e) Are there outliers or unusual data values? (f) Using the Empirical Rule, do you think the data could be from a normal population? Rents
730 730 730 930 700 570
690 1,030 740 620 720 670
560 740 650 660 850 930
600 620 760 690 710 500
730 800 820 840 720 700
If our data frame columns have names (as they do here), we can get summary statistics for variables of interest using the summary() command. For example:
> summary(VehicleData[c("Weight","Length")])
Weight Length
Min. :2385 Min. :151.1
1st Qu. :3356 1st Qu. :181.9
Median :3662 Median :192.2
Mean :3954 Mean :190.9
3rd Qu. :4661 3rd Qu. :198.7
Max. :5917 Max. :231.9
You can create a graph and export it from the Plots tab (lower right pane in R) to paste it into your written report. For example, we can create a simple box plot and histogram with optional labels for the axes and graph titles:
> boxplot(VehicleData$Weight, ylab="Pounds", main="Vehicle Weight")
> hist(VehicleData$Weight, xlab="Pounds", main="Vehicle Weight")
Exam Review Questions
At the end of a group of chapters, students can review the material they covered in those chapters. This provides them with an opportunity to test themselves on their grasp of the material.
Software Supplement
Descriptive Statistics Using MegaStat
You can obtain descriptive statistics (and more) from MegaStat, as illustrated in Figure 4.33. Click the Add-Ins tab on the top menu, and then click on the MegaStat icon (left side of the top menu in this example). On the list of MegaStat procedures, click Descriptive Statistics. On the new menu, enter the data range (in this case C4:C37) in the Input range field (or highlight the data block on the worksheet). MegaStat offers you various statistics and visual displays, including a dot plot and stem-and-leaf. Compare Excel and MegaStat to see similarities and differences in their interfaces and results.
Figure 4.33 MegaStat’s Descriptive Statistics (JDPower)
Source: MegaStat
Exam Review Questions for Chapters 1–4
1. Which type of statistic (descriptive, inferential) is each of the following?
a. Estimating the default rate on all U.S. mortgages from a random sample of 500 loans.
b. Reporting the percent of students in your statistics class who use Verizon.
c. Using a sample of 50 iPhones to predict the average battery life in typical usage.
2. Which is not an ethical obligation of a statistician? Explain.
a. To know and follow accepted procedures.
b. To ensure data integrity and accurate calculations.
c. To support client wishes in drawing conclusions from the data.
3. “Driving without a seat belt is not risky. I’ve done it for 25 years without an accident.” This best illustrates which fallacy?
a. Unconscious bias.
b. Conclusion from a small sample.
c. Post hoc reasoning.
4. Which data type (categorical, numerical) is each of the following?
a. Your current credit card balance.
b. Your college major.
c. Your car’s odometer mileage reading today.
5. Give the type of measurement (nominal, ordinal, interval, ratio) for each variable.
a. Length of time required for a randomly chosen vehicle to cross a toll bridge.
b. Student’s ranking of five cell phone service providers.
c. The type of charge card used by a customer (Visa, Mastercard, AmEx, Other).
6. Tell if each variable is continuous or discrete.
a. Tonnage carried by an oil tanker at sea.
b. Wind velocity at 7 o’clock this morning.
c. Number of text messages you received yesterday.
7. To choose a sample of 12 students from a statistics class of 36 students, which type of sample (simple random, systematic, cluster, convenience) is each of these?
Guided Examples These narrated video walkthroughs provide students with step-by-step guidelines for solving selected
exercises similar to those contained in the text. The student is given personalized instruction on how to solve a problem by
applying the concepts presented in the chapter. The narrated voiceover shows the steps to take to work through an exercise.
Students can go through each example multiple times if needed.
What Resources are Available for Students?
The following software tools are available to assist students in understanding concepts and solving problems.
LearningStats
LearningStats allows students to explore data and concepts at their own pace. It includes demonstrations,
simulations, and tutorials that can be downloaded from Connect.
R and RStudio
R is a sophisticated programming language for statistical computing and graphics, and RStudio is an integrated development environment for it.
This textbook offers detailed instructions for downloading, installing, and using free versions of R (https://www.r-project.org/)
and RStudio (https://rstudio.com/).
What Resources are Available for Instructors?
Instructor resources are available through the Connect course at connect.mheducation.com. Resources include a complete
Instructor’s Manual in Word format, the complete Test Bank, Instructor PowerPoint slides, text art files, and more.
New remote proctoring and browser-locking capabilities, hosted by Proctorio within Connect, provide control of the assessment
environment by enabling security options and verifying the identity of the student.
Seamlessly integrated within Connect, these services allow instructors to control students’ assessment experience by
restricting browser activity, recording students’ activity, and verifying students are doing their own work.
Instant and detailed reporting gives instructors an at-a-glance view of potential academic integrity concerns, thereby
avoiding personal bias and supporting evidence-based claims.
Acknowledgments
The authors would like to acknowledge some of the many people who have helped with this book. Thomas W. Lauer and
Floyd G. Willoughby permitted quotation of a case study. Morgan Elliott, Karl Majeske, Robin McCutcheon, Kevin Murphy,
John Sase, T. J. Wharton, and Kenneth M. York permitted questionnaires to be administered in their classes. Mark
Isken, Ron Tracy, and Robert Kushler gave generously of their time as expert statistical consultants. Jonathan G. Koomey of
E.O. Lawrence Berkeley National Laboratory offered valuable suggestions on visual data presentation.
We are grateful to Farrukh Abbas for his careful scrutiny of the text and for offering ideas on improving the text and
exercises. Mark Isken has reliably provided Excel expertise and has suggested health care applications for examples and
case studies. John Savio and the Michigan State Employees Credit Union provided ATM data. The Siena Research Institute
has made its poll results available. J.D. Power and Associates generously provided permission to use vehicle quality data.
The Public Interest Research Group in Michigan (PIRGIM) has generously shared data from its field survey of prescription
drug prices.
Phil Rogers has offered numerous suggestions for improvement in both the textbook exercises and Connect. Milo A.
Schield shared his research on “quick rules” for measuring skewness from summarized data. We owe special thanks to
Aaron Kennedy and Dave Boennighausen of Noodles & Company; to Mark Gasta, Anja Wallace, and Clifton Pacaro of
Vail Resorts; to Jim Curtin and Gordon Backman of Ball Corporation; and to Santosh Lakhan from The Verdeo Group for
providing suggestions and access to data for Mini Cases and examples. For reviewing the material on quality, we wish to
thank Kay Beauregard, administrative director at William Beaumont Hospital, and Ellen Barnes and Karry Roberts of Ford
Motor Company. Amy Sheikh provided a new Facebook Friends data set, along with other excellent suggestions and reports
from the “front lines” of her classes.
A special debt of gratitude is due to Noelle Bathurst, Harper Christopher, Amy Gehl, and Ryan McAndrews for their
direction and support and Harvey Yep and Jamie Koch for managing the text and Connect pieces of the project. Thanks
to the many reviewers who provided such valuable feedback including criticism that made the book better, some of whom
reviewed several previous editions of the text. Any remaining errors or omissions are the authors’ responsibility. Thanks,
too, to the participants in our focus groups and symposia on teaching business statistics, who have provided teaching ideas
and insights from their experiences with students in diverse contexts. We hope you will be able to see in our book and the
teaching package consideration of those ideas and insights.
Farrukh Abbas, National University of Modern Languages (NUML), Islamabad, Pakistan
Heather Adams, University of Colorado—Boulder
Sung Ahn, Washington State University
Mostafa Aminzadeh, Towson University
Scott Bailey, Troy University
Hope Baker, Kennesaw State University
Saad Taha Bakir, Alabama State University
Steven Bednar, Elon University
Adam Bohr, University of Colorado—Boulder
Katherine Broneck, Pima Community College—Downtown
Alan Cannon, University of Texas—Arlington
Deborah Carter, Coahoma Community College
Kevin Caskey, SUNY—New Paltz
Michael Cervetti, University of Memphis
Paven Chennamaneni, University of Wisconsin—Whitewater
Alan Chesen, Wright State University
Wen-Chyuan Chiang, University of Tulsa
Chia-Shin Chung, Cleveland State University
Joseph Coleman, Wright State University—Dayton
Robert Cutshall, Texas A&M University—Corpus Christi
Terry Dalton, University of Denver
Douglas Dotterweich, East Tennessee State University
Jerry Dunn, Southwestern Oklahoma State University
Michael Easley, University of New Orleans
Jerry Engeholm, University of South Carolina—Aiken
Mark Farber, University of Miami
Soheila Kahkashan Fardanesh, Towson University
Mark Ferris, St. Louis University
Stergios Fotopoulos, Washington State University
Vickie Fry, Westmoreland County Community College
Joseph Fuhr, Widener University
Bob Gillette, University of Kentucky
Malcolm Gold, Avila University
Don Gren, Salt Lake City Community College
Karina Hauser, University of Colorado—Boulder
Eric Hernandez, Miami Dade College
Clifford Hawley, West Virginia University
Yijun He, Washington State University
Natalie Hegwood, Sam Houston State University
Joshua Naranjo, Western Michigan University
Anthony Narsing, Macon State College
Robert Nauss, University of Missouri–St. Louis
Pin Ng, Northern Arizona University
Grace Onodipe, Georgia Gwinnett College
Rachel Webb, Portland State University
Simone A. Wegge, City University of New York
Chao Wen, Eastern Illinois University
Alan Wheeler, University of Missouri—St. Louis
Anne Williams, Gateway Community College
Enhancements for Doane/Seward ASBE 7e
Many changes were motivated by advice from reviewers and users of the textbook. Besides hundreds of small edits
and improved topic organization, these changes were common to most chapters:
∙ New overall design, colors, figures, and exercise layout for a brighter and more efficient look.
∙ New end-of-chapter Software Supplements for R, including two new appendixes (e.g., comparison of R with Excel)
∙ Updated test bank and updated/expanded Big Data Sets.
∙ Updated Related Readings and Web Sources for students who want to “dive deeper.”
∙ Revised LearningStats demonstrations to illustrate concepts beyond what is possible in a textbook (e.g., simulations).
∙ Improved illustrations, figures, and tables.
Chapter 11—Analysis of Variance
New Analytics in Action (Experiments or Big Data?).
Leaner discussion of two-factor ANOVA.
Updated Related Readings.
New Software Supplement (ANOVA Using R).
Chapter 12—Simple Regression
New Analytics in Action (Predictive Maintenance and Machine Learning).
Revised discussion of confidence and prediction intervals.
New MiniCase 12.4 (exports and imports).
Leaner discussion of ill-conditioned data and spurious correlation.
New MiniCase 12.6 (assets and market capitalization).
Revised data set (U.S. price inflation) and updated Related Readings.
New Software Supplement (Simple Regression Using R).
Chapter 13—Multiple Regression
Simplified introduction and revised treatment of confidence and prediction intervals.
Deleted Mini Case 13.4 (cockpit noise).
New Analytics in Action (People Analytics at Work).
New correlation matrix illustration (vehicle MPG), updated data set (CPI changes) and a new data set (immunotherapy drug prices).
Updated Related Readings.
New Software Supplement (Multiple Regression Using R).
Chapter 14—Time Series Analysis
Updated examples (U.S. labor force, dollar exchange rates).
Updated examples of erratic (hurricanes, snowfall) and consistent patterns (health spending, Amazon revenue).
Leaner trend-fitting presentation, new formula for compound growth rate, and example of decomposition using R.
Two new trend interpretation exercises.
New Analytics in Action (Trend? Or Bubble?).
Updated 16 exercise data sets (e.g., bird strikes, renewable energy, PepsiCo, JetBlue, Coca-Cola revenue, revolving debt, plane shipments, federal budget, Boston Marathon, leisure industry, snowboarding, airspace delays).
Updated Related Readings.
New Software Supplement (Time Trends and Seasonality Using R).
Chapter 15—Chi-Square Tests
Simplified examples of raw data conversion.
New Analytics in Action (Confusion Matrix for Machine Learning).
Simplified treatment of GOF tests (Uniform, Normal, ECDF).
New exercise (age and social media preference).
Updated data sets (Derby, NL runs).
New Software Supplement (Chi-Square Tests Using R).
Chapter 16—Nonparametric Tests
One new exercise (movie reviews) and updated Related Readings.
New Software Supplement (Nonparametric Tests Using R).
Chapter 17—Quality Management
New Analytics in Action (Big Data Tracks a Virus).
Updated discussion of acceptance sampling.
Updated discussion of software (e.g., R CRAN packages).
Updated Related Readings.
Chapter 18—Simulation
New table for random data in R.
Deleted bootstrap discussion.
Updated Related Readings.
Brief Contents
Chapter 1 Overview of Statistics 2
Chapter 2 Data Collection 24
Chapter 3 Describing Data Visually 56
Chapter 4 Descriptive Statistics 100
Chapter 5 Probability 158
Chapter 6 Discrete Probability Distributions 200
Chapter 7 Continuous Probability Distributions 238
Chapter 8 Sampling Distributions and Estimation 278
Chapter 9 One-Sample Hypothesis Tests 322
Chapter 10 Two-Sample Hypothesis Tests 370
Chapter 11 Analysis of Variance 416
Chapter 12 Simple Regression 462
Chapter 13 Multiple Regression 522
Chapter 14 Time-Series Analysis 578
Chapter 15 Chi-Square Tests 624
Chapter 16 Nonparametric Tests 662
Chapter 17 Quality Management 692
Chapter 18 Simulation 18-1
Appendixes
A Binomial Probabilities 734
B Poisson Probabilities 736
C-1 Standard Normal Areas 739
C-2 Cumulative Standard Normal Distribution 740
D Student’s t Critical Values 742
E Chi-Square Critical Values 743
F Critical Values of F.10 744
G Solutions to Odd-Numbered Exercises 752
H Answers to Exam Review Questions 779
I Writing and Presenting Reports 781
J Statistics in Excel and R 785
K Using R and RStudio 789
Index 797
Standard Normal Areas 815
Cumulative Standard Normal Distribution 816
Student’s t Critical Values 818
Contents
14.8 Forecasting: Final Thoughts 611
Chapter Summary 612
Chapter Exercises 614
Chapter 15 Chi-Square Tests 624
15.1 Chi-Square Test for Independence 625
15.2 Chi-Square Tests for Goodness of Fit 636
15.3 Uniform Goodness-of-Fit Test 639
15.4 Poisson Goodness-of-Fit Test 643
15.5 Normal Chi-Square Goodness-of-Fit Test 648
15.6 ECDF Tests (Optional) 651
Chapter Summary 652
Chapter Exercises 653
Chapter 16 Nonparametric Tests 662
16.1 Why Use Nonparametric Tests? 663
16.2 One-Sample Runs Test 664
16.3 Wilcoxon Signed-Rank Test 667
16.4 Wilcoxon Rank Sum Test 670
16.5 Kruskal-Wallis Test for Independent Samples 673
16.6 Friedman Test for Related Samples 678
16.7 Spearman Rank Correlation Test 681
Chapter Summary 684
Chapter Exercises 685
17.7 Other Control Charts 711
17.8 Patterns in Control Charts 716
17.9 Process Capability 718
17.10 Additional Quality Topics (Optional) 721
Chapter Summary 725
Chapter Exercises 726
Chapter 18 Simulation 18-1
Appendixes
A Binomial Probabilities 734
B Poisson Probabilities 736
C-1 Standard Normal Areas 739
C-2 Cumulative Standard Normal Distribution 740
D Student’s t Critical Values 742
E Chi-Square Critical Values 743
F Critical Values of F.10 744
G Solutions to Odd-Numbered Exercises 752
H Answers to Exam Review Questions 779
I Writing and Presenting Reports 781
J Statistics in Excel and R 785
Overview of Statistics
When managers are well informed about a company’s internal operations (e.g., sales, production,
inventory levels, time to market, warranty claims) and competitive position (e.g., market share,
customer satisfaction, repeat sales), they can take appropriate actions to improve their business.
Managers need reliable, timely information so they can analyze market trends and adjust to changing
market conditions. Better data also can help a company decide which types of strategic information
it should share with trusted business partners to improve its supply chain. Statistics and
statistical analysis permit data-based decision making and reduce managers’ need to rely on guesswork.
Statistics is a key component of the field of business intelligence, which encompasses all the
technologies for collecting, storing, accessing, and analyzing data on the company’s operations in
order to make better business decisions. Statistics helps convert unstructured “raw” data (e.g.,
point-of-sale data, customer spending patterns) into useful information through online analytical
processing (OLAP) and data mining, terms that you may have encountered in your other business
classes. Statistical analysis focuses attention on key problems and guides discussion toward issues,
not personalities or territorial struggles. While powerful database software and query systems are
the key to managing a firm’s data warehouse, relatively small Excel spreadsheets are often the focus
of discussion among managers when it comes to “bottom line” decisions. That is why Excel is featured
prominently in this textbook.
In short, companies increasingly are using business analytics to support decision making, to
recognize anomalies that require tactical action, or to gain strategic insight to align business
processes with business objectives. Answers to questions such as “How likely is this event?” or
“What if this trend continues?” will lead to appropriate actions. Businesses that combine managerial
judgment with statistical analysis are more successful.
Plural or Singular?
Statistics: The science of collecting, organizing, analyzing, interpreting, and presenting data.
Statistic: A single measure, reported as a number, used to summarize a sample data set.
Many different measures can be used to summarize data sets. You will learn throughout
this textbook that there can be different measures for different sets of data and different measures
for different types of questions about the same data set. Consider, for example, a sample
data set that consists of heights of students in a university. There could be many different uses
for this data set. Perhaps the manufacturer of graduation gowns wants to know how long to
make the gowns; the best statistic for this would be the average height of the students. But an
architect designing a classroom building would want to know how high the doorways should
be and would base measurements on the maximum height of the students. Both the average
and the maximum are examples of a statistic.
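As a minimal sketch of the gown-maker and architect example, the Python snippet below computes both statistics from one sample. The heights are made-up values in centimeters, and the names heights_cm, average_height, and maximum_height are illustrative assumptions, not data from the text.

```python
from statistics import mean

# Hypothetical sample of student heights in centimeters (illustrative values only).
heights_cm = [158.0, 162.5, 170.0, 175.5, 181.0, 168.0, 193.0]

# Statistic of interest to the gown manufacturer: the average height.
average_height = mean(heights_cm)

# Statistic of interest to the architect: the maximum height.
maximum_height = max(heights_cm)

print(f"Average height: {average_height:.1f} cm")
print(f"Maximum height: {maximum_height:.1f} cm")
```

The same data set yields two different statistics because the two users are asking different questions of it.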
You may not have a trained statistician in your organization, but any college graduate is
expected to know something about statistics, and anyone who creates graphs or interprets data
is “doing statistics” without an official title.
There are two primary kinds of statistics:
∙ Descriptive statistics refers to the collection, organization, presentation, and summary of
data (either using charts and graphs or using a numerical summary).
∙ Inferential statistics refers to generalizing from a sample to a population, estimating
unknown population parameters, drawing conclusions, and making decisions.
Figure 1.1 identifies the tasks and the text chapters for each.
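The distinction between the two kinds of statistics can be sketched in Python. This is an illustration under stated assumptions: the sample values are invented, and the interval uses a large-sample normal approximation chosen here for brevity, not a method prescribed at this point in the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample: monthly rents in dollars for 10 apartments (invented values).
sample = [730, 690, 560, 740, 850, 930, 620, 710, 800, 700]

# Descriptive statistics: summarize the sample itself.
n = len(sample)
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: estimate the unknown population mean from the sample.
# Assumption: a normal approximation; a t-based interval is more usual for small n.
z = NormalDist().inv_cdf(0.975)  # z-value for 95% confidence
margin = z * sample_sd / sqrt(n)
interval = (sample_mean - margin, sample_mean + margin)

print(f"Sample mean: {sample_mean}, sample SD: {sample_sd:.1f}")
print(f"Approximate 95% interval for the population mean: "
      f"({interval[0]:.1f}, {interval[1]:.1f})")
```

The first half only describes the ten observed rents; the second half makes a claim about rents that were not observed, which is what makes it inferential.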
Figure 1.1 Overview of Statistics