
2-5 Measures of Variation 87

number before the last value is determined. See Exercise 41, which provides concrete
numbers illustrating that division by $n - 1$ is better than division by $n$. That
exercise shows that if $s^2$ were defined with division by $n$, it would systematically
underestimate the value of $\sigma^2$, so we compensate by increasing its overall value
by making its denominator smaller (by using $n - 1$ instead of $n$). Exercise 41
shows how division by $n - 1$ causes the sample variance $s^2$ to target the value of
the population variance $\sigma^2$, whereas division by $n$ causes the sample variance
$s^2$ to underestimate the value of the population variance $\sigma^2$.
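The claim can be checked by brute force with the small population {3, 6, 9} used in Exercise 41. The following Python sketch is only an illustration, not the text's own method. It assumes samples of size 2 selected with replacement, and it uses the standard library functions `statistics.variance` (division by n - 1) and `statistics.pvariance` (division by n).

```python
from itertools import product
from statistics import pvariance, variance

population = [3, 6, 9]

# All nine equally likely samples of size 2, selected with replacement.
samples = list(product(population, repeat=2))

# Variance of the population itself (division by the population size).
sigma_sq = pvariance(population)

# Average the nine sample variances under each choice of denominator.
mean_with_n_minus_1 = sum(variance(s) for s in samples) / len(samples)
mean_with_n = sum(pvariance(s) for s in samples) / len(samples)

print("population variance:        ", sigma_sq)
print("mean variance, divide by n-1:", mean_with_n_minus_1)
print("mean variance, divide by n:  ", mean_with_n)
```

Averaged over all nine samples, the version that divides by n - 1 matches the population variance, while the version that divides by n comes out smaller.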
Step 6 in Formula 2-4 for finding a standard deviation is to find a square root.
We take the square root to compensate for the squaring that took place in Step 3.
An important consequence of taking the square root is that the standard deviation
has the same units of measurement as the original values. For example, if customer
waiting times are in minutes, the standard deviation of those times will also
be in minutes. If we were to stop at Step 5, the result would be in units of “square
minutes,” which is an abstract concept having no direct link to reality.
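To make the point about units concrete, here is a minimal Python sketch of the calculation. The three waiting times are hypothetical values chosen for easy arithmetic, and the variable names are illustrative. The comments track the units at each stage.

```python
import math

# Hypothetical customer waiting times, in minutes.
times = [1, 3, 14]

n = len(times)
mean = sum(times) / n                               # minutes

# Squaring each deviation from the mean gives "square minutes".
squared_deviations = [(x - mean) ** 2 for x in times]

# Dividing the total by n - 1 gives the sample variance (square minutes).
variance = sum(squared_deviations) / (n - 1)

# The square root undoes the squaring, so the result is in minutes again.
std_dev = math.sqrt(variance)

print(f"mean = {mean} min, variance = {variance} min^2, s = {std_dev} min")
```

For these values the mean is 6 minutes, the variance is 49 square minutes, and the standard deviation is 7 minutes.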
After studying this section, you should understand that the standard deviation
is a measure of variation among values. Given sample data, you should be able to
compute the value of the standard deviation. You should be able to interpret the
values of standard deviations that you compute. You should know that for typical
data sets, it is unusual for a value to differ from the mean by more than 2 or 3
standard deviations.
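The "2 or 3 standard deviations" guideline can be turned into a simple check. The sketch below is an illustration under stated assumptions: the function name `unusual_values`, the cutoff parameter `k`, and the scores are all hypothetical and do not come from the text.

```python
import statistics

def unusual_values(data, k=2):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (division by n - 1)
    return [x for x in data if abs(x - mean) > k * s]

# Hypothetical test scores with one value far from the rest.
scores = [70, 72, 74, 75, 76, 78, 80, 120]

print(unusual_values(scores))        # cutoff of 2 standard deviations
print(unusual_values(scores, k=3))   # stricter cutoff of 3 standard deviations
```

With the cutoff of 2, only the score of 120 is flagged. With the cutoff of 3, nothing is flagged, partly because the extreme score itself inflates the standard deviation.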

2-5 Basic Skills and Concepts


In Exercises 1–8, find the range, variance, and standard deviation for the given sample
data. (The same data were used in Section 2-4 where we found measures of center. Here
we find measures of variation.)
1. Tobacco Use in Children’s Movies In “Tobacco and Alcohol Use in G-Rated
Children’s Animated Films,” by Goldstein, Sobel, and Newman (Journal of the American
Medical Association, Vol. 281, No. 12), the lengths (in seconds) of scenes showing
tobacco use were recorded for animated movies from Universal Studios. The first six
values included in Data Set 7 from Appendix B are listed below. Do these times
appear to be consistent, or do they vary widely?
0 223 0 176 0 548
2. Harry Potter In an attempt to measure the reading level of a book, the Flesch Reading
Ease ratings are obtained for 12 randomly selected pages from Harry Potter and the
Sorcerer’s Stone by J. K. Rowling. Those values, included in Data Set 14 from Ap-
pendix B, are listed below. Given that these ratings are based on 12 randomly selected
pages, is the standard deviation of this sample likely to be a reasonable estimate of the
standard deviation of the reading levels for all pages in the whole book?
85.3 84.3 79.5 82.5 80.2 84.6
79.2 70.9 78.6 86.2 74.0 83.7
3. Cereal A dietitian obtains the amounts of sugar (in grams) from 1 gram in each of 16
different cereals, including Cheerios, Corn Flakes, Fruit Loops, Trix, and 12 others.
Those values, included in Data Set 16 from Appendix B, are listed below. Is the stan-
dard deviation of these values likely to be a good estimate of the standard deviation of

An Addison-Wesley product. Copyright © 2004, Pearson Education, Inc.


88 CHAPTER 2 Describing, Exploring, and Comparing Data

the amounts of sugar in each gram of cereal consumed by the population of all Amer-
icans who eat cereal? Why or why not?
0.03 0.24 0.30 0.47 0.43 0.07 0.47 0.13
0.44 0.39 0.48 0.17 0.13 0.09 0.45 0.43
4. Body Mass Index As part of the National Health Examination, the body mass index
is measured for a random sample of women. Some of the values included in Data Set
1 from Appendix B are listed below. Is the standard deviation of this sample reason-
ably close to the standard deviation of 6.17, which is the standard deviation for all 40
women included in Data Set 1?
19.6 23.8 19.6 29.1 25.2 21.4 22.0 27.5
33.5 20.6 29.9 17.7 24.0 28.9 37.7
5. Drunk Driving The blood alcohol concentrations of a sample of drivers involved in
fatal crashes and then convicted with jail sentences are given below (based on data
from the U.S. Department of Justice). When a state wages a campaign to “reduce
drunk driving,” is the campaign intended to lower the standard deviation?
0.27 0.17 0.17 0.16 0.13 0.24 0.29 0.24
0.14 0.16 0.12 0.16 0.21 0.17 0.18
6. Motorcycle Fatalities Listed below are ages of motorcyclists when they were fatally
injured in traffic crashes (based on data from the U.S. Department of Transportation).
How does the variation of these ages compare to the variation of ages of licensed
drivers in the general population?
17 38 27 14 18 34 16 42 28
24 40 20 23 31 37 21 30 25
7. Reaction Times The author visited the Reuben H. Fleet Science Museum in San
Diego and repeated an experiment of reaction times. The following times (in hun-
dredths of a second) were obtained. How do the measures of variation reflect the fact
that these times appear to be very consistent?
19 20 17 21 21 21 19 18 19 19
17 17 15 17 18 17 18 18 18 17
8. Bufferin Tablets Listed below are the measured weights (in milligrams) of a sample
of Bufferin aspirin tablets. Given that this medication should be manufactured in a
consistent way so that dosage amounts can be controlled, do the measures of variation
seem to indicate that the variation is at an acceptable level?
672.2 679.2 669.8 672.6 672.2 662.2
662.7 661.3 654.2 667.4 667.0 670.7
In Exercises 9–12, find the range, variance, and standard deviation for each of the two
samples, then compare the two sets of results. (The same data were used in Section 2-4.)
9. Customer Waiting Times Waiting times of customers at the Jefferson Valley Bank
(where all customers enter a single waiting line) and the Bank of Providence (where
customers wait in individual lines at three different teller windows):
Jefferson Valley: 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Providence: 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0


10. Regular/Diet Coke Weights (pounds) of samples of the contents in cans of regular
Coke and diet Coke:
Regular: 0.8192 0.8150 0.8163 0.8211 0.8181 0.8247
Diet: 0.7773 0.7758 0.7896 0.7868 0.7844 0.7861
11. Mickey D vs. Jack When investigating times required for drive-through service, the
following results (in seconds) are obtained (based on data from QSR Drive-Thru
Time Study).
McDonald’s: 287 128 92 267 176 240 192 118 153 254 193 136
Jack in the Box: 190 229 74 377 300 481 428 255 328 270 109 109
12. Skull Breadths Maximum breadth of samples of male Egyptian skulls from 4000 B.C.
and 150 A.D. (based on data from Ancient Races of the Thebaid by Thomson and
Randall-Maciver):
4000 B.C.: 131 119 138 125 129 126 131 132 126 128 128 131
150 A.D.: 136 130 126 126 139 141 137 138 133 131 134 129
In Exercises 13–16, refer to the data sets in Appendix B. Use computer software or a
calculator to find the standard deviations, then compare the results.
T 13. Head Circumferences In order to correctly diagnose the disorder of hydrocephalus, a
pediatrician investigates head circumferences of 2-year-old boys and girls. Use the
sample results listed in Data Set 3. Does there appear to be a difference between the
two genders?
T 14. Clancy, Rowling, Tolstoy A child psychologist investigates differences in reading
difficulty and obtains data from The Bear and the Dragon by Tom Clancy, Harry Potter
and the Sorcerer’s Stone by J. K. Rowling, and War and Peace by Leo Tolstoy. Refer
to Data Set 14 in Appendix B and use the Flesch-Kincaid Grade Level ratings for
12 pages randomly selected from each of the three books.
T 15. Weekend Rainfall In Data Set 11 in Appendix B, use the rainfall amounts in Boston
on Thursday and the rainfall amounts in Boston on Sunday.
T 16. Tobacco/Alcohol Use in Children’s Movies In “Tobacco and Alcohol Use in G-Rated
Children’s Animated Films,” by Goldstein, Sobel, and Newman (Journal of the
American Medical Association, Vol. 281, No. 12), the lengths (in seconds) of scenes
showing tobacco use and alcohol use were recorded for animated children’s movies.
In Data Set 7 in Appendix B, use the tobacco times, then the alcohol times.
In Exercises 17–20, find the standard deviation of the data summarized in the given
frequency distribution. (The same frequency distributions were used in Section 2-4.)
17. Old Faithful Visitors to Yellowstone National Park consider an eruption of the Old
Faithful geyser to be a major attraction that should not be missed. The given
frequency distribution summarizes a sample of times (in minutes) between eruptions.

Table for Exercise 17
Time       Frequency
40–49        8
50–59       44
60–69       23
70–79        6
80–89      107
90–99       11
100–109      1

18. Loaded Die The author drilled a hole in a die and filled it with a lead weight, then
proceeded to roll it 200 times. The results are given in the frequency distribution
below.

Table for Exercise 18
Outcome    Frequency
1           27
2           31
3           42
4           40
5           28
6           32

19. Speeding Tickets The given frequency distribution describes the speeds of drivers
ticketed by the Town of Poughkeepsie police. These drivers were traveling through a
30 mi/h speed zone on Creek Road, which passes the author’s college.

Table for Exercise 19
Speed      Frequency
42–45       25
46–49       14
50–53        7
54–57        3
58–61        1


20. Body Temperatures The accompanying frequency distribution summarizes a sample
of human body temperatures. (See the temperatures for midnight on the second day,
as listed in Data Set 4 in Appendix B.)

Table for Exercise 20
Temperature  Frequency
96.5–96.8      1
96.9–97.2      8
97.3–97.6     14
97.7–98.0     22
98.1–98.4     19
98.5–98.8     32
98.9–99.2      6
99.3–99.6      4

21. Teacher Ages Use the range rule of thumb to estimate the standard deviation of ages
of all teachers at your college.
22. Test Scores Use the range rule of thumb to estimate the standard deviation of the
scores on the first statistics test in your class.
23. Leg Lengths For the sample data in Data Set 1 from Appendix B, the sample of 40
women have upper leg lengths with a mean of 38.86 cm and a standard deviation of
3.78 cm. Use the range rule of thumb to estimate the minimum and maximum “usual”
upper leg lengths for women. Is a length of 47.0 cm considered unusual in this
context?
24. Heights of Women Heights of women have a mean of 63.6 in. and a standard devia-
tion of 2.5 in. (based on data from the National Health Survey). Use the range rule of
thumb to estimate the minimum and maximum “usual” heights of women. In this con-
text, is it unusual for a woman to be 6 ft tall?
25. Heights of Women Heights of women have a bell-shaped distribution with a mean of
63.6 in. and a standard deviation of 2.5 in. Using the empirical rule, what is the ap-
proximate percentage of women between
a. 61.1 in. and 66.1 in.?
b. 56.1 in. and 71.1 in.?
26. Weights of Regular Coke Using the weights of regular Coke listed in Data Set 17
from Appendix B, we find that the mean is 0.81682 lb, the standard deviation is
0.00751 lb, and the distribution is approximately bell-shaped. Using the empirical
rule, what is the approximate percentage of cans of regular Coke with weights
between
a. 0.80931 lb and 0.82433 lb?
b. 0.80180 lb and 0.83184 lb?
27. Heights of Women If heights of women have a mean of 63.6 in. and a standard devi-
ation of 2.5 in., what can you conclude from Chebyshev’s theorem about the percent-
age of women between 58.6 in. and 68.6 in.?

28. Weights of Regular Coke Using the weights of regular Coke listed in Data Set 17
from Appendix B, we find that the mean is 0.81682 lb and the standard deviation is
0.00751 lb. What can you conclude from Chebyshev’s theorem about the percentage
of cans of regular Coke with weights between 0.79429 lb and 0.83935 lb?
T 29. Coefficient of Variation for Cereal Refer to Data Set 16 in Appendix B. Find the co-
efficient of variation for the calories and find the coefficient of variation for the grams
of sugar per gram of cereal. Compare the results.
T 30. Coefficient of Variation for Coke and Pepsi Refer to Data Set 17 in Appendix B. Find
the coefficient of variation for the weights of regular Coke, then find the coefficient of
variation for the weights of regular Pepsi. Compare the results. Does either company
appear to have weights that are significantly more consistent?
31. Equality for All What do you know about the values in a data set having a standard
deviation of $s = 0$?


32. Understanding Units of Measurement If a data set consists of the fines for speeding
(in dollars), what are the units used for standard deviation? What are the units used
for variance?
33. Comparing Car Batteries The Everlast and Endurance brands of car battery are both
labeled as lasting 48 months. In reality, they both have a mean life of 50 months, but
the Everlast batteries have a standard deviation of 2 months, while the Endurance bat-
teries have a standard deviation of 6 months. Which brand is the better choice? Why?
34. Interpreting Outliers A data set consists of 20 values that are fairly close together.
Another value is included, but this new value is an outlier (very far away from the
other values). How is the standard deviation affected by the outlier? No effect? A
small effect? A large effect?

2-5 Beyond the Basics


35. Comparing Data Sets Two different sections of a statistics class take the same quiz
and the scores are recorded below. Find the range and standard deviation for each sec-
tion. What do the range values lead you to conclude about the variation in the two
sections? Why is the range misleading in this case? What do the standard deviation
values lead you to conclude about the variation in the two sections?
Section 1: 1 20 20 20 20 20 20 20 20 20 20
Section 2: 2 3 4 5 6 14 15 16 17 18 19
36. Transforming Data In each of the following, describe how the range and standard de-
viation of a data set are affected.
a. The same constant k is added to each value of the data set.
b. Each value of the data set is multiplied by the same constant k.
c. For the body temperature data listed in Data Set 4 of Appendix B (12 A.M. on day
2), $\bar{x} = 98.20$°F and $s = 0.62$°F. Find the values of $\bar{x}$ and $s$ after each
temperature has been converted to the Celsius scale. [Hint: $C = 5(F - 32)/9$.]
37. Genichi Taguchi developed a method of improving quality and reducing manufactur-
ing costs through a combination of engineering and statistics. A key tool in the
Taguchi method is the signal-to-noise ratio. The simplest way to calculate this ratio
is to divide the mean by the standard deviation. Find the signal-to-noise ratio for the
cotinine levels of smokers listed in Table 2-1.
38. Skewness In Section 2-4 we introduced the general concept of skewness. Skewness
can be measured by Pearson’s index of skewness:
$$I = \frac{3(\bar{x} - \text{median})}{s}$$
If $I \geq 1.00$ or $I \leq -1.00$, the data can be considered to be significantly skewed. Find
Pearson’s index of skewness for the cotinine levels of smokers listed in Table 2-1, and
then determine whether there is significant skewness.
39. Understanding Standard Deviation A sample consists of 10 test scores that fall be-
tween 70 and 100 inclusive. What is the largest possible standard deviation?


40. Phony Data? For any data set of $n$ values with standard deviation $s$, every value must
be within $s\sqrt{n - 1}$ of the mean. A statistics teacher reports that the test scores in her
class of 17 students had a mean of 75.0 and a standard deviation of 5.0. Kelly, the
class’s self-proclaimed best student, claims that she received a grade of 97. Could
Kelly be telling the truth?
41. Why Divide by $n - 1$? Let a population consist of the values 3, 6, 9. Assume that
samples of two values are randomly selected with replacement.
a. Find the variance $\sigma^2$ of the population {3, 6, 9}.
b. List the nine different possible samples of two values selected with replacement,
then find the sample variance $s^2$ (which includes division by $n - 1$) for each of
them. If you repeatedly select two sample values, what is the mean value of the
sample variance $s^2$?
c. For each of the nine samples, find the variance by treating each sample as if it is a
population. (Be sure to use the formula for population variance, which includes
division by $n$.) If you repeatedly select 2 sample values, what is the mean value of
the population variances?
d. Which approach results in values that are better estimates of $\sigma^2$: Part (b) or part
(c)? Why? When computing variances of samples, should you use division by $n$ or
$n - 1$?
e. The preceding parts show that $s^2$ is an unbiased estimator of $\sigma^2$. Is $s$ an unbiased
estimator of $\sigma$?
42. Why Not Go MAD? Exercise 41 shows that the sample variance $s^2$ is an unbiased
estimator of $\sigma^2$. Do the following with the same population of {3, 6, 9} to show that the
mean absolute deviation of a sample is a biased estimator of the mean absolute devia-
tion of a population.
a. Find the mean absolute deviation of the population {3, 6, 9}.
b. List the nine different possible samples of two values selected with replacement,
then find the mean absolute deviation for each of them. If you repeatedly select
two sample values, what is the mean value of the mean absolute deviations?
c. Based on the results of parts (a) and (b), does the mean absolute deviation of a
sample tend to target the mean absolute deviation of the population? Does division
by $n - 1$ instead of division by $n$ make the mean absolute deviation an unbiased
estimate of the mean absolute deviation for the population?
