ECOE 1302 Spring 2017 2slide
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter 3
2/13/2017
MEASURES OF CENTER
Section 3.1
Objectives
1. Compute the mean of a data set
2. Compute the median of a data set
3. Compare the properties of the mean and median
4. Find the mode of a data set
5. Approximate the mean with grouped data
OBJECTIVE 1
Compute the mean of a data set
Population Mean: $\mu$
Sample Mean: $\bar{x}$
If $x_1, x_2, \ldots, x_n$ is a sample, then the sample mean is given by $\bar{x} = \dfrac{\sum x_i}{n}$.
If $x_1, x_2, \ldots, x_N$ is a population, then the population mean is given by $\mu = \dfrac{\sum x_i}{N}$.
Example - Mean
During a semester, a student took five exams. The population of exam
scores is 78, 83, 92, 68, and 85. Find the mean.
Solution:
The mean is given by $\mu = \dfrac{\sum x_i}{N} = \dfrac{78 + 83 + 92 + 68 + 85}{5} = \dfrac{406}{5} = 81.2$.
Note that the mean is rounded to one more decimal place than the
original data. This is generally considered good practice.
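The calculation above can be checked with a few lines of Python. This is only a minimal sketch; the variable names `scores` and `mu` are illustrative and do not come from the slides.

```python
# Exam scores from the example, treated as a population.
scores = [78, 83, 92, 68, 85]

# Population mean: sum of the values divided by the number of values N.
mu = sum(scores) / len(scores)

print(round(mu, 1))  # 81.2
```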
OBJECTIVE 2
Compute the median of a data set
Median
The median is another measure of center. The median is a number
that splits the data set in half, so that half the data values are less than
the median and half of the data values are greater than the median.
If n is odd, the median is the middle value of the ordered data. If n is even, the median is the average of the two middle values.
Examples - Median
Example:
During a semester, a student took five exams. The population of exam scores
is 78, 83, 92, 68, and 85. Find the median of the exam scores.
Solution:
Arrange the data values in increasing order: 68 78 83 85 92
The median is the middle number, 83.
Example:
Eight patients undergo a new surgical procedure and the number of days spent
in recovery for each is as follows. Find the median number of days in recovery.
20 15 12 27 13 19 13 21
Solution:
Arrange the data values in increasing order: 12 13 13 15 19 20 21 27.
The median is the average of the two middle numbers: $\dfrac{15 + 19}{2} = 17$.
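Both median examples follow the same recipe: sort the data, then take the middle value (odd n) or the average of the two middle values (even n). The sketch below uses a hypothetical helper named `median_of`; the name is an assumption for illustration.

```python
def median_of(values):
    """Return the median: middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([78, 83, 92, 68, 85]))              # 83
print(median_of([20, 15, 12, 27, 13, 19, 13, 21]))  # 17.0
```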
Mean
Step 2: Press STAT and highlight the CALC menu.
OBJECTIVE 3
Compare the properties of the mean and median
Resistant
A statistic is resistant if its value is not affected much by extreme values (large
or small) in the data set. The median is resistant, but the mean is not.
Example:
Five families have annual incomes of $25,000, $31,000, $34,000, $44,000 and
$56,000. One family, whose income is $25,000, wins a million dollar lottery, so
their income increases to $1,025,000.
Before the lottery win, the mean and median are:
Mean = $38,000 Median = $34,000
After the lottery win, the mean and median are:
Mean = $238,000 Median = $44,000
The extreme value of $1,025,000 influences the mean quite a lot, increasing it
from $38,000 to $238,000. In comparison, the median has been influenced
much less, increasing from $34,000 to $44,000. That is, the median is resistant.
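The income example can be reproduced with Python's standard `statistics` module. This is a small sketch; the list names `before` and `after` are illustrative.

```python
from statistics import mean, median

# Annual incomes of the five families before and after the lottery win.
before = [25_000, 31_000, 34_000, 44_000, 56_000]
after = [1_025_000, 31_000, 34_000, 44_000, 56_000]

# The mean jumps by $200,000; the median moves by only $10,000.
print("before:", mean(before), median(before))
print("after: ", mean(after), median(after))
```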
OBJECTIVE 4
Find the mode of a data set
Mode
Another value that is sometimes classified as a measure of center is the mode.
• The mode of a data set is the value that appears most frequently.
• If two or more values are tied for the most frequent, they are all considered
to be modes.
• If the values all have the same frequency, we say that the data set has no
mode.
Example:
Ten students were asked how many siblings they had. The results, arranged in
order, were 0 1 1 1 1 2 2 3 3 6. Find the mode of this data set.
Solution:
The value that appears most frequently is 1. Therefore, the mode is 1.
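The three rules for the mode (most frequent value, ties all count, equal frequencies mean no mode) can be sketched as follows. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for this illustration.

```python
from collections import Counter

def modes(values):
    """Return all modes in sorted order, or [] when every value has the same frequency."""
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

siblings = [0, 1, 1, 1, 1, 2, 2, 3, 3, 6]
print(modes(siblings))  # [1]
```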
Measure of Center?
The mode is sometimes classified as a measure of center. However, this isn’t
really accurate. The mode can be the largest value in a data set, or the
smallest, or anywhere in between.
Example:
Following is a list of the makes of all the cars rented by an automobile rental
company on a particular day. Which make of car is the mode?
OBJECTIVE 5
Approximate the mean using grouped data
Step 1: Compute the midpoint of each class. The midpoint of a class is found
by taking the average of the lower class limit and the lower limit of the
next larger class.
Step 2: For each class, multiply the class midpoint by the class frequency.
Step 3: Add the products (Midpoint) × (Frequency) over all classes.
Step 4: Divide the sum obtained in Step 3 by the sum of the frequencies.
Example
The following table presents the number of text messages sent via cell phone
by a sample of 50 high school students. Approximate the mean number of
messages sent.
Solution
Step 1: Compute the midpoint of each class.
Solution
Step 2: For each class, multiply the class midpoint by the class frequency.
Solution
Step 3: Add the products (Midpoint) × (Frequency) over all classes.
Frequency   (Midpoint) × (Frequency)
10          250
5           375
13          1625
11          1925
7           1575
4           1100
Σ(Midpoint × Frequency) = 250 + 375 + 1625 + 1925 + 1575 + 1100 = 6850
Solution
Step 4: Divide the sum obtained in Step 3 by the sum of the frequencies.
Frequency   (Midpoint) × (Frequency)
10          250
5           375
13          1625
11          1925
7           1575
4           1100
50          6850    (Sums)
Approximate Mean = $\dfrac{\sum(\text{Midpoint} \times \text{Frequency})}{\sum \text{Frequency}} = \dfrac{6850}{50} = 137$
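The grouped-mean steps can be sketched in Python. The class midpoints (25, 75, ..., 275) are taken from the calculator example later in this section; the variable names are illustrative.

```python
# Class midpoints and frequencies for the text-message example.
midpoints = [25, 75, 125, 175, 225, 275]
frequencies = [10, 5, 13, 11, 7, 4]

# Steps 2 and 3: multiply each midpoint by its frequency and add the products.
total = sum(m * f for m, f in zip(midpoints, frequencies))

# Step 4: divide by the sum of the frequencies.
n = sum(frequencies)
approx_mean = total / n

print(total, n, approx_mean)  # 6850 50 137.0
```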
MEASURES OF SPREAD
Section 3.2
Objectives
1. Compute the range of a data set
2. Compute the variance of a population and a sample
3. Compute the standard deviation of a population and a sample
4. Approximate the standard deviation with grouped data
5. Use the Empirical Rule to summarize data that are unimodal and
approximately symmetric
6. Use Chebyshev’s Inequality to describe a data set
7. Compute the coefficient of variation
OBJECTIVE 1
Compute the range of a data set
The Range
The range of a data set is the difference between the largest value and
the smallest value.
San Francisco 51 54 55 56 58 60 60 61 63 62 58 52
OBJECTIVE 2
Compute the variance of a population and a
sample
Variance
When a data set has a small amount of spread, like the San Francisco
temperatures, most of the values will be close to the mean. When a
data set has a larger amount of spread, more of the data values will be
far from the mean.
The variance is a measure of how far the values in a data set are from
the mean, on the average.
Population Variance
$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
San Francisco 51 54 55 56 58 60 60 61 63 62 58 52
Solution:
Step 1: Compute the population mean $\mu$.
$\mu = \dfrac{\sum x_i}{N} = \dfrac{51+54+55+56+58+60+60+61+63+62+58+52}{12} = 57.5$
Step 2: For each population value $x_i$, compute $x_i - \mu$. These values are
shown in the second row below. The squared deviations are shown in the third row.
$x_i$              51     54     55     56     58     60     60     61     63     62     58     52
$x_i - \mu$        –6.5   –3.5   –2.5   –1.5   0.5    2.5    2.5    3.5    5.5    4.5    0.5    –5.5
$(x_i - \mu)^2$    42.25  12.25  6.25   2.25   0.25   6.25   6.25   12.25  30.25  20.25  0.25   30.25
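The deviations and squared deviations above lead directly to the population variance. The sketch below repeats the steps in Python; the variable names are illustrative, and the result agrees with the value 14.083 used in the standard deviation example later in this section.

```python
# San Francisco temperatures, treated as a population.
temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]

N = len(temps)
mu = sum(temps) / N                           # population mean
squared_devs = [(x - mu) ** 2 for x in temps]  # squared deviations from the mean
sigma2 = sum(squared_devs) / N                # population variance: divide by N

print(mu, round(sigma2, 3))  # 57.5 14.083
```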
Sample Variance
When the data values come from a sample rather than a population, the
variance is called the sample variance. The procedure for computing
the sample variance is a bit different from the one used to compute a
population variance. In the formula, the mean $\mu$ is replaced by the
sample mean $\bar{x}$ and the denominator is $n - 1$ instead of $N$. The sample
variance is denoted by $s^2$.
Sample Variance
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
Why Divide by 𝑛 − 1 ?
When computing the sample variance, we use the sample mean to
compute the deviations. For the population variance we use the
population mean for the deviations.
It turns out that the deviations using the sample mean tend to be a bit
smaller than the deviations using the population mean. If we were to
divide by 𝑛 when computing a sample variance, the value would tend to
be a bit smaller than the population variance.
Solution:
The sample mean is $\bar{x} = \dfrac{3+4+6+5+4+2}{6} = 4$.
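A short sketch of the sample variance for these six values, assuming they are the battery lifetimes referred to in the next objective. It also shows the smaller value obtained by dividing by n instead of n − 1. Variable names are illustrative.

```python
# Six sample values from the example (assumed to be battery lifetimes in hours).
lifetimes = [3, 4, 6, 5, 4, 2]

n = len(lifetimes)
x_bar = sum(lifetimes) / n                          # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in lifetimes)   # sum of squared deviations

s2 = sum_sq / (n - 1)      # sample variance: divide by n - 1
divided_by_n = sum_sq / n  # dividing by n gives a smaller value

print(x_bar, s2, round(divided_by_n, 3))  # 4.0 2.0 1.667
```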
OBJECTIVE 3
Compute the standard deviation of a population
and a sample
Standard Deviation
Because the variance is computed using squared deviations, the units
of the variance are the squared units of the data. For example, in the
Battery Lifetime example, the units of the data are hours, and the units
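of variance are squared hours. In most situations, it is better to use a
measure of spread that has the same units as the data. The standard deviation, which is the square root of the variance, is such a measure.
$s = \sqrt{s^2} \qquad \sigma = \sqrt{\sigma^2}$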
Solution:
The population standard deviation is $\sigma = \sqrt{\sigma^2} = \sqrt{14.083} = 3.753$.
Example:
The variance of the lifetimes for a sample of six batteries is $s^2 = 2$. Find the
sample standard deviation.
Solution:
The sample standard deviation is $s = \sqrt{s^2} = \sqrt{2} = 1.414$.
OBJECTIVE 4
Approximate the standard deviation using
grouped data
Step 1: Compute the midpoint of each class and approximate the mean of the frequency
distribution.
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint – Mean).
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint – Mean)²,
and multiply by the frequency to obtain (Midpoint – Mean)² × (Frequency).
Step 4: Add the products (Midpoint – Mean)² × (Frequency) over all classes.
Step 5: To compute the population variance, divide the sum obtained in Step 4 by $n$. To
compute the sample variance, divide the sum obtained in Step 4 by $n - 1$.
Step 6: Take the square root of the variance obtained in Step 5. The result is the standard
deviation.
Example
The following table presents the number of text messages sent via cell
phone by a sample of 50 high school students. Approximate the sample
standard deviation of the number of messages sent.
Solution
Step 1: Compute the midpoint of each class. Recall from the last
section that the sample mean was computed as 137.
Solution
Step 2: For each class, subtract the mean from the class midpoint to
obtain (Midpoint – Mean).
Solution
Step 3: For each class, square the differences obtained in Step 2 to
obtain (Midpoint – Mean)², and multiply by the frequency to
obtain (Midpoint – Mean)² × (Frequency).
Solution
Step 4: Add the products (Midpoint – Mean)² × (Frequency) over all
classes.
(Midpoint – Mean)² × (Frequency)
125,440
19,220
1,872
15,884
54,208
76,176
∑ (Midpoint – Mean)² × (Frequency) = 125,440 + 19,220 + 1,872 + 15,884 + 54,208 + 76,176 = 292,800
Solution
Step 5: Since we are computing the sample variance, we divide the
sum obtained in Step 4 by $n - 1$.
$s^2 = \dfrac{292{,}800}{50 - 1} = 5975.51020$
Step 6: Take the square root of the variance to obtain the standard
deviation.
$s = \sqrt{s^2} = \sqrt{5975.51020} = 77.30142$
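Steps 1 through 6 can be sketched end to end in Python. The midpoints and frequencies are those of the text-message example; the variable names are illustrative.

```python
import math

midpoints = [25, 75, 125, 175, 225, 275]
frequencies = [10, 5, 13, 11, 7, 4]

n = sum(frequencies)
# Step 1: approximate mean of the frequency distribution.
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
# Steps 2-4: sum of (Midpoint - Mean)^2 x (Frequency) over all classes.
sum_sq = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))
# Step 5: sample variance, so divide by n - 1.
s2 = sum_sq / (n - 1)
# Step 6: the standard deviation is the square root of the variance.
s = math.sqrt(s2)

print(round(s2, 5), round(s, 5))  # approximately 5975.5102 and 77.30142
```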
Enter the midpoint for each class into L1 and the corresponding frequencies in
L2. Next, select the 1-Var Stats command and enter L1 in the List field and
L2 in the FreqList field, if using Stats Wizards. If you are not using Stats
Wizards, you may run the 1-Var Stats command followed by L1, comma, L2.
Example
Class Midpoint   Frequency
25               10
75               5
125              13
175              11
225              7
275              4
The output for the last example on the TI-84 PLUS Calculator is presented below.
The value of s represents the approximate sample standard deviation. In this example,
s = 77.30142. Therefore, the approximate standard deviation is 77.30142.
OBJECTIVE 5
Use the Empirical Rule to summarize data that
are unimodal and approximately symmetric
Bell-Shaped Histogram
Many histograms have a single mode near the center of the data, and
are approximately symmetric. Such histograms are often referred to as
bell-shaped.
• Approximately 68% of the data will be within one standard deviation of the mean.
• Approximately 95% of the data will be within two standard deviations of the mean.
• All, or almost all, of the data will be within three standard deviations of the mean.
14.1 14.3 14.4 17.8 12.0 14.9 12.6 13.7 12.8 13.8 13.7 12.4 13.8
14.1 13.3 14.3 16.0 8.1 11.5 14.1 10.2 12.4 13.4 15.6 12.8 13.9
12.3 14.1 15.3 13.0 13.6 10.5 12.4 13.5 13.9 10.7 11.5 14.3 12.7
13.1 12.2 12.4 15.0 12.6 13.6 13.7 15.5 14.6 9.0 12.2 14.0
Solution:
We first note that the histogram is approximately bell-shaped, and we may use the TI-84 PLUS
calculator, or other technology, to compute the population mean and standard deviation.
Mean: $\mu = 13.249$
Standard Deviation: $\sigma = 1.6827$
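With the mean and standard deviation in hand, the Empirical Rule intervals are simply the mean plus or minus one, two, and three standard deviations. The sketch below computes those intervals; the function name `empirical_rule_intervals` is hypothetical.

```python
def empirical_rule_intervals(mu, sigma):
    """Return {k: (mu - k*sigma, mu + k*sigma)} for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

intervals = empirical_rule_intervals(13.249, 1.6827)
labels = {1: "approximately 68%", 2: "approximately 95%", 3: "all, or almost all"}
for k, (low, high) in intervals.items():
    print(f"{labels[k]} of the data: between {low:.4f} and {high:.4f}")
```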
OBJECTIVE 6
Use Chebyshev’s Inequality to describe a data
set
Chebyshev’s Inequality
In any data set, the proportion of the data that is within $K$ standard deviations
of the mean is at least $1 - 1/K^2$. Specifically, by setting $K = 2$ or $K = 3$, we
obtain the following results.
• At least 3/4, or 75%, of the data are within two standard deviations of the
mean.
• At least 8/9, or 89%, of the data are within three standard deviations of
the mean.
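The bound $1 - 1/K^2$ is easy to evaluate for any K. In this sketch the function name `chebyshev_lower_bound` is hypothetical, and rejecting K ≤ 1 is a design choice made here because the bound is zero or negative in that range and so says nothing useful.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for K > 1")
    return 1 - 1 / k**2

print(chebyshev_lower_bound(2))            # 0.75
print(round(chebyshev_lower_bound(3), 3))  # 0.889
```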
Solution:
We compute the following:
We conclude:
• At least 3/4 (75%) of the people had systolic blood pressures between 100 and 140.
• At least 8/9 (89%) of the people had systolic blood pressures between 90 and 150.
OBJECTIVE 7
Compute the coefficient of variation
Coefficient of Variation
The coefficient of variation (CV for short) tells how large the standard
deviation is relative to the mean. It can be used to compare the spreads
of data sets whose values have different units.
$\text{CV} = \dfrac{\sigma}{\mu}$
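As a small illustration, the sketch below computes the CV for the San Francisco temperatures from Section 3.2, using the mean 57.5 and standard deviation 3.753 found there. The function name is hypothetical, and reusing that data set here is a choice made for this example only.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation expressed relative to the mean."""
    return sigma / mu

# San Francisco temperatures: mu = 57.5, sigma = 3.753 (from the earlier example).
cv = coefficient_of_variation(3.753, 57.5)
print(round(cv, 3))  # 0.065
```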
The CV for precipitation is larger than the CV for temperature. Therefore, precipitation
has a greater spread relative to its mean.
MEASURES OF POSITION
Section 3.3
Objectives
1. Compute and interpret 𝑧-scores
2. Compute the quartiles of a data set
3. Compute the percentiles of a data set
4. Compute the five-number summary for a data set
5. Understand the effects of outliers
6. Construct boxplots to visualize the five-number summary and
outliers
OBJECTIVE 1
Compute and interpret 𝑧-scores
𝑍-Score
Who is taller, a man 73 inches tall or a woman 68
inches tall? The obvious answer is that the man is
taller. However, men are taller than women on the
average.
𝑍-Score
The 𝑧-score of an individual data value tells how many standard
deviations that value is from its population mean.
For example, a value one standard deviation above the mean has a 𝑧-
score of 𝑧 = 1 and a value two standard deviations below the mean has
a 𝑧-score of 𝑧 = –2.
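Z-Score
Let $x$ be a value from a population with mean $\mu$ and standard deviation $\sigma$. The $z$-score for $x$ is $z = \dfrac{x - \mu}{\sigma}$.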
Example: Z-Score
A National Center for Health Statistics study states that
the mean height for adult men in the U.S. is 𝜇 = 69.4
inches, with a standard deviation of 𝜎 = 3.1 inches. The
mean height for adult women is 𝜇 = 63.8 inches, with a
standard deviation of 𝜎 = 2.8 inches. Who is taller relative
to their gender, a man 73 inches tall, or a woman 68
inches tall?
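We compute:
$z_{\text{Man's Height}} = \dfrac{x - \mu}{\sigma} = \dfrac{73 - 69.4}{3.1} = 1.16$
$z_{\text{Woman's Height}} = \dfrac{x - \mu}{\sigma} = \dfrac{68 - 63.8}{2.8} = 1.50$
The woman has the larger z-score, so she is taller relative to the population of women's heights.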
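The two z-scores can be reproduced with a one-line function. This is a minimal sketch; the name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

z_man = z_score(73, 69.4, 3.1)
z_woman = z_score(68, 63.8, 2.8)

print(round(z_man, 2), round(z_woman, 2))  # 1.16 1.5
```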
OBJECTIVE 2
Compute the quartiles of a data set
Quartiles
In a previous section, we learned how to compute the mean and
median of a data set as measures of the center. Sometimes, it is useful
to compute measures of position other than the center to get a more
detailed description of the distribution. Quartiles divide a data set into
four approximately equal pieces.
Quartiles
Every data set has three quartiles:
• The first quartile, denoted 𝑄1 separates the lowest 25% of the data from
the highest 75%.
• The second quartile, denoted 𝑄2 separates the lowest 50% of the data
from the highest 50%. 𝑄2 is the same as the median.
• The third quartile, denoted 𝑄3 separates the lowest 75% of the data from
the highest 25%.
Computing Quartiles
There are several methods for computing quartiles, all of which give similar
results. The following procedure is one fairly straightforward method:
OBJECTIVE 3
Compute the percentiles of a data set
Percentiles
Quartiles describe the shape of a distribution by dividing it into fourths.
Sometimes it is useful to divide a data set into a greater number of
pieces to get a more detailed description of the distribution. Percentiles
divide a data set into hundredths.
Percentiles
For a number p between 1 and 99, the pth percentile separates the lowest p%
of the data from the highest (100 – p)%.
Computing Percentiles
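The following procedure computes the pth percentile of a data set:
Step 1: Arrange the data in increasing order.
Step 2: Let $n$ be the number of values in the data set. For the pth percentile,
compute $L = \dfrac{p}{100} \cdot n$.
Step 3: If $L$ is a whole number, the pth percentile is the average of the number
in position $L$ and the number in position $L + 1$. If $L$ is not a whole number, round it up to the next whole number; the pth percentile is the number in that position.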
0.00 0.08 0.13 0.14 0.16 0.17 0.20 0.29 0.56 0.67 0.70 0.92 1.22 1.30 1.48
1.64 1.72 1.90 2.37 2.58 2.84 3.06 3.12 3.21 3.29 3.54 3.57 3.71 4.13 4.27
4.37 4.64 4.89 4.94 5.54 6.10 6.61 7.89 7.96 8.03 8.87 8.91 11.02 12.75 13.68
Solution:
The data are already in increasing order. There are $n = 45$ values. For the 60th percentile,
we compute $L = \dfrac{60}{100} \cdot 45 = 27$. Since 27 is a whole number, the 60th percentile is the
average of the numbers in the 27th and 28th positions. We see that the 60th percentile is
60th Percentile = $\dfrac{3.57 + 3.71}{2} = 3.64$
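The percentile procedure can be sketched in Python. The whole-number case follows the steps given above. The handling of a non-whole L (round up and take the value in that position) is an assumption, chosen because it reproduces the quartiles 0.92, 3.12, and 4.94 reported for this data set. The function name `percentile` is illustrative, and p is taken to be an integer from 1 to 99.

```python
def percentile(values, p):
    """pth percentile (integer p, 1..99) using the position L = (p/100) * n."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # L = whole + remainder/100
    if remainder == 0:
        # L is a whole number: average the values in positions L and L + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Assumed rule: round L up and take the value in that position.
    return ordered[whole]

data = [
    0.00, 0.08, 0.13, 0.14, 0.16, 0.17, 0.20, 0.29, 0.56, 0.67, 0.70, 0.92, 1.22, 1.30, 1.48,
    1.64, 1.72, 1.90, 2.37, 2.58, 2.84, 3.06, 3.12, 3.21, 3.29, 3.54, 3.57, 3.71, 4.13, 4.27,
    4.37, 4.64, 4.89, 4.94, 5.54, 6.10, 6.61, 7.89, 7.96, 8.03, 8.87, 8.91, 11.02, 12.75, 13.68,
]

print(round(percentile(data, 60), 2))  # 3.64
```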
0.00 0.08 0.13 0.14 0.16 0.17 0.20 0.29 0.56 0.67 0.70 0.92 1.22 1.30 1.48
1.64 1.72 1.90 2.37 2.58 2.84 3.06 3.12 3.21 3.29 3.54 3.57 3.71 4.13 4.27
4.37 4.64 4.89 4.94 5.54 6.10 6.61 7.89 7.96 8.03 8.87 8.91 11.02 12.75 13.68
Solution:
The data are already in increasing order. There are $n = 45$ values in the data set. There
are 17 values less than 1.90. Therefore,
Percentile = $100 \cdot \dfrac{17 + 0.5}{45} = 38.9$
We round the result to 39. The value 1.90 corresponds to the 39th percentile.
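The sketch below computes the percentile corresponding to a given value. The formula 100 × (number of values less than x + 0.5) / n is an assumption: it is not shown in the surviving slide text, but it is consistent with the stated result of 39 for the value 1.90. The function name `percentile_rank` is hypothetical.

```python
def percentile_rank(values, x):
    """Percentile corresponding to x: 100 * (count below x + 0.5) / n (assumed formula)."""
    below = sum(1 for v in values if v < x)
    return 100 * (below + 0.5) / len(values)

data = [
    0.00, 0.08, 0.13, 0.14, 0.16, 0.17, 0.20, 0.29, 0.56, 0.67, 0.70, 0.92, 1.22, 1.30, 1.48,
    1.64, 1.72, 1.90, 2.37, 2.58, 2.84, 3.06, 3.12, 3.21, 3.29, 3.54, 3.57, 3.71, 4.13, 4.27,
    4.37, 4.64, 4.89, 4.94, 5.54, 6.10, 6.61, 7.89, 7.96, 8.03, 8.87, 8.91, 11.02, 12.75, 13.68,
]

rank = percentile_rank(data, 1.90)
print(round(rank))  # 39
```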
OBJECTIVE 4
Compute the five-number summary for a data set
Five-Number Summary
The five-number summary of a data set consists of the median, the
first quartile, the third quartile, the smallest value, and the largest value.
These values are generally arranged in order.
Definition:
The five-number summary of a data set consists of the following quantities:
• Minimum
• First quartile
• Median
• Third quartile
• Maximum
0.00 0.08 0.13 0.14 0.16 0.17 0.20 0.29 0.56 0.67 0.70 0.92 1.22 1.30 1.48
1.64 1.72 1.90 2.37 2.58 2.84 3.06 3.12 3.21 3.29 3.54 3.57 3.71 4.13 4.27
4.37 4.64 4.89 4.94 5.54 6.10 6.61 7.89 7.96 8.03 8.87 8.91 11.02 12.75 13.68
Solution:
We previously computed the quartiles:
𝑸𝟏 = 0.92 Med = 𝑸𝟐 = 3.12 𝑸𝟑 = 4.94
0.00 0.08 0.13 0.14 0.16 0.17 0.20 0.29 0.56 0.67 0.70 0.92 1.22 1.30 1.48
1.64 1.72 1.90 2.37 2.58 2.84 3.06 3.12 3.21 3.29 3.54 3.57 3.71 4.13 4.27
4.37 4.64 4.89 4.94 5.54 6.10 6.61 7.89 7.96 8.03 8.87 8.91 11.02 12.75 13.68
Solution:
When using the TI-84 PLUS Calculator, the five-number summary is given by the 1-Var
Stats command.
Five-number summary
OBJECTIVE 5
Understand the effects of outliers
Outliers
An outlier is a value that is considerably larger or considerably smaller than
most of the values in a data set. Some outliers result from errors; for example a
misplaced decimal point may cause a number to be much larger or smaller than
the other values in a data set. Some outliers are correct values, and simply
reflect the fact that the population contains some extreme values.
Example:
The temperature in a downtown location is measured for eight consecutive days
during the summer. The readings, in Fahrenheit, are
81.2 85.6 89.3 91.0 83.2 8.45 79.5 87.8
Which reading is an outlier? Is the outlier an error or is it possible that it is
correct?
Solution:
The outlier is 8.45. It certainly is an error, likely resulting from a misplaced
decimal point. The outlier should be corrected if possible.
Interquartile Range
One method for detecting outliers involves a measure called the
Interquartile Range.
Interquartile Range
Definition:
The interquartile range is found by subtracting the first quartile from the third
quartile.
IQR = 𝑸𝟑 − 𝑸𝟏
Step 3: Compute the outlier boundaries. These boundaries are the cutoff
points for determining outliers:
Lower Outlier Boundary = 𝑄1 – 1.5(IQR)
Upper Outlier Boundary = 𝑄3 + 1.5(IQR)
Step 4: Any data value that is less than the lower outlier boundary or
greater than the upper outlier boundary is considered to be an
outlier.
Example
The following table presents the number of students absent in a middle school in
northwestern Montana for each school day in January. Identify any outliers.
65 67 71 57 51 49 44 41 59 49 42 56
45 77 44 42 45 46 100 59 53 51
Solution:
We may use the TI-84 PLUS or other technology to compute
the quartiles: $Q_1 = 45$ and $Q_3 = 59$. The interquartile range is IQR = $Q_3 - Q_1 = 59 - 45 = 14$.
There are no values less than the lower boundary of 24. The value 100 is greater than the
upper boundary. Therefore, the value 100 is an outlier.
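The IQR method for this example can be sketched as follows. The quartiles Q1 = 45 and Q3 = 59 are the values reported for this data set in the boxplot example; they are entered directly rather than recomputed, and the variable names are illustrative.

```python
# Daily absences from the example.
absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42, 56,
            45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

q1, q3 = 45, 59  # quartiles as reported in the slides
iqr = q3 - q1

lower = q1 - 1.5 * iqr  # lower outlier boundary
upper = q3 + 1.5 * iqr  # upper outlier boundary

outliers = [x for x in absences if x < lower or x > upper]
print(iqr, lower, upper, outliers)  # 14 24.0 80.0 [100]
```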
OBJECTIVE 6
Construct boxplots to visualize the five-number
summary and outliers
Boxplot
A boxplot is a graph that presents the five-number summary along with
some additional information about a data set. There are several
different kinds of boxplots. The one we describe here is sometimes
called a modified boxplot.
Example – Boxplot
The following table presents the number of students absent in a middle school in
northwestern Montana for each school day in January. Construct a boxplot.
65 67 71 57 51 49 44 41 59 49 42 56
45 77 44 42 45 46 100 59 53 51
Solution:
Step 1:
We may use the TI-84 PLUS or other technology to compute
the quartiles. We see that 𝑄1 = 45, Med = 51, and 𝑄3 = 59.
Step 2:
We draw vertical lines at 45, 51, and 59, then horizontal lines to complete the box.
Example – Boxplot
Step 3:
We compute the outlier boundaries:
Lower Outlier Boundary = 𝑄1 – 1.5(IQR) = 24
Upper Outlier Boundary = 𝑄3 + 1.5(IQR) = 80
Step 4:
The largest data value that is less than the upper boundary is 77. We draw a
horizontal line from 59 up to 77.
Solution
Step 5:
The smallest data value that is greater than the lower boundary is 41. We draw a
horizontal line from 45 down to 41.
Step 6:
The data value 100 lies outside of the outlier boundaries. Therefore, 100 is an outlier. We
plot this point separately.
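The numbers needed to draw this modified boxplot (box edges, whisker ends, and separately plotted outliers) can be gathered in a short sketch. The quartiles are taken from Step 1 rather than recomputed, and the variable names are illustrative. No plotting library is used; the code only collects the values.

```python
absences = [65, 67, 71, 57, 51, 49, 44, 41, 59, 49, 42, 56,
            45, 77, 44, 42, 45, 46, 100, 59, 53, 51]

q1, med, q3 = 45, 51, 59  # from Step 1 of the example
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier boundaries: 24 and 80

# Whiskers extend to the most extreme data values that are not outliers.
inside = [x for x in absences if lower <= x <= upper]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the boundaries are plotted separately.
outliers = sorted(x for x in absences if x < lower or x > upper)

print(whisker_low, q1, med, q3, whisker_high, outliers)  # 41 45 51 59 77 [100]
```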
Chapter 6
RANDOM VARIABLES
Section 6.1
Objectives
1. Distinguish between discrete and continuous random variables
2. Determine a probability distribution for a discrete random variable
3. Describe the connection between probability distributions and
populations
4. Construct a probability histogram for a discrete random variable
5. Compute the mean of a discrete random variable
6. Compute the variance and standard deviation of a discrete random
variable
OBJECTIVE 1
Distinguish between discrete and continuous
random variables
Random Variable
If we roll a fair die, the possible outcomes are the
numbers 1, 2, 3, 4, 5, and 6, and each of these
numbers has probability 1/6. Rolling a die is a
probability experiment whose outcomes are
numbers. The outcome of such an experiment is
called a random variable.
Random Variable
A random variable is a numerical outcome of a probability
experiment.
OBJECTIVE 2
Determine a probability distribution for a discrete
random variable
Probability Distribution
A probability distribution for a discrete random variable specifies
the probability for each possible value of the random variable.
Properties:
• 0 ≤ 𝑃(𝑥) ≤ 1 for every possible 𝑥
• ∑𝑃(𝑥) = 1
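𝒙 1 2 3 4
𝑷(𝒙) 0.25 0.65 –0.30 0.11
This is not a probability distribution. 𝑃(3) is negative.
𝒙 –1 –0.5 0 0.5 1
𝑷(𝒙) 0.17 0.25 0.31 0.22 0.05
This is a probability distribution. All the probabilities are between 0 and 1, and
they add up to 1.
𝒙 1 10 100 1000
𝑷(𝒙) 1.02 0.31 0.90 0.43
This is not a probability distribution. 𝑃(1) is greater than 1.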
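The two properties of a probability distribution can be checked mechanically. The following is a minimal sketch; the function name `is_probability_distribution` is hypothetical, and a small tolerance is assumed when comparing the sum with 1 because of floating-point rounding.

```python
import math

def is_probability_distribution(probs):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=1e-9)
    return in_range and sums_to_one

# The three tables above, in order.
table_1 = [0.25, 0.65, -0.30, 0.11]
table_2 = [0.17, 0.25, 0.31, 0.22, 0.05]
table_3 = [1.02, 0.31, 0.90, 0.43]

for name, table in [("table 1", table_1), ("table 2", table_2), ("table 3", table_3)]:
    print(name, is_probability_distribution(table))
```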
OBJECTIVE 3
Describe the connection between probability
distributions and populations
Example
An airport parking facility contains 1000 parking spaces. Of these, 142 are covered long-
term spaces that cost $2.00 per hour, 378 are covered short-term spaces that cost $4.50
per hour, 423 are uncovered long-term spaces that cost $1.50 per hour, and 57 are
uncovered short-term spaces that cost $4.00 per hour. A parking space is selected at
random. Let 𝑋 represent the hourly parking fee for the randomly sampled space. Find
the probability distribution of 𝑋.
Solution:
To find the probability distribution, we must list the possible values of 𝑋 and then find the
probability of each of them. The possible values of 𝑋 are 1.50, 2.00, 4.00, 4.50. We find
their probabilities.
𝑃(1.50) = (# of spaces costing $1.50) / (total # of spaces) = 423/1000 = 0.423
𝑃(2.00) = (# of spaces costing $2.00) / (total # of spaces) = 142/1000 = 0.142
𝑃(4.00) = (# of spaces costing $4.00) / (total # of spaces) = 57/1000 = 0.057
𝑃(4.50) = (# of spaces costing $4.50) / (total # of spaces) = 378/1000 = 0.378
The probability distribution is:
𝒙 𝑷(𝒙)
1.50 0.423
2.00 0.142
4.00 0.057
4.50 0.378
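The same calculation can be sketched in Python: each probability is the count of spaces at that fee divided by the total number of spaces. The variable names are illustrative only.

```python
# Number of parking spaces at each hourly fee (from the example above).
spaces = {1.50: 423, 2.00: 142, 4.00: 57, 4.50: 378}

total = sum(spaces.values())

# P(x) = (number of spaces costing x) / (total number of spaces)
distribution = {fee: count / total for fee, count in spaces.items()}

for fee, prob in sorted(distribution.items()):
    print(f"{fee:.2f}  {prob:.3f}")
```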
OBJECTIVE 4
Construct a probability histogram for a discrete
random variable
Probability Histograms
In an earlier chapter we learned to summarize the data in a sample with a histogram. We
can represent discrete probability distributions with histograms as well. A histogram that
represents a discrete probability distribution is called a probability histogram.
Example:
The following presents the probability distribution and histogram for the number of boys in
a family of five children, using the assumption that boys and girls are equally likely and
that births are independent events.
𝒙 𝑷(𝒙)
0 0.03125
1 0.15625
2 0.31250
3 0.31250
4 0.15625
5 0.03125
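The table above can be reproduced by counting. Under the stated assumptions (boys and girls equally likely, births independent), each of the 2⁵ = 32 birth orders is equally likely, so 𝑃(𝑥) is the number of orders with 𝑥 boys divided by 32. The sketch below also prints a rough text version of the probability histogram; the bar scaling is an arbitrary choice.

```python
from math import comb

n = 5  # children in the family

# P(x) = (number of birth orders with x boys) / (total number of birth orders)
distribution = {x: comb(n, x) / 2**n for x in range(n + 1)}

for x, prob in distribution.items():
    bar = "#" * round(prob * 32)   # one '#' per 1/32 of probability
    print(f"{x}  {prob:.5f}  {bar}")
```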
OBJECTIVE 5
Compute the mean of a discrete random variable
𝜇𝑋 = ∑[𝑥 ∙ 𝑃(𝑥)]
Solution:
The mean is 𝜇𝑋 = 0(0.2) + 1(0.5) + 2(0.2) + 3(0.1) = 1.2
Expected Value
There are many occasions on which people want to predict how much they are likely to
gain or lose if they make a certain decision or take a certain action. Often, this is done by
computing the mean of a random variable. In such situations, the mean is sometimes
called the “expected value” and is denoted by 𝐸(𝑋). If the expected value is positive, it is
an expected gain, and if it is negative, it is an expected loss.
Example:
A mineral economist estimated that a particular venture had probability 0.4 of a $30 million
loss, probability 0.5 of a $20 million profit, and probability 0.1 of a $40 million profit. Let 𝑋
represent the profit. Find the probability distribution of the profit and the expected value of
the profit. Does this venture represent an expected gain or an expected loss?
Solution:
The probability distribution of 𝑋 is
𝒙 –30 20 40
𝑷(𝒙) 0.4 0.5 0.1
The expected value is
𝐸(𝑋) = (−30)(0.4) + (20)(0.5) + (40)(0.1) = 2.0
There is an expected gain of $2 million.
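As a quick check of the arithmetic, the expected value is the sum of each value times its probability. A minimal sketch follows; the function name `expected_value` is hypothetical.

```python
def expected_value(distribution):
    """Sum of x * P(x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())

# Profit in millions of dollars; a loss is a negative profit.
profit = {-30: 0.4, 20: 0.5, 40: 0.1}

ev = expected_value(profit)
print("expected gain" if ev > 0 else "expected loss", ev)
```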
OBJECTIVE 6
Compute the variance and standard deviation of
a discrete random variable
Solution:
We first enter the values of the random variable and
the associated probabilities into the data editor.
Objectives
1. Determine whether a random variable is binomial
2. Determine the probability distribution of a binomial random variable
3. Compute binomial probabilities
4. Compute the mean and variance of a binomial random variable
OBJECTIVE 1
Determine whether a random variable is binomial
Binomial Distribution
Suppose that your favorite fast food chain
is giving away a coupon with every purchase
of a meal. Twenty percent of the coupons entitle
you to a free hamburger, and the rest of them
say “better luck next time.” Ten of you order
lunch at this restaurant. Let 𝑋 be the number of you who win a free hamburger.
In this section, we will learn that 𝑋 has a distribution called the binomial
distribution, which is one of the most useful probability distributions.
Binomial Distribution
In the problem just described, each time we examine a coupon, we call it a
“trial,” so there are 10 trials. When a coupon is good for a free hamburger, we
will call it a “success.” The random variable 𝑋 represents the number of
successes in 10 trials.
OBJECTIVE 2
Determine the probability distribution of a
binomial random variable
Now, 𝑃(2) = 𝑃(HHT or HTH or THH) = 3(0.6)²(0.4), by the Addition Rule. Examining this
result, we see the number 3 represents the number of arrangements of two successes
(heads) and one failure (tails). In general, this number will be the number of arrangements of
𝑥 successes in 𝑛 trials, which is 𝑛𝐶𝑥. The number 0.6 is the success probability 𝑝 which has
an exponent of 2, the number of successes 𝑥. The number 0.4 is the failure probability 1 − 𝑝
which has an exponent of 1, which is the number of failures, 𝑛 − 𝑥.
𝑃(𝑥) = 𝑛𝐶𝑥 ∙ 𝑝ˣ ∙ (1 − 𝑝)ⁿ⁻ˣ
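The formula translates directly into code. The sketch below uses the hypothetical helper name `binomial_pmf` and checks it against the coin example, taking 𝑛 = 3 from the three-toss outcomes HHT, HTH, and THH.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two heads in three tosses with success probability 0.6: 3 * (0.6)^2 * (0.4)
p_two_heads = binomial_pmf(2, 3, 0.6)
print(round(p_two_heads, 3))
```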
OBJECTIVE 3
Compute binomial probabilities
(Hand Computation)
a) Find the probability that exactly four of the sampled people own a tablet
computer.
b) Find the probability that fewer than three of the people own a tablet
computer.
c) Find the probability that more than one person owns a tablet computer.
d) Find the probability that the number of people who own a tablet computer is
between 1 and 4, inclusive.
𝑃(1 or 2 or 3 or 4) = ₁₅𝐶₁ ∙ (0.3)¹ ∙ (1 – 0.3)¹⁵⁻¹ + ₁₅𝐶₂ ∙ (0.3)² ∙ (1 – 0.3)¹⁵⁻²
+ ₁₅𝐶₃ ∙ (0.3)³ ∙ (1 – 0.3)¹⁵⁻³ + ₁₅𝐶₄ ∙ (0.3)⁴ ∙ (1 – 0.3)¹⁵⁻⁴
= 0.0305 + 0.0916 + 0.1700 + 0.2186
= 0.511
OBJECTIVE 3
Compute binomial probabilities
(Tables)
a) Find the probability that exactly five of the sampled people own a tablet
computer.
b) Find the probability that fewer than four of the people own a tablet
computer.
c) Find the probability that the number of people who own a tablet computer is
between 6 and 8, inclusive.
𝑃(5) = 0.206
OBJECTIVE 3
Compute binomial probabilities
(TI-84 PLUS)
binomcdf
To compute the probability that the random variable 𝑋 is less
than or equal to the value 𝑥 given the parameters 𝑛 and 𝑝, use
the binomcdf command with the following format:
binomcdf(n,p,x)
a) Find the probability that exactly four of the sampled people own a tablet
computer.
b) Find the probability that fewer than three of the people own a tablet
computer.
c) Find the probability that more than one person owns a tablet computer.
d) Find the probability that the number of people who own a tablet computer is
between 1 and 4, inclusive.
The binomcdf command computes the probability that there are less than or
equal to 𝑥 successes. The event “fewer than three” is equivalent to “less than
or equal to two”.
We run the command binomcdf(15, 0.3, 2) to find
that the probability that fewer than three of the
people own a tablet computer is 0.1268.
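For readers without a calculator, the cumulative probability can be sketched in Python. The function below is an illustrative stand-in that mirrors the argument order binomcdf(n,p,x); it is not the TI-84 PLUS implementation. It sums the binomial probabilities for 0 through 𝑥 successes.

```python
from math import comb

def binomcdf(n, p, x):
    """P(X <= x) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# "Fewer than three" means "less than or equal to two".
fewer_than_three = binomcdf(15, 0.3, 2)

# "Between 1 and 4, inclusive" is P(X <= 4) - P(X <= 0).
between_1_and_4 = binomcdf(15, 0.3, 4) - binomcdf(15, 0.3, 0)

print(round(fewer_than_three, 4), round(between_1_and_4, 3))
```

An event such as "more than one" can be handled with the complement, 1 − binomcdf(15, 0.3, 1).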
OBJECTIVE 4
Compute the mean and variance of a binomial
random variable
The variance of 𝑋 is
𝜎𝑋² = 𝑛𝑝(1 − 𝑝)
Example
The probability that a new car of a certain model will require repairs during
the warranty period is 0.15. A particular dealership sells 25 such cars. Let
𝑋 be the number that will require repairs during the warranty period.
Solution:
There are 𝑛 = 25 trials, with success probability 𝑝 = 0.15.
The mean is
𝜇𝑋 = 𝑛𝑝 = (25)(0.15) = 3.75
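The mean, together with the variance formula 𝑛𝑝(1 − 𝑝) and its square root, can be evaluated for this example as follows. This is a minimal sketch with illustrative variable names.

```python
from math import sqrt

n, p = 25, 0.15            # trials and success probability from the example

mean = n * p               # mean = np
variance = n * p * (1 - p) # variance = np(1 - p)
std_dev = sqrt(variance)   # standard deviation is the square root of the variance

print(round(mean, 2), round(variance, 4), round(std_dev, 3))
```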
Chapter 7
The Normal Distribution
Objectives
1. Use a probability density curve to describe a population
2. Use a normal curve to describe a normal population
3. Find areas under the standard normal curve
4. Find 𝑧-scores corresponding to areas under the normal curve
OBJECTIVE 1
Use a probability density curve to describe a
population
OBJECTIVE 2
Use a normal curve to describe a normal
population
Normal Curves
Probability density curves come in many varieties, depending on the
characteristics of the populations they represent. Many important
statistical procedures can be carried out using only one type of
probability density curve, called a normal curve.
OBJECTIVE 3
Find areas under the standard normal curve
(Tables)
𝑍-scores
When finding an area under the standard normal curve, we use the letter 𝑧 to
indicate a value on the horizontal axis beneath the curve. We refer to such a
value as a 𝒛-score.
Solution:
Step 1: Sketch a normal curve, label the point
𝑧 = 1.26, and shade in the area to the
left of it.
Step 2: Consult Table A.2. To look up 𝑧 = 1.26, find the row containing 1.2
and the column containing 0.06. The value in the intersection of the
row and column is 0.8962. This is the area to the left of 𝑧 = 1.26.
Solution:
Step 1: Sketch a normal curve, label the point
𝑧 = – 0.58, and shade in the area to
the right of it.
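Step 2: Consult Table A.2. To look up 𝑧 = –0.58, find the row containing –0.5
and the column containing 0.08. The value in the intersection of the row and
column is 0.2810. This is the area to the left of 𝑧 = –0.58.
Step 3: The area to the right of 𝑧 = –0.58 is therefore 1 – 0.2810 = 0.7190.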
Step 2: Use Table A.2 to find the areas to the left of 𝑧 = –1.45 and to the left of
𝑧 = 0.42. The area to the left of 𝑧 = –1.45 is 0.0735 and the area to
the left of 𝑧 = 0.42 is 0.6628.
OBJECTIVE 3
Find areas under the standard normal curve
(TI-84 PLUS)
𝑍-scores
When finding an area under the standard normal curve, we use the letter 𝑧 to
indicate a value on the horizontal axis beneath the curve. We refer to such a
value as a 𝒛-score.
Solution:
Note that there is no lower endpoint; therefore we use -1E99, which
represents negative 1 followed by 99 zeroes.
We select the normalcdf command and enter -1E99 as the lower
endpoint, 1.26 as the upper endpoint, 0 as the mean and 1 as the
standard deviation.
Solution:
We select the normalcdf command and enter –1.45 as the lower
endpoint, 0.42 as the upper endpoint, 0 as the mean and 1 as the
standard deviation.
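The two normalcdf calculations above can be sketched with Python's standard library. The function below is an illustrative stand-in that mirrors the calculator's argument order (lower endpoint, upper endpoint, mean, standard deviation); it is not the TI-84 PLUS implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Area to the left of z = 1.26; -1e99 stands in for "no lower endpoint".
left_of_126 = normalcdf(-1e99, 1.26)

# Area between z = -1.45 and z = 0.42.
between = normalcdf(-1.45, 0.42)

print(round(left_of_126, 4), round(between, 4))
```

Results computed this way can differ from table-based answers in the last decimal place, because the table values are rounded before they are subtracted.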
OBJECTIVE 4
Find 𝑧-scores corresponding to areas under the
normal curve
(Tables)
Step 2: Look through the body of Table A.2 to find the area closest to 0.26.
This value is 0.2611, which corresponds to the 𝑧-score 𝑧 = –0.64.
Solution:
Step 1: Sketch a normal curve and shade
in the given area.
Step 2: Determine the area to the left of the 𝑧-score. Since the area to the
right is 0.68, the area to the left is 1 – 0.68 = 0.32.
Step 3: Look through the body of Table A.2 to find the area closest to 0.32.
This value is 0.3192, which corresponds to the 𝑧-score 𝑧 = –0.47.
Solution:
Step 1: Sketch a normal curve and shade in the
given area. Label the 𝑧-score on the
left 𝑧1 and the 𝑧-score on the right 𝑧2.
Step 2: Find the area to the left of 𝑧1. Since the area in the middle is 0.95,
the area in the two tails combined is 0.05. Half of that area, which is
0.025, is to the left of 𝑧1.
OBJECTIVE 4
Find 𝑧-scores corresponding to areas under the
normal curve
(TI-84 PLUS)
Solution:
We select the invNorm command and
enter 0.26 as the area to the left, 0 as
the mean, and 1 as the standard deviation.
Solution:
Since the area to the right is 0.68, the area to the left is 1 – 0.68 = 0.32.
We use the invNorm command with 0.32 as the area to the left, 0 as the mean,
and 1 as the standard deviation.
Solution:
We first sketch a normal curve and shade in the given
area. Label the 𝑧-score on the left 𝑧1 and the 𝑧-score
on the right 𝑧2.
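The three invNorm lookups in this objective can be sketched with `NormalDist.inv_cdf`, which takes the area to the left and returns the corresponding 𝑧-score. For the middle area of 0.95, the sketch assumes the tails are split evenly, so the area to the left of 𝑧1 is 0.025 and the area to the left of 𝑧2 is 0.975.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left is 0.26.
z_left_026 = std_normal.inv_cdf(0.26)

# Area to the right is 0.68, so the area to the left is 1 - 0.68 = 0.32.
z_right_068 = std_normal.inv_cdf(1 - 0.68)

# Middle area 0.95: 0.025 in each tail.
z1 = std_normal.inv_cdf(0.025)
z2 = std_normal.inv_cdf(0.975)

print(round(z_left_026, 2), round(z_right_068, 2), round(z1, 2), round(z2, 2))
```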
APPLICATIONS OF
THE NORMAL DISTRIBUTION
Section 7.2
Objectives
1. Convert values from a normal distribution to 𝑧-scores
2. Find areas under a normal curve
3. Find the value from a normal distribution corresponding to a given
proportion
OBJECTIVE 1
Convert values from a normal distribution to 𝑧-scores
Standardization
Recall that the 𝑧-score of a data value
represents the number of standard
deviations that data value is above or
below the mean.
OBJECTIVE 2
Find areas under a normal curve
(Tables)
Example:
A study reported that the length of pregnancy from conception to birth is
approximately normally distributed with mean 𝜇 = 272 days and standard
deviation 𝜎 = 9 days. What proportion of pregnancies last longer than 280 days?
Solution:
The 𝑧-score for 280 is 𝑧 = (𝑥 − 𝜇)/𝜎 = (280 − 272)/9 = 0.89.
Using Table A.2, we find the area to
the left of 𝑧 = 0.89 to be 0.8133. The area to
the right is therefore 1 – 0.8133 = 0.1867. We
conclude that the proportion of pregnancies
that last longer than 280 days is 0.1867.
Solution:
The 𝑧-score for 252 is 𝑧 = (𝑥 − 𝜇)/𝜎 = (252 − 272)/9 = −2.22.
The 𝑧-score for 298 is 𝑧 = (𝑥 − 𝜇)/𝜎 = (298 − 272)/9 = 2.89.
Using Table A.2, we find that the area to the left of 𝑧 = 2.89 is 0.9981 and the
area to the left of 𝑧 = –2.22 is 0.0132. The area between 𝑧 = − 2.22 and 𝑧 = 2.89
is therefore 0.9981 – 0.0132 = 0.9849.
The proportion of pregnancies that are full-term, between 252 days and 298 days,
is 0.9849.
OBJECTIVE 2
Find areas under a normal curve
(TI-84 PLUS)
Solution:
We use the normalcdf command with 280 as the lower endpoint, 1E99 as the
upper endpoint, 272 as the mean, and 9 as the standard deviation. We
conclude that the proportion of pregnancies that last longer than 280 days is
0.1870.
Solution:
We use the normalcdf command with 252 as the lower endpoint, 298 as the
upper endpoint, 272 as the mean, and 9 as the standard deviation. The
proportion of pregnancies that are full-term, between 252 days and 298 days, is
0.9849.
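Both pregnancy-length calculations can be sketched with `NormalDist`, working directly with 𝜇 = 272 and 𝜎 = 9 rather than converting to 𝑧-scores first. The variable names are illustrative.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=272, sigma=9)  # length of pregnancy in days

# Proportion lasting longer than 280 days: area to the right of 280.
longer_than_280 = 1 - pregnancy.cdf(280)

# Proportion between 252 and 298 days: difference of the two left-hand areas.
full_term = pregnancy.cdf(298) - pregnancy.cdf(252)

print(round(longer_than_280, 4), round(full_term, 4))
```

The first result matches the calculator answer rather than the table answer of 0.1867, because the table method rounds the 𝑧-score to 0.89 before the lookup.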
OBJECTIVE 3
Find the value from a normal distribution
corresponding to a given proportion
(Tables)
Example:
Heights in a group of men are normally distributed with mean 𝜇 = 69 inches and
standard deviation 𝜎 = 3 inches. Find the height whose 𝑧-score is 0.6. Interpret
the result.
Solution:
We want the height with a 𝑧-score of 0.6. Therefore,
𝑥 = 𝜇 + 𝑧 ∙ 𝜎 = 69 + (0.6)(3) = 70.8
We interpret this by saying that a man 70.8 inches tall has a height 0.6
standard deviations above the mean.
Step 1: Sketch a normal curve, label the mean, label the value 𝑥 to be found,
and shade in and label the given area.
Step 2: If the given area is on the right, subtract it from 1 to get the area on the
left.
Step 3: Look in the body of Table A.2 to find the area closest to the given area.
Find the 𝑧-score corresponding to that area.
Step 3: The area closest to 0.98 in Table A.2 is 0.9798, which corresponds to a
𝑧-score of 2.05.
Step 4: The IQ score that separates the upper 2% from the lower 98% is
𝑥 = 𝜇 + 𝑧 ∙ 𝜎 = 100 + (2.05)(15) = 130.75
Since IQ scores are generally whole numbers, we will round this to 𝑥 = 131.
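The same cutoff can be found without the table by inverting the normal distribution directly. This sketch takes 𝜇 = 100 and 𝜎 = 15 from Step 4 and asks for the value with 98% of the area to its left.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Value separating the lower 98% from the upper 2%.
cutoff = iq.inv_cdf(0.98)

print(round(cutoff, 2), round(cutoff))
```

The unrounded value differs slightly from 130.75 because the table method uses the rounded 𝑧-score 2.05; both round to the same whole-number IQ score.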
OBJECTIVE 3
Find the value from a normal distribution
corresponding to a given proportion
(TI-84 PLUS)
Objectives
1. Construct the sampling distribution of a sample mean
2. Use the Central Limit Theorem to compute probabilities involving
sample means
OBJECTIVE 1
Construct the sampling distribution of a sample
mean
If several samples are drawn from a population, they are likely to have
different values for 𝑥̄. Because the value of 𝑥̄ varies each time a
sample is drawn, 𝒙̄ is a random variable. For each value of the
random variable 𝑥̄, we can compute a probability. The probability
distribution of 𝑥̄ is called the sampling distribution of 𝑥̄.
Solution:
The mean of 𝑥̄ is:
𝜇𝑥̄ = 𝜇 = 10.5
Remarkably, it is true that, for any population, if the sample size is large enough, the sample
mean 𝑥̄ will be approximately normally distributed. For a symmetric population like the
tetrahedral die population, the sample mean is approximately normally distributed even for a
small sample size like 𝑛 = 3.
Below are the probability histograms for the sampling distribution of 𝑥̄ for samples of size
3, 10, and 30. Note that the shapes of the distributions begin to approximate a normal
curve as the sample size increases.
The size of the sample needed to obtain approximate normality depends mostly on the
skewness of the population. In practice, a sample of size 𝑛 > 30 is large enough.
Let 𝑥̄ be the mean of a large (𝑛 > 30) simple random sample from a population
with mean 𝜇 and standard deviation 𝜎. Then 𝑥̄ has an approximately normal
distribution, with mean 𝜇𝑥̄ = 𝜇 and standard deviation 𝜎𝑥̄ = 𝜎/√𝑛.
The Central Limit Theorem applies for all populations. However, for symmetric
populations, a smaller sample size may suffice. If the population itself is
normal, the sample mean 𝒙̄ will be normal for any sample size.
Example 2:
A sample of size 8 will be drawn from a normal population with mean 𝜇 = –60 and standard
deviation 𝜎 = 5. Is it appropriate to use the normal distribution to find probabilities for 𝑥̄?
Solution:
Yes, since the population itself is approximately normal, 𝑥̄ has an approximately normal
distribution.
Example 3:
A sample of size 24 will be drawn from a population with mean 𝜇 = 35 and standard deviation
𝜎 = 1.2. Is it appropriate to use the normal distribution to find probabilities for 𝑥̄?
Solution:
No, since the population is not known to be normal and 𝑛 is not greater than 30, we cannot be
certain that 𝑥̄ has an approximately normal distribution.
OBJECTIVE 2
Use the Central Limit Theorem to compute
probabilities involving sample means
Solution:
The sample size is 125, which is greater than 30.
Therefore, we may use the normal curve.
We compute 𝜇𝑥̄ and 𝜎𝑥̄:
𝜇𝑥̄ = 𝜇 = 25 and 𝜎𝑥̄ = 𝜎/√𝑛 = 9.5/√125 = 0.85
The probability that the sample mean age of the students is greater than 26
years is approximately 0.1197.
Solution:
The sample size is 100, which is greater than 30.
Therefore, we may use the normal curve.
We compute 𝜇𝑥̄ and 𝜎𝑥̄:
𝜇𝑥̄ = 𝜇 = 1135 and 𝜎𝑥̄ = 𝜎/√𝑛 = 97/√100 = 9.7
This probability is less than 0.05, so it would be unusual for the sample mean to
be less than 1100.
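The two solutions in this objective follow the same pattern: build the approximate normal distribution of the sample mean, with mean 𝜇 and standard deviation 𝜎/√𝑛, then read off a tail area. A minimal sketch follows; the helper name `sample_mean_distribution` is hypothetical, and it assumes the sample is large enough (𝑛 > 30) for the Central Limit Theorem to apply.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Approximate sampling distribution of the sample mean for large n."""
    return NormalDist(mu, sigma / sqrt(n))

# Student ages: mu = 25, sigma = 9.5, n = 125; P(sample mean > 26).
ages = sample_mean_distribution(25, 9.5, 125)
p_over_26 = 1 - ages.cdf(26)

# Second example: mu = 1135, sigma = 97, n = 100; P(sample mean < 1100).
second = sample_mean_distribution(1135, 97, 100)
p_below_1100 = second.cdf(1100)

print(round(p_over_26, 3), p_below_1100 < 0.05)
```

The first probability agrees with 0.1197 to about three decimal places; the small difference arises because the slide rounds the standard deviation to 0.85 before computing the area.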
Objectives
1. Construct the sampling distribution for a sample proportion
2. Use the Central Limit Theorem to compute probabilities for sample
proportions
OBJECTIVE 1
Construct the sampling distribution for a sample
proportion
• The sample proportion is $\hat{p} = \dfrac{35}{100}$.
• The proportion of people in the entire city who own laptops is the population
proportion, 𝑝.
Sampling Distribution of $\hat{p}$
If several samples are drawn from a population, they are likely to have
different values for $\hat{p}$. Because the value of $\hat{p}$ varies each time a sample
is drawn, $\hat{p}$ is a random variable, and it has a probability distribution.
Solution:
The population proportion is 𝑝 = 0.25 and the sample size is 𝑛 = 70.
When 𝑝 = 0.5, the sampling distribution of $\hat{p}$ is somewhat close to normal even for
a small sample size like 𝑛 = 5. When 𝑝 is close to 0 or close to 1, a larger sample
size is needed before the distribution of $\hat{p}$ is close to normal.
Example 2:
A sample of size 55 is drawn from a population with population proportion 𝑝 = 0.8.
Is it appropriate to use the normal distribution to find probabilities for $\hat{p}$?
Solution:
Yes, since $np = 55(0.8) = 44$ and $n(1 - p) = 55(0.2) = 11$ are both at least 10,
the distribution of $\hat{p}$ is approximately normal.
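The check used in this solution can be written as a small helper. This is a minimal sketch; the name `normal_approx_ok` is hypothetical, and the threshold of 10 is taken from the solution above.

```python
def normal_approx_ok(n, p, minimum=10):
    """Return True when both np and n(1 - p) are at least `minimum`."""
    return n * p >= minimum and n * (1 - p) >= minimum

# Example 2: n = 55, p = 0.8 gives np = 44 and n(1 - p) = 11
print(normal_approx_ok(55, 0.8))
# A small sample such as n = 5 with p = 0.5 gives np = 2.5, so the check fails
print(normal_approx_ok(5, 0.5))
```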
OBJECTIVE 2
Use the Central Limit Theorem to compute
probabilities for sample proportions
Solution:
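We first check the assumptions: $np = 100(0.27) = 27 \ge 10$ and $n(1 - p) = 100(1 - 0.27) = 73 \ge 10$.
We may use the normal curve.
We compute $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$:
$\mu_{\hat{p}} = p = 0.27$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.27(1-0.27)}{100}} = 0.044396$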
The probability that the sample proportion of those who prefer chocolate is greater
than 0.30 is 0.2496. [Answer using tables is 0.2483]
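As a rough check of this result, the normal approximation for a sample proportion can be computed directly. The sketch below uses only the Python standard library; the helper name `sample_proportion_dist` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_proportion_dist(p, n):
    """Approximate sampling distribution of p-hat: mean p, sd sqrt(p(1-p)/n)."""
    return NormalDist(p, sqrt(p * (1 - p) / n))

# P(p-hat > 0.30) when p = 0.27 and n = 100
prob_greater = 1 - sample_proportion_dist(0.27, 100).cdf(0.30)
print(round(prob_greater, 4))
```

A "less than" probability is read off the same distribution with `.cdf(value)` alone.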
Solution:
We first check the assumptions: $np = 75(0.51) = 38.25 \ge 10$ and $n(1 - p) = 75(1 - 0.51) = 36.75 \ge 10$.
We may use the normal curve.
We compute $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$:
$\mu_{\hat{p}} = p = 0.51$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.51(1-0.51)}{75}} = 0.057723$
The probability that the sample proportion of those who voted for Barack Obama is
less than 0.40 is 0.0283. It would be unusual for the sample proportion to be less
than 0.40. [Answer using tables is 0.0281]
Objectives
1. Use the normal curve to approximate binomial probabilities
OBJECTIVE 1
Use the normal curve to approximate binomial
probabilities
Normal Approximation
Binomial probabilities can be computed exactly using the techniques
described earlier. If the number of trials is large, using these methods by hand
is extremely difficult because many terms have to be calculated and added
together. If the following conditions are met, binomial probabilities can be
approximated using a normal distribution.
Continuity Correction
Suppose a fair coin is tossed 100 times, and let 𝑋 represent the number of heads
that result. Then 𝑋 has a binomial distribution with 𝑛 = 100 trials and success
probability 𝑝 = 0.5. If we wanted to compute the probability that 𝑋 is between 45
and 55 [i.e., 𝑃(45 < 𝑋 < 55)], the probability will differ depending on whether
the endpoints 45 and 55 are included.
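The effect of including or excluding the endpoints can be seen numerically. The sketch below compares exact binomial probabilities with normal areas whose boundaries are shifted by 0.5, which is the usual continuity correction. The helper names are hypothetical, and only the standard library is used.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))  # mean np, sd sqrt(np(1-p))

def binom_prob(lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def normal_area(lo, hi):
    """Area under the approximating normal curve between lo and hi."""
    return approx.cdf(hi) - approx.cdf(lo)

# Endpoints included: P(45 <= X <= 55), normal area from 44.5 to 55.5
exact_inclusive = binom_prob(45, 55)
approx_inclusive = normal_area(44.5, 55.5)

# Endpoints excluded: P(45 < X < 55) = P(46 <= X <= 54), area from 45.5 to 54.5
exact_exclusive = binom_prob(46, 54)
approx_exclusive = normal_area(45.5, 54.5)

print(round(exact_inclusive, 4), round(approx_inclusive, 4))
print(round(exact_exclusive, 4), round(approx_exclusive, 4))
```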
[Figures: continuity-corrected normal areas for 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏), 𝑃(𝑋 ≤ 𝑏), 𝑃(𝑋 ≥ 𝑎), and 𝑃(𝑋 = 𝑎)]
ASSESSING NORMALITY
Section 7.6
Objectives
1. Use dotplots to assess normality
2. Use boxplots to assess normality
3. Use histograms to assess normality
4. Use stem-and-leaf plots to assess normality
5. Use normal quantile plots to assess normality
OBJECTIVE 1
Use dotplots to assess normality
Assessing Normality
Many statistical procedures require that we draw a sample from a
population whose distribution is approximately normal. Often we don’t
know whether the population is approximately normal when we draw
the sample. So the only way we have to assess whether the population
is approximately normal is to examine the sample.
Assessing Normality
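We will reject the assumption that a population is approximately normal
if a sample has any of the following features:
1. The sample contains an outlier.
2. The sample exhibits a large degree of skewness.
3. The sample has more than one distinct mode.
If the sample has none of the preceding features, we will treat the
population as being approximately normal.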
Example
The accuracy of an oven thermostat is being tested. The oven is set to 360
degrees (F), and the temperature when the thermostat turns off is recorded. A
sample of size 7 yields the following results:
358 363 361 355 367 352 368
Is it reasonable to treat this as a sample from an approximately normal
population? Explain.
Solution:
The dotplot does not reveal any outliers.
The plot does not exhibit a large degree of skewness, and there is no evidence
that the population has more than one mode. Therefore, we can treat this as a
sample from an approximately normal population.
Example
At a recent health fair, several hundred people had their pulse rates measured.
A simple random sample of six records was drawn, and the pulse rates, in
beats per minute, were
68 71 79 98 67 75
Is it reasonable to treat this as a sample from an approximately normal
population? Explain.
Solution:
Using the dotplot, it is clear that the value 98 is an outlier. Therefore, we
should not treat this as a sample from an approximately normal population.
OBJECTIVE 2
Use boxplots to assess normality
Example
An insurance adjuster obtains a sample of 20 estimates, in hundreds of dollars,
for repairs to cars damaged in collisions. Following are the data.
12.1 15.7 14.2 4.6 8.2 11.6 12.9 11.2 14.9 13.7
6.6 7.2 12.6 9.0 11.9 7.8 9.0 16.2 16.5 12.1
Is it reasonable to treat this as a sample from an approximately normal
population? Explain.
Solution:
A boxplot reveals that there are no outliers. Although the median is not exactly
halfway between the quartiles, the skewness is not great. Therefore, we may
treat this as a sample from an approximately normal population.
Example
A recycler determines the amount of recycled newspaper, in cubic feet, collected each
week. Following are the results for a sample of 18 weeks.
2129 2853 2530 2054 2075 2011 2162 2285 2668
3194 4834 2469 2380 2567 4117 2337 3179 3157
Is it reasonable to treat this as a sample from an approximately normal population?
Explain.
Solution:
A boxplot reveals that the value 4834 is an outlier. In addition, the upper whisker is much
longer than the lower one, which indicates fairly strong skewness. Therefore, we should not
treat this as a sample from an approximately normal population.
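The outlier judgments in the two boxplot examples can be reproduced without drawing the plots. The sketch below assumes the common 1.5 × IQR boundary rule and computes quartiles as medians of the lower and upper halves of the sorted data. Both choices are assumptions, since the slides show only the finished boxplots, and the function names are hypothetical.

```python
from statistics import median

def quartiles(values):
    """First and third quartiles as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)

def iqr_outliers(values):
    """Values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

repairs = [12.1, 15.7, 14.2, 4.6, 8.2, 11.6, 12.9, 11.2, 14.9, 13.7,
           6.6, 7.2, 12.6, 9.0, 11.9, 7.8, 9.0, 16.2, 16.5, 12.1]
newspaper = [2129, 2853, 2530, 2054, 2075, 2011, 2162, 2285, 2668,
             3194, 4834, 2469, 2380, 2567, 4117, 2337, 3179, 3157]

print(iqr_outliers(repairs))    # repair estimates
print(iqr_outliers(newspaper))  # recycled newspaper volumes
```

Other quartile conventions move the fences slightly, so borderline values should be checked against the plot itself.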
OBJECTIVE 3
Use histograms to assess normality
Example
Diameters were measured, in millimeters, for a simple random sample of 20
grade A eggs from a certain farm. The results were
59 60 60 56 59 56 62 58 60 59
61 59 61 61 63 60 56 58 63 58
Is it reasonable to treat this as a sample from an approximately normal
population? Explain.
Solution:
The relative frequency histogram does not reveal
any outliers, nor does it exhibit a large
degree of skewness. There is no evidence
that the population has more than one mode.
Therefore, we can treat this as a sample from
an approximately normal population.
Example
A shoe manufacturer is testing a new type of leather sole. A simple random
sample of 22 people wore shoes with the new sole for a period of four months.
The amount of wear on the right shoe was measured for each person. The
results, in thousandths of an inch, were
24.1 2.2 11.8 2.7 4.1 13.9 33.6 2.4 36.2 16.8 5.4
4.6 4.5 4.1 6.1 6.3 22.6 29.1 12.2 4.6 15.8 7.7
Is it reasonable to treat this as a sample from an approximately normal
population? Explain.
Solution:
The relative frequency histogram reveals
that the sample is strongly skewed to the
right. We should not treat this as a sample
from an approximately normal population.
OBJECTIVE 4
Use stem-and-leaf plots to assess normality
Example
A psychologist measures the time it takes for each of 20 rats to run a maze.
The times, in seconds, are
54 48 49 54 63 54 66
32 45 52 41 37 56 56
52 53 41 45 48 43
Construct a stem-and-leaf plot for these data. Is it reasonable to treat this as a
random sample from an approximately normal population?
Solution:
The stem-and-leaf plot reveals no outliers, strong skewness,
or multimodality. We may treat this as a sample from an
approximately normal population.
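Since the plot itself is not reproduced here, the following sketch builds a stem-and-leaf plot for the maze times, using the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

times = [54, 48, 49, 54, 63, 54, 66,
         32, 45, 52, 41, 37, 56, 56,
         52, 53, 41, 45, 48, 43]

for stem, leaves in stem_and_leaf(times).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The rows for stems 4 and 5 hold eight leaves each and the outer rows two each, which matches the single-peaked, roughly symmetric shape described in the solution.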
OBJECTIVE 5
Use normal quantile plots to assess normality
A simple random sample of size 𝑛 = 5 is drawn, and we want to determine whether the
population it came from is approximately normal. The five sample values, in increasing
order, are 3.0 3.3 4.8 5.9 7.8.
Step 1: Let 𝑛 be the number of values in the data set. Spread the 𝑛 values evenly over
the interval from 0 to 1. This is done by assigning the value $\dfrac{1}{2n}$ to the first sample
value, $\dfrac{3}{2n}$ to the second, and so forth. The last sample value will be assigned the
value $\dfrac{2n-1}{2n}$. These values, denoted $a_i$, represent areas under the normal curve.
For 𝑛 = 5, the values are 0.1, 0.3, 0.5, 0.7, and 0.9.
𝑖 1 2 3 4 5
𝑥𝑖 3.0 3.3 4.8 5.9 7.8
𝑎𝑖 0.1 0.3 0.5 0.7 0.9
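Step 1 can be sketched in code as follows. The areas follow the formula $a_i = (2i - 1)/(2n)$ stated above. Converting each area to a 𝑧-score with the inverse normal CDF is an assumption about the step that comes next, since the remaining steps are not reproduced in this text. The function name is hypothetical.

```python
from statistics import NormalDist

def quantile_plot_points(sample):
    """Return (x_i, a_i, z_i) triples for a normal quantile plot."""
    data = sorted(sample)
    n = len(data)
    areas = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]  # a_i = (2i - 1) / (2n)
    z_scores = [NormalDist().inv_cdf(a) for a in areas]       # z-score with area a_i to its left
    return list(zip(data, areas, z_scores))

for x, a, z in quantile_plot_points([3.0, 3.3, 4.8, 5.9, 7.8]):
    print(f"x = {x:.1f}  a = {a:.1f}  z = {z:+.3f}")
```

If the sample comes from an approximately normal population, plotting each 𝑥 against its 𝑧 should give points close to a straight line.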
Step 4: For Data List, select L1, and for Data Axis,
choose the X option.
Solution:
The points on a quantile plot do not
closely follow a straight line. The
distribution is not approximately normal.
Chapter 8
Confidence Intervals
Objectives
1. Construct and interpret confidence intervals for a population mean
when the population standard deviation is known
2. Find critical values for confidence intervals
3. Describe the relationship between the confidence level and the
margin of error
4. Find the sample size necessary to obtain a confidence interval of a
given width
5. Distinguish between confidence and probability
OBJECTIVE 1
Construct and interpret confidence intervals for a
population mean when the population standard
deviation is known
The sample mean score for the 100 students was $\bar{x} = 67.30$. The administrators want to
estimate what the mean score would be if the entire population of fourth-graders in the
district had enrolled in the program. The best estimate for the population mean is the
sample mean, $\bar{x} = 67.30$. The sample mean is a point estimate, because it is a single
number.
It is very unlikely that $\bar{x} = 67.30$ is exactly equal to the population mean, 𝜇, of all
fourth-graders. Therefore, in order for the estimate to be useful, we must describe how close it
is likely to be. For example, if we think that it could be off by 10 points, we would estimate
𝜇 with the interval 57.30 < 𝜇 < 77.30, which could be written 67.30 ± 10. The
plus-or-minus number is called the margin of error.
Confidence Level
Based on the sample of 100 fourth-graders
using the new approach to teaching reading, a
95% confidence interval for the mean score
was constructed.
Terminology
A point estimate is a single number that is used to estimate the value
of an unknown parameter.
OBJECTIVE 2
Find critical values for confidence intervals
Solution:
From the bottom row of Table A.3, we see
that the critical value for a 90% confidence
interval is 1.645, so the margin of error is
𝑧𝛼 Notation
Sometimes, we may need to find a critical value for a confidence level not given
in Table A.3. To do this, it is useful to learn a notation for a 𝑧-score with a given
area to its right.
𝒛𝜶 Notation
• The notation 𝑧𝛼 refers to the 𝑧-score with an area of 𝛼 to its right.
• The notation 𝑧𝛼/2 refers to the 𝑧-score with an area of 𝛼/2 to its right.
Solution:
The confidence level is 92%, so $1 - \alpha = 0.92$.
It follows that $\alpha = 0.08$, or $\alpha/2 = 0.04$. The
critical value is $z_{0.04}$. Since the area to the
right of $z_{0.04}$ is 0.04, the area to the left is
1 − 0.04 = 0.96.
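The same reasoning, finding the 𝑧-score with area $1 - \alpha/2$ to its left, can be carried out with an inverse normal CDF instead of a table. A minimal sketch with the Python standard library follows; the function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """z-score with area alpha/2 to its right, where alpha = 1 - confidence level."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.92, 0.95):
    print(level, round(z_critical(level), 3))
```

A 𝑧-table gives values rounded to two decimal places, so table-based answers can differ slightly from the computed ones.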
Assumptions
The method described for constructing confidence intervals requires us to assume that the
population standard deviation 𝜎 is known. In practice, 𝜎 is usually not known. We make
this assumption because it allows us to use the familiar normal distribution. We
will learn how to construct confidence intervals when 𝜎 is unknown in the next
section.
Other assumptions for the method described here for constructing confidence
intervals are:
Assumptions:
A machine that fills cereal boxes is supposed to put 20 ounces of cereal in each box. A
simple random sample of 6 boxes is found to contain a sample mean of 20.25 ounces of
cereal. It is known from past experience that the fill weights are normally distributed with
a standard deviation of 0.2 ounce. Construct a 90% confidence interval for the mean fill
weight.
Solution:
We press STAT, highlight the TESTS menu, and select ZInterval.
Select Stats as the input option and enter 0.2 for the 𝜎 field,
20.25 for the $\bar{x}$ field, 6 for the 𝑛 field, and 0.9 for the C-Level
field.
Select Calculate.
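For readers without a TI-84, the same interval can be sketched in Python from the formula $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$. The function name `z_interval` is hypothetical, and the critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence_level):
    """Confidence interval for a mean when the population sd is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Cereal example: x-bar = 20.25, sigma = 0.2, n = 6, 90% confidence
low, high = z_interval(sample_mean=20.25, sigma=0.2, n=6, confidence_level=0.90)
print(f"{low:.2f} < mu < {high:.2f}")
```

Rounded to two decimal places, the endpoints are 20.12 and 20.38.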
OBJECTIVE 3
Describe the relationship between the confidence
level and the margin of error
OBJECTIVE 4
Find the sample size necessary to obtain a
confidence interval of a given width
Sample Size
We can make the margin of error smaller if we are willing to reduce our
level of confidence, but we can also reduce the margin of error by
increasing the sample size.
If we let 𝑚 represent the margin of error, then $m = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$.
Using algebra, we may rewrite this formula as $n = \left(\dfrac{z_{\alpha/2} \cdot \sigma}{m}\right)^2$, which
represents the minimum sample size needed to achieve the desired
margin of error 𝑚.
Example
Scientists want to estimate the mean weight of mice after they have
been fed a special diet. From previous studies, it is known that the
weight is normally distributed with standard deviation 3 grams. How
many mice must be weighed so that a 95% confidence interval will have
a margin of error of 0.5 grams?
Solution:
Since we want a 95% confidence interval, we use $z_{\alpha/2} = 1.96$. We also
know $\sigma = 3$ and $m = 0.5$. Therefore:
$n = \left(\dfrac{z_{\alpha/2} \cdot \sigma}{m}\right)^2 = \left(\dfrac{1.96 \cdot 3}{0.5}\right)^2 = 138.30$; round up to 139
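The sample-size formula is easy to mistype by hand, so a short sketch may help. The function name `sample_size_for_mean` is hypothetical, and the critical value 1.96 is passed in as given in the example rather than computed.

```python
from math import ceil

def sample_size_for_mean(z_crit, sigma, margin):
    """Minimum n so that z_crit * sigma / sqrt(n) does not exceed the margin."""
    return ceil((z_crit * sigma / margin) ** 2)  # always round up

# Mouse-weight example: z = 1.96, sigma = 3 grams, m = 0.5 grams
n_required = sample_size_for_mean(z_crit=1.96, sigma=3, margin=0.5)
print(n_required)
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than requested.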
OBJECTIVE 5
Distinguish between confidence and probability
The term “probability” refers to random events, which can come out differently
when experiments are repeated. The numbers 20.12 and 20.38 are fixed, not
random. The population mean is also fixed, even if we do not know precisely
what value it is. The population mean weight is either between 20.12 and
20.38 or it is not. Therefore we say that we have 90% confidence that the
population mean is in this interval.
On the other hand, let’s say that we are discussing a method used to construct
a 90% confidence interval. The method will succeed in covering the population
mean 90% of the time, and fail the other 10% of the time. Therefore it is
correct to say that a method for constructing a 90% confidence interval has
probability 90% of covering the population mean.
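The statement about the method can be illustrated with a small simulation. The sketch below assumes a hypothetical true mean of 20.2 ounces (the real value is unknown), draws many samples of size 6 with 𝜎 = 0.2, builds a 90% interval from each, and counts how often the interval covers the true mean. All names and the seed are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(mu, sigma, n, confidence_level, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    covered = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - margin < mu < sample_mean + margin:
            covered += 1
    return covered / trials

# mu = 20.2 is an assumed value chosen only for the simulation
rate = coverage_rate(mu=20.2, sigma=0.2, n=6, confidence_level=0.90, trials=2000)
print(rate)
```

Each individual interval either covers the mean or does not; only the long-run proportion is close to 0.90.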
Objectives
1. Describe the properties of the Student’s 𝑡 distribution
2. Construct confidence intervals for a population mean when the
population standard deviation is unknown
OBJECTIVE 1
Describe the properties of the Student’s 𝑡
distribution
Student’s 𝑡 Distribution
When constructing a confidence interval where we know the population standard deviation
𝜎, the confidence interval is $\bar{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$. The critical value is $z_{\alpha/2}$ because the quantity
$\dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution.
It is rare that we would know the value of 𝜎 while needing to estimate the value of 𝜇. In
practice, it is more common that 𝜎 is unknown. When we don’t know the value of 𝜎, we
may replace it with the sample standard deviation 𝑠. However, we cannot then use $z_{\alpha/2}$
as the critical value, because the quantity $\dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ does not have a normal distribution. The
distribution of this quantity is called the Student’s 𝒕 distribution.
Student’s 𝑡 Distribution
Student’s 𝑡 distributions are symmetric and unimodal, just like the normal distribution.
However, they are more spread out. The reason is that 𝑠 is, on the average, a bit smaller
than 𝜎. Also, since 𝑠 is random, whereas 𝜎 is constant, replacing 𝜎 with 𝑠 increases the
spread. When the number of degrees of freedom is small, the tendency to be more spread
out is more pronounced. When the number of degrees of freedom is large, 𝑠 tends to be
close to 𝜎, so the 𝑡 distribution is very close to the normal distribution.
The critical value $t_{\alpha/2}$ can be found in Table A.3, in the row corresponding
to the number of degrees of freedom and the column corresponding to
the desired confidence level, or by technology.
Example
A simple random sample of size 10 is drawn from a normal population. Find
the critical value $t_{\alpha/2}$ for a 95% confidence interval.
Solution:
The sample size is 𝑛 = 10, so the number of degrees of freedom is 𝑛 − 1 = 9.
We consult Table A.3, looking in the row corresponding to 9 degrees of
freedom, and in the column with confidence level 95%. The critical value is
$t_{\alpha/2} = 2.262$.
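As an alternative to the table, the critical value can be approximated numerically. The Python standard library has no Student's 𝑡 distribution, so the sketch below integrates the 𝑡 density with Simpson's rule and finds the critical value by bisection. It is an illustrative approach rather than the method used in the slides, the function names are hypothetical, and it is intended for moderate degrees of freedom (roughly up to 300, beyond which `math.gamma` overflows).

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df):
    """P(T <= t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    if t == 0:
        return 0.5
    steps = 2 * max(50, int(t * 100))  # even number of subintervals
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence_level):
    """Value with area alpha/2 to its right, found by bisection."""
    target = 1 - (1 - confidence_level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # expand until the target is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(df=9, confidence_level=0.95), 3))
```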
Assumptions
The assumptions for constructing a confidence interval for 𝜇 when 𝜎 is unknown
are:
Assumptions:
When the sample size is small (𝑛 ≤ 30), we must check to determine whether the
sample comes from a population that is approximately normal. A simple method
is to draw a dotplot or boxplot of the sample. If there are no outliers, and if the
sample is not strongly skewed, then it is reasonable to assume the population is
approximately normal and it is appropriate to construct a confidence interval using
the Student’s 𝑡 distribution.
OBJECTIVE 2
Construct confidence intervals for a population
mean when the population standard deviation is
unknown
(Tables)
Solution:
We check the assumptions. We have a simple random sample. Because the sample
size is small, the population must be approximately normal. A dotplot indicates that there
is no evidence of strong skewness and no outliers; therefore, we may proceed.
Now, we find the sample mean and sample standard deviation. We have $\bar{x} = 115.375$
and 𝑠 = 2.8253.
The margin of error is: $t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 2.998 \cdot \dfrac{2.8253}{\sqrt{8}} = 2.9947$.
The 98% confidence interval is: $\bar{x} \pm t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 115.375 \pm 2.9947$, or $112.4 < \mu < 118.4$.
We are 98% confident that the mean number of calories per cookie is between 112.4 and
118.4.
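The arithmetic in this solution can be checked with a short sketch. The critical value 2.998 (7 degrees of freedom, 98% confidence) is taken from the solution rather than computed, and the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for a mean using a supplied t critical value."""
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# Cookie example: x-bar = 115.375, s = 2.8253, n = 8, t = 2.998
low, high, margin = t_interval(sample_mean=115.375, s=2.8253, n=8, t_crit=2.998)
print(f"margin = {margin:.4f}, interval = ({low:.1f}, {high:.1f})")
```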
Solution:
We have a simple random sample and the sample size is large. We may proceed. Note
that $\bar{x} = 8.20$ and 𝑠 = 9.84.
The 95% confidence interval is: $\bar{x} \pm t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} = 8.20 \pm 1.7603$, or $6.44 < \mu < 9.96$.
We are 95% confident that the mean number of hours per week spent on the Internet by
people 18–22 years old is between 6.44 and 9.96.
OBJECTIVE 2
Construct confidence intervals for a population
mean when the population standard deviation is
unknown
(TI-84 PLUS)
We enter the data into list L1 in the data editor. Then we press STAT,
highlight the TESTS menu, and select TInterval.
Select Data as the input method and enter L1 in the List field, 1 in the
Freq field, and 0.98 for the C-Level field. Select Calculate.
Solution:
We press STAT, highlight the TESTS menu, and select TInterval.
Select Stats as the input method and enter 8.2 for the $\bar{x}$ field, 9.84
for the 𝑠 field, 123 for the 𝑛 field, and 0.95 for the C-Level field.
Select Calculate.
We are 95% confident that the mean number of hours per week spent
on the Internet by people 18 – 22 years old is between 6.44 and 9.96.
Objectives
1. Construct a confidence interval for a population proportion
2. Find the sample size necessary to obtain a confidence interval of a
given width
3. Describe a method for constructing confidence intervals with small
samples
OBJECTIVE 1
Construct a confidence interval for a population
proportion
(Tables)
Guitar Hero
The music organization Little Kids Rock
surveyed 517 music teachers, and 403 of them
said that video games like Guitar Hero and
Rock Band, in which players try to play music in
time with a video image, have a positive effect
on music education.
Notation
We use the following notation:
• 𝑝 is the population proportion of individuals who are in a specified
category.
• 𝑥 is the number of individuals in the sample who are in the
specified category.
• 𝑛 is the sample size.
• $\hat{p}$ is the sample proportion of individuals who are in the specified
category. $\hat{p}$ is defined as $\hat{p} = \dfrac{x}{n}$.
Confidence Interval
To construct a confidence interval, we need a point estimate. The point estimate for the
population proportion 𝑝 is:
Point estimate = $\hat{p} = \dfrac{x}{n}$
We also need the standard error of $\hat{p}$. By the Central Limit Theorem for Proportions, we
have:
Standard error of $\hat{p}$ = $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
To compute the margin of error, we multiply the standard error by the critical value:
Margin of error = $z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Assumptions
The method for constructing a confidence interval for a population proportion
requires that the sampling distribution be approximately normal. The following
assumptions ensure this:
Assumptions:
Solution:
We begin by checking the assumptions. We have a simple
random sample. It is reasonable to believe that the
population of music teachers is at least 20 times as large as
the sample. The items in the population can be divided into
two categories. There are 403 teachers who believe that
the games have a positive effect, and 517 − 403 = 114 who
do not, so there are 10 or more items in each category. The
assumptions are met.
Note that 𝑛 = 517 and 𝑥 = 403, so $\hat{p} = \dfrac{x}{n} = \dfrac{403}{517} = 0.779497$.
We are 95% confident that the proportion of music teachers who believe that the video
games have a positive effect is between 0.744 and 0.815.
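The interval quoted here follows from the point estimate, standard error, and margin of error defined earlier in this section. A minimal sketch of that calculation is given below; the function name `proportion_interval` is hypothetical, and the critical value is computed from the normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence_level):
    """Confidence interval for a population proportion from x successes in n trials."""
    p_hat = x / n                                        # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # critical value * standard error
    return p_hat - margin, p_hat + margin

# Survey example: 403 of 517 music teachers, 95% confidence
low, high = proportion_interval(x=403, n=517, confidence_level=0.95)
print(f"{low:.3f} < p < {high:.3f}")
```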
OBJECTIVE 1
Construct a confidence interval for a population
proportion
(TI-84 PLUS)
Guitar Hero
The music organization Little Kids Rock
surveyed 517 music teachers, and 403 of them
said that video games like Guitar Hero and
Rock Band, in which players try to play
music in time with a video image, have a
positive effect on music education.
Notation
We use the following notation:
• 𝑝 is the population proportion of individuals who are in a specified
category.
• 𝑥 is the number of individuals in the sample who are in the
specified category.
• 𝑛 is the sample size.
• 𝑝 is the sample proportion of individuals who are in the specified
𝑥
category. 𝑝 is defined as 𝑝 = 𝑛 .
Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Confidence Interval
To construct a confidence interval, we need a point estimate. The point estimate for the
population proportion 𝑝 is:
Point estimate = $\hat{p} = \frac{x}{n}$
We also need the standard error of $\hat{p}$. By the Central Limit Theorem for Proportions, we
have:
Standard error of $\hat{p}$ = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
To compute the margin of error, we multiply the standard error by the critical value:
Margin of error = $z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
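The point estimate, standard error, and margin of error can be checked numerically. The following is a minimal Python sketch, not the method used on the slides (which rely on tables or the TI-84 PLUS); the function name `proportion_confidence_interval` is hypothetical, and the critical value is taken from the standard normal distribution in Python's `statistics` module. The inputs are the Guitar Hero survey counts.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (p_hat, margin, lower, upper) for a population proportion.

    Hypothetical helper: assumes the normal approximation is valid.
    """
    p_hat = x / n                                  # point estimate
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_crit * std_error                    # margin of error
    return p_hat, margin, p_hat - margin, p_hat + margin


p_hat, margin, lower, upper = proportion_confidence_interval(403, 517)
print(f"p_hat = {p_hat:.6f}, margin of error = {margin:.4f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimal places, the endpoints should agree with the interval reported for the music teacher survey.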
Assumptions
The method for constructing a confidence interval for a population proportion
requires that the sampling distribution be approximately normal. The following
assumptions ensure this:
Assumptions:
Solution:
We begin by checking the assumptions. We have a simple
random sample. It is reasonable to believe that the
population of music teachers is at least 20 times as large as
the sample. The items in the population can be divided into
two categories. There are 403 teachers who believe that
the games have a positive effect, and 517 − 403 = 114 who
do not, so there are 10 or more items in each category. The
assumptions are met.
Select Calculate.
We are 95% confident that the proportion of music teachers who believe that the video
games have a positive effect is between 0.744 and 0.815.
OBJECTIVE 2
Find the sample size necessary to obtain a
confidence interval of a given width
Let 𝑚 represent the margin of error: $m = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. By rewriting this formula and
solving for 𝑛, we have $n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{m}\right)^2$. This is the minimum sample size needed to
attain a margin of error of size 𝑚. If the value of 𝑛 is not a whole number, round up to the
nearest whole number.
In order to use this formula, we need a value for 𝑚 and $\hat{p}$. We can set the value of 𝑚, but
we don’t know ahead of time what $\hat{p}$ is going to be.
Solution:
From the bottom row of Table A.3, or by
technology, we see that the critical value for
a 95% confidence interval is 1.96.
We compute $\hat{p} = \frac{x}{n} = \frac{403}{517} = 0.779497$. The desired margin of error is 𝑚 = 0.03.
The necessary sample size is
$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{m}\right)^2 = (0.779497)(1 - 0.779497)\left(\frac{1.96}{0.03}\right)^2 = 733.67$
We round up to 734.
Solution:
From the bottom row of Table A.3, or by
technology, we see that the critical value for
a 95% confidence interval is 1.96.
Since we have no estimate of $\hat{p}$, use the formula $n = 0.25\left(\frac{z_{\alpha/2}}{m}\right)^2$. The desired
margin of error is 𝑚 = 0.03. The necessary sample size is
$n = 0.25\left(\frac{z_{\alpha/2}}{m}\right)^2 = 0.25\left(\frac{1.96}{0.03}\right)^2 = 1067.1$
We round up to 1068.
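Both sample size calculations above can be reproduced with a short Python sketch. The helper name `required_sample_size` is hypothetical; it uses the critical value 1.96 quoted in the solutions and falls back to the factor 0.25 when no estimate of the sample proportion is supplied.

```python
import math

Z_95 = 1.96  # critical value for 95% confidence, as quoted in the solutions


def required_sample_size(margin, z_crit=Z_95, p_hat=None):
    """Minimum n for a given margin of error (hypothetical helper).

    With an estimate p_hat, use p_hat * (1 - p_hat); without one, use 0.25.
    """
    p_term = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Round up to the nearest whole number, as the text instructs.
    return math.ceil(p_term * (z_crit / margin) ** 2)


n_with_estimate = required_sample_size(0.03, p_hat=403 / 517)
n_without_estimate = required_sample_size(0.03)
print(n_with_estimate, n_without_estimate)
```

The factor 0.25 is the largest possible value of the product p̂(1 − p̂), so the second result is never smaller than the first.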
OBJECTIVE 3
Describe a method for constructing confidence
intervals with small samples
Solution:
We press STAT and highlight the TESTS menu and
select 1-PropZInt.
Select Calculate.
Objectives
1. Determine which method to use when constructing a confidence
interval
OBJECTIVE 1
Determine which method to use when
constructing a confidence interval
Chapter 9
Hypothesis Testing
BASIC PRINCIPLES OF
HYPOTHESIS TESTING
Section 9.1
Objectives
1. Define the null and alternate hypotheses
2. State conclusions to hypothesis tests
3. Distinguish between Type I and Type II errors
OBJECTIVE 1
Define the null and alternate hypotheses
Solution:
The null hypothesis says that there is no difference, so 𝐻0 : 𝜇 = 20.
The inspector thinks that the mean weight may be less than 20, so
𝐻1: 𝜇 < 20.
Solution:
The null hypothesis says that there is no change, so the null hypothesis
is 𝐻0 : 𝜇 = 800. The real estate agent wants to know whether the mean
is higher, so the alternate hypothesis is 𝐻1 : 𝜇 > 800.
Solution:
The null hypothesis says that there is no change, so the null hypothesis
is 𝐻0 : 𝜇 = 70. The educator wants to know whether the mean has
changed, without specifying whether it has increased or decreased.
Therefore, the alternate hypothesis is 𝐻1 : 𝜇 ≠ 70.
The idea behind a hypothesis test is similar to a criminal trial. At
the beginning of a trial, the defendant is assumed to be innocent.
Then the evidence is presented. If the evidence strongly indicates
that the defendant is guilty, we abandon the assumption of
innocence and conclude the defendant is guilty. In a hypothesis
test, the null hypothesis is like the defendant in a criminal trial.
Then we look at the evidence, which comes from data that have
been collected.
If the data strongly indicate that the null hypothesis is false, we
abandon our assumption that it is true and believe the alternate
hypothesis instead. This is referred to as rejecting the null
hypothesis.
Diagram: Assume the Null Hypothesis is True, then Decide Whether to Reject the Null Hypothesis
OBJECTIVE 2
State conclusions to hypothesis tests
Stating Conclusions
We may either reject the null hypothesis or fail to reject the null
hypothesis.
• Fail to reject 𝐻0: the null hypothesis “might” be true.
• Reject 𝐻0: accept the alternate hypothesis 𝐻1.
If the null hypothesis is not rejected, we are saying that there is not
enough evidence to conclude that the alternate hypothesis, 𝐻1, is true.
We are not saying the null hypothesis is true. What we are saying is
that the null hypothesis might be true.
Solution:
Because the null hypothesis is rejected, we conclude that the alternate
hypothesis is true. We conclude that the mean weight of cereal is less
than 20 ounces.
Solution:
Because the null hypothesis is not rejected, we do not have sufficient
evidence to conclude that the alternate hypothesis is true. In words, we
state: “There is not enough evidence to conclude that the mean weight
of cereal boxes is less than 20 ounces.”
OBJECTIVE 3
Distinguish between Type I and Type II errors
There are two ways in which a wrong decision may occur with hypothesis
testing.
1. If 𝐻0 is true, we might mistakenly reject it. A type I error occurs when
we reject 𝐻0 when it is actually true.
2. If 𝐻0 is false, we might mistakenly decide not to reject it. A type II error
occurs when we do not reject 𝐻0 when it is actually false.
Solution:
The true mean is 𝜇 = $50,000, so 𝐻0 is true. Because the dean rejects
𝐻0, this is a Type I error.
Solution:
The true mean is 𝜇 = $55,000, so 𝐻0 is false. Because the dean
rejects 𝐻0, this is a correct decision.
Solution:
The true mean is 𝜇 = $55,000, so 𝐻0 is false. Because the dean does
not reject 𝐻0, this is a Type II error.
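The three solutions above follow one rule, which depends only on whether 𝐻0 is actually true and whether it was rejected. The following Python sketch encodes that rule; the function name `classify_decision` is hypothetical and is used here only for illustration.

```python
def classify_decision(h0_is_true, h0_rejected):
    """Classify the outcome of a hypothesis test (hypothetical helper)."""
    if h0_is_true and h0_rejected:
        return "Type I error"       # rejected a true null hypothesis
    if not h0_is_true and not h0_rejected:
        return "Type II error"      # failed to reject a false null hypothesis
    return "correct decision"


# The three scenarios from the solutions above.
print(classify_decision(h0_is_true=True, h0_rejected=True))
print(classify_decision(h0_is_true=False, h0_rejected=True))
print(classify_decision(h0_is_true=False, h0_rejected=False))
```

In practice the truth of 𝐻0 is unknown, so this classification is a way of reasoning about possible outcomes rather than something that can be computed from data.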
Objectives
1. Perform hypothesis tests with the critical value method
2. Perform hypothesis tests with the P-value method
3. Describe the relationship between hypothesis tests and confidence
intervals
4. Describe the relationship between 𝛼 and the probability of error
5. Report the P-value or the test statistic value
6. Distinguish between statistical significance and practical
significance
OBJECTIVE 1
Perform hypothesis tests with the critical value
method
There are two ways to perform hypothesis tests; both methods produce the
same results. The methods are the Critical Value Method and the P-
Value Method.
Example
We begin with an example. The College
Board reported that the mean math SAT
score in 2009 was 515, with a standard
deviation of 116. Results of an earlier study
suggest that coached students should have
a mean SAT score of approximately 530.
A teacher who runs an online coaching
program thinks that students coached by his
method have a higher mean score than this.
Because the teacher believes that the mean score for his students is
greater than 530, the null and alternate hypotheses are:
𝐻0: 𝜇 = 530
𝐻1: 𝜇 > 530
Test Statistic
Suppose now that the teacher draws a random sample
of 100 students who are planning to take the SAT, and
enrolls them in the online coaching program. After
completing the program, their sample mean SAT score
is $\bar{x} = 562$. This is higher than the null hypothesis value
of 530, but to determine how strong the disagreement is between the sample
mean and the null hypothesis 𝜇 = 530, we calculate the value of the test
statistic, which is just the 𝑧-score of the sample mean. Assuming that the
population standard deviation is 𝜎 = 116, the test statistic of the sample mean
$\bar{x}$ is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{562 - 530}{116/\sqrt{100}} = 2.76$.
Remember: 𝐻0 : 𝜇 = 530, 𝐻1 : 𝜇 > 530, 𝜎 = 116
Significance Level
One of the most commonly used values for 𝛼 is 0.05. When 𝛼 = 0.05 is used in a
one-tailed test, the critical value is 1.645.
Recall for the math SAT score example, that the test statistic was 𝑧 = 2.76.
Since 𝑧 = 2.76 falls in the critical region, 𝐻0 is rejected at the 𝛼 = 0.05 level.
Remember: 𝐻0 : 𝜇 = 530, 𝐻1 : 𝜇 > 530, $\bar{x} = 562$, 𝜎 = 116, 𝑛 = 100, 𝑧 = 2.76
We conclude that the mean SAT math score for students completing the online
coaching program is greater than 530.
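The critical value method for this example can be sketched in Python as follows. This is an illustration under the same assumptions as the slides (𝜎 known, right-tailed test, 𝛼 = 0.05); the helper name `z_statistic` is hypothetical, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist


def z_statistic(x_bar, mu_0, sigma, n):
    """z-score of the sample mean under H0 (hypothetical helper)."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))


alpha = 0.05
z = z_statistic(x_bar=562, mu_0=530, sigma=116, n=100)
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value

# Right-tailed test: reject H0 when z falls in the critical region.
reject = z >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```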
The assumptions for performing a hypothesis test about 𝜇 when 𝜎 is known are:
Assumptions:
Solution:
We first check the assumptions. The sample is a simple random sample and the
sample size is large (𝑛 > 30). The assumptions are met.
OBJECTIVE 2
Perform hypothesis tests with the P-value
method
The P-Value
The P-value is the probability that a number drawn from the distribution
of the sample mean would be as extreme as or more extreme than our
observed value of $\bar{x}$.
Unlike the critical value, the P-value tells us exactly how unusual the test
statistic is. For this reason, the P-value method is more often used in
practice, especially when technology is used to conduct a hypothesis
test.
The smaller the P-value, the stronger the evidence against 𝐻0.
Finding P-Values
Consider again the following example. An online coaching program is supposed to increase
the mean SAT math score to a value greater than 530. The null and alternate hypotheses are
𝐻0 : 𝜇 = 530 and 𝐻1 : 𝜇 > 530. Now assume that 100 students are randomly chosen to participate
in the program, and their sample mean score is $\bar{x} = 562$. Suppose that the population
standard deviation is known to be 𝜎 = 116. Does this provide strong evidence against the null
hypothesis 𝐻0 : 𝜇 = 530?
Recall that we begin by assuming that 𝐻0 is true, therefore we assume that the mean of $\bar{x}$ is 𝜇
= 530. Since the sample size is large, we know that $\bar{x}$ is approximately normally distributed
with standard deviation $\frac{\sigma}{\sqrt{n}} = \frac{116}{\sqrt{100}} = 11.6$.
The 𝑧-score for $\bar{x}$ is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{562 - 530}{116/\sqrt{100}} = 2.76$.
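For a right-tailed test, the P-value is the area under the standard normal curve to the right of the observed 𝑧-score. The following Python sketch computes it for this example; the variable names are illustrative, and the result uses the unrounded 𝑧-score, so it may differ slightly from a value read from a table at 𝑧 = 2.76.

```python
import math
from statistics import NormalDist

x_bar, mu_0, sigma, n = 562, 530, 116, 100

std_error = sigma / math.sqrt(n)       # standard deviation of the sample mean
z = (x_bar - mu_0) / std_error         # z-score of the sample mean under H0
p_value = 1 - NormalDist().cdf(z)      # area to the right of z

print(f"standard error = {std_error:.1f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

A P-value this small indicates that a sample mean of 562 would be very unusual if 𝐻0 were true.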
Example:
Suppose we found a P-value that was P = 0.0122.
a) Do you reject 𝐻0 at 𝛼 = 0.05 level?
b) Do you reject 𝐻0 at 𝛼 = 0.01 level?
Solution:
a) Because P ≤ 0.05, we reject 𝐻0 at the 𝛼 = 0.05 level.
b) Because P > 0.01, we do not reject 𝐻0 at the 𝛼 = 0.01 level.
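The decision rule used in this solution is a single comparison, sketched below in Python. The function name `reject_h0` is hypothetical.

```python
def reject_h0(p_value, alpha):
    """Reject H0 when the P-value is at most the significance level."""
    return p_value <= alpha


p_value = 0.0122
print(reject_h0(p_value, alpha=0.05))  # part (a)
print(reject_h0(p_value, alpha=0.01))  # part (b)
```

The same P-value can therefore lead to different decisions at different significance levels.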
Solution:
We first check the assumptions. We have a simple random sample and the
sample size is large (𝑛 > 30). The assumptions are satisfied.
Using Table A.2 or technology, we find the area to the right of 𝑧 = 0.67 to be 0.2514.
The P-value of 0.2514 is not unusual. Since P > 0.05, we do not reject 𝐻0 at the 𝛼 = 0.05
level.
Solution:
We first check the assumptions. We have a simple random sample and the sample
size is large (𝑛 > 30). The assumptions are satisfied.
Using Table A.2 or technology, we find the sum of these areas to be 0.0250. Since P ≤ 0.05,
we reject 𝐻0 at the 𝛼 = 0.05 level.
Solution:
We first check the assumptions. We have a simple random sample, the sample
size is large (𝑛 > 30), and the population standard deviation is known. The
assumptions are satisfied.
Select Stats as the input option and enter 74 as the null hypothesis
mean 𝜇0, 8 for the standard deviation 𝜎, 76 for the sample
mean $\bar{x}$, and 80 for the sample size 𝑛. Since we have a two-tailed
test, select the ≠ 𝝁𝟎 option.
Select Calculate.
We conclude that the mean score among employees has changed since the adoption of
telecommuting.
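The calculation behind the calculator steps above can be approximated in Python. This sketch is not the TI-84 PLUS procedure itself; it applies the two-tailed 𝑧-test formula to the same four inputs, and the function name `z_test_two_tailed` is hypothetical. The significance level 0.05 is an assumption, since it is not stated on this slide.

```python
import math
from statistics import NormalDist


def z_test_two_tailed(x_bar, mu_0, sigma, n):
    """Return (z, P-value) for a two-tailed z-test (hypothetical helper)."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # Two-tailed: add the areas in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = z_test_two_tailed(x_bar=76, mu_0=74, sigma=8, n=80)
alpha = 0.05  # assumed significance level
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

Because the sketch keeps the unrounded 𝑧-score, its P-value can differ in the fourth decimal place from one obtained by rounding 𝑧 before using a table.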
OBJECTIVE 3
Describe the relationship between hypothesis
tests and confidence intervals
We note that this interval does not contain the null hypothesis value of
74. The confidence interval for 𝜇 contains all the plausible values for 𝜇 .
Because 74 is not in the confidence interval, 74 is not a plausible value
for 𝜇.
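The link between the interval and the test can be illustrated with a short Python sketch. The interval itself is not reproduced in the text here, so the sketch assumes a 95% confidence level and the inputs from the preceding calculator example (sample mean 76, 𝜎 = 8, 𝑛 = 80); the function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known (hypothetical helper)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Assumed inputs: taken from the preceding two-tailed test example.
lower, upper = mean_confidence_interval(x_bar=76, sigma=8, n=80)
mu_0 = 74
mu_0_is_plausible = lower <= mu_0 <= upper
print(f"95% CI: ({lower:.2f}, {upper:.2f}), contains 74: {mu_0_is_plausible}")
```

Under these assumptions the null value lies outside the interval, which matches rejecting 𝐻0 in a two-tailed test at the 𝛼 = 0.05 level.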
OBJECTIVE 4
Describe the relationship between 𝛼 and the
probability of error
OBJECTIVE 5
Report the P-value or the test statistic value
This will tell the reader whether the value of the test statistic was just
barely inside the critical region, or well inside. Also, it provides the
reader an opportunity to choose a different critical value and determine
whether 𝐻0 can be rejected at a different level.
OBJECTIVE 6
Distinguish between statistical significance and
practical significance
For example, a new study program may raise students’ scores by two
points on a 100-point scale. This improvement may have statistical
significance, but is the improvement important enough to offset the cost
of training teachers and the time investment on behalf of the students?
Objectives
1. Test a hypothesis about a mean using the P-value method
2. Test a hypothesis about a mean using the critical value method
OBJECTIVE 1
Test a hypothesis about a mean using the P-
value method
(Tables)
If we knew the population standard deviation 𝜎, we would be able to compute the 𝑧-score
of the sample mean to be $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, and use this test statistic to perform a hypothesis
test. In this example, as is usually the case, we do not know the population standard
deviation. To proceed, we replace 𝜎 with the sample standard deviation 𝑠, and use the 𝑡
test statistic instead: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. When the null hypothesis is true, the 𝑡 statistic has a
Student’s 𝑡 distribution with 𝑛 − 1 degrees of freedom.
The assumptions for performing a hypothesis test for 𝜇 when the population standard
deviation 𝜎 is unknown are as follows:
1. We have a simple random sample.
2. The sample size is large (𝑛 > 30), or the population is approximately normal.
Since we have a simple random sample and the sample size is large, we may proceed with
the test. The issue is whether the mean weight loss 𝜇 is greater than 0. So the null and
alternate hypotheses are 𝐻0 : 𝜇 = 0 versus 𝐻1 : 𝜇 > 0.
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.2 - 0}{6.1/\sqrt{76}} = 3.144$. When 𝐻0
is true, the test statistic 𝑡 has the Student’s 𝑡 distribution
with 𝑛 − 1 = 76 − 1 = 75 degrees of freedom. This is a
right tail test, so the P-value is the area under the Student’s
𝑡 curve to the right of 𝑡 = 3.144. Using technology, we find
the exact P-value to be P = 0.0012.
Since P < 0.05, we reject 𝐻0 at the 𝛼 = 0.05 level. We conclude that the mean weight loss
of people who adhered to this diet for 12 months is greater than 0.
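The 𝑡 statistic and its right-tail P-value can be approximated with the Python standard library alone. The sketch below is one possible approach, not the technology used on the slides: it integrates the Student’s 𝑡 density numerically with Simpson’s rule, and the helper names `t_pdf` and `t_right_tail` are hypothetical. It assumes a nonnegative test statistic.

```python
import math


def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_right_tail(t, df, steps=2000):
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the total area lies to the right of 0.
    return 0.5 - total * h / 3


x_bar, mu_0, s, n = 2.2, 0, 6.1, 76
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = t_right_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.3f}, df = {n - 1}, P-value = {p_value:.4f}")
```

A statistics library with a built-in 𝑡 distribution would normally be preferred; the numerical integration here only keeps the example self-contained.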
Computing P-Values
The P-value of the test statistic 𝑡 is the probability, assuming 𝐻0 is true, of
observing a value for the test statistic that disagrees as strongly as or more
strongly with 𝐻0 than the value actually observed. The P-value is an area
under the Student’s 𝑡 curve with 𝑛 − 1 degrees of freedom. The area is in the
left tail, the right tail, or in both tails, depending on the type of alternate
hypothesis.
The null and alternate hypotheses are: 𝐻0 : 𝜇 = 3.5 versus 𝐻1 : 𝜇 ≠ 3.5. We compute $\bar{x}$
and 𝑠 from the sample. The values are $\bar{x} = 2.9429$ and 𝑠 = 0.4995.
There is not enough evidence to conclude that the mean amount of drug absorbed differs
from 3.5 micrograms. The mean may be equal to 3.5 micrograms.
OBJECTIVE 1
Test a hypothesis about a mean using the P-
value method
(TI-84 PLUS)
If we knew the population standard deviation 𝜎, we would be able to compute the 𝑧-score
of the sample mean to be $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, and use this test statistic to perform a hypothesis
test. In this example, as is usually the case, we do not know the population standard
deviation. To proceed, we replace 𝜎 with the sample standard deviation 𝑠, and use the 𝑡
test statistic instead: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. When the null hypothesis is true, the 𝑡 statistic has a
Student’s 𝑡 distribution with 𝑛 − 1 degrees of freedom.
The assumptions for performing a hypothesis test for 𝜇 when the population standard
deviation 𝜎 is unknown are as follows:
1. We have a simple random sample.
2. The sample size is large (𝑛 > 30), or the population is approximately normal.
Since we have a simple random sample and the sample size is large, we may proceed with
the test. The issue is whether the mean weight loss 𝜇 is greater than 0. So the null and
alternate hypotheses are 𝐻0 : 𝜇 = 0 versus 𝐻1 : 𝜇 > 0.
The test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.2 - 0}{6.1/\sqrt{76}} = 3.144$. When 𝐻0
is true, the test statistic 𝑡 has the Student’s 𝑡 distribution
with 𝑛 − 1 = 76 − 1 = 75 degrees of freedom. This is a
right tail test, so the P-value is the area under the Student’s
𝑡 curve to the right of 𝑡 = 3.144. Using technology, we find
the exact P-value to be P = 0.0012.
Since P < 0.05, we reject 𝐻0 at the 𝛼 = 0.05 level. We conclude that the mean weight loss
of people who adhered to this diet for 12 months is greater than 0.
Solution:
• We press STAT and highlight the TESTS menu and select T-Test.
• Select Stats as the input option and enter 0 as the null hypothesis
mean 𝜇0, 2.2 for the sample mean $\bar{x}$, 6.1 for the sample standard
deviation 𝑠, and 76 for the sample size 𝑛. Since we have a right-tailed
test, select the > 𝝁𝟎 option.
• Select Calculate.
The P-value is 0.0012. Since P < 0.05, we reject 𝐻0 . We conclude that the mean weight
loss of people who adhered to this diet for 12 months is greater than 0.
Solution:
We first check the assumptions. Because the sample is small, the population must be
approximately normal. We check this with a dotplot of the data. There is no evidence of
strong skewness, and no outliers. We may proceed.
• Press STAT and highlight the TESTS menu and select T-Test.
• Select Data as the input option and enter 3.5 in the 𝝁𝟎 field.
Enter L1 as the List option and 1 as the Freq option.
Since we have a two-tailed test, select the ≠ 𝝁𝟎 option.
• Select Calculate.
OBJECTIVE 2
Test a hypothesis about a mean using the critical
value method
Example
A computer software vendor claims that a new version of their
operating system will crash less than six times per year on average. A
system administrator installs the operating system on a random sample
of 41 computers. At the end of a year, the sample mean number of
crashes is 7.1, with a standard deviation of 3.6. Can you conclude that
the vendor’s claim is false? Use the 𝛼 = 0.05 significance level.
Solution:
We first check the assumptions. We have a large (𝑛 > 30) random
sample, so the assumptions are satisfied.
Solution
We use a significance level of 𝛼 = 0.05 and Table A.3. The number of
degrees of freedom is 41 − 1 = 40. Since this is a right-tailed test, the
critical value is the 𝑡-value with area 0.05 above it in the right tail. Thus,
the critical value is 𝑡𝛼 = 1.684.
Solution
Because this is a right-tailed test, we reject 𝐻0 if t ≥ 𝑡𝛼 . Since 𝑡 = 1.957
and 𝑡𝛼 = 1.684, we reject 𝐻0.
We conclude that the mean number of crashes is greater than six per
year.
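The test statistic quoted in this solution can be recomputed from the numbers in the example. In the Python sketch below, the critical value 1.684 is copied from the text (Table A.3, 40 degrees of freedom, 𝛼 = 0.05) rather than computed, and the null value 6 is taken from the conclusion above; the variable names are illustrative.

```python
import math

# Values from the operating system crash example.
x_bar, mu_0, s, n = 7.1, 6, 3.6, 41
t_crit = 1.684  # critical value quoted in the text, not computed here

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Right-tailed test: reject H0 when t is at least the critical value.
reject = t_stat >= t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject}")
```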
Objectives
1. Test a hypothesis about a proportion using the P-value method
2. Test a hypothesis about a proportion using the critical value
method
OBJECTIVE 1
Test a hypothesis about a proportion using the P-
value method
Introduction
In a recent GenX2Z American College Student Survey, 90% of female
college students rated the social network site Facebook as “cool.” The
other 10% rated it as “lame.” Assume that the survey was based on a
sample of 500 students. A marketing executive at Facebook wants to
advertise the site with the slogan “More than 85% of female college
students think Facebook is cool.” Before launching the ad campaign,
he wants to be confident that the slogan is true. Can he conclude that
the proportion of female college students who think Facebook is cool is
greater than 0.85?
Notation
We use the following notation:
• 𝑝 is the population proportion of individuals who are in a specified
category.
• 𝑥 is the number of individuals in the sample who are in the
specified category.
• 𝑛 is the sample size.
• $\hat{p}$ is the sample proportion of individuals who are in the specified
category. $\hat{p}$ is defined as $\hat{p} = \frac{x}{n}$.
Assumptions
The method for performing a hypothesis test about a population proportion
requires that the sampling distribution be approximately normal. The following
assumptions ensure this:
Assumptions:
P-Value Method
Step 1: State the null and alternate hypotheses.
Step 2: If making a decision, choose a significance level 𝛼.
Step 3: Compute the test statistic $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$.
Step 4: Compute the P-value of the test statistic.
Step 5: Interpret the P-value. If making a decision, reject 𝐻0 if the P-value is less
than or equal to the significance level 𝛼.
Step 6: State a conclusion.
Example
In a recent GenX2Z American College Student Survey, 90% of female college students
rated the social network site Facebook as “cool.” Assume that the survey was based on
a random sample of 500 students. A marketing executive at Facebook wants to advertise
the site with the slogan “More than 85% of female college students think Facebook is
cool.” Can you conclude that the proportion of female college students who think
Facebook is cool is greater than 0.85? Use the 𝛼 = 0.05 level of significance.
Solution:
We have a simple random sample of students. The members of the population fall into
two categories: those who think that Facebook is cool and those who don’t. The size of
the population of female college students is more than 20 times the sample size of
𝑛 = 500. The proportion specified by the null hypothesis is 𝑝0 = 0.85. Now 𝑛𝑝0 =
(500)(0.85) = 425 > 10 and 𝑛(1 − 𝑝0) = (500)(1 − 0.85) = 75 > 10. The assumptions are
satisfied.
The null and alternate hypotheses are: 𝐻0 : 𝑝 = 0.85 versus 𝐻1 : 𝑝 > 0.85
Example
Solution (continued):
The sample proportion $\hat{p}$ is 0.90. The value of 𝑝 specified
by the null hypothesis is 𝑝0 = 0.85. The test statistic is the
𝑧-score of $\hat{p}$ and is given by:
$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.90 - 0.85}{\sqrt{\frac{0.85(1 - 0.85)}{500}}} = 3.13$
Remember: 𝐻0 : 𝑝 = 0.85, 𝐻1 : 𝑝 > 0.85, 𝑛 = 500, $\hat{p} = 0.90$
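The test statistic for this example, and the corresponding right-tail P-value, can be computed with the following Python sketch. The function name `one_proportion_z_test` is hypothetical, and the P-value comes from the standard normal distribution in Python's `statistics` module rather than from a table.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(p_hat, p_0, n):
    """Right-tailed z-test for a proportion (hypothetical helper)."""
    # The standard error uses the null value p_0, not the sample proportion.
    std_error = math.sqrt(p_0 * (1 - p_0) / n)
    z = (p_hat - p_0) / std_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


alpha = 0.05
z, p_value = one_proportion_z_test(p_hat=0.90, p_0=0.85, n=500)
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

Note that the standard error in a hypothesis test is built from 𝑝0, whereas the confidence interval formula uses the sample proportion.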
OBJECTIVE 1
Test a hypothesis about a proportion using the P-value method
(TI-84 PLUS)
Introduction
In a recent GenX2Z American College Student Survey, 90% of female
college students rated the social network site Facebook as “cool.” The
other 10% rated it as “lame.” Assume that the survey was based on a
sample of 500 students. A marketing executive at Facebook wants to
advertise the site with the slogan “More than 85% of female college
students think Facebook is cool.” Before launching the ad campaign,
he wants to be confident that the slogan is true. Can he conclude that
the proportion of female college students who think Facebook is cool is
greater than 0.85?
Notation
We use the following notation:
• 𝑝 is the population proportion of individuals who are in a specified
category.
• 𝑥 is the number of individuals in the sample who are in the
specified category.
• 𝑛 is the sample size.
• $\hat{p}$ is the sample proportion of individuals who are in the specified
category. $\hat{p}$ is defined as $\hat{p} = \dfrac{x}{n}$.
Assumptions
The method for performing a hypothesis test about a population proportion
requires that the sampling distribution be approximately normal. The following
assumptions ensure this:
Assumptions:
Example
In a recent GenX2Z American College Student Survey, 90% of female college students
rated the social network site Facebook as “cool.” Assume that the survey was based on
a random sample of 500 students. A marketing executive at Facebook wants to advertise
the site with the slogan “More than 85% of female college students think Facebook is
cool.” Can you conclude that the proportion of female college students who think
Facebook is cool is greater than 0.85? Use the 𝛼 = 0.05 level of significance.
Solution:
We have a simple random sample of students. The members of the population fall into
two categories: those who think that Facebook is cool and those who don’t. The size of
the population of female college students is more than 20 times the sample size of
𝑛 = 500. The proportion specified by the null hypothesis is 𝑝0 = 0.85. Now 𝑛𝑝0 =
(500)(0.85) = 425 > 10 and 𝑛(1 − 𝑝0) = (500)(1 − 0.85) = 75 > 10. The assumptions are
satisfied.
The null and alternate hypotheses are: 𝐻0 : 𝑝 = 0.85 versus 𝐻1 : 𝑝 > 0.85
Select Calculate.
OBJECTIVE 2
Test a hypothesis about a proportion using the
critical value method
Example
A nationwide survey of working adults indicates that only 50% of them are
satisfied with their jobs. The president of a large company believes that more
than 50% of employees at his company are satisfied with their jobs. To test his
belief, he surveys a random sample of 100 employees, and 54 of them report
that they are satisfied with their jobs. Can he conclude that more than 50% of
employees at the company are satisfied with their jobs? Use the 𝛼 = 0.05 level
of significance.
Solution:
We have a simple random sample from the population of employees. Each
employee is categorized as being satisfied or not satisfied. The sample size is
𝑛 = 100 and the proportion 𝑝0 specified by 𝐻0 is 0.5. Therefore 𝑛𝑝0 = 100(0.5) =
50 > 10, and n(1 − 𝑝0) = 100(1 − 0.5) = 50 > 10. If the total number of
employees in the company is more than 2000, as we shall assume, then the
population is more than 20 times as large as the sample. All the assumptions
are therefore satisfied.
Example
Solution (continued):
Remember: $x = 54$, $n = 100$, $p_0 = 0.5$
The null and alternate hypotheses are: $H_0: p = 0.5$ versus $H_1: p > 0.5$
Since the alternate hypothesis is $p > 0.5$, this is a right-tailed test. The critical
value is $z_\alpha = 1.645$.
Recall that $\hat{p} = \dfrac{x}{n} = \dfrac{54}{100} = 0.54$, thus
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.54 - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{100}}} = 0.80$$
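The critical value method for this example can be sketched in Python as follows. This is an illustrative check rather than part of the original slides. The variable names are assumptions, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Job satisfaction example: 54 of 100 employees satisfied, H0: p = 0.5
x, n, p0, alpha = 54, 100, 0.5, 0.05

p_hat = x / n                                 # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # test statistic

# Right-tailed test: the critical value cuts off area alpha in the right tail
z_crit = NormalDist().inv_cdf(1 - alpha)

reject = z >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

Because the test statistic is smaller than the critical value, the sketch does not reject the null hypothesis.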
Objectives
1. Determine which method to use when performing a hypothesis test
OBJECTIVE 1
Determine which method to use when performing
a hypothesis test
How To Begin
There are a variety of hypothesis tests and the task of selecting the
appropriate test and procedure can be overwhelming.
There is more than one procedure for testing a population mean, 𝜇. The tests
are
• 𝑧-test (Section 9.2)
• 𝑡-test (Section 9.3)
The diagram below will help in selecting the appropriate test for population
mean, 𝜇.
POWER
Section 9.7
Objectives
1. Compute the power of a test
OBJECTIVE 1
Compute the power of a test
Power
The power of a test is the probability of not making a Type II Error. In
other words, the power of a test is the probability that we reject
𝐻0 when it is false.
The power of a test about a population mean depends on the true value
of the population mean. To compute the power, we specify a value 𝜇1
for the population mean that satisfies the alternate hypothesis. The
power is the probability that the test statistic falls in the critical region
when 𝜇1 is the true value of the population mean.
Example
The 2008 General Social Survey indicates that Americans watch an average of
2.98 hours of television per day, with a standard deviation of 𝜎 = 2.66 hours. A
sociologist believes that the mean number for college students is less, because
students spend more time on the internet and playing video games. The
sociologist will sample 75 college students and test the hypotheses
𝐻0 : 𝜇 = 2.98 𝐻1 : 𝜇 < 2.98
at the 𝛼 = 0.05 level. Assume the population standard deviation for college
students is also 𝜎 = 2.66. Find the power of the test against the alternate 𝜇1= 2.
Solution:
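This is a left-tailed test; therefore, we use the critical value $-z_\alpha = -1.645$. We reject $H_0$ when the sample mean is less than $\bar{x}^* = 2.98 - 1.645 \cdot \dfrac{2.66}{\sqrt{75}} = 2.475$.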
Solution
The graph below presents a normal curve with mean 2 and the value of $\bar{x}^* = 2.475$.
Since this is a left-tailed test, the power is the area to the left of $\bar{x}^* = 2.475$. To
find this area, we find the $z$-score for 2.475, using the value $\mu_1 = 2$.
$$z = \frac{2.475 - 2}{2.66/\sqrt{75}} = 1.55$$
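The power calculation can be sketched in Python as shown below. This is a minimal illustration under the example's stated values, not part of the original slides. The variable names are assumptions, and the areas come from the normal distribution in the standard library rather than from a table, so the result may differ slightly from a table lookup at $z = 1.55$.

```python
from math import sqrt
from statistics import NormalDist

# Television example: H0: mu = 2.98 versus H1: mu < 2.98
mu0, mu1 = 2.98, 2.0        # null value and the alternate value of interest
sigma, n, alpha = 2.66, 75, 0.05

se = sigma / sqrt(n)                        # standard deviation of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645

# Boundary of the critical region for a left-tailed test
x_star = mu0 - z_alpha * se

# Power: probability that the sample mean falls below x_star when mu = mu1
power = NormalDist(mu1, se).cdf(x_star)

print(f"x* = {x_star:.3f}")
print(f"power = {power:.4f}")
```

The power is the area to the left of $\bar{x}^*$ under the normal curve centered at $\mu_1 = 2$.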
Chapter 10
Two-Sample Confidence Intervals
Objectives
1. Distinguish between independent and paired samples
2. Construct confidence intervals for the difference between two
population means
3. Describe the pooled standard deviation and the known standard
deviation methods
OBJECTIVE 1
Distinguish between independent and paired
samples
Design 1: Two samples of individuals are chosen. One sample is given the old
drug and the other sample is given the new drug. After several months, blood
pressures of the members of both samples are measured. We compare the
blood pressures in the first sample to the blood pressures in the second sample
to determine which drug is more effective.
In this design, the samples are independent. This means that the observations
in one sample do not influence the observations in the other.
Design 2: A single group of individuals is chosen. They are given the old drug
for a month, then their blood pressures are measured. They then switch to the
new drug for a month, after which their blood pressures are measured again.
This produces two samples of measurements, the first one from the old drug
and the second one from the new drug. We compare the blood pressures in the
first sample to the blood pressures in the second sample to determine which
drug is more effective.
In this design, the samples are paired. Each observation in one sample can be
paired with an observation in the second.
OBJECTIVE 2
Construct confidence intervals for the difference
between two population means
Let 𝜇1 be the population mean reduction for the new drug, and let 𝜇2 be the
population mean reduction for the standard drug. We wish to construct a
confidence interval for the difference 𝜇1 − 𝜇2.
It follows that the point estimate for the difference $\mu_1 - \mu_2$ is the difference
between the sample means. In other words, the point estimate for $\mu_1 - \mu_2$ is
$\bar{x}_1 - \bar{x}_2$.
Standard error of $\bar{x}_1 - \bar{x}_2$
The sample means $\bar{x}_1$ and $\bar{x}_2$ have variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$. When the
samples are independent, the variance of the difference $\bar{x}_1 - \bar{x}_2$ can be shown
to be the sum of the variances. In other words, the variance of $\bar{x}_1 - \bar{x}_2$ is
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
The standard error of $\bar{x}_1 - \bar{x}_2$ is the square root of the variance. Since we don’t
know the values of $\sigma_1^2$ and $\sigma_2^2$, we approximate them with the sample variances
$s_1^2$ and $s_2^2$. Therefore, the standard error is
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
There are two ways to compute the number of degrees of freedom: a simple
method that is easier when computing by hand, and a more complicated method
that is used by software packages and calculator procedures. The simple method
is: Degrees of Freedom = smaller of 𝑛1 − 1 and 𝑛2 − 1.
The margin of error is obtained by multiplying the critical value and the standard
error. The margin of error is
$$t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Assumptions
Welch’s Method requires some assumptions:
Assumptions:
Example
A drug company has developed a new drug that is designed to reduce high blood
pressure. To test the drug, a sample of 15 patients is recruited to take the drug. Their
systolic blood pressures are reduced by an average of 28.3 mmHg, with a standard
deviation of 12.0 mmHg. In addition, another sample of 20 patients takes a standard
drug. The blood pressures in this group are reduced by an average of 17.1 mmHg with a
standard deviation of 9.0 mmHg. Assume that blood pressure reductions are
approximately normally distributed. Find a 95% confidence interval for the difference
between the population mean reduction for the new drug and that of the standard drug.
Solution:
We first check the assumptions. We have two independent random samples, and the
populations are approximately normally distributed. The assumptions are satisfied.
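Solution
Remember: $\bar{x}_1 = 28.3$, $\bar{x}_2 = 17.1$, $s_1 = 12$, $s_2 = 9$, $n_1 = 15$, $n_2 = 20$
The point estimate is:
$$\bar{x}_1 - \bar{x}_2 = 28.3 - 17.1 = 11.2$$
The degrees of freedom is the smaller of $n_1 - 1 = 14$ and $n_2 - 1 = 19$, which is 14. Using Table A.3, degrees
of freedom of 14 and 95% confidence level, we obtain a critical value of $t_{\alpha/2} = 2.145$.
The standard error is
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{12.0^2}{15} + \frac{9.0^2}{20}} = 3.6946.$$
The margin of error is
$$t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 2.145 \cdot \sqrt{\frac{12.0^2}{15} + \frac{9.0^2}{20}} = 7.925.$$
The 95% confidence interval is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < \bar{x}_1 - \bar{x}_2 + t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$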
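To see the numbers that this interval produces, here is a minimal Python sketch of the calculation. It is an illustration, not part of the original slides. The variable names are assumptions, and the critical value 2.145 is the table value quoted in the solution (14 degrees of freedom), entered by hand rather than computed.

```python
from math import sqrt

# Blood pressure example: new drug (sample 1) and standard drug (sample 2)
x1_bar, s1, n1 = 28.3, 12.0, 15
x2_bar, s2, n2 = 17.1, 9.0, 20

t_crit = 2.145    # from Table A.3 with min(n1, n2) - 1 = 14 degrees of freedom

point_estimate = x1_bar - x2_bar
std_error = sqrt(s1**2 / n1 + s2**2 / n2)
margin = t_crit * std_error

lower = point_estimate - margin
upper = point_estimate + margin

print(f"standard error  = {std_error:.4f}")
print(f"margin of error = {margin:.3f}")
print(f"95% CI: {lower:.1f} < mu1 - mu2 < {upper:.1f}")
```

The interval lies entirely above 0, which is consistent with a larger mean reduction for the new drug.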
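$$\text{Degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$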
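The more complicated degrees of freedom formula can be evaluated directly. The sketch below is an illustration under the blood pressure example's values, not part of the original slides. The function name `welch_df` is hypothetical.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom used by software for Welch's method (hypothetical helper)."""
    v1 = s1**2 / n1    # estimated variance of the first sample mean
    v2 = s2**2 / n2    # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Blood pressure example: s1 = 12, n1 = 15, s2 = 9, n2 = 20
df_software = welch_df(12.0, 15, 9.0, 20)
df_simple = min(15, 20) - 1    # the simple hand method

print(f"software method: {df_software:.2f}")
print(f"simple method:   {df_simple}")
```

The simple method gives fewer degrees of freedom than the software formula here, so an interval computed by hand is somewhat wider than the one reported by a calculator.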
Solution:
We first check the assumptions. We have two independent random samples, and the
populations are approximately normally distributed. The assumptions are satisfied.
Select Stats as the input method and enter the following values:
$\bar{x}_1 = 28.3$    $\bar{x}_2 = 17.1$
$s_1 = 12$    $s_2 = 9$
$n_1 = 15$    $n_2 = 20$
Enter 0.95 as the confidence level and select No for the pooled option.
Select Calculate.
OBJECTIVE 3
Describe the pooled standard deviation and the
known standard deviation methods
Alternate Methods
In most situations in practice, Welch’s method is the method of choice for
constructing confidence intervals for the difference between two means with
independent samples. There are two other methods that are sometimes used.
They are often not the best to use in practice, however, so we will always use
Welch’s method.
Alternate Methods:
1. Using the pooled standard deviation when the two population variances $\sigma_1^2$
and $\sigma_2^2$ are known to be equal.
2. Using $z_{\alpha/2}$ when the population standard deviations $\sigma_1$ and $\sigma_2$ are known.
Degrees of freedom = 𝑛1 + 𝑛2 − 2
Note that this method is the same as Welch’s except that the sample
standard deviations $s_1$ and $s_2$ are replaced with the population standard
deviations $\sigma_1$ and $\sigma_2$, and $t_{\alpha/2}$ is replaced with $z_{\alpha/2}$.
Section 10.2
Objectives
1. Construct confidence intervals for the difference between two
proportions
OBJECTIVE 1
Construct confidence intervals for the difference
between two proportions
Notation
Consider the following example. In a study of the effect of air pollution on lung
function, a random sample of 50 children living in a community with a high level of
ozone pollution had their lung capacities measured, and 14 of them had capacities
that were below normal for their size. A second random sample of 80 children was
drawn from a community with a low level of ozone pollution, and 12 of them had lung
capacities that were below normal for their size. We are interested in studying the
difference between the proportions of individuals in two different categories
(communities).
We begin by associating some notation for the population proportions, the numbers
of individuals in each category, and the sample sizes.
• 𝑝1 and 𝑝2 are the population proportions of the category of interest in the two
populations.
• 𝑥1 and 𝑥2 are the numbers of individuals in the category of interest in the two
samples.
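The sample proportions $\hat{p}_1$ and $\hat{p}_2$ have variances $\dfrac{p_1(1 - p_1)}{n_1}$ and $\dfrac{p_2(1 - p_2)}{n_2}$,
respectively. When the samples are independent, the variance of the difference
$\hat{p}_1 - \hat{p}_2$ can be shown to be the sum of the variances of the sample proportions.
In other words, the variance of $\hat{p}_1 - \hat{p}_2$ is $\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}$. Because $p_1$ and $p_2$ are unknown, we estimate them with the sample proportions; therefore, the
standard error is
$$\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$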
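We multiply the standard error by the critical value to obtain the margin of
error:
$$\text{Margin of error} = z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
The confidence interval for $p_1 - p_2$ is
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$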
Assumptions
The method for constructing a confidence interval for the difference of
population proportions requires that the sampling distribution be
approximately normal. The following assumptions ensure this:
Assumptions:
Example
A random sample of 50 children living in a community with a high level of ozone pollution
had their lung capacities measured, and 14 of them had capacities that were below
normal. A second random sample of 80 children was drawn from a community with a low
level of ozone pollution, and 12 of them had lung capacities that were below normal.
Construct a 95% confidence interval for the difference between the proportions of
children with diminished lung capacity in the two communities.
Solution:
We have two independent random samples. The populations of children are more than
20 times as large as the samples. The individuals are divided into two categories with at
least 10 individuals in each category. The assumptions are satisfied. We summarize
the information:
Solution
Remember: $x_1 = 14$, $n_1 = 50$, $x_2 = 12$, $n_2 = 80$
The point estimate is:
$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} = \frac{14}{50} - \frac{12}{80} = 0.130$$
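The full interval for this example can be computed with a short Python sketch. It is illustrative only and not part of the original slides. The variable names are assumptions, and the critical value is obtained from the standard normal distribution in the standard library.

```python
from math import sqrt
from statistics import NormalDist

# Ozone example: children with below-normal lung capacity in each sample
x1, n1 = 14, 50    # high-ozone community
x2, n2 = 12, 80    # low-ozone community
confidence = 0.95

p1_hat = x1 / n1
p2_hat = x2 / n2
point_estimate = p1_hat - p2_hat

# Critical value z_{alpha/2}: area alpha/2 in each tail
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
margin = z_crit * std_error

lower = point_estimate - margin
upper = point_estimate + margin
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval straddles 0, so the data do not establish a difference between the two communities.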
Solution:
We have two independent random samples. The populations of children are more than
20 times as large as the samples. The individuals are divided into two categories with at
least 10 individuals in each category. The assumptions are satisfied. We summarize
the information:
Select Stats as the input method and enter the following values:
𝑥1 = 14 𝑛1 = 50
𝑥2 = 12 𝑛2 = 80
The confidence interval is (−0.017, 0.277). We are 95% confident that the
difference between the proportions is between −0.017 and 0.277. This
confidence interval contains 0. Therefore, we cannot be sure that the
proportions of children with diminished lung capacity differ between the
two communities.
Objectives
1. Construct confidence intervals with paired samples
OBJECTIVE 1
Construct confidence intervals with paired
samples
Paired Samples
Suppose sixteen volunteers are given a test in which they must push
a button in response to the appearance of an image on a screen. Their
reaction times are measured. The subjects then consume enough
alcohol to raise their blood alcohol level to 0.05% and take the
reaction test again.
These are paired samples, because each value in one sample can be
paired with the value from the same person in the other sample. The
pairs of data are called matched pairs.
Notation
We can compute the means of the two original samples as well as the mean of
the sample of differences between each matched pair.
Means of the two original samples: $\bar{x}_1$ and $\bar{x}_2$
Mean of the sample differences between each matched pair: $\bar{d}$
The data for our experiment, along with the means, is presented in the table:

Subject        0.05%    0%      Difference
1              102      103     -1
2              100      99      1
3              77       69      8
4              61       50      11
5              85       96      -11
6              50       26      24
7              95       71      24
8              115      109     6
9              64       53      11
10             98       89      9
11             107      103     4
12             44       27      17
13             47       50      -3
14             92       100     -8
15             70       66      4
16             94       86      8
Sample Mean    81.3     74.8    6.5
Simple arithmetic shows that the mean of the differences, $\bar{d}$, is the same as the
difference between the sample means. In other words, $\bar{d} = \bar{x}_1 - \bar{x}_2$.
The same relationship holds for the populations. If we let $\mu_1$ and $\mu_2$ represent
the population means and $\mu_d$ represent the population mean of the difference,
then $\mu_d = \mu_1 - \mu_2$.
Assumptions:
2. Either the sample size is large (𝑛 > 30), or the differences between the
matched pairs come from a population that is approximately normal.
$$\bar{d} - t_{\alpha/2} \frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2} \frac{s_d}{\sqrt{n}}$$
Example
Suppose sixteen volunteers are given a test in which they must push a button
in response to the appearance of an image on a screen. Their reaction times
are measured. The subjects then consume enough alcohol to raise their blood
alcohol level to 0.05% and take the reaction test again.
Construct a 95% confidence interval for $\mu_d$, the mean difference in reaction times.

Subject        0.05%    0%      Difference
1              102      103     -1
2              100      99      1
3              77       69      8
4              61       50      11
5              85       96      -11
6              50       26      24
7              95       71      24
8              115      109     6
9              64       53      11
10             98       89      9
11             107      103     4
12             44       27      17
13             47       50      -3
14             92       100     -8
15             70       66      4
16             94       86      8
Sample Mean    81.3     74.8    6.5
Solution
First, we check the assumptions. Since the sample size is small (𝑛 = 16), we
construct a boxplot for the differences to check for outliers or strong skewness.
Solution
The standard error is $\dfrac{s_d}{\sqrt{n}} = \dfrac{9.93311}{\sqrt{16}} = 2.48328$ and the margin of error is
$t_{\alpha/2} \dfrac{s_d}{\sqrt{n}} = 2.131(2.48328) = 5.292$.
We are 95% confident that the mean difference is between 1.2 and 11.8. In
particular, the confidence interval does not contain 0, and all the values in the
confidence interval are positive. We can be fairly certain that the mean reaction
time is greater when the blood alcohol level is 0.05%.
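The paired-sample interval can be reproduced from the raw data with a short Python sketch. This is an illustration, not part of the original slides. The variable names are assumptions, and the critical value 2.131 is the value used in the solution (15 degrees of freedom), entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

# Reaction times for the 16 subjects at blood alcohol levels 0.05% and 0%
bac_005 = [102, 100, 77, 61, 85, 50, 95, 115, 64, 98, 107, 44, 47, 92, 70, 94]
bac_000 = [103, 99, 69, 50, 96, 26, 71, 109, 53, 89, 103, 27, 50, 100, 66, 86]

# Work with the sample of differences between matched pairs
diffs = [a - b for a, b in zip(bac_005, bac_000)]
n = len(diffs)

d_bar = mean(diffs)    # mean of the differences
s_d = stdev(diffs)     # sample standard deviation of the differences

t_crit = 2.131         # critical value from the solution, 15 degrees of freedom
std_error = s_d / sqrt(n)
margin = t_crit * std_error

lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar}, s_d = {s_d:.5f}")
print(f"95% CI: {lower:.1f} < mu_d < {upper:.1f}")
```

The mean of the differences also equals the difference of the two sample means, as the notation slide states.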
Let $s_1$ denote the sample standard deviation for the blood alcohol level 0.05% sample
and let $s_2$ denote the sample standard deviation for the blood alcohol level 0 sample.
We have
$s_1 = 22.71$    $s_2 = 27.39$
If the samples had been independent, the standard error would have been
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{22.71^2}{16} + \frac{27.39^2}{16}} = 8.90$$
Because we were able to use the sample of differences, the standard error was
only 2.48. This smaller value results in a smaller margin of error.
Chapter 11
Two-Sample Hypothesis Tests
Section 11.1
Objectives
1. Perform a hypothesis test for the difference between two means
using the P-value method
2. Perform a hypothesis test for the difference between two means
using the critical value method
OBJECTIVE 1
Perform a hypothesis test for the difference
between two means using the P-value method
Independent Samples
Scores on the National Assessment of Educational Progress (NAEP)
mathematics test range from 0 to 500. In a recent year, the sample mean
score for students using a computer was 309, with a sample standard
deviation of 29. For students not using a computer, the sample mean was
303, with a sample standard deviation of 32. Assume there were 60
students in the computer sample, and 40 students in the sample that didn’t
use a computer. We can see that the sample mean scores differ by 6
points: 309 − 303 = 6. Now, we are interested in the difference between
the population means, which will not be exactly the same as the difference
between the sample means.
Notation
We use the following notation:
• 𝜇1 and 𝜇2 are the population means.
• $\bar{x}_1$ and $\bar{x}_2$ are the sample means.
• 𝑠1 and 𝑠2 are the sample standard deviations.
• 𝑛1 and 𝑛2 are the sample sizes.
Test Statistic
The test statistic is based on the difference between the two sample
means $\bar{x}_1 - \bar{x}_2$. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$. We approximate the
standard deviation of $\bar{x}_1 - \bar{x}_2$ with the standard error derived in the
previous chapter.
$$\text{Standard error of } \bar{x}_1 - \bar{x}_2 = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Assumptions
The method just described requires the following assumptions:
Assumptions:
Step 5: Interpret the P-value. If making a decision, reject 𝐻0 if the P-value is less
than or equal to the significance level 𝛼.
Step 6: State a conclusion.
Example
The National Assessment of Educational Progress (NAEP) tested a sample of students
who had used a computer in their mathematics classes, and another sample of students
who had not used a computer. The sample mean score for students using a computer
was 309, with a sample standard deviation of 29. For students not using a computer, the
sample mean was 303, with a sample standard deviation of 32. Assume there were 60
students in the computer sample, and 40 students in the sample that hadn’t used a
computer. Can you conclude that the population mean scores differ? Use the α = 0.05
level.
Solution:
We first check the assumptions. We have two independent random samples with sizes
larger than 30. The assumptions are satisfied. We summarize the relevant information:
There is not enough evidence to conclude that the mean scores differ between those
students who use a computer and those who do not. The mean scores may be the same.
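The conclusion above can be checked with a short Python sketch of the test statistic. It is an illustration, not part of the original slides. The variable names are assumptions. The slides do not state a critical value for this example, so the value 2.023 below is an assumption taken from a standard $t$ table (two-tailed, $\alpha = 0.05$, 39 degrees of freedom by the simple method).

```python
from math import sqrt

# NAEP example: computer users (sample 1) and non-users (sample 2)
x1_bar, s1, n1 = 309, 29, 60
x2_bar, s2, n2 = 303, 32, 40

# Under H0: mu1 = mu2, the hypothesized difference is 0
std_error = sqrt(s1**2 / n1 + s2**2 / n2)
t = ((x1_bar - x2_bar) - 0) / std_error

df_simple = min(n1, n2) - 1    # simple method: smaller of n1 - 1 and n2 - 1
t_crit = 2.023                 # ASSUMED table value for 39 df, two-tailed alpha = 0.05

reject = abs(t) >= t_crit
print(f"t = {t:.2f}, df = {df_simple}")
print("Reject H0" if reject else "Do not reject H0")
```

The statistic is well inside the critical values, which agrees with the slide's conclusion that the mean scores may be the same.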
Solution:
We press STAT and highlight the TESTS menu and
select 2-SampTTest.
Select Calculate.
OBJECTIVE 2
Perform a hypothesis test for the difference
between two means using the critical value
method
Example
Treatment of wastewater is important to reduce the concentration of undesirable pollutants.
One such substance is benzene, which is used as an industrial solvent. Two methods of
water treatment are being compared. Treatment 1 is applied to five specimens of
wastewater, and treatment 2 is applied to seven specimens. The benzene concentrations,
in units of milligrams per liter, for each specimen are as follows:
Treatment 1: 7.8 7.6 5.6 6.8 6.4
Treatment 2: 4.1 6.5 3.7 7.7 7.3 4.7 5.9
How strong is the evidence that the mean concentration is less for treatment 2 than for
treatment 1? We will test at the α = 0.05 significance level.
Solution:
We first check the assumptions. Because
the samples are small, we must check for
strong skewness and outliers. We construct
dotplots for each sample. There are no
outliers, and no evidence of strong skewness,
in either sample.
Solution
The null and alternate hypotheses are:
𝐻0 : 𝜇1 = 𝜇2 𝐻1 : 𝜇1 > 𝜇2
We will find the critical value in Table A.3. The sample sizes are 𝑛1 = 5 and 𝑛2 =
7. For the number of degrees of freedom, we use the smaller of 5 − 1 = 4 and 7 −
1 = 6, which is 4. Because the alternate hypothesis, 𝜇1 − 𝜇2> 0, is right-tailed, the
critical value is the value with area 0.05 to its right.
We consult Table A.3 with 4 degrees of freedom and find that 𝑡𝛼 = 2.132.
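Solution
To compute the test statistic, we first compute the sample means and standard
deviations. These are
\[ \bar{x}_1 = 6.84, \quad s_1 = 0.8989, \quad \bar{x}_2 = 5.70, \quad s_2 = 1.5706 \]
The sample sizes are 𝑛1 = 5 and 𝑛2 = 7. Under the assumption that 𝐻0 is true, 𝜇1
− 𝜇2 = 0, the value of the test statistic is
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(6.84 - 5.70) - 0}{\sqrt{\dfrac{0.8989^2}{5} + \dfrac{1.5706^2}{7}}} = 1.590 \]
Because 𝑡 = 1.590 is less than the critical value 2.132, we do not reject 𝐻0.
There is not enough evidence to conclude that the mean benzene concentration
with treatment 1 is greater than that with treatment 2. The concentrations may be
the same.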
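The arithmetic in this solution can be checked with a short script. The following is a minimal sketch in Python using only the standard library. The variable names are illustrative assumptions, and the critical value 2.132 is copied from Table A.3 rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Benzene concentrations (mg/L) from the example
treatment1 = [7.8, 7.6, 5.6, 6.8, 6.4]
treatment2 = [4.1, 6.5, 3.7, 7.7, 7.3, 4.7, 5.9]

n1, n2 = len(treatment1), len(treatment2)
xbar1, xbar2 = mean(treatment1), mean(treatment2)
s1, s2 = stdev(treatment1), stdev(treatment2)  # sample standard deviations

# Test statistic under H0: mu1 - mu2 = 0
t_stat = ((xbar1 - xbar2) - 0) / sqrt(s1**2 / n1 + s2**2 / n2)

# Degrees of freedom as used in the slides: the smaller of n1 - 1 and n2 - 1
df = min(n1 - 1, n2 - 1)

# Critical value taken from Table A.3 (df = 4, right-tail area 0.05)
t_crit = 2.132

# Right-tailed test: reject H0 when the statistic reaches the critical value
reject_h0 = t_stat >= t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic falls below the critical value, the script reaches the same decision as the slide: do not reject 𝐻0.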
Degrees of freedom = 𝑛1 + 𝑛2 − 2
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
The assumptions for this method are the same as for the method using
the Student’s 𝑡 distribution, with the additional assumption that the
population standard deviations are known.
Section 11.2
Objectives
1. Perform a hypothesis test for the difference between two
proportions using the P-value method
2. Perform a hypothesis test for the difference between two
proportions using the critical value method
OBJECTIVE 1
Perform a hypothesis test for the difference
between two proportions using the P-value
method
We can compute the sample proportions of people who used a computer at work in
each of these age groups. Among those aged 25–40, the sample proportion was
259/350 = 0.740, and among those aged 41–65 the sample proportion was
384/500 = 0.768. So the sample proportion is larger among older workers.
The question of interest, however, involves the population proportions. There are two
populations involved: the population of all employed people aged 25–40, and the
population of all employed people aged 41–65. The question is whether the population
proportion of people aged 41–65 who use a computer at work is greater than the
population proportion among those aged 25–40.
Notation
We begin by introducing notation for the population proportions, the
sample proportions, the numbers of individuals in each category, and the sample
sizes.
• 𝑝1 and 𝑝2 are the population proportions of the category of interest in the two
populations.
• p̂1 and p̂2 are the proportions of the category of interest in the two samples.
• 𝑥1 and 𝑥2 are the numbers of individuals in the category of interest in the two
samples.
• 𝑛1 and 𝑛2 are the sample sizes.
𝐻0: 𝑝1 = 𝑝2
Mean = 𝑝1 − 𝑝2
\[ \text{Standard Deviation} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]
Pooled Proportion
To compute the test statistic, we must find values for the mean and standard
deviation. The mean is straightforward: Under the assumption that 𝐻0 is
true, 𝑝1 – 𝑝2 = 0. The standard deviation is a bit more involved. The standard
deviation depends on the population proportions 𝑝1 and 𝑝2 , which are
unknown. We need to estimate 𝑝1 and 𝑝2 . Under 𝐻0 , we assume that 𝑝1 =
𝑝2 . Therefore, we need to estimate 𝑝1 and 𝑝2 with the same value. The value
to use is the pooled proportion, which we will denote by p̂.
The pooled proportion is found by treating the two samples as though they
were one big sample. We divide the total number of individuals in the
category of interest in the two samples by the sum of the two sample sizes.
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
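\[ \text{Standard Error} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \]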
Assumptions
The method just described for performing a hypothesis test for the
difference of population proportions requires the following assumptions.
Assumptions:
2. Each population is at least 20 times as large as the sample drawn from it.
Step 5: Interpret the P-value. If making a decision, reject 𝐻0 if the P-value is less
than or equal to the significance level 𝛼.
Example
The General Social Survey took a poll that asked 350 employed people aged 25–40
whether they used a computer at work, and 259 said they did. They also asked the same
question of 500 employed people aged 41–65, and 384 of them said that they used a
computer at work. Can you conclude that the proportion of people who use a computer at
work is greater among those aged 41–65 than among those aged 25–40? Use the 𝛼 =
0.05 level.
Solution:
We first check the assumptions. We have two independent random samples, and the
populations are more than 20 times as large as the samples. The individuals in each
sample are divided into two categories with more than 10 individuals in each category.
The assumptions are satisfied. We summarize the relevant information:
Since P > 0.05, we do not reject 𝐻0 at the 𝛼 = 0.05 level. We cannot conclude that the
proportion of workers aged 41–65 who use a computer at work is greater than the
proportion among those aged 25–40.
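As a check on this conclusion, the following minimal Python sketch computes the pooled proportion, the test statistic, and the right-tail P-value for the survey data. It uses only the standard library. The function name `two_proportion_z` and the group labels "a" and "b" are assumptions made for illustration, not notation from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_a, n_a, x_b, n_b):
    """z statistic for H0: p_a = p_b, using the pooled proportion."""
    p_hat_a = x_a / n_a
    p_hat_b = x_b / n_b
    # Pool the two samples as though they were one big sample
    p_pooled = (x_a + x_b) / (n_a + n_b)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (p_hat_a - p_hat_b) / std_error


# Assumed labeling: group "a" is ages 41-65, group "b" is ages 25-40,
# so the alternate hypothesis p_a > p_b is right-tailed.
z = two_proportion_z(384, 500, 259, 350)

# P-value: area under the standard normal curve to the right of z
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
reject_h0 = p_value <= alpha

print(f"z = {z:.3f}, P = {p_value:.4f}, reject H0: {reject_h0}")
```

The P-value comes out well above 0.05, which agrees with the decision not to reject 𝐻0.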
Select Calculate.
OBJECTIVE 2
Perform a hypothesis test for the difference
between two proportions using the critical value
method
Step 2: If making a decision, choose a significance level 𝛼 and find the critical
value(s).
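Step 3: Compute the test statistic
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}. \]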
Example
Traffic engineers tabulated types of car accidents by drivers of various ages. Out of a total
of 82,486 accidents involving drivers aged 15–24 years, 4243 of them, or 5.1%, occurred in
a driveway. Out of a total of 219,170 accidents involving drivers aged 25–64 years, 10,701
of them, or 4.9%, occurred in a driveway. Can you conclude that accidents involving drivers
aged 15–24 are more likely to occur in driveways than accidents involving drivers aged 25–
64? Use 𝛼 = 0.05.
Solution:
We have two independent samples, and the individuals in each sample fall into two
categories with at least 10 individuals in each category. The assumptions are satisfied. We
summarize the relevant information:
                         Ages 15-24                       Ages 25-64
Sample size              𝑛1 = 82,486                      𝑛2 = 219,170
Number of individuals    𝑥1 = 4,243                       𝑥2 = 10,701
Sample proportion        p̂1 = 4,243/82,486 = 0.051439     p̂2 = 10,701/219,170 = 0.048825
Population proportion    𝑝1 (unknown)                     𝑝2 (unknown)

The null and alternate hypotheses are:
𝐻0 : 𝑝1 = 𝑝2    𝐻1 : 𝑝1 > 𝑝2
We conclude that accidents involving drivers aged 15–24 are more likely to occur in a
driveway than accidents involving drivers aged 25–64.
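The steps leading to this conclusion can be reproduced with a short script. This is a minimal sketch in Python using only the standard library. The variable names are illustrative assumptions, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the example
n1, x1 = 82_486, 4_243     # accidents, ages 15-24; number in a driveway
n2, x2 = 219_170, 10_701   # accidents, ages 25-64; number in a driveway

p_hat1 = x1 / n1
p_hat2 = x2 / n2

# Pooled proportion and standard error under H0: p1 = p2
p_pooled = (x1 + x2) / (n1 + n2)
std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p_hat1 - p_hat2) / std_error

# Right-tailed test at alpha = 0.05: critical value has area 0.05 to its right
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z >= z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

The test statistic exceeds the critical value, so 𝐻0 is rejected, in agreement with the conclusion stated on the slide.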
Objectives
1. Perform a hypothesis test with matched pairs using the P-value
method
2. Perform a hypothesis test with matched pairs using the critical
value method
OBJECTIVE 1
Perform a hypothesis test with matched pairs
using the P-value method
Paired Samples
A sample of eight automobiles was run to determine their mileage, in
miles per gallon. Then each car was given a tune-up, and run again to
measure the mileage a second time.
Automobile   After   Before
1            35.44   33.76
2            35.17   34.30
3            31.07   29.55
4            31.57   30.90
5            26.48   24.92
6            23.11   21.78
7            25.18   24.30
8            32.39   31.25
The sample mean mileage was higher after tune-up. We would like to
determine how strong the evidence is that the population mean mileage
is higher after tune-up.
Matched Pairs
When we have paired samples, the pairs are called matched pairs. By
computing the difference between the values in each matched pair, we
construct a sample of differences:
Notation
We use the following notation:
Assumptions
The method just described requires the following assumptions:
Assumptions:
2. Either the sample size is large (𝑛 > 30), or the differences between
items in the matched pairs show no evidence of strong skewness and
no outliers. This is required to be sure that d̄ will be approximately
normally distributed.
Step 5: Interpret the P-value. If making a decision, reject 𝐻0 if the P-value is less than
or equal to the significance level 𝛼.
Step 6: State a conclusion.
Example
Using the data about a tune-up improving car engine gas mileage, test
𝐻0: 𝜇d = 0 versus 𝐻1 : 𝜇d > 0. Use the 𝛼 = 0.01 significance level.
Automobile   Difference
1            1.68
2            0.87
3            1.52
4            0.67
5            1.56
6            1.33
7            0.88
8            1.14
Solution:
We have a simple random sample of differences. Because the sample size is
small (𝑛 = 8), we must check for signs of strong skewness or outliers.
Following is a dotplot of the differences.
The dotplot does not reveal any outliers or strong skewness. Therefore we
may proceed.
Solution
The null and alternate hypotheses are 𝐻0: 𝜇d = 0 versus 𝐻1 : 𝜇d > 0.
We compute the sample mean and sample standard deviation of the differences.
These are
d̄ = 1.20625    𝑠𝑑 = 0.37317
Under the assumption that 𝐻0 is true, 𝜇d = 𝜇0 = 0, and the value of the test
statistic is
\[ t = \frac{\bar{d} - \mu_0}{s_d / \sqrt{n}} = \frac{1.20625 - 0}{0.37317 / \sqrt{8}} = 9.1427 \]
The test statistic has a 𝑡 distribution with 𝑛 − 1 = 8 − 1 = 7 degrees of freedom.
Since this is a right tailed test, the P-value is the area to the right of the observed
value of 𝑡 = 9.1427. Using Table A.3 or technology, we find that P = 0.0000193.
The P-value is nearly 0, which is very strong evidence against 𝐻0. Because P <
0.01, we reject 𝐻0 at the 𝛼 = 0.01 level. We conclude that the gas mileage
increased after a tune-up.
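The computation for this example can be sketched in Python as follows. Only the standard library is used, which has no Student's 𝑡 distribution, so the tail area is approximated here by Simpson's rule applied to the 𝑡 density. That numerical shortcut and the helper names `t_pdf` and `right_tail_area` are assumptions of this sketch. In practice a statistics package or the calculator's T-Test function would supply the P-value.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

# Mileage (mpg) after and before the tune-up, from the paired-samples table
after = [35.44, 35.17, 31.07, 31.57, 26.48, 23.11, 25.18, 32.39]
before = [33.76, 34.30, 29.55, 30.90, 24.92, 21.78, 24.30, 31.25]

# Sample of differences, one per matched pair
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
df = n - 1

# Test statistic under H0: mu_d = 0
t_stat = (d_bar - 0) / (s_d / sqrt(n))


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def right_tail_area(t, nu, steps=20000):
    """Approximate P(T > t) for t >= 0 as 0.5 minus the integral of the
    density over [0, t], using Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


p_value = right_tail_area(t_stat, df)
reject_h0 = p_value <= 0.01

print(f"t = {t_stat:.4f}, df = {df}, P = {p_value:.7f}, reject H0: {reject_h0}")
```

The approximated P-value is far below 0.01, consistent with rejecting 𝐻0 at the 𝛼 = 0.01 level.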
OBJECTIVE 2
Perform a hypothesis test with matched pairs
using the critical value method
Example
For a sample of nine automobiles, the mileage (in 1000s of miles) at which the
original front brake pads were worn to 10% of their original thickness was
measured, as was the mileage at which the original rear brake pads were worn
to 10% of their original thickness. The results are given.
Automobile   Rear   Front   Difference
1            42.7   32.8    9.9
2            36.7   26.6    10.1
3            46.1   35.6    10.5
4            46.0   36.4    9.6
5            39.9   29.2    10.7
6            51.7   40.9    10.8
7            51.6   40.9    10.7
8            46.1   34.8    11.3
9            47.3   36.6    10.7
The differences in the last column of the table are Rear − Front. Can you
conclude that the mean time for the rear brake pads to wear out is longer than
the mean time for the front pads? Use the α = 0.05 significance level.
Solution
Since the sample size is small, we construct a dotplot.
We are interested in determining whether the mean time for the rear pads is
longer than for the front. Therefore, the hypotheses are
𝐻0 : 𝜇d = 0 𝐻1 : 𝜇d > 0
Because this is a right-tailed test, the critical value is the value for which the
area to the right is 0.05. The sample size is 𝑛 = 9, so there are 9 − 1 = 8
degrees of freedom. The critical value is 𝑡𝛼 = 1.860.
Solution
We compute the sample mean and standard deviation of the differences:
d̄ = 10.478    𝑠𝑑 = 0.5215
The test statistic is
\[ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{10.478}{0.5215 / \sqrt{9}} = 60.28 \]
Because 𝑡 = 60.28 is greater than the critical value 1.860, we reject 𝐻0.
We conclude that the mean time for rear brake pads to wear out is longer than
the mean time for front brake pads.
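The brake pad computation can be verified from the raw data with a short Python sketch. It uses only the standard library. The variable names are illustrative assumptions, and the critical value 1.860 is the table value for 8 degrees of freedom quoted in the solution rather than a computed quantity.

```python
from math import sqrt
from statistics import mean, stdev

# Mileage (1000s of miles) at which pads were worn to 10% thickness
rear = [42.7, 36.7, 46.1, 46.0, 39.9, 51.7, 51.6, 46.1, 47.3]
front = [32.8, 26.6, 35.6, 36.4, 29.2, 40.9, 40.9, 34.8, 36.6]

# Differences are Rear - Front, one per automobile
diffs = [r - f for r, f in zip(rear, front)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
df = n - 1

# Test statistic under H0: mu_d = 0
t_stat = (d_bar - 0) / (s_d / sqrt(n))

# Critical value from the t table (df = 8, right-tail area 0.05)
t_crit = 1.860

# Right-tailed test: reject H0 when the statistic reaches the critical value
reject_h0 = t_stat >= t_crit

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Computing the mean and standard deviation directly from the table is a useful safeguard, since the test statistic depends on both values.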