ch07 Mah2

CHAPTER 7
SAMPLING
DISTRIBUTIONS
As defined in Chapter 3, a sample statistic is a numerical
summary measure calculated for sample data. The mean, median,
mode, and standard deviation calculated for sample data
are called sample statistics.
On the other hand, the same numerical summary
measures calculated for population data are called
population parameters.
A population parameter is always a constant, whereas a
sample statistic is always a random variable. Because every
random variable must possess a probability distribution,
each sample statistic possesses a probability distribution.
The probability distribution of a sample statistic is more
commonly called its sampling distribution. In this chapter
we discuss the sampling distributions of the sample mean
and the sample proportion.
POPULATION AND SAMPLING
DISTRIBUTIONS
• Population Distribution
• Sampling Distribution
Population Distribution
Definition
The population distribution is the probability
distribution of the population data.
Suppose there are only five students in an advanced
statistics class and the midterm scores of these five
students are
70 78 80 80 95
Let x denote the score of a student.

Table 7.1 Population Frequency and Relative
Frequency Distributions
Table 7.2 Population Probability Distribution
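As a minimal sketch (not part of the original slides), the frequency, relative frequency, and probability distributions for the five scores can be tabulated in Python. Only the five scores come from the text; the variable names are illustrative assumptions.

```python
from collections import Counter

# Midterm scores of the five students (the entire population).
scores = [70, 78, 80, 80, 95]

# Frequency of each distinct score.
frequency = Counter(scores)

# Relative frequency of each score; for population data this is also P(x).
population_distribution = {
    x: count / len(scores) for x, count in sorted(frequency.items())
}

for x, prob in population_distribution.items():
    print(f"x = {x}: f = {frequency[x]}, P(x) = {prob:.2f}")
```

Because the score 80 occurs twice, it carries twice the probability of each of the other scores, and the probabilities sum to 1.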
Sampling Distribution
Definition
The probability distribution of x̄ is called its sampling
distribution. It lists the various values that x̄ can
assume and the probability of each value of x̄.
In general, the probability distribution of a sample
statistic is called its sampling distribution.
Reconsider the population of midterm scores of five
students given in Table 7.1.
Consider all possible samples of three scores each that
can be selected, without replacement, from that
population.
The total number of possible samples is
$_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10$
Suppose we assign the letters A, B, C, D, and E to the
scores of the five students so that
A = 70, B = 78, C = 80, D = 80, E = 95
Then, the 10 possible samples of three scores each are
ABC, ABD, ABE, ACD, ACE,
ADE, BCD, BCE, BDE, CDE

Table 7.3 All Possible Samples and Their Means
When the Sample Size Is 3
Table 7.4 Frequency and Relative Frequency
Distributions of x̄ When the Sample Size Is 3
Table 7.5 Sampling Distribution of x̄ When the
Sample Size Is 3
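The enumeration described above can be sketched in Python. This is an illustrative reconstruction under the stated setup (five scores, samples of size 3 without replacement), not the slides' own tables; the variable names are assumptions, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population of midterm scores, labelled as in the text.
population = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# Every sample of size 3 drawn without replacement, mapped to its mean.
sample_means = {
    "".join(labels): Fraction(sum(population[k] for k in labels), 3)
    for labels in combinations(population, 3)
}

# Each sample is equally likely, so
# P(mean) = (number of samples with that mean) / (number of samples).
counts = Counter(sample_means.values())
sampling_distribution = {
    mean: Fraction(count, len(sample_means))
    for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"x-bar = {float(mean):.2f}  P(x-bar) = {float(prob):.2f}")

# Mean of the sampling distribution of x-bar.
mean_of_sample_means = sum(m * p for m, p in sampling_distribution.items())
print("mean of x-bar:", float(mean_of_sample_means))
```

In this sketch the mean of the ten sample means works out to 403/5 = 80.60, the same as the population mean.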
SAMPLING AND NONSAMPLING ERRORS
Definition
Sampling error is the difference between the value of a
sample statistic and the value of the corresponding
population parameter. In the case of the mean,
$\text{Sampling error} = \bar{x} - \mu$
assuming that the sample is random and no nonsampling
error has been made.
Definition
The errors that occur in the collection,
recording, and tabulation of data are called
nonsampling errors.
Reasons for the Occurrence of Nonsampling
Errors
1. If a sample is nonrepresentative, the sample results may be too
different from the census results.
2. The questions may be phrased in such a way that they are not
fully understood by the members of the sample or population.
3. The respondents may intentionally give false information in
response to some sensitive questions.
4. The poll taker may make a mistake and enter a wrong number in
the records or make an error while entering the data on a computer.
Example 7-1
Reconsider the population of five scores given in
Table 7.1. Suppose one sample of three scores is
selected from this population, and this sample
includes the scores 70, 80, and 95. Find the
sampling error.
Example 7-1: Solution
$\mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60$
$\bar{x} = \frac{70 + 80 + 95}{3} = 81.67$
$\text{Sampling error} = \bar{x} - \mu = 81.67 - 80.60 = 1.07$
That is, the mean score estimated from the sample is
1.07 higher than the mean score of the population.
Now suppose, when we select the sample of three scores,
we mistakenly record the second score as 82 instead of
80.
As a result, we calculate the sample mean as
$\bar{x} = \frac{70 + 82 + 95}{3} = 82.33$
The difference between this sample mean and the
population mean is
$\bar{x} - \mu = 82.33 - 80.60 = 1.73$
This difference does not represent the sampling error.
Only 1.07 of this difference is due to the sampling error.
• The remaining portion represents the nonsampling
error.
• It is equal to 1.73 – 1.07 = 0.66.
• It occurred due to the error we made in recording
the second score in the sample.
• Also,
$\text{Nonsampling error} = \text{Incorrect } \bar{x} - \text{Correct } \bar{x} = 82.33 - 81.67 = 0.66$
Figure 7.1 Sampling and nonsampling errors.
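The decomposition in Example 7-1 can be checked with a short Python sketch. The names below are illustrative assumptions. The code works with unrounded means, so its results differ slightly from the slide values, which subtract means already rounded to two decimals.

```python
population = [70, 78, 80, 80, 95]
correct_sample = [70, 80, 95]
recorded_sample = [70, 82, 95]  # second score mis-recorded as 82 instead of 80


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mu = mean(population)

# Sampling error: correct sample mean minus population mean.
sampling_error = mean(correct_sample) - mu

# Nonsampling error: incorrect sample mean minus correct sample mean.
nonsampling_error = mean(recorded_sample) - mean(correct_sample)

# Total difference between the recorded sample mean and the population mean.
total_difference = mean(recorded_sample) - mu

print(f"sampling error    = {sampling_error:.4f}")
print(f"nonsampling error = {nonsampling_error:.4f}")
print(f"total difference  = {total_difference:.4f}")
```

The total difference is the sum of the two components, which is the relationship Figure 7.1 illustrates.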
MEAN AND STANDARD DEVIATION OF x̄
Definition
The mean and standard deviation of the sampling
distribution of x̄ are called the mean and standard
deviation of x̄ and are denoted by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$,
respectively.
Mean of the Sampling Distribution of x̄
The mean of the sampling distribution of x̄
is always equal to the mean of the
population. Thus,
$\mu_{\bar{x}} = \mu$
Standard Deviation of the Sampling Distribution
of x̄
The standard deviation of the sampling
distribution of x̄ is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where σ is the standard deviation of the population and n
is the sample size. This formula is used when n/N ≤ 0.05,
where N is the population size.
Two Important Observations
1. The spread of the sampling distribution of x̄ is
smaller than the spread of the corresponding
population distribution, i.e.,
$\sigma_{\bar{x}} < \sigma$
2. The standard deviation of the sampling
distribution of x̄ decreases as the sample size
increases.
Example 7-2
The mean wage per hour for all 5000 employees who work at a large
company is $27.50 and the standard deviation is $3.70.
Let x̄ be the mean wage per hour for a random sample of certain
employees selected from this company. Find the mean and
standard deviation of x̄ for a sample size of
(a) 30 (b) 75 (c) 200
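(a) N = 5000, μ = $27.50, σ = $3.70.
In this case, n/N = 30/5000 = 0.006 < 0.05.
$\mu_{\bar{x}} = \mu = \$27.50$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$0.676$
(b) N = 5000, μ = $27.50, σ = $3.70.
In this case, n/N = 75/5000 = 0.015 < 0.05.
$\mu_{\bar{x}} = \mu = \$27.50$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$0.427$
(c) In this case, n = 200 and
n/N = 200/5000 = 0.04, which is less than 0.05.
$\mu_{\bar{x}} = \mu = \$27.50$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$0.262$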
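The three parts of Example 7-2 follow the same recipe, which can be sketched as a small Python helper. The function name and its `population_size` parameter are hypothetical. The n/N ≤ 0.05 check mirrors the condition stated in the text, and what to do when it fails is outside this sketch.

```python
import math


def mean_and_sd_of_sample_mean(mu, sigma, n, population_size=None):
    """Return (mean, standard deviation) of the sampling distribution of x-bar.

    Uses sigma / sqrt(n), which the text applies when n/N <= 0.05.
    """
    if population_size is not None and n / population_size > 0.05:
        raise ValueError("n/N exceeds 0.05; sigma / sqrt(n) is not used in this case")
    return mu, sigma / math.sqrt(n)


# Example 7-2: N = 5000, mu = 27.50, sigma = 3.70.
for n in (30, 75, 200):
    mean, sd = mean_and_sd_of_sample_mean(27.50, 3.70, n, population_size=5000)
    print(f"n = {n}: mean of x-bar = {mean:.2f}, sd of x-bar = {sd:.3f}")
```

The mean stays at $27.50 for every sample size, while the standard deviation shrinks as n grows, matching the second of the two observations above.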
SHAPE OF THE SAMPLING DISTRIBUTION
OF x̄
• The population from which samples are drawn has a
normal distribution.
• The population from which samples are drawn does
not have a normal distribution.
Sampling From a Normally Distributed
Population
If the population from which the samples are drawn is
normally distributed with mean μ and standard
deviation σ, then the sampling distribution of the sample
mean, x̄, will also be normally distributed with the
following mean and standard deviation, irrespective of
the sample size:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Figure 7.2 Population distribution and sampling
distributions of x̄.
Example 7-3
In a recent SAT, the mean score for all examinees was 1020.
Assume that the distribution of SAT scores of all examinees is
normal with a mean of 1020 and a standard deviation of
153. Let x̄ be the mean SAT score of a random sample of
certain examinees. Calculate the mean and standard deviation
of x̄ and describe the shape of its sampling distribution when
the sample size is
(a) 16 (b) 50 (c) 1000
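Because the SAT scores of all examinees are normally
distributed, the sampling distribution of x̄ is normal
for each of the three sample sizes.
(a) μ = 1020 and σ = 153.
$\mu_{\bar{x}} = \mu = 1020$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250$
Figure 7.3
(b)
$\mu_{\bar{x}} = \mu = 1020$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637$
Figure 7.4
(c)
$\mu_{\bar{x}} = \mu = 1020$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838$
Figure 7.5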
Sampling From a Population That Is Not
Normally Distributed
Central Limit Theorem
According to the central limit theorem, for a large sample size,
the sampling distribution of x̄ is approximately normal,
irrespective of the shape of the population distribution. The mean
and standard deviation of the sampling distribution of x̄ are
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The sample size is usually considered to be large if n ≥ 30.
Figure 7.6 Population distribution and sampling
distributions of x̄.
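The central limit theorem can be illustrated with a small simulation. This sketch is an assumption-laden illustration, not material from the slides: it uses a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), draws many samples of size 30, and compares the observed mean and standard deviation of the sample means with μ and σ/√n.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so repeated runs give the same numbers

# Hypothetical right-skewed population: exponential, mean 1, standard deviation 1.
mu, sigma = 1.0, 1.0
n = 30                # sample size (the usual "large sample" threshold)
num_samples = 20_000  # number of simulated samples

# Mean of each simulated sample.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.pstdev(sample_means)
expected_sd = sigma / math.sqrt(n)

# Share of sample means within one standard deviation (sigma / sqrt(n)) of mu.
within_one_sd = sum(abs(m - mu) <= expected_sd for m in sample_means) / num_samples

print(f"mean of sample means: {observed_mean:.4f} (mu = {mu})")
print(f"sd of sample means:   {observed_sd:.4f} (sigma/sqrt(n) = {expected_sd:.4f})")
print(f"share within one sd:  {within_one_sd:.3f}")
```

Even though the simulated population is strongly skewed, the share of sample means within one standard deviation of μ should land near the value expected for a normal curve.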
Example 7-4
The mean rent paid by all tenants in a small city is $1550
with a standard deviation of $225. However, the
population distribution of rents for all tenants in this city
is skewed to the right. Calculate the mean and standard
deviation of x̄ and describe the shape of its sampling
distribution when the sample size is
(a) 30 (b) 100
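(a) Let x̄ be the mean rent paid by a sample
of 30 tenants.
$\mu_{\bar{x}} = \mu = \$1550$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.079$
Although the population is skewed, the sample is large
(n ≥ 30), so by the central limit theorem the sampling
distribution of x̄ is approximately normal.
Figure 7.7
(b) Let x̄ be the mean rent paid by a sample
of 100 tenants.
$\mu_{\bar{x}} = \mu = \$1550$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.500$
Because n = 100 is also large, the sampling distribution
of x̄ is again approximately normal.
Figure 7.8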
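APPLICATIONS OF THE SAMPLING
DISTRIBUTION OF x̄
1. If we take all possible samples of the same (large)
size from a population and calculate the mean for
each of these samples, then about 68.26% of the
sample means will be within one standard deviation
($\sigma_{\bar{x}}$) of the population mean.
Figure 7.9 $P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}})$
2. If we take all possible samples of the same (large)
size from a population and calculate the mean for
each of these samples, then about 95.44% of the
sample means will be within two standard deviations
($\sigma_{\bar{x}}$) of the population mean.
Figure 7.10 $P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}})$
3. If we take all possible samples of the same (large)
size from a population and calculate the mean for
each of these samples, then about 99.74% of the
sample means will be within three standard
deviations ($\sigma_{\bar{x}}$) of the population mean.
Figure 7.11 $P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}})$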
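The three percentages above are areas under the standard normal curve. A minimal Python sketch using the standard library's `statistics.NormalDist` reproduces them; the helper name `prob_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def prob_within(k):
    """P(-k < z < k) for a standard normal variable z."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The computed areas can differ from the slide percentages in the fourth decimal place, because values read from a four-decimal normal table are rounded before they are subtracted.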
Example 7-5
Assume that the weights of all packages of a certain
brand of cookies are normally distributed with a mean
of 32 ounces and a standard deviation of 0.3 ounce.
Find the probability that the mean weight, x̄, of a
random sample of 20 packages of this brand of cookies
will be between 31.8 and 31.9 ounces.
$\mu_{\bar{x}} = \mu = 32$ ounces
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{20}} = 0.06708204$ ounce
z Value for a Value of x̄
The z value for a value of x̄ is calculated
as
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
For x̄ = 31.8: $z = \frac{31.8 - 32}{0.06708204} = -2.98$
For x̄ = 31.9: $z = \frac{31.9 - 32}{0.06708204} = -1.49$
P(31.8 < x̄ < 31.9) = P(-2.98 < z < -1.49)
= P(z < -1.49) - P(z < -2.98)
= .0681 - .0014 = .0667
Figure 7.12
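The probability in Example 7-5 can also be computed directly, without a z table. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20

# The population is normal, so x-bar is normal with
# mean mu and standard deviation sigma / sqrt(n).
sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# P(31.8 < x-bar < 31.9) as a difference of cumulative probabilities.
probability = sampling_dist.cdf(31.9) - sampling_dist.cdf(31.8)

print(f"P(31.8 < x-bar < 31.9) = {probability:.4f}")
```

The result agrees with the table-based answer of .0667 to about three decimal places. The small gap arises because the slide rounds each z value to two decimals before the table lookup.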
Example 7-6
According to Sallie Mae surveys and Credit Bureau
data, college students carried an average of $3173 credit
card debt in 2008. Suppose the probability distribution
of the current credit card debts for all college students in
the United States is unknown, but its mean is $3173 and
the standard deviation is $750. Let x̄ be the mean credit
card debt of a random sample of 400 U.S. college
students.
a) What is the probability that the mean of the current
credit card debts for this sample is within $70 of the
population mean?
b) What is the probability that the mean of the current
credit card debts for this sample is lower than the
population mean by $50 or more?
μ = $3173 and σ = $750. The shape of the
probability distribution of the population is
unknown. However, the sampling distribution of x̄
is approximately normal because the sample is large
(n > 30). Its mean and standard deviation are
$\mu_{\bar{x}} = \mu = \$3173$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{750}{\sqrt{400}} = \$37.50$
(a)
For x̄ = $3103: $z = \frac{3103 - 3173}{37.50} = -1.87$
For x̄ = $3243: $z = \frac{3243 - 3173}{37.50} = 1.87$
P($3103 ≤ x̄ ≤ $3243)
= P(-1.87 ≤ z ≤ 1.87) = .9693
- .0307 = .9386
Figure 7.13 P($3103 ≤ x̄ ≤ $3243)
Therefore, the probability that the mean of the
current credit card debts for this sample is within
$70 of the population mean is 0.9386.
(b)
For x̄ = $3123:
$z = \frac{3123 - 3173}{37.50} = -1.33$
P(x̄ ≤ 3123) = P(z ≤ -1.33)
= 0.0918
Figure 7.14 P(x̄ ≤ $3123)
Therefore, the probability that the mean of the
current credit card debts for this sample is lower
than the population mean by $50 or more is
0.0918.
POPULATION AND SAMPLE PROPORTIONS
The population and sample proportions, denoted by p
and p̂, respectively, are calculated as
$p = \frac{X}{N}$ and $\hat{p} = \frac{x}{n}$
where
• N = total number of elements in the population
• n = total number of elements in the sample
• X = number of elements in the population that possess a
specific characteristic
• x = number of elements in the sample that possess a
specific characteristic
Example 7-7
Suppose a total of 789,654 families live in a
city and 563,282 of them own homes. A sample
of 240 families is selected from this city, and
158 of them own homes. Find the proportion of
families who own homes in the population and
in the sample.
$p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = .71$
$\hat{p} = \frac{x}{n} = \frac{158}{240} = .66$
MEAN, STANDARD DEVIATION, AND SHAPE OF
THE SAMPLING DISTRIBUTION OF p̂
• Sampling Distribution of p̂
• Mean and Standard Deviation of p̂
• Shape of the Sampling Distribution of p̂
Sampling Distribution of the Sample Proportion p̂
Definition
The probability distribution of the sample
proportion, p̂, is called its sampling
distribution. It gives the various values that
p̂ can assume and their probabilities.
Example 7-8
Boe Consultant Associates has five
employees. Table 7.6 gives the names of
these five employees and information
concerning their knowledge of statistics.
Table 7.6 Information on the Five Employees of
Boe Consultant Associates
If we define the population proportion, p, as the
proportion of employees who know statistics,
then
p = 3 / 5 = 0.60
Now, suppose we draw all possible samples of
three employees each and compute the proportion
of employees, for each sample, who know
statistics.
$\text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10$
Table 7.7 All Possible Samples of Size 3 and the
Value of p̂ for Each Sample
Table 7.8 Frequency and Relative Frequency
Distribution of p̂ When the Sample Size Is 3
Table 7.9 Sampling Distribution of p̂ When the
Sample Size Is 3
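The construction in Example 7-8 can be sketched in Python. The contents of Table 7.6 are not reproduced in this text, so the employee labels E1 to E5 and the assignment of who knows statistics are hypothetical placeholders. The only fact taken from the example is that 3 of the 5 employees know statistics, and the resulting distribution does not depend on which three.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical stand-in for Table 7.6: True means the employee knows statistics.
# Only the count (3 of 5) comes from the text; the labels are placeholders.
knows_statistics = {"E1": True, "E2": True, "E3": True, "E4": False, "E5": False}

# Population proportion p = X / N.
p = Fraction(sum(knows_statistics.values()), len(knows_statistics))

# Sample proportion p-hat for each of the possible samples of size 3.
sample_proportions = [
    Fraction(sum(knows_statistics[e] for e in sample), 3)
    for sample in combinations(knows_statistics, 3)
]

# Each sample is equally likely, so P(p-hat) = count / number of samples.
counts = Counter(sample_proportions)
sampling_distribution = {
    p_hat: Fraction(count, len(sample_proportions))
    for p_hat, count in sorted(counts.items())
}

for p_hat, prob in sampling_distribution.items():
    print(f"p-hat = {float(p_hat):.2f}  P(p-hat) = {float(prob):.2f}")

# Mean of the sampling distribution of p-hat.
mean_of_p_hat = sum(p_hat * prob for p_hat, prob in sampling_distribution.items())
print("mean of p-hat:", float(mean_of_p_hat), " p:", float(p))
```

In this sketch the mean of p̂ over all ten samples equals the population proportion, 0.60.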
Mean and Standard Deviation of p̂
Mean of the Sample Proportion
The mean of the sample proportion, p̂, is
denoted by $\mu_{\hat{p}}$ and is equal to the population
proportion, p. Thus,
$\mu_{\hat{p}} = p$
Standard Deviation of the Sample Proportion
The standard deviation of the sample proportion, p̂, is
denoted by $\sigma_{\hat{p}}$ and is given by the formula
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$
where p is the population proportion, q = 1 – p, and n is the
sample size. This formula is used when n/N ≤ 0.05, where N is
the population size.
Shape of the Sampling Distribution of p̂
Central Limit Theorem for Sample Proportion
According to the central limit theorem, the sampling
distribution of p̂ is approximately normal for a sufficiently
large sample size. In the case of proportion, the sample
size is considered to be sufficiently large if np and nq are
both greater than 5, that is, if
np > 5 and nq > 5

Example 7-9
According to a survey by Harris Interactive conducted in
February 2009 for the charitable agency World Vision,
56% of U.S. teens volunteer time for charitable causes.
Assume that this result is true for the current population
of all U.S. teens. Let p̂ be the proportion of U.S. teens in
a random sample of 1500 who volunteer time for
charitable causes. Find the mean and standard deviation
of p̂ and describe the shape of its sampling distribution.
$p = .56$ and $q = 1 - p = 1 - .56 = .44$
$\mu_{\hat{p}} = p = .56$
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.56)(.44)}{1500}} = .0128$
$np = 1500(.56) = 840$ and $nq = 1500(.44) = 660$
np and nq are both greater than 5.
Therefore, the sampling distribution of p̂
is approximately normal (by the central limit
theorem) with a mean of 0.56 and a standard
deviation of 0.0128, as shown in Figure 7.15.
Figure 7.15
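The steps of Example 7-9 (mean, standard deviation, and the np > 5, nq > 5 check) can be bundled into one helper. This is a minimal sketch with a hypothetical function name. It assumes n/N ≤ 0.05, which the example does not verify explicitly.

```python
import math


def sample_proportion_summary(p, n):
    """Mean, standard deviation, and large-sample check for p-hat.

    Assumes n/N <= 0.05 so that sqrt(p * q / n) applies.
    """
    q = 1 - p
    mean = p
    sd = math.sqrt(p * q / n)
    # Rule from the text: the sample is large enough if np > 5 and nq > 5.
    approximately_normal = n * p > 5 and n * q > 5
    return mean, sd, approximately_normal


# Example 7-9: p = 0.56, n = 1500.
mean, sd, approximately_normal = sample_proportion_summary(p=0.56, n=1500)
print(f"mean of p-hat = {mean:.2f}")
print(f"sd of p-hat   = {sd:.4f}")
print(f"approximately normal: {approximately_normal}")
```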
Applications of the Sampling Distribution of p̂
Example 7-10
According to the BBMG Conscious Consumer Report, 51% of
the adults surveyed said that they are willing to pay more for
products with social and environmental benefits despite the
current tough economic times (USA TODAY, June 8, 2009).
Suppose that this result is true for the current population of adult
Americans. Let p̂ be the proportion in a random sample of
1050 adult Americans who hold this opinion. Find the
probability that the value of p̂ is between 0.53 and 0.55.
n = 1050, p = 0.51, and q = 1 – p = 1 – 0.51 = 0.49
$\mu_{\hat{p}} = p = .51$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.51)(.49)}{1050}} = .01542725$
$np = 1050(.51) = 535.5 > 5$ and $nq = 1050(.49) = 514.5 > 5$
We can infer from the central limit theorem that the
sampling distribution of p̂ is approximately normal.
Figure 7.16 $P(.53 < \hat{p} < .55)$
z Value for a Value of p̂
The z value for a value of p̂ is calculated as
$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$
For p̂ = 0.53: $z = \frac{.53 - .51}{.01542725} = 1.30$
For p̂ = 0.55: $z = \frac{.55 - .51}{.01542725} = 2.59$
P(.53 < p̂ < .55) = P(1.30 < z < 2.59)
= 0.9952 - 0.9032
= 0.0920
Thus, the probability is .0920 that the
proportion of U.S. adults in a random sample of
1050 who are willing to pay more for
products with social and environmental benefits
despite the current tough economic times is
between .53 and .55.
Figure 7.17 $P(.53 < \hat{p} < .55)$
Example 7-11
Maureen Webster, who is running for mayor in a
large city, claims that she is favored by 53% of all
eligible voters of that city. Assume that this claim is
true. What is the probability that in a random sample
of 400 registered voters taken from this city, less
than 49% will favor Maureen Webster?
n = 400, p = 0.53, and q = 1 – p = 1 – 0.53 = 0.47
$\mu_{\hat{p}} = p = .53$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.53)(.47)}{400}} = .02495496$
Because np = 400(.53) = 212 and nq = 400(.47) = 188 are
both greater than 5, the sampling distribution of p̂ is
approximately normal.
$z = \frac{.49 - .53}{.02495496} = -1.60$
P(p̂ < 0.49) = P(z < -1.60)
= 0.0548
Hence, the probability that less than 49% of the voters
in a random sample of 400 will favor Maureen
Webster is .0548.
Figure 7.18 $P(\hat{p} < .49)$
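The probabilities in Examples 7-10 and 7-11 can be reproduced with the normal approximation in a few lines of Python. The helper name `prob_p_hat_below` is hypothetical. The sketch assumes np > 5, nq > 5, and n/N ≤ 0.05, as in the examples.

```python
import math
from statistics import NormalDist


def prob_p_hat_below(bound, p, n):
    """Approximate P(p-hat < bound) using the normal approximation.

    Assumes np > 5, nq > 5, and n/N <= 0.05.
    """
    sd = math.sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(bound)


# Example 7-11: P(p-hat < 0.49) with p = 0.53, n = 400.
below_49 = prob_p_hat_below(0.49, p=0.53, n=400)

# Example 7-10: P(0.53 < p-hat < 0.55) with p = 0.51, n = 1050.
between_53_and_55 = (
    prob_p_hat_below(0.55, p=0.51, n=1050) - prob_p_hat_below(0.53, p=0.51, n=1050)
)

print(f"P(p-hat < .49)       = {below_49:.4f}")
print(f"P(.53 < p-hat < .55) = {between_53_and_55:.4f}")
```

The computed values agree with the table-based answers (.0548 and .0920) to within about 0.001. The difference comes from rounding z to two decimals before the table lookup.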
