
CHAPTER 7

SAMPLING
DISTRIBUTIONS
As defined in Chapter 3, a sample statistic is a numerical summary measure calculated for sample data. The mean, median, mode, and standard deviation calculated for sample data are called sample statistics.

On the other hand, the same numerical summary measures calculated for population data are called population parameters.
A population parameter is always a constant, whereas a
sample statistic is always a random variable. Because every
random variable must possess a probability distribution,
each sample statistic possesses a probability distribution.

The probability distribution of a sample statistic is more commonly called its sampling distribution. In this chapter we discuss the sampling distributions of the sample mean and the sample proportion.
POPULATION AND SAMPLING
DISTRIBUTIONS
• Population Distribution
• Sampling Distribution
Population Distribution

Definition

The population distribution is the probability distribution of the population data.
Population Distribution

• Suppose there are only five students in an advanced statistics class and the midterm scores of these five students are
70 78 80 80 95

• Let x denote the score of a student.


Table 7.1 Population Frequency and Relative
Frequency Distributions
Table 7.2 Population Probability Distribution
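Both tables follow directly from the five scores listed above. As a minimal Python sketch (an illustration added here, not part of the original slides), the code below tallies each distinct score, turns the counts into relative frequencies, which serve as the probabilities P(x) of Table 7.2, and computes the population mean μ that Example 7-1 uses later. The helper name `population_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Midterm scores of the five students: here the class is the whole population
scores = [70, 78, 80, 80, 95]

def population_distribution(data: list[int]) -> dict[int, Fraction]:
    """Map each distinct value of x to its relative frequency, i.e. P(x)."""
    counts = Counter(data)
    return {x: Fraction(f, len(data)) for x, f in sorted(counts.items())}

dist = population_distribution(scores)
for x, prob in dist.items():
    print(f"x = {x}   f = {scores.count(x)}   P(x) = {float(prob):.2f}")

# Population mean: mu = sum of x * P(x)
mu = sum(x * prob for x, prob in dist.items())
print(f"mu = {float(mu):.2f}")  # mu = 80.60
```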
Sampling Distribution

Definition

The probability distribution of x̄ is called its sampling distribution. It lists the various values that x̄ can assume and the probability of each value of x̄.
In general, the probability distribution of a sample statistic is called its sampling distribution.
Sampling Distribution

• Reconsider the population of midterm scores of five students given in Table 7.1.
• Consider all possible samples of three scores each that can be selected, without replacement, from that population.
• The total number of possible samples is
\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \]
Sampling Distribution

• Suppose we assign the letters A, B, C, D, and E to the scores of the five students so that
A = 70, B = 78, C = 80, D = 80, E = 95

• Then, the 10 possible samples of three scores each are

ABC, ABD, ABE, ACD, ACE,
ADE, BCD, BCE, BDE, CDE


Table 7.3 All Possible Samples and Their Means
When the Sample Size Is 3
Table 7.4 Frequency and Relative Frequency
Distributions of x̄ When the Sample Size Is 3
Table 7.5 Sampling Distribution of x̄ When the
Sample Size Is 3
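The enumeration behind Tables 7.3 to 7.5 can be sketched in Python as follows (a minimal illustration, not the textbook's own procedure). It lists all ₅C₃ = 10 samples drawn without replacement, computes each sample mean exactly with `Fraction` to avoid rounding, and groups equal means into a probability distribution. The helper name `sampling_distribution_of_mean` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Scores of the five students, labeled A-E as in the text
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

def sampling_distribution_of_mean(population: dict[str, int], n: int) -> dict[Fraction, Fraction]:
    """Enumerate every sample of size n drawn without replacement and
    return {value of x̄: probability}, each sample being equally likely."""
    samples = list(combinations(sorted(population), n))
    means = [Fraction(sum(population[label] for label in s), n) for s in samples]
    counts = Counter(means)
    return {m: Fraction(f, len(samples)) for m, f in sorted(counts.items())}

dist = sampling_distribution_of_mean(scores, 3)
for xbar, p in dist.items():
    print(f"x̄ = {float(xbar):6.2f}   P(x̄) = {float(p):.2f}")

# The mean of the sampling distribution equals the population mean, 80.60
print(float(sum(xbar * p for xbar, p in dist.items())))
```

The same function accepts any sample size n, so calling it with n = 2 or n = 4 shows how the distribution of x̄ changes as the sample grows.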
SAMPLING AND NONSAMPLING ERRORS
Definition

Sampling error is the difference between the value of a sample statistic and the value of the corresponding population parameter. In the case of the mean,

\[ \text{Sampling error} = \bar{x} - \mu \]

assuming that the sample is random and no nonsampling error has been made.
SAMPLING AND NONSAMPLING ERRORS

Definition

The errors that occur in the collection, recording, and tabulation of data are called nonsampling errors.
Reasons for the Occurrence of Nonsampling Errors

1. If a sample is nonrepresentative, the sample results may be too different from the census results.
2. The questions may be phrased in such a way that they are not fully understood by the members of the sample or population.
3. The respondents may intentionally give false information in response to some sensitive questions.
4. The poll taker may make a mistake and enter a wrong number in the records or make an error while entering the data on a computer.
Example 7-1

• Reconsider the population of five scores given in Table 7.1. Suppose one sample of three scores is selected from this population, and this sample includes the scores 70, 80, and 95. Find the sampling error.
Example 7-1: Solution
\[ \mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60 \]
\[ \bar{x} = \frac{70 + 80 + 95}{3} = 81.67 \]
\[ \text{Sampling error} = \bar{x} - \mu = 81.67 - 80.60 = 1.07 \]

That is, the mean score estimated from the sample is 1.07 higher than the mean score of the population.
SAMPLING AND NONSAMPLING ERRORS

Now suppose, when we select the sample of three scores, we mistakenly record the second score as 82 instead of 80.

As a result, we calculate the sample mean as

\[ \bar{x} = \frac{70 + 82 + 95}{3} = 82.33 \]
SAMPLING AND NONSAMPLING ERRORS

The difference between this sample mean and the population mean is
\[ \bar{x} - \mu = 82.33 - 80.60 = 1.73 \]

This entire difference does not represent the sampling error.

Only 1.07 of this difference is due to the sampling error.
SAMPLING AND NONSAMPLING ERRORS

• The remaining portion represents the nonsampling error.
• It is equal to 1.73 − 1.07 = 0.66.
• It occurred due to the error we made in recording the second score in the sample.
• Also,
\[ \text{Nonsampling error} = \text{Incorrect } \bar{x} - \text{Correct } \bar{x} = 82.33 - 81.67 = .66 \]
Figure 7.1 Sampling and nonsampling errors.
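The arithmetic of Example 7-1 and of the recording mistake can be checked with a short Python sketch; the variable names are illustrative. Computed without intermediate rounding, the nonsampling error is exactly 2/3, which prints as 0.67; the text reports .66 because it subtracts values that were already rounded to two decimals.

```python
from statistics import mean

population = [70, 78, 80, 80, 95]
correct_sample = [70, 80, 95]    # the sample actually selected
recorded_sample = [70, 82, 95]   # second score mis-recorded as 82

mu = mean(population)
correct_xbar = mean(correct_sample)
recorded_xbar = mean(recorded_sample)

sampling_error = correct_xbar - mu                  # due to chance alone
total_difference = recorded_xbar - mu               # what we would actually observe
nonsampling_error = recorded_xbar - correct_xbar    # due to the recording mistake

print(f"sampling error    = {sampling_error:.2f}")
print(f"total difference  = {total_difference:.2f}")
print(f"nonsampling error = {nonsampling_error:.2f}")
```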
MEAN AND STANDARD DEVIATION OF x̄

Definition

The mean and standard deviation of the sampling distribution of x̄ are called the mean and standard deviation of x̄ and are denoted by \( \mu_{\bar{x}} \) and \( \sigma_{\bar{x}} \), respectively.
MEAN AND STANDARD DEVIATION OF x̄
• Mean of the Sampling Distribution of x̄
• The mean of the sampling distribution of x̄ is always equal to the mean of the population. Thus,
\[ \mu_{\bar{x}} = \mu \]
MEAN AND STANDARD DEVIATION OF x̄
• Standard Deviation of the Sampling Distribution of x̄
• The standard deviation of the sampling distribution of x̄ is
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
where σ is the standard deviation of the population and n is the sample size. This formula is used when n/N ≤ 0.05, where N is the population size.
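The condition n/N ≤ 0.05 exists because σ/√n assumes the sample is a negligible fraction of the population. When it is not, a standard adjustment multiplies σ/√n by the finite population correction factor √((N − n)/(N − 1)), which this section does not otherwise cover. The Python sketch below (an illustration under that assumption) checks the adjustment exactly on the five-score population of Table 7.1, where n/N = 3/5 is far above 0.05, by comparing it with the variance of the 10 enumerated sample means. The helper name `pvariance_exact` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

scores = [70, 78, 80, 80, 95]   # population of Table 7.1
N, n = len(scores), 3

def pvariance_exact(values) -> Fraction:
    """Population variance (divide by the number of values), computed exactly."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Variance of x̄ obtained by enumerating all 10 equally likely samples
sample_means = [Fraction(sum(s), n) for s in combinations(scores, n)]
var_xbar = pvariance_exact(sample_means)

sigma2 = pvariance_exact(scores)                 # population variance
without_fpc = sigma2 / n                         # (sigma / sqrt(n)) ** 2
with_fpc = sigma2 / n * Fraction(N - n, N - 1)   # times (N - n) / (N - 1)

print(f"sigma = {sqrt(sigma2):.4f}, sigma_xbar = {sqrt(var_xbar):.4f}")
print(f"sigma / sqrt(n) = {sqrt(without_fpc):.4f}")
print(var_xbar == with_fpc)   # True: the corrected formula matches the enumeration
```

For n/N ≤ 0.05 the correction factor is at least √0.95 ≈ 0.975, close to 1, which is why the plain formula σ/√n is used in that case.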
Two Important Observations

1. The spread of the sampling distribution of x̄ is smaller than the spread of the corresponding population distribution, i.e.,
\[ \sigma_{\bar{x}} < \sigma \]
2. The standard deviation of the sampling distribution of x̄ decreases as the sample size increases.
Example 7-2

The mean wage per hour for all 5000 employees who work at a large company is $27.50, and the standard deviation is $3.70.

Let x̄ be the mean wage per hour for a random sample of employees selected from this company. Find the mean and standard deviation of x̄ for a sample size of
• (a) 30 (b) 75 (c) 200
Example 7-2: Solution

(a) N = 5000, μ = $27.50, σ = $3.70.

In this case, n/N = 30/5000 = 0.006 < 0.05.

\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$.676 \]
Example 7-2: Solution

(b) N = 5000, μ = $27.50, σ = $3.70.

In this case, n/N = 75/5000 = 0.015 < 0.05.

\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$.427 \]
Example 7-2: Solution

(c) In this case, n = 200 and n/N = 200/5000 = 0.04, which is less than 0.05.

\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$.262 \]
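The three parts of Example 7-2 reduce to one formula evaluated at different values of n. A short Python sketch, with the n/N ≤ 0.05 check made explicit, reproduces the answers and makes observation 2 visible: σ_x̄ shrinks as n grows.

```python
from math import sqrt

N, mu, sigma = 5000, 27.50, 3.70   # population size, mean and sd of hourly wages

results = {}
for n in (30, 75, 200):
    # sigma / sqrt(n) is only appropriate when n/N <= 0.05
    if n / N > 0.05:
        raise ValueError(f"n/N = {n / N:.3f} > 0.05; a finite population correction is needed")
    results[n] = sigma / sqrt(n)
    print(f"n = {n:3d}: n/N = {n / N:.3f}, mean of x̄ = ${mu:.2f}, sd of x̄ = ${results[n]:.3f}")
```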
SHAPE OF THE SAMPLING DISTRIBUTION
OF x̄

The shape of the sampling distribution of x̄ depends on which of two cases holds:
• The population from which samples are drawn has a normal distribution.
• The population from which samples are drawn does not have a normal distribution.
Sampling From a Normally Distributed
Population

• If the population from which the samples are drawn is normally distributed with mean μ and standard deviation σ, then the sampling distribution of the sample mean, x̄, will also be normally distributed with the following mean and standard deviation, irrespective of the sample size:

\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
Figure 7.2 Population distribution and sampling distributions of x̄.
Example 7-3

• In a recent SAT, the mean score for all examinees was 1020. Assume that the distribution of SAT scores of all examinees is normal with a mean of 1020 and a standard deviation of 153. Let x̄ be the mean SAT score of a random sample of examinees. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is
• (a) 16 (b) 50 (c) 1000
Example 7-3: Solution

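(a) μ = 1020 and σ = 153.

\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250 \]

Because the population is normally distributed, the sampling distribution of x̄ is normal for this and every other sample size.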
Figure 7.3
Example 7-3: Solution

(b)

\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637 \]
Figure 7.4
Example 7-3: Solution

(c)

\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838 \]
Figure 7.5
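Because the SAT population in Example 7-3 is assumed normal, the whole sampling distribution of x̄ can be represented as a normal distribution object. The Python sketch below uses the standard library's `statistics.NormalDist`, chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 1020, 153   # SAT scores, assumed normal in Example 7-3

sampling_dists = {}
for n in (16, 50, 1000):
    # For a normal population, x̄ is normally distributed for every n
    sampling_dists[n] = NormalDist(mu, sigma / sqrt(n))
    d = sampling_dists[n]
    print(f"n = {n:4d}: mean of x̄ = {d.mean:.0f}, sd of x̄ = {d.stdev:.3f}")
```

Each `NormalDist` object can also answer probability questions about x̄ through its `cdf` method.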
Sampling From a Population That Is Not
Normally Distributed

• Central Limit Theorem

• According to the central limit theorem, for a large sample size, the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution. The mean and standard deviation of the sampling distribution of x̄ are

\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
• The sample size is usually considered to be large if n ≥ 30.
Figure 7.6 Population distribution and sampling distributions of x̄.
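The central limit theorem can be illustrated by simulation. The Python sketch below assumes an exponential population with mean 1 and standard deviation 1, chosen only as a convenient, strongly right-skewed stand-in (it is not one of the populations in this chapter). It draws 20,000 samples for n = 2 and for n = 30 and reports the mean, standard deviation, and skewness of the resulting sample means; the seed and repetition count are arbitrary choices, and the helper names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness: 0 for a symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n: int, reps: int, rng: random.Random) -> list[float]:
    """Draw `reps` samples of size n from an exponential population
    (mean 1, sd 1, right-skewed) and return their means."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(7)
results = {}
for n in (2, 30):
    means = simulate_sample_means(n, 20_000, rng)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    print(f"n = {n:2d}: mean of x̄ = {results[n][0]:.3f}, "
          f"sd of x̄ = {results[n][1]:.3f} (sigma/sqrt(n) = {1 / sqrt(n):.3f}), "
          f"skewness = {results[n][2]:.2f}")
```

The mean of x̄ stays near μ = 1, its standard deviation tracks σ/√n, and the skewness falls sharply as n increases, which is the behavior the theorem describes.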
Example 7-4

• The mean rent paid by all tenants in a small city is $1550 with a standard deviation of $225. However, the population distribution of rents for all tenants in this city is skewed to the right. Calculate the mean and standard deviation of x̄ and describe the shape of its sampling distribution when the sample size is
• (a) 30 (b) 100
Example 7-4: Solution

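(a) Let x̄ be the mean rent paid by a sample of 30 tenants.

\[ \mu_{\bar{x}} = \mu = \$1550 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.079 \]

Although the population is skewed, n = 30 is large, so by the central limit theorem the sampling distribution of x̄ is approximately normal.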
Figure 7.7
Example 7-4: Solution

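(b) Let x̄ be the mean rent paid by a sample of 100 tenants.

\[ \mu_{\bar{x}} = \mu = \$1550 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.500 \]

Again n is large, so the sampling distribution of x̄ is approximately normal, with a smaller spread than in part (a).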
Figure 7.8
APPLICATIONS OF THE SAMPLING
DISTRIBUTION OF x̄

1. If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 68.26% of the sample means will be within one standard deviation of x̄ (that is, within \( 1\sigma_{\bar{x}} \)) of the population mean.
Figure 7.9 \( P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}}) \)
APPLICATIONS OF THE SAMPLING
DISTRIBUTION OF x̄

2. If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 95.44% of the sample means will be within two standard deviations of x̄ (that is, within \( 2\sigma_{\bar{x}} \)) of the population mean.
Figure 7.10 \( P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}}) \)
APPLICATIONS OF THE SAMPLING
DISTRIBUTION OF x̄

3. If we take all possible samples of the same (large) size from a population and calculate the mean for each of these samples, then about 99.74% of the sample means will be within three standard deviations of x̄ (that is, within \( 3\sigma_{\bar{x}} \)) of the population mean.
Figure 7.11 \( P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}}) \)
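The three percentages come from the standard normal table. A Python sketch using `statistics.NormalDist` computes the same areas directly; the unrounded values are 68.27%, 95.45%, and 99.73%, which differ from the table-based 68.26%, 95.44%, and 99.74% only in the last digit because the table rounds each area to four decimals.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, sd 1

shares = {}
for k in (1, 2, 3):
    # P(mu - k*sigma_xbar <= x̄ <= mu + k*sigma_xbar) = P(-k <= z <= k)
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"within {k} standard deviation(s) of mu: {shares[k]:.2%}")
```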
Example 7-5

Assume that the weights of all packages of a certain brand of cookies are normally distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the mean weight, x̄, of a random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 ounces.
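Example 7-5: Solution

\[ \mu_{\bar{x}} = \mu = 32 \text{ ounces} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.3}{\sqrt{20}} = .06708204 \text{ ounce} \]

Because the weights are normally distributed, the sampling distribution of x̄ is normal even though n = 20 is less than 30.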
z Value for a Value of x̄

The z value for a value of x̄ is calculated as

\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
Example 7-5: Solution

• For x̄ = 31.8: \( z = \frac{31.8 - 32}{.06708204} = -2.98 \)
• For x̄ = 31.9: \( z = \frac{31.9 - 32}{.06708204} = -1.49 \)

• P(31.8 < x̄ < 31.9) = P(−2.98 < z < −1.49)
= P(z < −1.49) − P(z < −2.98)
= .0681 − .0014 = .0667
Figure 7.12
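Example 7-5 can be checked with `statistics.NormalDist` in Python (an illustrative sketch). Working with unrounded z values gives a probability of about .0666 rather than .0667, a difference that comes only from rounding z to two decimals before using the table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32, 0.3, 20
se = sigma / sqrt(n)               # sigma_xbar, about .06708204 ounce
xbar_dist = NormalDist(mu, se)     # normal because package weights are normal

z_low, z_high = (31.8 - mu) / se, (31.9 - mu) / se
prob = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)
print(f"z from {z_low:.2f} to {z_high:.2f}; P(31.8 < x̄ < 31.9) = {prob:.4f}")
```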
Example 7-6

According to Sallie Mae surveys and Credit Bureau data, college students carried an average of $3173 credit card debt in 2008. Suppose the probability distribution of the current credit card debts for all college students in the United States is unknown, but its mean is $3173 and its standard deviation is $750. Let x̄ be the mean credit card debt of a random sample of 400 U.S. college students.
Example 7-6

a) What is the probability that the mean of the current credit card debts for this sample is within $70 of the population mean?

b) What is the probability that the mean of the current credit card debts for this sample is lower than the population mean by $50 or more?
Example 7-6: Solution

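μ = $3173 and σ = $750. The shape of the probability distribution of the population is unknown. However, the sampling distribution of x̄ is approximately normal because the sample is large (n > 30). Its mean and standard deviation are

\[ \mu_{\bar{x}} = \mu = \$3173 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{750}{\sqrt{400}} = \$37.50 \]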
Example 7-6: Solution
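(a)
• For x̄ = $3103: \( z = \frac{3103 - 3173}{37.50} = -1.87 \); for x̄ = $3243: \( z = \frac{3243 - 3173}{37.50} = 1.87 \)
• P($3103 ≤ x̄ ≤ $3243) = P(−1.87 ≤ z ≤ 1.87)
= .9693 − .0307 = .9386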
Figure 7.13 P($3103 ≤ x̄ ≤ $3243)
Example 7-6: Solution

Therefore, the probability that the mean of the current credit card debts for this sample is within $70 of the population mean is 0.9386.
Example 7-6: Solution
(b)
A sample mean lower than μ by $50 or more means x̄ ≤ 3173 − 50 = $3123.
• For x̄ = $3123:
\[ z = \frac{3123 - 3173}{37.50} = -1.33 \]
• P(x̄ ≤ 3123) = P(z ≤ −1.33)
= 0.0918
Figure 7.14 P(x̄ ≤ $3123)
Example 7-6: Solution

Therefore, the probability that the mean of the current credit card debts for this sample is lower than the population mean by $50 or more is 0.0918.
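Both parts of Example 7-6 can be reproduced with a Python sketch that models x̄ as normal with mean $3173 and standard deviation $37.50, as the central limit theorem justifies here. Because it does not round z to two decimals, its answers can differ from .9386 and .0918 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3173, 750, 400
# Population shape unknown, but n is large, so x̄ is approximately normal (CLT)
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # sigma_xbar = 37.50

p_within_70 = xbar_dist.cdf(mu + 70) - xbar_dist.cdf(mu - 70)   # part (a)
p_low_by_50 = xbar_dist.cdf(mu - 50)                            # part (b)
print(f"(a) {p_within_70:.4f}   (b) {p_low_by_50:.4f}")
```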
POPULATION AND SAMPLE PROPORTIONS

The population and sample proportions, denoted by p and p̂, respectively, are calculated as

\[ p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n} \]
POPULATION AND SAMPLE
PROPORTIONS

where
• N = total number of elements in the population
• n = total number of elements in the sample
• X = number of elements in the population that possess a specific characteristic
• x = number of elements in the sample that possess a specific characteristic
Example 7-7

Suppose a total of 789,654 families live in a city and 563,282 of them own homes. A sample of 240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own homes in the population and in the sample.
Example 7-7: Solution

\[ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = .71 \]
\[ \hat{p} = \frac{x}{n} = \frac{158}{240} = .66 \]
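The two proportions in Example 7-7 are simple ratios. A tiny Python sketch with a hypothetical helper `proportion` makes the parallel between p = X/N and p̂ = x/n explicit.

```python
def proportion(count: int, total: int) -> float:
    """Share of elements that possess the characteristic (X/N or x/n)."""
    return count / total

p = proportion(563_282, 789_654)   # population proportion of home owners
p_hat = proportion(158, 240)       # sample proportion of home owners
print(f"p = {p:.2f}, p̂ = {p_hat:.2f}")
```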
MEAN, STANDARD DEVIATION, AND SHAPE OF
THE SAMPLING DISTRIBUTION OF p̂

• Sampling Distribution of p̂
• Mean and Standard Deviation of p̂
• Shape of the Sampling Distribution of p̂

Sampling Distribution of the Sample Proportion p̂

Definition

The probability distribution of the sample proportion, p̂, is called its sampling distribution. It gives the various values that p̂ can assume and their probabilities.
Example 7-8

Boe Consultant Associates has five employees. Table 7.6 gives the names of these five employees and information concerning their knowledge of statistics.
Table 7.6 Information on the Five Employees of
Boe Consultant Associates
Example 7-8

If we define the population proportion, p, as the proportion of employees who know statistics, then

p = 3 / 5 = 0.60
Example 7-8

Now, suppose we draw all possible samples of three employees each and compute the proportion of employees, for each sample, who know statistics.

\[ \text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \]
Table 7.7 All Possible Samples of Size 3 and the
Value of p̂ for Each Sample
Table 7.8 Frequency and Relative Frequency
Distribution of p̂ When the Sample Size Is 3
Table 7.9 Sampling Distribution of p̂ When the
Sample Size Is 3
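The details of Table 7.6 are not available in this text, so the Python sketch below uses hypothetical labels E1 to E5 and assumes only what the text states: 3 of the 5 employees know statistics. Which three they are does not affect the distribution of p̂. The sketch enumerates the 10 samples of size 3 and groups the resulting values of p̂, in the spirit of Tables 7.7 to 7.9.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical labels; only the count (3 of 5 know statistics) comes from the text
knows_statistics = {"E1": True, "E2": True, "E3": True, "E4": False, "E5": False}

samples = list(combinations(knows_statistics, 3))   # 5C3 = 10 samples
p_hats = [Fraction(sum(knows_statistics[e] for e in s), 3) for s in samples]

dist = {v: Fraction(c, len(samples)) for v, c in sorted(Counter(p_hats).items())}
for v, prob in dist.items():
    print(f"p̂ = {float(v):.2f}   P(p̂) = {float(prob):.2f}")

print(float(sum(v * prob for v, prob in dist.items())))  # mean of p̂ equals p = 0.6
```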
Mean and Standard Deviation of p̂

Mean of the Sample Proportion

The mean of the sample proportion, p̂, is denoted by \( \mu_{\hat{p}} \) and is equal to the population proportion, p. Thus,

\[ \mu_{\hat{p}} = p \]
Mean and Standard Deviation of p̂
Standard Deviation of the Sample Proportion

The standard deviation of the sample proportion, p̂, is denoted by \( \sigma_{\hat{p}} \) and is given by the formula
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \]
where p is the population proportion, q = 1 − p, and n is the sample size. This formula is used when n/N ≤ 0.05, where N is the population size.
Shape of the Sampling Distribution of p̂

Central Limit Theorem for Sample Proportion

According to the central limit theorem, the sampling distribution of p̂ is approximately normal for a sufficiently large sample size. In the case of proportion, the sample size is considered to be sufficiently large if np and nq are both greater than 5, that is, if

np > 5 and nq > 5


Example 7-9

According to a survey by Harris Interactive conducted in February 2009 for the charitable agency World Vision, 56% of U.S. teens volunteer time for charitable causes. Assume that this result is true for the current population of all U.S. teens. Let p̂ be the proportion of U.S. teens in a random sample of 1500 who volunteer time for charitable causes. Find the mean and standard deviation of p̂ and describe the shape of its sampling distribution.
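Example 7-9: Solution

\[ p = .56 \quad \text{and} \quad q = 1 - p = 1 - .56 = .44 \]
\[ \mu_{\hat{p}} = p = .56 \]
\[ \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.56)(.44)}{1500}} = .0128 \]
\[ np = 1500(.56) = 840 \quad \text{and} \quad nq = 1500(.44) = 660 \]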
Example 7-9: Solution

np and nq are both greater than 5.

Therefore, the sampling distribution of p̂ is approximately normal (by the central limit theorem) with a mean of 0.56 and a standard deviation of 0.0128, as shown in Figure 7.15.
Figure 7.15
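The three facts used in Example 7-9 (μ_p̂ = p, σ_p̂ = √(pq/n), and the np > 5, nq > 5 rule) fit in one small Python function. The name `phat_sampling_distribution` is hypothetical, and the function assumes n/N ≤ 0.05 without checking it, since N is not given for the population of U.S. teens.

```python
from math import sqrt

def phat_sampling_distribution(p: float, n: int) -> tuple[float, float, bool]:
    """Return (mean, standard deviation, approximately_normal) for p̂.

    Assumes n/N <= 0.05 so that sqrt(pq/n) applies, and uses the
    rule that the normal approximation needs np > 5 and nq > 5.
    """
    q = 1 - p
    mean = p
    sd = sqrt(p * q / n)
    approximately_normal = n * p > 5 and n * q > 5
    return mean, sd, approximately_normal

mean, sd, normal = phat_sampling_distribution(0.56, 1500)
print(f"mean = {mean:.2f}, sd = {sd:.4f}, approximately normal: {normal}")
```

With a small sample such as n = 8, np = 4.48, so the function reports that the normal approximation should not be used.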
Applications of the Sampling Distribution of p̂
Example 7-10
According to the BBMG Conscious Consumer Report, 51% of the adults surveyed said that they are willing to pay more for products with social and environmental benefits despite the current tough economic times (USA TODAY, June 8, 2009). Suppose that this result is true for the current population of adult Americans. Let p̂ be the proportion in a random sample of 1050 adult Americans who hold this opinion. Find the probability that the value of p̂ is between 0.53 and 0.55.
Example 7-10: Solution

n = 1050, p = 0.51, and q = 1 − p = 1 − 0.51 = 0.49

\[ \mu_{\hat{p}} = p = .51 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.51)(.49)}{1050}} = .01542725 \]
\[ np = 1050(.51) = 535.5 > 5 \quad \text{and} \quad nq = 1050(.49) = 514.5 > 5 \]

We can infer from the central limit theorem that the sampling distribution of p̂ is approximately normal.
Figure 7.16 P(.53 < p̂ < .55)
z Value for a Value of p̂

The z value for a value of p̂ is calculated as

\[ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \]
Example 7-10: Solution
• For p̂ = 0.53: \( z = \frac{.53 - .51}{.01542725} = 1.30 \)
• For p̂ = 0.55: \( z = \frac{.55 - .51}{.01542725} = 2.59 \)

• P(.53 < p̂ < .55) = P(1.30 < z < 2.59)
= 0.9952 − 0.9032
= 0.0920
Example 7-10: Solution

Thus, the probability is .0920 that the proportion of U.S. adults in a random sample of 1050 who will be willing to pay more for products with social and environmental benefits despite the current tough economic times is between .53 and .55.
Figure 7.17 P(.53 < p̂ < .55)
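A Python sketch of Example 7-10 using `statistics.NormalDist`, with the np > 5 and nq > 5 check made explicit. Because it uses unrounded z values, it returns a probability slightly above the table answer of .0920 (about .093).

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.51, 1050
if not (n * p > 5 and n * (1 - p) > 5):
    raise ValueError("np and nq must both exceed 5 for the normal approximation")

se = sqrt(p * (1 - p) / n)            # sigma_phat, about .01542725
phat_dist = NormalDist(p, se)
prob = phat_dist.cdf(0.55) - phat_dist.cdf(0.53)
print(f"sigma_phat = {se:.8f}; P(.53 < p̂ < .55) = {prob:.4f}")
```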
Example 7-11

Maureen Webster, who is running for mayor in a large city, claims that she is favored by 53% of all eligible voters of that city. Assume that this claim is true. What is the probability that in a random sample of 400 registered voters taken from this city, less than 49% will favor Maureen Webster?
Example 7-11: Solution

n = 400, p = 0.53, and q = 1 − p = 1 − 0.53 = 0.47

\[ \mu_{\hat{p}} = p = .53 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.53)(.47)}{400}} = .02495496 \]
Example 7-11: Solution
\[ z = \frac{.49 - .53}{.02495496} = -1.60 \]

P(p̂ < 0.49) = P(z < −1.60)
= 0.0548

Hence, the probability that less than 49% of the voters in a random sample of 400 will favor Maureen Webster is .0548.
Figure 7.18 P(p̂ < .49)
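Example 7-11 in the same style: a Python sketch that also prints np and nq (212 and 188), both well above 5, so the normal approximation is justified.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.53, 400                      # claimed support and sample size
se = sqrt(p * (1 - p) / n)            # sigma_phat, about .02495496
z = (0.49 - p) / se
prob = NormalDist(p, se).cdf(0.49)    # P(p̂ < .49) if the claim is true
print(f"np = {n * p:.0f}, nq = {n * (1 - p):.0f}, z = {z:.2f}, P(p̂ < .49) = {prob:.4f}")
```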
