Business Statistics: Fourth Canadian Edition

Chapter 10
Sampling Distributions
Copyright © 2021 Pearson Canada Inc.

Ch. 10: Sampling Distribution
Learning Objectives
1) Understand how variations among multiple samples can
be represented in a sampling distribution
2) Calculate the sampling distribution (mean and variance)
of a proportion
3) Calculate the sampling distribution (mean and variance)
of a mean

10.1 Modeling Sample Proportions
To learn more about the variability, we have to imagine.
We probably will never know the value of the true proportion
of an event in the population. But it is important to us, so
we’ll give it a label, p for “true proportion.”
Imagine
We see only the sample we actually
drew, but if we imagine the results of
all the other possible samples we
could have drawn (by modelling or
simulating them), we can learn more.

A simulation is when we use a computer to pretend to draw
random samples from some population of values over and
over
A simulation can help us understand how sample proportions
vary due to random sampling

When we have only two possible outcomes for an event,
label one of them “success” and the other “failure”
In a simulation, we set the true proportion of successes to a
known value, draw random samples, and then record the
sample proportion of successes, which we denote by p̂, for
each sample
Even though the p̂ ’s vary from sample to sample, they do so
in a way that we can model and understand
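
The simulation described above can be sketched in a few lines of Python. This is only an illustration: the function name `simulate_sample_proportions`, the true proportion 0.2, the sample size 100, and the 2,000 repetitions are all assumed values, not taken from the chapter.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each draw counts as a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Hypothetical settings: true proportion 0.2, samples of size 100, 2,000 samples.
p_hats = simulate_sample_proportions(p=0.2, n=100, num_samples=2000)
print("mean of the p-hats:", round(statistics.mean(p_hats), 3))
print("SD of the p-hats:  ", round(statistics.pstdev(p_hats), 3))
```

The individual p̂ values differ from sample to sample, but their centre and spread stay stable from run to run, which is the pattern the rest of the chapter models.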

10.2 The Sampling Distribution for
Proportions
The distribution of proportions over many independent
samples from the same population is called the sampling
distribution of the proportions
For distributions that are bell-shaped and centered at the
true proportion, p, we can use the sample size n to find the
standard deviation of the sampling distribution:
$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$

Remember that the difference between sample proportions,
referred to as sampling error, is not really an error. It’s just
the variability you’d expect to see from one sample to
another. A better term might be sampling variability.
We have now answered the question raised at the start of the
chapter. To discover how variable a sample proportion is, we
need to know the true proportion and the size of the sample.
That’s all.

The particular Normal model, $N\left(p, \sqrt{\frac{pq}{n}}\right)$, is a sampling
distribution model for the sample proportion.
It won’t work for all situations, but it works for most situations
that you’ll encounter in practice
Effect of Sample Size

Because n is in the denominator of SD(p̂),
the larger the sample, the smaller the
standard deviation. We need a small
standard deviation to make sound
business decisions, but larger samples
cost more. That tension is a fundamental
issue in Statistics.
The Sampling Distribution Model for a Proportion
Provided that the sampled values are independent and the sample size
is large enough, the sampling distribution of p̂ is modelled by a Normal
model with mean $\mu(\hat{p}) = p$ and standard deviation $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$.
In the above equation, n is the sample size and q is the proportion of
failures (q = 1 – p). (We use q̂ for its observed value in a sample.)

The sampling distribution model for p̂ is valuable because…
• we don’t need to actually draw many samples and
accumulate all those sample proportions, or even to
simulate them and because…
• we can calculate what fraction of the distribution will be
found in any region
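
As a rough sketch of the second point, Python's standard `statistics.NormalDist` class can stand in for the Normal model and report the fraction of sample proportions expected in a region. The helper name `proportion_model` and the values p = 0.2, n = 100, and the cutoff 0.25 are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_model(p, n):
    """Normal model N(p, sqrt(pq/n)) for the sample proportion."""
    q = 1 - p
    return NormalDist(mu=p, sigma=sqrt(p * q / n))

# Hypothetical values: true proportion 0.2, sample size 100.
model = proportion_model(p=0.2, n=100)

# Fraction of sample proportions the model expects above 0.25.
prob_above = 1 - model.cdf(0.25)

# Fraction the model expects within two standard deviations of p.
prob_within_2sd = model.cdf(0.2 + 2 * model.stdev) - model.cdf(0.2 - 2 * model.stdev)

print(round(model.stdev, 4), round(prob_above, 4), round(prob_within_2sd, 4))
```

No samples are drawn here. The model alone supplies the answer, provided the assumptions and conditions for using it hold.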

How Good Is the Normal Model?
Samples of size 1 or 2 just aren’t going to work very well, but
the distributions of proportions of many larger samples have
histograms that are remarkably close to a Normal model

Assumptions and Conditions
Independence Assumption: The sampled values must be
independent of each other
Sample Size Assumption: The sample size, n, must be large
enough

Randomization Condition: If your data come from an
experiment, subjects should have been randomly assigned
to treatments
If you have a survey, your sample should be a simple
random sample of the population
If some other sampling design was used, be sure the
sampling method was not biased and that the data are
representative of the population

10% Condition: If the sample is drawn without replacement,
then the sample size, n, must be no larger than 10% of the
population.
Success/Failure Condition: The sample size must be big
enough so that both the number of “successes,” np, and the
number of “failures,” nq, are expected to be at least 10.
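
The two arithmetic conditions can be checked mechanically. The sketch below assumes a hypothetical helper, `check_proportion_conditions`, and made-up inputs. Independence and randomization cannot be verified by arithmetic and still have to be judged from how the data were collected.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Report whether the Success/Failure and 10% Conditions hold."""
    q = 1 - p
    results = {
        # Expect at least 10 successes and at least 10 failures.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # Only relevant when the sample is drawn without replacement.
        results["ten_percent"] = n <= 0.10 * population_size
    return results

# Hypothetical cases: a comfortable sample, then one that is too small.
large_case = check_proportion_conditions(n=100, p=0.2, population_size=50_000)
small_case = check_proportion_conditions(n=30, p=0.1, population_size=200)
print(large_case)
print(small_case)
```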

10.3 The Central Limit Theorem: The
Fundamental Theorem of Statistics
Simulating the Sampling Distribution of a Mean
Here are the results of a simulated 10,000 tosses of one fair
die:
Figure 10.3 Simple die toss.
This is called the uniform distribution.

Simulating the Sampling Distribution of a Mean
Here are the results of a simulated 10,000 tosses of two fair
dice, averaging the numbers:
Figure 10.4 Two-dice average.
This is called the triangular distribution.

Here’s a histogram of the averages for 10,000 tosses of five dice:
As the sample size (number of dice) gets larger, each sample average
tends to become closer to the population mean
Figure 10.5 Three-dice average.
The shape of the distribution is becoming bell-shaped. In fact, it’s
approaching the Normal model
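
A dice simulation along these lines might look like the sketch below. The function name `dice_averages`, the seed, and the particular numbers of dice are assumptions. The code prints the centre and spread of the averages instead of drawing histograms.

```python
import random
import statistics

def dice_averages(num_dice, num_tosses=10_000, seed=1):
    """Toss num_dice fair dice num_tosses times; return the average of each toss."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]

# Centre and spread of the simulated averages for several numbers of dice.
summary = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = dice_averages(num_dice)
    summary[num_dice] = (statistics.mean(averages), statistics.pstdev(averages))
    print(num_dice, round(summary[num_dice][0], 2), round(summary[num_dice][1], 2))
```

The averages stay centred near 3.5 for every number of dice, while their spread shrinks as more dice are averaged.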
The Central Limit Theorem
Central Limit Theorem (CLT): The sampling distribution of
any mean becomes more nearly Normal as the sample size grows
This is true regardless of the shape of the population
distribution!
However, if the population distribution is very skewed, it may
take a sample size of dozens or even hundreds of
observations for the Normal model to work well
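
To see how slowly a skewed population settles toward the Normal shape, one can track the skewness of simulated sample means as n grows. In this sketch the exponential population (`random.expovariate`), the helper names, and the sample sizes are all assumptions picked for illustration. A skewness near 0 indicates a roughly symmetric distribution.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for right skew."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, num_samples=5_000, seed=2):
    """Means of num_samples samples of size n from a right-skewed population."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1, which is strongly right-skewed.
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(num_samples)]

skew_by_n = {}
for n in (2, 10, 50, 200):
    skew_by_n[n] = skewness(sample_means(n))
    print(n, round(skew_by_n[n], 2))
```

The skewness of the sample means falls steadily as n increases but is still noticeable at small n, which is why very skewed populations call for larger samples.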

The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose
shape can be approximated by a Normal model. The larger the
sample, the better the approximation will be.

Now we have two distributions to deal with: the real-world
distribution of the sample, and the math-world sampling
distribution of the statistic. Don’t confuse the two.
The Central Limit Theorem doesn’t talk about the distribution of
the data from the sample. It talks about the sample means and
sample proportions of many different random samples drawn from
the same population

10.4 The Sampling Distribution of the Mean
Which would be more surprising, having one person in your
Statistics class who is over two metres tall or having the
mean of 100 students taking the course be over two metres?
The first event is fairly rare, but finding a class of 100 whose
mean height is over two metres tall just won’t happen
Means have smaller standard deviations than individuals

The Normal model for the sampling distribution of the mean
has a standard deviation equal to $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$, where σ is the
standard deviation of the population
To emphasize that this is a standard deviation parameter of
the sampling distribution model for the sample mean, y̅, we
write $SD(\bar{y})$ or $\sigma(\bar{y})$

The Sampling Distribution Model for a Mean

When a random sample is drawn from any population with mean
μ and standard deviation σ, its sample mean, y̅, has a sampling
distribution with the same mean μ but whose standard deviation
is $\frac{\sigma}{\sqrt{n}}$, and we write $\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.
No matter what population the random sample comes from,
the shape of the sampling distribution is approximately Normal
as long as the sample size is large enough. The larger the
sample used, the more closely the Normal approximates the
sampling distribution model for the mean.
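
The slides state the $\frac{\sigma}{\sqrt{n}}$ result without deriving it. One standard way to see where it comes from, assuming the n observations $y_1, \dots, y_n$ are independent and each has variance $\sigma^2$, is:

```latex
\begin{aligned}
Var(\bar{y}) &= Var\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
              = \frac{1}{n^2}\sum_{i=1}^{n} Var(y_i)
              = \frac{n\sigma^2}{n^2}
              = \frac{\sigma^2}{n}, \\
SD(\bar{y})  &= \sqrt{Var(\bar{y})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

The same argument covers proportions: if each observation is coded 1 for a success and 0 for a failure, its variance is $pq$, so the formula becomes $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$.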

We now have two closely related sampling distribution
models. Which one we use depends on which kind of data
we have.
• When we have categorical data, we calculate a sample
proportion, p̂. Its sampling distribution follows a Normal
model with a mean at the population proportion, p, and a
standard deviation
$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$

• When we have quantitative data, we calculate a sample
mean, y̅. Its sampling distribution has a Normal model with
a mean at the population mean, μ, and a standard deviation
$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

Assumptions and Conditions

Independence Assumption: The sampled values must be
independent of each other
Randomization Condition: The data values must be sampled
randomly, or the concept of a sampling distribution makes no
sense

10% Condition: When the sample is drawn without
replacement, the sample size, n, should be no more than
10% of the population
Large Enough Sample Condition: If the population is
unimodal and symmetric, even a fairly small sample is okay.
For highly skewed distributions, it may require samples of
several hundred for the sampling distribution of means to be
approximately Normal. Always plot the data to check

Sample Size - Diminishing Returns

The standard deviation of the sampling distribution declines
only with the square root of the sample size
The square root limits how much we can make a sample tell
about the population. This is an example of something that’s
known as the Law of Diminishing Returns
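
A few lines of arithmetic make the square-root effect concrete. The population standard deviation of 4 and the sample sizes below are assumed values, and `sd_of_mean` is a hypothetical helper name.

```python
from math import sqrt

def sd_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)

sigma = 4.0  # hypothetical population standard deviation
for n in (25, 100, 400, 1600):
    # Each step quadruples the sample size but only halves the SD.
    print(n, sd_of_mean(sigma, n))
```

Going from 25 to 100 observations halves the standard deviation, and halving it again takes 400. Each further gain in precision costs four times as much data.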

Diminishing Returns
Example: The mean weight of boxes shipped by a company
is 12 kg, with a standard deviation of 4 kg. Boxes are
shipped in pallets of 10 boxes. The shipper has a limit of 150
kg for such shipments. What’s the probability that a pallet
will exceed that limit?
Asking the probability that the total weight of a sample of 10
boxes exceeds 150 kg is the same as asking the probability
that the mean weight exceeds 15 kg.

Example (continued): First we’ll check the conditions.

We will assume that the 10 boxes on the pallet are a random
sample from the population of boxes and that their weights
are mutually independent.
And 10 boxes is surely less than 10% of the population of
boxes shipped by the company.

Example (continued): Under these conditions, the CLT says that the
sampling distribution of y̅ has a Normal model with mean 12 and
standard deviation
$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{10}} = 1.26 \quad \text{and} \quad z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{15 - 12}{1.26} = 2.38.$$
$$P(\bar{y} > 15) = P(z > 2.38) = 0.0087$$
So the chance that the shipper will reject a pallet is only 0.0087, less
than 1%. That’s probably good enough for the company.
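
The same calculation can be reproduced with Python's standard `statistics.NormalDist`. The variable names are illustrative, and the code does not round intermediate values the way the hand calculation does.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 4, 10      # mean and SD of box weights (kg), boxes per pallet
limit_total = 150             # shipper's weight limit for a pallet (kg)

limit_mean = limit_total / n  # a total over 150 kg means a mean over 15 kg
sd_mean = sigma / sqrt(n)     # SD of the sampling distribution of the mean
z = (limit_mean - mu) / sd_mean

# Upper-tail probability under the standard Normal model.
prob_over_limit = 1 - NormalDist().cdf(z)

print(round(sd_mean, 2), round(z, 2), round(prob_over_limit, 4))
```

The result lands close to the 0.0087 above. It differs slightly in the later decimal places because the hand calculation rounds the standard deviation to 1.26 before computing z.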

10.5 Standard Error
Whenever we estimate the standard deviation of a sampling
distribution, we call it a standard error (SE)
For a sample proportion, p̂, the standard error is:
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
For the sample mean, y̅, the standard error is:
$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
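
Both standard errors are computed from the sample alone. The sketch below uses hypothetical helper names (`se_proportion`, `se_mean`) and made-up data: 45 successes in 150 trials, and eight box weights.

```python
from math import sqrt
import statistics

def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return sqrt(p_hat * q_hat / n)

def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n)."""
    s = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return s / sqrt(len(values))

# Hypothetical data, for illustration only.
weights = [11.2, 13.5, 9.8, 12.1, 14.0, 10.7, 12.9, 11.6]
print(round(se_proportion(45, 150), 4))
print(round(se_mean(weights), 4))
```

Unlike SD(p̂) and SD(y̅), these values use p̂ and s in place of the unknown p and σ, so they change from sample to sample.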
The proportion and the mean are random quantities. We
can’t know what our statistic will be because it comes from a
random sample.
The two basic truths about sampling distributions are:
1) Sampling distributions arise because samples vary
2) Although we can always simulate a sampling distribution,
the Central Limit Theorem saves us the trouble for
means and proportions

To keep track of how the concepts we’ve seen combine, we
can draw a diagram relating them.
We start with a population model, and label the mean of this
model μ and its standard deviation, σ
We draw one real sample (solid line) of size n and show its
histogram and summary statistics. We imagine many other
samples (dotted lines)
We imagine gathering all the means into a histogram.



The CLT tells us we can model the shape of this histogram
with a Normal model. The mean of this Normal is μ, and the
standard deviation is $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.

When we don’t know σ, we estimate it with the standard
deviation of the one real sample. That gives us the standard
error, $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.

What Can Go Wrong?
• Don’t confuse the sampling distribution with the distribution
of the sample
• Beware of observations that are not independent
• Watch out for small samples when dealing with
proportions.
• Watch out for small samples from skewed populations
when dealing with means.

What Have We Learned?
• We know that no sample fully and exactly describes the
population; sample proportions and means will vary from
sample to sample – this is called sampling variability
• We’ve learned that sampling variability is not just
unavoidable—it’s predictable!

• We’ve learned how to describe the behaviour of sample
proportions – shape, centre, and spread as long as certain
conditions are met:
• If the sample is random and large enough that we expect
at least 10 successes and 10 failures, then:
– The sampling distribution is shaped like a Normal model
– The mean of the sampling model is the true proportion in the
population
– The standard deviation of the sample proportions is $\sqrt{\frac{pq}{n}}$.

• We’ve learned to describe the behaviour of sample means
as well, also based on the Central Limit Theorem.
• If the sample is random and large enough (especially if our
data come from a population that’s not roughly unimodal
and symmetric), then:
– The shape of the distribution of the means of all possible
samples can be described by a Normal model
– The centre of the sampling model will be the true mean of
the population
– The standard deviation of the sample means is $\frac{\sigma}{\sqrt{n}}$.
