
Business Statistics

Fourth Canadian Edition

Chapter 10
Sampling Distributions

Copyright © 2021 Pearson Canada Inc.


Ch. 10: Sampling Distribution
Learning Objectives
1) Understand how variations among multiple samples can
be represented in a sampling distribution
2) Calculate the sampling distribution (mean and variance)
of a proportion
3) Calculate the sampling distribution (mean and variance)
of a mean



10.1 Modeling Sample Proportions (1 of 3)
To learn more about the variability, we have to imagine.
We probably will never know the value of the true proportion
of an event in the population. But it is important to us, so
we’ll give it a label, p for “true proportion.”

Imagine
We see only the sample we actually
drew, but if we imagine the results of
all the other possible samples we
could have drawn (by modelling or
simulating them), we can learn more.



10.1 Modeling Sample Proportions (2 of 3)
A simulation is when we use a computer to pretend to draw
random samples from some population of values over and
over
A simulation can help us understand how sample proportions
vary due to random sampling



10.1 Modeling Sample Proportions (3 of 3)
When we have only two possible outcomes for an event,
label one of them “success” and the other “failure”
In a simulation, we set the true proportion of successes to a
known value, draw random samples, and then record the
sample proportion of successes, which we denote by p̂, for
each sample
Even though the p̂ ’s vary from sample to sample, they do so
in a way that we can model and understand
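The simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own code: the true proportion (0.3), the sample size (200), the number of samples (2000), and the function name `simulate_sample_proportions` are all assumed values chosen for the example.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p, otherwise a "failure".
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Assumed values for illustration only.
p_hats = simulate_sample_proportions(p=0.3, n=200, num_samples=2000)
print("mean of the p-hats:", round(statistics.mean(p_hats), 3))
print("SD of the p-hats:  ", round(statistics.pstdev(p_hats), 3))
```

The individual p̂ values differ from sample to sample, but their average should sit close to the true proportion that was set in the simulation.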



10.2 The Sampling Distribution for
Proportions (1 of 9)
The distribution of proportions over many independent
samples from the same population is called the sampling
distribution of the proportions
For distributions that are bell-shaped and centered at the
true proportion, p, we can use the sample size n to find the
standard deviation of the sampling distribution:

$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$



10.2 The Sampling Distribution for
Proportions (2 of 9)
Remember that the difference between sample proportions,
referred to as sampling error, is not really an error. It’s just
the variability you’d expect to see from one sample to
another. A better term might be sampling variability.

We have now answered the question
raised at the start of the chapter. To
discover how variable a sample
proportion is, we need to know the
true proportion and the size of the
sample. That’s all.



10.2 The Sampling Distribution for
Proportions (3 of 9)
The particular Normal model, $N\left(p, \sqrt{\frac{pq}{n}}\right)$, is a sampling
distribution model for the sample proportion.
It won’t work for all situations, but it works for most situations
that you’ll encounter in practice

Effect of Sample Size


Because n is in the denominator of SD(p̂),
the larger the sample, the smaller the
standard deviation. We need a small
standard deviation to make sound
business decisions, but larger samples
cost more. That tension is a fundamental
issue in Statistics.
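To make the effect of sample size concrete, the sketch below evaluates SD(p̂) = √(pq/n) for a few sample sizes. The proportion 0.5 and the sample sizes are assumed values for illustration, and `sd_p_hat` is a hypothetical helper name.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# Assumed true proportion and sample sizes, for illustration only.
for n in (100, 400, 1600):
    print(f"n = {n:4d}  SD(p-hat) = {sd_p_hat(0.5, n):.4f}")
```

Each fourfold increase in n cuts the standard deviation in half, which is why tighter estimates become expensive.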
10.2 The Sampling Distribution for
Proportions (4 of 9)
The Sampling Distribution Model for a Proportion

Provided that the sampled values are independent and the sample size
is large enough, the sampling distribution of p̂ is modelled by a Normal
model with mean $\mu(\hat{p}) = p$ and standard deviation $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$.

In the above equation, n is the sample size and q is the proportion of
failures (q = 1 – p). (We use q̂ for its observed value in a sample.)



10.2 The Sampling Distribution for
Proportions (5 of 9)
The sampling distribution model for p̂ is valuable because…
• we don’t need to actually draw many samples and
accumulate all those sample proportions, or even to
simulate them and because…
• we can calculate what fraction of the distribution will be
found in any region
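As a sketch of the second point, Python's standard-library `statistics.NormalDist` can report the fraction of the sampling distribution that falls in a region. The true proportion (0.4), sample size (500), and interval (0.36 to 0.44) are assumed values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumed values for illustration only.
p, n = 0.4, 500

# Normal model for p-hat: mean p, standard deviation sqrt(p * q / n).
model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Fraction of sample proportions expected to fall between 0.36 and 0.44.
frac = model.cdf(0.44) - model.cdf(0.36)
print(f"Fraction of p-hats in [0.36, 0.44]: {frac:.3f}")
```

No samples are drawn here at all: the model alone supplies the answer, provided its assumptions and conditions hold.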



10.2 The Sampling Distribution for
Proportions (6 of 9)
How Good Is the Normal Model?
Samples of size 1 or 2 just aren’t going to work very well, but
the distributions of proportions of many larger samples have
histograms that are remarkably close to a Normal model



10.2 The Sampling Distribution for
Proportions (7 of 9)
Assumptions and Conditions
Independence Assumption: The sampled values must be
independent of each other
Sample Size Assumption: The sample size, n, must be large
enough



10.2 The Sampling Distribution for
Proportions (8 of 9)
Assumptions and Conditions
Randomization Condition: If your data come from an
experiment, subjects should have been randomly assigned
to treatments
If you have a survey, your sample should be a simple
random sample of the population
If some other sampling design was used, be sure the
sampling method was not biased and that the data are
representative of the population



10.2 The Sampling Distribution for
Proportions (9 of 9)
Assumptions and Conditions
10% Condition: If sampling has not been made with
replacement, then the sample size, n, must be no larger than
10% of the population.
Success/Failure Condition: The sample size must be big
enough so that both the number of “successes,” np, and the
number of “failures,” nq, are expected to be at least 10.
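The two numerical conditions above are easy to check mechanically. The helper below is a hypothetical sketch (the function name and the example numbers are assumptions); it does not check independence or randomization, which depend on how the data were collected.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition for a sample proportion."""
    return {
        # Sample drawn without replacement should be at most 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expected successes (n * p) and failures (n * q) should both be at least 10.
        "success_failure": n * p >= 10 and n * (1 - p) >= 10,
    }

# Assumed example: 200 customers sampled from 50,000, true proportion 0.25.
print(check_proportion_conditions(n=200, p=0.25, population_size=50_000))
```

In practice p is unknown, so the Success/Failure check is usually applied with a planning value or with the observed counts.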



10.3 The Central Limit Theorem—The
Fundamental Theorem of Statistics (1 of 5)
Simulating the Sampling Distribution of a Mean
Here are the results of a simulated 10,000 tosses of one fair
die:

Figure 10.3 Simple die toss.

This is called the uniform distribution.


10.3 The Central Limit Theorem—The
Fundamental Theorem of Statistics (2 of 5)
Simulating the Sampling Distribution of a Mean
Here are the results of a simulated 10,000 tosses of two fair
dice, averaging the numbers:

Figure 10.4 Two-dice average.

This is called the triangular distribution.


10.3 The Central Limit Theorem—The
Fundamental Theorem of Statistics (3 of 5)
Here’s a histogram of the averages for 10,000 tosses of five dice:
As the sample size (number of dice) gets larger, each sample average
tends to become closer to the population mean

Figure 10.5 Three-dice average.

The shape of the distribution is becoming bell-shaped. In fact, it’s
approaching the Normal model
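The dice simulations can be reproduced with a short Python sketch. The seed and the helper name `dice_averages` are assumptions; the 10,000 tosses match the slides, and the dice counts (1, 2, 5) are chosen to mirror the progression shown.

```python
import random
import statistics

def dice_averages(num_dice, num_tosses=10_000, seed=1):
    """Toss num_dice fair dice num_tosses times and return the average of each toss."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]

results = {k: dice_averages(k) for k in (1, 2, 5)}
for k, averages in results.items():
    print(
        f"{k} dice: mean = {statistics.mean(averages):.2f}, "
        f"SD = {statistics.pstdev(averages):.2f}"
    )
```

The mean stays near 3.5 in every case, while the spread of the averages shrinks as more dice are averaged. Plotting a histogram of each list would show the shape moving from flat to triangular to bell-shaped.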
10.3 The Central Limit Theorem—The
Fundamental Theorem of Statistics (4 of 5)
The Central Limit Theorem
Central Limit Theorem (CLT): The sampling distribution of
any mean becomes more nearly Normal as the sample size grows
This is true regardless of the shape of the population
distribution!
However, if the population distribution is very skewed, it may
take a sample size of dozens or even hundreds of
observations for the Normal model to work well
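The caution about skewed populations can be illustrated by simulation. The sketch below assumes an exponential population (a strongly right-skewed distribution chosen for illustration, not taken from the slides) and measures how the skewness of the sample means fades as the sample size grows. Function names, sample sizes, and the seed are all assumptions.

```python
import random

def skewness(values):
    """Skewness: third central moment divided by the standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_sample_means(sample_size, num_samples=3000, seed=2):
    """Skewness of the means of many samples drawn from an exponential population."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]
    return skewness(means)

skews = {n: skewness_of_sample_means(n) for n in (2, 20, 200)}
for n, s in skews.items():
    print(f"n = {n:3d}: skewness of sample means = {s:.2f}")
```

A Normal model has skewness 0, so values closer to 0 indicate that the Normal approximation is becoming more reasonable.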



10.3 The Central Limit Theorem—The
Fundamental Theorem of Statistics (5 of 5)
The Central Limit Theorem (CLT)
The mean of a random sample has
a sampling distribution whose
shape can be approximated by
a Normal model. The larger
the sample, the better the
approximation will be.

Now we have two distributions
to deal with: the real-world
distribution of the sample, and
the math-world sampling
distribution of the statistic.
Don’t confuse the two.
The Central Limit Theorem
doesn’t talk about the
distribution of the data from the
sample. It talks about the
sample means and sample
proportions of many different
random samples drawn from
the same population
10.4 The Sampling Distribution of the Mean
(1 of 11)

Which would be more surprising, having one person in your
Statistics class who is over two metres tall or having the
mean of 100 students taking the course be over two metres?
The first event is fairly rare, but finding a class of 100 whose
mean height is over two metres tall just won’t happen
Means have smaller standard deviations than individuals
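A rough calculation shows how different the two events are. The population values below (mean height 170 cm, standard deviation 10 cm, Normally distributed) are assumptions made up for this sketch; the slides give no figures.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population of student heights in cm (assumed, not from the slides).
mu, sigma = 170, 10
n = 100

individual = NormalDist(mu, sigma)            # model for one student's height
class_mean = NormalDist(mu, sigma / sqrt(n))  # model for the mean of 100 students

p_individual = 1 - individual.cdf(200)  # one student over two metres
p_class_mean = 1 - class_mean.cdf(200)  # class mean over two metres

print(f"P(individual > 200 cm) = {p_individual:.5f}")
print(f"P(class mean > 200 cm) = {p_class_mean:.3g}")
```

Under these assumptions a two-metre individual is three standard deviations above the mean, while a two-metre class average would be thirty standard deviations above it.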



10.4 The Sampling Distribution of the Mean
(2 of 11)

The Normal model for the sampling distribution of the mean
has a standard deviation equal to $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$, where σ is the
standard deviation of the population

To emphasize that this is a standard deviation parameter of
the sampling distribution model for the sample mean, y̅, we
write $SD(\bar{y})$ or $\sigma(\bar{y})$



10.4 The Sampling Distribution of the Mean
(3 of 11)

The Sampling Distribution Model for a Mean


When a random sample is drawn from any population with mean
μ and standard deviation σ, its sample mean, y̅, has a sampling
distribution with the same mean μ but whose standard deviation
is $\frac{\sigma}{\sqrt{n}}$, and we write $\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.

No matter what population the random sample comes from,
the shape of the sampling distribution is approximately Normal
as long as the sample size is large enough. The larger the
sample used, the more closely the Normal approximates the
sampling distribution model for the mean.



10.4 The Sampling Distribution of the Mean
(4 of 11)

We now have two closely related sampling distribution
models. Which one we use depends on which kind of data
we have.

• When we have categorical data, we calculate a sample
proportion, p̂. Its sampling distribution follows a Normal
model with a mean at the population proportion, p, and a
standard deviation

$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$



10.4 The Sampling Distribution of the Mean
(5 of 11)

• When we have quantitative data, we calculate a sample
mean, y̅. Its sampling distribution has a Normal model with
a mean at the population mean, μ, and a standard
deviation

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$



10.4 The Sampling Distribution of the Mean
(6 of 11)

Assumptions and Conditions


Independence Assumption: The sampled values must be
independent of each other
Randomization Condition: The data values must be sampled
randomly, or the concept of a sampling distribution makes no
sense



10.4 The Sampling Distribution of the Mean
(7 of 11)

10% Condition: When the sample is drawn without
replacement, the sample size, n, should be no more than
10% of the population
Large Enough Sample Condition: If the population is
unimodal and symmetric, even a fairly small sample is okay.
For highly skewed distributions, it may require samples of
several hundred for the sampling distribution of means to be
approximately Normal. Always plot the data to check



10.4 The Sampling Distribution of the Mean
(8 of 11)

Sample Size - Diminishing Returns


The standard deviation of the sampling distribution declines
only with the square root of the sample size
The square root limits how much we can make a sample tell
about the population. This is an example of something that’s
known as the Law of Diminishing Returns
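The square-root relationship can be turned around to ask how large a sample is needed for a target standard deviation. This is an illustrative sketch: σ = 4 and the targets are assumed values, and both function names are hypothetical.

```python
import math

def sd_of_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sample_size_for_sd(sigma, target_sd):
    """Smallest n for which sigma / sqrt(n) is no larger than target_sd."""
    return math.ceil((sigma / target_sd) ** 2)

# Assumed population SD and targets, for illustration only.
sigma = 4
for target in (1.0, 0.5, 0.25):
    n = sample_size_for_sd(sigma, target)
    print(f"target SD = {target:4.2f} needs n = {n:3d} (check: {sd_of_mean(sigma, n):.2f})")
```

Every halving of the target standard deviation multiplies the required sample size by four.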



10.4 The Sampling Distribution of the Mean
(9 of 11)

Diminishing Returns
Example: The mean weight of boxes shipped by a company
is 12 kg, with a standard deviation of 4 kg. Boxes are
shipped in pallets of 10 boxes. The shipper has a limit of 150
kg for such shipments. What’s the probability that a pallet
will exceed that limit?
Asking the probability that the total weight of a sample of 10
boxes exceeds 150 kg is the same as asking the probability
that the mean weight exceeds 15 kg.



10.4 The Sampling Distribution of the Mean
(10 of 11)

Example (continued): First we’ll check the conditions.


We will assume that the 10 boxes on the pallet are a random
sample from the population of boxes and that their weights
are mutually independent.
And 10 boxes is surely less than 10% of the population of
boxes shipped by the company.



10.4 The Sampling Distribution of the Mean
(11 of 11)

Example (continued): Under these conditions, the CLT says that the
sampling distribution of y̅ has a Normal model with mean 12 and
standard deviation

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{10}} \approx 1.26 \quad\text{and}\quad z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{15 - 12}{1.26} \approx 2.38.$$

$$P(\bar{y} > 15) = P(z > 2.38) = 0.0087$$

So the chance that the shipper will reject a pallet is only 0.0087, less
than 1%. That’s probably good enough for the company.
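The pallet calculation can be checked with Python's standard-library `statistics.NormalDist`. The numbers come from the example; the variable names are assumptions. Because this sketch does not round the standard deviation to 1.26 before computing z, its z-score and probability differ slightly from the rounded values on the slide.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 4, 10   # mean and SD of box weight (kg), boxes per pallet
limit_total = 150          # shipper's limit for the whole pallet (kg)

limit_mean = limit_total / n        # same event stated for the mean: 15 kg
sd_mean = sigma / sqrt(n)           # SD of the sample mean
z = (limit_mean - mu) / sd_mean     # z-score of the limit
prob = 1 - NormalDist().cdf(z)      # upper-tail probability

print(f"SD(y-bar) = {sd_mean:.2f}, z = {z:.2f}, P(mean > 15) = {prob:.4f}")
```

Either way the conclusion is the same: fewer than 1 pallet in 100 is expected to exceed the limit.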



10.5 Standard Error (1 of 7)
Standard Error
Whenever we estimate the standard deviation of a sampling
distribution, we call it a standard error (SE)
For a sample proportion, p̂, the standard error is:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

For the sample mean, y̅, the standard error is:

$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
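Both standard errors use only quantities observed in the sample. The sketch below shows one way to compute them. The function names and the example data are hypothetical, and `statistics.stdev` is used because it returns the sample standard deviation s (with the n − 1 divisor).

```python
import math
import statistics

def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Made-up data for illustration only.
print(f"SE(p-hat) = {se_proportion(successes=50, n=100):.4f}")
print(f"SE(y-bar) = {se_mean([1, 2, 3, 4, 5]):.4f}")
```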
10.5 Standard Error (2 of 7)
The proportion and the mean are random quantities. We
can’t know what our statistic will be because it comes from a
random sample.
The two basic truths about sampling distributions are:
1) Sampling distributions arise because samples vary
2) Although we can always simulate a sampling distribution,
the Central Limit Theorem saves us the trouble for
means and proportions



10.5 Standard Error (3 of 7)
To keep track of how the concepts we’ve seen combine, we
can draw a diagram relating them.
We start with a population model, and label the mean of this
model μ and its standard deviation, σ
We draw one real sample (solid line) of size n and show its
histogram and summary statistics. We imagine many other
samples (dotted lines)
We imagine gathering all the means into a histogram.



10.5 Standard Error (4 of 7)



10.5 Standard Error (5 of 7)



10.5 Standard Error (6 of 7)
The CLT tells us we can model the shape of this histogram
with a Normal model. The mean of this Normal is μ, and the
standard deviation is $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.



10.5 Standard Error (7 of 7)
When we don’t know σ, we estimate it with the standard
deviation of the one real sample. That gives us the standard
error, $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.



What Can Go Wrong?
• Don’t confuse the sampling distribution with the distribution
of the sample
• Beware of observations that are not independent
• Watch out for small samples when dealing with
proportions.
• Watch out for small samples from skewed populations
when dealing with means.



What Have We Learned? (1 of 3)
• We know that no sample fully and exactly describes the
population; sample proportions and means will vary from
sample to sample – this is called sampling variability
• We’ve learned that sampling variability is not just
unavoidable—it’s predictable!



What Have We Learned? (2 of 3)
• We’ve learned how to describe the behaviour of sample
proportions – shape, centre, and spread as long as certain
conditions are met:
• If the sample is random and large enough that we expect
at least 10 successes and 10 failures, then:
– The sampling distribution is shaped like a Normal model
– The mean of the sampling model is the true proportion in the
population
– The standard deviation of the sample proportions is $\sqrt{\frac{pq}{n}}$.



What Have We Learned? (3 of 3)
• We’ve learned to describe the behavior of sample means
as well, also based on the Central Limit Theorem.
• If the sample is random and large enough (especially if our
data come from a population that’s not roughly unimodal
and symmetric), then:
– The shape of the distribution of the means of all possible
samples can be described by a Normal model
– The center of the sampling model will be the true mean of
the population

– The standard deviation of the sample means is $\frac{\sigma}{\sqrt{n}}$

