
12/12/21, 9:51 PM 5.2: Sampling Distribution of x̄: Introduction to Statistics-2021- Lagios

5.2: Sampling Distribution of x̄


As already stated, we wish to use a sample mean x̄ to make inferences about the population mean
μ. Repeating the sampling process allows us to develop a sampling distribution for all possible
values of the sample mean x̄ . This distribution is called the sampling distribution of the sample
mean x̄ .

We will now consider some properties of the sampling distribution of x̄ , including the expected value
of x̄ , the standard deviation of x̄ , and the shape of the sampling distribution itself. The knowledge we
gain in this section will provide a basis for making probability statements about the error involved
when using x̄ to estimate μ.

Expected Value of x̄
Suppose you have a large population of potatoes that have a mean weight of 8 ounces and a
standard deviation of 1 ounce. Furthermore, the weight of the potatoes is normally distributed.
Suppose you take a sample of 10 potatoes and find the mean weight of the 10. Suppose you take
another sample of 10 potatoes and find the mean weight of those 10. If you continue to do this, you
might find something like this:


You continue to do this until you have found the mean weight of every possible combination of 10
potatoes. Now, suppose you calculate the mean of these means. You will find that this mean of the
means is the same as the mean weight of all of the potatoes:

We would have found the same thing if we had done this experiment with bags of four potatoes per
bag or 6 potatoes per bag or 100 potatoes per bag. We will get the same result for any bag size of n
potatoes.

We can formally state this result:



Suppose that the population is normally distributed. Then for any sample size n:

E(x̄ ) = μ or µx̄ = μ

where:

E(x̄ ) or µx̄ is the expected value of the random sample value x̄

μ = the population mean

In statisticians' terms, the formula above allows us to state that x̄ is an unbiased estimator of μ.
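
The potato experiment can be checked on a small scale. The sketch below uses a hypothetical population of six potato weights (made-up values, not course data), lists every possible sample of size n, and averages the resulting sample means. Exact fractions are used so that rounding does not blur the comparison.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of potato weights in ounces (illustrative values only).
population = [Fraction(w) for w in (7, 8, 8, 9, 6, 10)]
mu = sum(population) / len(population)  # population mean

def mean_of_sample_means(pop, n):
    """Average the sample mean over every possible sample of size n."""
    means = [sum(sample) / n for sample in combinations(pop, n)]
    return sum(means) / len(means)

# For each sample size, print n, the mean of all sample means, and mu.
for n in (2, 3, 4):
    print(n, mean_of_sample_means(population, n), mu)
```

For every sample size, the mean of the sample means matches the population mean, which is what E(x̄ ) = μ says.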

Standard Deviation of x̄
As with every other sampling distribution, the sampling distribution of x̄ has a standard deviation.

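Let the following terms be defined for use in calculating the standard deviation of the sampling
distribution of x̄ :

σx̄ = the standard deviation of the sampling distribution of x̄

σ = the population standard deviation

n = the sample size

N = the population size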

For simple random sampling, the standard deviation is calculated differently depending upon whether
the population is finite or infinite. The following are the formulas for calculating the standard deviation
for each type of population:

Finite population: σx̄ = √((N - n)/(N - 1)) · (σ/√n)

Infinite population: σx̄ = σ/√n

These two equations are very similar except that the finite population equation has the expression:

√((N - n)/(N - 1))

This expression is referred to as the finite population correction factor. If the population is not large
compared to the sample size, this correction factor will be much less than 1 and have the effect of
reducing the size of σx̄ . If the population is very large compared to the sample size, the finite
population correction factor will be very close to 1 (see Table 5.1). Therefore, whenever the
population is very large compared to the sample size, the finite population equation will give
approximately the same value as the infinite population equation.
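
To see the correction factor at work, here is a minimal sketch that evaluates √((N - n)/(N - 1)) and the resulting standard error for several population sizes. The function names and the values σ = 1 and n = 10 are assumptions chosen only for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of the mean; the correction is applied only when N is given."""
    se = sigma / math.sqrt(n)
    return se if N is None else fpc(N, n) * se

sigma, n = 1.0, 10  # assumed values for illustration
for N in (20, 100, 1000, 100000):
    # Columns: population size, correction factor, corrected standard error.
    print(N, round(fpc(N, n), 4), round(standard_error(sigma, n, N), 4))
print("infinite", round(standard_error(sigma, n), 4))
```

As N grows relative to n, the factor approaches 1 and the two formulas give nearly the same standard error.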


Table 5.1

Consider Table 5.1. Notice that whenever the population is relatively small and the sample size
increases, the finite population correction factor decreases rapidly in value. If the sample size
happens to match the size of the population, the correction factor will be zero and the standard
deviation of the sampling distribution, σx̄ , would be zero. This should be expected because if the sample size is the same
as the size of the population, then the sample is the entire population!

On the other hand, notice that as the population grows larger, the sample size has less of an effect
on the formula. Note that if you know the population size, you may use the finite population equation
whenever you like. However, we must be more careful when using the infinite population equation.
The following guidelines may be used to determine if we may use the infinite population formula.

We may use the infinite population formula if any one of these three conditions holds:

1) The population is infinite.

2) You do not know the actual population size, but you are told that it is “large.”

3) The population size, N, is known and the sample size is less than or equal to 5% of the
population size: n/N ≤ 0.05.

The following example utilizes the third condition.

Example 5.1: Company A has 400 employees. In order to use the infinite population standard deviation formula,
the sample size would have to satisfy:

n ≤ 0.05(400) = 20

Example 5.2: Company B has 6,000 employees. In order to use the infinite population standard deviation
formula, the sample size would have to satisfy:

n ≤ 0.05(6,000) = 300
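
The 5% guideline in the third condition can be written as a small check. This is only a sketch; the function names are hypothetical, and integer arithmetic is used so that the comparison is not affected by floating-point rounding.

```python
def max_sample_size(N, percent=5):
    """Largest sample size n with n/N <= percent/100."""
    return N * percent // 100

def may_use_infinite_formula(n, N, percent=5):
    """True when the sample is at most `percent` percent of the population."""
    return 100 * n <= percent * N

print(max_sample_size(400))    # Company A
print(max_sample_size(6000))   # Company B
print(may_use_infinite_formula(30, 4300))
```

The last line checks a sample of 30 drawn from a population of 4,300, which falls well under the 5% limit.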

The standard deviation of the sampling distribution, σx̄ , indicates how far the sample mean
is likely to be from the population mean. In this context, we will often refer to σx̄ as the standard error of the
mean.

Now, let us calculate the expected value and standard error of a sample mean.

Example 5.3. Suppose you have a large population of potatoes with a mean weight of 8 ounces and
a standard deviation of 1 ounce. You choose all possible combinations of 15 potatoes and find the
mean weight of each sample. What are the expected value and standard deviation of these
sample means?

Solution. Each sample contains 15 potatoes, so n = 15. We know:

E(x̄ ) = µx̄ = 8 ounces

Since the population is large, we will use the standard deviation equation for an infinite population:

σx̄ = σ/√n = 1/√15 ≈ 0.258 ounces

Central Limit Theorem


Now, we wish to determine the form of the probability distribution of x̄ . There are two cases to consider:

1) the population distribution is unknown, or

2) the population distribution is known to be normal.

When the population distribution is unknown, we use one of the most important theorems in
statistics: the central limit theorem. The central limit theorem states that in selecting random
samples of size n from a population, the sampling distribution of the sample mean x̄ can be
approximated by a normal probability distribution as long as the sample size is large enough.

The following illustration compares three different distributions to the normal distribution and shows
how the central limit theorem works.


Graph 5.1

The first column represents the normal distribution, the second is a uniform distribution, the third is
an exponential distribution, and the last is called a parabolic distribution. In each distribution, the area
under the curve is equal to 1.

Now let's look at each row and consider the impact that sample size has on the sampling distribution:

• The first row of the picture shows the population distributions.

• The second row shows the sampling distributions of x̄ when n = 2. Note that each sampling
distribution looks very different from its related population distribution.

• The third row shows the sampling distributions of x̄ when n = 5. Note that each sampling
distribution is taking on a bell-shaped appearance.

• The fourth row shows the sampling distributions of x̄ when n = 30. Note that each of the three
sampling distributions is approximately the same as the normal probability distribution.

This example illustrates that for sufficiently large sample sizes, the sampling distribution of x̄ can be
approximated by the normal probability distribution.
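
The same behavior can be reproduced with a short simulation. The sketch below draws repeated samples from an exponential population (one of the non-normal shapes mentioned above) and measures how lopsided the resulting sample means are. The number of trials, the seed, and the use of skewness as a rough measure of "bell-shapedness" are choices made for this illustration, not part of the course material.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, trials, rng):
    """Means of `trials` random samples of size n from an exponential population (mean 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (2, 5, 30):
    means = sample_means(n, 5000, rng)
    # Columns: sample size, mean of the sample means, skewness of the sample means.
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 3))
```

The mean of the sample means stays near the population mean of 1 for every n, while the skewness shrinks toward 0 as n increases, which is the pattern the rows of the illustration describe.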

This leads to the question: How large of a sample size do we need in order to assume that the
sampling distribution is normally distributed? We may assume that the sampling distribution is
normally distributed:

• Whenever the population is normally distributed (i.e., is mound shaped and symmetrical), no
matter how small the sample size.


• If the population distribution is non-normal, larger sample sizes are needed. The general
statistical practice is to assume that for most distributions, the sampling distribution of x̄ can be
approximated by a normal probability distribution if the sample size is 30 or more.

In essence, we can assume a sample size of n ≥ 30 will satisfy the large-sample condition of the
central limit theorem. We may state this formally:

The sampling distribution of x̄ can be approximated by a normal probability distribution
whenever the sample size is 30 or more, i.e., n ≥ 30.

The central limit theorem is the key to identifying the form of the sampling distribution of x̄ whenever
the population distribution is unknown.

Let's also restate the first bullet point above for emphasis:

Whenever the population is known or assumed to have a normal probability distribution, the
sampling distribution of x̄ is a normal probability distribution for any sample size.

Practical Value of the Sampling Distribution of x̄


At this point, we may ask why we're so interested in the sampling distribution. The answer is as
follows: We will usually be selecting a sample from the population. We will use this sample to
represent the entire population. We will study this sample to learn things about the entire population.
Therefore, the more that we know about the sampling distribution, the better we can understand the
relationship between the characteristics of the sample and the characteristics of the population.

In particular, we will use the mean of the sample, x̄ , to help us estimate the population mean, μ. We
don’t expect the mean of the sample, x̄ , to be equal to the population mean μ. We will expect some
difference between the sample mean and the population mean. Therefore, knowledge about the
sampling distribution will help us to estimate the error that we might expect whenever we use the
sample mean to estimate the population mean.

This error is called the sampling error (sometimes written as SE). The sampling error is the absolute
value of the difference between the value of the sample mean x̄ and the value of the population
mean μ. Mathematically, it is written as:

Sampling error = |x̄ - μ|

We will also use the z-score in combination with the sampling distribution in order to determine the
probability that a specific sample will have a mean within a certain distance of the population mean.
This is done in the next example.

Example 5.4. Suppose a pharmaceutical company wants to do a study of the commissions of its
sales force. Let's assume that there are 4,300 sales people, that the population mean commission for the sales
force is $52,400, and that the population standard deviation is $3,500.

The company will do the study by taking a random sample of 30 sales people. The company would
like the sample to fairly represent the entire sales force. So, a researcher at the company has
determined that a sample will be a fair representation of the entire sales force if the mean of the
sample is within $500 of the mean of the entire sales force. Therefore, what is the probability that a
simple random sample of 30 members of the sales force will have a mean commission within $500 of
the population mean?

Solution.

Here are the basic properties of the sampling distribution of x̄ :

Graph 5.2

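We calculate the standard deviation of the sampling distribution using the formula for the standard deviation
of x̄ that we learned earlier in this module. Since n/N = 30/4,300 ≈ 0.007 ≤ 0.05, we may use the infinite
population formula:

σx̄ = σ/√n = 3,500/√30 = 639.01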

We now have the necessary information needed to determine the probability that the sample mean x̄
will be between $51,900 and $52,900 (this is the range that is within $500 of the population mean μ
of $52,400).

To calculate the probability in question, we need to use the z-score. Recall from the last module that the
formula for calculating the z-score is:

z = (x - μ)/σ

We are going to adapt it a little to suit our current problem:

z = (x̄ - μ)/σx̄

Using this formula, we will calculate the two z-scores we will use to answer our question.

z = (51,900 - 52,400)/639.01 = -0.78

z = (52,900 - 52,400)/639.01 = 0.78

We want the following probability:

P(51,900 ≤ x̄ ≤ 52,900) = P(-0.78 ≤ z ≤ 0.78) = P(z ≤ 0.78) - P(z ≤ -0.78) = 0.7823 - 0.2177 = 0.5646

Now we know that there is a 0.5646 probability that a simple random sample of 30 members of the
sales force will provide a sample mean that is within $500 of the population mean. Conversely, there
is a 1 - 0.5646 = 0.4354 probability the sample mean will not be within the $500 range.
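
The calculation in Example 5.4 can be reproduced with Python's standard library. This is a sketch rather than part of the course material: it computes the probability once with the z-score rounded to two decimals, as when reading a table, and once without rounding. The variable names are arbitrary.

```python
import math
from statistics import NormalDist

mu, sigma, n, tol = 52_400, 3_500, 30, 500  # values from Example 5.4

se = sigma / math.sqrt(n)  # standard error of the mean

# Table-style calculation: round z to two decimals first.
z = round(tol / se, 2)
std_normal = NormalDist()
p_table = std_normal.cdf(z) - std_normal.cdf(-z)

# Direct calculation on the sampling distribution, with no rounding of z.
sampling_dist = NormalDist(mu, se)
p_exact = sampling_dist.cdf(mu + tol) - sampling_dist.cdf(mu - tol)

print(round(se, 2), z, round(p_table, 4), round(p_exact, 4))
```

The unrounded result differs slightly from the table-based value because the table lookup rounds z before finding the area.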

The following graph visually represents the previous example:

Graph 5.3

In essence, our researcher has a little better than a 55/45 chance of selecting a successful sample. In
the next section, you will explore how our researcher could improve the odds of selecting a
successful sample.

Relationship between the Sample Size and the Sampling Distribution of x̄
In the last section, we found that if the researcher desired a simple random sample having a sample
mean x̄ within $500 of the population mean, a sample size of n = 30 would provide little better than a
0.56 probability of being in the desired range. We will now see how the sample size impacts this
probability.

Before moving forward, let's take a moment to review a couple of things:

1. Remember that E(x̄ ) = μ regardless of the sample size. That is, the
mean of all possible values of x̄ is equal to the population mean regardless of sample size.
2. Now, let's look again at the standard error of the mean formula:

σx̄ = σ/√n

Notice that the standard error of the mean is inversely proportional to the square root of the sample size. Specifically,
as the sample size increases, the standard error decreases. The table below illustrates this principle:

Table 5.2

From the previous section, we know the standard error of the mean for n = 30 was 639.01:

σx̄ = σ/√n = 3,500/√30 = 639.01

From the table above, we see that the standard error of the mean when n = 100 is 350:

σx̄ = σ/√n = 3,500/√100 = 350

Note that with n = 100, the sampling distribution has a smaller standard error. This means that the
values of x̄ , with n = 100, will have less variation and will tend to be closer to the population mean
than the values of x̄ with n = 30.

The graph below compares the sampling distribution of x̄ when n = 30 and n = 100.


Graph 5.4

We can use the sampling distribution of x̄ when n = 100 to calculate the probability that a simple
random sample of 100 sales people will provide a mean commission value that is within $500 of the
population mean. Since the sampling distribution is a normal probability distribution with a mean of
52,400 and a standard deviation of 350, we can again use the standard normal distribution table to
find the area of probability. With μ = 52,400, we get the following z-scores for x̄ = μ - 500 = 51,900 and
x̄ = μ + 500 = 52,900:

z = (51,900 - 52,400)/350 = -1.43

z = (52,900 - 52,400)/350 = 1.43

P(51,900 ≤ x̄ ≤ 52,900) = P(-1.43 ≤ z ≤ 1.43) = 0.9236 - 0.0764 = 0.8472

The following graph shows the area:


So, by increasing the sample size from n = 30 to n = 100, the probability of selecting a sample
whose mean x̄ is within $500 of the population mean has increased from 0.56 to 0.84.

This example shows that as the sample size increases, the standard error of the mean decreases.
Remember, variation in the sampling distribution of x̄ decreases as the number in the sample
increases; thus, the larger sample size provides a higher probability that the value of the sample
mean will be within a specified range from the population mean.
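
The effect of sample size can be tabulated directly. The sketch below reuses σ = 3,500 and the $500 range from the commission example; the sample sizes other than 30 and 100 are added here purely for illustration, and the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def prob_within(tol, sigma, n):
    """P(|x̄ - μ| <= tol) when x̄ is normal with standard error sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    return 2 * NormalDist().cdf(tol / se) - 1

sigma, tol = 3_500, 500
for n in (30, 50, 100, 200, 400):
    se = sigma / math.sqrt(n)
    # Columns: sample size, standard error, probability of landing within $500.
    print(n, round(se, 2), round(prob_within(tol, sigma, n), 4))
```

Each increase in n lowers the standard error and raises the probability. Because z is not rounded here, the printed probabilities can differ from table-based values in the third decimal place.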

In the previous commission example, the population mean and population standard deviation were
given. In most cases, however, the population mean and population standard deviation will be
unknown. In module 6, we will use the sample mean and the sample standard deviation to estimate
the population mean.

Practice Problems:
1. Describe, in your own words, the sampling distribution of x̄ .

2. For the sampling distribution of x̄ , is it necessary that the samples used in the distribution be all of
the same size?

3. What is E(x̄ )? What is E(x̄ ) equal to?

4. Suppose you take a random sample and find the sample mean, x̄ . Based on the previous problem,
can you say that x̄ is equal to the mean of the entire population µ?

5. A nursing home has 2000 patients. You sample 170 patients. Can you use the infinite standard
deviation formula?

6. Suppose that you take a sample of size 40 from a population that is normally distributed. Can the
sampling distribution of x̄ be approximated by a normal probability distribution? If the population
were not normally distributed, could the sampling distribution of x̄ still be approximated by a normal
probability distribution?

7. Suppose that in a certain large city, the mean 100 meter dash time for male high school seniors
was reported to be 13.3 seconds with a standard deviation of 1.9 seconds. We may assume that the
population is normally distributed.

a) What is the probability that 20 male high school seniors selected at random will have a mean
100 meter dash time of 12.5 seconds or less?

b) What is the probability that 40 male high school seniors selected at random will have a mean
100 meter dash time of 12.5 seconds or less?

c) What is the probability that 35 male high school seniors selected at random will have a mean
100 meter dash time between 13 seconds and 14 seconds?

Answer Key:
1. This is the sampling distribution of all possible values of the sample mean, x̄ , computed from a
sample of size n. It is important that the same sample size, n, be used for all of the samples.

2. It is important that all of the samples be of the same size, n.

3. E(x̄ ) is the expected value of x̄ . E(x̄ ) is equal to the population mean, µ.

4. The random sample mean, x̄ is not, in general, equal to the population mean, µ. x̄ only estimates
µ. The previous problem tells us that the expected value of x̄ is equal to µ. It does not say that the
sample mean of one sample is equal to µ.

5. In order to use the infinite population standard deviation formula, we must have:

n/N ≤ 0.05

We have:

n/N = 170/2,000 = 0.085

So, our value is greater than 0.05. We cannot use the infinite population standard deviation formula in this case.

6. Yes, in both cases. If the population is normally distributed, the sampling distribution of x̄ is
normal for any sample size. If the population is not normally distributed, the sample size of 40 is
greater than 30, so the sampling distribution of x̄ can still be approximated by a normal probability
distribution.

7. 

a) We calculate the standard deviation of the sampling distribution:

σx̄ = σ/√n = 1.9/√20 = 0.4249

Calculate the z-score:

z = (x̄ - μ)/σx̄ = (12.5 - 13.3)/0.4249 = -1.88

So, we want to find P(Z < -1.88) on the standard normal probability distribution table. We find that,

P(Z < -1.88) = .03005.

Therefore, there is a 0.03005 probability that a simple random sample of 20 male high school seniors
will run a 100 meter dash with a mean time of 12.5 seconds or less.

b) We calculate the standard deviation of the sampling distribution:

σx̄ = σ/√n = 1.9/√40 = 0.3004

Calculate the z-score:

z = (x̄ - μ)/σx̄ = (12.5 - 13.3)/0.3004 = -2.66

So, we want to find P(Z < -2.66) on the standard normal probability distribution table. We find that,

P(Z < -2.66) = .00391.

Therefore, there is a 0.00391 probability that a simple random sample of 40 male high school seniors
will run a 100 meter dash with a mean time of 12.5 seconds or less.

c) We calculate the standard deviation of the sampling distribution:

σx̄ = σ/√n = 1.9/√35 = 0.3212

Calculate the z-score for 13 seconds:

z = (x̄ - μ)/σx̄ = (13 - 13.3)/0.3212 = -0.93


Calculate the z-score for 14 seconds:

z = (x̄ - μ)/σx̄ = (14 - 13.3)/0.3212 = 2.18

So, we want to find P(-.93 < Z < 2.18) on the standard normal probability distribution table. Recall that

P(-.93 < Z < 2.18) = P(Z < 2.18) - P(Z < -.93) = .98537 - .17619 = .80918.

Therefore, there is a 0.80918 probability that a simple random sample of 35 male high school seniors
will have a mean 100 meter dash time between 13 and 14 seconds.
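
The answers to problem 7 can be double-checked in software. The following is a sketch using Python's standard library, with a hypothetical helper name. Because it does not round the z-scores to two decimals, its results differ slightly from the table-based values above.

```python
import math
from statistics import NormalDist

mu, sigma = 13.3, 1.9  # population mean and standard deviation from problem 7

def mean_dist(n):
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(mu, sigma / math.sqrt(n))

p_a = mean_dist(20).cdf(12.5)            # part a: mean of 12.5 s or less, n = 20
p_b = mean_dist(40).cdf(12.5)            # part b: mean of 12.5 s or less, n = 40
d = mean_dist(35)
p_c = d.cdf(14) - d.cdf(13)              # part c: mean between 13 s and 14 s, n = 35

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```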
