Module 7 Sampling Distribution

MODULE 7

MODULE IN SAMPLING DISTRIBUTION OF THE MEAN

[Some of the discussions in this module are based on my notes from when I took my Master of
Statistics at UP Diliman and my PhD in Social Science Research at Leyte Normal University. Others are
based on my personal knowledge and experience in dealing with situations that required my own
judgment, and on my reading and research from different references, which I acknowledge in this
material. Some authors and books are mentioned in this material so that students can consult them
for further reading.]

In order for you to understand more about this topic, let us have a short review of our previous topics by
solving this problem:

A sample of size n =2 is to be drawn from a finite population of size N = 5 whose elements are the
numbers 1, 3, 5, 7 & 9.

a) How many possible samples can be drawn?
b) Find the mean of each sample.
c) Find the mean of all the sample means.
d) Find the population mean.
e) Find the standard deviation of the sample means.
f) Find the standard deviation of the population.

Solution:

a) Recall from our previous topic in probability that we discussed three types of problems:
permutation, combination, and problems that could be solved using the theorem. The sample
1 & 3 is the same as the sample 3 & 1, hence this is a combination problem, and the total
number of samples that could be drawn is given by the combination formula:

$_5C_2 = \frac{5!}{2!\,(5-2)!} = \frac{5!}{2!\,3!} = 10$

and the possible samples are:

1,3 ; 1,5 ; 1,7 ; 1,9 ; 3,5 ; 3,7 ; 3,9 ; 5,7 ; 5,9 ; 7,9
b) Solving for the mean of each sample:

The table below shows the ten possible samples and their means when we draw two numbers
from the population of five numbers (1, 3, 5, 7 & 9). The mean of sample 1 & 3 is 2, the
mean of sample 1 & 5 is 3, and so on.

Sample ȳ
1,3 2
1,5 3
1,7 4
1,9 5
3,5 4
3,7 5
3,9 6
5,7 6
5,9 7
7,9 8

c) We will now solve and compute the mean of all the sample means by adding all the sample
means and dividing it by ten, the total number of sample means. The mean of all the sample
means is denoted by µȳ.

Sample ȳ
1,3 2
1,5 3
1,7 4
1,9 5
3,5 4
3,7 5
3,9 6
5,7 6
5,9 7
7,9 8
TOTAL 50

$\mu_{\bar{y}} = \frac{\sum \bar{y}}{10} = \frac{50}{10} = 5$

d) To find the population mean µ :

$\mu = \frac{\sum X}{N} = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5$

e) We use the same form as the population formula $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ in
solving the standard deviation of all the sample means. Since we are solving the standard
deviation of all the sample means, which is denoted by σȳ, we use the formula:

$\sigma_{\bar{y}} = \sqrt{\frac{\sum (\bar{y} - \mu_{\bar{y}})^2}{10}} = \sqrt{\frac{30}{10}} = 1.73$

Sample ȳ ȳ - µȳ (ȳ - µȳ)²
1,3 2 -3 9
1,5 3 -2 4
1,7 4 -1 1
1,9 5 0 0
3,5 4 -1 1
3,7 5 0 0
3,9 6 1 1
5,7 6 1 1
5,9 7 2 4
7,9 8 3 9
TOTAL 50 30

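f) To solve the standard deviation of the population:

X X - µ (X - µ)²
1 -4 16
3 -2 4
5 0 0
7 2 4
9 4 16
TOTAL 25 40

$\mu = \frac{\sum X}{N} = \frac{25}{5} = 5$

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{40}{5}} = 2.83$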

There are ten possible samples that could be drawn if we take samples of size two from a population
of five numbers (1, 3, 5, 7 & 9). Since each sample has a corresponding mean, there are ten means
in this particular case. A simple frequency distribution of all the sample means looks like this:

ȳ f
2 1
3 1
4 2
5 2
6 2
7 1
8 1

TOTAL 10
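
The review problem can also be checked by brute force. The following is a minimal Python sketch and is not part of the original module; the variable names are illustrative. It enumerates every sample of size 2 from the population 1, 3, 5, 7, 9, then computes the sample means, their mean and standard deviation, and the frequency table.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

population = [1, 3, 5, 7, 9]
n = 2  # sample size

# Every unordered sample of size n (order does not matter, so combinations)
samples = list(combinations(population, n))
sample_means = [sum(s) / n for s in samples]

# Population mean and mean of all the sample means
mu = sum(population) / len(population)
mu_ybar = sum(sample_means) / len(sample_means)

# Standard deviations, both computed with the "divide by the count" formula
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))
sigma_ybar = sqrt(sum((m - mu_ybar) ** 2 for m in sample_means) / len(sample_means))

# Frequency distribution of the sample means (the sampling distribution)
frequency = Counter(sample_means)

print(len(samples))          # 10
print(mu, mu_ybar)           # 5.0 5.0
print(round(sigma, 2))       # 2.83
print(round(sigma_ybar, 2))  # 1.73
for mean in sorted(frequency):
    print(mean, frequency[mean])
```

This kind of enumeration is only feasible for very small populations, which is why the theorem discussed later in the module is needed.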
This frequency distribution of all the sample means is what we call the sampling distribution of the
mean, which is our topic in this module. We will also discuss some important characteristics of a
sampling distribution of the mean. Take note from our example that there are ten possible samples that
could be drawn from the 5 elements in our population when we take two elements at a time.
How many possible samples could be drawn if we want to take a sample of size ten from a population of
fifty?

$_{50}C_{10} = \frac{50!}{(50-10)!\,10!} = \frac{50!}{40!\,10!} = 10,272,278,170$
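
As a quick check of these counts, here is a short Python sketch (an illustration, not part of the original module) that uses the standard library function `math.comb`, available in Python 3.8 and later.

```python
from math import comb

# Number of unordered samples of size 2 from a population of 5
small_case = comb(5, 2)

# Number of unordered samples of size 10 from a population of 50
large_case = comb(50, 10)

print(small_case)  # 10
print(large_case)  # 10272278170
```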

There are more than ten billion possible samples that could be drawn if we take a sample of size ten
from a population of fifty, and getting the means of all these samples by hand is not practical. The
count grows even further if the population or the sample size is larger. In analyzing the mean of your
sample, you need to determine the mean of all the sample means and the standard deviation of all the
sample means. Recall the standardized normal formula:

$Z = \frac{X - \mu}{\sigma}$

This formula is used if you want to determine the probability of an event given that the distribution
is normal and is based on the value of X. You can also use this standardized normal formula in
analyzing the mean of your sample. Revising this formula for use in the sampling distribution of the
sample means:

$Z_{\alpha/2} = \frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}}$

We will use the notation Zα/2 if we are analyzing the mean of all the sample means and Z if analyzing a
particular value of a normal distribution. Since we are analyzing a sample mean in sampling distribution
of the means, we will use this notation Zα/2.

Note that in order for you to use the formula, you need to find the value of the mean of all the sample
means µȳ and the standard deviation of all the sample means σȳ which is impossible to get through
manual computations. You could determine these two values from this theorem:

Theorem:

For random samples of size n taken from a population having the mean µ and the standard deviation σ,
the theoretical sampling distribution of ȳ has the mean µ and the standard deviation:

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ for a finite population of size N

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$ for an infinite population


Let us now try to interpret this theorem. The first part says that the theoretical sampling
distribution of the sample means has the mean µ, meaning that the mean of all the sample means is
equal to the mean of the population (µȳ = µ). My example shows this: the mean of the ten sample
means is 5, which is the same as the mean of the population.
Let us now compute the standard deviation of all the sample means based on this theorem. Since the
population is finite with size N = 5, we use the first formula:

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{2.83}{\sqrt{2}} \sqrt{\frac{5-2}{5-1}} = 1.73$

This theorem helps find the two values µȳ and σȳ which are needed in the analysis. The standard
deviation of all the sample means σȳ is also called the standard error of the mean, while the portion
of the formula $\sqrt{\frac{N-n}{N-1}}$ is called the finite correction factor (also known as the
finite population correction factor). This value is approximately equal to 1 if the size of the
population is large relative to the sample size.
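
The two formulas of the theorem can be sketched in a few lines of Python. This is an illustration only; the function name `standard_error` and its optional parameter `N` are assumptions made for this sketch, not notation from the module.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    sigma: population standard deviation
    n: sample size
    N: population size; if given, the finite correction factor is applied
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite correction factor
    return se

# Review problem: population 1, 3, 5, 7, 9 has sigma = sqrt(8), n = 2, N = 5
se_review = standard_error(sqrt(8), n=2, N=5)
print(round(se_review, 2))  # 1.73

# For a large population the correction factor is close to 1
correction_large = sqrt((1_000_000 - 100) / (1_000_000 - 1))
print(round(correction_large, 4))
```

The result for the review problem matches the value obtained by listing all ten sample means.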

You have to be careful in using this theorem because there is a condition for it to be applicable.
It clearly says “for random samples”, which means that your sample should be taken at random;
otherwise you cannot use this theorem.

You might also wonder why we use the standardized normal formula for this sampling distribution of the
means. This is because of the so-called Central Limit Theorem, which says: “if n (the sample size) is
large, the sampling distribution of the means can be approximated closely with a normal distribution”.
Going back to my illustration of drawing samples of size two from a population of five elements, ten
possible samples could be drawn. If we construct a histogram or a bar graph of the frequency
distribution of these sample means, you will notice that important characteristics of a normal
distribution are already present: the center of the graph is the mean, which is equal to 5, and the
graph is symmetric. If you take samples of size ten from a population of fifty, there are more than ten
billion possible samples, and the histogram of those sample means would be even closer to a normal
curve. The approximation keeps improving as the sample size increases. Another important consideration
in using the Central Limit Theorem is the condition “if the sample size is large”. How large is large?
As a rule of thumb, we consider a sample to be large if the sample size is more than 30. So, in order
for us to use the Central Limit Theorem, the sample size should be more than thirty.

You can use Chebyshev’s Theorem if this condition is not satisfied. Chebyshev’s Theorem says that
“we can assert with a probability of at least $1 - \frac{1}{K^2}$ that the mean of a random sample of
size n will differ from the mean of the population from which it came by less than Kσȳ”. These two
theorems, the Central Limit Theorem and Chebyshev’s Theorem, are used to determine the degree of
precision of the estimate and the size of the error in estimating the mean of the population based on
the mean of the sample. Chebyshev’s Theorem could be used for any sample size, while the Central Limit
Theorem could only be used for a large sample size.
Example:

The mean of a random sample of size n = 100 is used to estimate the mean of a population having a
standard deviation σ = 25. With what probability can we assert that the error will be less than 5?

Solution:

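This problem could be solved using Chebyshev’s Theorem. The probability is represented by
$P = 1 - \frac{1}{K^2}$ while the error is represented by e = Kσȳ.

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5$

$e = K\sigma_{\bar{y}}$ ; $K = \frac{e}{\sigma_{\bar{y}}} = \frac{5}{2.5} = 2$

$P = 1 - \frac{1}{K^2} = 1 - \frac{1}{2^2} = 0.75 = 75\%$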

We have to change these figures back to the statement of Chebyshev’s Theorem, and we now say in
our analysis and interpretation that “we are at least 75% sure that the mean of the random sample will
differ from the mean of the population by less than 5”. We changed the inequalities into equations only
to simplify our solution, so we have to change the result back to the original statement of the theorem.
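
The Chebyshev computation can be written as a small Python function. This is a minimal sketch; the name `chebyshev_lower_bound` is an assumption made for illustration, and the function assumes the infinite-population standard error σ/√n used in this example.

```python
from math import sqrt

def chebyshev_lower_bound(e, sigma, n):
    """Lower bound on P(|ybar - mu| < e) from Chebyshev's Theorem.

    Uses K = e / (sigma / sqrt(n)) and returns 1 - 1/K^2.
    The bound is only informative when K > 1, so it is floored at 0.
    """
    k = e / (sigma / sqrt(n))
    return max(0.0, 1 - 1 / k ** 2)

bound = chebyshev_lower_bound(e=5, sigma=25, n=100)
print(bound)  # 0.75
```

Because the theorem gives a bound rather than an exact value, the result should always be read as "at least 75%".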

This problem could also be solved using the Central Limit Theorem since the sample size is more than 30
and we consider this a large sample. We use the normal distribution and the normal curve when we
use the Central Limit Theorem. Recall that there are two scales in a normal distribution, the Z scale and
the X scale. Based on our discussion in the last module, we could determine the area under the
normal curve from the Z scale using the normal distribution table or the Z table. When the point
of reference is on the X scale, we have to convert it to the Z scale using the standardized normal
formula in order to solve for the area, which represents probability.

In sampling distribution of the mean, the reference point is represented by ȳ and not X while the
standard scale Z is represented by Zα/2 since we are not dealing with an individual value of X but with a
sample mean ȳ. Imagine that you have a normal curve with center equal to µȳ and any point in the curve
is represented by ȳ. Since the mean of all the sample means µȳ is equal to the mean of the population µ
(µȳ = µ), the center of the curve is now the population mean µ while any point along the line is ȳ. The
difference between the sample mean and the population mean is the error of estimate. This error of
estimate is represented by the distance from the center(the population mean) to ȳ. Let us say that the
true population mean is equal to 500 (µ = 500) which you do not know and you want to estimate this
value by using the mean of your sample. If the mean of your sample is 505 and you say that this is the
mean of the population, there is an error of 5 units in your estimate. This is a case of over estimation.
There is also an error of under estimation. If the mean of the sample is 495, there is also 5 units error in
estimation. Error of estimation could occur in both directions.
Using the normal curve in the sampling distribution, the probability that the difference between the
population mean and the sample mean will be less than a given value is represented by the area from ȳ
to the center. However, you have to change this reference point ȳ into its standard value, denoted by
Zα/2, in order to use the normal distribution table and solve for the required area. Using the
standardized normal formula:

$Z_{\alpha/2} = \frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}}$

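But based on the theorem presented above:

$\mu_{\bar{y}} = \mu$ and $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$ (for an infinite population, or a population that is large relative to the sample)

So the formula now is:

$Z_{\alpha/2} = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$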

The result of ȳ - µ is positive if the sample mean ȳ is more than the population mean (for over
estimation) and negative if the sample mean ȳ is less than the population mean (for under estimation).
The area representing the probability that sample mean will differ from the population mean is the area
from -Zα/2 (under estimation) to Zα/2(over estimation). Take note also that the difference between the
sample mean and the population mean (ȳ - µ) is the error of estimate (e). Substituting this in the
formula:
$Z_{\alpha/2} = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$

$Z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}}$

This is now the formula that you will use for the Central Limit Theorem.

Solving this problem using the Central Limit Theorem:

e = 5, σ = 25, n = 100

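$Z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}} = \frac{5}{25/\sqrt{100}} = 2$ ; $-Z_{\alpha/2} = -2$

v = 0.9772 (area to the left of Z = 2) ; v = 0.0228 (area to the left of Z = -2)

p = 0.9772 – 0.0228

p = 0.9544 (0.9545 with unrounded areas), or about 95.45%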

There is a 95.45% probability that the error will be less than 5.
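
The same area can be computed without a printed Z table. The sketch below is illustrative only: it uses `statistics.NormalDist` from the Python standard library (Python 3.8 and later), and the function name `clt_probability` is an assumption made for this example.

```python
from math import sqrt
from statistics import NormalDist

def clt_probability(e, sigma, n):
    """P(|ybar - mu| < e) under the normal approximation.

    Assumes the standard error sigma / sqrt(n), i.e. a large sample
    from an infinite (or very large) population.
    """
    z = e / (sigma / sqrt(n))
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Area between -z and z: (area left of z) minus (area left of -z)
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

p = clt_probability(e=5, sigma=25, n=100)
print(round(p, 4))  # 0.9545
```

The small difference from a hand computation with a four-decimal table comes only from rounding the tabular values.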


Both theorems could be applied to this case, but they give two different figures for the degree of
precision of the estimate. Using Chebyshev’s Theorem, we can assert that there is at least a 75%
probability that the error will be less than 5. Using the Central Limit Theorem, we can assert that
there is a 95.45% probability that the error will be less than 5. This is why most statisticians use
the Central Limit Theorem: it shows a higher degree of precision in estimating the population mean
based on the sample mean. This is also why most statisticians take samples of more than 30. We never
use Chebyshev’s Theorem if the sample size is more than 30. However, there are studies wherein it is
difficult or even impossible to get samples of more than 30. That is when Chebyshev’s Theorem is used.

Example 2:

A researcher wants to estimate the average time required to finish a certain task. He conducted a survey
and asked 225 workers about the time it will take them to finish that certain task. If the standard
deviation of the time it will take to finish the task can be assumed to be equal to 3 hours, with what
probability can he assert that his error will be less than fifteen minutes using

a) Chebyshev’s Theorem
b) Central Limit Theorem

Solution:

The first step in solving this problem is to identify the given values. You have to change the error terms
to hours since the standard deviation σ is in hours. The mean, the standard deviation and the error term
should be with the same units.

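n = 225, σ = 3, e = $\frac{15}{60}$ = 0.25 hours

a) Using Chebyshev’s Theorem:

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{225}} = 0.2$

$e = K\sigma_{\bar{y}}$ ; $K = \frac{e}{\sigma_{\bar{y}}} = \frac{0.25}{0.2} = 1.25$

$p = 1 - \frac{1}{K^2} = 1 - \frac{1}{(1.25)^2} = 1 - \frac{1}{1.5625} = 1 - 0.64 = 0.36$

p = 36%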

The researcher can assert that he is at least 36% sure that his error is less than fifteen minutes if he will
use the mean of his sample in estimating the true average time required to finish this task.
b) Using the Central Limit Theorem:

$Z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}} = \frac{0.25}{3/\sqrt{225}} = 1.25$ ; $-Z_{\alpha/2} = -1.25$

v = 0.8944 ; v = 0.1056
p = 0.8944 – 0.1056
p = 0.7888
p = 78.88%

The researcher can assert with a probability of 78.88% that his error is less than fifteen minutes if
he uses the mean of his sample in estimating the true average time required to perform the task.

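The formula based on the Central Limit Theorem, $Z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}}$, could also
be used to determine the sample size needed if you want to estimate the mean of the population based
on the mean of the sample:

$Z_{\alpha/2} = \frac{e}{\sigma/\sqrt{n}}$ ; $\frac{\sigma}{\sqrt{n}} = \frac{e}{Z_{\alpha/2}}$ ; $\sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{e}$ ; $n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2$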

Example:

A manufacturer wants to be 95% sure that his error is less than 1 minute and 45 seconds in estimating
the true average time it takes to perform a certain task. If the standard deviation of the time it takes
to perform the task can be assumed to be equal to 10 minutes, on how large a sample should he base his
estimate?

Solution:

Using the Central Limit Theorem:

At .95 probability; Zα/2 = 1.96 (p =0.95 is the area from -Zα/2 to Zα/2 ; the area to the right of Zα/2 plus the
area to the left of -Zα/2 is equal to 1 – p or 1 – 0.95 = 0.05 since the entire area under the normal curve is
1 or 100% of itself. The sum of these two remaining areas outside p is usually denoted by α. So, p + α = 1
or α = 1 – p and the area to the right of Zα/2 is one half of alpha or α/2 while the area to the left of -Zα/2 is
also one half alpha or α/2. Note that the table that we are using gave the area to the left of Z which is
denoted by v or the tabular value. The area between -Zα/2 and Zα/2 which is equal to p plus the area to
the left of -Zα/2 is equivalent to the tabular value of Zα/2; p + α/2 = v)

At 0.95 probability,
p = 0.95
α = 0.05
α/2 = 0.025
v = p + α/2 = 0.95 + 0.025 = 0.975
Zα/2 = 1.96
σ = 10; e = 1.75 minutes

$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{(1.96)(10)}{1.75}\right)^2 = 125.44$

The manufacturer needs a sample size of at least 125.44 in order to be 95% sure that his error is less
than 1 minute and 45 seconds. Since the sample size should always be an integer, the manufacturer
should take a sample size of 126, the next higher integer. The manufacturer cannot claim 95% assurance
that his error will be less than 1 minute and 45 seconds if the sample size is less than 125.44. This
formula is just a guide in determining the sample size needed in estimating the mean of the population,
and there should always be a factor of safety. In an actual situation, I recommend that you get a sample
size at least five percent more than the result of this formula. In case the result of this formula is
less than 30, you have to get a sample of at least 32 in order to satisfy the condition required for
this formula.
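
The sample-size formula lends itself to a short Python sketch. The function name `required_sample_size` is an assumption made for illustration; the Z value is taken from `statistics.NormalDist().inv_cdf` instead of a printed table, and the result is rounded up to the next integer as described above. The five percent safety margin recommended in the text is not applied here.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, sigma, e):
    """Smallest integer n with n >= (z * sigma / e)^2.

    confidence: desired probability p, e.g. 0.95
    sigma: population standard deviation (assumed known)
    e: maximum error, in the same units as sigma
    """
    alpha = 1 - confidence
    # z has area p + alpha/2 to its left, e.g. 0.975 for p = 0.95
    z = NormalDist().inv_cdf(confidence + alpha / 2)
    return ceil((z * sigma / e) ** 2)

n_needed = required_sample_size(confidence=0.95, sigma=10, e=1.75)
print(n_needed)  # 126
```

Keeping σ and e in the same units (minutes here) matters; mixing minutes and seconds would change the answer by orders of magnitude.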

You will notice in my examples above that the standard deviation of the population is known or given.
You do not have this information in an actual situation. Remember that the standard deviation of the
sample (s) can differ considerably from the population standard deviation (σ) if the sample size is 30
or less, so you cannot use the sample standard deviation in a formula where the population standard
deviation is required. However, if the sample size is more than 30, the difference between the two is
usually small enough that you can use the sample standard deviation where the population standard
deviation is required.

Example:

A student who conducted a pre test recorded the time required for the respondents to finish answering
his questionnaire. He conducted his data gathering procedures using his questionnaire to 40 people
getting the average time or the mean of 18.65 minutes with a standard deviation of 2.40 minutes to
finish answering the said questionnaire. What can he say about the possible size of his error in
estimating as 18.65 minutes the true average it takes to answer the questionnaire at 0.95 probability?

Solution:

At 0.95 probability, α = 0.05; α/2 = 0.025; v = p + α/2; v = 0.95 + 0.025 = 0.975

When v = 0.975, Zα/2 = 1.96

Since n > 30, s is approximately equal to σ

Substituting these values into the formula:

$Z_{\alpha/2} = \frac{e}{s/\sqrt{n}}$ ; $e = \frac{Z_{\alpha/2}\,s}{\sqrt{n}}$

$e = \frac{(1.96)(2.40)}{\sqrt{40}} = 0.74$ minutes or 0.74 x 60 = 44.4 seconds

The possible size of his error of estimate is 44.4 seconds.

This student could say that the average time required for the respondents to answer his questionnaire is
18.65 minutes, and he is 95% sure that his error is less than 44.4 seconds.
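
The size of the error in this example can be reproduced with a few lines of Python. This is a sketch under the same assumption the text makes, namely that s can stand in for σ because n is more than 30; the function name `margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence, s, n):
    """Maximum error e = z * s / sqrt(n) at the given probability.

    s is the sample standard deviation, used in place of sigma
    on the assumption that n is more than 30.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(confidence + alpha / 2)
    return z * s / sqrt(n)

e_minutes = margin_of_error(confidence=0.95, s=2.40, n=40)
print(round(e_minutes, 2))  # 0.74
```

Converting the unrounded value to seconds gives about 44.6; the 44.4 seconds in the text comes from rounding to 0.74 minutes before multiplying by 60.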

The result could be best explained if the student will express the result as an interval estimate of the
mean and we will discuss this in the next module.

References:
1. John E. Freund, Frank J. Williams; Business Statistics
2. Ronald E. Walpole (1974); Introduction to Statistics: Third Edition; EDCA Publishing &
Distributing Corporation, Quezon City
3. Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers (1998); Probability and Statistics for
Engineers and Scientists: Sixth Edition; Prentice Hall International, Inc.

GUIDED EXERCISE

Fill the blanks of this guided exercise with the correct figures, words, phrases or sentences, write
your name in every page and scan it together with your solution of the exercises given at the
end of this material. Send these materials through email to robbiecapon@gmail.com.

In order for you to understand more about this topic, let us have a short review of our previous topics by
solving this problem:

A sample of size n =2 is to be drawn from a finite population of size N = 5 whose elements are the
numbers 1, 2, 3, 4 & 5.

a) How many possible samples can be drawn?
b) Find the mean of each sample.
c) Find the mean of all the sample means.
d) Find the population mean.
e) Find the standard deviation of the sample means.
f) Find the standard deviation of the population.

Solution:

a) Recall from our previous topic in probability that we discussed three types of problems:
permutation, combination, and problems that could be solved using the theorem. The sample
1 & 2 is the same as the sample 2 & 1, hence this problem is a ____1_____ problem. The total
number of samples that could be drawn could be solved using the combination formula since this
type of problem is a combination problem. The total number of samples is equal to:

$_5C_2 = \frac{5!}{2!\,(5-2)!} = 10$ and the possible samples are:

_____________________________2___________________________
b) Solving for the mean of each sample: (fill the blanks to complete the table: 3 points)

Sample ȳ
___ __
___ __
___ __
___ __
___ __
___ __
___ __
___ __
___ __
___ __

The table above shows the ten possible samples and their means when we draw two numbers
from the population of five numbers (1, 2, 3, 4 & 5). The mean of sample 1 & 2 is 1.5, the
mean of sample 1 & 3 is 2, and so on.

c) We will now solve and compute the mean of all the sample means by adding all the sample
means and dividing it by ten, the total number of sample means. The mean of all the sample
means is denoted by µȳ.

Sample ȳ
1,2 1.5
1,3 2
1,4 2.5
1,5 3
2,3 2.5
2,4 3
2,5 3.5
3,4 3.5
3,5 4
4,5 4.5
TOTAL 30

µȳ = ____6_____

d) To find the population mean µ :

$\mu = \frac{\sum X}{N}$ = _____7_______

e) We use the same form as the population formula $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ in
solving the standard deviation of all the sample means. Since we are solving the standard
deviation of all the sample means, which is denoted by σȳ, we use the formula:

$\sigma_{\bar{y}} = \sqrt{\frac{\sum (\bar{y} - \mu_{\bar{y}})^2}{10}}$ = ______8______

Fill the blanks to complete the table below: (3 points)

Sample ȳ ȳ - µȳ (ȳ - µȳ)²
1,2 1.5 __ __
1,3 2 __ __
1,4 2.5 __ __
1,5 3 __ __
2,3 2.5 __ __
2,4 3 __ __
2,5 3.5 __ __
3,4 3.5 __ __
3,5 4 __ __
4,5 4.5 __ __
TOTAL 30 __

f) To solve the standard deviation of the population:

X X - µ (X - µ)²
1 -2 4
2 -1 1
3 0 0
4 1 1
5 2 4
TOTAL 15 10

$\mu = \frac{\sum X}{N} = \frac{15}{5} = 3$

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ = ____12_____

There are ten possible samples that could be drawn if we take samples of size two from a population
of five numbers (1, 2, 3, 4 & 5). Since each sample has a corresponding mean, there are ten means
in this particular case. A simple frequency distribution of all the sample means looks like this:
(fill the blanks to complete the table: 3 points)

ȳ f
1.5 1
__ __
__ __
__ __
__ __
__ __
__ __

TOTAL 10
This frequency distribution of all the sample means is what we call the sampling distribution of the
mean, which is our topic in this module. We will also discuss some important characteristics of a
sampling distribution of the mean. Take note from our example that there are ten possible samples that
could be drawn from the 5 elements in our population when we take two elements at a time.
How many possible samples could be drawn if we want to take a sample of size ten from a population of
fifty?

$_{50}C_{10} = \frac{50!}{(50-10)!\,10!} = \frac{50!}{40!\,10!} = 10,272,278,170$

There are more than ten billion possible samples that could be drawn if we take a sample of size ten
from a population of fifty, and getting the means of all these samples by hand is not practical. The
count grows even further if the population or the sample size is larger. In analyzing the mean of your
sample, you need to determine the mean of all the sample means and the standard deviation of all the
sample means. Recall the standardized normal formula Z = ____16____. This formula is used if you want
to determine the probability of an event given that the distribution is normal and is based on the
value of X. You can also use this standardized normal formula in analyzing the mean of your sample.
Revising this formula for use in the sampling distribution of the sample means:

$Z_{\alpha/2} = \frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}}$

We will use the notation Zα/2 if we are analyzing the mean of all the sample means and Z if analyzing a
particular value of a normal distribution. Since we are analyzing a sample mean in sampling distribution
of the means, we will use this notation Zα/2.

Note that in order for you to use the formula, you need to find the value of the mean of all the sample
means µȳ and the standard deviation of all the sample means σȳ which is impossible to get through
manual computations. You could determine these two values from this theorem:

Theorem:

For random samples of size n taken from a population having the mean µ and the standard deviation σ,
the theoretical sampling distribution of ȳ has the mean µ and the standard deviation:

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ for a finite population of size N

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$ for an infinite population


Let us now try to interpret this theorem. The first part says that the theoretical sampling
distribution of the sample means has the mean µ, meaning that the mean of all the sample means is
equal to the mean of the population (µȳ = µ). My example shows this: the mean of the ten sample
means is 3, which is the same as the mean of the population.
Let us now compute the standard deviation of all the sample means based on this theorem. Since the
population is finite with size N = 5, we use the first formula,
$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.

σȳ = ________17_______

This theorem helps find the two values µȳ and σȳ which are needed in the analysis. The standard
deviation of all the sample means σȳ is also called the _____________18________, while the portion
of the formula $\sqrt{\frac{N-n}{N-1}}$ is called the ________________19_____________. This value is
approximately equal to 1 if the size of the population is large relative to the sample size.

You have to be careful in using this theorem because there is a condition for it to be applicable.
It clearly says “for random samples”, which means that your sample should be taken at random;
otherwise you cannot use this theorem.

You might also wonder why we use the standardized normal formula for this sampling distribution of
the means. This is because of the so-called Central Limit Theorem, which says: “if n (the sample size)
is large, the sampling distribution of the means can be approximated closely with a normal
distribution”. Going back to my illustration of drawing samples of size two from a population of five
elements, ten possible samples could be drawn. If we construct a histogram or a bar graph of the
frequency distribution of these sample means, you will notice that important characteristics of a
normal distribution are already present: the center of the graph is the mean, which is equal to 3, and
the graph is symmetric. If you take samples of size ten from a population of fifty, there are more than
ten billion possible samples, and the histogram of those sample means would be even closer to a normal
curve. The approximation keeps improving as the sample size increases. Another important consideration
in using the Central Limit Theorem is the condition “if the sample size is large”. How large is large?
As a rule of thumb, we consider a sample to be large if the sample size is more than ___20__. So, in
order for us to use the Central Limit Theorem, the sample size should be more than thirty.

You can use Chebyshev’s Theorem if this condition is not satisfied. Chebyshev’s Theorem says that
“we can assert with a probability of at least $1 - \frac{1}{K^2}$ that the mean of a random sample of
size n will differ from the mean of the population from which it came by less than Kσȳ”. These two
theorems, the Central Limit Theorem and Chebyshev’s Theorem, are used to determine the degree of
precision of the estimate and the size of the error in estimating the mean of the population based on
the mean of the sample. Chebyshev’s Theorem could be used for any sample size, while the Central Limit
Theorem could only be used for a large sample size.
Example:

The mean of a random sample of size n = 50 is used to estimate the mean of a population having a
standard deviation σ = 5. With what probability can we assert that the error will be less than 2?

Solution:

This problem could be solved using Chebyshev’s Theorem. The probability is represented by
$P = 1 - \frac{1}{K^2}$ while the error is represented by e = Kσȳ.

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$ = ___21______

$e = K\sigma_{\bar{y}}$ ; $K = \frac{e}{\sigma_{\bar{y}}}$ = ______22______

$P = 1 - \frac{1}{K^2}$ = ______23________

We have to change these figures back to the statement of Chebyshev’s Theorem, and we now say in
our analysis and interpretation that “we are at least 87.5% sure that the mean of the random sample
will differ from the mean of the population by less than 2”. We changed the inequalities into equations
only to simplify our solution, so we have to change the result back to the original statement of the
theorem.

This problem could also be solved using the Central Limit Theorem since the sample size is more than 30
and we consider this a large sample. We use the normal distribution and the normal curve when we
use the Central Limit Theorem. Recall that there are two scales in a normal distribution, the Z scale and
the X scale. Based on our discussion in the last module, we could determine the area under the
normal curve from the Z scale using the normal distribution table or the Z table. When the point
of reference is on the X scale, we have to convert it to the Z scale using the standardized normal
formula in order to solve for the area, which represents probability.

In the sampling distribution of the mean, the reference point is represented by ȳ and not X, while the
value on the Z scale is written as Zα/2, since we are dealing not with an individual value of X but with a
sample mean ȳ. Imagine a normal curve whose center is µȳ and on which any point along the horizontal
line is a sample mean ȳ. Since the mean of all the sample means µȳ is equal to the mean of the
population µ (µȳ = µ), the center of the curve is the population mean µ. The difference between the
sample mean and the population mean is the error of estimate, represented by the distance from the
center (the population mean) to ȳ. Suppose the true population mean is 500 (µ = 500), which you do not
know, and you want to estimate it using the mean of your sample. If the mean of your sample is 505 and
you report this as the mean of the population, your estimate is in error by 5 units. This is a case of
overestimation. If instead the mean of the sample is 495, the error is also 5 units, but this is a case of
underestimation. The error of estimate can occur in either direction.

Using the normal curve of the sampling distribution, the probability that the difference between the
population mean and the sample mean will be less than a given value is represented by the area
between the center and ȳ. However, you have to convert the reference point ȳ into its standard value,
denoted by Zα/2, in order to use the normal distribution table and find the required area. Using the
standardized normal formula:

Zα/2 = (ȳ - µȳ) / σȳ

(The µ and σ of the standardized normal formula are replaced here by µȳ and σȳ.)

But based on the theorem presented above:

µȳ = µ and σȳ = σ/√n (for an infinite population or a large sample)


So the formula now is:

Zα/2 = _____24_____

The result of ȳ - µ is positive if the sample mean ȳ is more than the population mean (overestimation)
and negative if the sample mean ȳ is less than the population mean (underestimation). The area
representing the probability that the sample mean will differ from the population mean by less than a
given amount is the area from -Zα/2 (underestimation) to Zα/2 (overestimation). Note also that the
difference between the sample mean and the population mean (ȳ - µ) is the error of estimate (e).
Substituting this into the formula:

Zα/2 = (ȳ - µ) / (σ/√n)

Zα/2 = e / (σ/√n)

This is the formula that you will use for the Central Limit Theorem.
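
If you want to convince yourself that this formula behaves as claimed, one way is to simulate many samples and count how often the sample mean lands within e of µ. The sketch below is only an illustration under assumed values. µ = 500 and e = 5 echo the illustration earlier in this section, while σ = 20, n = 40, the uniform population, the number of trials and the function names are all assumptions made for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def clt_probability(e, sigma, n):
    """Area from -Z to Z under the standard normal curve, Z = e / (sigma / sqrt(n))."""
    z = e / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

def simulated_probability(e, mu, sigma, n, trials=5000, seed=1):
    """Fraction of simulated sample means that fall within e of mu."""
    rng = random.Random(seed)
    # A uniform population on mu ± sigma*sqrt(3) has standard deviation sigma.
    half_width = sigma * sqrt(3)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(mu - half_width, mu + half_width) for _ in range(n)]
        if abs(fmean(sample) - mu) < e:
            hits += 1
    return hits / trials

print(round(clt_probability(e=5, sigma=20, n=40), 4))
print(simulated_probability(e=5, mu=500, sigma=20, n=40))
```

The two printed figures should be close to each other (both around 0.89) but not identical, because the simulation has sampling noise of its own. The population here is deliberately not normal, which is the situation the Central Limit Theorem is meant to cover.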

Solving this problem using the Central Limit Theorem:

e = 2, σ = 5, n = 50

Zα/2 = e / (σ/√n) = ____25____        -Zα/2 = ____26____

v = ____27____        v = ____28____

p = ____29____

Both theorems can be applied to this case, but they give two different figures for the degree of
precision of the estimate. Using Chebyshev’s Theorem, we can assert with a probability of at least 87.5%
that the error will be less than 2. Using the Central Limit Theorem, we can assert with a probability of
99.54% that the error will be less than 2. This is why most statisticians prefer the Central Limit
Theorem: it gives a much sharper statement about the precision of the sample mean as an estimate of
the population mean. It is also why most statisticians take samples of more than 30. We never use
Chebyshev’s Theorem if the sample size is more than 30. However, there are studies in which it is
difficult or even impossible to get a sample of more than 30. That is when Chebyshev’s Theorem is used.
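
The two answers for this problem can be checked with a few lines of Python. This is only a sketch, and the function names are illustrative. The normal area is computed directly rather than read from the table, so the Central Limit Theorem figure comes out as 0.9953 instead of the 0.9954 obtained when Z is rounded to two decimals and a four-decimal table is used.

```python
from math import sqrt
from statistics import NormalDist

def chebyshev_bound(e, sigma, n):
    """Lower bound 1 - 1/K**2 on P(|ybar - mu| < e), with K = e / (sigma / sqrt(n))."""
    k = e / (sigma / sqrt(n))
    return 1 - 1 / k**2

def clt_probability(e, sigma, n):
    """Area from -Z to Z under the standard normal curve, Z = e / (sigma / sqrt(n))."""
    z = e / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(z) - 1

print(round(chebyshev_bound(e=2, sigma=5, n=50), 4))  # 0.875
print(round(clt_probability(e=2, sigma=5, n=50), 4))  # 0.9953
```

Note that the Chebyshev figure is a lower bound ("at least"), and it only says something useful when K is greater than 1.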

Example 2:

A researcher wants to estimate the average time required to finish a certain task. He conducted a survey
and asked 200 workers how long it takes them to finish the task. If the standard deviation of the time it
takes to finish the task can be assumed to be 5 hours, with what probability can he assert that his error
will be less than thirty minutes using

c) Chebyshev’s Theorem
d) Central Limit Theorem

Solution:

The first step in solving this problem is to identify the given values. You have to convert the error term
to hours, since the standard deviation σ is in hours. The mean, the standard deviation and the error term
should all be in the same units.

n = 200, σ = 5 hours, e = 30/60 = 0.50 hours

c) Using the Chebyshev’s Theorem:

σȳ = σ/√n ;  σȳ = ____30____

e = K σȳ ;  K = e/σȳ = ____31____

p = 1 - 1/K² = ____32____

The researcher can assert that he is at least ____33___% sure that his error is less than thirty minutes if
he uses the mean of his sample to estimate the true average time required to finish this task.

d) Using the Central Limit Theorem:


Zα/2 = e / (σ/√n) = ____34____        -Zα/2 = ____35____

v = ____36____        v = ____37____

P = ____38____

The researcher can assert with a probability of ____39____ that his error is less than thirty minutes if
he uses the mean of his sample to estimate the true average time required to perform the task.

The formula based on the Central Limit Theorem, Zα/2 = e / (σ/√n), can also be used to determine the
sample size needed if you want to estimate the mean of the population from the mean of a sample.
Solving the formula for n:

Zα/2 = e / (σ/√n) ;  σ/√n = e / Zα/2 ;  √n = (Zα/2)(σ) / e ;  n = [(Zα/2)(σ) / e]²

Example:

A manufacturer wants to be 99% sure that his error is less than 1 minute and 30 seconds in estimating
the true average time it takes to perform a certain task. If the standard deviation of the time it takes to
perform the task can be assumed to be 10 minutes, how large a sample should he base his estimate on?

Solution:

Using the Central Limit Theorem:

At 0.99 probability, Zα/2 = ____40____ (p = 0.99 is the area from -Zα/2 to Zα/2. The area to the right of
Zα/2 plus the area to the left of -Zα/2 is equal to 1 – p, or 1 – 0.99 = 0.01, since the entire area under the
normal curve is 1. The sum of these two remaining areas outside p is usually denoted by α. So
p + α = 1, or α = 1 – p. The area to the right of Zα/2 is one half of α, or α/2, and the area to the left of
-Zα/2 is also α/2. Note that the table we are using gives the area to the left of Z, which is denoted by v,
the tabular value. The area between -Zα/2 and Zα/2, which is equal to p, plus the area to the left of
-Zα/2, is the tabular value of Zα/2: p + α/2 = v.)
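
The step from p to v to Zα/2 can also be done without the printed table. The sketch below assumes Python's standard `statistics.NormalDist`, whose `inv_cdf` method returns the Z value that has a given area to its left. The function name `z_alpha_half` is hypothetical. It is shown for p = 0.95 so that you can still work out the 0.99 case yourself.

```python
from statistics import NormalDist

def z_alpha_half(p):
    """Z value such that the area from -Z to Z under the normal curve equals p."""
    alpha = 1 - p        # total area in the two tails
    v = p + alpha / 2    # area to the left of Z, i.e. the tabular value
    return NormalDist().inv_cdf(v)

print(round(z_alpha_half(0.95), 2))  # 1.96
```

The computed value has more decimal places than the Z table, so round it to two decimals when comparing it with the table.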

At 0.99 probability,
P = 0.99
α = 0.01
α/2 = 0.005
v = p + α/2 = 0.99 + 0.005 = 0.995
Zα/2 = _____41_____
σ = 10 minutes; e = 1.5 minutes

n = [(Zα/2)(σ) / e]²

n = _____42______

The formula gives n = 295.84, so the manufacturer needs a sample of at least 295.84 observations in
order to be 99% sure that his error is less than 1 minute and 30 seconds. Since the sample size must be
an integer, he should take a sample of 296, the next higher integer. He cannot be 99% sure that his error
will be less than 1 minute and 30 seconds if the sample size is less than 295.84. This formula is just a
guide in determining the sample size needed to estimate the mean of the population, and there should
always be a factor of safety. In actual situations, I recommend taking a sample at least five percent
larger than the result of this formula. If the result of this formula is less than 30, take a sample of at
least 32 in order to satisfy the large-sample condition required for this formula.
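
The sample size formula and the rounding rule can be sketched in Python as follows. The function name and the `safety` argument (for the five percent allowance recommended above) are illustrative. Z = 2.58 is assumed here as the two-decimal table value because it reproduces the 295.84 quoted above.

```python
from math import ceil

def sample_size(z, sigma, e, safety=0.0):
    """Return (n from the formula, sample size to take).

    n = (z * sigma / e) ** 2, inflated by the optional safety fraction
    and rounded up to the next integer.
    """
    n_formula = (z * sigma / e) ** 2
    return n_formula, ceil(n_formula * (1 + safety))

n_formula, n = sample_size(z=2.58, sigma=10, e=1.5)
print(round(n_formula, 2), n)                                 # 295.84 296
print(sample_size(z=2.58, sigma=10, e=1.5, safety=0.05)[1])   # 311
```

The sketch always rounds up, never to the nearest integer, because rounding down would leave the sample slightly too small for the stated probability.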

You will notice in my examples above that the standard deviation of the population is known or given.
You do not have this information in actual situations. Always remember that the standard deviation of
the sample (s) can differ significantly from the population standard deviation (σ) if the sample size is 30
or less, so you cannot use the sample standard deviation in a formula where the population standard
deviation is required. If the sample size is more than 30, however, there is no significant difference
between the two, so you can use the sample standard deviation where the population standard
deviation is required.

Example:

A student who conducted a pre-test recorded the time required for the respondents to finish answering
his questionnaire. He administered the questionnaire to 50 people and obtained a mean time of 15
minutes with a standard deviation of 2 minutes. What can he say, at 0.99 probability, about the possible
size of his error in estimating the true average time it takes to answer the questionnaire as 15 minutes?

Solution:

At 0.99 probability, α = 0.01; α/2 = 0.005; v = p + α/2; v = 0.99 + 0.005 = 0.995

When v = 0.995, Zα/2 = ____43____

Since n > 30, s is approximately equal to σ

Substituting these values into the formula:

Zα/2 = e / (σ/√n) ;  e = (Zα/2)(σ) / √n

e = ____44______

The student can say that the average time required for the respondents to answer his questionnaire is
15 minutes, and he is 99% sure that his error is less than 43.78 seconds.
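
The arithmetic for the error can be checked with the short sketch below. The function name is illustrative, and Z = 2.58 is assumed as the table value for v = 0.995 since it reproduces the 43.78 seconds quoted above.

```python
from math import sqrt

def max_error(z, s, n):
    """Error of estimate e = z * s / sqrt(n), in the same units as s."""
    return z * s / sqrt(n)

e_minutes = max_error(z=2.58, s=2, n=50)
print(round(e_minutes, 4), round(e_minutes * 60, 2))  # 0.7297 43.78
```

The result comes out in minutes because s is in minutes, so it is multiplied by 60 to express it in seconds.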

The result is best explained if the student expresses it as an interval estimate of the mean, which we
will discuss in the next module.

Exercises

1. The mean of a random sample of size n = 80 is used to estimate the mean of a population with a
standard deviation σ = 7. With what probability can we assert that the error will be less than 2 if
we use
a) Chebyshev’s Theorem
b) Central Limit Theorem
2. A social scientist wants to determine the average time required to do a certain task. He
conducted a survey asking 150 individuals about the time for them to finish the job. The result
shows that the time required to finish the job has a mean of 3 hours with a standard deviation
of 15 minutes. What is the possible size of his error in saying that he is 98% sure that the
average time required to do the job is 3 hours?
3. A researcher wants to be 96% sure that his error will be less than 2 minutes in estimating the
true average time required for his respondents to answer his questionnaire. What size of sample
does he need if the standard deviation of the time required to answer the questionnaire can be
assumed to be equal to 5 minutes?
