
A Transcript of the Lectures in the Sampling Methods and Sampling Distribution Chapter

Lecture Objective:
In the past lectures, we discussed populations of discrete and continuous random variables and
the parameters that describe them. In this lecture, our goal is to learn about samples and the statistics that
describe them.

References Used:

Albert, J., Albacea, J., Ayaay, M., David, I., and de Mesa, I. (2016). Teaching Guide for Senior High School –
Statistics and Probability. Commission on Higher Education K to 12 Transition Program Management Unit.

Illowsky, B. and Dean, S. (2018). Introduction to Statistics. OpenStax.

Melosantos, L., Antonio, J., Robles, S., Bruce, R., and Sacluti, J. (2016). Math Connections in the Digital Age:
Statistics and Probability. Quezon City: Sibs Publishing House, Inc.

Mendenhall, W., Beaver, R., and Beaver, B. (2013). Introduction to Probability and Statistics.
Pacific Grove, Calif.: Brooks/Cole; Andover: Cengage Learning [distributor].

Lecture 3.1
Random Sampling

Introduction

Previously, we learned about the normal distribution. Recall that the shape of a normal distribution is
determined by the mean and the standard deviation of the random variable. The mean and the standard
deviation of the normal random variable are called its parameters, and we need them to calculate
probabilities associated with the random variable. However, in real-world settings, the parameters of a
normal random variable are often unknown.

For example, since grades are generally treated as normal random variables, we have reason to
believe that the scores of all Grade 11 UST-SHS students in the Statistics and Probability 1st Quarterly
examinations are normally distributed. However, the parameters (the mean 𝜇 and standard deviation 𝜎) may
not always be available (or may be difficult to acquire).

In such a case, we may have to rely on a sample to learn about the population through the statistics that
describe the sample. The mean and standard deviation of a sample of grades of Grade 11 UST-SHS students
approximate the actual values of 𝜇 and 𝜎. Now, if we want to provide reliable and valid information about
the population, we must select the sample in a reasonable and justified way, that is, a statistically
based randomized way.

(Read pages 243 to 246 of the Introduction to Probability and Statistics textbook (viewing link:
https://drive.google.com/file/d/152oxLsvFxxDIX2ly1Tmy8BS7d5bGquWP/view?usp=sharing) to learn
the proper way of selecting random samples.)

Lesson Proper

When we select a random sample from a given population, its numerical descriptive measures (such as the
mean, standard deviation, and variance) are called statistics. Note that the statistics of a sample may
differ each time a sample is selected, because the selection is random. A statistic is therefore itself a
random variable, and its probability distribution is called a sampling distribution.

Def. Sampling Distribution


The sampling distribution of a statistic is the probability distribution of the possible values of the
statistic that result when random samples of size 𝑛 are repeatedly drawn from the population.

The sampling distribution tells us the possible values of the statistic and how often each of those
values occurs.
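
As a minimal sketch of this idea, the Python snippet below lists every possible sample of size 2 from a small made-up population and tabulates how often each sample mean occurs. The population values and the choice of sampling without replacement are assumptions for illustration only; they do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical tiny population (not from the lecture), chosen so that every
# possible sample can be listed by hand.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# All possible samples of size n, drawn without replacement.
samples = list(combinations(population, n))

# Tally the sample mean of each possible sample.
counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution of the sample mean: each possible value of the
# statistic paired with its probability.
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"sample mean = {float(mean):4.1f}   probability = {prob}")
```

Each printed row is one possible value of the statistic together with its probability, which is what the definition above calls a sampling distribution. For realistic population sizes, listing every sample is not practical, which is why theorems are used instead.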

Remark. In this lecture, the statistic of interest is the sample mean, so the terms “sampling
distribution”, “sampling distribution of a statistic”, and “sampling distribution of the sample mean”
are used interchangeably.

Remark. There are generally three ways to find the sampling distribution of a statistic. The most
economical way to determine it is to use proven statistical theorems to derive the exact or approximate
sampling distributions.

Def. Central Limit Theorem


Suppose $X$ is a random variable with a known or unknown distribution, with mean $\mu$ and standard
deviation $\sigma$. The central limit theorem states that if we draw random samples of size $n$ from $X$,
then when $n$ is large, the sampling distribution of the sample mean $\bar{X}$ (whose values are the
sample means $\bar{x}$) tends to be normal, with a mean equal to $\mu$ and a variance equal to
$\sigma^2/n$. Since the standard deviation is the square root of the variance, its standard deviation
(also referred to as the standard error, often denoted SE) is $\sigma/\sqrt{n}$.

In other words, given a random variable $X$ with parameters $\mu$ and $\sigma$, when $n$ is large,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately.}$$

Remark. The approximation given by the central limit theorem becomes more accurate as 𝑛 becomes large.
However, how large is “large”? Unfortunately, there is no rigorous answer to this question, but as a rule
of thumb, when the sample size is at least thirty (30), the sampling distribution of the sample mean is
approximately normal.
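
The claim can be checked informally by simulation. The sketch below repeatedly draws samples of size 30 from a right-skewed population and compares the mean and standard deviation of the resulting sample means with the values the theorem predicts. The exponential population, the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly right-skewed, that is, clearly not normal.
mu, sigma = 1.0, 1.0
n = 30                # sample size
num_samples = 10_000  # number of repeated samples

# Draw many samples of size n and keep the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = sigma / n ** 0.5  # sigma / sqrt(n), as stated by the CLT

print(f"mean of sample means: {mean_of_means:.3f} (CLT predicts {mu:.3f})")
print(f"sd of sample means  : {sd_of_means:.3f} (CLT predicts {standard_error:.3f})")
```

A histogram of `sample_means` would also look roughly bell-shaped even though the population itself is skewed.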

On another note, since the standard deviation of the sampling distribution of the sample mean is
$\sigma/\sqrt{n}$ rather than $\sigma$, the standardization formula changes accordingly. To standardize a
value $\bar{x}$, we use the formula:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
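
A small sketch of this standardization in Python follows. The function name `z_for_sample_mean` and the numbers passed to it are hypothetical and serve only to show how the sample size enters the formula.

```python
from math import sqrt


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean x_bar using the standard error sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return (x_bar - mu) / standard_error


# Hypothetical values: population mean 50, population sd 10, observed mean 52.
for n in (1, 25, 100):
    z = z_for_sample_mean(52, 50, 10, n)
    print(f"n = {n:3d}  ->  z = {z:.2f}")
```

With n = 1 the formula reduces to the usual z-score for a single observation. As n grows, the same gap between the sample mean and the population mean corresponds to a larger z, because the standard error shrinks.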

Example 3.1.1. The duration of Alzheimer’s disease from the onset of symptoms until death ranges from 3
to 20 years; the average is 8 years with a standard deviation of 4 years. The administrator of a large medical
center randomly selects the medical records of 30 deceased Alzheimer’s patients from the medical center’s
database and records the average duration. Find the approximate probabilities for these events: a) the
average duration is less than 7 years, and b) the average duration exceeds 7 years.
Solution to Example 3.1.1

a) It can be deduced from the given information that the population distribution is skewed a bit to the
right. Regardless of the population distribution, however, the Central Limit Theorem (CLT) tells us that
the sampling distribution of the sample mean has a mean equal to $\mu = 8$ and a standard deviation of
$4/\sqrt{30} \approx 0.7303$. To find the probability that the average is less than 7 years, we compute
$P(\bar{X} < 7)$:

$$P(\bar{X} < 7) = P\left(Z < \frac{7 - 8}{4/\sqrt{30}}\right) = P(Z < -1.37) = 0.0853 \text{ or } 8.53\%$$

Recall that the value 0.0853 can be determined using a 𝑧-table; or the NORM.S.DIST function of MS
Excel (syntax: =norm.s.dist(-1.37,true)); or the NORM.DIST function (syntax:
=norm.dist(7,8,0.7303,true)).

b) 𝑃(𝑋̅ > 7) = 1 − 𝑃(𝑋̅ < 7) = 1 − 𝑃(𝑍 < −1.37) = 0.9147 or 91.47%
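
For readers without MS Excel, the same two probabilities can be reproduced with Python's standard library. This is only a sketch of an alternative tool, not part of the original solution; the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30           # values given in Example 3.1.1
standard_error = sigma / sqrt(n)  # about 0.7303

# Approximate sampling distribution of the sample mean under the CLT.
sampling_dist = NormalDist(mu, standard_error)

p_less_than_7 = sampling_dist.cdf(7)  # part a)
p_more_than_7 = 1 - p_less_than_7     # part b)

print(f"standard error       = {standard_error:.4f}")
print(f"P(sample mean < 7)   = {p_less_than_7:.4f}")
print(f"P(sample mean > 7)   = {p_more_than_7:.4f}")
```

Because the z-value is not rounded to −1.37 here, the printed results can differ from the table-based answers in the fourth decimal place.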

Remark. If the sampled population is normal, then the sampling distribution of the sample mean will also
be normal, no matter what sample size we choose.

Example 3.1.2. An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size
𝑛 = 49 are drawn randomly from the population. Find the probability that the sample mean is between
85 and 92.

Solution to Example 3.1.2.


In this problem, we want to find $P(85 < \bar{X} < 92)$. Using the CLT, we have

$$P(85 < \bar{X} < 92) = P(\bar{X} < 92) - P(\bar{X} < 85) = P\left(Z < \frac{92 - 90}{15/\sqrt{49}}\right) - P\left(Z < \frac{85 - 90}{15/\sqrt{49}}\right)$$

After simplifying the arguments of the probability function, we have

$$P\left(Z < \frac{14}{15}\right) - P\left(Z < -\frac{7}{3}\right) = 0.8149 \text{ or } 81.49\%$$

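Illustration: [Figure: the sampling distribution of $\bar{X}$, with $P(\bar{X})$ on the vertical axis and $\bar{X}$ on the horizontal axis, marked at 85, 90, and 92.]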
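
The arithmetic in Example 3.1.2 can be checked with a short Python sketch that mirrors the steps of the solution: standardize both endpoints, then subtract the two standard normal probabilities. The variable names are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 49         # values given in Example 3.1.2
standard_error = sigma / sqrt(n)  # 15/7

# Standardize both endpoints of the interval.
z_upper = (92 - mu) / standard_error  # 14/15
z_lower = (85 - mu) / standard_error  # -7/3

standard_normal = NormalDist()  # mean 0, standard deviation 1
p_between = standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)

print(f"z_lower = {z_lower:.4f}, z_upper = {z_upper:.4f}")
print(f"P(85 < sample mean < 92) = {p_between:.4f}")
```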

Remark. It is imperative to understand when the central limit theorem is used. If we are asked to find
a probability concerning a sample mean, then we should use the CLT for the mean. On the other hand, if we
are asked to find a probability concerning an individual value of a random variable, we use the techniques
for computing probabilities associated with the random variable’s own distribution.
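
To make the distinction concrete, the sketch below computes both kinds of probability for the same cutoff. The normal population with mean 100 and standard deviation 15, the sample size of 9, and the cutoff of 105 are hypothetical values, not taken from the lecture. A normal population is assumed so that the sampling distribution of the mean is normal even for this small sample size.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical normal population (not from the lecture).
mu, sigma = 100, 15
n = 9  # sample size

population = NormalDist(mu, sigma)               # distribution of one value X
sampling_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean

p_single = population.cdf(105)   # P(X < 105): uses sigma
p_mean = sampling_dist.cdf(105)  # P(sample mean < 105): uses sigma / sqrt(n)

print(f"P(X < 105)           = {p_single:.4f}")
print(f"P(sample mean < 105) = {p_mean:.4f}")
```

The two answers differ only because the second one uses the standard error in place of the population standard deviation.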

Remark. The important contribution of the central limit theorem will be highlighted when we embark on
making statistical inferences. We will see that the values used as “estimators” of the population’s
parameters are based on the averages of the sample measurements.

Example 3.1.3. In a recent study reported by a certain organization, it was found that the mean
age of tablet users is 34 years with a standard deviation of 15 years.

a) Suppose we take a sample of size 𝑛 = 100. What can we say about the mean and the standard error
of the sampling distribution of the sample mean age of tablet users?
b) What does the distribution look like?
c) Using the parameters reported in this study, find the probability that the sample mean is more than
30 years.
d) What is the age that only 5% of the sample means exceed?

Solution to Example 3.1.3.


a) Since the sample size is large enough, the CLT tells us that the mean of the sampling
distribution of the sample mean equals the mean of the population, which is 34. By the same
theorem, the standard error of the sampling distribution is $15/\sqrt{100} = 1.5$.
b) According to the CLT, the distribution should be approximately bell-shaped.
c) $P(\bar{X} > 30) = 1 - P(\bar{X} < 30) = 1 - P\left(Z < \frac{30 - 34}{15/\sqrt{100}}\right) = 1 - P\left(Z < -\frac{8}{3}\right) = 0.9962$ or 99.62%
d) The age that only 5% of the sample means exceed is the 95th percentile of the sampling distribution.
In other words, we want to find the value $k$ such that 5% of the sample means are greater than it
(or, equivalently, 95% are less than it). Thus, we need to solve

$P(\bar{X} < k) = 0.95$ or $P(\bar{X} > k) = 0.05$

Since it is shorter (and easier) to compute $k$ from $P(\bar{X} < k) = 0.95$, we solve this form
using MS Excel:

$P(\bar{X} < k) = 0.95$
$k = 36.47$ years old
(The value of $k$ was obtained using the following syntax:
=norm.inv(0.95,34,15/10))
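
Parts c) and d) can also be reproduced outside MS Excel. The sketch below uses Python's standard library, where `inv_cdf` plays the role of Excel's `norm.inv`; the variable names are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 100        # values given in Example 3.1.3
standard_error = sigma / sqrt(n)  # 1.5

# Approximate sampling distribution of the sample mean under the CLT.
sampling_dist = NormalDist(mu, standard_error)

p_more_than_30 = 1 - sampling_dist.cdf(30)  # part c)
k = sampling_dist.inv_cdf(0.95)             # part d): 95th percentile

print(f"P(sample mean > 30) = {p_more_than_30:.4f}")
print(f"95th percentile k   = {k:.2f} years")
```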

Supplementary Exercises

1) Random samples of size 𝑛 were selected from populations with the means and variances given here.
Find the mean and standard deviation of the sampling distribution of the sample mean in each case:
a) 𝑛 = 36, 𝜇 = 10, 𝜎² = 9
b) 𝑛 = 100, 𝜇 = 5, 𝜎² = 4

2) Suppose that SHS faculty members (with a master teacher rank) in the Philippines earn an average
of 720,000 php per year with a standard deviation of 10,000 php. In an attempt to verify these figures, a
random sample of 60 master teachers was selected from a database of all master teachers in the
country.

a) Describe the sampling distribution of the sample mean in terms of its governing characteristics.
b) Within what limits would you expect the sample average to lie, with a probability of 90 percent?
c) Calculate the probability that the sample mean is greater than 721,200 php per year.
d) If your random sample actually produced a mean of 723,500 php, would you consider this
unusual? What conclusions might you draw?

- End of the Lecture Transcript -
