IPPTCh 008

Learning Objectives
• LO8-1 Explain why populations are sampled and describe four
methods to sample a population.
• LO8-2 Define sampling error.
• LO8-3 Demonstrate the construction of a sampling distribution
of the sample mean.
• LO8-4 Recite the central limit theorem and define the mean and
standard error of the sampling distribution of the sample mean.
• LO8-5 Apply the central limit theorem to calculate probabilities.
LO8-1 Explain why populations are sampled and describe
four methods to sample a population.

Why Sample a Population?


§ Selecting a sample is less time-consuming than selecting every
item in the population.
§ Selecting a sample is less costly than selecting every item in the
population.
§ Analyzing a sample is less cumbersome and more practical than
analyzing the entire population.
§ Some tests destroy the item being tested, so testing every item in the
population is not feasible.

Commonly Used Sampling Methods


• Simple Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Cluster Sampling

Simple Random Sampling


SIMPLE RANDOM SAMPLING A sample selected so that each item
or person in the population has the same chance of being included.

EXAMPLE:
A population consists of 845 employees of Nitra Industries. A sample of 52
employees will be selected from that population. The name of each employee is
written on a small slip of paper and all slips are deposited in a box. After they
have been thoroughly mixed, the first selection is made by drawing a slip out of
the box without looking at it. This process is repeated until the sample of 52
employees is chosen.
Note: This process is sampling without replacement, so the probability of each
selection changes: 1/845, 1/844, 1/843, etc. When the population is large, the
difference in the probabilities is very small. In this case, the probability for each
of the 52 selections is about 0.001.
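
The slip-drawing procedure above can be sketched in Python. This is a minimal illustration, not the textbook's method: the employee IDs 1 to 845 and the seed value are assumptions made only so the example runs and is repeatable.

```python
import random

# Hypothetical employee IDs 1..845 standing in for the Nitra Industries roster.
population = list(range(1, 846))

rng = random.Random(8)  # arbitrary fixed seed so the draw is repeatable
sample = rng.sample(population, k=52)  # sampling without replacement

# Chance that a given remaining slip is picked on the 1st, 2nd and 52nd draw.
draw_probs = [1 / (len(population) - i) for i in (0, 1, 51)]

print(sorted(sample)[:5])
print([round(p, 5) for p in draw_probs])
```

Because `random.Random.sample` draws without replacement, no employee can appear twice, and the per-draw probabilities stay close to 0.001 throughout.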

Simple Random Sampling: Using Table of Random Numbers


A population consists of 845 employees of Nitra Industries. A sample
of 52 employees will be selected from that population.
A more convenient method of selecting a random sample is to use the
identification number of each employee and a table of random
numbers such as the one in Appendix B.4.

Simple Random Sampling: Using Excel


Jane and Joe Miley operate the Foxtrot Inn, a bed and breakfast in
Tryon, North Carolina. There are eight rooms available for rent at
this B&B. For each day of June 2017, the number of rooms rented is
listed. Use Excel to select a sample of five nights during the month
of June.
• Open the data file
• Go to the Data tab
• Click on Data Analysis
• Select Sampling and click OK
• Enter the values as shown in the figure

Disadvantages of Simple Random Sampling

• It is an expensive method of sampling because it requires a complete
list of all potential respondents to be available beforehand.

• A large population requires a large sampling frame, which makes
simple random sampling difficult to manage.

Systematic Random Sampling


SYSTEMATIC RANDOM SAMPLING A random starting point is selected
and then every kth member of the population is selected for the sample.
$k = \frac{N}{n}$, where N is the population size and n is the sample size.
EXAMPLE:
A population consists of 845 employees of Nitra Industries. A sample of 52
employees will be selected from that population.
First, k is calculated as the population size divided by the sample size. For
Nitra Industries, 845/52 = 16.25. If k is not a whole number, then round down,
so k = 16. Random sampling is used to select the first name, typically from
among the first 16 names on the employee list. Then, select every 16th name on
the list thereafter.
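
As a rough sketch of the procedure just described, the positions on the employee list could be generated as follows. The choice of a starting point between 1 and k and the seed value are assumptions for illustration.

```python
import random

N, n = 845, 52
k = N // n  # 845 / 52 = 16.25, rounded down to 16

rng = random.Random(8)       # arbitrary fixed seed for repeatability
start = rng.randint(1, k)    # random starting position among the first k names
positions = [start + i * k for i in range(n)]  # every k-th name thereafter

print(k, positions[:5], positions[-1])
```

With k rounded down, the last selected position never exceeds the list length of 845.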
Limitation of Systematic Sampling
Before using systematic random sampling, we should carefully observe
the physical order of the population. When the physical order is related
to the population characteristic, then systematic random sampling should
not be used because the sample could be biased.
For example, if we wanted to audit the invoices in a file drawer that were
ordered in increasing dollar amounts, systematic random sampling would
not guarantee an unbiased random sample.
To overcome the inefficiency of simple random sampling and the
potential selection bias involved with systematic sampling, you can use
either stratified sampling methods or cluster sampling methods.

Stratified Random Sampling

STRATIFIED RANDOM SAMPLING A population is first divided into
subgroups, called strata, and a sample is selected from each stratum.
This is useful when a population can be clearly divided into groups
based on some characteristic.
Suppose we want to study the advertising expenditures for the 352
largest companies in the United States to determine whether firms
with high returns on equity (a measure of profitability) spend more of
each sales dollar on advertising than firms with a low return or deficit.
We decide to sample a total of 50 companies.
To make sure that the sample is a fair representation of the 352
companies, the companies are grouped on percent return on equity
and the number to sample in each group is proportional to the relative
size of the group. Then, the number of companies is randomly
selected from each group.

* Number sampled for Stratum 1 is (0.02)(50) = 1;
for Stratum 2 it is (0.10)(50) = 5, etc.
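
Proportional allocation can be sketched in a few lines of Python. The stratum sizes below are illustrative assumptions, chosen only so that they total 352 and give the first two strata shares of about 0.02 and 0.10; they are not taken from the slide's table.

```python
# Hypothetical stratum sizes (assumed for illustration; they sum to 352).
strata_sizes = {
    "Stratum 1": 8,
    "Stratum 2": 35,
    "Stratum 3": 189,
    "Stratum 4": 115,
    "Stratum 5": 5,
}

total = sum(strata_sizes.values())
sample_size = 50

# Each stratum's share of the sample mirrors its share of the population.
allocation = {
    name: round(sample_size * size / total)
    for name, size in strata_sizes.items()
}

print(allocation, sum(allocation.values()))
```

Rounding each stratum separately does not always add up to the intended sample size; when it does not, one stratum's count has to be adjusted by hand.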

Cluster Sampling

CLUSTER SAMPLING A population is divided into clusters using
naturally occurring geographic or other boundaries. Then, clusters
are randomly selected, and a sample is collected by randomly
selecting from each selected cluster.
Suppose you want to determine the views of residents in the
greater Chicago, Illinois, metropolitan area about state and
federal environmental protection policies.

You can employ cluster sampling by subdividing the region into
small units, perhaps by counties. These are often called
primary units. Of the 12 counties, you randomly select 3: La
Porte, Cook, and Kenosha. Next you select a random sample of
residents in each of these counties.
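
The two stages of cluster sampling might look like this in code. The county labels, the 200 residents per county, and the 10 residents drawn from each chosen county are all placeholder assumptions.

```python
import random

rng = random.Random(8)  # arbitrary fixed seed for repeatability

# Hypothetical primary units: 12 counties, each with 200 placeholder resident IDs.
counties = {
    f"County {i}": [f"C{i}-R{j}" for j in range(1, 201)]
    for i in range(1, 13)
}

# Stage 1: randomly select 3 of the 12 clusters.
chosen = rng.sample(sorted(counties), k=3)

# Stage 2: randomly select residents within each chosen cluster only.
sample = {county: rng.sample(counties[county], k=10) for county in chosen}

print(chosen)
```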
LO8-2 Define sampling error.

Sampling Error
By definition, sampling is used to calculate sample statistics, which
are estimates of population parameters. Because a sample is only part
of the population, there will almost always be a difference (usually an
unknown difference) between the sample statistic and the population
parameter. This difference is called sampling error.
Examples: $\bar{x} - \mu$, $s - \sigma$, $s^2 - \sigma^2$, $p - \pi$
Refer to the example/solution on page 253, where we studied the
number of rooms rented at the Foxtrot Inn bed and breakfast in
Tryon, North Carolina. The population is the number of rooms rented
each of the 30 days in June 2017. Find the mean of the population.
Select three random samples of 5 days. Calculate the mean rooms
rented for each sample and compare it to the population mean. What
is the sampling error in each case?
The population mean is

Open the data file


Some Important Facts
• How many samples are possible?
Answer: $_{30}C_5$ = 142,506 samples
• Each sample may have a different sample mean and a different
sampling error.
• Adding the sampling errors of all 142,506 samples gives 0.
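
Both facts can be checked numerically. The sketch below counts the samples with `math.comb`, then verifies the zero-sum property by brute force on a small made-up population, since enumerating all 142,506 samples needs the Foxtrot Inn data file. The six population values are an assumption for illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Number of distinct samples of 5 nights out of 30.
n_samples = comb(30, 5)

# Hypothetical small population (not the Foxtrot Inn data).
population = [0, 2, 3, 3, 4, 6]
mu = Fraction(sum(population), len(population))

# Sampling error (sample mean minus population mean) for every sample of size 3.
errors = [Fraction(sum(s), len(s)) - mu for s in combinations(population, 3)]
total_error = sum(errors)

print(n_samples, len(errors), total_error)
```

Exact fractions are used so the total is exactly zero rather than a tiny floating-point remainder.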

A Question!

In the previous example we obtained three random samples
and got three different estimates of the population mean.
How can we determine how accurate an estimate is?

Sampling Distribution of the Sample Mean (SDSM)

SAMPLING DISTRIBUTION OF THE SAMPLE MEAN A probability
distribution of all possible sample means of a given sample size.

Sampling Distribution of the Sample Mean – Example


Tartus Industries has seven production employees (considered the population).
The hourly earnings of each employee are given in the table below.

1. What is the population mean?


2. What is the sampling distribution of the sample mean for samples of size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the sampling
distribution?
1. The population mean is μ = $15.43.

2. There are 21 possible samples of size 2, found by $_7C_2 = 21$.
Below is the list of all possible samples with their means.
The probability distribution of all 21 sample means is given below.

3. The mean of all sample means is usually written as $\mu_{\bar{x}}$:

$\mu_{\bar{x}} = \frac{14 \times 3 + 15 \times 9 + 16 \times 6 + 17 \times 3}{21} = \$15.43$
4. These observations can be made:
§ The mean of the distribution of the sample mean
($15.43) is equal to the mean of the population.
§ The spread in the SDSM is less than the spread in the
population values. The sample means range from $14
to $17 while the population values vary from $14 up
to $18. If we continue to increase the sample size, the
spread of the SDSM becomes smaller.
§ The shape of the SDSM and the shape of the
frequency distribution of the population values are
different. The SDSM tends to be more bell-shaped and
to approximate the normal probability distribution.
A population consists of the following five values: 2, 2, 4, 4,
and 8.
a) List all samples of size 2, and compute the mean of each
sample.
b) Compute the mean of the distribution of sample means
and the population mean. Compare the two values.
c) Compare the dispersion in the population with that of the
sample means.
Solution
a.
Sample   Values   Mean
1        2, 2     2
2        2, 4     3
3        2, 4     3
4        2, 8     5
5        2, 4     3
6        2, 4     3
7        2, 8     5
8        4, 4     4
9        4, 8     6
10       4, 8     6

b. $\mu = (2 + 2 + 4 + 4 + 8)/5 = 4$
$\mu_{\bar{x}} = (2 + 3 + 3 + 5 + 3 + 3 + 5 + 4 + 6 + 6)/10 = 4$
They are equal.

c. The dispersion for the population is greater than that for the
sample means. The population varies from 2 to 8, whereas
the sample means only vary from 2 to 6.
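
The whole exercise can be reproduced by enumerating the samples with `itertools.combinations`. This is a small sketch using only the five population values given in the exercise.

```python
from itertools import combinations
from statistics import mean

population = [2, 2, 4, 4, 8]

# a) All samples of size 2 (order ignored, no replacement) and their means.
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

# b) Mean of the sample means versus the population mean.
mean_of_means = mean(sample_means)
population_mean = mean(population)

# c) Dispersion, compared here through the range of each set of values.
population_range = max(population) - min(population)
sample_mean_range = max(sample_means) - min(sample_means)

print(len(samples), mean_of_means, population_mean)
print(population_range, sample_mean_range)
```

The two duplicated values in the population are treated as distinct items, which is why the pair (2, 4) appears four times among the 10 samples.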
LO8-4 Recite the central limit theorem and define the mean and
standard error of the sampling distribution of the sample mean.

Central Limit Theorem

CENTRAL LIMIT THEOREM If all samples of a particular size
are selected from any population, the sampling distribution
of the sample mean is approximately a normal distribution.
This approximation improves with larger sample sizes.
Regardless of the shape of the population distribution, the SDSM
approaches the normal distribution as the sample size increases.
Discussion
• If the population follows a normal probability distribution, then
for samples of any size the SDSM will also be normal.
• If the population distribution is symmetrical (but not normal),
the normal shape of the SDSM emerges with samples as small as
10.
• If the population distribution is skewed or thick-tailed, it may
require samples of size 30 or more to observe the normality
feature.
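
A short simulation can make these points concrete. The sketch below assumes a made-up, positively skewed population of 10,000 exponentially distributed values and draws 2,000 samples at each sample size. The population, the seed, and the repeat counts are illustrative choices, and `skewness` is a helper defined here rather than a library function.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

rng = random.Random(8)  # arbitrary fixed seed for repeatability

# Hypothetical positively skewed population.
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = mean(population)

def sample_means(n, repeats=2_000):
    """Means of `repeats` random samples of size n, drawn without replacement."""
    return [mean(rng.sample(population, n)) for _ in range(repeats)]

means_5 = sample_means(5)
means_30 = sample_means(30)

for label, values in (("population", population),
                      ("n = 5", means_5),
                      ("n = 30", means_30)):
    print(f"{label:>10}: mean={mean(values):.2f} "
          f"sd={pstdev(values):.2f} skew={skewness(values):.2f}")
```

As the sample size grows, the printed skewness and standard deviation of the sample means should both shrink, while their mean stays close to the population mean.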

Central Limit Theorem – Example


Ed Spence began his sprocket business 20 years ago. The business has grown over the
years and now employs 40 people. Spence Sprockets Inc. faces some major decisions
regarding health care for these employees. Before making a final decision on what health
care plan to purchase, Ed decides to form a committee of 5 representative employees.
The committee will be asked to study the health care issue carefully and make a
recommendation as to what plan best fits the employees’ needs. Ed feels the views of
newer employees toward health care may differ from those of more experienced
employees. If Ed randomly selects this committee, what can he expect in terms of the
mean years with Spence Sprockets for those on the committee? How does the shape of
the distribution of years of service of all employees (the population) compare with the
shape of the SDSM? The years of service of the 40 employees currently on the Spence
Sprockets, Inc., payroll are as follows.
The distribution for the population
of 40 employees is positively
skewed.
The population mean is
$\mu = \frac{11 + 4 + 18 + \cdots + 2 + 3}{40} = 4.80$

Population of 40 Employees
25 Samples of Five Employees

The distribution of these 25 sample means does not reflect the same
degree of skewness as shown in the previous graph (population).
The mean of means is
$\mu_{\bar{x}} = \frac{8.6 + 3.8 + 7.6 + \cdots + 2.6 + 1.8}{25} = 4.35$
25 Samples of 20 Employees

Now increase the sample size from 5 to 20. Comparing this graph with the
previous two graphs, we observe:
• The distribution approaches the normal probability distribution.
• There is less dispersion.
The mean of means is
$\mu_{\bar{x}} = \frac{3.95 + 3.25 + \cdots + 4.3 + 5.05}{25} = 4.676$

Standard Error
§ The mean of the SDSM will be exactly equal to the population mean if we
are able to select all possible samples of the same size from a given
population: $\mu_{\bar{x}} = \mu$
§ The standard deviation of the SDSM, or standard error of the mean:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
§ There will be less dispersion in the SDSM than in the population. As the
sample size increases, the standard error of the mean decreases.

Standard Error
The size of the standard error is affected by two values:

§ Population standard deviation σ
A larger σ means a larger $\sigma/\sqrt{n}$. If the population is homogeneous,
resulting in a small σ, the standard error will also be small.

§ Sample size n
A large n will result in a small standard error of the mean, indicating that
there is less variability in the sample means.
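
The effect of the sample size can be seen by evaluating σ/√n for a few values of n. The function name `standard_error` and the numbers used (σ = 0.4 with n = 4, 16, 64) are illustrative assumptions.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

sigma = 0.4  # illustrative population standard deviation
errors = {n: standard_error(sigma, n) for n in (4, 16, 64)}

for n, se in errors.items():
    print(f"n = {n:>2}: standard error = {se:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.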
LO8-5 Apply the central limit theorem
to calculate probabilities.

Using the SDSM


The SDSM will be normally distributed, at least approximately, under 2 conditions:
1. If a population follows the normal distribution, the SDSM will also follow
the normal distribution. The size of the sample doesn’t matter.
2. If the shape is known to be non-normal but the sample contains at least
30 observations, the central limit theorem indicates the SDSM is
approximately normal.
When the population standard deviation is known, a z-statistic for the SDSM
is calculated as:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Using the SDSM– Example


The Quality Assurance Department for Cola, Inc. maintains records
regarding the amount of cola in its jumbo bottle. The actual amount of cola
in each bottle varies a small amount from one bottle to the next. Cola, Inc.
does not wish to underfill or overfill the bottles. Records indicate that the
amount of cola follows the normal probability distribution. The mean
amount per bottle is 31.2 ounces and the population standard deviation is
0.4 ounces.
At 8 a.m. today the quality technician randomly selected 16 bottles from the
filling line. The mean amount of cola in the bottles is found to be 31.38 ounces.
Is this an unlikely result? Is it likely the process is putting too much soda in
the bottles? Or is the sampling error of 0.18 ounces unusual?
Step 1: Find the z-value corresponding to the sample mean of 31.38.

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{31.38 - 31.20}{0.4/\sqrt{16}} = 1.80$

Step 2: Find the probability of observing a z equal to or greater than 1.80.

P(0 < z < 1.80) = 0.4641
P(z > 1.80) = 0.5000 – 0.4641 = 0.0359

What do we conclude?
It is unlikely, only a 3.59% chance, that we could
select a sample of 16 observations from a normal
population with μ = 31.2 ounces and σ = 0.4
ounces and find the sample mean equal to or
greater than 31.38 ounces.
Put differently, there is a 46.41% chance that the sample
mean falls between the population mean of 31.2 ounces
and 31.38 ounces.
We conclude the process is putting too much cola
in the bottles.
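
The two steps can be checked with Python's standard library instead of a z-table. This is a sketch using `statistics.NormalDist`; the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 31.2, 0.4   # population mean and standard deviation (ounces)
n, x_bar = 16, 31.38    # sample size and observed sample mean

se = sigma / sqrt(n)                  # standard error of the mean
z = (x_bar - mu) / se                 # Step 1: z-value of the sample mean
p_upper = 1 - NormalDist().cdf(z)     # Step 2: P(Z >= z)

print(f"z = {z:.2f}, P(z > {z:.2f}) = {p_upper:.4f}")
```

The exact tail area agrees with the table value of 0.0359 to four decimal places.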

Self-Review 8-5
Refer to the Cola, Inc. information. Suppose the quality technician selected
a sample of 16 jumbo bottles that averaged 31.08 ounces. What can you
conclude about the filling process?

$z = \frac{31.08 - 31.20}{0.4/\sqrt{16}} = -1.20$
P(−1.20 < z < 0) = 0.3849
P(z < −1.20) = 0.5000 − 0.3849 = 0.1151
There is about an 11.51% chance of drawing a sample of 16 bottles with a mean
of 31.08 ounces or less, so this result is not unusual and gives no strong
evidence that the process is underfilling. Also
P(z > −1.20) = 0.5000 + 0.3849 = 0.8849
There is an 88.49% chance that a sample of 16 bottles has a mean of at
least 31.08 ounces.

Exercise (Question 36)


A recent study by the Greater Los Angeles Taxi Drivers Association
showed that the mean fare charged for service from Hermosa Beach to
Los Angeles International Airport is $21 and the standard deviation is
$3.50. We select a sample of 15 fares.
a) What is the likelihood that the sample mean is between $20 & $23?
b) What must you assume to make the above calculation?
a) $z = \frac{20 - 21}{3.5/\sqrt{15}} = -1.11$ and $z = \frac{23 - 21}{3.5/\sqrt{15}} = 2.21$
P(20 < $\bar{x}$ < 23) = P(–1.11 < z < 2.21)
= 0.3665 + 0.4864 = 0.8529

b) Because the sample size is less than 30, the central limit theorem
cannot be relied on, so we must assume the population of fares is
normally distributed for the SDSM to be normal.
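
Part (a) can be verified without a table by treating the sample mean as a normal variable with mean 21 and standard deviation 3.5/√15. The sketch relies on the normality assumption from part (b); variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 21.0, 3.5, 15

# Sampling distribution of the sample mean, assuming a normal population.
sdsm = NormalDist(mu=mu, sigma=sigma / sqrt(n))

z_low = (20 - mu) / sdsm.stdev
z_high = (23 - mu) / sdsm.stdev
p_between = sdsm.cdf(23) - sdsm.cdf(20)

print(f"z: {z_low:.2f} to {z_high:.2f}, P(20 < mean < 23) = {p_between:.3f}")
```

The result differs slightly from 0.8529 in the third decimal place because the table lookup rounds z to two decimals first.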

Exercise (Question 38)


The mean amount purchased by a typical customer at Churchill’s
Grocery Store is $23.50, with a standard deviation of $5.00. Assume
the distribution of amounts purchased follows the normal
distribution. For a sample of 50 customers, answer the following
questions.
a) What is the likelihood the sample mean is at least $25.00?
b) What is the likelihood the sample mean is greater than $22.50
but less than $25.00?
c) Within what limits will 90% of the sample means occur?
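Standard error: $\sigma_{\bar{x}} = \frac{5}{\sqrt{50}} = 0.7071$

a) $z = \frac{25 - 23.5}{5/\sqrt{50}} = 2.12$
P($\bar{x}$ > 25) = P(z > 2.12)
= 0.5000 – 0.4830 = 0.0170

b) $z = \frac{22.5 - 23.5}{5/\sqrt{50}} = -1.41$
P(22.5 < $\bar{x}$ < 25) = P(–1.41 < z < 2.12)
= 0.4207 + 0.4830 = 0.9037

c) In the table, a probability of 0.45 corresponds to z = 1.64 and z = –1.64,
so 45% of the sample means lie on each side of μ.
Lower limit: $\bar{x} = z\sigma_{\bar{x}} + \mu$ = (–1.64)(0.7071) + 23.5 = 22.34
Upper limit: $\bar{x} = z\sigma_{\bar{x}} + \mu$ = (1.64)(0.7071) + 23.5 = 24.66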
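
All three parts can be cross-checked with `statistics.NormalDist`, whose `inv_cdf` method gives the limits in part (c) directly. This is a sketch with arbitrary variable names; exact z-values are used, so the answers differ from the table-based ones in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.5, 5.0, 50
sdsm = NormalDist(mu=mu, sigma=sigma / sqrt(n))  # distribution of the sample mean

p_a = 1 - sdsm.cdf(25)                # a) P(mean >= 25)
p_b = sdsm.cdf(25) - sdsm.cdf(22.5)   # b) P(22.5 < mean < 25)

# c) Central 90% of sample means: 5% in each tail.
lower, upper = sdsm.inv_cdf(0.05), sdsm.inv_cdf(0.95)

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {lower:.2f} to {upper:.2f}")
```

The limits are symmetric about 23.5, roughly 22.34 and 24.66.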

Exercise (Question 45)


Nike's annual report says that the average
American buys 6.5 pairs of sports shoes per year.
Suppose a sample of 81 customers is surveyed
and the population standard deviation of sports
shoes purchased per year is 2.1.

a) What is the standard error of the mean in this experiment?


b) What is the probability that the sample mean is between 6 and 7 pairs of
sports shoes?
c) What is the probability that the difference between the sample mean and
the population mean is less than 0.25 pair?
d) What is the likelihood the sample mean is greater than 7 pairs?
a) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.1}{\sqrt{81}} = 0.2333$

b) $z = \frac{7.0 - 6.5}{2.1/\sqrt{81}} = 2.14$ and $z = \frac{6.0 - 6.5}{2.1/\sqrt{81}} = -2.14$
P(6 < $\bar{x}$ < 7) = P(–2.14 < z < 2.14)
= 0.4838 + 0.4838 = 0.9676

c) $z = \frac{6.25 - 6.5}{2.1/\sqrt{81}} = -1.07$ and $z = \frac{6.75 - 6.5}{2.1/\sqrt{81}} = 1.07$
P(6.25 < $\bar{x}$ < 6.75) = P(–1.07 < z < 1.07)
= 0.3577 + 0.3577 = 0.7154

d) P($\bar{x}$ > 7) = P(z > 2.14)
= 0.5000 – 0.4838 = 0.0162
Questions
2, 9, 10, 12, 17, 18, 22, 31,
32, 37, 38, 39, 42, 43, 44
