Sampling Lesson 10: Sampling Without Replacement: The Binomial and Hypergeometric Probability Models
Mix the marbles well. Then draw n < N marbles successively, without putting back those previously drawn.
In this case, the outcomes of the draws are no longer independent. Hence the variable X = number of
white marbles in the sample will not behave according to the binomial pmf². From basic counting
techniques (through a branch of mathematics called combinatorics), the number of ways n marbles can be
drawn from N marbles is ${}_{N}C_{n} = \frac{N!}{(N-n)!\,n!}$; the number of ways x white marbles can be
drawn from the M white marbles is ${}_{M}C_{x}$, and the number of ways n − x non-white marbles can be
drawn from the N − M non-white marbles is ${}_{N-M}C_{n-x}$. Therefore, the probability of
having x white marbles in the sample is
BRIEF EXPLANATION
¹ Strictly speaking, random does not mean equal probability. However, very early on in the history of
statistics, simple random was used to describe sampling with equal probability, and the usage persisted.
² X will follow the binomial pmf if the marbles are drawn with replacement: draw, replace, mix the marbles,
and repeat n times. Then, the outcomes of the draws are independent and you are drawing from the same
population (of marbles) each time.
$$P(X = x) = \frac{\left({}_{M}C_{x}\right)\left({}_{N-M}C_{n-x}\right)}{{}_{N}C_{n}} \quad \text{for } x = 0, 1, 2, \ldots, n$$
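As a rough illustration of the formula above, the following sketch evaluates the hypergeometric pmf with Python's `math.comb` and prints it next to the binomial pmf for the same proportion of white marbles. The urn sizes (N = 10, M = 4, n = 3) and the function names `hypergeom_pmf` and `binom_pmf` are assumptions chosen for the example; they do not come from the lesson.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x white marbles among n drawn without replacement
    from N marbles, of which M are white."""
    # Outcomes that are impossible get probability 0.
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) when the n draws are made WITH replacement."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical urn (not from the lesson): 10 marbles, 4 white, draw 3.
N, M, n = 10, 4, 3
for x in range(n + 1):
    h = hypergeom_pmf(x, N, M, n)
    b = binom_pmf(x, n, M / N)
    print(f"x={x}  without replacement: {h:.4f}  with replacement: {b:.4f}")
```

With so small an urn the two columns differ noticeably; making N much larger while keeping M/N fixed should bring them closer together.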
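$$SE = \frac{\sigma}{\sqrt{n}} \sqrt{\left[1 - \left(\frac{n}{N}\right)\right]\left[\frac{N}{N-1}\right]} = \frac{\sigma}{\sqrt{n}} \sqrt{\left(\frac{N-n}{N-1}\right)}$$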
The term $\sqrt{\left(\frac{N-n}{N-1}\right)}$ is called the finite population correction (fpc) and the ratio
$\left(\frac{n}{N}\right)$ is the sampling rate, where n and N are the sample size and population size,
respectively. When the sampling rate is small enough, the two SEs for the mean (where sampling is
conducted with and without replacement) can be assumed to be virtually the same.
But how small is “small enough”? It depends on the situation, although $\left(\frac{n}{N}\right) < .05$ is a workable
rule of thumb in many real situations. In the majority of actual sampling applications, N is very large, so
the fpc is replaced by 1.
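To get a feel for how quickly the fpc approaches 1 as the sampling rate shrinks, here is a minimal sketch. The values σ = 12 and n = 40, the list of population sizes, and the helper name `se_mean` are all hypothetical and are used only for illustration.

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """SE of the sample mean.
    N=None   -> sampling with replacement: sigma / sqrt(n)
    N given  -> sampling without replacement: multiply by the fpc."""
    se = sigma / sqrt(n)
    if N is None:
        return se
    return se * sqrt((N - n) / (N - 1))

# Hypothetical values (not from the lesson).
sigma, n = 12.0, 40
print(f"with replacement: SE = {se_mean(sigma, n):.4f}")
for N in (100, 1_000, 10_000, 1_000_000):
    fpc = sqrt((N - n) / (N - 1))
    print(f"N={N:>9,}  n/N={n / N:.5f}  fpc={fpc:.4f}  "
          f"SE={se_mean(sigma, n, N):.4f}")
```

Under these assumptions, the row with N = 100 has a sampling rate of 0.40 and a visibly smaller SE, while the rows with sampling rates below .05 are nearly indistinguishable from the with-replacement value.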
[Table: Sample, First Student, Second Student, Average Height, Average Weight, Average BMI]
Figure 1 illustrates the sampling distributions for the average height, average weight, and average
BMI of sample size n=2.
Figure 2: Sampling distributions of the sample means of size n=2 female students
(selected at random without replacement) for their (a) heights, (b) weights, and (c) BMI levels
Computations can also be readily made for the EVs and the SEs of the sampling distributions for the
average height, average weight, and average BMI when a sample size n=2 is taken (where sampling
is done without replacement). They yield:
Recall the EVs and the SEs of the sampling distributions for the average height, average
weight, and average BMI of sample size n=2 (when sampling is conducted with replacement).
They were:
Figure 4. Sampling Distribution of the Sample Mean Weight (taken from a random sample
without replacement) of size (i) n=3; (ii) n=5; (iii) n=9; (iv) n=14
Figure 5. Sampling Distribution of the Sample Mean BMI (taken from a random sample
without replacement) of size (i) n=3; (ii) n=5; (iii) n=9; (iv) n=14
What justifies the choice of sampling “without replacement” over “with replacement”? As was pointed
out, more information is gained when sampling is done without replacement.
EXAMPLE 2:
A janitor has 20 keys, and one of them is the key to a locked office door. Should he sample the
keys with or without replacement?
If he randomly tries the keys one by one, but does not eliminate the ones he tries, then he is
sampling with replacement. In this case, the number of tries follows a geometric distribution
with success probability 1/20, so the long-run average number of tries to unlock the door is 20.
If he tries the keys one by one, eliminating the ones that do not work, then he is sampling
without replacement. In this case, the correct key is equally likely to turn up on any of the 20
tries, so the long-run average number of tries to unlock the door is (20 + 1)/2 = 10.5.
In this case, sampling without replacement makes more sense than sampling with replacement.
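The long-run averages in Example 2 can be checked with a small computation and simulation. The sketch below assumes the correct key is equally likely to be in any position; the function names, the seed, and the number of repetitions are arbitrary choices for illustration. The exact average without replacement works out to (20 + 1)/2 = 10.5 tries.

```python
import random
from fractions import Fraction

K = 20  # number of keys; exactly one opens the door

# Exact long-run averages.
exact_with = Fraction(K, 1)  # geometric distribution with p = 1/K has mean K
exact_without = sum(Fraction(k, K) for k in range(1, K + 1))  # (K + 1) / 2

def tries_with_replacement(rng, k=K):
    """Pick a key at random each time; failed keys go back on the ring."""
    tries = 1
    while rng.randrange(k) != 0:  # key 0 plays the role of the correct key
        tries += 1
    return tries

def tries_without_replacement(rng, k=K):
    """Try the keys in a random order, setting aside each one that fails."""
    order = list(range(k))
    rng.shuffle(order)
    return order.index(0) + 1

rng = random.Random(12345)  # fixed seed so the run is repeatable
reps = 50_000
avg_with = sum(tries_with_replacement(rng) for _ in range(reps)) / reps
avg_without = sum(tries_without_replacement(rng) for _ in range(reps)) / reps

print(f"with replacement:    exact {float(exact_with):.1f}, simulated {avg_with:.2f}")
print(f"without replacement: exact {float(exact_without):.1f}, simulated {avg_without:.2f}")
```

The simulated averages should land close to the exact values, with eliminating failed keys roughly halving the expected effort.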
KEY POINTS:
Sampling with replacement results in independent events that are unaffected by previous
outcomes, but in practice sampling without replacement is more common because it yields
more information. Additional information is gained whenever a new unit is drawn, but
no new information is gained from a unit that had already been drawn previously (which
can happen when sampling is done with replacement).
When selecting a relatively small sample from a large population, the sampled subjects are
approximately independent whether we sample with replacement or without replacement.
The standard error (SE) of the sampling distribution of the mean is
$$SE = \frac{\sigma}{\sqrt{n}}$$
when sampling with replacement. The SE for the mean when sampling without replacement is
smaller, and is given by
$$SE = \frac{\sigma}{\sqrt{n}} \sqrt{\left[1 - \left(\frac{n}{N}\right)\right]\left[\frac{N}{N-1}\right]} = \frac{\sigma}{\sqrt{n}} \sqrt{\left(\frac{N-n}{N-1}\right)}$$
where σ is the population standard deviation, while n and N are the sample size and
population size, respectively.
o The term $\sqrt{\left(\frac{N-n}{N-1}\right)}$ is called the finite population correction (fpc) and the
ratio $\left(\frac{n}{N}\right)$ is the sampling rate.
o When the sampling rate is small enough, the two SEs (for with and without
replacement) can be assumed to be virtually the same. In the majority of actual sampling
applications, N is very large, so the fpc is replaced by 1.
For the special case when the sample mean is actually a proportion, the EV of the sampling
distribution of the sample proportion p is the population proportion P; the standard error (SE) is
$$SE = \sqrt{\frac{P(1-P)}{n}} \sqrt{\left(\frac{N-n}{N-1}\right)}$$
where n and N are the sample size and population size, respectively.
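The SE formula for a sample proportion can be sketched in a few lines. The helper name `se_proportion` and the numbers used (N = 5,000, P = 0.30, n = 200) are hypothetical and chosen only to show the calculation; they are not taken from the exercises.

```python
from math import sqrt

def se_proportion(P, n, N=None):
    """SE of the sample proportion.
    N=None   -> sampling with replacement: sqrt(P(1-P)/n)
    N given  -> sampling without replacement: multiply by the fpc."""
    se = sqrt(P * (1 - P) / n)
    if N is None:
        return se
    return se * sqrt((N - n) / (N - 1))

# Hypothetical population (not from the lesson).
N, P, n = 5_000, 0.30, 200
print(f"EV of sample proportion: {P:.2f}")
print(f"SE with replacement:     {se_proportion(P, n):.5f}")
print(f"SE without replacement:  {se_proportion(P, n, N):.5f}")
```

Here the sampling rate is 200/5,000 = .04, so the two SEs should differ only slightly, in line with the rule of thumb for small sampling rates.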
APPLICATION AND ASSESSMENT
Do the following.
1. A city has 300,000 registered voters, with 120,000 of them poor. A survey organization is
about to take a random sample of 1,000 registered voters. Describe the sampling distribution
of the fraction of poor among the 1,000 sampled voters.
2. Consider a school district that has 10,000 11th graders. In this district, the average weight of
an 11th grader is 45 kg, with a standard deviation of 10 kg. Suppose you draw a random
sample of 50 students. What is the probability that the average weight of a sampled student
will be less than 42.5 kg?
4. A simple random sample of 400 persons 15 years old and above is taken in Naga City. The
total years of schooling of all the sampled persons is 3230, so that the average educational
attainment is $\frac{3230}{400} \approx 8.1$ years. The standard deviation of the sample data is 4.1 years.
Describe the sampling distribution.