Sampling and Sampling Distribution: Probability and Non-probability Sampling
Population vs Sample
• Population refers to the entire group of objects or persons of interest.
The population of interest might be all the persons in the city
receiving welfare payments or all the computer chips produced during
the last hour.
• A sample is a portion, a part, or a subset of the population. Fifty
welfare recipients out of 4,000 receiving payments might constitute
the sample, or 20 computer chips might be sampled out of 1,500
produced last hour.
Reasons for Sampling
1. To contact the whole population would often be very time consuming. To ask every eligible
voter if they plan to vote for the current senator in the forthcoming election would take months.
The election would probably be over before the survey was completed.
2. The cost of studying all the items in the population is often prohibitive. Some television
program ratings are established by analyzing the viewing habits of about 1,200 viewers. The
cost of studying all the homes having television would be exorbitant.
3. The physical impossibility of checking all the items in the population. The South Dakota
Game Commission, for example, cannot check all the deer, grouse, and other wild game
because they are always moving.
4. The destructive nature of certain tests. The manufacturer of fuses cannot test all of them
because in the testing the fuse is destroyed and none would be available for sale.
5. The adequacy of sample results. If the sample results of the viewing habits of 1,200 homes
revealed that only 1.1 percent of the homes watched "60 Minutes", no doubt the program would
be replaced by another show. Checking the viewing habits of all the homes regarding "60
Minutes" probably would not change the percentage significantly.
Probability Sampling Methods

• Four types of probability sampling are commonly used: simple
random sampling, systematic random sampling, stratified random
sampling, and cluster sampling. The most widely used type is
simple random sampling.
Simple random sample
• A sample selected so that each item or person in the population has the same
chance of being included.
• Several ways of selecting a simple random sample are:
• The name or identifying number of each item in the population is recorded on
a slip of paper and placed in a box. The slips of paper are shuffled and the
required sample size is chosen from the box.
• Each item is numbered and a table of random numbers, such as the one in
Appendix B.6, is used to select the members of the sample.
• There are many software programs, such as MINITAB and Excel, which have
routines that will randomly select a given number of items from the
population.
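The software route can be sketched in a few lines of Python. This is a minimal illustration, not the routine MINITAB or Excel uses; the function name `simple_random_sample` and the seed are assumptions made here, and the population reuses the 1,500 computer chips mentioned earlier.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(population, n)

# Chips numbered 1..1500, as in the "produced last hour" example.
chips = list(range(1, 1501))
chip_sample = simple_random_sample(chips, 20, seed=42)
print(sorted(chip_sample))
```

Drawing without replacement mirrors pulling slips of paper from a box: no item can appear in the sample twice.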
Systematic random sample.

• A random starting point is selected, and then every kth member of the
population is selected.
• In a systematic random sample the items or individuals of the
population are arranged in some way — alphabetically, in a file drawer
by date received, or by some other method. A random starting point is
selected, and then every kth member of the population is selected for
the sample. In a systematic random sample, you might take all the
items in the population and number them 1, 2, 3,.... Next, a random
starting point is selected, let's say 39. Every kth item thereafter, such as
every 100th, is selected for the sample. This means that 39, 139, 239,
339, and so on would be a part of the sample.
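The selection rule can be sketched in Python as follows. The function name `systematic_sample` and the population size of 1,000 are assumptions for illustration; the starting point 39 and the interval k = 100 come from the example above.

```python
import random

def systematic_sample(population, k, start=None, seed=None):
    """Take every kth item, beginning at a 1-based starting position.

    If start is None, a random start in 1..k is drawn.
    """
    if start is None:
        start = random.Random(seed).randint(1, k)
    return population[start - 1::k]

# Hypothetical population of 1,000 items numbered 1, 2, 3, ...
items = list(range(1, 1001))
picked = systematic_sample(items, k=100, start=39)
print(picked)
```

With items numbered from 1, the slice `population[start - 1::k]` returns items 39, 139, 239, and so on.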
Stratified random sampling.

• A population is divided into subgroups, called strata, and a sample is
randomly selected from each stratum.
• For example, if our study involved Army personnel, we might decide
to stratify the population (all Army personnel) into generals, other
officers, and enlisted personnel. The number selected from each of
the three strata could be proportional to the total number in the
population for the corresponding strata. Each member of the
population can belong to only one of the strata. That is, a military
person cannot be a general and a private at the same time.
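Proportional allocation across strata might look like the sketch below. The stratum sizes (50 generals, 950 other officers, 9,000 enlisted) are hypothetical numbers chosen only so the arithmetic is easy to follow, and `proportional_stratified_sample` is a name invented here.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its share of the population.

    strata maps a stratum name to the list of its members.
    Rounding can make the total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical stratum sizes; each person belongs to exactly one stratum.
army = {
    "generals": [f"G{i}" for i in range(50)],
    "other officers": [f"O{i}" for i in range(950)],
    "enlisted": [f"E{i}" for i in range(9000)],
}
army_sample = proportional_stratified_sample(army, n=200, seed=1)
print({name: len(chosen) for name, chosen in army_sample.items()})
```

With these assumed sizes, a sample of 200 allocates 1 general, 19 other officers, and 180 enlisted personnel.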
Cluster sampling.

• A population is divided into clusters using naturally occurring
geographic or other boundaries. Clusters are then randomly selected
and a sample is collected by randomly selecting from each cluster.
• Cluster sampling is often used to reduce the cost of sampling when the
population is scattered over a large geographic area. Suppose the
objective is to study household waste collection in a large city.
• Step 1: Divide the city into smaller units (perhaps precincts).
• Step 2: The precincts are numbered and several selected randomly.
• Step 3: Households within each of these precincts are randomly
selected and interviewed.
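The three steps can be sketched as a two-stage draw in Python. The city layout (12 precincts of 100 households each) and the function name `two_stage_cluster_sample` are assumptions made for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick units inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # Step 2
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))  # Step 3
        for c in chosen
    }

# Step 1: a hypothetical city divided into 12 numbered precincts.
city = {p: [f"P{p}-H{h}" for h in range(1, 101)] for p in range(1, 13)}
households = two_stage_cluster_sample(city, n_clusters=3, n_per_cluster=10, seed=7)
for precinct, homes in households.items():
    print(precinct, homes[:3], "...")
```

Only the selected precincts need to be visited, which is where the cost saving comes from.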
Sampling “Error”

• It is not logical to expect that the results obtained from a sample will
coincide exactly with those from a population. For example, it is
unlikely that the mean welfare payment for a sample of 50 recipients
is exactly the same as the mean for all 4,000 welfare recipients. We
expect a difference between a sample statistic and its corresponding
population parameter. The difference is called sampling error.
• Sampling error: the difference between a sample statistic and its
corresponding population parameter.
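A small simulation makes the definition concrete. The payment amounts below are synthetic (uniform between 200 and 800, an arbitrary assumption), so only the mechanics matter, not the numbers.

```python
import random
from statistics import mean

rng = random.Random(0)

# Synthetic population: 4,000 hypothetical payment amounts (not real data).
population = [round(rng.uniform(200, 800), 2) for _ in range(4000)]
sample = rng.sample(population, 50)

population_mean = mean(population)  # parameter
sample_mean = mean(sample)          # statistic
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample and therefore a different sampling error.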
Sampling Distribution of the Sample Mean

• Suppose all possible samples of size n are selected from a specified population, and the
mean of each of these samples is computed. The distribution of these sample means is
called the sampling distribution of the sample mean.
• Definition: a probability distribution of all possible sample means of a given sample size.
• The sampling distribution of the mean is a probability distribution and has the following
major characteristics:
1. The mean of all the sample means will be exactly equal to the population mean.
2. If the population from which the samples are drawn is normal, the distribution of
sample means is also normally distributed.
3. If the population from which the samples are drawn is not normal, the sampling
distribution is approximately normal, provided the samples are “sufficiently” large
(usually accepted to include at least 30 observations).
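Characteristic 1 can be checked by brute force on a tiny population. The sketch below assumes a population of five numbers (0, 3, 6, 3, 18) and samples of size 3 drawn without replacement; exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import combinations

population = [0, 3, 6, 3, 18]  # small illustrative population
n = 3

# Every possible sample of size n (drawn without replacement) and its mean.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_of_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

print(len(sample_means), mean_of_means, population_mean)
```

The ten sample means average to 6, which is also the population mean.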
Sampling Distribution of the Proportion
• Consider a population of N = 5 numbers: 0, 3, 6, 3, 18. It consists of
three even numbers (0, 6, 18) and two odd numbers (3, 3). Thus the
population proportion of even numbers is:
$\pi = \dfrac{X}{N} = \dfrac{3}{5} = 0.6$
Now consider a sample of size n = 3, say, the first three population numbers: 0, 3, 6. Two of the
numbers in the sample are even, i.e. x = 2. Hence the sample proportion of even numbers is
$p = \dfrac{x}{n} = \dfrac{2}{3}$.
• Ten samples of size n = 3 can be drawn from a population of N = 5. All
possible samples, along with their sample proportions, are listed below
(the population values are labeled A to E in order).
Sample   Observations   x   p
ABC      0, 3, 6        2   2/3
ABD      0, 3, 3        1   1/3
ABE      0, 3, 18       2   2/3
ACD      0, 6, 3        2   2/3
ACE      0, 6, 18       3   1
ADE      0, 3, 18       2   2/3
BCD      3, 6, 3        1   1/3
BCE      3, 6, 18       2   2/3
BDE      3, 3, 18       1   1/3
CDE      6, 3, 18       2   2/3
• As done earlier for sample means, we can construct the sampling
distribution of the sample proportion:
p      f    Prob.
1/3    3    0.3
2/3    6    0.6
1      1    0.1

Now calculate the mean and standard deviation of this sampling distribution of the proportion:

$\mu_p = 0.6$
$\sigma_p = 0.2$
(Verify it!)

Thus we see that:

$\mu_p = \pi = 0.6$
$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n} \cdot \dfrac{N-n}{N-1}} = \sqrt{\dfrac{(0.6)(0.4)}{3} \cdot \dfrac{5-3}{5-1}} = \sqrt{0.04} = 0.2$
Random Number Tables
APPENDIX
• How to use a random number table:
• Note: This method is one from a variety of methods of reading numbers from random number tables.
• 1. Assume you have the test scores for a population of 200 students. Each student has been assigned a number from
1 to 200. We want to randomly sample only 5 of the students for this demo.
• 2. Since the population size is a three-digit number, we will use the first three digits of the numbers listed in the table.
• 3. Without looking, point to a starting spot in the table. Assume we land on 75636 (3rd column, 2nd entry).
• 4. This location gives the first three digits to be 756. This choice is too large (> 200), so we choose the next number in
that column. Keep in mind that we are looking for numbers whose first three digits are from 001 to 200 (representing
students).
• 5. The second choice gives the first three digits to be 407, also too large. Continue down the column until you find 5
of the numbers whose first three digits are less than or equal to 200.
• 6. From this table, we arrive at 070 (07015), 038 (03811), 045 (04594), 055 (05542), and 194 (19428).
• 7. RESULT: Students 38, 45, 55, 70, and 194 will be used for our random sample.
• Our sample of students has been randomly selected: each student had an equal chance of being selected,
and the selection of one student did not influence the selection of any other student.
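The reading procedure can be sketched in Python. The column below contains only the table entries quoted in the steps above (the rejected entries in between, such as the one beginning 407, are not reproduced). The function name `read_random_table` is hypothetical, and skipping repeated numbers is an added assumption that the steps do not state.

```python
def read_random_table(entries, population_size, sample_size):
    """Walk down a column of random-number entries and keep usable ones.

    Uses as many leading digits as the population size has, and keeps
    values from 1 to population_size. Repeats are skipped (an assumption).
    """
    digits = len(str(population_size))
    chosen = []
    for entry in entries:
        candidate = int(entry[:digits])
        if 1 <= candidate <= population_size and candidate not in chosen:
            chosen.append(candidate)
        if len(chosen) == sample_size:
            break
    return chosen

# Only the entries quoted in the walkthrough; 75636 is the starting spot.
column = ["75636", "07015", "03811", "04594", "05542", "19428"]
students = read_random_table(column, population_size=200, sample_size=5)
print(sorted(students))
```

The starting entry 75636 is rejected because 756 exceeds 200, and the remaining five entries yield students 38, 45, 55, 70, and 194.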
