By the end of the session, the learner should be able to:
1. Identify examples of the applications of sampling in health research;
2. Enumerate the advantages of sampling;
3. Define the meaning of basic sampling concepts;
4. Differentiate between probability and non-probability sampling designs; and
5. Describe the procedures in implementing the basic probability sampling designs.


Sampling refers to the process of getting information from a sub-set of a population and using
that information to make generalizations about the whole population. It is a process which we
frequently do in everyday life. When we buy fruits for example, we frequently decide to buy a
kilo of fruit only after asking permission from the vendor to taste a piece or two. For those who
want to go into show business, so much of their fate often hinges on how well they are able to
show their talent on that one single day when they go for auditions. When a patient goes to a
doctor for consultation, the diagnosis of his illness is sometimes based on the results of the
tests done on the 5 mL of blood extracted from him. These are a few examples of the
applications of sampling in different aspects of our lives.

This sub-module discusses the applications of sampling in health research. It explains the
meaning of terms frequently used in sampling. It describes the basic probability and non-
probability sampling designs. A large part of this sub-module is spent in describing the steps in
implementing the different probability sampling designs.


Sampling has a wide range of applications in health research. It is used to address different types
of health problems, using different units of observation and analysis.

Most health research involves the sampling of individuals, which is done when evaluating the
health status of a given population or when investigating the risk factors or determinants of
certain diseases or conditions. Even studies done to evaluate the effectiveness of health
interventions frequently require the sampling of individuals who are exposed and unexposed to
the intervention being studied.

There are also several health research problems which require the sampling not of individuals,
but of health facilities. A study which aims to assess the level of disaster preparedness of
hospitals in a given area, for example, would require the selection of sample hospitals if the
hospitals in the area are too numerous to be covered within the time frame and budget
of the survey. Many other research problems about the characteristics of health facilities are
also investigated by getting a sample of health facilities. Examples include assessing the
adequacy of their equipment, the accessibility of the health services offered, and the efficiency of
their information system, as well as determining important indicators like the bed occupancy rate,
average length of hospital stay, average number of patients seen per day by health centers, or
the percent of health centers with a medical doctor.

Communities can be sampled too, such as when the objective is to determine the level of
community participation in health activities, or the proportion of communities (cities,
municipalities, barangays) with a Disaster Preparedness Plan. In environmental health or in studies
related to climate change, most of the variables studied refer to community characteristics like
the level of air pollution, the temperature, amount of rainfall or the humidity.

Even non-human populations or entities can be sampled to address health problems. Assessing
the quality of mineral water sold in streets is done by getting a sample of bottles of mineral
water sold by street vendors. The determination of the mosquito population which is of interest
in malaria control programs requires the sampling of mosquitoes. Those interested in mercury
poisoning resulting from mining activities often get water samples from rivers and waterways
near the mining sites. One needs to get samples of fish sold in markets to determine if formalin
is one of the substances used by vendors as preservative, while a sample of mussels and other
types of shellfish is needed to determine the incidence of red tide. This sub-module, however,
will focus on the sampling of human populations.


Sampling is commonly used in health research because it has several advantages. Among
these are the following:

a. It is cheaper. Definitely, it costs much less to collect data from a small subset of the
population, compared to including everybody.

b. It is faster. The time taken to collect data from a small group is shorter than what it takes to
cover everybody.

c. Better quality of information can be collected. It is a lot easier to supervise data collectors
and implement data quality control mechanisms if only a smaller portion of the population is
covered.

d. More comprehensive data may be obtained. Studies which try to get very detailed
information, such as nutrition surveys using a 24-hour recall of food intake, would be
difficult to carry out and could not be finished in a timely manner if the whole population
had to be covered.

e. It is the only possible method when the procedure is destructive. An example of a
destructive method is getting a blood specimen from a person for purposes of laboratory
testing. Testing only a sample of the blood is the only possible way to collect data in this
case, since extracting all of the person's blood would kill him.


One of the most frequently used terms in sampling is population. Although in usual
conversations, the word population typically refers to people, in sampling terminology, it refers
to the totality of whatever is being studied. As such it does not always refer to people -- it can be
a population of houses, health centers, medical records, snails or artesian wells, among others.
The usual notation used for the size of the population is “N”. Since the population is frequently
too big to be studied entirely, a sub-set of the population is usually taken. This sub-set is
referred to as the sample. The usual notation used for the size of the sample is “n”.

When talking about populations, we would usually want to distinguish between the target
population and the sampling population. The target population is the group from which
representative information is desired and to which inferences will be made. Whatever conclusions
are derived from the study will be generalized to the target population. The sampling population,
on the other hand, is the population from which a sample will actually be taken. Ideally, the target
population should be the same as the sampling population. However, there are certain instances
when there is a gap between the two, resulting from limited resources and other field constraints.
When this occurs, what is important is for the investigator to determine the extent and direction of
the bias (if any) created by the gap between the target and the sampling populations.

To illustrate this point, let us assume that a researcher was commissioned by the Provincial
Health Office to conduct a survey to determine the prevalence of disability among children 6 to
12 years old in Province X. The survey has a limited budget and the results need to be submitted
within 4 months in order for the results to be included as part of the inputs of the strategic
planning process to be conducted in 4 months’ time. In view of these constraints, the researcher
decided to collect data for this research by conducting a school survey. Did the researcher’s
decision to conduct a school survey lead to a:

a. Gap between the target and the sampling population of the study?
b. Bias in the resulting estimate of the prevalence of disability among 6-12 year old children?

An analysis of the situation described above shows that there is a gap between the target and
the sampling populations because while the target population consists of all children 6-12 years
old in Province X, the sampling population covers only school children within the same age-group,
eliminating those who are out of school. Since the prevalence of disability among out-of-school
children is generally expected to be higher than among those who are in school, the gap between
the target and the sampling populations could lead to an underestimate of the prevalence of
disability computed from the results of the school survey.
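
The direction of this bias can be checked with simple arithmetic. The sketch below uses hypothetical counts, invented purely for illustration (they are not survey results), in which disability is more common among out-of-school children. The names `in_school`, `out_of_school` and `prevalence` are likewise assumptions made for this example.

```python
# Hypothetical counts of children 6-12 years old in Province X.
# These numbers are invented for illustration only.
in_school = {"children": 9000, "with_disability": 180}
out_of_school = {"children": 1000, "with_disability": 80}


def prevalence(*groups):
    """Proportion with disability among all children in the given groups."""
    cases = sum(group["with_disability"] for group in groups)
    total = sum(group["children"] for group in groups)
    return cases / total


# Target population: all children 6-12, in school or not.
target_prevalence = prevalence(in_school, out_of_school)
# Sampling population: school children only.
school_prevalence = prevalence(in_school)
# A negative bias means the school survey underestimates the true prevalence.
bias = school_prevalence - target_prevalence

print(f"{target_prevalence:.3f} {school_prevalence:.3f} {bias:+.3f}")
# prints: 0.026 0.020 -0.006
```

Under these assumed counts, even a perfectly conducted school survey would centre on 2.0% rather than 2.6%, because the out-of-school children never have a chance of being selected.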

Another pair of terms which needs to be differentiated is the elementary unit (or element)
and the sampling unit. An elementary unit or element is an object or a person on
which a measurement is actually taken, while the sampling unit is the unit which is chosen in
selecting the sample. If a list of elementary units is available for purposes of sample selection,
then the sampling unit will be the same as the elementary unit. However, if a list of elementary
units is not available, then the researcher will be forced to use other types of lists for purposes of
sample selection. For example, in a study to determine the prevalence of malnutrition among
preschoolers, a list of preschoolers in the community may not be available. However, there is a
list of households. The researcher can use this list, in order to get a sample of households. All
preschoolers found in the sampled households will be included in the study. In this example, the
sampling unit is the household, but the elementary unit is the preschooler. This type of sample
selection will be described later on as cluster sampling.

A very important requirement for purposes of sample selection is the sampling frame. This is a
listing or any other material like spot maps or aerial photographs which shows or accounts for
the target population. It is a collection of the sampling units. The type of sampling frame which is
available determines the type of sampling design which can be used in selecting the survey
sample.


There are two general types of sampling designs – the probability and non-probability
sampling designs.

In non-probability sampling designs, the probability that each member of the sampling
population will be selected in the sample is difficult to determine or cannot be specified, hence
the reliability of the resulting estimates cannot be assessed. As such, the
external validity of the results becomes an issue. Examples of non-probability sampling designs
are purposive sampling, judgment sampling, convenience or accidental sampling, and the
snowball technique or referral sampling. These are the types of designs usually used in
qualitative studies.

In contrast, in probability sampling designs, the rules and procedures for selecting the sample
and estimating the parameters are explicitly and rigidly specified. As such, the reliability of the
resulting estimates can be determined. Most quantitative studies use probability sampling
designs in the selection of subjects. These are the types of sampling designs which will be
described in detail in this sub-module.


Before we describe the different probability sampling designs, let us first identify the
components which should be included when describing the sampling design of a given study.
Five (5) aspects of the sampling design should be described, namely:

a. Where? – This refers to the geographic area to be covered by the survey.

b. Who? – These are the elements (households, mothers, infants, etc.) to be studied in the
survey. In cases when the subjects of the survey are not in a position to provide the
information (e.g., young children, sick elderly), the actual survey respondents must be
identified.

c. How many? – This refers to the sample size or the number of elements to be included in the
survey and its basis. The values of important parameters considered in sample size
determination must be explicitly indicated (e.g., specific variable used as basis, anticipated
value of the variable, confidence level, margin of error, power of the test, etc.).

d. How to select? – This is a description of the procedures to be followed in selecting the
elements to be included. If stratification variables are used, these should be mentioned
with a concise justification why they were considered. If multi-stage sampling is used,
the sampling units at each level of selection and the corresponding sampling frames
used must be mentioned.

e. When? – This refers to the time period for the conduct of the survey. This is an
important consideration when the variable being studied is seasonal.


In the next sections, the description and method of sample selection using the five basic
probability sampling designs are presented. These include simple random sampling; stratified
random sampling; systematic sampling; cluster sampling; and multi-stage sampling designs.

5.7.1 Simple Random Sampling

The main characteristic of simple random sampling is that every element in the population has an
equal chance of being included in the sample. The procedures for sample selection are as follows:

a. Prepare the sampling frame

b. Number all the population elements in the sampling frame consecutively from 1 to N,
where N is the population size

c. Determine the required sample size, n.

d. Select n numbers at random between 1 and N, using either the lottery method, a table of
random numbers found at the back of statistics books, or random numbers generated by
computer software such as Excel. In Excel, the function used to generate random numbers
is =RANDBETWEEN(bottom, top), where bottom and top are the lowest and highest values in the
range from which the random numbers will be generated. Since the same number can come up
more than once, replace any duplicates with new random numbers.

e. The population elements in the list whose numbers correspond to the n numbers
randomly selected will comprise the simple random sample.
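
The steps above can be sketched in a few lines of Python, as an alternative to the lottery method or a spreadsheet. This is a minimal illustration and not a prescribed procedure: the function name `simple_random_sample`, the `seed` argument and the values N = 800 and n = 200 are assumptions chosen for the example. The standard-library function `random.sample` draws without replacement, so no element number is selected twice.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct element numbers drawn from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # Steps b and d: the frame is numbered 1..N, and n numbers are drawn
    # at random without replacement.
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)


# Illustrative values only: N = 800 elements in the frame, n = 200 to be selected.
selected = simple_random_sample(population_size=800, sample_size=200, seed=2024)
print(selected[:10])  # first ten selected element numbers
```

Step e then consists of looking up, in the numbered sampling frame, the elements whose numbers appear in `selected`.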

In the above steps, you may have noted that we are required to determine the sample size, n,
beforehand. However, since sample size computation will be taken up in the next sub-module, for our
purposes, we will just assume that the sample size has been computed for us and is a given.

5.7.2 Stratified Random Sampling

Stratified random sampling is used when the investigator wants to:
a. ensure that groups of interest or subsections of the population considered important
for the study are adequately represented
b. derive reasonably precise estimates for important subsections of the population

The procedure is similar to that of simple random sampling except that the population elements
must first be grouped according to categories of the stratification variable, prior to sample
selection. Specifically, the procedures are as follows:

a. Identify the stratification variable.

b. Classify the population elements according to the categories of the stratification variable.

c. Number the population elements consecutively within each category of the stratification
variable, from 1 to the number of elements in that category.

d. Determine the sample size needed from each stratum.

e. Within each stratum, select the required number of samples by simple random sampling.

To compare the method of sample selection between simple random sampling and stratified
random sampling, suppose we have the following situation:

N = 800 households, of which N_urban = 320 and N_rural = 480

n = 200 households, of which n_urban = 80 and n_rural = 120

The differences in the number and content of the sampling frames to be prepared, as well as in
the method of sample selection between the two types of sampling designs, are summarized in
Table 1.




Table 1. Sampling frames and method of sample selection under simple random and stratified random sampling

Sampling design             Sampling frame                          Method of sample selection

Simple random sampling      List of 800 households, numbered        Select 200 numbers at random
                            consecutively from 1 to 800             between 1 and 800

Stratified random sampling  Two sampling frames are needed:         Urban and rural samples are
                            a. For URBAN areas: list of 320         selected separately, as follows:
                               urban households, numbered           a. For urban areas, 80 numbers
                               consecutively from 1 to 320             are selected at random
                            b. For RURAL areas: list of 480            between 1 and 320
                               rural households, numbered           b. For rural areas, 120 numbers
                               consecutively from 1 to 480             are selected at random
                                                                       between 1 and 480
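
The stratified selection for this example (80 of the 320 urban households and 120 of the 480 rural households) can be sketched as follows. This is an illustrative sketch only: the function name `stratified_random_sample`, the dictionary layout and the `seed` argument are assumptions, not part of the module. Each stratum is numbered separately and sampled by simple random sampling.

```python
import random


def stratified_random_sample(stratum_sizes, stratum_sample_sizes, seed=None):
    """Draw a simple random sample of element numbers within each stratum."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    sample = {}
    for stratum, size in stratum_sizes.items():
        n_h = stratum_sample_sizes[stratum]
        # Each stratum has its own frame, numbered from 1 to its own size.
        sample[stratum] = sorted(rng.sample(range(1, size + 1), n_h))
    return sample


stratum_sizes = {"urban": 320, "rural": 480}          # N_urban, N_rural
stratum_sample_sizes = {"urban": 80, "rural": 120}    # n_urban, n_rural
stratified = stratified_random_sample(stratum_sizes, stratum_sample_sizes, seed=2024)
print(len(stratified["urban"]), len(stratified["rural"]))  # prints: 80 120
```

Unlike a single draw of 200 numbers between 1 and 800, this guarantees that exactly 80 urban and 120 rural households end up in the sample.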

In the above example, you might be wondering how it was determined to get 80 samples from
the urban area and 120 samples from the rural area. This brings us to the issue of sample size
allocation in stratified random sampling.

There are many ways of allocating samples to every stratum in stratified random sampling. The
most commonly used method is proportional allocation, which is done as follows:
Suppose we want to allocate 250 samples to 3 sample barangays included in the study.
These 3 barangays have the following populations:


Table 2. Population of the three sample barangays

Barangay    Population    Percent
A                3,000       15.0
B               10,500       52.5
C                6,500       32.5
TOTAL           20,000      100.0

Proportional allocation means that the sample will be distributed following the exact distribution of
the population. The resulting sample allocation following this principle is shown in Table 3.



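Table 3. Proportional allocation of 250 samples to the three sample barangays

Barangay    Population    Percent    Sample size    Percent
A                3,000       15.0             38       15.0
B               10,500       52.5            131       52.5
C                6,500       32.5             81       32.5
TOTAL           20,000      100.0            250      100.0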
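
Proportional allocation gives each stratum a sample size of n_h = n x N_h / N, where N_h is the population of the stratum. Since this rarely comes out as a whole number, some rounding rule is needed. The sketch below assumes a largest-remainder rule (round down, then give the leftover units to the strata with the largest fractional parts) so that the allocations always add up to n. The module does not state which rounding rule was used, so this rule and the function name `proportional_allocation` are assumptions.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate total_sample across strata in proportion to their population sizes."""
    total_population = sum(stratum_sizes.values())
    allocation = {}
    remainders = {}
    for stratum, size in stratum_sizes.items():
        # Whole part and remainder of n * N_h / N, kept in integer arithmetic.
        allocation[stratum], remainders[stratum] = divmod(
            total_sample * size, total_population
        )
    # Hand out the units lost to rounding down, largest remainder first.
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


barangay_populations = {"A": 3000, "B": 10500, "C": 6500}
allocated = proportional_allocation(barangay_populations, 250)
print(allocated)  # prints: {'A': 38, 'B': 131, 'C': 81}
```

The unrounded values are 37.5, 131.25 and 81.25, which the rule above turns into 38, 131 and 81.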

5.7.3 Systematic Sampling

Systematic sampling is similar to simple random sampling in the sense that, under
this sampling design, every element also has an equal chance of being selected. However, it
is often used under the following conditions:

a. the population elements are too many to list or to number consecutively

b. a frame is not available

This sampling design is often used in combination with other designs.

The process for sample selection using systematic sampling is as follows:

a. Determine the required sample size, n.

b. Determine the sampling interval, k, where: k = N/n

c. Select a number at random between 1 and k. The population element in the frame
corresponding to the random number selected will be the first to be included in the
sample.

d. Include in the sample survey every kth population element after the first random
number selected.

Using the same example presented earlier where N=800 and n=200, the procedure for sample
selection will be as follows:

a. Compute the sampling interval, k, where k = N/n. Therefore, k = 800/200 = 4. This
means that for every 4 households in the population, 1 household will be selected as
part of the sample.

b. Select a random number between 1 and 4. Suppose #2 was selected. Therefore, the
second household in the population to be studied is included in the sample.

c. Every 4th household thereafter will be included in the study. These include
households number 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, etc.
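
The same selection can be reproduced with a short sketch. It assumes, as in this example, that N is an exact multiple of n, so the sampling interval k is a whole number. The function name `systematic_sample` and its `start` and `seed` arguments are assumptions made for illustration. Passing `start=2` reproduces the random start used above.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every kth element number after a random start between 1 and k."""
    # Sampling interval k = N/n; this sketch assumes N is a multiple of n.
    k = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    # Take the start, then every kth number after it.
    return list(range(start, population_size + 1, k))[:sample_size]


sample_households = systematic_sample(800, 200, start=2)
print(sample_households[:10])  # prints: [2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
```

Only one random number is drawn; once the start is fixed, the rest of the sample is fully determined by the interval.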

5.7.4 Cluster Sampling

Cluster sampling was described earlier when we defined the elementary unit and the sampling
unit. It is used when a frame for the individual elementary units in the population is not available.
However, a frame for groups or clusters of elements is available. In cluster sampling, the
sampling unit is different from the elementary unit.

The steps in sample selection using cluster sampling are as follows:

a. Identify the groups or clusters of elementary units. It is best if the sizes of the clusters
are not too big and do not vary much from each other.
b. Select a random sample of clusters.
c. All elements in the selected clusters will be included in the survey.
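
A minimal sketch of these steps is shown below, using the earlier example in which households are the sampling units and preschoolers are the elementary units. The household labels, the children listed in them and the function name `cluster_sample` are hypothetical, invented only to make the example runnable.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select clusters and include every element found in them."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # Step b: simple random sample of clusters from the frame of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step c: all elements in each selected cluster are included.
    elements = [element for cluster in chosen for element in clusters[cluster]]
    return chosen, elements


# Hypothetical frame: households (clusters) and the preschoolers living in each.
household_frame = {
    "HH01": ["child_a", "child_b"],
    "HH02": ["child_c"],
    "HH03": [],
    "HH04": ["child_d", "child_e", "child_f"],
    "HH05": ["child_g"],
}
chosen_households, preschoolers = cluster_sample(household_frame, n_clusters=2, seed=7)
print(chosen_households, preschoolers)
```

Note that the number of elements finally included is not fixed in advance; it depends on how many elementary units the selected clusters happen to contain.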

5.7.5 Multi-stage Sampling

Multi-stage sampling is generally used when the survey has a wide coverage and a sampling
frame for the elementary units is difficult to obtain. Under this design, sampling is done in
successive stages and data collection is concentrated only on the samples selected at each
stage, resulting in lower cost per unit of inquiry. One of the disadvantages of multi-stage
sampling is that statistical analysis of the data is more complicated.

The procedures for sample selection when using multi-stage sampling are as follows:
a. Determine the number of stages of selection to be used in the sampling design and the
sampling units to be used at each stage.
b. Determine the sample size necessary for each stage of selection.
c. Prepare the sampling frame for the 1st stage of selection, and select at random a
sample of primary sampling units (PSUs).
d. For each of the PSUs earlier selected, prepare the sampling frame for the 2nd stage of
selection. Randomly select the corresponding number of secondary sampling units
(SSUs) from each PSU included in the sample.
e. Repeat the process of frame preparation and sample selection until the last stage of
sampling is reached.
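
The procedure can be illustrated with a two-stage sketch in which barangays are the primary sampling units and households are the secondary sampling units. The frame below is hypothetical (6 barangays with 20 households each), and the function name `two_stage_sample` and the sample sizes are assumptions chosen for the example. A design with more stages would repeat the inner step at each further level.

```python
import random


def two_stage_sample(frame, n_psu, n_ssu_per_psu, seed=None):
    """Select PSUs at random, then select SSUs at random within each selected PSU."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # Stage 1: random sample of primary sampling units.
    selected_psus = rng.sample(sorted(frame), n_psu)
    sample = {}
    for psu in selected_psus:
        # Stage 2: a frame of SSUs is needed only for the PSUs selected in stage 1.
        ssu_frame = frame[psu]
        take = min(n_ssu_per_psu, len(ssu_frame))
        sample[psu] = rng.sample(ssu_frame, take)
    return sample


# Hypothetical frame: 6 barangays (PSUs), each listing 20 households (SSUs).
barangay_frame = {
    f"Barangay {i}": [f"B{i}-HH{j:02d}" for j in range(1, 21)] for i in range(1, 7)
}
two_stage = two_stage_sample(barangay_frame, n_psu=3, n_ssu_per_psu=5, seed=11)
for barangay, households in two_stage.items():
    print(barangay, households)
```

The sketch also shows why the design saves effort: household lists are prepared for the 3 selected barangays only, not for all 6.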


1. Give an example of how the following persons can apply sampling in their work:
a. Hospital Director
b. Director of Food and Drug Administration (FDA)
c. Microbiologist

2. Suppose a researcher wishes to determine the prevalence of psychosocial
problems among survivors of mining accidents in Benguet in 2017. Because of
the difficulty of looking for and interviewing survivors in their homes, the
researcher decided to conduct a mining company survey, and interview
survivors who have gone back to work.
a. What is the target population for this study?
b. What is the sampling population based on the study design adopted by the
researcher?
c. Is there a gap between the target and the sampling populations?
d. If your response to c) is yes, did this gap lead to a bias in the resulting
estimate of the prevalence of psychosocial problems among survivors of
mining accidents? If yes, what is the direction of this bias?


Mendoza, O.M. (2000). Sampling Human Populations (Chapter 4 in BIO 201 Fundamentals of
Biostatistics). Manila. Distance Education Program, UPM College of Public Health.

Mendoza, O.M., et al. (1996). Foundations of Statistical Analysis for the Health Sciences. Manila.
Dept. of Epidemiology and Biostatistics, UP College of Public Health.


