05 Biostat

BIOSTATISTICS & EPIDEMIOLOGY BSMLS-2C
ARMAN ARQUILADA RMT 2025
TOPIC OUTLINE

I. TYPES OF SAMPLING DESIGN
II. TYPES OF PROBABILITY SAMPLING DESIGN
i. SIMPLE RANDOM SAMPLING
ii. STRATIFIED RANDOM SAMPLING
a. Comparison of the Method of Sample Selection Between Simple and Stratified Random Sampling
b. Allocation of Sample to the Different Strata
iii. SYSTEMATIC SAMPLING
iv. CLUSTER SAMPLING
III. SAMPLE SIZE DETERMINATION
i. Information Needed for Sample Size Determination When Estimating a Mean or a Proportion
ii. Sample Size Determination: Estimating a Proportion Using Simple Random Sampling
iii. Dealing with Common Issues in Sample Size Determination
iv. Additional Considerations in Sample Size Determination

I. TYPES OF SAMPLING DESIGN

1. Non-Probability Sampling Design

a. The probability of each member of the sampling population to be selected in the sample is difficult to determine or cannot be specified, hence the reliability of the resulting estimates cannot be assessed.

b. The external validity of the results becomes an issue

c. Examples of non-probability sampling:
o Purposive sampling
o Judgement sampling
o Convenience or accidental sampling
o Snowball technique or referral sampling

d. These are the types of designs usually used in qualitative studies.

2. Probability Sampling

a. The procedures for selecting the sample and estimating the parameters are explicitly and rigidly specified.

b. The reliability of the resulting estimates can be determined

c. Most quantitative studies use probability sampling designs in the selection of subjects

❖ Probability Sampling Design: randomized (Quantitative)
❖ Non-probability Sampling Design: not randomized (Qualitative)

II. TYPES OF PROBABILITY SAMPLING DESIGN

1) SIMPLE RANDOM SAMPLING
Characteristics:
➢ Every element in the population has an equal chance of being included in the sample

Procedures for sample selection:
1. Prepare the sampling frame
2. Number all the population elements in the sampling frame chronologically from 1 to N, where N is the population size
3. Determine the required sample size, n
4. Select n numbers at random between 1 and N, using either the lottery method or computer-generated random numbers from software like Excel
Lottery method: not practical if the population is large.
Excel: =RANDBETWEEN(1,N)
Example:
N = 1000
n = 300
=RANDBETWEEN(1,1000)
➢ Fill the formula down until you have 300 numbers
5. The population elements in the list whose numbers correspond to the n numbers randomly selected will comprise the simple random sample

2) STRATIFIED RANDOM SAMPLING
Characteristics:
This design is used when the investigator wants to:
a. Ensure that groups of interest or subsections of the population considered important for the study are adequately represented
b. Derive reasonably precise estimates for important subsections of the population

Procedures:
1. Identify the stratification variable (grouping variable)
2. Classify the population elements according to the categories of the stratification variable
3. Number the population elements chronologically from 1 to N, within each category of the stratification variable
4. Determine the sample size needed from each stratum
5. Within each stratum, select the required number of samples by simple random sampling

Example:
CAGAYAN STATE UNIVERSITY (GROUPING: Different campuses)
Andrews, Carig, Lallo, Aparri
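The selection steps for simple and stratified random sampling can be sketched in Python as a rough illustration. The function names are made up for this sketch, and the campus population and sample sizes are hypothetical because the notes do not give them. One difference from the Excel approach: `=RANDBETWEEN` can return the same number more than once, while `random.sample` draws without replacement, so every selected element number is distinct.

```python
import random


def simple_random_sample(population_size, sample_size, rng):
    """Return sample_size distinct element numbers drawn from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    # rng.sample draws without replacement, so no element number repeats
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def stratified_random_sample(stratum_sizes, stratum_sample_sizes, rng):
    """Run simple random sampling separately inside each stratum."""
    return {
        stratum: simple_random_sample(size, stratum_sample_sizes[stratum], rng)
        for stratum, size in stratum_sizes.items()
    }


rng = random.Random(2025)  # fixed seed only so the sketch is repeatable

# Simple random sampling with the figures from the notes: N = 1000, n = 300
srs = simple_random_sample(1000, 300, rng)

# Stratified sampling by campus; all sizes below are hypothetical
campus_sizes = {"Andrews": 400, "Carig": 300, "Lallo": 200, "Aparri": 100}
campus_samples = {"Andrews": 40, "Carig": 30, "Lallo": 20, "Aparri": 10}
strs = stratified_random_sample(campus_sizes, campus_samples, rng)

print(len(srs), {campus: len(chosen) for campus, chosen in strs.items()})
```

Each campus is numbered from 1 to its own size, mirroring step 3 of the stratified procedure, so the same element number can appear in two different strata without referring to the same person.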
BSMLS BATCH ’25 (Block 2C) Page 1 | 5

COMPARISON OF THE METHOD OF SAMPLE SELECTION BETWEEN SIMPLE AND STRATIFIED RANDOM SAMPLING

SRS:
Just get one sample (n = 200) from the frame of the whole population; no need for groupings

STRS:
Get a sample from each of the groups of the population (80 from the 320 urban households, 120 from the 480 rural households); there is a need for groupings/stratification variables

N = 800: Nurban = 320 and Nrural = 480
n = 200: nurban = 80 and nrural = 120

Example #3:
Suppose we have the following:
N = 800 households, of which Nurban = 320 and Nrural = 480
n = 200 households, of which nurban = 80 and nrural = 120

SAMPLING DESIGN: Simple random sampling
SAMPLING FRAME: List of 800 households, numbered chronologically from 1 to 800
METHOD OF SAMPLE SELECTION: Select 200 numbers at random, between 1 and 800

SAMPLING DESIGN: Stratified random sampling
SAMPLING FRAME: Two sampling frames are needed:
a. For URBAN areas: List of 320 urban households, chronologically numbered between 1 and 320
b. For RURAL areas: List of 480 rural households, chronologically numbered between 1 and 480
METHOD OF SAMPLE SELECTION: Urban and rural samples are selected separately, as follows:
a. For urban areas, 80 numbers are selected at random between 1 and 320
b. For rural areas, 120 numbers are selected at random between 1 and 480

ALLOCATION OF SAMPLES TO THE DIFFERENT STRATA

1. Proportional Allocation
➢ The larger the original population (N) of a stratum, the larger its sample size (n) should be
The 250 samples can be allocated to the 3 barangays (stratification variable) to reflect the population distribution as follows:

Example:

| Barangay | Population size (number) | Population size (%) | Sample size (number) | Sample size (%) |
|---|---|---|---|---|
| A | NA = 3,000 | 15.0 | nA = 38 | 15.0 |
| B | NB = 10,500 | 52.5 | nB = 131 | 52.5 |
| C | NC = 6,500 | 32.5 | nC = 81 | 32.5 |
| TOTAL | 20,000 | 100.0 | 250 | 100.0 |

Formula:

$$n_A = \left[\frac{N_A}{N}\right] \times n$$

$$n_A = \left[\frac{3000}{20{,}000}\right] \times 250 = 37.5 \approx 38$$

$$n_B = \left[\frac{10{,}500}{20{,}000}\right] \times 250 = 131.25 \approx 132$$

$$n_C = \left[\frac{6500}{20{,}000}\right] \times 250 = 81.25 \approx 82$$

Add all answers:
nA + nB + nC = 38 + 132 + 82 = 252 (slightly above the 250 in the table because each stratum is rounded up)

2. Non-proportional Allocation (Equal)
➢ Divide the sample size by the number of strata identified.
RULE: Always round up your final answers

Example:

$$n_A = \frac{250}{3} = 83.33 \approx 84$$

nA = 84
nB = 84
nC = 84
Add: 84 + 84 + 84 = 252
❖ The sum should always be equal to or higher than the total sample size (n)
❖ The higher the sample, the higher the reliability level

[Figure: proportional allocation vs non-proportional allocation]

3) SYSTEMATIC SAMPLING
Characteristics:
a. Every element has an equal chance of being selected
b. It is often used under the following conditions:
➢ The population elements are too many to list or to number chronologically
➢ A frame is not available
c. It is often used in combination with other designs

Procedures:
1. Determine the required sample size, n
2. Determine the sampling interval, k (skipping pattern), where:

$$k = \frac{N}{n}$$

3. Select a number at random between 1 and k. The population element in the frame corresponding to the random number selected will be the first to be included in the sample
4. Include in the sample every kth population element after the first random number selected

Example:
N = 1000; n = 250

$$k = \frac{N}{n} = \frac{1000}{250} = 4$$

You could pick the random start using the lottery method or =RANDBETWEEN

=RANDBETWEEN(1,4)

If we got:
= 2 (add 4)
= 6 (+4)
= 10 (+4)
= 14 … until you reach 250 samples

Example #2:
Using the same example presented earlier where N = 800 and n = 200

SAMPLING DESIGN: Systematic sampling
SAMPLING FRAME: Not needed
METHOD OF SAMPLE SELECTION:
1. Compute the sampling interval, k, where k = N/n. Therefore k = 800/200 = 4. This means that one out of every 4 households will be selected as a sample
2. Select a random number between 1 and 4. Suppose #2 was selected. Therefore the second household in the population to be studied is included as a sample
3. Every 4th household thereafter will be included in the study. These include household numbers 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, etc.

4) CLUSTER SAMPLING
Characteristics: (the same as stratified)
a. It is used when a frame for the individual elementary units in the population is not available. However, a frame for groups or clusters of elements is available
b. The sampling unit is different from the elementary unit

Procedures:
1. Identify groups or clusters of elementary units. It is best if the sizes of the clusters are not too big and do not vary much from each other.
2. Select a random sample of clusters
3. All elements in the selected clusters will be included in the survey

Example:
Elementary pupils
No list or sampling frame of pupils is available
We have a list of schools; you don't need to get samples in each group, but you need to do sampling to get a randomized cluster
[Figure: randomized cluster; all students in the selected cluster are respondents]

STRS: Get representatives per stratum (group)
Cluster Sampling: Select random clusters and include all respondents from those clusters

Advantage of Cluster Sampling: Saves cost and time

III. SAMPLE SIZE DETERMINATION

INFORMATION NEEDED FOR SAMPLE SIZE DETERMINATION WHEN ESTIMATING A MEAN OR A PROPORTION

a) The anticipated value of the parameter to be estimated in the study
➢ e.g., the prevalence of the disease
o Expressed as a proportion
➢ the average/mean grade of students given a particular intervention, etc.
➢ Possible sources of this value are:
i. Previous studies or past records
ii. Values derived from the pre-test or pilot phase of the project
iii. An expert's opinion or an educated guess
iv. Conducting the study in two parts

b) The degree of precision required for the resulting estimates (margin of error)
➢ this can be expressed either in absolute terms (e.g., ±5%) or in relative terms (e.g., ±5% of the value of the resulting estimate)
➢ The value of an "acceptable" margin of error will depend on:
i. The magnitude/level of the parameter being estimated
ii. How the results of the study will be used
iii. Available resources for the conduct of the study

Accuracy and Precision:
Accuracy: Nearness to the target value
Precision: The closeness of values to each other

Example:
[Figure: target diagrams for the estimate of the parameter (population mean) with d = 0.05 and d = 0.10]
Thus, the smaller the margin of error, the higher the reliability level

c) The desired confidence level
➢ standard levels used are 90%, 95%, and 99%, with 95% being the most commonly used

Formula:
Confidence level = 1 − α (alpha)
Significance level = α (alpha)

Example:
1 − α = 0.95
α = 0.05 (the chance of committing error)

Confidence level:
- Statistics deals with real-life phenomena, which are not perfect
Significance level:
- Chance of committing error

- Confidence level: the percentage of your confidence

d) The estimated degree of variability of the observations (variance or standard deviation)

Example:

$$n = \frac{z_{\alpha/2}^{2}\, P\, Q}{d^{2}}$$

Where:
z = a value derived from the normal distribution and dependent on the desired confidence level for the derivation of the estimate. The z-values corresponding to the standard confidence levels used when deriving estimates in research studies are as follows.

| Confidence level | z-value |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.58 |

P & Q: The variance of the proportion
P = anticipated value of the proportion to be estimated in the population
Q = 1 − P (the complement of P, where P + Q = 1)
d = the margin of error or maximum permissible error; a measure of the desired level of precision for the resulting estimates.

Sample problem:

Example for information needed for sample size determination when estimating a PROPORTION:

$$n = \frac{z_{\alpha/2}^{2}\, p\, q}{d^{2}}$$

Where:
q = 1 − p
The product p(q) does not need to be squared because it is already the variance of the proportion

Given:
α = 0.05, so z(α/2) = z(0.05/2) = z(0.025) = 1.96 (based on the z-values table; look up the z-value using the computed 0.025)
p = 0.25
q = (1 − p) = (1 − 0.25) = 0.75
d (margin of error) = 0.05

Substitute into the formula:

$$n = \frac{z_{0.05/2}^{2}\,(0.25)(1 - 0.25)}{0.05^{2}} = \frac{z_{0.025}^{2}\,(0.25)(0.75)}{0.05^{2}} = \frac{1.96^{2}\,(0.25)(0.75)}{0.05^{2}}$$

$$n = \frac{3.8416\,(0.1875)}{0.0025} = \frac{0.7203}{0.0025} = 288.12 \approx 289 \text{ (always round up)}$$

REMEMBER:
Set p = 0.5 and q = 0.5 if you don't know the proportion variance
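As a cross-check of the worked example, the short Python sketch below recomputes the sample size for a proportion. The function names are illustrative, not from the notes. It derives z from the confidence level with the standard library's `statistics.NormalDist` instead of reading a printed table, so z comes out as about 1.95996 rather than the rounded 1.96; the final rounded-up answer is the same.

```python
import math
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided z-value: alpha = 1 - confidence level, look up 1 - alpha/2."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_proportion(p, d, confidence_level=0.95):
    """n = z^2 * p * q / d^2 with q = 1 - p, left unrounded."""
    z = z_value(confidence_level)
    q = 1 - p
    return z**2 * p * q / d**2


# Values from the worked example: p = 0.25, d = 0.05, 95% confidence
n_exact = sample_size_proportion(0.25, 0.05)

# Same computation with the table value z = 1.96, as in the notes
n_table = 1.96**2 * 0.25 * 0.75 / 0.05**2

print(round(n_table, 2), math.ceil(n_exact))

# Unknown proportion: fall back to p = 0.5, which maximizes p * q
n_unknown = math.ceil(sample_size_proportion(0.5, 0.05))
print(n_unknown)
```

With p = 0.5 the product p × q reaches its maximum of 0.25, which is why the fallback gives the largest (safest) sample size for a given margin of error.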
DEALING WITH COMMON ISSUES IN SAMPLE SIZE DETERMINATION

a) When deciding on the sample size requirements for a study with more than one objective involving the estimation and/or testing of several parameters and hypotheses, the sample size required for each important parameter has to be computed and considered

b) When estimating a proportion whose value is unknown, a common practice is to assume that P = 0.50.
o The basis for this is the fact that the variance of indicators which are in the form of proportions has a maximum value when P = 0.50 and Q = 0.50, and hence will ensure an adequate sample size irrespective of the actual value of P

c) When the sample design makes use of cluster sampling instead of pure simple random sampling, the sample size has to be corrected for the design effect (deff), i.e.,
n(cluster sampling) = n(simple random sampling) × deff

deff:
➢ is the factor by which the sample size for a cluster sample has to be increased in order to derive estimates with the same precision as a simple random sample.
➢ In the area of health, it has been shown that for most health surveys deff = 1.5 to 2.0, with deff = 2.0 being a common value used.

d) In order to ensure that the required sample size is reached, a correction factor for non-response is usually applied at the time of sample size determination. This avoids the need to look for "substitutes" during data collection, which usually introduces biases in sample selection
o The non-response rate varies depending on the survey setting
ex. Urban areas generally have higher non-response rates compared to rural areas; surveys which ask sensitive questions also have higher non-response rates
o But in general, an inflation factor of 10% for non-response has been shown to be adequate in most situations. Therefore, if, for example, the required sample size of a given survey after applying the design effect is 800, then the revised target sample size after applying the correction factor for non-response will be 800 + 80 = 880. Data collection activities should therefore be planned for a sample size of 880

Example for information needed for sample size determination when estimating a MEAN:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^{2} \quad \text{that is} \quad n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$$

Squaring σ is why this term became the variance

SLOVIN'S FORMULA

$$n = \frac{N}{1 + N e^{2}}$$

e = the margin of error; N = population

ADDITIONAL CONSIDERATIONS IN SAMPLE SIZE DETERMINATION

e) There are instances when the computed sample size is deemed too big relative to the population size.
o There are even instances when the computed sample size is bigger than the population size.
o This is when the finite population correction (fpc) can be applied to determine the final sample size to be considered.
o The sample size formula after application of the fpc is:

$$n_{fpc} = \frac{n_0}{1 + (n_0 - 1)/N}$$

nfpc = computed sample size after application of the finite population correction
n0 = initial sample size computed prior to application of the fpc
N = population size

Example:
N = 350
n = 400 (which is not good, because the computed sample size is bigger than the population)
n0 = 400

$$n_{fpc} = \frac{n_0}{1 + (n_0 - 1)/N} = \frac{400}{1 + (400 - 1)/350} = \frac{400}{1 + 1.14} = \frac{400}{2.14} = 186.92 \approx 187$$

187 is more valid because it is less than the total population

SUMMARY OF FORMULAS

Proportional Allocation

$$n_{A,B,C} = \left[\frac{N_{A,B,C}}{N}\right] \times n$$

Then, add all nA + nB + nC and so on…

Non-proportional Allocation (Equal)
Just divide the sample size by the number of strata, then add all answers again

Systematic Sampling: Sampling Interval (k)

$$k = \frac{N}{n}$$

Confidence Level & Significance Level
Confidence level = 1 − α (alpha)
Significance level = α (alpha)

Sample Size Determination When Estimating a PROPORTION

$$n = \frac{z_{\alpha/2}^{2}\, p\, q}{d^{2}}$$

Sample Size Determination When Estimating a MEAN (Cochran's Formula)

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^{2}$$

Slovin's Formula

$$n = \frac{N}{1 + N e^{2}}$$

Finite Population Correction Formula

$$n_{fpc} = \frac{n_0}{1 + (n_0 - 1)/N}$$

Simple Random Sampling
=RANDBETWEEN(1,N)
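For readers who want to check the arithmetic, the formulas summarized above can be collected into a few small Python helpers. This is only a sketch: every function and parameter name is invented for illustration, the round-up rule is applied where the notes apply it, and the non-response allowance is passed as a whole percent (10 by default, following the 10% inflation factor mentioned in the notes).

```python
import math


def ceil_div(numerator, denominator):
    """Integer division that always rounds up."""
    return -(-numerator // denominator)


def proportional_allocation(stratum_sizes, n):
    """n_i = (N_i / N) x n, each stratum rounded up."""
    total = sum(stratum_sizes.values())
    return {name: ceil_div(size * n, total) for name, size in stratum_sizes.items()}


def equal_allocation(stratum_names, n):
    """Split n evenly over the strata, rounded up."""
    share = ceil_div(n, len(stratum_names))
    return {name: share for name in stratum_names}


def systematic_sample(N, n, start):
    """Element numbers start, start + k, start + 2k, ... with k = N / n."""
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]


def sample_size_mean(z, sigma, d):
    """n = (z * sigma / d)^2, left unrounded."""
    return (z * sigma / d) ** 2


def slovin(N, e):
    """n = N / (1 + N e^2), left unrounded."""
    return N / (1 + N * e**2)


def finite_population_correction(n0, N):
    """n_fpc = n0 / (1 + (n0 - 1) / N), left unrounded."""
    return n0 / (1 + (n0 - 1) / N)


def add_non_response(n, percent=10):
    """Inflate n by a non-response allowance given in whole percent."""
    return n + math.ceil(n * percent / 100)


barangays = {"A": 3000, "B": 10500, "C": 6500}
print(proportional_allocation(barangays, 250))
print(equal_allocation(["A", "B", "C"], 250))
print(systematic_sample(800, 200, start=2)[:10])
print(math.ceil(finite_population_correction(400, 350)))
print(add_non_response(800))
```

Run with the figures from the notes, the calls reproduce the worked answers: 38, 132 and 82 for proportional allocation, 84 per stratum for equal allocation, households 2, 6, 10, … 38 for the systematic start of 2, 187 after the finite population correction, and 880 after the non-response allowance.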
