Biostat - Inferential Statistics 1 and 2 - Lec 4

1st SEMESTER | 2022-2023 SEPTEMBER 19, 2022


LECTURER: Dr. Junjie Zuasula, MD, PHSAE, CTTS

TOPIC OUTLINE
o Inferential Statistics
o Statistical Power
o Sampling

INFERENTIAL STATISTICS
- By the term alone, “inferential” comes from the root word “inference,” which is an educated guess. You want to infer the characteristics of the total population based on a set of samples.
- Without studying the total population, just by taking samples and studying particular variable/s of the sample, you can generalize those variables or characteristics to the total population.

SAMPLE
➢ Is a selected subset of a population.
➢ May be random or nonrandom, and may be representative or non-representative. But we want to have representative samples.
➢ A smaller (but hopefully representative) collection of units from a population used to determine truths about that population (Field, 2005).
➢ We sample because we do not have the time or resources.
➢ Another usual reason is that only a few people are doing the research, so limited manpower is one of the reasons why we need to sample.
➢ We want results with known accuracy that can be calculated mathematically. This depends on the variables that we are analyzing. We need to know the specific statistical test that is appropriate for each particular variable. We need to know the characteristics, nature, and type of each variable, because each type of variable has a specific appropriate statistical test that you need to use in order to come up with the right results.
➢ Sampling is the process of selecting a number of subjects from all the subjects in a particular group, which in statistical research we call the “universe.”
➢ From there, you derive conclusions and appropriate recommendations based on sample results that may then be attributed to the population.
➢ From a smaller set of units, you can infer that, most likely, the same thing is happening in the whole population or universe.
➢ There are several populations that you may be interested in when you start your research. Questions that can guide you include:
o To whom do you want to generalize your results?
→ Doctors, school children, Indians, women aged 15-45 years, and other specific groups, as long as it is a defined group.
o Can you sample the entire population?
→ If the population is just too small, it may no longer be logical for you to sample.
→ If you can reach the whole population, then why not.
→ You do not have to calculate the sample size for every research study. If the number of people you want to study is within your reach and your resources, then by all means you can just do what we call “total enumeration,” which is like a census. It is then no longer in the realm of inferential statistics.
→ But if the entire population is too large for you to study, then by all means compute an appropriate sample size and prepare a sampling design, because you are now doing inferential statistics.
➢ There are three factors that influence sample representativeness:
o Sampling procedure
→ Otherwise known as “sampling design,” “sampling technique,” or “sampling methodology.”
o Sample size
→ Is it large enough for you to derive a particular conclusion?
→ Are the samples randomly chosen? Are there biases involved?
o Participation (response)

WHEN MIGHT YOU SAMPLE THE ENTIRE POPULATION?
o When your population is very small.
o When you have extensive resources.
o When you do not expect a very high response.
→ If that is the case, in order to capture the exact phenomenon, you might as well get the entire population’s response.
→ For example, the prevalence of a specific disease entity may be very low in a particular population. When you sample a number of individuals from that population, you may not get the real picture because the incidence of that particular factor is very low, or the response is not high enough.

“This is just a diagram to show you where you get the sample. You start with your TARGET POPULATION and define it. From that target population, the particular subgroup you will study becomes your STUDY POPULATION. From that study population, you then define your SAMPLE. It is very important in research that you define the target, study, and sample populations.”

➢ To assure confidence in the study results and conclusions.
➢ To estimate an appropriate number of subjects for a given study design, i.e., the number needed to find the result you are looking for with the least bias.
Remember!
• In reality, you are only making a ballpark estimate.
• Samples are only there for you to estimate the real value of a particular characteristic, the value you would get if you had the chance to study the whole population. It depends on the population size.
• If you collect a sample that is way below the computed sample size, you may fail to answer the research question because the power of your study will be very low.
• Too large a sample will be more difficult and costly, especially when you are doing a prospective study design such as a randomized controlled trial or a community trial like a vaccine efficacy study. On top of that, you may be needlessly exposing a number of ‘subjects’ to possible harm.
• It is important to know that, although a useful guide, sample size calculations give a deceptive impression of statistical objectivity.
• We will also need assumptions in order to compute the sample size. These assumptions could be right or wrong, but we need figures from previous studies in order to gauge the samples for the study we are about to start.

SAMPLE SIZE
➢ Can only be as accurate as the data and estimates on which it is based.
→ If the target population is biased, then you can expect the results of your sample to be biased as well.
➢ Often reveals that the research design is not feasible, or that different predictor or outcome variables are needed.
→ Some researchers, specifically novice researchers, would only look at one particular risk factor in COVID when, in fact, several factors are interacting with one another.
→ It is very important to have knowledge of research design, and of confounding and bias.
➢ RULE OF THUMB: sample size should be estimated early in the design phase of the study, when major changes are still possible.
→ In developing a research design, you would already take confounding variables into consideration.
➢ In addition to the statistical analysis plan, the sample size section is critical to a research proposal for any kind of grant.
→ Your research will often be driven by a sponsor or grantor, and the number of samples will really define how much funding that sponsor gives you.
➢ According to one review paper, 42% of the research proposals it examined were criticized for their sample size justifications or analysis plans.
→ Some had bloated work and financial plans. Because of their sample size, they were questioned on whether it was enough to accurately represent the total study population.
➢ Worse, some relied heavily on cut-and-paste paragraphs.
→ Many novice researchers get tired of coming up with their own descriptions or sentences; with a convenient source, they just cut and paste research that was published online.
→ They then run into problems such as plagiarism and violations of the Data Privacy Act.

STATISTICAL POWER
➢ A crucial element in the computation of sample size is what we call “statistical power.”
➢ Power is the probability that the null hypothesis is rejected by that particular statistical test, if a specific alternative hypothesis is true.
➢ It is the ability of that particular statistical test to detect a true change in a
particular characteristic or variable in the population.
• β – represents the Type II error, or false negative: the probability of not rejecting the null hypothesis when the given alternative is true.
1-β = power
➢ The power of a study should be at least 80%, and studies are often designed to have 90-95% power to detect a particular clinical effect.
→ When you start computing the sample size using software, you will be asked for the power of the study.
→ In medical research, 80% is usually the minimum power. If you are not yet confident of the data you have gathered, do not manipulate the power; let it remain at 80%. If you are confident enough, then you may increase it to 90-95%.

WHAT FACTORS AFFECT POWER?
o α – level of significance
o β – the false negative rate
o variability of your data
o baseline incidence
o n – sample size
That is why we study statistical power: it is a very important element in computing sample size.

1. Effect Size
- It is the deviation from the null that the investigator wishes to be able to detect.
*null – 1 or no effect; equivocal; neither beneficial nor harmful
- It is a quantitative measure of the strength of a phenomenon / association / relationship.
- Common effect sizes that you often meet, especially when reading prospective studies or randomized controlled trials, are:
• Cohen’s d or delta value – assumes the SDs are the same in each group.
→ 0.2 = small effect
→ 0.5 = medium effect
→ 0.8 or higher = large effect (better)
• Glass’s delta – used when the SDs of the groups differ.
- The amount of difference between two groups in standard deviation units.
- PURPOSE:
o It is used as a counterpoint to significance tests, giving an indication of how big or small a significant difference is. This difference can then be compared to Cohen’s estimates of what is typical of a small, medium, or large effect.
→ Significance tests here means the statistical tests of significance.
→ If you really want more detailed evidence from your statistical test, you can compute the effect size.
o To provide a common measure on which to compare effects, for meta-analysis and similar purposes, when outcome variables may be measured on different scales.
➢ A prior statistical test had already shown a significant difference in self-esteem. The researchers did not stop there. They then performed a
Cohen’s d determination and found a value of .50.
➢ In other words, the intervention was effective because it was associated with an increase in self-esteem of half (0.5) a standard deviation from the mean.
INTERPRETING EFFECT SIZES:
EXAMPLE
➢ Looking at Cohen’s d, the effect size is 1.82. We know that a value of more than 0.8 is a large effect size. With this, more or less, this is a well-conducted study.

2. Variability
➢ May be expressed in terms of a standard deviation or an appropriate measure of variability for the statistic. If the hypotheses are concerned with a population proportion, the value of the proportion and the sample size are used to calculate the variability. The investigator will need an estimate of the variability in order to calculate the power.

3. Baseline Incidence
➢ It is related to the effect size.
➢ If it is hypothesized that a rate has increased or decreased, the baseline rate and the effect size must both be known to calculate the power for detecting such change.
➢ For example, say you are interested in knowing the prevalence rate of COVID-19 in a particular institution. You need to compute a sample size because, as a researcher, you do not have enough resources to get swabs from all medtech students in Velez, so you just get a sample from each year level. One of the requirements for computing the sample size is the baseline incidence rate from a previous study with a similar interest. You would want to use the incidence rate of that study to compute the sample size of your study.
➢ This is related to effect size because effect size is relative to the prevalence or incidence rate.
➢ Power is directly related to effect size, sample size, and significance level. An increase in either the effect size, sample size, or significance level will produce increased statistical power, all other factors being equal.
→ The more samples, the more powerful your statistics will be.
➢ Power is inversely related to variability. Decreasing variability will increase the power of a study.
→ And increasing variability, of course, will decrease the power of the study.
EXAMPLE:
➢ For this example, a p value of less than 0.05 is significant, and a p value of more than 0.05 is not statistically significant.

SAMPLING
➢ SAMPLING BIAS
→ Is a systematic error due to the study of a nonrandom sample of a population.
→ If you do not perform randomization, chances are
you will be committing sampling bias: your sample will be biased toward the particular characteristics of the group you chose.
→ Randomization is one effective way of eliminating bias.
→ Sampling errors occur because of variation in the number or representativeness of the sample that responds. Sampling errors can be controlled and reduced by (1) careful sample designs, (2) large enough samples, and (3) multiple contacts to assure a representative response. A sample may be obtained randomly but may still not be representative.
→ In a qualitative design, you ask several individuals, making multiple contacts until you get saturated responses. This is how you decrease sampling error.
→ Since the inclusion of individuals in a sample is determined by chance, the results of analysis in two or more samples will differ purely by chance.
→ There are natural differences between people: blood pressure, and even heart rate, may differ from one person to another. These differences are what we term “variation.”

SAMPLE SIZE SOFTWARE
➢ There are several software packages available nowadays for computing a good sample size:
1. Power and Sample Size Calculation (PS)
- free for download or use online.
2. nQuery Advisor
- not free
3. G*Power
4. OpenEpi
- this is usually suggested for us.
5. Epi Info
- This is software developed by the CDC, used in disease surveillance.

SAMPLE SIZE:
- Consider strategies for minimizing sample size and maximizing power, which include using:
o Continuous variables
o Paired measurements
o Unequal group sizes
o A more common binary outcome (i.e., one that is prevalent)
- It is useful to calculate and report a range of sample sizes by assuming different combinations of parameter values, then take the largest sample size to cover all scenarios.
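The advice to compute a range of sample sizes and plan for the largest can be sketched in Python. This is a minimal sketch, not the method used by PS, G*Power, or OpenEpi: it uses the textbook normal-approximation formulas for comparing two group means with a two-sided test, n per group = 2(z_(1-α/2) + z_(1-β))² / d², where d is Cohen’s d, and for estimating a single prevalence p to within a margin e, n = z_(1-α/2)² p(1 − p) / e². The function names, effect sizes, powers, and prevalence are hypothetical planning inputs, not values from the lecture.

```python
import math
from itertools import product
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (mean 0, SD 1)


def n_per_group_means(d, alpha=0.05, power=0.80):
    """Subjects per group to detect a standardized difference d (Cohen's d)
    with a two-sided, two-sample test, using the normal approximation."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = Z.inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def power_two_means(d, n, alpha=0.05):
    """Approximate power (1 - beta) of the same test with n subjects per group.
    The negligible chance of rejecting in the wrong direction is ignored."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * math.sqrt(n / 2) - z_alpha)


def n_for_prevalence(p, margin, alpha=0.05):
    """Subjects needed to estimate a prevalence p (e.g. a baseline rate from a
    previous study) to within +/- margin, using the normal approximation."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return math.ceil(z_alpha ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical planning assumptions. A larger SD (more variability) shrinks
# d = difference / SD, which in turn raises the required n.
effect_sizes = [0.4, 0.5, 0.6]
powers = [0.80, 0.90]

grid = {(d, pw): n_per_group_means(d, power=pw) for d, pw in product(effect_sizes, powers)}
for (d, pw), n in sorted(grid.items()):
    print(f"d = {d:.1f}, power = {pw:.0%}: {n} per group")

print("Plan for the largest:", max(grid.values()), "per group")
print("Prevalence 10% +/- 5%:", n_for_prevalence(0.10, 0.05), "subjects")
```

With these inputs the grid tops out at 132 per group (d = 0.4 at 90% power). Software based on the t distribution, such as G*Power, reports slightly larger numbers (for example 64 rather than 63 per group for d = 0.5 at 80% power), so treat these figures as ballpark estimates.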

- Always justify the feasibility of the calculated sample size.
→ Why did you come up with such a sample size?
- How long would it take to accrue/enroll the subjects?
→ You may want, for example, to enroll only during a 6-month data-gathering period, and come up with a sample size from that.
- You need to consider the source of subjects, the inclusion/exclusion criteria, the prevalence of the outcome, etc.
→ You have to define the inclusion/exclusion criteria. Who are included? Who will be excluded?

SAMPLING DESIGNS
*SAMPLING FRAME = the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled, and may include individuals, households, or institutions. It must be representative of the population, and it is important to have an updated sampling frame.
➢ These are the most common sampling designs. They fall under two major categories:
o Probability (Random) Samples
• Simple random sample
• Systematic random sample
• Stratified random sample
• Multistage sample
o Non-Probability Samples
• Convenience sample
• Purposive/Judgmental
• Quota
➢ The process of sampling must be included in the proposal. The sampling process comprises several stages:
1. Defining the population of concern.
2. Specifying a sampling frame, a set of items or events possible to measure.
3. Specifying a sampling method for selecting items or events from the frame.
4. Determining the sample size.
5. Implementing the sampling plan.
6. Sampling and data collecting.
7. Reviewing the sampling process.
➢ In any research proposal, if you plan to do inferential statistics, you have to mention the sample size calculation and sampling design.

PROBABILITY SAMPLING
➢ The underlying mechanism here is randomization. Each of the units has an equal chance of being chosen.
➢ This is far better than non-probability sampling.
➢ Every unit in the population has a greater than zero chance of being selected in the sample, and this probability can be accurately determined.
➢ When every element in the population has the same probability of selection, this is known as an “equal probability of selection” (EPS) design. Such designs are also referred to as ‘self-weighting’ because all sample units are given the same weight.
➢ The evidence is better, stronger, and more reliable if you do probability sampling.

NON-PROBABILITY SAMPLING
➢ It is based on convenience, or maybe you do not have enough resources, so you have to resort to non-probability sampling.
➢ It doesn’t mean that if you are doing non-probability sampling, your evidence is not strong enough.
➢ Examples of common non-probability sampling designs include:
o Accidental sampling
o Quota sampling
o Purposive sampling
➢ In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of the nonresponse are not well understood, since nonresponse effectively modifies each element’s probability of being sampled.

PROBABILITY SAMPLING: Simple Random
➢ It is the most common method of probability sampling.
➢ One of the common simple random sampling designs is fishbowl sampling, where you assign a number to each of the elements, write the numbers on pieces of paper, roll them up, and put them in a fishbowl to be picked randomly until you get the desired sample size. It is important that you assign a number to each of the units.
➢ Applicable when the population is small, homogeneous, and readily available.
➢ All subsets of the frame are given an equal probability. Each element of the frame thus has an equal probability of selection.
DISADVANTAGES
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
→ Some simple random sampling in animal studies is done “WOR” or “without replacement,” or “WR” or “with replacement.”
➢ WR (with replacement): Once a sample is chosen, the design may study that particular sample or animal and then put it back. There will then be a chance that that particular animal will be chosen again.
➢ WOR (without replacement): Once that particular sample is chosen, you set it aside so that it has no chance of being drawn again. In other words, it won’t be chosen again.
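The fishbowl draw and the WR/WOR distinction just described map onto two functions in Python’s standard `random` module: `random.sample` draws without replacement (a slip, once drawn, stays out of the bowl) and `random.choices` draws with replacement. A minimal sketch, assuming a hypothetical numbered frame of 100 units; the repeated-draw check at the end illustrates that each unit has the same probability of selection.

```python
import random
from collections import Counter

# Hypothetical sampling frame: every unit gets its own number, like slips in a fishbowl.
frame = [f"unit-{i:03d}" for i in range(1, 101)]
rng = random.Random(2022)  # seeded so the draws can be reproduced

# WOR (without replacement): a drawn slip is set aside, so no unit repeats.
wor = rng.sample(frame, 10)

# WR (with replacement): each drawn slip goes back in, so a unit may repeat.
wr = rng.choices(frame, k=10)

print("WOR:", wor)
print("WR: ", wr)

# Equal probability of selection: over many repeated fishbowl draws of 10 from 100,
# every unit should be picked about 10,000 * 10 / 100 = 1,000 times.
counts = Counter(u for _ in range(10_000) for u in rng.sample(frame, 10))
print("least / most often picked:", min(counts.values()), max(counts.values()))
```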
➢ It provides for the greatest number of possible samples. This is done by assigning a number to each unit in the sampling frame.
➢ A table of random numbers or a lottery system is used to determine which units are to be selected.
➢ One important aspect is that you need to generate random numbers using a random number generator.
→ Estimates are easy to calculate.
→ Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.

PROBABILITY SAMPLING: Systematic Random
➢ Relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
➢ Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = population size/sample size. We arrange the list in chronological or natural order; from there, you can start selecting your sample based on the sample size.
➢ It is important to determine the kth element, or the sampling interval.
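The rule just stated (k = population size / sample size, a random start, then every kth element) can be sketched as below. This is a minimal illustration under two assumptions: when N/n is not a whole number, k is rounded down, and positions are counted from 1 to match the lecture’s “5th element” wording. `systematic_positions` and `systematic_sample` are hypothetical helper names.

```python
import random


def systematic_positions(population_size, sample_size, start):
    """1-based positions chosen by systematic sampling with a given start."""
    k = population_size // sample_size          # sampling interval (rounded down)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(sample_size)]


def systematic_sample(frame, sample_size, rng=random):
    """Random start within the first k units, then every kth unit of the ordered frame."""
    k = len(frame) // sample_size
    start = rng.randint(1, k)                   # random start, 1..k inclusive
    return [frame[p - 1] for p in systematic_positions(len(frame), sample_size, start)]


frame = [f"record-{i}" for i in range(1, 101)]  # hypothetical ordered list of 100 records
print(systematic_positions(100, 10, start=5))   # random start 5, interval k = 10
print(systematic_sample(frame, 10, rng=random.Random(19)))
```

With a random start of 5 and k = 10, the selected positions are 5, 15, 25, and so on up to 95: the random start itself is the first unit selected, and rounding k down guarantees the last position never runs past the end of the list.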

➢ It is important that the starting point is not automatically the first in the list but is instead randomly chosen from within the first to the kth element (sampling interval) in the list.
➢ A simple example would be to select every 10th name from the telephone directory (an ‘every 10th’ sample, also referred to as ‘sampling with a skip of 10’).
➢ This is calculating the kth element, or sampling interval: the total population size divided by the sample size gives k. But you have to arrive at a random number from a random number generator. You identify what that random number is, and this is where you start your sampling interval.
➢ For example, if you have gotten a random number of 5, you identify the 5th element of your sampling frame and start from there. If your sampling interval is 10, the 5th element is your first sample, the 15th element is your second, and so on. You continue counting until you arrive at the sample size.
ADVANTAGES
• Sample is easy to select.
• Suitable sampling frame can be identified easily.
• Sample is evenly spread over the entire reference population.
DISADVANTAGES
• Sample may be biased if hidden periodicity in the population coincides with that of selection, i.e., if a particular variable exists only in a particular season and you chose another season, then chances are your results will suffer.
• Difficult to assess precision of estimate from one survey.
➢ There is a way to minimize errors in systematic sampling in the analysis.

PROBABILITY SAMPLING: Stratified Random
➢ Where a population embraces a number of distinct categories, the frame can be organized into separate “strata.”
➢ Each stratum is then sampled as an independent subpopulation, out of which individual elements can be randomly selected.
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
➢ This is basically doing systematic random sampling within certain groups or strata.

PROBABILITY SAMPLING: Cluster Sampling
➢ Usually used in public health studies.
➢ Two types of cluster sampling methods:
o One-stage: A sample of areas, i.e., communities, is chosen. All of the elements within the selected clusters are included in the sample.
o Two-stage: A sample of respondents within those areas is selected. A subset of elements within the selected clusters is randomly selected for inclusion in the sample.
➢ Population divided into clusters of homogeneous units, usually based on geographical contiguity.
➢ Sampling units are groups rather than individuals, i.e., households.
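The two cluster designs described above differ only in what happens inside the chosen clusters. A minimal sketch, assuming a hypothetical frame of 10 communities with 20 households each: one-stage sampling keeps every household in the randomly chosen communities, while two-stage sampling randomly subsamples households within them. The function and data names are illustrative.

```python
import random

# Hypothetical frame: 10 communities (clusters), each listing 20 households.
clusters = {f"community-{c:02d}": [f"c{c:02d}-hh{h:02d}" for h in range(1, 21)]
            for c in range(1, 11)}


def cluster_sample(clusters, n_clusters, per_cluster=None, rng=random):
    """One-stage when per_cluster is None (take every unit in each chosen cluster);
    two-stage otherwise (randomly take per_cluster units inside each chosen cluster)."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: random clusters
    sample = []
    for name in chosen:
        units = clusters[name]
        if per_cluster is None:
            sample.extend(units)                          # all elements of the cluster
        else:
            sample.extend(rng.sample(units, per_cluster))  # stage 2: random subset
    return chosen, sample


rng = random.Random(548)
chosen1, one_stage = cluster_sample(clusters, 3, rng=rng)
chosen2, two_stage = cluster_sample(clusters, 3, per_cluster=5, rng=rng)
print("one-stage:", chosen1, len(one_stage), "households")
print("two-stage:", chosen2, len(two_stage), "households")
```

Only the list of communities is needed up front; household lists are required just for the chosen clusters, which is where the cost savings of cluster sampling come from.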

➢ A sample of such clusters is then selected.
➢ All units from the selected clusters are studied.
ADVANTAGES
• Cuts down on the cost of preparing a sampling frame.
• This can reduce travel and other administrative costs.
DISADVANTAGES
• Sampling error is higher than for a simple random sample of the same size.
• Generally increases the variability of sample estimates above that of simple random sampling.
➢ For this reason, cluster sampling requires a larger sample than SRS to achieve the same level of accuracy, but cost savings from clustering might still make this a cheaper option.
➢ Often used to evaluate vaccination coverage in the EPI.
➢ The difference between the two (strata and clusters) is the homogeneity of the groups. Strata may be heterogeneous, while clusters are made up of homogeneous units.
➢ Both of these are groups, but strata are the groups involved in organizing the sampling, while clusters are the specific groups that will be chosen.
➢ Strata just define the categories of each group, while clusters are specific groups that you choose to study.
➢ In essence, there is no perfect sampling design. What we can do is develop a better sampling design in order to eliminate as much bias as possible.
➢ It is better and ideal to sample each of the participants or units. But again, the ideal may not necessarily be achievable.

PROBABILITY SAMPLING: Multistage Sampling

PROBABILITY SAMPLING: Matched Sampling

Quota Sampling
➢ The population is first segmented into mutually exclusive subgroups, just as in stratified sampling.
➢ Then judgment is used to select subjects or units from each segment based on a specified proportion.
➢ For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
➢ A good example is when you enter SNR or Landers, where they give out samples of food to taste; this depicts quota sampling. Once they reach their sample size, they stop giving them out.
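Because quota sampling segments the population just as stratified sampling does, the contrast is easiest to see side by side. In this minimal sketch (all data and function names hypothetical), the quota sampler walks through people in the order they happen to arrive and keeps eligible ones until 200 females and 300 males aged 45 to 60 are filled, while the stratified version first lists every eligible person per stratum and then draws at random. Only the second gives every eligible person a known chance of selection.

```python
import random

rng = random.Random(7)
# Hypothetical stream of 5,000 passers-by, in the order they arrive.
arrivals = [{"id": i, "sex": rng.choice(["female", "male"]), "age": rng.randint(18, 80)}
            for i in range(5000)]
quotas = {"female": 200, "male": 300}


def eligible(person):
    return 45 <= person["age"] <= 60


def quota_sample(people, quotas, eligible):
    """Non-probability: accept people in arrival order until each quota is full."""
    filled = {group: [] for group in quotas}
    for person in people:
        group = person["sex"]
        if eligible(person) and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # stop as soon as every quota is met
    return filled  # may come up short if the stream runs out first


def stratified_random_sample(frame, sizes, eligible, rng):
    """Probability counterpart: list every eligible person per stratum, then draw at random."""
    strata = {g: [p for p in frame if eligible(p) and p["sex"] == g] for g in sizes}
    return {g: rng.sample(strata[g], sizes[g]) for g in sizes}


quota = quota_sample(arrivals, quotas, eligible)
strat = stratified_random_sample(arrivals, quotas, eligible, rng)
print("quota sizes:", {g: len(v) for g, v in quota.items()})
print("stratified sizes:", {g: len(v) for g, v in strat.items()})
print("last arrival needed by the quota sample:", max(p["id"] for v in quota.values() for p in v))
```

Both samples have the same composition, but the quota sample consists entirely of the earliest eligible arrivals, so anyone who tends to come later never has a chance of being included.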

Convenience Sampling
➢ This is also known as grab or opportunity sampling, or accidental or haphazard sampling.
➢ A type of non-probability sampling in which the sample is drawn from the part of the population that is close to hand, that is, readily available and convenient.
➢ A researcher using such a sample cannot scientifically make generalizations about the total population from this sample, because it would not be representative enough.
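The warning above, together with the earlier distinction between sampling bias and sampling error, can be illustrated with a small simulation. It is a minimal sketch on hypothetical data: 1,000 people are listed in order of how close to hand they are, and the measured value is assumed to rise with distance. The 50 most available people form the convenience sample; five simple random samples of the same size are drawn for comparison.

```python
import random
from statistics import mean

# Hypothetical population of 1,000 values, ordered from most to least accessible.
# By construction the value rises with distance: 100 for the first 25 people, 101 for the next 25, ...
population = [100 + i // 25 for i in range(1000)]
true_mean = mean(population)

# Convenience (accidental) sample: whoever is nearest, i.e. the first 50 in the list.
convenience = population[:50]

# Simple random samples of the same size, each drawn by chance.
rng = random.Random(2022)
random_means = [mean(rng.sample(population, 50)) for _ in range(5)]

print("true mean:        ", true_mean)
print("convenience mean: ", mean(convenience))
print("random-sample means:", [round(m, 1) for m in random_means])
```

Here the convenience mean is 100.5 against a true mean of 119.5: a systematic error (sampling bias) that comes from who was reachable, not from chance. The random-sample means land near 119.5 but differ from one another purely by chance, which is sampling error.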


Purposive Sampling
➢ The researcher chooses the sample based on who they think would be appropriate for the study.
➢ This is used primarily when there is a limited number of people who have expertise in the area being researched.
➢ For example, the research question is about the knowledge, attitudes, and practices of OB-GYN practitioners in promoting condom use among couples. The group there is OB-GYN practitioners, so that is purposive sampling: the researchers are clear that they only want to study OB-GYN practitioners.


