Biostat - Inferential Statistics 1 and 2 - Lec 4
TAGUIBAO | BSMT 2D
size for each of the research study. If the number of people you want to study is within your reach as far as resources are concerned, then by all means you can just do what we call “total enumeration,” which is like a census. In that case, you are no longer in the realm of inferential statistics.
→ But if you cannot study the entire population because it is too large for you, then by all means compute an appropriate sample size and prepare a sampling design, because you are now doing inferential statistics.
➢ There are three factors that influence sample representativeness:
o Sampling procedure
→ Otherwise known as “sampling design”, “sampling technique”, or “sampling methodology”.
o Sample size
→ Is it large enough for you to derive a particular conclusion?
→ Are the samples randomly chosen? Were any biases involved?
o Participation (response)
WHEN MIGHT YOU SAMPLE THE ENTIRE POPULATION?
o When your population is very small.
o When you have extensive resources.
o When you do not expect a very high response.
→ If that is the case, in order to capture the exact phenomenon, you might as well get the entire population’s response.
→ For example, the prevalence of a specific disease entity is very small in a particular population. When you sample a number of individuals or representatives from that population, you may not get the real picture because the incidence of that particular factor is very small, or the response is not high enough.
“This is just a diagram to show you where you get the sample. You start first with your TARGET POPULATION, then define it. From that target population, decide what particular subgroup you will study, which will be your STUDY POPULATION. From that study population, you will then define your SAMPLE. It is very important in research that you define the target, study, and sample populations.”
WHY COMPUTE FOR THE SAMPLE SIZE?
➢ To assure confidence in the study results and conclusions.
➢ To estimate an appropriate number of subjects for a given study design, i.e., the number needed to find the result you are looking for with the least bias.
Remember!
• In reality, you are only making a ballpark estimate.
• Samples are just there for you to estimate the real value of a particular characteristic that you would obtain if you had the chance to study the whole population. It depends on the population size.
• If you collect a sample that is way below the computed sample size, you may fail to answer the research question because the power of your study will be very low.
• Too large a sample will be more difficult and costly, especially when you are doing a prospective study design such as a randomized controlled trial or a community trial like a vaccine efficacy study. On top of that, you may be needlessly exposing a number of ‘subjects’ to possible harm.
• It is important to know that, although a useful guide, sample size calculations give a deceptive impression of statistical objectivity.
• We will also be requiring assumptions in order to compute for the sample size. These assumptions could be right or wrong. But we need figures from previous study results in order to gauge the sample for the study we are about to start.
SAMPLE SIZE
➢ Can only be as accurate as the data and estimates on which it is based.
→ If the target population is biased, then you can expect that the results of your sample will also be biased.
➢ Often reveals that the research design is not feasible, or that different predictor or outcome variables are needed.
→ Some researchers, specifically novice researchers, would only look at one particular risk factor in COVID when in fact several factors are interacting with one another.
→ It is very important to have knowledge of research design, confounding, and bias.
➢ RULE OF THUMB: sample size should be estimated early in the design phase of the study, when major changes are still possible.
→ In developing a research design, you would already take confounding variables into consideration.
➢ In addition to the statistical analysis plan, the sample size section is critical to a research proposal for any kind of grant.
→ Your research will somehow be driven by a sponsor or a grantor, and the number of samples will largely determine how much funding that sponsor gives you.
➢ According to one review paper, 42% of the research proposals examined were criticized for their sample size justifications or analysis plans.
→ Some have bloated work and financial plans. Because of the sample size, reviewers questioned whether it was enough to accurately represent the total study population.
➢ Worse, some simply cut and paste paragraphs.
→ Most of these novice researchers become tired of coming up with their own descriptions or sentences, so with a convenient source they just cut and paste from research published online.
→ They will then face issues such as plagiarism and violations of the data privacy act.
STATISTICAL POWER
➢ A crucial element in the computation of sample size is what we call “statistical power.”
➢ Power is the probability that the null hypothesis is rejected using that particular statistical test, if a specific alternative hypothesis is true.
➢ It is the ability of that particular statistical test to detect the true change of a
particular characteristic or variable in the population.
• β – represents Type II error or the false negative rate: the probability of not rejecting the null hypothesis when the given alternative is true.
1 − β = power
➢ The power of a study should be at least 80%, and studies are often designed to have 90-95% power to detect a particular clinical effect.
→ When you start computing for the sample size using software, you will be asked for the power of the study.
→ In medical research, 80% is usually the minimum power. If you are not yet confident of the data you have gathered, do not manipulate the power. Let it remain at 80%. If you are confident enough, then you may increase it to 90-95%.
WHAT FACTORS AFFECT POWER?
o α – level of significance
o β – the false negative
o variability of your data
o baseline incidence
o n – sample size
That is why we study statistical power: it is a very important element in computing sample size.
ELEMENTS OF COMPUTING STATISTICAL POWER
1. Effect Size
- It is the deviation from the null that the investigator wishes to be able to detect.
*null – 1 or no effect; equivocal; not beneficial and not harmful
- It is a quantitative measure of the strength of a phenomenon / association / relationship.
- One common effect size that you often meet, especially when reading or coming across prospective studies or randomized controlled trials, is what we call:
• Cohen’s d or delta value – assuming the SDs are the same in each group.
→ 0.2 = small effect
→ 0.5 = medium effect
→ 0.8 or higher = large effect (better)
• Glass’s delta – when the SDs of the groups differ.
- The amount of difference between two groups in standard deviation units.
- PURPOSE:
o It is used as a counterpoint to significance tests, where it gives an indication of how big or small a significant difference is. This difference can then be compared to Cohen’s estimates of what is typical of a small, medium, or large effect.
→ “Significance tests” means the statistical tests of significance.
→ If you really want more detailed evidence from your statistical test, you can compute for the effect size.
o To provide a common measure on which to compare effects, for meta-analysis or the like, when outcome variables may be measured on different scales.
EXAMPLE
➢ There was already a prior statistical test which showed a significant difference in self-esteem. The researchers did not stop there. They then performed a
Cohen’s d determination, for which they found a value of .50.
➢ In other words, the intervention was effective because it was associated with an increase in self-esteem of half a standard deviation (d = .50), which is a medium effect.
INTERPRETING EFFECT SIZES:
EXAMPLE
➢ The effect size is 1.82 looking at the Cohen’s d. We know that a value of more than 0.8 is a large effect size. With this, more or less, this is a well conducted study.
2. Variability
➢ May be expressed in terms of a standard deviation or an appropriate measure of variability for the statistic. If the hypotheses are concerned with a population proportion, the value of the proportion and the sample size are used to calculate the variability. The investigator will need an estimate of the variability in order to calculate the power.
3. Baseline Incidence
➢ It is related to the effect size.
➢ If it is hypothesized that a rate has increased or decreased, the baseline rate and the effect size must both be known to calculate the power for detecting such change.
➢ For example, if you are interested in knowing the prevalence rate of COVID-19 in a particular institution, you need to compute for the sample size, because as a researcher you do not have enough resources to get swabs of all medtech students in Velez. So, you just get a sample from each year level. One of the requirements for computing the sample size is the baseline incidence rate from a previous study with a similar interest. You would want to use the incidence rate of that study to compute the sample size of your study.
➢ This is related to effect size because effect size is relative to the prevalence or incidence rate.
➢ Power is directly related to effect size, sample size and significance level. An increase in the effect size, the sample size, or the significance level will produce increased statistical power, all other factors being equal.
→ The more samples, the more powerful your statistics will be.
➢ Power is inversely related to variability. Decreasing variability will increase the power of a study.
→ And increasing variability, of course, will decrease the power of the study.
EXAMPLE:
➢ For this example, a p value of less than 0.05 is statistically significant, and a p value of more than 0.05 is not statistically significant.
SAMPLING
➢ SAMPLING BIAS
→ Is a systematic error due to the study of a nonrandom sample of a population.
→ If you do not perform randomization, chances are
you will be committing sampling bias. You will be biased toward the particular characteristics of the sample group you chose.
→ Randomization is one effective way of eliminating bias.
➢ SAMPLING ERROR
→ Occurs because of variation in the number or representativeness of the sample that responds. Sampling errors can be controlled and reduced by (1) careful sample designs, (2) large enough samples, and (3) multiple contacts to assure a representative response. A sample may be obtained randomly but may not always be representative.
→ When you do a qualitative design, you ask several individuals, with multiple contacts, until you get saturated responses. This is how you decrease sampling error.
➢ SAMPLING VARIATION
→ Since the inclusion of individuals in a sample is determined by chance, the results of analysis in two or more samples will differ purely by chance.
→ There are natural differences amongst individuals. Blood pressure, and even heart rate, may differ from person to person. These differences are what we term “variation.”
SAMPLE SIZE SOFTWARES
➢ There are several software packages available nowadays for computing a good sample size:
1. Power and Sample Size Calculation (PS)
- free for download or use online.
2. nQuery Advisor
- not free
3. G*Power
4. OpenEpi
- this is usually suggested for us.
5. Epi Info
- This is a software developed by the CDC.
- Used for disease surveillance.
ADDITIONAL CONSIDERATIONS FOR COMPUTING SAMPLE SIZE:
- Consider strategies for minimizing sample size and maximizing power, which include using:
o Continuous variables
o Paired measurements
o Unequal group sizes
o A more common binary outcome (i.e., prevalent)
- Useful to calculate and report a range of sample sizes by assuming different combinations of parameter values; take the largest sample size to cover all bases.
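To illustrate reporting a range of sample sizes, here is a minimal Python sketch. It assumes a two-sided comparison of two group means with equal group sizes and uses the normal approximation n per group = 2(z for α/2 + z for power)² / d², where d is Cohen’s d. The function names n_per_group and approx_power are hypothetical, and dedicated sample size software may give slightly different numbers because it can use exact methods.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    d is the assumed effect size (Cohen's d); normal approximation only.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = STD_NORMAL.inv_cdf(power)           # value for power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power for n subjects per group (same assumptions as above)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * (n / 2) ** 0.5 - z_alpha)


# Assumed scenarios: small, medium, and large effects at 80% and 90% power.
scenarios = {
    (d, power): n_per_group(d, power=power)
    for d in (0.2, 0.5, 0.8)
    for power in (0.80, 0.90)
}

for (d, power), n in scenarios.items():
    print(f"d = {d}, power = {power:.0%}: {n} per group")

# "Take the largest sample size to cover all bases."
largest = max(scenarios.values())
print("Largest sample size per group:", largest)
```

Under these assumptions, a medium effect (d = 0.5) at 80% power needs 63 subjects per group, while the most demanding scenario (d = 0.2 at 90% power) needs 526 per group. This shows how a smaller effect size or a higher power drives the sample size up.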
- Always justify the feasibility of the calculated sample size.
→ Why did you come up with such a sample size?
- How long would it take to accrue/enroll the subjects?
→ You may want to enroll, for example, only over 6 months of your data gathering period in order to come up with a sample size.
- Need to consider the source of subjects, the inclusion/exclusion criteria, the prevalence of the outcome, etc.
→ You have to define the inclusion/exclusion criteria. Who are included? Who will be excluded?
SAMPLING DESIGNS
*SAMPLING FRAME = it is important to have an updated sampling frame because this is the source material or device from which a sample is drawn. It is a list of all those within a population who can be sampled, and may include individuals, households, or institutions. It must be representative of the population and should be an updated list.
➢ These are the most common sampling designs. The two major categories are:
o Probability (Random) Samples
• Simple random sample
• Systematic random sample
• Stratified random sample
• Multistage sample
• Cluster sample
o Non-Probability Samples
• Convenience sample
• Purposive/Judgmental sample
• Quota
➢ The process of sampling must be included in the proposal. The sampling process comprises several stages:
1 Defining the population of concern.
2 Specifying a sampling frame, a set of items or events possible to measure.
3 Specifying a sampling method for selecting items or events from the frame.
4 Determining the sample size.
5 Implementing the sampling plan.
6 Sampling and data collecting.
7 Reviewing the sampling process.
➢ In any research proposal, if you plan to do inferential statistics, you have to mention sample size calculation and sampling design.
PROBABILITY SAMPLING
➢ The underlying mechanism here is randomization. Each of the units has an equal chance of being chosen.
➢ This is far better than non-probability sampling.
➢ Every unit in the population has a greater than zero chance of being selected in the sample, and this probability can be accurately determined.
➢ When every element in the population does have the same probability of selection, this is known as an “equal probability of selection design” (EPS). Such designs are also referred to as ‘self-weighting’ because all sample units are given the same weight.
➢ The evidence is better, stronger, and more reliable if you do probability sampling.
NON-PROBABILITY SAMPLING
➢ It is based on convenience, or maybe you do not have enough resources so you have to resort to non-probability sampling.
➢ It doesn’t mean that if you are doing non-probability sampling, your evidence is not strong enough.
➢ Examples of the common non-probability sampling designs include:
o Accidental sampling
o Quota sampling
o Purposive sampling
➢ In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of the nonresponse are not well understood, since nonresponse effectively modifies each element’s probability of being sampled.
DISADVANTAGES
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
→ Some simple random sampling in animal studies is done “WOR” or “without replacement” and “WR” or “with replacement.”
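The difference between simple random sampling without replacement (WOR) and with replacement (WR) can be sketched in Python as follows. This is only an illustration: the sampling frame of 200 student IDs and the function names are hypothetical, and the seed is fixed only so that the draw can be repeated.

```python
import random


def srs_without_replacement(frame, n, seed=None):
    """Draw n distinct units: once chosen, a unit cannot be chosen again."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def srs_with_replacement(frame, n, seed=None):
    """Draw n units where each draw is made from the full frame,
    so the same unit may appear more than once."""
    rng = random.Random(seed)
    return rng.choices(frame, k=n)


# Hypothetical sampling frame: an updated list of 200 student IDs.
frame = [f"student_{i:03d}" for i in range(1, 201)]

wor = srs_without_replacement(frame, 20, seed=1)
wr = srs_with_replacement(frame, 20, seed=1)

print("WOR:", wor)
print("WR: ", wr)
print("Distinct units in WOR sample:", len(set(wor)))
print("Distinct units in WR sample: ", len(set(wr)))
```

In the WOR sample every unit is distinct, so the number of distinct units always equals the sample size. In the WR sample repeats are possible, so the number of distinct units may be smaller.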
➢ It is important that the starting point is not automatically the first in the list but is instead randomly chosen from within the first to the kth element (sampling interval) in the list.
➢ A simple example would be to select every 10th name from the telephone directory (an ‘every 10th’ sample, also referred to as ‘sampling with a skip of 10’).
➢ The kth element is the sampling interval. It is calculated as the population size divided by the sample size. But you first have to get a random number from a random number calculator. You identify what that random number is, and this is where you start your sampling interval.
➢ For example, say you have gotten a random number of 5. You then identify the 5th element of your sampling frame and start from there. If your sampling interval is 10, you count from the 5th element, and the next sample is the 15th element. You continue counting until you arrive at the sample size.
Sample evenly spread over entire reference population.
➢ There is a way to minimize errors in systematic sampling in the analysis.
PROBABILITY SAMPLING: Stratified Random
➢ Where the population embraces a number of distinct categories, the frame can be organized into separate “strata”.
➢ Each stratum is then sampled as an independent subpopulation, out of which individual elements can be randomly selected.
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification and varying the sampling fraction between strata as required.
➢ This is basically doing systematic random sampling in certain groups or strata.
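The sampling interval arithmetic and the idea of a common sampling fraction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the frame of 100 units, and the year-level strata are hypothetical. The sketch treats the randomly chosen start as the first selected unit, which is the usual convention, and it draws a simple random sample within each stratum.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every kth unit after a random start within the first k units."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                # sampling interval = population / sample size
    rng = random.Random(seed)
    start = rng.randint(1, k)          # random start, 1-based, within the first k units
    return [frame[start - 1 + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=None):
    """Sample each stratum independently using the same sampling fraction."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fraction)  # proportionate allocation
        sample[name] = rng.sample(units, size)
    return sample


# Hypothetical frame of 100 numbered units; a sample of 10 gives an interval of 10.
frame = list(range(1, 101))
picks = systematic_sample(frame, 10, seed=7)
print("Systematic sample:", picks)

# Hypothetical strata: students grouped by year level.
strata = {
    "year_1": [f"Y1-{i}" for i in range(120)],
    "year_2": [f"Y2-{i}" for i in range(80)],
    "year_3": [f"Y3-{i}" for i in range(40)],
}
chosen = stratified_sample(strata, fraction=0.10, seed=7)
for name, units in chosen.items():
    print(name, len(units))
```

With a 10% sampling fraction, the strata of 120, 80, and 40 students contribute 12, 8, and 4 units, so each stratum keeps its share of the population in the sample. Raising the fraction for a small stratum is how a minority subgroup can be given adequate representation.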
➢ A sample of such clusters is then selected.
➢ All units from the selected clusters are studied.
ADVANTAGES
Cuts down on the cost of preparing a sampling frame.
This can reduce travel and other administrative costs.
DISADVANTAGES
Sampling error is higher than for a simple random sample of the same size.
Generally increases the variability of sample estimates above that of simple random sampling.
➢ For this reason, cluster sampling requires a larger sample than SRS to achieve the same level of accuracy, but cost savings from clustering might still make this a cheaper option.
➢ Often used to evaluate vaccination coverage in EPI.
➢ In essence, there is no perfect sampling design. What we can do is develop a better sampling design, given your resources, in order to minimize bias.
➢ It is better and ideal to sample each of the participants or units. But again, the ideal may not necessarily be achievable.
➢ The difference between the two is the homogeneity of the groups. Strata may be heterogeneous relative to one another. For clusters, the groups are homogeneous relative to one another.
PROBABILITY SAMPLING: Multistage Sampling
PROBABILITY SAMPLING: Matched Sampling
EXAMPLE:
NON-PROBABILITY SAMPLING: Convenience Sampling
➢ This is also known as grab, opportunity, accidental, or haphazard sampling.
➢ A type of non-probability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient.
➢ The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.