USMLE Biostats & Epi - A Complete Review - Azfar Basunia
BIOSTATISTICS
A complete review
Hello!
@AzfarBasunia
Azfar Basunia, MD
1 Epidemiology & Population Health
Measures of
disease frequency
Prevalence:
╸ Disease frequency or burden of disease
╸ Prevalence = (# of cases at a time point) / (total population at that time point)
╸ Pretest probability of disease
╸ Directly related to PPV and inversely related to NPV
Incidence:
╸ Measures risk of disease
╸ The proportion of at-risk, disease-free population that develops
the disease over a defined time period
╸ Incidence = (# of new cases over a time period) / (total population at risk at the start of the time period)
╸ Population at risk: e.g. SIDS → children > 1 year are not at risk
╸ Also known as cumulative incidence
Incidence rate:
╸ Accounts for when disease occurs, dynamic population and
losses to follow-up
╸ Incidence rate = (# of new cases during observation period) / (total person-time of observation while at risk during study)
╸ Convention for person-time: per person-year, or per 10/100/1,000/... person-years
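The person-time denominator can be illustrated with a minimal sketch. The five-subject cohort below is hypothetical (not from the slides); each subject contributes only the years observed while still at risk.

```python
def incidence_rate(new_cases, person_years, per=1000):
    """New cases per `per` person-years of at-risk observation."""
    if person_years <= 0:
        raise ValueError("person-time must be positive")
    return new_cases / person_years * per

# Hypothetical cohort: (years observed while at risk, developed disease?)
follow_up = [(5.0, False), (2.5, True), (4.0, False), (1.5, True), (3.0, False)]

person_years = sum(years for years, _ in follow_up)       # 16.0 person-years
new_cases = sum(1 for _, is_case in follow_up if is_case)  # 2 new cases
rate = incidence_rate(new_cases, person_years)             # per 1,000 person-years
print(f"{rate:.1f} per 1,000 person-years")
```

Subjects who develop the disease or are lost to follow-up stop contributing person-time at that point, which is how the rate accounts for a dynamic population.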
Relationship between incidence and prevalence
╸ Chronic diseases (DM, HTN): prevalence > incidence
╸ Increased survival and/or improved quality of care
╸ Acute/short lasting diseases (sepsis, flu): prevalence ≈ incidence
╸ Increased mortality or faster healing time
╸ Vaccination: decreases both incidence and prevalence
Example
╸ Rural community with population of 10,000 people. 5,000 have T2DM at the
end of 2019. 100 new cases of T2DM arose between 2019 and 2020. In the
interval, 50 people passed away from complications of T2DM.
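One way to work through this example is sketched below. It assumes a closed population (no births, migration or other deaths), that the interval is one year, and that all 50 deaths occurred among people with T2DM; none of these assumptions is stated on the slide.

```python
population = 10_000
cases_end_2019 = 5_000
new_cases = 100
deaths_among_cases = 50

# Prevalence at the end of 2019: existing cases / total population
prevalence_2019 = cases_end_2019 / population

# Cumulative incidence over the interval: new cases / disease-free population at risk
at_risk = population - cases_end_2019
cumulative_incidence = new_cases / at_risk

# Prevalence at the end of the interval: deaths leave both numerator and denominator
cases_end = cases_end_2019 + new_cases - deaths_among_cases
population_end = population - deaths_among_cases
prevalence_end = cases_end / population_end

print(f"Prevalence (end of 2019): {prevalence_2019:.1%}")
print(f"Cumulative incidence:     {cumulative_incidence:.1%}")
print(f"Prevalence (end of interval): {prevalence_end:.2%}")
```

Note that the incidence denominator excludes the 5,000 people who already have T2DM, since they are no longer at risk of becoming new cases.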
Measures of
Health Status
Mortality
╸ Number of deaths in a population within a time interval
Morbidity:
╸ Number of persons in a population with a disease at a specific
timepoint
Maternal mortality
╸ (Maternal deaths)/ (total number of live births in a year)
╸ Death while pregnant or w/i 42 days of delivery or pregnancy termination
Child mortality
╸ Neonatal mortality: (# of infant deaths w/i first 28 days of life) / (total
number of live births in a year)
╸ Infant mortality: (# of infant deaths w/i first year of life) / (total number
of live births in a year)
╸ Under-5 mortality: (# of child deaths w/i first 5 years of life) / (total
number of live births in a year)
Crude Rate
╸ Total number of cases at a given time point divided by the total
number of persons in the population
╸ Expressed as per 100/1000/10,000/. . .
╸ Does not account for/ adjust for confounding (e.g. age, sex)
Example:
╸ Town A: Population = 50,000; deaths = 3,000/year; crude mortality
rate = 60 per 1,000
╸ Town B: Population = 58,750; deaths = 4,100/year; crude mortality rate
= 70 per 1,000
╸ Is age distribution distorting the crude rates?
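The two crude rates in this example can be reproduced with a short sketch; the helper name `crude_rate` is illustrative only.

```python
def crude_rate(events, population, per=1000):
    """Events per `per` persons, with no adjustment for age or other confounders."""
    return events / population * per

town_a = crude_rate(3_000, 50_000)   # deaths per 1,000 in Town A
town_b = crude_rate(4_100, 58_750)   # deaths per 1,000 in Town B

print(f"Town A: {town_a:.0f} per 1,000")
print(f"Town B: {town_b:.0f} per 1,000")
```

Town B's rate is about 69.8 per 1,000, which the slide rounds to 70. A higher crude rate alone does not show that Town B is less healthy, since the towns may differ in age structure.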
Adjusted Rate: Adjusts for possible confounding (e.g. age), allows for
better comparisons
Example: Age Specific mortality rates
Town | Age group | # people | % of total pop | Deaths per year | Death rate per 1,000
A    | ≤ 65 yr   | 25,000   | 50%            | 1,000           | 40
Standardization
╸ Adjustment of rates that takes into account vital differences
between populations (typically age)
╶ Apply a standard age distribution (such as population
distribution of US by age group)
╶ Allows for direct comparison of health outcomes
╸ Commonly applied to birth rates, death rates and unemployment rates
╸ E.g. Age-specific mortality rates among various US states
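Applying a standard age distribution can be sketched as a weighted average of age-specific rates (direct standardization). All numbers below are hypothetical and chosen only to show the mechanics; the town names and the standard weights are assumptions, not data from the slides.

```python
def directly_standardized_rate(age_specific_rates, standard_weights):
    """Weighted average of age-specific rates, weighted by a standard population."""
    if abs(sum(standard_weights.values()) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    return sum(age_specific_rates[g] * w for g, w in standard_weights.items())

# Hypothetical standard population: share of people in each age group
standard = {"<=65": 0.8, ">65": 0.2}

# Hypothetical age-specific death rates per 1,000
town_x = {"<=65": 40.0, ">65": 80.0}
town_y = {"<=65": 30.0, ">65": 100.0}

adjusted_x = directly_standardized_rate(town_x, standard)
adjusted_y = directly_standardized_rate(town_y, standard)
print(f"Town X: {adjusted_x:.1f}, Town Y: {adjusted_y:.1f} per 1,000 (age-adjusted)")
```

Because both towns are weighted by the same age distribution, any remaining difference in the adjusted rates cannot be explained by age structure.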
Case fatality rate (CFR)
╸ Proportion of patients with a particular disease/condition who die
from the complications of the disease/condition
╸ Must be distinguished from mortality rate
╶ Mortality rate is analogous to risk of dying from a disease/condition in the
general population
╸ Eg. 10% of pts with diabetic nephropathy develop renal failure, but
75% pts with renal failure will die from its complications.
╶ CFR of renal failure among diabetic nephropathy pts = 75%
Life expectancy
╸ The average number of years an individual is expected to live
based on current death rates
╸ Account for social group, geographic region, sex and year of birth
etc.
Risk factor
╸ an attribute that increases probability of disease
╸ Eg. Smoking is a risk factor for lung cancer
Latency
╸ Time between inciting pathologic events or exposure to disease
manifestation or development of clinical signs and symptoms
╸ Can apply to both risk factors and risk reducers
╸ Eg.
╶ Risk factors:
╶ Bacterial infections have short latency periods;
╶ Chronic diseases like HTN, DM, HIV have long latency periods;
╶ Risk reducers: Lung healing after smoking cessation in chronic
smokers may take years w/o full return to baseline functioning
Survival Analysis
Kaplan-Meier curve
╸ Measure disease prognosis,
time-to-event analysis
╸ Y axis, estimated probability
of event-free survival
╸ X axis, time to follow-up since
entry into study
╸ Each step/vertical drop denotes an event = death
╸ Small vertical lines in the graph represent censoring: those lost to
follow-up, dropped out etc.
╸ Simple inferences from graph
╸ Median survival = 50%
survival = 28 mo vs. 115 mo
╸ Survival at 36 mo = 46% vs.
75%
╸ Log-Rank test
╸ Compare survival curves
╸ p < 0.05, survival between groups is significantly different
╸ P > 0.05, survival between groups is not significantly different
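How the steps and censoring marks of a Kaplan-Meier curve arise can be sketched with the product-limit estimate. The six follow-up times below are hypothetical, and the sketch assumes the usual convention that subjects censored at a given time are still counted as at risk for deaths at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times  : follow-up time for each subject
    events : True if the subject died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        leaving = sum(1 for tt in times if tt == t)  # deaths + censored at t
        if deaths:
            survival *= 1 - deaths / at_risk  # vertical drop in the curve
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects shrink the risk set without a drop
    return curve

def median_survival(curve):
    """First time at which estimated survival falls to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up in months; False marks a censored subject
times = [6, 12, 12, 18, 24, 30]
events = [True, True, False, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} mo  S(t) = {s:.3f}")
print("Median survival:", median_survival(curve), "mo")
```

Comparing two such curves formally (the log-rank test) needs the full risk sets of both groups and is not shown here.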
Composite health status
indicators, measures of
population impact
Quality Adjusted Life Years (QALYs)
╸ Measure burden of disease
╸ Economic impact of interventions on population
╸ Time Trade Off (TTO) = a tool for QALY calculation
╶ 1 year in healthy life = 1 TTO; 1 year in disability, 0 < TTO < 1
╸ Interventions are aimed at maximizing QALYs
Example
╸ 40-year-old patient, previously healthy until age 30, history of
meningitis at age 30 and subsequent stroke and right sided
hemiparesis. Every 5 years of life in his current state = 1 year of life in
full health.
╸ His TTO after disability = 1/5 = 0.2. He lived 30 years in TTO = 1, and 10
years in TTO = 0.2.
╸ QALYs = (30 × 1) + (10 × 0.2) = 32
╸ NOTE: Formulas will differ for populations. Focus on understanding
when QALYs are utilized
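The individual-level calculation above can be sketched as a sum of years weighted by their TTO value; the function name `qalys` is illustrative.

```python
def qalys(periods):
    """Sum of (years lived x TTO weight) over each health state."""
    return sum(years * weight for years, weight in periods)

# 30 years in full health (TTO = 1), then 10 years with hemiparesis (TTO = 1/5)
patient = [(30, 1.0), (10, 1 / 5)]
total = qalys(patient)
print(f"QALYs = {total:.0f}")
```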
Years of Potential Life Lost (YPLL)
╸ Measure of premature mortality based on standard, age-based tables
of life expectancy
╸ sum of the years of life lost annually by persons who suffered death
prior to life expectancy
Example
4 people A, B, C and D died at ages 70, 50, 10 and 90 years of age
respectively in a small rural community in 2019. Life expectancy for the
town is 75 years of age. Calculate YPLL
╸ YPLL of the community = YPLL_A + YPLL_B + YPLL_C + YPLL_D
╸ YPLL_A = 75 - 70 = 5 years; YPLL_B = 75 - 50 = 25 years; YPLL_C = 75 - 10 = 65
years; YPLL_D = 0 (since 90 > life expectancy of 75).
╸ YPLL of the rural community = 5 + 25 + 65 + 0 = 95 person-years
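The same arithmetic as a short sketch: deaths after the life expectancy contribute zero rather than a negative value.

```python
def ypll(ages_at_death, life_expectancy):
    """Years of potential life lost; deaths past life expectancy count as 0."""
    return sum(max(life_expectancy - age, 0) for age in ages_at_death)

community = ypll([70, 50, 10, 90], life_expectancy=75)
print(f"YPLL = {community} person-years")
```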
Disability Adjusted Life Years (DALYs)
╸ Compare overall health and life expectancy among countries
╸ Represent difference between current health situation and ideal
living in perfect health up-to the standard life expectancy
╸ Based on premature mortality (YPLL) and burden of living with a
disease or disability (year lived with disability, YLD)
╸ Interventions aimed at minimizing DALYs
Example
╸ A previously healthy male develops T2DM at 30 and passes away
from MI at 60. Standard life expectancy is 75 and disability weight for
T2DM is 0.5 (all hypothetical numbers).
╸ YPLL = 75-60 = 15 years
╸ YLD = years lived in disability × disability weight = 30 × 0.5 = 15 years
╸ DALYs = YPLL + YLD = 15 + 15 = 30 years
╸ NOTE: Formulas will differ for populations. Focus on understanding
when DALYS are utilized
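A minimal sketch of the single-patient calculation above; it assumes disability begins at disease onset and lasts until death, as in the example.

```python
def dalys(onset_age, death_age, life_expectancy, disability_weight):
    """Return (YPLL, YLD, DALYs) for one person."""
    ypll = max(life_expectancy - death_age, 0)          # premature mortality
    yld = (death_age - onset_age) * disability_weight   # years lived with disability
    return ypll, yld, ypll + yld

ypll_years, yld_years, total = dalys(onset_age=30, death_age=60,
                                     life_expectancy=75, disability_weight=0.5)
print(f"YPLL = {ypll_years}, YLD = {yld_years}, DALYs = {total}")
```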
Standardized Mortality Ratio (SMR)
╸ Used in occupational epidemiology
╸ SMR = (Observed deaths in the study group) / (Expected deaths in the study group)
╸ Expected deaths based on age-specific mortality rates among
general US population
Standardized Incidence Ratio (SIR)
╸ Used in cancer epidemiology, to compare the incidence of cancer in a
small population to a larger, control population
╸ SIR = (Observed cases in the target population) / (Expected cases based on a larger, control population)
╸ Expected cases are based on age and sex-specific incidence rates
among the population of a state, the general US population etc.
╸ Interpretation similar to SMR:
╶ SIR = 1 => incidence similar
╶ SIR > 1 => incidence larger in the target population
╶ SIR < 1 => incidence smaller in the target population
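Both SMR and SIR follow the same observed/expected pattern, where the expected count comes from applying reference rates to the study group's own age structure. The sketch below uses hypothetical person-years, reference rates and observed deaths.

```python
def expected_events(person_years_by_stratum, reference_rates):
    """Events expected if the study group had the reference population's rates."""
    return sum(py * reference_rates[stratum]
               for stratum, py in person_years_by_stratum.items())

def standardized_ratio(observed, expected):
    """Observed / expected; used for both SMR (deaths) and SIR (new cases)."""
    return observed / expected

# Hypothetical occupational cohort: person-years of follow-up by age group
workers = {"40-49": 2_000, "50-59": 1_000}
# Hypothetical reference death rates per person-year in the general population
reference = {"40-49": 0.002, "50-59": 0.006}

expected = expected_events(workers, reference)   # about 10 deaths expected
observed = 15                                    # hypothetical observed deaths
smr = standardized_ratio(observed, expected)
print(f"Expected = {expected:.1f}, SMR = {smr:.2f}")
```

A ratio above 1, as here, means more events were observed in the study group than its age structure alone would predict.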
Population pyramids &
impact of demographic
changes
Distribution of a population by sex
and age groups
Expansive pyramid:
╸ Higher percentage of youthful
population
╸ High birth rates + low life
expectancy
Source: Wikipedia
Constrictive pyramid:
╸ Higher percentage of elderly
population
╸ Low birth rate + high life
expectancy
Stationary pyramid:
╸ Age and sex distribution remain
stationary/constant over time
╸ Birth rate = death rate
Source: health.pa.gov
Disease surveillance &
outbreak investigation
Disease surveillance:
╸ monitor spread of disease to establish patterns of progression
Disease Reporting
╸ Cornerstone of disease surveillance
╸ Formal reporting of notifiable infectious diseases is required of healthcare
workers
╸ Around 80 reportable diseases in USA (COVID-19, TB, anthrax, rabies,
chlamydia, gonorrhea, HIV, measles, mumps, rubella etc.).
╶ Full list on CDC website: https://ndc.services.cdc.gov/
╸ Incidence of these diseases → indicators of overall health of population
Public health advisors (PHA)
╸ Public health workers doing field work or contact epidemiology
╶ Contacting, interviewing and locating people who have been
exposed to an infectious agent
╶ Offer treatment and follow-up options
╸ Called upon to respond to public health or humanitarian crisis
Health promotion
╸ Public policy to address health determinants such as income, housing,
food security, employment and working conditions
╸ Eg. Health literacy in schools, breastfeeding promotion in clinics
Recognition of clusters
╸ Cluster: An unusually large aggregate of a medical condition, disease or event
within a particular geographic location.
╸ If clusters of sufficient size or importance are identified →
re-evaluate as outbreaks
Source: Johns Hopkins University
Communicable
disease transmission
Attack rate
╸ Cumulative incidence of disease, especially in the setting of outbreak
╶ Number of people who develop the disease divided by total population at
risk
╸ Eg. In one year, 100 new cases of cholera in a rural village with
population of 8000.
╶ Attack rate of cholera = 100/8000 = 0.0125 = 1.25% per year
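The cholera example as a short sketch, assuming all 8,000 villagers were at risk at the start of the year:

```python
def attack_rate(new_cases, population_at_risk):
    """Cumulative incidence over the outbreak period."""
    return new_cases / population_at_risk

cholera = attack_rate(100, 8_000)
print(f"Attack rate = {cholera:.2%} per year")
```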
Herd immunity
╸ Indirect protection from infectious diseases when certain percentage
of population becomes immune to an infection
╶ Either through vaccination or natural infection
╶ Reduces likelihood of transmission of infection to unvaccinated
individuals
╸ Threshold varies depending on infectivity of the pathogen
╶ Pertussis ~ 82%, Measles ~ 95%
Points of
intervention
Preventative Medicine
╸ Primordial prevention:
╶ Addressing risk factors (environmental, socioeconomic, behavioral).
╶ Eg. Encouraging healthy lifestyle and diet to prevent obesity, DM and
HTN (risk factors for cardiovascular disease)
╸ Primary prevention:
╶ Prevent disease occurrence
╶ Decreases both incidence and prevalence of disease
╶ Eg. Vaccinations, folate supplementation in pregnant women to
prevent neural tube defects in their children
╸ Secondary prevention:
╶ Target disease early in its course to promote early intervention,
prevent disease progression and irreversible damage
╶ Screening and follow-up management
╶ Eg. Cardiovascular screening (BP, lipid panel), cancer screening (eg.
Colonoscopy, mammograms)
╸ Tertiary prevention:
╶ Limit impairments and disability from a disease that has already
progressed to its advanced stages
╶ Decrease morbidity and mortality after disease onset
╶ Eg. Stroke rehabilitation programs after acute stroke, tamoxifen
adjuvant therapy in breast cancer to reduce recurrence
╸ Quaternary prevention
╶ Limit unnecessary or excessive interventions by a health system that
could be harming more than benefiting the patients
╶ Eg. Avoid CT scan in infants unless absolutely necessary, avoid
prescribing antibiotics when viral disease is more likely
Community level
╸ Taxation (tobacco tax and soda taxes)
╶ Excise taxes on tobacco products and sugary drinks to
discourage purchase and reduce consumption
╶ Primordial and primary prevention
╸ Smoke-free cities, buildings, restaurants and public spaces
╶ Prevent harm from second-hand smoke exposure
╶ Risk of lung CA, emphysema, bronchitis and cardiovascular
diseases
╶ Primordial and primary prevention
School policies
╸ School based health and nutrition services promote early detection,
correction and prevention of disease and disability in children
╸ Eg. Safe-sex education, healthy lunches, presence of a school nurse
╸ Broader positive impacts in the children’s families and communities
Social determinants of health
╸ Economic and social factors that influence both individual and group
differences in health status, risk of disease or vulnerability to injury
╸ Eg. Education, income distribution, working conditions, healthy food
and clean water, housing, safe environments, gender, race, disability
╸ Possible Interventions
╶ Education: adequate teacher to student ratio, health curriculum
╶ Urban development: affordable housing, removal of lead paint,
parks, reliable public transportation
╶ Public policy: ADA, ACA, FMLA, smoking ban inside restaurants
2 Study design, types & selection
Descriptive
Studies
╸ (-) Cannot establish causality
Case report
╸ Describes an unusual disease presentation/outcome in a single
patient
╸ Can aid in hypothesis formulation
Case series
╸ Collect and analyze disease presentation, course or treatment
response of several patient cases
╸ (-) No control group, prone to selection bias since the researchers
are selecting the subjects
Analytical Studies:
Observational
Cross-sectional (individuals)
╸ Also known as prevalence study
╸ Examine associations between exposures (risk factors) and outcomes
(disease) at a particular time point
╶ Simultaneously measure frequency of exposure(s) and disease
╸ (-) Cannot establish causality
╸ E.g. Investigators at a state hospital are studying links between
incarceration and development of active TB. They collected data on
all active TB cases and prior incarceration history from the hospital
charts.
Cross-Sectional Surveys (Community Surveys)
╸ A form of cross-sectional study
╸ A group of subjects from a defined population are selected and
evaluated for exposure of interest at a particular time point
Ecological studies
╸ Assess the relationship between outcome/disease and exposure
at a population level (country, state, city etc.)
╸ (-) Ecological Fallacy: Cannot make inferences on individual level
based on group characteristics
╸ (-) Cannot establish causality
╸ E.g. Determining incidence of new COVID-19 cases among U.S.
counties and rates of vaccination
Cohort Study
╸ Assess if an exposure/risk factor is associated with disease
╸ Method:
╶ Identify a group (cohort) with common characteristics (age,
background, geography, sex, etc)
╶ Divide subjects based on exposure status to risk factor
╶ Follow cohort over a period of time and determine if they
develop disease of interest
╸ Main outcome of interest: incidence.
╶ Risk of disease among exposed compared to risk of disease
among unexposed
Prospective Cohort Study
╸ Study begins before outcome/disease has occurred
Retrospective Cohort Study
╸ Study begins after exposure + outcome/disease has occurred
╶ Patient charts are reviewed to gather data but the same
protocol as prospective cohort study is followed
╸ Patients are followed over time
╶ (+) Can calculate incidence and risk of disease due to exposure
╶ (-) Need to determine exposure prior to study initiation
╶ (-) For rare diseases, large cohort needed, which drives up cost
Example: Prospective design
╸ Clinicians want to investigate links between HTN development and heavy
smoking among males age 40-50
╸ A busy clinic had 900 new male pts w/o HTN between age 40-50.
55.56% of these pts are heavy smokers (>1 ppd).
╸ Pts are followed for next 5 years and new onset HTN is recorded.
╸ 400 patients developed HTN among heavy smokers and 50 patients
developed HTN among non-heavy smokers. RR = (400/500) / (50/400) = 6.4
╸ Conclusion: Over 5 years, risk of new onset HTN was 6.4 times greater among
heavy smokers compared to non-heavy smokers
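The relative risk in this cohort example can be checked with a short sketch. The group sizes are derived from the slide's figures: 55.56% of 900 patients is 500 heavy smokers, leaving 400 non-heavy smokers.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

total_patients = 900
heavy_smokers = round(total_patients * 0.5556)   # 500
non_heavy_smokers = total_patients - heavy_smokers  # 400

rr = relative_risk(400, heavy_smokers, 50, non_heavy_smokers)
print(f"RR = {rr:.1f}")
```

The same calculation applies to the retrospective design, since only the timing of data collection differs.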
Example: Retrospective design
╸ Clinicians want to investigate links between HTN development and heavy
smoking among males age 40-50
╸ 900 male pts w/o HTN between age 40-50 who were followed by the
clinic for the past 5 years were identified. 55.56% of these pts are heavy
smokers (>1 ppd) for the past 5 years.
╸ Pts’ charts were reviewed and new onset HTN w/i past 5 years is noted.
╸ 400 patients developed HTN among heavy smokers and 50 patients
developed HTN among non heavy smokers. RR = 6.4
╸ Conclusion: Risk of new onset HTN was 6.4 times greater among heavy
smokers for 5 years compared to non-heavy smokers
Case-Control Study
╸ Assess if an exposure/risk factor is associated with disease
╸ Method:
╶ Identifying subjects with disease (cases) and subjects without
disease (controls)
╶ Matching Controls: patients w/o disease with similar features
(age, sex) from the same source population
╶ Goal is to limit confounding
╶ Look retrospectively and determine exposure/risk factor status
of cases and controls
╸ Main outcome measure: Odds ratio (OR).
╶ Compare odds of exposure among cases with odds of exposure
among controls
╸ Patients are not followed over time
╶ (+) Can be used in rare diseases (e.g. genetic diseases)
╶ (+) diseases with long latency period (e.g. cancer)
╶ (+) Cost efficient
╶ (-) Recall bias (especially in cases)
╶ (-) cannot calculate prevalence or incidence
Example of Case Control Study
╸ Researchers are interested to investigate if hypertension (HTN) is associated
with heavy smoking (>1 pack/day) for 5 years in NYC.
╸ They identified 500 cases of HTN (disease) from 10 hospitals in NYC and
400 matched controls without HTN admitted in the same hospitals.
╸ Among pt with HTN, 400 were heavy smokers w/i past 5 years. Among pt
w/o HTN, 50 were heavy smokers w/i past 5 years
╸ Researchers then measured OR (28), which was statistically significant
(p<0.05). Odds of HTN were 28× greater among heavy smokers w/i past 5
years compared to non-heavy smokers.
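The odds ratio of 28 follows from the 2 × 2 counts implied by the example: 400 exposed and 100 unexposed cases, 50 exposed and 350 unexposed controls. A minimal sketch:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

cases, controls = 500, 400
exposed_cases, exposed_controls = 400, 50

or_htn = odds_ratio(exposed_cases, cases - exposed_cases,
                    exposed_controls, controls - exposed_controls)
print(f"OR = {or_htn:.0f}")
```

The p-value quoted on the slide requires a separate significance test and is not reproduced here.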
Nested Case-Control Study
╸ Cases and controls are drawn from participants of a prior, defined
cohort (i.e. cohort study)
╸ Cases have outcome of interest; controls do not
╸ Risk factors associated with outcome of interest can be investigated
╸ E.g. Examining effect of OCPs on VTE by drawing cases and controls
from the Nurses' Health Study
Analytical Studies:
Interventional
Clinical Trial
╸ Prospective studies that assess whether new treatments are safe and
effective in patients with the disease of interest
╸ Exposures or treatments (drug, surgery, intervention) are assigned to
patients unlike observational studies
╸ End-points are defined prior to study initiation
╶ Primary: Outcome(s) the study is designed to evaluate. Number
of participants needed to detect meaningful differences of
intervention (i.e. power calculation/sample size determination) is
based on this. Eg. All-cause mortality, hospitalization
╶ Secondary: Outcomes of interest (e.g. effect of intervention on
co-morbidities, development of drug side effects )
╶ Combined: Combining multiple (categorical) endpoints (such
as all-cause mortality, MI, stroke, limb amputation). Can be
primary or secondary.
╶ Surrogate: Outcomes (lab test, imaging or physical finding)
that are predictive of future severe outcomes. E.g. rescue
inhaler use → asthma control
╸ Outcomes reported: incidence, risk, survival analysis
Features of clinical trials
╸ Randomization: pts are randomized into treatment and control arm.
╶ Minimizes both obvious and hidden bias and confounding
╶ “Table 1” - baseline characteristics in treatment and control groups
╸ Blind: Not informing which pts are assigned to treatment or control
╶ Single blind: Only patients are unaware of their assignment
╶ Double blind: Both patients and researchers are unaware of
patient assignment. This is considered gold standard
╶ Triple blind: Patients, researchers and data analysts are unaware
of patient assignment.
╸ Placebo controlled: Control subjects receive placebo
╸ Non-inferiority/ equivalence: Control subjects receive standard of
care or other treatment
╶ Goal: new drug is not worse than control drug by an acceptable
margin (the non-inferiority margin)
Non-inferiority study (figure labels):
A. Not non-inferior and inferior
B. Non-inferior and inferior
C. Non-inferior and not superior
D. Non-inferior and superior
Types of clinical trials
╸ Phase 1:
╶ Evaluate drug safety: toxicity, maximum tolerated dose,
pharmacokinetics, pharmacodynamics
╶ Small number (<50) of healthy subjects
╸ Phase 2:
╶ Further evaluate drug safety: optimal dosing, efficacy and adverse
effects
╶ Small number (<100) of patients with disease of interest
╸ Phase 3:
╶ Evaluate drug safety and efficacy for marketing: Compare drug to
current standard of care (if available) or placebo
╶ Randomized, blinded and controlled trial with large number (>100)
of patients with disease of interest
╶ Generally required for FDA approval
╸ Phase 4:
╶ Post-marketing surveillance: safety studies (rare and long-term
side effects)
Cluster randomized controlled trials
╸ Participants are grouped into clusters, which are then randomized into
control or intervention.
╸ Individuals with similar background characteristics are usually
assigned to the same cluster
Factorial study (fully crossed design)
╸ (Randomized) study of ≥ 2 interventions and all possible combinations.
╸ Eg. Exercise vs dietary modification for lowering BP. 4 possible
combinations: Exercise alone, diet alone, both, neither (control).
Source: Abaluck et al. 2021. https://www.nber.org/system/files/working_papers/w28734/w28734.pdf
Community Intervention/ Community trials
╸ Trials involving entire communities instead of individuals
Example:
╸ Design: Cluster-randomized controlled trial with 600 communities in
Bangladesh with control arm (no special instructions), surgical masking
and cloth masking (both with free mask distribution and instruction on how and
why to mask).
╸ Monitor SARS-CoV-2 spread through symptoms + serological confirmation.
╸ Results: 9% reduction in COVID transmission in surgical mask arm; cloth
mask arm transmission comparable to control arm.
Crossover study
╸ Participants serve as their own control after a brief washout period
╸ Design: Pts randomized to tx vs control → washout → switch
╸ (+) Controls for confounding, esp with fewer pts
╸ (-) Ineffective washout period and lingering effects of intervention
Systematic reviews
& meta-analysis
Meta-analysis
╸ Pooling data from several studies to increase statistical power
╶ detecting a difference in outcome of interest between groups
when one exists
╸ (+) More precise results
╶ good for rare diseases/outcomes
╶ Difference in outcomes between groups is small
╸ (-) Publication bias:
╶ Results are only as good as studies used
╶ pooling of biases and limitations of individual studies
╶ E.g. Using only studies with statistically significant results
Systematic reviews
╸ Compiling of primary studies and summarizing of evidence to
answer a defined question
╸ Does not involve statistical analysis
╸ (-) Publication bias
Funnel Plot:
╸ Assessment of publication bias
╸ Treatment effect (x-axis)
plotted against standard error
of treatment effect (y-axis)
╸ Look for symmetric distribution
of studies along the vertical line
╸ Asymmetric distribution
suggests publication bias
Source: Zhang et al. 2016
Study Heterogeneity:
╸ In an ideal world, studies pooled in meta-analysis would be undertaken
with the same experimental protocols → homogeneity
╶ Differences b/w outcomes only due to measurement error
╸ In real world, variability goes beyond what is expected from
measurement error
╶ Differences in investigated populations, treatment schedules,
endpoint definitions, etc
╸ Presence of some heterogeneity can be expected.
╶ Accounted for by techniques such as random effects modeling,
stratified analysis
Risk of bias:
╸ Assess each study in a systematic review/ meta-analysis for bias in
domains of selection, performance, detection, attrition and reporting
╸ Classify as low risk, high risk or unclear
Source: Cochrane Handbook
Effect size:
╸ Strength of relationship between 2 variables (correlation, mean
difference, risk ratio, odds ratio, regression coefficient etc)
╸ Purpose is to combine multiple effect sizes in meta-analysis
╸ Uncertainty in effect sizes used to calculate weights or importance
╶ Larger studies with smaller uncertainty → larger weights
╶ Smaller studies with larger uncertainty → smaller weights
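One common way to turn uncertainty into weights is inverse-variance weighting under a fixed-effect model, sketched below. The choice of this scheme is an assumption (the slides do not name a method), and the three effect sizes and standard errors are hypothetical.

```python
def pooled_effect(effects, standard_errors):
    """Fixed-effect, inverse-variance pooled estimate.

    Returns (pooled effect, pooled standard error, per-study weights).
    """
    weights = [1 / se ** 2 for se in standard_errors]   # smaller SE -> larger weight
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = (1 / total_weight) ** 0.5
    return pooled, pooled_se, weights

# Hypothetical studies: effect size (e.g. mean difference) and its standard error
effects = [0.10, 0.30, 0.20]
standard_errors = [0.05, 0.20, 0.10]

pooled, pooled_se, weights = pooled_effect(effects, standard_errors)
for e, w in zip(effects, weights):
    print(f"effect = {e:.2f}  weight = {w / sum(weights):.1%}")
print(f"Pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```

The most precise study dominates the pooled estimate, and the pooled standard error is smaller than that of any single study, which is the gain in precision a meta-analysis aims for. A random-effects model would add a between-study variance term to each weight.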
Forest plot:
╸ Graphical display of results
╸ Individual studies (dark squares)
╸ Effect size (horizontal position of dark squares); study weight (size of dark squares)
╸ Calculated combined effect (white diamond, dashed vertical line)
╸ Null value (solid line)
Source: Wikipedia
Obtaining &
Describing samples
Inclusion criteria:
Characteristics that patients
have to meet to participate
in the study
Exclusion criteria:
Characteristics that
disqualify patients from
participating in the study
From Pennell et al. NEJM 2021
Lack of controls
╸ Cannot determine if patient outcomes/improvements are due to
intervention or inherent sample characteristics or bias
Selecting appropriate controls for studies
╸ Ideally control group will be similar to treatment group with the
exception of receiving the intervention
╸ Limits confounding (external variable affecting both
exposure/intervention and outcome)
Matching
╸ Utilized in case-control studies
╸ Selecting controls from the same source population with matching
baseline characteristics (age, sex, comorbidities) who do not have
the disease of interest.
Randomization
╸ Subjects are randomized to treatment and control groups
╸ Controls for both known and unknown confounders
╶ Effect of confounders averaged between groups
╶ Baseline characteristics approximately equal between groups
(table 1 in most trial studies)
Concealed allocation
╸ Used in randomized trial to prevent selection bias
╸ Preventing investigators from gaining knowledge of research
participant assignment through techniques such as pharmacy
controlled randomization.
Stratification
╸ Partitioning participants by confounder
╶ (-) requires prior knowledge of
confounding factors
╶ randomization is superior
╸ Stratified random sampling: Partitioning
possible participants into subgroups (such
as by age group, sex) and randomly
sampling from each group
╸ Stratified analysis: partitioning of results by
a possible confounder
Source: Wikipedia
Methods to handle
noncompliance
Loss to follow-up
╸ Participants drop out of study prior to completion
╸ Can lead to attrition bias if one group selectively loses more subjects
╶ Remaining participants differ significantly
╶ E.g. in a placebo controlled trial, sicker patients from the
treatment arm drop out selectively, leading to overestimation
of the study drug’s beneficial effects
╶ New bias => loss of randomization advantage
Per protocol analysis
╸ Assumes ideal scenario of no loss to follow-up
╶ Results calculated based on currently retained participants
╸ Overestimates effects of the study drug/intervention
Intention to treat (ITT) analysis
╸ Participants analyzed according to original assignment during
randomization regardless of study completion
╸ Preserves the benefit of randomization despite dropouts and crossover between arms
╸ More conservative estimate of the effect of intervention, which better
mirrors the expected effect in a practical clinical setting.
Example
╸ New hypoglycemic diabetes drug X.
╸ New randomized placebo controlled trial for 1 year planned with 40
patients with T2DM, with goal of A1c control between 6.5 – 7
at the end of study.
╸ 20 randomized in tx arm and 20 randomized in control
╸ At the end of trial,
╶ Tx arm: 4 pts had uncontrolled DM. 10 pt dropped out
╶ Control arm: 12 pts had uncontrolled DM. 7 pt dropped out
╸ Find ARR of uncontrolled DM
Azfar Basunia, MD 84
Methods to handle
noncompliance
Per protocol analysis
╸ ARR = (12/13) – (4/10) = 0.52 = 52%
╸ 52% absolute reduction in risk of uncontrolled DM due to treatment with
drug X compared to placebo
Intention to treat analysis
╸ ARR = (12/20) – (4/20) = 0.40 = 40%
╸ 40% absolute reduction in risk of uncontrolled DM due to treatment with
drug X compared to placebo
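The two ARR calculations for drug X can be checked with a short Python sketch. The function and variable names are illustrative and not from the slides. As in the worked example, the intention-to-treat denominator is everyone randomized, and dropouts are assumed not to count as uncontrolled DM.

```python
def arr(control_events, control_n, treated_events, treated_n):
    """Absolute risk reduction = risk in control arm - risk in treatment arm."""
    return control_events / control_n - treated_events / treated_n

randomized_per_arm = 20
tx_uncontrolled, tx_dropouts = 4, 10
ctl_uncontrolled, ctl_dropouts = 12, 7

# Per protocol: denominators are the participants who completed the study
arr_pp = arr(ctl_uncontrolled, randomized_per_arm - ctl_dropouts,
             tx_uncontrolled, randomized_per_arm - tx_dropouts)

# Intention to treat: denominators are everyone originally randomized
arr_itt = arr(ctl_uncontrolled, randomized_per_arm,
              tx_uncontrolled, randomized_per_arm)

print(f"Per protocol ARR:       {arr_pp:.2f}")   # 0.52
print(f"Intention to treat ARR: {arr_itt:.2f}")  # 0.40
```

The per protocol estimate is larger because dropouts shrink the denominators, which is why ITT is described as the more conservative estimate.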
Methods to handle
noncompliance
Sensitivity analysis
╸ Determine robustness of the results or conclusions
╸ Repeat primary analysis by modifying methods, models, criteria or
variable ranges to see if such changes drastically affect the outcomes
or the results
╸ Some common scenarios for sensitivity analysis in clinical trials:
╶ Outliers or missing data (exclude or impute)
╶ Definition of outcomes: modify cutoffs
╶ Non-compliance: ITT or per protocol analysis
╶ Distribution of data: Eg. Normal vs. binomial vs. Poisson
Qualitative
Analysis
Qualitative Research
╸ Goal is to understand human behavior, the “why” and “how” of
decision making
╸ Research questions are discovery oriented, descriptive and
exploratory in nature
╸ Methods of gathering data: (1) participation in the setting, (2) direct
observation, (3) interviews, (4) focus groups, (5) analysis of documents
╸ Categorize and report data for patterns that may arise
Qualitative
Analysis
Examples
╸ In a clinic, about half of all patients with HIV fail to adhere to HAART
╸ The researchers decide to interview patients, care givers, and
healthcare providers to gain insights into reasons for non-adherence
3 Measures of Association
Measures of
Association
2 ✕ 2 table: Organizing and representing data for calculations
╸ Disease as columns,
╸ Exposure or Test as rows
                     Disease +   Disease –
Exposure/Test +          A           B
Exposure/Test –          C           D
Measures of
Association
Risk of outcome or disease
╸ Probability of the outcome/disease occurring over a certain period of time
╸ Calculated from cohort studies (subjects are followed over time)
╸ Cannot be calculated from case control studies since patients are not
tracked over time
Measures of
Association
Relative risk for stroke (2 ✕ 2 table with Stroke + and Stroke – as columns)
$\mathrm{RR} = \frac{A/(A+B)}{C/(C+D)} = 3$
Measures of
Association
Odds
╸ Mathematically, probability of event occurring/probability of event not
occurring
╸ Preferred measure of association for case control studies
╶ Cannot calculate risk from case control study
╶ Odds of exposure for both pt w/ disease and pt w/o disease
Odds ratio (OR)
╸ The odds of exposure to risk factor among patients with disease or
outcome compared to the odds of exposure to risk factor among
patients without disease or outcome
Measures of
Association
Odds ratio calculation
100 pts who were admitted for stroke (cases) from a hospital are evaluated for
hx of uncontrolled HTN. 50 pts with similar baseline characteristics but no hx
of stroke (controls) were also evaluated for uncontrolled HTN. 60 cases and 10
controls had uncontrolled HTN. Calculate OR
                       Stroke +   Stroke –
Uncontrolled HTN +        60         10
Uncontrolled HTN –        40         40
Total                    100         50
$\mathrm{OR} = \frac{A \times D}{B \times C} = \frac{60 \times 40}{10 \times 40} = 6$
Conclusion: The odds of having uncontrolled HTN among patients with
stroke are 6x higher than the odds of having uncontrolled HTN among
patients without stroke
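A minimal Python sketch of the same calculation, using the counts from the stroke example. The helper name `odds_ratio` and the cell labels (A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls) are illustrative assumptions that follow the 2 ✕ 2 convention used in this section.

```python
def odds_ratio(a, b, c, d):
    """OR = (A x D) / (B x C) for a 2x2 table with disease as columns."""
    return (a * d) / (b * c)

cases, controls = 100, 50              # stroke +, stroke -
exposed_cases, exposed_controls = 60, 10   # uncontrolled HTN +

a, b = exposed_cases, exposed_controls
c, d = cases - exposed_cases, controls - exposed_controls  # uncontrolled HTN -

or_stroke = odds_ratio(a, b, c, d)
print(or_stroke)  # 6.0
```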
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Odds ratio of disease
╸ The odds of disease among patients w/ exposure or risk factor compared
to the odds of disease among patients w/o exposure or risk factor
$$\mathrm{OR} = \frac{\text{odds of disease among exposed}}{\text{odds of disease among unexposed}} = \frac{A/B}{C/D} = \frac{A \times D}{B \times C} = \frac{\text{odds of exposure among disease}+}{\text{odds of exposure among disease}-}$$
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Rare disease assumption
╸ Mathematically, OR approximates RR if disease prevalence is low (i.e.
disease is rare) => B >> A, D >> C
╸ Cut off for rare disease is arbitrary
$$\mathrm{RR} = \frac{\text{risk of disease among exposed}}{\text{risk of disease among unexposed}} = \frac{A/(A+B)}{C/(C+D)} \approx \frac{A/B}{C/D} = \frac{A \times D}{B \times C} = \mathrm{OR}$$
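The rare disease assumption can be illustrated numerically. The sketch below uses two hypothetical 2 ✕ 2 tables (the counts are made up for illustration and are not from the slides), both with a true RR of 2: one where the disease is rare and one where it is common.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)] (rows: exposure, columns: disease)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (A x D) / (B x C)."""
    return (a * d) / (b * c)

# Hypothetical counts (A, B, C, D); not from the slides
rare = (10, 990, 5, 995)       # disease in about 1% of exposed
common = (400, 600, 200, 800)  # disease in 40% of exposed

rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)
rr_common, or_common = relative_risk(*common), odds_ratio(*common)

print(f"Rare disease:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")     # 2.00, 2.01
print(f"Common disease: RR = {rr_common:.2f}, OR = {or_common:.2f}") # 2.00, 2.67
```

When B >> A and D >> C, the OR stays close to the RR. When the disease is common, the OR overstates the RR.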
Measures of
Association
Hazard Ratio
╸ Measure of effect in survival or time-to-event studies
╸ Interpretation similar to RR
╶ RR only takes into account the occurrence of an event and can only
be calculated at the end of the study
╶ HR takes into account the timing & can be calculated at any time
╶ HR < 1: Outcome more likely among unexposed (protective effect);
╶ HR = 1: Outcome equally likely among exposed + unexposed (no
benefit or harm);
╶ HR > 1: Outcome more likely among exposed (detrimental effect).
╸ Calculated using Cox Proportional Hazards model
Measures of
Association
Example: Phase 3 randomized clinical trial comparing pembrolizumab vs
conventional ctx for MSI-H/dMMR metastatic colorectal cancer
Source: Andre Thierry, et al. NEJM 2020
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Absolute Risk
╸ Probability an individual develops an outcome or disease in the study
period => approximately equal to incidence rate
╸ Risk => calculated from cohort studies
╸ $\text{Absolute risk} = \frac{\text{total disease}+}{\text{total patients}} = \frac{A+C}{A+B+C+D}$
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Attributable risk percent (ARP)
╸ % of disease incidence among exposed patients that can be attributed to
the risk factor, which can be eliminated if exposure is avoided
╸ $\text{ARP} = \frac{RR - 1}{RR} \times 100\%$
╸ Also, $\text{ARP} = \frac{AR}{\text{risk in exposed}} \times 100\%$, where risk in exposed = $\frac{A}{A+B}$
Measures of
Association
Treatment/exposure +       A           B
Treatment/exposure –       C           D
Relative risk reduction (RRR)
╸ $\text{RRR} = \frac{ARR}{\text{risk in control group}} \times 100\%$, where risk in control = $\frac{C}{C+D}$
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Population attributable risk (PAR)
╸ The proportion of all cases of an outcome/disease in the total population
that could be attributed to the risk factor
╸ $\text{PAR} = \text{absolute risk} - \text{risk in unexposed} = \frac{A+C}{A+B+C+D} - \frac{C}{C+D}$
╸ Also, $\text{PAR} = AR \times \frac{\text{total exposure}+}{\text{total patients}} = \left(\frac{A}{A+B} - \frac{C}{C+D}\right) \times \frac{A+B}{A+B+C+D}$
Measures of
Association
               Disease +   Disease –
Exposure +         A           B
Exposure –         C           D
Population attributable risk percent (PAR%)
╸ The percentage of disease in the observed population that is attributable
to the risk factor
╸ What % of an outcome could possibly be prevented if a risk factor were to
be removed from the population
╸ $\text{PAR\%} = \frac{\text{PAR}}{\text{absolute risk}} \times 100\% = \frac{\text{absolute risk} - \text{risk in unexposed}}{\text{absolute risk}} \times 100\%$
╸ Also, $\text{PAR\%} = \frac{\text{prevalence of exposure} \times (RR - 1)}{1 + \text{prevalence of exposure} \times (RR - 1)} \times 100\%$
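The attributable-risk formulas in this section can be cross-checked against each other with a short Python sketch. The 2 ✕ 2 counts are hypothetical (not from the slides), the variable names are illustrative, and AR is taken here as risk in exposed minus risk in unexposed.

```python
import math

# Hypothetical 2x2 counts (rows: exposure +/-, columns: disease +/-)
A, B, C, D = 30, 70, 10, 90
total = A + B + C + D

risk_exposed = A / (A + B)        # 0.30
risk_unexposed = C / (C + D)      # 0.10
absolute_risk = (A + C) / total   # 0.20
rr = risk_exposed / risk_unexposed
ar = risk_exposed - risk_unexposed  # attributable risk (assumed definition)

# ARP, two equivalent forms
arp_from_rr = (rr - 1) / rr * 100
arp_from_ar = ar / risk_exposed * 100

# PAR, two equivalent forms
par_direct = absolute_risk - risk_unexposed
par_from_ar = ar * (A + B) / total

# PAR%, two equivalent forms
prevalence_of_exposure = (A + B) / total
par_pct_direct = par_direct / absolute_risk * 100
par_pct_from_rr = (prevalence_of_exposure * (rr - 1)
                   / (1 + prevalence_of_exposure * (rr - 1)) * 100)

print(f"ARP  = {arp_from_rr:.1f}%")     # 66.7%
print(f"PAR  = {par_direct:.2f}")       # 0.10
print(f"PAR% = {par_pct_direct:.1f}%")  # 50.0%
```

Each pair of formulas gives the same value for these counts, which is a quick way to catch a mislabeled cell in the table.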
Association
> 1 ppd A = 300 B = 660
Example 2
A large prospective cohort study to evaluate the effect of daily exercise on
the incidence of T2DM in men with BMI > 30. 10,000 obese men with no hx of DM
by A1c measurements were enrolled in the study and followed up for 10 years.
The following data are reported. Calculate ARR, RRR
> 30 min daily exercise    300 cases/21,000 person-years     5000 cases/22,000 person-years
< 30 min daily exercise    3000 cases/21,700 person-years    1700 cases/20,000 person-years
(2) In a prospective cohort study, RR = 0.65 for subsequent MI for pts with
prior MI on statins compared to those with prior MI and not on statins.
→ The risk of subsequent MI is 35% lower in pts with prior MI on statins
compared to those with prior MI and not on statins
Distribution
of data
Data type
╸ Categorical/nominal variable
╶ Finite number of categories w/o any intrinsic order.
╶ Eg. Sex (male, female, other), States in USA
╸ Quantitative variable
╶ Discrete: only whole numbers/values; may have a logical order.
Eg. Diabetic category (no DM, pre-DM, DM) based on A1c
╶ Continuous: any real number values. Eg. SBP, temperature
Mean = (89+93+93+98+100+121)/6 = 99
Median = (93+98)/2 = 95.5
Mode = 93
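The three values above can be verified with Python's standard `statistics` module. This is only a check of the arithmetic on the six data points shown; the variable names are illustrative.

```python
import statistics

values = [89, 93, 93, 98, 100, 121]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # average of the two middle values (n is even)
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 99 95.5 93
```

The single high value (121) pulls the mean above the median, which is the pattern expected in a right-skewed distribution.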
Correlation &
regression
Correlation coefficients, r
╸ Linear association between 2 variables
╶ Strength of association
╸ DOES NOT IMPLY CAUSATION
╸ Range [-1, 1]
╶ Closer to |1| => stronger correlation
╶ (+) : direction, both ↑ or ↓ together
╶ (-) : direction, one ↑ while the other ↓
╸ Dependent: y
╶ Effect or outcome variable
╸ Regression
╶ Mathematical relationship between dependent and independent
variables
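As a rough illustration of both ideas, the sketch below computes Pearson's r and a least-squares regression line by hand. The paired data are hypothetical (not from the slides), with x as the independent variable and y as the dependent variable.

```python
import math

# Hypothetical paired observations; x = independent, y = dependent
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of cross-products and squared deviations from the means
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation coefficient: strength and direction of the linear association
r = sxy / math.sqrt(sxx * syy)

# Simple linear regression: y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"r = {r:.2f}")                          # r = 0.77
print(f"y = {intercept:.1f} + {slope:.1f}x")   # y = 2.2 + 0.6x
```

Here r is positive and fairly close to 1, so the two variables rise together. The regression line adds what r does not give: the expected change in y per unit change in x.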
Sensitivity, specificity,
PPV, NPV
            Disease +              Disease –
Test +      True Positive (TP)     False Positive (FP)
Test –      False Negative (FN)    True Negative (TN)
Total Lung CA+ = 100
Probability
Pre-test probability
╸ Probability a patient has a specific disease
╸ Usually equal to prevalence of disease
╸ Affects PPV, NPV; does not affect sensitivity, specificity, likelihood ratio
╶ ↑ pre-test probability => ↑ PPV, ↓ NPV
╶ ↓ pre-test probability => ↓ PPV, ↑ NPV
Post-test probability
╸ Probability a pt has a disease after diagnostic testing
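╸ Post-test odds = pre-test odds ✕ likelihood ratio, where odds =
probability / (1 – probability); post-test probability = post-test odds /
(1 + post-test odds)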
Biomarker B: sensitivity 70%, specificity 95%, LR+ 14, LR- 0.32
╸ Assuming a pt has GBM, how likely are they to test positive for biomarker
C compared to a patient without GBM testing positive for biomarker C?
╶ LR+ for biomarker C = 2.5
╸ Assuming a pt has GBM, how likely are they to test negative for
biomarker B compared to a patient without GBM testing negative for
biomarker B?
╶ LR- for biomarker B = 0.32
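The likelihood ratios for biomarker B can be reproduced from its sensitivity (70%) and specificity (95%) using the standard definitions LR+ = sensitivity / (1 – specificity) and LR- = (1 – sensitivity) / specificity. The sketch below also converts a pre-test probability to a post-test probability by going through odds. The 10% pre-test probability is a hypothetical value chosen for illustration, and the function names are not from the slides.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, apply the LR, convert back to probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.70, 0.95)  # biomarker B
pre_test = 0.10  # hypothetical pre-test probability, not from the slides

post_pos = post_test_probability(pre_test, lr_pos)
post_neg = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.0f}, LR- = {lr_neg:.2f}")            # LR+ = 14, LR- = 0.32
print(f"Post-test probability, positive test: {post_pos:.2f}")  # 0.61
print(f"Post-test probability, negative test: {post_neg:.3f}")  # 0.034
```

Likelihood ratios multiply odds rather than probabilities, so multiplying the 10% pre-test probability directly by 14 would give an impossible value above 100%.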
7 Study interpretations &
drawing conclusions from data
Causation
Hypothesis-generating testing
╸ Investigating patterns in data to generate testable hypotheses using
qualitative research & descriptive statistics (mean, median, IQR, correlation)
╸ Eg. Investigating rising teenage obesity in a small town by analyzing
demographics, eating habits, and activity level of students
Hypothesis-driven testing
╸ Addressing a specific question using analytical studies and statistical
methods
╸ Eg. Is increased carbohydrate content in school lunches linked to rising
obesity among school children in a small town?
Causation
Causal criteria (Bradford-Hill Criteria): Causality in epidemiological studies
╸ Temporality:
╶ outcome occurs after exposure within an expected amount of time.
╶ Eg. Poison ivy causing type IV hypersensitivity reaction
╸ Temporal Sequence:
╶ The cause/exposure must happen before effect/outcome.
╶ There can be strong correlation between the cause and effect, but it
may be difficult to distinguish which came first.
╶ Eg. Connection between poor performance and marijuana smoking.
Does poor performance lead to more marijuana smoking, or does
marijuana smoking lead to poor performance?
Causation
Causal criteria (Bradford-Hill Criteria)
╸ Dose response relationship:
╶ Exposure to higher doses causes higher incidence of effect
╶ Eg. Patients with higher pack years of smoking have higher incidence
of lung and bladder cancer
╸ Reproducibility:
╶ Findings have been observed in studies with different persons,
different places, different sample sizes etc
Reverse causality
╸ The risk factor/exposure and the outcome are strongly associated but
the relationship between exposure and disease process is reversed.
╸ Eg. Schizophrenia leads to low SES, not the other way around (Source:
Gerstman 2003)
Chance
Null Hypothesis (H0)
╸ Assumption that there is no relationship between the exposure and
outcome or no difference between the two variables
Post-hoc analysis
╸ Comparisons are made after data is collected and evaluated
╸ Must account for multiple comparisons
Bias
╸ Errors in study design and execution that cause deviation of findings
from their true value
Clinical decision making
Limitations
╸ Patient values are not commonly emphasized
╸ Large lag between study conduct, publication of results, and
application of recommendations
╸ Cognitive biases, anecdotes and/or experience may cause clinicians to
reject EBM
╶ Defensive medicine: practice of over-prescribing tests, Rx,
procedures to reduce threat of lawsuits
╶ Overtreat b/c of experience with rare, shocking outcome or pt’s
emotional needs
Informed consent
for research
Principles of Informed consent
╸ Patient was provided sufficient information regarding diagnosis, options,
benefits and risks of treatment options.
╸ Patient must have decision making capacity (no altered mental status,
acute psychiatric conditions like acute mania, psychosis)
╸ Patient is making the decision voluntarily (no coercion)