Clinical Epidemiology The Essentials Robert H. Fletcher, Suzanne, 2014 PDF
Clinical Epidemiology: The Essentials
Fifth Edition
Copyright © 2014, 2005, 1996, 1988, 1982 Lippincott Williams & Wilkins, a Wolters Kluwer business.
Printed in China
All rights reserved. This book is protected by copyright. No part of this book may be reproduced or transmitted
in any form or by any means, including as photocopies or scanned-in or other electronic copies, or utilized
by any information storage and retrieval system without written permission from the copyright owner, except
for brief quotations embodied in critical articles and reviews. Materials appearing in this book prepared by
individuals as part of their official duties as U.S. government employees are not covered by the above-mentioned
copyright. To request permission, please contact Lippincott Williams & Wilkins at 2001 Market Street,
Philadelphia, PA 19103, via email at permissions@lww.com, or via website at lww.com (products and services).
9 8 7 6 5 4 3 2 1
Fletcher, Robert H.
Clinical epidemiology : the essentials / Robert H. Fletcher, Suzanne
W. Fletcher, Grant S. Fletcher. – 5th ed.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-4511-4447-5 (alk. paper)
I. Fletcher, Suzanne W. II. Fletcher, Grant S. III. Title.
[DNLM: 1. Epidemiologic Methods. WA 950]
614.4–dc23
2012022346
DISCLAIMER
Care has been taken to confirm the accuracy of the information present and to describe generally accepted
practices. However, the authors, editors, and publisher are not responsible for errors or omissions or for any
consequences from application of the information in this book and make no warranty, expressed or implied,
with respect to the currency, completeness, or accuracy of the contents of the publication. Application of this
information in a particular situation remains the professional responsibility of the practitioner; the clinical
treatments described and recommended may not be considered absolute and universal recommendations.
The authors, editors, and publisher have exerted every effort to ensure that drug selection and dosage set
forth in this text are in accordance with the current recommendations and practice at the time of publication.
However, in view of ongoing research, changes in government regulations, and the constant flow of information
relating to drug therapy and drug reactions, the reader is urged to check the package insert for each drug for any
change in indications and dosage and for added warnings and precautions. This is particularly important when
the recommended agent is a new or infrequently employed drug.
Some drugs and medical devices presented in this publication have Food and Drug Administration (FDA)
clearance for limited use in restricted research settings. It is the responsibility of the health care provider to
ascertain the FDA status of each drug or device planned for use in their clinical practice.
To purchase additional copies of this book, call our customer service department at (800) 638-3030 or fax orders
to (301) 223-2320. International customers should call (301) 223-2300.
Visit Lippincott Williams & Wilkins on the Internet: http://www.lww.com. Lippincott Williams & Wilkins
customer service representatives are available from 8:30 am to 6:00 pm, EST.
Preface
This book is for clinicians—physicians, nurses, physicians’ assistants, psychologists, veterinarians, and others who care for patients—who want to understand for themselves the strength of the information base for their clinical decisions. Students of epidemiology and public health may also find this book useful as a complement to the many excellent textbooks about epidemiology itself.
To reach full potential, modern clinicians should have a basic understanding of clinical epidemiology for many reasons.
First, clinicians make countless patient care decisions every day, some with very high stakes. It is their responsibility to base those decisions on the best available evidence, a difficult task because the evidence base is vast and continually changing. At the simplest level, responsible care can be accomplished by following carefully prepared recommendations found in evidence-based guidelines, review articles, or textbooks, but patient care at its best is far more than that. Otherwise, it would be sufficient to have it done by technicians following protocols. Sometimes the evidence is contradictory, resulting in “toss-up” decisions. The evidence may be weak, but a decision still needs to be made. Recommendations that are right for patients on average need to be tailored to the specific illnesses, behaviors, and preferences of individual patients. Expert consultants may disagree, leaving the clinicians with primary responsibility for the patient in the middle. Special interests—by for-profit companies or clinicians whose income or prestige is related to their advice—may affect how evidence is summarized. For these reasons, clinicians need to be able to weigh evidence themselves in order to meet their responsibilities as health care professionals.
Second, clinical epidemiology is now a central part of efforts to improve the effectiveness of patient care worldwide. Clinicians committed to research careers are pursuing formal postgraduate training in research methods, often in departments of epidemiology. Grants for clinical research are judged largely by the principles described in this book. Clinical epidemiology is the language of journal peer review and the “hanging committees” that decide whether research reports should be published and the revisions necessary to make them suitable. Local “journal clubs” now carefully evaluate pivotal articles rather than survey the contents of several journals. The National Library of Medicine now includes search terms in MEDLINE for research methods, such as randomized controlled trial and meta-analysis. In short, clinical medicine and epidemiology are making common cause. “Healing the schism” is what Kerr White called it.
Third, the care of patients should be fun. It is not fun to simply follow everyone else’s advice without really knowing what stands behind it. It is exhausting to work through a vast medical literature, or even a few weekly journals, without a way of quickly deciding which articles are scientifically strong and clinically relevant and which are not worth bothering with. It is unnerving to make high-stakes decisions without really knowing why they are the right ones. To capture all the enjoyment of their profession, clinicians need to be confident in their ability to think about evidence for themselves, even if someone else has done the heavy lifting to find and sort that evidence, by topic and quality, beforehand. It is fun to be able to confidently participate in discussions of clinical evidence regardless of whether it is within one’s specialty (as all of us are nonspecialists in everything outside our specialty).
In this book, we have illustrated concepts with examples from patient care and clinical research, rather than use hypothetical examples, because medicine is so deeply grounded in practical decisions about actual patients. The important questions and studies have evolved rapidly and we have updated examples to reflect this change, while keeping examples that represent timeless aspects in the care of patients and classic studies.
As clinical epidemiology becomes firmly established within medicine, readers expect more from an entry-level textbook. We have, therefore, added new topics to this edition. Among them are comparative effectiveness, practical clinical trials, noninferiority trials, patient-level meta-analyses, and modern concepts in grading evidence-based recommendations. We have also discussed risk, confounding, and effect modification in greater depth.
Modern research design and analyses, supported by powerful computers, make it possible to answer clinical questions with a level of validity and generalizability not dreamed of just a few years ago. However, this often comes at the cost of complexity, placing readers at a distance from the actual data and their meaning. Many of us may be confused as highly specialized research scientists debate alternative meanings of specific terms or tout new approaches to study design and statistical analyses, some of which seem uncomfortably like black boxes no matter how hard we try to get inside them. In such situations, it is especially valuable to remain grounded in the basics of clinical research. We have tried to do just that with the understanding that readers may well want to go on to learn more about this field than is possible from an introductory textbook alone.
Clinical epidemiology is now considered a central part of a broader movement, evidence-based medicine. This is in recognition of the importance, in addition to judging the validity and generalizability of clinical research results, of asking questions that can be answered by research, finding the available evidence, and using the best of that evidence in the care of patients. We have always considered these additional competencies important, and we give them even more attention in this edition of the book.
We hope that readers will experience as much enjoyment and understanding in the course of reading this book as we have in writing it.
Robert H. Fletcher
Suzanne W. Fletcher
Grant S. Fletcher
Acknowledgments
We are fortunate to have learned clinical epidemiology from its founders. Kerr White was Bob and Suzanne’s mentor during postgraduate studies at Johns Hopkins and convinced us that what matter are “the benefits of medical interventions in relation to their hazards and costs.” Alvan Feinstein taught a generation of young clinician–scholars about the “architecture of clinical research” and the dignity of clinical scholarship. Archie Cochrane spent a night at our home in Montreal when Grant was a boy and opened our eyes to “effectiveness and efficiency.” David Sackett asserted that clinical epidemiology is a “basic science for clinical medicine” and helped the world to understand. Many others have followed. We are especially grateful for our work in common with Brian Haynes, founding editor of ACP Journal Club; Ian Chalmers, who made the Cochrane Collaboration happen; Andy Oxman, leader of the Rocky Mountain Evidence-Based Healthcare Workshop; Peter Tugwell, a founding leader of the International Clinical Epidemiology Network (INCLEN); and Russ Harris, our long-time colleague at the interface between clinical medicine and public health at the University of North Carolina. These extraordinary people and their colleagues have created an exciting intellectual environment that led to a revolution in clinical scholarship, bringing the evidence base for clinical medicine to a new level.
Like all teachers, we have also learned from our students, clinicians of all ages and all specialties who wanted to learn how to judge the validity of clinical observations and research for themselves. Bob and Suzanne are grateful to medical students at McGill University (who first suggested the need for this book), the University of North Carolina and Harvard Medical School; fellows in the Robert Wood Johnson Clinical Scholars Program, the International Clinical Epidemiology Network (INCLEN), and the Harvard General Medicine Fellowship; CRN Scholars in the Cancer Research Network, a consortium of research institutes in integrated health systems; and participants in the Rocky Mountain Evidence-Based Healthcare Workshops. They were our students and now are our colleagues; many teach and do research with us. Over the years, Grant has met many of these people and now learns from medical students, residents, and faculty colleagues as he teaches them about the care for patients at Harborview Hospital and the University of Washington.
While editors of the Journal of General Internal Medicine and Annals of Internal Medicine, Bob and Suzanne learned from fellow editors, including members of the World Association of Medical Editors (WAME), how to make reports of research more complete, clear, and balanced so that readers can understand the message with the least effort. With our colleagues at UpToDate, the electronic information source for clinicians and patients, we have been developing new ways to make the best available evidence on real-world, clinical questions readily accessible during the care of patients and to make that evidence understandable not just to academicians and investigators, but to full-time clinicians as well.
Ed Wagner was with us at the beginning of this project. With him, we developed a new course in clinical epidemiology for the University of North Carolina School of Medicine and wrote the first edition of this book for it. Later, that course was Grant’s introduction to this field, to the extent he had not already been introduced to it at home. Ed remained a coauthor through three editions and then moved on to leadership of Group Health Research Institute and other responsibilities based in Seattle. Fortunately, Grant is now on the writing team and contributed his expertise with the application of clinical epidemiology to the current practice of medicine, especially the care of very sick patients.
We are grateful to members of the team, led by Lippincott Williams & Wilkins, who translated word processed text and hand-drawn figures into an attractive, modern textbook. We got expert, personal attention from Catherine Noonan, who guided us in the preparation of this book throughout; Jonathan Dimes, who worked closely with us in preparing illustrations; and Jeri Litteral, who collaborated with us in the copy editing phase of this project.
We are especially grateful to readers all over the world for their encouraging comments and practical suggestions. They have sustained us through the rigors of preparing this, the fifth edition of a textbook first published 30 years ago.
Contents in Brief
1. Introduction
2. Frequency
3. Abnormality
7. Prognosis
8. Diagnosis
9. Treatment
Index
Introduction
We should study “the benefits of medical interventions in relation to their hazards and
costs.”
—Kerr L. White
1992
KEY WORDS
Clinical epidemiology
Clinical sciences
Population sciences
Epidemiology
Evidence-based medicine
Health services research
Quantitative decision making
Cost-effectiveness analyses
Decision analyses
Social sciences
Biologic sciences
Variables
Independent variable
Dependent variable
Extraneous variables
Covariates
Populations
Sample
Inference
Bias
Selection bias
Measurement bias
Confounding
Chance
Random variation
Internal validity
External validity
Generalizability
Shared decision making
Example
A 51-year-old man asks to see you because of chest pain that he thinks is “indigestion.” He was well until 2 weeks ago, when he noticed tightness in the center of his chest after a large meal and while walking uphill. The tightness stopped after 2 to 3 minutes of rest. A similar discomfort has occurred several times since then, sometimes during exercise and sometimes at rest. He gave up smoking one pack of cigarettes per day 3 years ago and has been told that his blood pressure is “a little high.” He is otherwise well and takes no medications, but he is worried about his health, particularly about heart disease. He lost his job 6 months ago and has no health insurance. A complete physical examination and resting electrocardiogram are normal except for a blood pressure of 150/96 mm Hg.
This patient is likely to have many questions. Am I sick? How sure are you? If I am sick, what is causing my illness? How will it affect me? What can be done about it? How much will it cost?
As the clinician caring for this patient, you have the same kinds of questions, although yours reflect greater understanding of the possibilities. Is the probability of serious, treatable disease high enough to proceed immediately beyond simple explanation and reassurance to diagnostic tests? How well do various tests distinguish among the possible causes of chest pain: angina pectoris, esophageal spasm, muscle strain, anxiety, and the like? For example, how accurate will an exercise stress test be in either confirming or ruling out coronary artery disease? If coronary artery disease is found, how long can the patient expect to have the pain? How likely is it that other complications—congestive heart failure, myocardial infarction, or atherosclerotic disease of other organs—will occur? Will the condition shorten his
life? Will reduction of his risk factors for coronary artery disease (from cigarette smoking and hypertension) reduce his risk? Should other possible risk factors be sought? If medications control the pain, would a coronary revascularization procedure add benefit—by preventing a future heart attack or cardiovascular death? Since the patient is unemployed and without health insurance, can less expensive diagnostic workups and treatments achieve the same result as more expensive ones?
Clinical Questions and Clinical Epidemiology
The questions confronting the patient and doctor in the example are the types of clinical questions at issue in most doctor–patient encounters: What is “abnormal”? How accurate are the diagnostic tests we use? How often does the condition occur? What are the risks for a given disease, and how do we determine the risks? Does the medical condition usually get worse, stay the same, or resolve (prognosis)? Does treatment really improve the patient or just the test results? Is there a way to prevent the disease? What is the underlying cause of the disease or condition? and How can we give good medical care most efficiently? These clinical questions and the epidemiologic methods to answer them are the bedrock of this book. The clinical questions are summarized in Table 1.1. Each is also the topic of specific chapters in the book.
Table 1.1
Clinical Issues and Questions(a)
Issue                 Question
Frequency (Ch. 2)     How often does a disease occur?
Abnormality (Ch. 3)   Is the patient sick or well?
Risk (Chs. 5 and 6)   What factors are associated with an increased risk of disease?
Prognosis (Ch. 7)     What are the consequences of having a disease?
Diagnosis (Ch. 8)     How accurate are tests used to diagnose disease?
Treatment (Ch. 9)     How does treatment change the course of disease?
Prevention (Ch. 10)   Does an intervention on well people keep disease from arising? Does early detection and treatment improve the course of disease?
Cause (Ch. 12)        What conditions lead to disease? What are the origins of the disease?
(a) Four chapters—Risk: Basic Principles (4), Chance (11), Systematic Reviews (13), and Knowledge Management (14)—pertain to all of these issues.
Clinicians need the best possible answers to these kinds of questions. They use various sources of information: their own experiences, the advice of their colleagues, and reasoning from their knowledge of the biology of disease. In many situations, the most credible source is clinical research, which involves the use of past observations on other similar patients to predict what will happen to the patient at hand. The manner in which such observations are made and interpreted determines whether the conclusions reached are valid, and thus how helpful the conclusions will be to patients.
Health Outcomes
The most important events in clinical medicine are the health outcomes of patients, such as symptoms (discomfort and/or dissatisfaction), disability, disease, and death. These patient-centered outcomes are sometimes referred to as “the 5 Ds” (Table 1.2). They are the health events patients care about. Doctors should try to understand, predict, interpret, and change these outcomes when caring for patients. The 5 Ds can be studied directly only in intact humans and not in parts of humans (e.g., humoral transmitters, tissue cultures, cell membranes, and genetic sequences) or in animals. Clinical epidemiology is the science used to study the 5 Ds in intact humans.
Table 1.2
Outcomes of Disease (the 5 Ds)(a)
Death             A bad outcome if untimely
Disease(b)        A set of symptoms, physical signs, and laboratory abnormalities
Discomfort        Symptoms such as pain, nausea, dyspnea, itching, and tinnitus
Disability        Impaired ability to go about usual activities at home, work, or recreation
Dissatisfaction   Emotional reaction to disease and its care, such as sadness or anger
(a) Perhaps a sixth D, destitution, belongs on this list because the financial cost of illness (for individual patients or society) is an important consequence of disease.
(b) Or illness, the patient’s experience of disease.
In modern clinical medicine, with so much ordering and treating of lab test results (for such things as plasma glucose levels, hematuria, troponins, etc.), it is difficult to remember that laboratory test results are not the important events in clinical medicine. It
becomes easy to assume that if we can change abnormal lab tests toward normal, we have helped the patient. This is true only to the extent that careful study has demonstrated a link between laboratory test results and one of the 5 Ds.
During their training, clinicians are steeped in the biology of disease, the sequence of steps that leads from subcellular events to disease and its consequences. Thus, it seemed reasonable to assume that an intervention that lowered blood sugar in diabetics would help protect against heart disease. However, although very important to clinical medicine, these biologic mechanisms cannot be substituted for patient outcomes unless there is strong evidence confirming that the two are related. (In fact, the results of studies with several different medications are raising the possibility that, in type 2 diabetes, aggressively lowering levels of blood sugar does not protect against heart disease.) Establishing improved health outcomes in patients is particularly important with new drugs because usually pharmacologic interventions have several clinical effects rather than just one.
Figure 1.1 ■ The health sciences and their complementary relationships. [Diagram labels: Individual patient questions; Clinical epidemiology; Population methods; Epidemiology; Populations; Health services; Health care systems.]
patients’ health. Such studies have shown, for example, that medical care differs substantially from one small geographic area to another (without corresponding differences in patients’ health); that surgery in hospitals that often perform a specific procedure tends to have better outcomes than hospitals in which the procedure is done infrequently; and that aspirin is underutilized in the treatment of acute myocardial infarction, even though this simple practice has been shown to reduce the number of subsequent vascular events by about 25%. These kinds of studies guide clinicians in their efforts to apply existing knowledge about the best clinical practices.
Other health services sciences also guide patient care. Quantitative decision making includes cost-effectiveness analyses, which describe the financial costs required to achieve a good outcome such as prevention of death or disease, and decision analyses, which set out the rational basis for clinical decisions and the consequences of choices. The social sciences describe how the social environment affects health-related behaviors and the use of health services.
Biologic sciences, studies of the sequence of biologic events that lead from health to disease, are a powerful way of knowing how clinical phenomena may play out at the human level. Historically, it was primarily the progress in the biologic sciences that established the scientific approach to clinical medicine, and they continue to play a pivotal role. Anatomy explains nerve entrapment syndromes and their cause, symptoms, and relief. Physiology and biochemistry guide the management of diabetic ketoacidosis. Molecular genetics predicts the occurrence of diseases ranging from common cardiovascular diseases and cancer to rare inborn errors of metabolism, such as phenylketonuria and cystic fibrosis.
However, understanding the biology of disease, by itself, is often not a sound basis for prediction in intact humans. Too many other factors contribute to health and disease. For one thing, mechanisms of disease may be incompletely understood. For example, the notion that blood sugar in diabetic patients is more affected by ingestion of simple sugars (sucrose or table sugar) than by complex sugars such as starch (as in potatoes or pasta) has been dispelled by rigorous studies comparing the effect of these foods on blood glucose. Also, it is becoming clear that the effects of genetic abnormalities may be modified by complex physical and social environments such as diet and exposure to infectious and chemical agents. For example, glucose-6-phosphate dehydrogenase (G6PD) is an enzyme that protects red blood cells against oxidant injury leading to hemolysis. G6PD deficiency is the most common enzyme deficiency in humans, occurring with certain mutations of the X-linked G6PD gene. However, males with commonly occurring genetic variants of G6PD deficiency are usually asymptomatic, developing hemolysis and jaundice only when they are exposed to environmental oxidant stresses such as certain drugs or infections. Finally, as shown in the example of rosiglitazone treatment for patients with type 2 diabetes, drugs often have multiple effects on patient health beyond the one predicted by studying disease biology. Therefore, knowledge of the biology of disease produces hypotheses, often very good ones, about what might happen in patients. But these hypotheses need to be tested by strong studies of intact human beings before they are accepted as clinical facts.
In summary, clinical epidemiology is one of many sciences basic to clinical medicine. At best, the various health-related sciences complement one another. Discoveries in one are confirmed in another; discoveries in the other lead to new hypotheses in the first.
Example
In the 1980s, clinicians in San Francisco noticed unusual infections and cancers in homosexual men, conditions previously seen only in profoundly immunocompromised patients. The new syndrome was called “acquired immune deficiency syndrome” (AIDS). Epidemiologists established that the men were suffering from a communicable disease that affected both men and women and was transmitted not only by sexual activity but also by needle sharing and blood products. Laboratory scientists identified the human immunodeficiency virus (HIV) and have developed drugs specifically targeting the structure and metabolism of this virus. Promising drugs, often developed based on understanding of biological mechanisms, have been tested for effectiveness in clinical trials. A new clinical specialty, in the care of patients with HIV infection, has arisen. Public health workers have promoted safe sex and other programs to prevent HIV infection. Thus, clinicians, epidemiologists, laboratory scientists, and public health officers have all contributed to the control of this new disease, especially in more developed countries, leading to a major increase in survival and improvement in quality of life of HIV-infected individuals.
Measurement Bias
Example
Blood pressure levels are powerful predictors of cardiovascular disease. However, multiple studies have shown that taking a blood pressure measurement is not as simple as it seems (12). Correct measurement requires using appropriate procedures, including using a larger cuff size for overweight and obese adults, positioning the patient so that the upper arm is at the level of the right atrium and so the patient does not have to hold up the arm, and taking the measurement in a quiet setting and multiple times. If any of these procedures is not done correctly, the resulting measurements are likely to be artificially and systematically elevated. Another factor leading to systematically higher blood pressure readings, sometimes called “white coat hypertension” (Fig. 1.3), occurs when blood pressure is measured by physicians, suggesting that visits to the doctor cause anxiety in patients. However, clinicians who deflate the blood pressure cuff faster than 2 to 3 mm/sec will likely underestimate systolic (but overestimate diastolic) blood pressure. Studies have also shown a tendency for clinicians to record values that are just at the normal level in patients with borderline high blood pressures. Systematic errors in blood pressure measurements can, therefore, lead to overtreatment or undertreatment of patients in clinical practice. Clinical research based on blood pressure measurements taken during routine patient care can lead to misleading results unless careful standardized procedures are used. These kinds of biases led to the development of blood pressure measurement instruments that do not involve human ears and hands.
Figure 1.3 ■ White coat hypertension. Increase in systolic pressure, determined by continuous intraarterial monitoring, as the blood pressure is taken with a sphygmomanometer by an unfamiliar doctor or nurse. [Plot with one curve each for Doctor and Nurse; horizontal axis: Duration of visit (minutes).] (Redrawn with permission from Mancia G, Parati G, Pomidossi G, et al. Alerting reaction and rise in blood pressure during measurement by physician and nurse. Hypertension 1987;9:209–215.)
Confounding
Confounding can occur when one is trying to find out whether a factor, such as a behavior or drug exposure, is a cause of disease in and of itself. If the factor of interest is associated or “travels together” with another factor, which is itself related to the outcome, the effect of the factor under study can be confused with or distorted by the effect of the other.
Example
Supplements of antioxidants, such as vitamins A, C, and E, are popular with the lay public. Laboratory experiments and studies of people who choose to take antioxidants suggested that antioxidants prevent cardiovascular disease and certain cancers. However, careful randomized studies, which are able to avoid confounding, routinely found little effect of antioxidants (13,14). In fact, when results of these studies were combined, use of antioxidants, especially at high doses, was associated with small increases, not decreases, in death rates. How could the results of early studies be reconciled with the opposite findings of later, carefully controlled trials? Confounding has been suggested, as illustrated in Figure 1.4. People who take antioxidants on their own are likely to do other things differently than those who do not take antioxidants—such as exercise more, watch their weight, eat more vegetables, and not smoke—and it may be these activities, not antioxidants, that led to lower death rates in the studies not randomizing the intervention.
Figure 1.4 ■ Confounding. The relationship between antioxidant intake and cardiovascular risk is potentially confounded by patient characteristics and behaviors related to both antioxidant use and development of cardiovascular disease. [Diagram: main question, antioxidants intake and cardiovascular disease prevention; potentially confounding factors: age, aspirin use, physical activity, body mass index, cigarette smoking, family history, diet.]
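The way confounding can produce an apparent benefit may be easier to see with numbers. The following Python sketch uses an entirely hypothetical cohort (the counts, the two lifestyle groups, and all function names are assumptions made for illustration, not data from the studies cited). Within each lifestyle group, antioxidant takers and non-takers die at the same rate, yet the crude comparison suggests protection because takers are concentrated in the healthier group.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    """Counts for one lifestyle group (all numbers are hypothetical)."""
    name: str
    takers: int
    taker_deaths: int
    nontakers: int
    nontaker_deaths: int


# Hypothetical cohort: antioxidants have NO effect within either group,
# but people with healthier habits are more likely to take them.
STRATA: List[Stratum] = [
    # 5% die in this group whether or not they take antioxidants
    Stratum("healthier habits", 6000, 300, 4000, 200),
    # 10% die in this group whether or not they take antioxidants
    Stratum("less healthy habits", 2000, 200, 8000, 800),
]


def risk_ratio(taker_deaths: int, takers: int,
               nontaker_deaths: int, nontakers: int) -> float:
    """Death risk in takers divided by death risk in non-takers."""
    return (taker_deaths / takers) / (nontaker_deaths / nontakers)


def stratum_risk_ratio(s: Stratum) -> float:
    """Risk ratio within a single lifestyle group."""
    return risk_ratio(s.taker_deaths, s.takers, s.nontaker_deaths, s.nontakers)


def crude_risk_ratio(strata: List[Stratum]) -> float:
    """Risk ratio after pooling all groups, ignoring lifestyle."""
    return risk_ratio(
        sum(s.taker_deaths for s in strata),
        sum(s.takers for s in strata),
        sum(s.nontaker_deaths for s in strata),
        sum(s.nontakers for s in strata),
    )


if __name__ == "__main__":
    print(f"crude risk ratio: {crude_risk_ratio(STRATA):.2f}")
    for s in STRATA:
        print(f"{s.name}: risk ratio {stratum_risk_ratio(s):.2f}")
```

In this made-up cohort the pooled risk ratio is 0.75, while the risk ratio within each lifestyle group is 1.00. Comparing like with like (here by stratifying on the confounder; in a trial, by randomizing) removes the spurious association.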
an earlier example, the simpler instrument is prone to error or deviations from the true value. In the figure, the error is represented by all of the sphygmomanometer readings falling to the right of the true value. The deviation of sphygmomanometer readings to higher values (bias) may have several explanations (e.g., the wrong cuff size, patient anxiety, or “white coat hypertension”). Individual blood pressure readings are also subject to error because of random variation in measurement, as illustrated by the spread of the sphygmomanometer readings around the mean value (90 mm Hg).
The main reason for distinguishing between bias and chance is that they are handled differently. In theory, bias can be prevented by conducting clinical investigations properly or can be corrected during data analysis. If not eliminated, bias often can be detected by the discerning reader. Most of this book is about how to recognize, avoid, or minimize bias. Chance, on the other hand, cannot be eliminated, but its influence can be reduced by proper design of research, and the remaining effect can be estimated by statistics. No amount of statistical treatment can correct for unknown biases in data. Some statisticians would go so far as to suggest that statistics should not be applied to data that are vulnerable to bias because of poor research design, for fear of giving false respectability to fundamentally misleading work.
Figure 1.6 ■ Internal and external validity. [Diagram labels: All patients with the condition of interest; Sampling; Selection bias; Sample; Measurement and confounding bias; Chance; Conclusion; Internal validity; External validity (generalizability).]
validity of assuming that patients in a study are similar to other patients.
Every study that is internally valid is generalizable to patients very much like the ones in the study. However, an unimpeachable study, with high internal validity, may be totally misleading if its results are generalized to the wrong patients.
Internal and External Validity
When making inferences about a population from
observations on a sample, clinicians need to make
up their minds about two fundamental questions. Example
First, are the conclusions of the research correct for
the people in the sample? Second, if so, does the sam- What is the long-term death rate in anorexia
ple represent fairly the patients the clinician is most nervosa, an eating disorder mainly afflicting
interested in, the kind of patients in his or her prac- young women? In a synthesis of 42 studies, es-
tice, or perhaps a specific patient at hand (Fig. 1.6)? timated mortality was 15% over 30 years (17).
Internal validity is the degree to which the results These studies, like most clinical research, were
of a study are correct for the sample of patients being of patients identified in referral centers where
studied. It is “internal” because it applies to the relatively severe cases are seen. A study of all
conditions of the particular group of patients being patients developing anorexia in a defined popu-
observed and not necessarily to others. The internal lation provided a different view of the disease.
validity of clinical research is determined by how well Researchers at the Mayo Clinic were able to iden-
the design, data collection, and analyses are carried tify all patients developing this disease in their
out, and it is threatened by all of the biases and ran- city, Rochester, Minnesota, from 1935 to 1989
dom variation discussed earlier. For a clinical observa- (Fig. 1.7) (18). All-cause mortality at 30 years was
tion to be useful, internal validity is a necessary but 7%, half that of reported studies. The predicted
not sufficient condition. mortality in people without anorexia nervosa of
External validity is the degree to which the results the same age and sex was about the same, 6%.
of an observation hold true in other settings. Another Therefore, although some patients do die of an-
term for this is generalizability. For the individual orexia nervosa, most published studies greatly
clinician, it is an answer to the question, “Assuming overestimate the risk, presumably because they
that the results of a study are true, do they apply to report experience with relatively severe cases.
my patients as well?” Generalizability expresses the
a great deal more as well, including value judgments and weighing competing risks and benefits.

In recent years, medical decision making has become a valued discipline in its own right. The field includes qualitative studies of how clinicians
and the like. However, most epidemiology books are organized around research strategies such as clinical trials, surveys, case-control studies, and the like. This way of organizing a book may serve those who perform clinical research, but it is often awkward for clinicians.

We have organized this book primarily according to the questions clinicians encounter when caring for patients (Table 1.1). Figure 1.8 illustrates how these questions correspond to the book’s chapters, taking HIV infection as an example. The questions relate to the entire natural history of disease, from the time people without HIV infection are first exposed to risk, to when some acquire the disease and emerge as patients, through complications of the disease, AIDS-defining illness, to survival or death.

In each chapter, we describe research strategies used to answer that chapter’s clinical questions. Some strategies, such as cohort studies, are useful for answering several different kinds of clinical questions. For the purposes of presentation, we have discussed each strategy primarily in one chapter and have simply referred to the discussion when the method is relevant to other questions in other chapters.

[Figure 1.8 diagram labels: population at risk; onset of disease; primary infection; AIDS-defining illness; Kaposi sarcoma; Pneumocystis infection; disseminated Mycobacterium avium infection.]
Figure 1.8 ■ Organization of this book in relation to the natural history of human immunodeficiency virus (HIV) infection. Chapters 11, 13, and 14 describe cross-cutting issues related to all points in the natural history of disease.
Review Questions

Questions 1.1–1.6 are based on the following clinical scenario.

A 37-year-old woman with low back pain for the past 4 weeks wants to know if you recommend surgery. You prefer to base your treatment recommendations on research evidence whenever possible. In the strongest study you can find, investigators reviewed the medical records of 40 consecutive men with low back pain under care at their clinic: 22 had been referred for surgery, and the other 18 patients had remained under medical care without surgery. The study compared rates of disabling pain after 2 months. All of the surgically treated patients and 10 of the medically treated patients were still being seen in the clinic throughout this time. Rates of pain relief were slightly higher in the surgically treated patients.

For each of the following statements, circle the one response that best represents the corresponding threat to validity.

1.1. Because there are relatively few patients in this study, it may give a misleading impression of the actual effectiveness of surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.2. The results of this study may not apply to your patient, a woman, because all the patients in the study were men.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.3. Fewer patients who did not have surgery remained under care at the clinic 2 months after surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.4. The patients who were referred for surgery were younger and fitter than those who remained under medical care.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.5. Compared with patients who had medical care alone, patients who had surgery might have been less likely to report whatever pain they had and the treating physicians might have been less inclined to record pain in the medical record.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.6. Patients without other medical conditions were both more likely to recover and more likely to be referred for surgery.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)
For questions 1.7–1.11, select the best answer.

1.7. Histamine is a mediator of inflammation in patients with allergic rhinitis (“hay fever”). Based on this fact, which of the following is true?
A. Drugs that block the effects of histamines will relieve symptoms.
B. A fall in histamine levels in the nose is a reliable marker of clinical success.
C. Antihistamines may be effective, and their effects on symptoms (e.g., itchy nose, sneezing, and congestion) should be studied in patients with allergic rhinitis.
D. Other mediators are not important.
E. If laboratory studies of disease are convincing, clinical research is unnecessary.

1.8. Which of the following statements about samples of populations is incorrect?
A. Samples of populations may have characteristics that differ from the population even though correct sampling procedures were followed.
B. Samples of populations are the only feasible way of studying the population.
C. When populations are correctly sampled, external validity is ensured.
D. Samples of populations should be selected in a way that every member of the population has an equal chance of being chosen.

1.9. You are making a treatment decision with a 72-year-old man with colon cancer. You are aware of several good studies that have shown that a certain drug combination prolongs the life of patients with colon cancer. However, all the patients in these studies were much younger. Which of the statements below is correct?
A. Given these studies, the decision about this treatment is a matter of personal judgment.
B. Relying on these studies for your patient is called internal validity.
C. The results in these studies are affected by chance but not bias.

1.10. A study was done to determine whether regular exercise lowers the risk of coronary heart disease (CHD). An exercise program was offered to employees of a factory, and the rates of subsequent coronary events were compared in employees who volunteered for the program and those who did not volunteer. The development of CHD was determined by means of regular voluntary checkups, including a careful history, an electrocardiogram, and a review of routine health records. Surprisingly, the members of the exercise group developed higher rates of CHD even though fewer of them smoked cigarettes. This result is least likely to be explained by which of the following?
A. The volunteers were at higher risk for developing CHD than those not volunteering before the study began.
B. The volunteers did not actually increase their exercise and the amount of exercise was the same in the two groups.
C. Volunteers got more checkups, and silent myocardial infarctions were, therefore, more likely to have been diagnosed in the exercise group.

1.11. Ventricular premature depolarizations are associated with an increased risk of sudden death from a fatal arrhythmia, especially in people with other evidence of heart disease. You have read there is a new drug for ventricular premature depolarizations. What is the most important thing you would like to know about the drug before prescribing it to a patient?
A. The drug’s mechanism of action.
B. How well the drug prevents ventricular premature depolarizations in people using the drug compared to those who do not use the drug.
C. The rate of sudden death in similar people who do and do not take the drug.

Questions 1.12–1.15 are based on the following clinical scenario.

Because reports suggested estrogens increase the risk of clotting, a study compared the frequency of oral contraceptive use among women admitted to a hospital with thrombophlebitis and a group of women admitted for other reasons. Medical records were reviewed for indication of oral contraceptive use in the two groups. Women with thrombophlebitis were found to have been using oral contraceptives more frequently than the women admitted for other reasons.
1.12. Women with thrombophlebitis may have reported the use of contraceptives more completely than women without thrombophlebitis because they remembered hearing of the association.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.13. Doctors may have questioned women with thrombophlebitis more carefully about contraceptive use than they did those without thrombophlebitis (and recorded the information more carefully in the medical record) because they were aware that estrogen could cause clotting.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.14. The number of women in the study was small.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

1.15. The women with thrombophlebitis were admitted to the hospital by doctors working in different neighborhoods than the physicians of those that did not have thrombophlebitis.
A. Selection bias
B. Measurement bias
C. Confounding
D. Chance
E. External validity (generalizability)

Answers are in Appendix A.
REFERENCES
1. Home PD, Pocock SJ, Beck-Nielsen H, et al. Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomized, open-label trial. Lancet 2009;373:2125–2135.
2. Lipscombe LL, Gomes T, Levesque LE, et al. Thiazolidinediones and cardiovascular outcomes in older patients with diabetes. JAMA 2007;298:2634–2643.
3. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med 2007;356:2457–2471.
4. Friedman GD. Primer of Epidemiology, 5th ed. New York: Appleton and Lange; 2004.
5. Straus SE, Richardson WS, Glasziou P, et al. Evidence-Based Medicine: How to Practice and Teach EBM, 4th ed. New York: Churchill Livingstone; 2011.
6. Stuebe AM. Level IV evidence—adverse anecdote and clinical practice. N Engl J Med 2011;365(1):8–9.
7. Murphy EA. The Logic of Medicine. Baltimore: Johns Hopkins University Press; 1976.
8. Porta M. A Dictionary of Epidemiology, 5th ed. New York: Oxford University Press; 2008.
9. McCormack K, Scott N, Go PM, et al. Laparoscopic techniques versus open techniques for inguinal hernia repair. Cochrane Database Systematic Review 2003;1:CD001785. Publication History: Edited (no change to conclusions) 8 Oct 2008.
10. Neumayer L, Giobbie-Hurder A, Jonasson O, et al. Open mesh versus laparoscopic mesh repair of inguinal hernia. N Engl J Med 2004;350:1819–1827.
11. Sackett DL. Bias in analytic research. J Chronic Dis 1979;32:51–63.
12. Pickering TG, Hall JE, Appel LJ, et al. Recommendations for blood pressure in humans and experimental animals. Part 1: Blood pressure measurement in humans. A statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation 2005;111:697–716.
13. Bjelakovic G, Nikolova D, Gluud LL, et al. Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis. JAMA 2007;297(8):842–857.
14. Vivekananthan DP, Penn MS, Sapp SK, et al. Use of antioxidant vitamins for the prevention of cardiovascular disease: meta-analysis of randomized trials. Lancet 2003;361:2017–2023.
15. Norman RJ, Nisenblat V. The effects of caffeine on fertility and on pregnancy outcomes. In: Basow DS, ed. UpToDate. Waltham, MA: UpToDate; 2011.
16. Savitz DA, Chan RL, Herring AH, et al. Caffeine and miscarriage risk. Epidemiology 2008;19:55–62.
17. Sullivan PF. Mortality in anorexia nervosa. Am J Psychiatry 1995;152:1073–1074.
18. Korndorfer SR, Lucas AR, Suman VJ, et al. Long-term survival of patients with anorexia nervosa: a population-based study in Rochester, Minn. Mayo Clin Proc 2003;78:278–284.
Chapter 2
Frequency
Here, it is necessary to count.
—P.C.A. Louis†
1787–1872

† A 19th-century physician and proponent of the “numerical method” (relying on counts, not impressions) to understand the natural history of diseases such as typhoid fever.

KEY WORDS
Numerator
Denominator
Prevalence
Point prevalence
Period prevalence
Incidence
Duration of disease
Case fatality rate
Survival rate
Complication rate
Infant mortality rate
Perinatal mortality rate
Prevalence studies
Cross-sectional studies
Surveys
Cohort
Cohort studies
Cumulative incidence
Incidence density
Person-time
Dynamic population
Population at risk
Random sample
Probability sample
Sampling fraction
Oversample
Convenience samples
Grab samples
Epidemic
Pandemic
Epidemic curve
Endemic

Chapter 1 outlined the questions that clinicians need to answer as they care for patients. Answers are usually in the form of probabilities and only rarely as certainties. Frequencies obtained from clinical research are the basis for probability estimates for the purposes of patient care. This chapter describes basic expressions of frequency, how they are obtained from clinical research, and how to recognize threats to their validity.

Example
A 72-year-old man presents with slowly progressive urinary frequency, hesitancy, and dribbling. A digital rectal examination reveals a symmetrically enlarged prostate gland and no nodules. Urinary flow measurements show a reduction in flow rate, and his serum prostate-specific antigen (PSA) is not elevated. The clinician diagnoses benign prostatic hyperplasia (BPH). In deciding on treatment, the clinician and patient must weigh the benefits and hazards of various therapeutic options. To simplify, let us say the options are medical therapy with drugs or surgery. The patient might choose medical treatment but runs the risk of worsening symptoms or obstructive renal disease because the treatment is less immediately effective than surgery. Or he might choose surgery, gaining immediate relief of symptoms but at the risk of operative mortality and long-term urinary incontinence and impotence.

Decisions such as the one this patient and clinician face have traditionally relied on clinical judgment based on experience at the bedside and in the clinics. In modern times, clinical research has become sufficiently strong and extensive that it is possible to ground clinical judgment in research-based probabilities, that is, frequencies. Probabilities of disease, improvement, deterioration, cure, side effects, and death are the basis for answering most clinical questions. For this patient, sound clinical decision making requires accurate estimates of how his symptoms and complications of treatment will change over time according to which treatment is chosen.

ARE WORDS SUITABLE SUBSTITUTES FOR NUMBERS?
Clinicians often communicate probabilities as words (e.g., usually, sometimes, rarely) rather than as numbers. Substituting words for numbers is convenient and avoids making a precise statement when one is uncertain about a probability. However, words are a poor substitute for numbers because there is little agreement about the meanings of commonly used adjectives describing probabilities.

Example
Physicians were asked to assign percentage values to 13 expressions of probability (1). These physicians generally agreed on probabilities corresponding to adjectives such as “always” or “never” describing very likely or very unlikely events but not on expressions associated with less extreme probabilities. For example, the range of probabilities (from the top to the bottom tenth of attending physicians) was 60% to 90% for “usually,” 5% to 45% for “sometimes,” and 1% to 30% for “seldom.” This suggests (as authors of an earlier study had asserted) that “difference of opinion among physicians regarding the management of a problem may reflect differences in the meaning ascribed to words used to define probability” (2).

event could have occurred (population). The two basic measures of frequency are prevalence and incidence.

Prevalence
Prevalence is the fraction (proportion or percent) of a group of people possessing a clinical condition or outcome at a given point in time. Prevalence is measured by surveying a defined population and counting the number of people with and without the condition of interest. Point prevalence is measured at a single point in time for each person (although actual measurements need not necessarily be made at the same point in calendar time for all the people in the population). Period prevalence describes cases that were present at any time during a specified period of time.

Incidence
Incidence is the fraction or proportion of a group of people initially free of the outcome of interest that develops the condition over a given period of time. Incidence refers then to new cases of disease occurring in a population initially free of the disease or new outcomes such as symptoms or complications occurring in patients with a disease who are initially free of these problems.

Figure 2.1 illustrates the differences between incidence and prevalence. It shows the occurrence of
Table 2.1
Characteristics of Incidence and Prevalence
study would, therefore, miss nearly all these events and underestimate the true burden of coronary heart disease in the community. In contrast, diseases of long duration are well represented in prevalence surveys, even when their incidence is low. The incidence of inflammatory bowel disease in North America is only about 2 to 14 per 100,000/year, but its prevalence is much higher, 37 to 246/100,000, reflecting the chronic nature of the disease (4).

The relationship among incidence, prevalence, and duration of disease in a steady state, in which none of the variables is changing much over time, is approximated by the following expression:

Prevalence = Incidence × Average duration of the disease

Alternatively,

Prevalence/Incidence = Duration

Example
The incidence and prevalence of ulcerative colitis were measured in Olmsted County, Minnesota, from 1984 to 1993 (5). Incidence was 8.3/100,000 person-years and prevalence was 229/100,000 persons. The average duration of this disease can then be estimated as 229/100,000 divided by 8.3/100,000 = 28 years. Thus, ulcerative colitis is a chronic disease consistent with a long life expectancy. The assumption of steady state was met because data from this same study showed that incidence changed little during the interval of study. Although rates are different in different parts of the world and are changing over longer periods of time, all reflect a chronic disease.

Similarly, the prevalence of prostate cancer on autopsy is so much higher than its incidence that the majority of these cancers must never become symptomatic enough to be diagnosed during life.

SOME OTHER RATES
Table 2.2 summarizes some rates used in health care. Most of them are expressions of events over time. For example, a case fatality rate (or alternatively, the survival rate) is the proportion of people having a disease who die of it (or who survive it). For acute diseases such as Ebola virus infection, follow-up time may be implicit, assuming that deaths are counted over a long enough period of time (in this case, a few weeks) to account for all of them that might have occurred. For chronic diseases such as cardiovascular disease or cancer, it is more usual to specify the period of observation (e.g., the 5-year survival rate). Similarly, complication rate, the proportion of people with a disease or treatment who experience complications, assumes that enough time has passed for the complications to have occurred. These kinds of measures can be underestimations if follow-up is not really long enough. For example, surgical site infection rates have been underreported because they have been counted up to the time of hospital discharge, whereas some wound infections are first apparent after discharge (6).

Other rates, such as infant mortality rate and perinatal mortality rate (defined in Table 2.2), are approximations of incidence because the children in the numerator are not necessarily those in the denominator. In the case of infant mortality rate for a given year, some of the children who die in that year were born in the previous year; similarly, the last children to be born in that year may die in the following year. These rates are constructed in this way to make measurement more feasible, while providing a useful approximation of a true rate in a given year.
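The steady-state relationship among prevalence, incidence, and duration given earlier in this section can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions, and the inputs are the ulcerative colitis figures quoted in the text (incidence 8.3 per 100,000 person-years, prevalence 229 per 100,000).

```python
def average_duration_years(prevalence_per_100k, incidence_per_100k_person_years):
    """Steady-state approximation: duration = prevalence / incidence."""
    return prevalence_per_100k / incidence_per_100k_person_years


def steady_state_prevalence(incidence_per_100k_person_years, duration_years):
    """Steady-state approximation: prevalence = incidence x average duration."""
    return incidence_per_100k_person_years * duration_years


# Ulcerative colitis figures quoted in the text.
uc_incidence = 8.3     # new cases per 100,000 person-years
uc_prevalence = 229.0  # existing cases per 100,000 persons

uc_duration = average_duration_years(uc_prevalence, uc_incidence)
print(f"Estimated average duration: {uc_duration:.1f} years")  # 27.6, about 28 years
```

Both quantities must share the same population base (here, per 100,000) for the division to yield years. The estimate is only as good as the steady-state assumption; if incidence or survival were changing quickly, the simple ratio would be misleading.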
Table 2.2
Some Commonly Used Rates
[Figure 2.3 diagram labels: community population; die; move out.]
Figure 2.3 ■ A dynamic population.

Example
A study of the incidence of herpes zoster infections (“shingles”) and its complications provides an example of both incidence density and cumulative incidence. Investigators in Olmsted County, Minnesota, reviewed medical records of adult residents of the county (8). Other studies showed that more than 98% of
[Figure 2.4 plot. x-axis: body mass index (10 to 55); y-axis: population (%) (0 to 10); regions marked underweight, normal weight, overweight, and obese (Class I, Class II, Class III).]
Figure 2.4 ■ The prevalence of overweight and obesity in men, 2007 to 2008. (Data from Flegal KM, Carroll MD, Ogden CL, et al. Prevalence and trends in obesity among US adults, 1999–2008. JAMA 2010;303(3):235–241.)
Table 2.3
Classification of Obesity According to the U.S. National Institutes of Health and World Health Organization

Classification                                          Body Mass Index (kg/m²)
Underweight                                             <18.5
Normal weight                                           18.5–24.9
Overweight                                              25.0–29.9
Obesity                                                 ≥30
  Obesity Class I                                       30.0–34.9
  Obesity Class II                                      35.0–39.9
  Obesity Class III (“severe,” “extreme,” or “morbid”)  ≥40

Data from Flegal KM, Carroll MD, Ogden CL, et al. Prevalence and trends in obesity among US adults, 1999–2008. JAMA 2010;303:235–241.

pertains to clinical situations, whereas the higher rate tells us something about the biology of this disease.

Incidence can also change if a more sensitive way of detecting disease is introduced.

Example
Many cases of prostate cancer remain indolent and are not detected during life. With use of prostate-specific antigen (PSA), a blood test for prostate cancer, more of these indolent cases are now found. The test is relatively sensitive and leads to prostate biopsies that discover otherwise undetected cancers. The result of widespread PSA testing in the United States has been a rapid rise in the reported incidence of prostate cancer, more than double in a few years (11) (Fig. 2.5). The rise in incidence probably does not reflect an increase in the true incidence in the population because it has occurred so fast and because similar increases have not been seen in countries where PSA testing is less common. Incidence has subsequently fallen somewhat, presumably because the reservoir of prevalent cases, brought to attention by this new test, has been exhausted. However, incidence has not fallen to pre-PSA levels and seems to have reached a higher plateau, suggesting that new (incident) cases are being diagnosed more frequently since PSA testing was introduced.

[Figure 2.5 plot. x-axis: year of diagnosis (1975 to 2007); y-axis: age-adjusted incidence per 100,000 (0 to 250); the time of PSA approval is marked.]
Figure 2.5 ■ Incidence depends on the intensity of efforts to find cases. Incidence of prostate cancer in the United States during the widespread use of screening with prostate-specific antigen (PSA). (Redrawn with permission from Wolf AMD, Wender RC, Etzioni RB, et al. American Cancer Society guideline for the early detection of prostate cancer: Update 2010. CA Cancer Journal for Clinicians 2010;60:70–98.)
What Is the Population? Defining the Denominator
A rate is useful only to the extent that the population in which it is measured (the denominator of the rate) is clearly defined and right for the question. Three characteristics of the denominator are especially important.

First, all members of the population should be susceptible to the outcome of interest; that is, they should comprise a population at risk. If members of the population cannot experience the event or condition counted in the numerator, they do not belong in the denominator. For example, rates of cervical cancer should be assessed in women who still have a cervix; to the extent that cervical cancer rates are based on populations that include women who have had hysterectomies (or for that matter, men), true rates will be underestimated.

Second, the population should be relevant to the question being asked. For example, if we wanted to know the prevalence of HIV infection in the community, we would study a random sample of all people in a region. But if we wanted to know the prevalence of HIV infection among people who use street drugs, we would study them.

Third, the population should be described in sufficient detail so that it is a useful basis for judging to whom the results of the prevalence study apply. What is at issue here is the generalizability of rates: deciding whether a reported rate applies to the kind of patients that you are interested in. A huge gradient in rates of disease (e.g., for HIV infection) exists across practice settings from the general population to primary care practice to referral centers. Clinicians need to locate the reported rates on that spectrum if they are to use the information effectively.

Does the Study Sample Represent the Population?
As mentioned in Chapter 1, it is rarely possible to study all the people who have or might develop the condition of interest. Usually, one takes a sample so that the number studied is of manageable size. This leads to a central question: Does the sample accurately represent the parent population?

Random samples are intended to produce representative samples of the population. In a simple random sample, every individual in the population has an equal probability of being selected. A more general term, probability sample, is used when every person has a known (not necessarily equal) probability of being selected. Probability samples are useful because it is often more informative to include in the sample a sufficient number of people in particular subgroups of interest, such as ethnic minorities or the elderly. If members of these subgroups comprise only a small proportion of the population, a simple random sample of the entire population might not include enough of them. To remedy this, investigators can vary the sampling fraction, the fraction of all members of each subgroup included in the sample. Investigators can oversample low-frequency groups relative to the rest, that is, randomly select a larger fraction of them. The final sample will still be representative of the entire population if the different sampling fractions are taken into account in the analysis.

On average, the characteristics of people in probability samples are similar to those of the population from which they were selected, particularly when the sample is large. To the extent that the sample differs from the parent population, it is by chance and not because of systematic error.

Non-random samples are common in clinical research for practical reasons. They are called convenience samples (because their main virtue is that they were convenient to obtain, such as samples of patients who are visiting a medical facility, are cooperative, and are articulate) or grab samples (because the investigators just grabbed patients wherever they could find them).

Most patients described in the medical literature and encountered by clinicians are biased samples of their parent population. Typically, patients are included in research because they are under care in an academic institution, are available, are willing to be studied, are not afflicted with diseases other than the one under study, and perhaps also are particularly interesting, severely affected, or both. There is nothing wrong with this practice as long as it is understood to whom the results do (or do not) apply. However, because of biased samples, the results of clinical research often leave thoughtful clinicians with a large generalizability problem, from the research setting to their practice.

DISTRIBUTION OF DISEASE BY TIME, PLACE, AND PERSON
Epidemiology has been described as the study of the determinants of the distribution of disease in populations. Major determinants are time, place, and person. Distribution according to these factors can provide strong clues to the causes and control of disease, as well as to the need for health services.
Time
An epidemic is a concentration of new cases in time. The term pandemic is used when a disease is especially widespread, such as a global epidemic of particularly severe influenza (e.g., the one in 1918–1919) and the more slowly developing but worldwide rise in HIV infection/AIDS. The existence of an epidemic is recognized by an epidemic curve that shows the rise and fall of cases of a disease over time in a population.
Example
Figure 2.6 shows the epidemic curve for Severe Acute Respiratory Syndrome (SARS), in Beijing, People’s Republic of China, during the spring of 2003 (12). In all, 2,521 probable cases were reported during the epidemic. Cases were called “probable” because, at the time, there were no hard and fast criteria for diagnosis. The working definition of a case included combinations of the following epidemiologic and clinical phenomena: contact with a patient with SARS or living or visiting an area where SARS was active, symptoms and signs of a febrile respiratory illness, chest radiograph changes, lack of response to antibiotics, and normal or decreased white blood cell count. Later, as more became known about this new disease, laboratory testing for the responsible coronavirus could be used to define a case. Cases were called “reported” to make clear that there was no assurance that all cases in the Beijing community were detected.
Figure 2.6 also indicates when major control measures were instituted. The epidemic declined in relation to aggressive quarantine measures involving the closing of public gathering places, identifying new cases early in their course, removing cases from the community, and isolating cases in facilities specifically for SARS. It is possible that the epidemic abated for reasons other than these control measures, but it is unlikely given that similar control measures in other places were also followed by a resolution of the epidemic. Whatever the cause, the decline in new cases allowed the World Health Organization to lift its advisory against travel to Beijing so that the city could reopen public places and resume normal international business and tourism.
[Figure 2.6 plot: horizontal axis, date of hospitalization, Mar 7 to May 30; vertical axis ticks 0 to 100. Annotated control measures: SARS made reportable; contact tracing begins; all patients with SARS in designated hospitals.]
Figure 2.6 ■ An epidemic curve. Probable cases of severe acute respiratory syndrome in Beijing March 2003 through
May 2003, in relation to control measures. (Adapted with permission from Pang X, Zhu Z, Xu F, et al. Evaluation of
control measures implemented in the severe acute respiratory syndrome outbreak in Beijing, 2003. JAMA 2003;290:
3215–3221.)
Figure 2.7 ■ Colorectal cancer incidence for men according to area of the globe. (Data from Center MM, Jemal A,
Smith RA, et al. Worldwide variations in colorectal cancer. CA Cancer J Clin 2009;59:366–378.)
by an infectious agent transmitted in semen and blood. Laboratory studies confirmed this hypothesis and discovered the human immunodeficiency virus. Identification of the kinds of people most affected also led to special efforts to prevent spread of the disease in them, for example, by targeting education about safe sex to those communities, closing public bathhouses, and instituting safe-needle programs.
USES OF PREVALENCE STUDIES
Properly performed prevalence studies are the very best ways of answering some important questions and are a weak way of answering others.
What Are Prevalence Studies Good For?
Prevalence studies provide valuable information about what to expect in different clinical situations.
Example
The approach to cervical lymphadenopathy depends on where and in whom it is seen. Children with persistent cervical adenopathy seen in primary care practice have only a 0.4% probability that the node represents cancer, mainly lymphoma. Clinicians should have a high threshold for biopsy, the definitive way of determining whether or not cancer is present. However, adults seen in primary care practice have a 4% chance of having an underlying cancer of the head and neck. For them, clinicians should have a lower threshold for lymph node biopsy. Rates of malignancy in referral centers are much higher, about 60% in adults with cervical adenopathy, and biopsies are usually done. The situation is different in different parts of the world. In resource-poor countries, mycobacterial infections are a more common cause of lymphadenopathy than cancer. Thus, knowledge of prevalence helps clinicians prioritize the diagnostic possibilities in their particular setting and for the particular patient at hand.
Prevalence of disease strongly affects the interpretation of diagnostic test results, as will be described in greater detail in Chapter 8.
Finally, prevalence is an important guide to planning health services. In primary care practice, being prepared for diabetes, obesity, hypertension, and lipid disorders should demand more attention than planning for Hodgkin disease, aplastic anemia, or systemic lupus erythematosus. In contrast, some referral hospitals are well prepared for just these diseases, and appropriately so.
What Are Prevalence Studies Not Particularly Good For?
Prevalence studies provide only weak evidence of cause and effect. Causal questions are inherently about new events arising over time; that is, they are about incidence. One of the other limitations of prevalence studies, for this purpose, is that it may be difficult to know whether the purported cause actually preceded or followed the effect because the two are measured at the same point in time. For example, if inpatients with hyperglycemia are more often infected, is it because hyperglycemia impairs immune function leading to infection or has the infection caused the hyperglycemia? If a risk factor (e.g., family history or a genetic marker) is certain to have preceded the onset of disease or outcome, interpretation of the cause-and-effect sequence is less worrisome.
Another limitation is that prevalence may be the result of incidence of disease, the main consideration in causal questions, or it may be related to duration of disease, an altogether different issue. With only information about prevalence, one cannot determine how much each of the two, incidence and duration, contributes. Nevertheless, cross-sectional studies can provide compelling hypotheses about cause and effect to be tested by stronger studies.
The underlying message is that a well-performed cross-sectional study, or any other research design, is not inherently strong or weak but is only in relation to the question it is intended to answer.
Example
Children living on farms are less likely to have asthma than children who live in the same region but not on farms. Farm and urban environments differ in exposure to microbes, suggesting that exposure to microbes may protect against asthma. But the environments differ in many other ways. To strengthen this hypothesis, investigators in Germany examined mattress dust and also settled dust from children’s bedrooms and found that greater microbial diversity was inversely related to asthma, independent of farming (14). Together, these observations suggest that exposure to a wider range of microbes may protect against asthma, a causal hypothesis to be tested by stronger research.
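The point that prevalence alone cannot separate incidence from duration can be made concrete with a short calculation. The Python sketch below assumes a steady state in which prevalence is approximately incidence multiplied by average duration; the function name and both hypothetical diseases are illustrative assumptions, not data from the studies cited in this chapter.

```python
def steady_state_prevalence(incidence_per_year: float, mean_duration_years: float) -> float:
    """Approximate prevalence, assuming a steady state and a low prevalence."""
    return incidence_per_year * mean_duration_years


# Hypothetical disease 1: many new cases each year, each lasting about 1 year
short_common = steady_state_prevalence(incidence_per_year=0.010, mean_duration_years=1)

# Hypothetical disease 2: one-tenth as many new cases, each lasting about 10 years
long_rare = steady_state_prevalence(incidence_per_year=0.001, mean_duration_years=10)

print(f"short, common disease: prevalence = {short_common:.3f}")
print(f"long, rare disease:    prevalence = {long_rare:.3f}")
```

Under these assumptions the two diseases have the same prevalence even though their incidence differs tenfold, which is why a prevalence survey by itself says little about how often new cases arise.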
Review Questions
Read the following statements and mark the best answer.
2.1. Cancer registries report 40 new cases of bladder cancer per 100,000 men per year. Cases were from a complete count of all patients who developed bladder cancer in several regions of the United States, and the number of men at risk was estimated from the census data in those regions. Which rate is this an example of?
A. Point prevalence
B. Period prevalence
C. Incidence density
D. Cumulative incidence
E. Complication rate
2.2. Sixty percent of adults in the U.S. population have a serum cholesterol >200 mg/dL (5.2 mmol/L). Which rate is this an example of?
A. Point prevalence
B. Complication rate
C. Incidence density
D. Cumulative incidence
E. Period prevalence
2.3. You are reading a study of the prevalence of uterine cervix infections and want to decide if the study is scientifically sound. Which of the following is not important?
A. Participants are followed up for a sufficient period of time for anemia to occur.
B. The study is done on a representative sample of the population.
C. All members of the population are women.
D. Cervical infection is clearly defined.
E. The study is done on a sample from a defined population.
2.4. A probability sample of a defined population:
about 1/100 persons. On average, how many years does the disease persist?
A. 10
B. 25
C. 33
D. 40
E. 50
2.6. Which of the following studies is not a cohort study?
A. The proportion of patients with stomach cancer who survive 5 years
B. The risk of developing diabetes mellitus in children according to their weight
C. Complications of influenza vaccine among children vaccinated in 2011
D. The earlier course of disease in a group of patients now under care in a clinic
E. Patients admitted to an intensive care unit and followed up for whether they are still alive at the time of hospital discharge
2.7. A sample for a study of incidence of medication errors is obtained by enrolling every 10th patient admitted to a hospital. What kind of sample is this?
A. Stratified sample
B. Probability sample
C. Convenience sample
D. Random sample
E. Oversample
2.8. Cohort studies of children with a first febrile seizure have shown that they have a one in three chance of having another seizure during childhood. What kind of rate is this?
A. Point prevalence
B. Complication rate
C. Cumulative incidence
D. Period prevalence
E. Incidence density
2.10. Infection with a fungus, coccidioidomycosis, is common in the deserts of the southwestern United States and in Mexico, but uncommon elsewhere. Which of the following best describes this infection?
A. Endemic
B. Pandemic
C. Incident
D. Epidemic
E. Prevalent
2.11. Twenty-six percent of adults report having experienced back pain lasting at least a day in the prior 3 months. Which of the following best describes this rate?
A. Cumulative incidence
B. Incidence density
C. Point prevalence
D. Complication rate
E. Period prevalence
2.12. Which of the following best describes a “dynamic population”?
A. It is rapidly increasing in size.
B. It is uniquely suited for cohort studies.
C. People are continually entering and leaving the population.
D. It is the basis for measuring cumulative incidence.
E. It is the best kind of population for a random sample.
2.13. For a study of the incidence of idiopathic scoliosis (a deformity of the spine that becomes apparent after birth, most often in adolescence), which would be an appropriate cohort?
A. Children born in North Carolina in 2012 and examined for scoliosis until they are adults
B. Children who were referred to an orthopedic surgeon for treatment
C. Children who were found to have scoliosis in a survey of children in North Carolina
D. Children who have scoliosis and are available for study
E. Children who were randomly sampled from the population of North Carolina in the spring of 2012
2.14. Which of the following are prevalence studies especially useful for?
A. Describing the incidence of disease
B. Studying diseases that resolve rapidly
C. Estimating the duration of disease
D. Describing the proportion of people in a defined population with the condition of interest
E. Establishing cause and effect
2.15. Last year, 800,000 Americans died of heart disease or stroke. Which of the following best describes this statistic?
A. Incidence density
B. Point prevalence
C. Cumulative incidence
D. Period prevalence
E. None of the above
Answers are in Appendix A.
REFERENCES
1. Roberts DE, Gupta G. Letter to the editor. N Engl J Med 1987;316:550.
2. Bryant GD, Norman GR. Expressions of probability: words and numbers. N Engl J Med 1980;302:411.
3. Toogood JH. What do we mean by “usually”? Lancet 1980;1:1094.
4. Loftus EV Jr. Clinical epidemiology of inflammatory bowel disease: Incidence, prevalence, and environmental influences. Gastroenterology 2004;126:1504–1517.
5. Loftus EV Jr, Silverstein MD, Sandborn WJ, et al. Ulcerative colitis in Olmsted County, Minnesota, 1940-1993: incidence, prevalence, and survival. Gut 2000;46:336–343.
6. Sands K, Vineyard G, Platt R. Surgical site infections occurring after hospital discharge. J Infect Dis 1996;173:963–970.
7. Andrade L, Caraveo-Anduaga JJ, Berglund P, et al. The epidemiology of major depressive episodes: results from the International Consortium of Psychiatric Epidemiology (ICPE) surveys. Int J Methods Psychiatr Res 2003;12:3–21.
8. Yawn BP, Saddier P, Wollan PC, et al. A population-based study of the incidence and complication rates of herpes zoster before zoster vaccine introduction. Mayo Clin Proc 2007;82:1341–1349.
9. Flegal KM, Carroll MD, Ogden CL, et al. Prevalence and trends in obesity among US adults, 1999-2008. JAMA 2010;303:235–241.
10. Jenkins C, Costello J, Hodge L. Systematic review of prevalence of aspirin induced asthma and its implications for clinical practice. BMJ 2004;328:434–437.
11. Wolf AMD, Wender RC, Etzioni RB, et al. American Cancer Society Guideline for the early detection of prostate cancer: update 2010. CA Cancer J Clin 2010;60:70–98.
12. Pang X, Zhu Z, Xu F, et al. Evaluation of control measures implemented in the severe acute respiratory syndrome outbreak in Beijing, 2003. JAMA 2003;290:3215–3221.
13. Center MM, Jemal A, Smith RA, et al. Worldwide variations in colorectal cancer. CA Cancer J Clin 2009;59:366–378.
14. Ege MJ, Mayer M, Normand AC, et al. Exposure to environmental microorganisms in childhood asthma. N Engl J Med 2011;364:701–709.
Chapter 3
Abnormality
. . . the medical meaning of “normal” has been lost in the shuffle of statistics.
—Alvan Feinstein
1977
[Figure 3.1 panels A to D: frequency distributions of a measurement, arranged by validity (accuracy), high or low, and reliability (precision), high or low.]
Figure 3.1 ■ Validity and reliability. A. High validity and high reliability. B. Low
validity and high reliability. C. High validity and low reliability. D. Low validity and
low reliability. The white lines represent the true values.
[Figure 3.3 plot: number of VPDs/hr by time of day (noon, 6 P.M., midnight, 6 A.M.) on days 1, 2, and 3.]
[Figure 3.4 plot: horizontal axis, diastolic blood pressure (mm Hg), 60 to 110.]
Figure 3.4 ■ Sources of variation in the measurement of diastolic (phase V) blood pressure. The dashed line indicates the true blood pressure. Multiple sources of variation, including within and among patients as well as intra- and inter-observer variation, all contribute to blood pressure measurement results.
On the other hand, biased results are systematically different from the true value, no matter how many times they are repeated. For example, all of the high values for VPDs shown in Figure 3.3 were recorded on the first day, and most of the low values on the third day. The days were biased estimates of each other because of variation in VPD rate from day to day.
DISTRIBUTIONS
Data that are measured on interval scales are often presented as a figure, called a frequency distribution, showing the number (or proportion) of a defined group of people possessing the different values of the measurement. Figure 3.5 shows the distribution of blood neutrophil counts in normal black men and women.
Describing Distributions
Presenting interval data as a frequency distribution conveys the information in relatively fine detail, but it is often convenient to summarize distributions. Indeed, summarization is imperative when a large number of distributions are presented and compared.
Two basic properties of distributions are used to summarize them: central tendency, the middle of the distribution, and dispersion, how spread out the values are. Several ways of expressing central tendency and dispersion, along with their advantages and disadvantages, are illustrated in Figure 3.5 and summarized in Table 3.4.
[Figure 3.5 plot: vertical axis, percent; the mode, median, and mean of the distribution are marked.]
Actual Distributions
Distributions of clinical phenomena have many different shapes. The frequency distributions of four common blood tests (for potassium, alkaline phosphatase, glucose, and hemoglobin) are shown in Figure 3.6. In general, most of the values appear near the middle
Table 3.4
Expressions of Central Tendency and Dispersion
a $\sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$, where X = each observation; $\bar{X}$ = mean of all observations; and N = number of observations.
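The expressions of central tendency and dispersion discussed above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the seven measurements are hypothetical and deliberately right-skewed, and the manual calculation follows the N − 1 formula given in the table footnote.

```python
import math
import statistics

# Hypothetical, right-skewed set of measurements (not data from the text)
values = [2, 3, 3, 4, 5, 7, 18]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Dispersion
value_range = (min(values), max(values))
sd = statistics.stdev(values)  # sample standard deviation, divides by N - 1

# The same standard deviation written out from the footnote formula
sd_manual = math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, SD={sd:.2f}, SD (manual)={sd_manual:.2f}")
```

In this skewed example the single extreme value pulls the mean above the median and the mode, which is one reason the three expressions of central tendency are not interchangeable.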
[Figure 3.6 panels: serum potassium (mEq/L), alkaline phosphatase (units), plasma glucose (mg/100 mL), and hemoglobin (g/100 mL).]
Figure 3.6 ■ Actual clinical distributions. (Data from Martin HF, Gudzinowicz BJ,
Fanger H. Normal Values in Clinical Chemistry. New York: Marcel Dekker; 1975.)
[Normal curve figure: percent of area under the curve within ±1, ±2, and ±3 standard deviations of the mean is 68.26, 95.44, and 99.72, respectively.]
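The areas under the normal curve quoted above can be checked numerically. This short Python sketch relies on the standard relationship between the normal distribution and the error function; the helper name `area_within` is an illustrative assumption.

```python
import math


def area_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * area_within(k):.2f}%")
```

The computed values agree with those in the figure to within rounding in the last digit: about two-thirds of values lie within 1 standard deviation of the mean and about 95% within 2.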
1 standard deviation of the mean, and about 95%, within 2 standard deviations.
Although clinical distributions often resemble a normal distribution, the resemblance is superficial. As summarized in a perspective, “The experimental fact is that for most physiologic variables the distribution is smooth, unimodal, and skewed, and that mean ±2 standard deviations does not cut off the desired 95%. We have no mathematical, statistical, or other theorems that enable us to predict the shape of the distributions of physiologic measurements” (5).
The shapes of clinical distributions differ from one another because many differences among people, other than random variation, contribute to distributions of clinical measurements. Therefore, if distributions of clinical measurements resemble normal curves, it is largely by accident. Even so, it is often assumed, as a matter of convenience (because means and standard deviations are relatively easy to calculate and manipulate mathematically), that clinical measurements are “normally” distributed.
CRITERIA FOR ABNORMALITY
It would be convenient if the frequency distributions of clinical measurements for normal and abnormal people were so different that these distributions could be used to distinguish two distinct populations. This is actually the case for some abnormal genes. Sequences (genetic abnormalities coding) for the autosomal dominant condition familial adenomatous polyposis are either present or absent. People with the abnormal gene develop hundreds of polyps in their colon, whereas people without the gene rarely have more than a few, but this is the exception that proves the rule. Far more often, the various genetic abnormalities coding for the same disease produce a range of expressions. Even the expression of a specific genetic abnormality (such as substitution of a single base pair) differs substantially from one person to another, presumably related to differences in the rest of the genetic endowment as well as exposure to external causes of the disease.
Therefore, most distributions of clinical variables are not easily divided into “normal” and “abnormal.” They are not inherently dichotomous and do not display sharp breaks or two peaks that characterize normal and abnormal results. This is because disease is usually acquired by degrees, so there is a smooth transition from low to high values with increasing degrees of dysfunction. Laboratory tests reflecting organ failure, such as serum creatinine for kidney failure or ejection fraction for heart failure, behave in this way.
Another reason why normals and abnormals are not seen as separate distributions is that even when people with and without a disease have substantially different frequency distributions, the distributions almost always overlap. When the two distributions are mixed together, as they are in naturally occurring populations, the abnormals are usually not seen as separate because they comprise such a small proportion of the whole. The curve for people with disease is “swallowed up” by the larger curve for healthy, normal people.
Example
Phenylketonuria (PKU) is an inherited disease characterized by progressive mental retardation in childhood. A variety of mutant alleles coding for phenylalanine hydroxylase result in dysfunction of the enzyme and, with a normal diet, accumulation of phenylalanine that results in symptoms. The genes are in principle either abnormal or not (Fig. 3.8A). The diagnosis, which becomes apparent in the first year of life, is confirmed by persistently high phenylalanine levels (several times the usual range) and low tyrosine levels in the blood.
It is common practice to screen newborns for PKU with a blood test for phenylalanine a few days after birth, in time to treat before there is irreversible damage. However, even though the abnormal genes are present or absent (Fig. 3.8A), the test misclassifies some infants because at that age there is an overlap in the distributions of serum phenylalanine concentrations in infants with and without PKU and because infants with PKU make up only a small proportion of those screened, <1/10,000 (Fig. 3.8B). Some newborns with PKU are in the normal range either because they have not yet ingested enough phenylalanine-containing protein or because they have a combination of alleles associated with mild disease. Some children who are not destined to develop PKU have relatively high levels, for example, because their mothers have abnormal phenylalanine metabolism. The test is set to be positive at the lower end of the overlap between normal and abnormal levels, to detect most infants with the disease, even though only about 1 out of 10 infants with an abnormal screening test turns out to have PKU.
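The PKU screening figures show how a rare condition is "swallowed up" by the far larger group without it. The Python sketch below illustrates the arithmetic; the sensitivity and specificity are hypothetical values chosen only so that the result lands near the "about 1 out of 10" quoted in the example, and the prevalence is taken at the stated upper bound of 1/10,000.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Proportion of positive screening tests that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Prevalence from the text (upper bound); sensitivity and specificity are assumptions
ppv = positive_predictive_value(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.9991)
print(f"Proportion of positive screens with PKU: {ppv:.2f}")
```

Even with a test that, under these assumptions, misclassifies fewer than 1 in 1,000 unaffected infants, most positive results come from infants without the disease, because unaffected infants outnumber affected ones by roughly 10,000 to 1.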
[Figure 3.8 plot: curves labeled Normal and Mutant; horizontal axis, blood phenylalanine (mg/dL), 0 to 10.]
Figure 3.8 ■ Screening for phenylketonuria (PKU) in infants: dichotomous and overlapping distributions of normal and abnormal. A. Alleles coding for phenylalanine hydroxylase are either normal or mutant. B. The distributions of blood phenylalanine levels in newborns with and without PKU overlap and are of greatly different magnitude. (The prevalence of PKU, which is <1/10,000, is exaggerated so that its distribution can be seen in the figure.)
If there is no sharp dividing line between normal and abnormal, and the clinician can choose where the line is placed, what ground rules should be used to decide? Three criteria have proven useful: being unusual, being sick, and being treatable. For a given measurement, the results of these approaches bear no necessary relation to one another, so what is considered abnormal by one criterion might be normal by another.
Abnormal = Unusual
Normal often refers to the frequently occurring or usual condition. Whatever occurs often is considered normal, and what occurs infrequently is abnormal.
make a direct statement about how infrequent a value is without making assumptions about the shape of the distribution from which it came.
Despite this, the statistical definition of normality, with the cutoff point at 2 standard deviations from the mean, is most commonly used. However, it can be ambiguous or misleading for several reasons:
■ If all values beyond an arbitrary statistical limit, say the 95th percentile, were considered abnormal, then the frequency of all diseases would be the same (if one assumed the distribution was normal, 2.5% if we consider just the extreme high or low ends of the distribution). Yet, it is common knowledge that diseases vary in frequency; diabetes and osteoarthritis are far more common than ovalocytosis and hairy cell leukemia.
■ There is no general relationship between the degree of statistical unusualness and clinical disease. The relationship is specific to the disease in question and the setting. Thus, obesity is quite common in the United States but uncommon in many developing countries. For some measurements, deviations from usual are associated with disease to an important degree only at quite extreme values, well beyond the 95th or even the 99th percentile. Failure of organs such as the liver and kidneys becomes symptomatic only when most of usual function is lost.
■ Sometimes extreme values are actually beneficial. For example, people with unusually low blood pressure are, on average, at lower risk of cardiovascular disease than people with more usual blood pressures. People with unusually high bone density are at lower than average risk of fractures.
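Two of the difficulties with the statistical definition can be seen in a small simulation. The Python sketch below draws a hypothetical right-skewed measurement (a log-normal distribution, chosen purely as an illustrative assumption) and compares a cutoff at the mean ± 2 standard deviations with a cutoff at the 95th percentile.

```python
import random
import statistics

random.seed(0)

# Hypothetical right-skewed "laboratory value"; the distribution is an assumption
values = [random.lognormvariate(0, 0.75) for _ in range(10_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)

# Fraction labeled abnormal by a mean +/- 2 SD rule
high = sum(v > mean + 2 * sd for v in values) / len(values)
low = sum(v < mean - 2 * sd for v in values) / len(values)

# Fraction labeled abnormal by a 95th percentile rule
cutoff_95 = statistics.quantiles(values, n=20)[-1]
above_95 = sum(v > cutoff_95 for v in values) / len(values)

print(f"above mean + 2 SD: {100 * high:.1f}%")
print(f"below mean - 2 SD: {100 * low:.1f}%")
print(f"above 95th percentile: {100 * above_95:.1f}%")
```

With a skewed distribution like this one, the mean ± 2 SD rule should not cut off 2.5% in each tail: the lower limit falls below zero, so no values are flagged as low, and more than 2.5% are flagged as high. The percentile rule, by construction, labels about 5% of values abnormal whatever is being measured, which is the sense in which a purely statistical limit would make all diseases equally frequent.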
[Figure 3.9 plot: horizontal axis, usual systolic blood pressure (mm Hg), 115 to 180.]
Figure 3.9 ■ Ischemic heart disease mortality for people ages 40 to 49 years is related to systolic blood pressure throughout the range of values occurring in most people. There is no threshold between normal and abnormal. “Mortality” is presented as a multiple of the baseline rate. (Data from Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–1913.)
Abnormal = Treating the Condition Leads to a Better Clinical Outcome
It makes intuitive sense to define a clinical condition or finding as “abnormal” if treatment of it leads to a better outcome. This approach makes particularly good sense for asymptomatic conditions. If a condition is causing no trouble, and treatment makes no difference, why try to treat it? However, even for symptomatic patients, it is sometimes difficult to
[Figure 3.10 plots: A. Mortality rate per 100,000 person-years (up to 20,000) and B. men with functional decline (%, 0 to 40), each by body mass index (kg/m2) category: <18.5, 18.5–21.9, 22.0–24.9, 25.0–27.4, 27.5–29.9, 30.0–34.9, ≥35.0.]
Figure 3.10 ■ Abnormal as associated with disease and other patient outcomes. The relationship between body mass index and (A) total mortality and (B) functional decline in men age 65 and older on Medicare. Body mass index is weight in kilograms divided by height in meters squared. Mortality rates are adjusted for age and smoking. (Redrawn with permission from Wee CC, Huskey KW, Ngo LH, et al. Obesity, race and risk for death or functional decline among Medicare beneficiaries. A cohort study. Ann Intern Med 2011;154:645–655.)
distinguish between clinical findings that will and will not improve with treatment. Modern technology, especially newer imaging techniques, is now able to detect abnormalities in patients so well that it is not always clear whether what is found is related to the patient’s complaint. The result is an increasingly common dilemma for both patients and clinicians.
Example
Magnetic resonance imaging (MRI) of the knee is frequently performed in middle-aged and elderly patients presenting with knee pain; if a meniscal tear is found, the symptoms are usually attributed to the tear and arthroscopic meniscectomy may be performed. Nevertheless, the prevalence of meniscal tears and their relationship to knee symptoms are not clear, especially in older people. A study was undertaken in which 991 randomly selected middle-aged and elderly adults without prior knee surgery underwent MRI scans of their knees and were asked (independently) about knee-joint symptoms (8). Among people with radiographic evidence of osteoarthritis, 63% of those with frequent knee pain, aching, or stiffness had at least one meniscal tear shown on the MRI, a result that might be suspected. Surprisingly, the result was almost the same (60%) among people without these symptoms. Among participants without evidence of osteoarthritis, the comparable percentages were 32% and 23%. The authors concluded that meniscal tears are common irrespective of knee symptoms, and often accompany knee osteoarthritis. Ironically, surgical treatment of meniscal tears increases the risk of osteoarthritis, raising the stakes for determining if a meniscal tear is indeed the cause of a patient’s knee pain.
Another reason to limit the definition of “abnormal” to a condition that is treatable is that not every condition conferring an increased risk can be successfully treated: The removal of the condition may not remove risk, either because the condition itself is not a cause of disease but is only related to a cause or because irreversible damage has already occurred. To label people abnormal can cause worry and a sense of vulnerability that may not be justified for some health problems if treatment cannot improve the outlook.
What is considered treatable changes with time. At their best, therapeutic decisions are grounded on evidence from well-conducted clinical trials (Chapter 9). As new knowledge is acquired from the results of clinical trials, the level at which treatment is considered useful may change.
Example
Folic acid, a vitamin that occurs mainly in green leafy vegetables, was discovered early in the 20th century. Diets low in folic acid were found to cause a vitamin deficiency syndrome with anemia. For many years, adequate intake of folic acid was defined in the “recommended dietary allowance” as the amount necessary to prevent anemia. More recently, however, reasons for a new and higher level of “normal” intake have emerged. Low folic acid levels in early pregnancy were linked to neural tube defects in infants, even though the folic acid levels were well above those needed to prevent anemia. Subsequent studies showed that folic acid supplementation in high-risk women prevented about three-quarters of the malformations. Thus, a higher level for “normal” folic acid intake during childbearing years has emerged because of new information about the levels needed to prevent disease. The optimal level is at least two times (and some suggest eight times) higher than the older criterion for normal intake. As a practical matter, these levels of intake are feasible in most people only by vitamin supplements, not by diet alone.
REGRESSION TO THE MEAN
When clinicians encounter an unexpectedly abnormal test result, they tend to repeat the test. Often, the second test result is closer to normal. Why does this happen? Should it be reassuring?
Patients selected because they represent an extreme value in a distribution can be expected, on average, to have less extreme values on subsequent measurements. This phenomenon, called regression to the mean, occurs for purely statistical reasons, not because the patients have necessarily improved.
Regression to the mean arises in the following way (Fig. 3.11): People are selected for inclusion in
46 Clinical Epidemiology: The Essentials
Review Questions
For each of the numbered clinical scenarios below (3.1–3.5), select from the lettered options the most appropriate term for the type of data.

3.1. Deep tendon reflex grade 0 (no response), 1+ (somewhat diminished), 2+ (normal), 3+ (brisker than average), and 4+ (very brisk).
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)

3.2. Cancer recurrent/not recurrent 5 years after initial treatment.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)

3.3. Serum sodium 139 mg/dL.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)

3.4. Three seizures per month.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)
3.5. Causes of upper gastrointestinal bleeding: duodenal ulcer, gastritis, esophageal, or other varices.
A. Interval (continuous)
B. Dichotomous
C. Nominal
D. Ordinal
E. Interval (discrete)

For questions 3.6–3.10, choose the best answer.

3.6. When it is not possible to verify measurement of a phenomenon, such as itching, by the physical senses, which of the following can be said about its validity?
A. It is questionable, and one should rely on “hard” measures such as laboratory tests.
B. It can be established by showing that the same value is obtained when the measurement is repeated by many different observers at different times.
C. It can be supported by showing that the measurement is related to other measures of phenomena such as the presence of diseases that are known to cause itching.
D. It can be established by showing that measurement results in a broad range of values.
E. It cannot be established.

3.7. A physician or nurse measures a patient’s heart rate by feeling the pulse for 10 seconds each time she comes to clinic. The rates might differ from visit to visit because of all the following except:
A. The patient has a different pulse rate at different times.
B. The measurement may misrepresent the true pulse by chance because of the brief period of observation.
C. The physician and nurse use different techniques (e.g., different degrees of pressure on the pulse).
D. The pulse rate varies among patients.
E. An effective treatment was begun between visits.

3.8. “Abnormal” is commonly defined by all of the following except:
A. The level at which treatment has been shown to be effective
B. The level at which death rate is increased
C. Statistically unusual values
D. Values that do not correspond to a normal distribution
E. The level at which there is an increased risk of symptoms

3.9. All of the following statements are true except:
A. The normal distribution describes the distribution of most naturally occurring phenomena.
B. The normal distribution includes 2.5% of people in each tail of the distribution (beyond 2 standard deviations from the mean).
C. The normal distribution is unimodal and symmetrical.
D. The normal distribution is the most common basis for defining abnormal laboratory tests measured on interval scales.

3.10. You see a new patient who is a 71-year-old woman on no medicines and without history of heart disease in herself or her family. She has never smoked and is not diabetic. Her blood pressure is 115/75 mm Hg, and she is about 15 pounds overweight. A total cholesterol test done 2 days ago was high at 250 mg/dL and the HDL cholesterol was 59 mg/dL. The Framingham risk calculator estimates that the patient’s risk of developing general cardiovascular disease in the next 10 years is 9%. You know that treating cholesterol at the level found reduces cardiovascular risk. The patient wants to know if she should start taking a statin. Which of the following statements is least correct?
A. The patient is likely to have a lower serum cholesterol the next time it is measured.
B. The estimation of a 9% probability of cardiovascular disease in the next 10 years could be influenced by chance.
C. The patient should be given a prescription for a statin to lower her risk of coronary heart disease.
[Figure: distribution of infant birthweights. x-axis: Birthweight (g), 0 to 8,000; y-axis: Proportion of infants (%).]
[Figure 3.13, Panels A and B. y-axis: Number of observations. Panel A: monitored fetal heart rate 130–150. Panel B: monitored fetal heart rate <130.]
3.11. Which statement about the central tendency is incorrect?
A. The mean birthweight is below 4,000 g.
B. There is more than one modal birthweight.
C. Mean and median birthweights are similar.

3.12. Which statement about dispersion is most correct?
A. The range is the best way to describe the babies’ birthweights.
B. Standard deviations should not be calculated because the distribution is skewed.
C. Ninety-five percent of the birthweights will fall within about 2 standard deviations of the mean.

3.13. One standard deviation of babies’ birthweights encompasses approximately:
A. Weights from 2,000 to 4,000 g
B. Weights from 3,000 to 4,000 g
C. Weights from 2,000 to 6,000 g

Figure 3.13 ■ Observer variability. Comparing fetal heart auscultation to electronic monitoring of fetal heart rate. (Redrawn with permission from Day E, Maddem L, Wood C. Auscultation of foetal heart rate: an assessment of its error and significance. Br Med J 1968;4:422–424.)

Figure 3.13 compares fetal heart rates measured by electronic monitoring (white middle bar) to measurements by hospital staff in three different circumstances: when the fetal heart rate in beats per minute by electronic monitoring was normal (130–150), low (<130), and high (>150). Questions 3.14–3.16 relate to the figure. For each question, choose the best answer.

3.14. The distribution of hospital staff measurements around the electronic monitor measurement in Panel A of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C
3.15. The distribution of hospital staff measurements around the electronic monitor measurement in Panel B of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C

3.16. The distribution of hospital staff measurements around the electronic monitor measurement in Panel C of Figure 3.13 could be due to:
A. Chance
B. Inter-observer variability
C. Biased preference for normal results
D. A and B
E. A and C
F. B and C
G. A, B, and C
Answers are in Appendix A.
REFERENCES
1. Sharma P, Parekh A, Uhl K. An innovative approach to determine fetal risk: the FDA Office of Women’s Health pregnancy exposure registry web listing. Womens Health Issues 2008;18:226–228.
2. Feinstein AR. The need for humanized science in evaluating medication. Lancet 1972;2:421–423.
3. Rubenfeld GD, Caldwell E, Granton J, et al. Interobserver variability in applying a radiographic definition for ARDS. Chest 1999;116:1347–1353.
4. Morganroth J, Michelson EL, Horowitz LN, et al. Limitations of routine long-term electrocardiographic monitoring to assess ventricular ectopic frequency. Circulation 1978;58:408–414.
5. Elveback LR, Guillier CL, Keating FR. Health, normality, and the ghost of Gauss. JAMA 1970;211:69–75.
6. Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–1913.
7. Wee CC, Huskey KW, Ngo LH, et al. Obesity, race and risk for death or functional decline among Medicare beneficiaries. A cohort study. Ann Intern Med 2011;154:645–655.
8. Englund M, Guermazi A, Gale D. Incidental meniscal findings on knee MRI in middle-aged and elderly persons. N Engl J Med 2008;359:1108–1115.
9. Lazo M, Selvin E, Clark JM. Brief communication: clinical implications of short-term variability in liver function test results. Ann Intern Med 2008;148:348–352.
Chapter 4
Risk: Basic Principles
Common Exposure to Risk Factors
Many risk factors, such as cigarette smoking or eating a diet high in sugar, salt, and fat, have become so common in Western societies that for many years their dangers went unrecognized. Only by comparing patterns of disease among people with and without these risk factors, using cross-national studies or investigating special subgroups (Mormons, for example, who do not smoke, or vegetarians who eat diets low in cholesterol), were risks recognized that were, in fact, large. It is now clear that about half of lifetime users of tobacco will die because of their habit; if current smoking patterns persist, it is predicted that in the 21st century, more than 1 billion deaths globally will be attributed to smoking (1).
A relationship between the sleeping position of babies and the occurrence of sudden infant death syndrome (SIDS) is another example of a common exposure to a risk factor and the dramatic effect associated with its frequency, an association that went unrecognized until relatively recently.

Example
SIDS, the sudden, unexplained death of an infant younger than 1 year of age, is a leading cause of infant mortality. Studies suggest that there are many contributing factors. In the late 1980s and 1990s, several investigations found that babies who were placed face down in their cribs were three to nine times more likely to die of SIDS than those placed on their backs. In 1992, the American Academy of Pediatrics issued a recommendation to place infants on their backs to sleep but indicated that side positioning for sleep was an acceptable alternative. The percentage of babies placed in the prone position for sleep fell from 70% in 1992 to 24% in 1996, with a concomitant

Low Incidence of Disease
The incidence of most diseases, even ones thought to be “common,” is actually uncommon. Thus, although lung cancer is the most common cause of cancer deaths in Americans, and people who smoke are as much as 20 times more likely to develop lung cancer than those who do not smoke, the yearly incidence of lung cancer in people who have smoked heavily for 30 years is 2 to 3 per 1,000. In the average physician’s practice, years may pass between new cases of lung cancer. It is difficult for the average clinician to draw conclusions about risks from such infrequent events.

Small Risk
The effects of many risk factors for chronic disease are small. To detect a small risk, a large number of people must be studied to observe a difference in disease rates between exposed and unexposed persons. For example, drinking alcohol has been known to increase the risk of breast cancer, but it was less clear whether low levels of consumption, such as drinking just one glass of wine or its equivalent a day, conferred risk. A study of 2,400,000 woman-years was needed to find that women who averaged a glass a day increased their risk of developing breast cancer by 15% (4). Because of the large number of woman-years in the study, chance is an unlikely explanation for the result, but even so, such a small effect could be due to bias. In contrast, it is not controversial that hepatitis B infection is a risk factor for hepatoma, because people with certain types of serologic evidence of hepatitis B infection are up to 60 times (not just 1.15 times) more likely to develop liver cancer than those without it.

Multiple Causes and Multiple Effects
There is usually not a close, one-to-one relationship between a risk factor and a particular disease.
disease occurs quickly after an unusual exposure, but most diseases and most exposures do not conform to such a pattern. For accurate information about risk, clinicians must turn to the medical literature, particularly to carefully constructed studies that involve a large number of patients.

PREDICTING RISK
A single powerful risk factor, as in the case of hepatitis B and hepatocellular cancer, can be very helpful clinically, but most risk factors are not strong. If drinking a glass of wine a day increases the risk of breast cancer by 15%, and the average 10-year risk of developing breast cancer for women in their 40s is 1 in 69 (or 1.45%), having a glass of wine with dinner would increase the 10-year risk to about 1 in 60 (or 1.67%). Some women might not think such a difference in the risk of breast cancer is meaningful.

Combining Multiple Risk Factors to Predict Risk
Because most chronic diseases are caused by several relatively weak risk factors acting together, statistically combining their effects can produce a more powerful prediction of risk than considering one risk factor at a time. Statistically combining risk factors produces a risk prediction model or a risk prediction tool (also sometimes called a clinical prediction tool or a risk assessment tool). Risk prediction tools are increasingly common in clinical medicine; well-known models used for long-term predictions include the Framingham Risk Score for predicting cardiovascular events and the National Cancer Institute’s Breast Cancer Risk Assessment Tool for predicting breast cancer occurrence. Shorter-term hospital risk prediction tools include the Patient At Risk of Re-admission Scores (PARR) and the Critical Care Early Warning Scores. Prediction tools have also combined diagnostic test results, for example, to diagnose acute myocardial infarction when a patient presents with chest pain, or for diagnosing the occurrence of pulmonary embolism. The statistical methods used to combine multiple risk factors are discussed in Chapter 11.
Risk prediction models help with two important clinical activities. First, a good risk prediction model aids risk stratification, dividing groups of people into subgroups with different risk levels (e.g., low, medium, and high). Using the risk stratification approach can also help determine whether adding a newly proposed risk factor for a disease improves the ability to predict disease, that is, improves risk stratification.

Example
CVD is the most common cause of death globally. In the United States, despite decreasing incidence, CVD accounts for a third of all deaths. Over the past 50 years, several risk factors, including hypertension, hypercholesterolemia, cigarette smoking, diabetes mellitus, marked obesity, physical inactivity, and family history have been identified. When used in risk prediction tools, these risk factors help to predict risk of CVD. In clinical practice, risk prediction is most useful when people are stratified into either high or low risk, so that patients and doctors feel comfortable either watching or intervening. It is less clear what to do with people calculated by a risk prediction tool to be at moderate risk of CVD. Efforts have been made to improve risk prediction tools for CVD by adding risk factors that reclassify persons initially classified at moderate risk to either high- or low-risk categories. One such risk factor is a plasma marker for inflammation, C-reactive protein (CRP). A synthesis of studies of CRP found that adding it to risk prediction models improves risk stratification for CVD (7). Figure 4.2 illustrates how adding the results of the test for CRP improved the risk stratification of a risk prediction tool for CVD in women by reclassifying women into higher or lower risk categories that more accurately matched rates of CVD over a 10-year period (8).

Even if a risk factor improves a risk prediction model, a clinical trial is needed to demonstrate that lowering or removing the risk factor protects patients. In the case of CRP, such a trial has not yet been reported, so it is possible that it is a marker rather than a causal factor for CVD.

Risk Prediction in Individual Patients and Groups
Risk prediction tools are often used to predict the future for individuals, with the hope that each person will know his or her risks, a hope summarized by the term “personalized medicine.” As an example, Table 4.1 summarizes the information used in
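
The arithmetic behind the wine and breast cancer figures quoted under Predicting Risk (a 15% relative increase applied to a baseline 10-year risk of 1 in 69) can be checked with a few lines of code. This is only a sketch of the calculation; the variable names are illustrative, and the absolute difference it prints is derived here from the quoted numbers rather than stated in the text.

```python
# Figures quoted in the text
baseline_risk = 1 / 69        # 10-year breast cancer risk for women in their 40s
relative_increase = 0.15      # 15% higher risk with one glass of wine a day

# A relative increase multiplies the baseline risk
exposed_risk = baseline_risk * (1 + relative_increase)

# Absolute difference in risk between drinkers and non-drinkers
absolute_increase = exposed_risk - baseline_risk

print(f"Baseline 10-year risk: {baseline_risk:.2%} (1 in {1 / baseline_risk:.0f})")
print(f"Risk with daily wine:  {exposed_risk:.2%} (1 in {1 / exposed_risk:.0f})")
print(f"Absolute increase:     {absolute_increase * 100:.2f} percentage points")
```

The same relative increase produces a larger or smaller absolute change depending on the baseline risk, which is why a weak risk factor acting on an uncommon outcome shifts an individual's risk only slightly.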
[Figure 4.2 bar chart comparing the risk model with CRP and the model without CRP. x-axis: cardiovascular risk over 10 years (%), in strata <5, 5 to <10, 10 to <20, and ≥20. Number of women in each stratum: 6,965, 633, 248, and 65.]
Figure 4.2 ■ Effect of adding a new risk factor to a risk prediction model. Comparison of risk prediction models for CVD over 10 years among 7,911 non-diabetic women, with and without CRP as a risk factor. Adding CRP into the risk model improved risk stratification of the women, especially for those assigned to higher-risk strata by the model without CRP. (Data from Ridker PM, Buring JE, Rifai N, et al. Development and validation of improved algorithms for the assessment of global cardiovascular risk in women. JAMA 2007;297:611–619.)
compared with non-smokers. Even so, the smoker has about a 1 in 10 chance of developing lung cancer in the next 10 years. Most risk factors (and risk prediction tools) for most diseases are much weaker than the risk of lung cancer with smoking.

EVALUATING RISK PREDICTION TOOLS
Determining how well a particular risk prediction tool works is done by asking two questions: (i) how accurately does the tool predict the proportion of different groups of people who will develop the disease (calibration), and (ii) how accurately does it identify individuals who will and will not develop the disease (discrimination)? To answer these questions, the tool is tested on a large group of people who have been followed for several years (sometimes, decades) with known outcomes of disease for each person in the group.

Calibration
Calibration, determining how well a prediction tool correctly predicts the proportion of a group who will develop disease, is conceptually and operationally simple. It is measured by comparing the number of people in a group predicted or estimated (E) by the prediction tool to develop disease to the number who are observed (O) to develop the disease. Ratios of E/O close to 1.0 mean the risk tool is well calibrated: it predicts a proportion of people that is very close to the actual proportion that develops the disease. Evaluations of the NCI breast cancer risk assessment tool have found it is highly accurate in predicting the proportion of women in a group who will develop breast cancer in the next 5 years, with E/O ratios close to 1.0.

Discrimination
Discriminating among individuals in a group who will and will not develop disease is difficult, even for well-calibrated risk tools. The most common method used to measure discrimination accuracy is to calculate a concordance statistic (often shortened to c-statistic). It estimates how often in pairs of randomly selected individuals, one of whom went on to develop the disease of interest and one of whom did not, the risk prediction score was higher for the one who developed disease. If the risk prediction tool did not improve prediction at all, the resulting estimate would be like a coin toss and the c-statistic would be 0.50. If the risk prediction tool worked perfectly, so that in every pair the diseased individual had a higher score than the non-diseased individual, the c-statistic would be 1.0. In one study assessing discrimination of the NCI breast cancer risk tool, the c-statistic was calculated as 0.58 (9). It is clear that this is not a high c-statistic, but just what values between 0.5 and 1.0 mean is difficult to understand clinically.
The clearest (although rarest) method to understand how well a risk prediction model discriminates is to compare visually the predictions for individuals to the observed results for all individuals in the study. Figure 4.3A illustrates perfect discrimination by a hypothetical risk prediction tool; the tool completely separates people destined to develop disease from those destined not to develop disease. Figure 4.3B illustrates the ability of the NCI breast cancer risk prediction tool to discriminate between women who subsequently did and did not develop breast cancer over a 5-year period and visually shows what a c-statistic of 0.58 means. Although the average risk scores are slightly higher for the women who developed breast cancer, and their curve on the graph is slightly to the right of those who did not develop breast cancer, the individual risk prediction scores of the two groups overlap substantially; there is no place along the x-axis of risk that separates women into groups who did and did not develop breast cancer. This is so even though the calibration of the model was very good.

Sensitivity and Specificity of a Risk Prediction Tool
Yet another way to assess a risk prediction tool’s ability to distinguish who will and will not develop disease is to determine its sensitivity and specificity (a topic that will be discussed more thoroughly in Chapters 8 and 10). Sensitivity of a risk prediction tool is the ability of the tool to identify those individuals destined to develop a disease and is expressed as the percentage of people who go on to develop the disease whom the tool correctly identifies. A tool’s specificity is the ability to identify individuals who will not develop the disease, expressed as the percentage of people who do not develop the disease whom the tool correctly identifies. Looking at Figure 4.3, a 5-year risk of 1.67% was chosen as a cut point between “low” and “high” risk. Using that cut point, the sensitivity was estimated as 44% (44% of women who developed breast cancer had a risk score ≥1.67%) and specificity was estimated as 66% (66% of women who did not develop breast cancer had a risk score <1.67%). In other words, the risk prediction tool missed more than half the women who developed breast cancer over a 5-year period,
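
The three measures described in this section (the E/O ratio for calibration, the c-statistic for discrimination, and sensitivity and specificity at a cut point) can each be computed in a few lines. The sketch below is illustrative only: the ten predicted risks, the outcomes, the 0.35 cut point, and the function names are invented for the example and do not come from any study cited in the text.

```python
from itertools import product

def expected_over_observed(risks, outcomes):
    """Calibration: expected cases (sum of predicted risks) / observed cases."""
    return sum(risks) / sum(outcomes)

def c_statistic(risks, outcomes):
    """Discrimination: share of (case, non-case) pairs in which the case
    has the higher predicted risk. Tied scores count as half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for case_risk, noncase_risk in product(cases, noncases):
        if case_risk > noncase_risk:
            score += 1.0
        elif case_risk == noncase_risk:
            score += 0.5
    return score / (len(cases) * len(noncases))

def sensitivity_specificity(risks, outcomes, cut_point):
    """Classify risk >= cut_point as 'high risk' and compare with outcomes."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(risks, outcomes) if y == 0]
    sensitivity = sum(r >= cut_point for r in cases) / len(cases)
    specificity = sum(r < cut_point for r in noncases) / len(noncases)
    return sensitivity, specificity

# Hypothetical predicted risks and observed outcomes (1 = developed disease)
risks = [0.10, 0.20, 0.30, 0.40, 0.10, 0.20, 0.30, 0.40, 0.50, 0.50]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]

eo_ratio = expected_over_observed(risks, outcomes)
c_stat = c_statistic(risks, outcomes)
sens, spec = sensitivity_specificity(risks, outcomes, cut_point=0.35)

print(f"E/O ratio:   {eo_ratio:.2f}")
print(f"c-statistic: {c_stat:.2f}")
print(f"Sensitivity: {sens:.0%}  Specificity: {spec:.0%}")
```

In this made-up group the predicted and observed numbers of cases agree, so calibration looks good, yet the scores of people who did and did not develop disease still overlap. That is the same contrast the text draws for the NCI tool: a well-calibrated model can discriminate only modestly among individuals.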
[Figure 4.3, Panels A and B. x-axis: 5-year risk of breast cancer diagnosis, from low risk to high risk, with cut point x; y-axis: Each group of women (%).]

Risk Stratification
As already mentioned, and as shown in Figure 4.2, risk stratification can be used to assess how well a risk prediction tool works and to determine whether adding a new risk factor improves the tool’s ability to classify people correctly into clinically meaningful risk groups. Better risk stratification improves a tool’s calibration. Risk stratification may not dramatically affect the tool’s discrimination ability. For example, examining Figure 4.2, the risk tool that included CRP correctly assigned 99% of 6,965 women to the lowest risk stratum (<5% CVD events over 10 years). The study found that CVD events
Review Questions
For questions 4.1–4.10, select the best answer.

4.1. In the mid-20th century, chest surgeons in Britain were impressed that they were operating on more men with lung cancer, most of whom were smoking. How might the surgeons’ impression that smoking was a risk factor for developing lung cancer have been wrong?
A. Smoking had become so common that more men would have a history of smoking, regardless of whether they were undergoing operations for lung cancer.
B. Lung cancer is an uncommon cancer, even among smokers.
C. Smoking confers a low risk of lung cancer.
D. There are other risk factors for lung cancer.

4.2. Risk factors are easier to recognize:
A. When exposure to a risk factor occurs a long time before the disease.
B. When exposure to the risk factor is associated with a new disease.
C. When the risk factor is a marker rather than a cause of disease.

4.3. Risk prediction models are useful for:
A. Predicting onset of disease
B. Diagnosing disease
C. Predicting prognosis
D. All of the above

4.4. Figure 4.2 shows:
A. The risk model incorporating CRP results assigned too many women to the intermediate risk strata.
B. The risk model incorporating CRP results predicts which individual women will develop CVD better than the risk model without CRP results.
C. The number of women developing CVD over 10 years is likely highest in the group with a risk of <5%.

4.5. A risk model for colon cancer estimates that one of your patients has a 2% chance of developing colorectal cancer in the next 5 years. In explaining this to your patient, which of the following statements is most correct?
A. Because colorectal cancer is the second most common non-skin cancer in men, he should be concerned about it.
B. The model shows that your patient will not develop colorectal cancer in the next 5 years.
C. The model shows that your patient is a member of a group of people in whom a very small number will develop colorectal cancer in the next 5 years.

4.6. In general, risk prediction tools are best at:
A. Predicting future disease in a given patient.
B. Predicting future disease in a group of patients.
C. Predicting which individuals will and will not develop disease.
4.7. When a risk factor is a marker for future disease:
A. The risk factor can help identify people at increased risk of developing the disease.
B. Removing the risk factor can help prevent the disease.
C. The risk factor is not confounding a true causal relationship.

4.8. A risk factor is generally least useful in:

C. Most women developing breast cancer over 5 years are at higher risk.
D. The risk model does not discriminate very well.

4.10. It is difficult for risk models to determine which individuals will and will not develop disease for all of the following reasons except:
REFERENCES
1. Vineis P, Alavanja M, Buffler P, et al. Tobacco and cancer: recent epidemiological evidence. J Natl Cancer Inst 2004;96:99–106.
2. Willinger M, Hoffman HJ, Wu KT, et al. Factors associated with the transition to nonprone sleep positions of infants in the United States: The National Infant Sleep Position Study. JAMA 1998;280:329–335.
3. Li DK, Petitti DB, Willinger M, et al. Infant sleeping position and the risk of sudden infant death syndrome in California, 1997–2000. Am J Epidemiol 2003;157:446–455.
4. Chen WY, Rosner B, Hankinson SE, et al. Moderate alcohol consumption during adult life, drinking patterns and breast cancer risk. JAMA 2011;306:1884–1890.
5. Humphrey LL, Rongwei F, Rogers K, et al. Homocysteine level and coronary heart disease incidence: a systematic review and meta-analysis. Mayo Clin Proc 2008;83:1203–1212.
6. Martí-Carvajal AJ, Solà I, Lathyris D, et al. Homocysteine lowering interventions for preventing cardiovascular events. Cochrane Database Syst Rev 2009;4:CD006612.
7. Buckley DI, Fu R, Freeman M, et al. C-reactive protein as a risk factor for coronary heart disease: a systematic review and meta-analyses for the U.S. Preventive Services Task Force. Ann Intern Med 2009;151:483–495.
8. Ridker PM, Hennekens CH, Buring JE, et al. C-reactive protein and other markers of inflammation in the prediction of cardiovascular disease in women. N Engl J Med 2000;342:836–843.
9. Rockhill B, Spiegelman D, Byrne C, et al. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst 2001;93:358–366.
10. Wald NJ, Hackshaw AK, Frost CD. When can a risk factor be used as a worthwhile screening test? BMJ 1999;319:1562–1565.
11. Rose G. Sick individuals and sick populations. Int J Epidemiol 1985;14:32–38.
12. Han JH, Lindsell CJ, Storrow AB, et al. The role of cardiac risk factor burden in diagnosing acute coronary syndromes in the emergency department setting. Ann Emerg Med 2007;49:145–152.
13. Panju AA, Hemmelgarn BR, Guyatt GH, et al. Is this patient having a myocardial infarction? The rational clinical examination. JAMA 1998;280:1256–1263.
14. Piccart-Gebhart MJ, Procter M, Leyland-Jones B, et al. Trastuzumab after adjuvant chemotherapy in HER2-positive breast cancer. N Engl J Med 2005;353:1659–1672.
15. Romond EH, Perez EA, Bryant J, et al. Trastuzumab plus adjuvant chemotherapy for operable HER2-positive breast cancer. N Engl J Med 2005;353:1673–1684.
Chapter 5
to impose possible risk factors on a group of healthy people for the purposes of scientific research. Second, most people would balk at having their diets and behaviors constrained by others for long periods of time. Finally, the experiment would have to go on for many years, which is difficult and expensive. As a result, it is usually necessary to study risk in less obtrusive ways.
Clinical studies in which the researcher gathers data by simply observing events as they happen, without playing an active part in what takes place, are called observational studies. Most studies of risk are observational studies and are either cohort studies, described in the rest of this chapter, or case-control studies, described in Chapter 6.

Cohorts
As defined in Chapter 2, the term cohort is used to describe a group of people who have something in common when they are first assembled and who are then observed for a period of time to see what happens to them. Table 5.1 lists some of the ways in which cohorts are used in clinical research. Whatever members of a cohort have in common, observations of them should fulfill three criteria if the observations are to provide sound information about risk of disease.
1. They do not have the disease (or outcome) in question at the time they are assembled.
2. They should be observed over a meaningful period of time in the natural history of the disease in question so that there will be sufficient time for the risk to be expressed. For example, if one wanted to learn whether neck irradiation during childhood results in thyroid neoplasms, a 5-year follow-up would not be a fair test of this hypothesis, because the usual time period between radiation exposure and the onset of disease is considerably longer.
3. All members of the cohort should be observed over the full period of follow-up or methods must be used to account for dropouts. To the extent that people drop out of the study and their reasons for dropping out are related in some way to the outcome, the information provided by an incomplete cohort can misrepresent the true state of affairs.

Table 5.1
Cohorts and Their Purposes
Characteristic in Common | To Assess Effect of | Example
Age | Age | Life expectancy for people age 70 (regardless of birth date)
Date of birth | Calendar time | Tuberculosis rates for people born in 1930
Exposure | Risk factor | Lung cancer in people who smoke
Disease | Prognosis | Survival rate for patients with brain cancer
Therapeutic intervention | Treatment | Improvement in survival for patients with Hodgkin lymphoma given combination chemotherapy
Preventive intervention | Prevention | Reduction in incidence of pneumonia after pneumococcal vaccination

Cohort Studies
The basic design of a cohort study is illustrated in Figure 5.1. A group of people (a cohort) is assembled, none of whom has experienced the outcome of interest, but all of whom could experience it. (For example, in a study of risk factors for endometrial cancer, each member of the cohort should have an intact uterus.) Upon entry into the study, people in the cohort are classified according to those characteristics (possible risk factors) that might be related to outcome. For each possible risk factor, members of the cohort are classified either as exposed (i.e., possessing the factor in question, such as hypertension) or unexposed. All the members of the cohort are then observed over time to see which of them experience the outcome, say, cardiovascular disease, and the rates of the outcome events are compared in the exposed and unexposed groups. It is then possible to see whether potential risk factors are related to subsequent outcome events. Other names for cohort studies are incidence studies, which emphasize that patients are followed over time; prospective studies, which imply the forward direction in which the patients are pursued; and longitudinal studies, which call attention to the basic measure of new disease events over time.
The following is a description of a classic cohort study that has made important contributions to our understanding of cardiovascular disease risk factors and to modern methods of conducting cohort studies.
Chapter 5: Risk: Exposure to Disease 63
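[Figure 5.1 diagram: a COHORT is divided into Exposed and Not exposed groups; each group is followed over Time to see whether disease develops (YES or NO).]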
Figure 5.1 ■ Design of a cohort study of risk. Persons without disease are divided into two
groups—those exposed to a risk factor and those not exposed. Both groups are followed over time to
determine what proportion of each group develops disease.
[Figure 5.2 diagram: timelines for a retrospective (historical) cohort and a prospective cohort, each marking when the cohort is assembled and the follow-up period.]
Figure 5.2 ■ Retrospective and prospective cohort studies. Prospective cohorts are as-
sembled in the present and followed forward into the future. In contrast, retrospective cohorts
are made by going back into the past and assembling the cohort, for example, from medical
records, then following the group forward to the present.
must report vaccinations to the government in order to receive payment); 316 children were diagnosed with autism, and another 422 with autistic-spectrum disorders. The frequency of autism among children who had been vaccinated was similar (in fact, slightly less) to that among children not receiving MMR vaccine. This, along with other studies, provided strong evidence against the suggestion that MMR vaccine causes autism. Subsequently, the original study leading to alarm was investigated for fraud and conflict of interest and was retracted by The Lancet in 2010 (5).
Case-Cohort Studies
Another method using computerized medical databases in cohort studies is the case-cohort design. Conceptually, it is a modification of the retrospective cohort design that takes advantage of the ability to determine the frequency of a given medical condition in a large group of people. In a case-cohort study, all exposed people in a cohort, but only a small random sample of unexposed people, are included in the study and followed for some outcome of interest. For efficiency, the group of unexposed people is “enriched” with all those who subsequently suffer the outcome of interest (i.e., become cases). The results are then adjusted to reflect the sampling fractions used to obtain the sample. This efficient approach to a cohort study requires that frequencies of outcomes be determined in the entire group of unexposed people; thus, the need for a large, computerized, medical database.
Example
Does prophylactic mastectomy protect women who are at increased risk for breast cancer? A case-cohort study was done to examine this question in six health maintenance organizations, all of which had computerized databases of diagnoses and surgical procedures on their members. Investigators identified all 276 women who underwent bilateral prophylactic mastectomy over a number of years and followed them forward over time to determine if they developed breast cancer. For the comparison group, the investigators randomly sampled a similar group of women not undergoing the procedure and enriched the sample with women who subsequently developed breast cancer. “Enrichment” was accomplished by knowing who among 666,800 eligible women developed breast cancer, through examination of the computerized database. The investigators randomly sampled about 1% of comparison women of a certain age who developed breast cancer,† but only about 0.01% of women who did not. Adjustments for the sampling fractions were then made during the analysis. The results showed that bilateral prophylactic mastectomy was associated with a 99% reduction in breast cancer among women at higher risk (6).
† Strictly speaking, the study was a modification of a standard case-cohort design that would have included all cases, not just 1%, of 26,800 breast cancers that developed in the comparison group of women not undergoing prophylactic mastectomy. However, because breast cancer occurs commonly, a random sample of the group sufficed.
Advantages and Disadvantages of Cohort Studies
Well-conducted cohort studies of risk, regardless of type, are the best available substitutes for a true experiment when experimentation is not possible. They follow the same logic as a clinical trial and allow measurement of exposure to a possible risk factor while avoiding any possibility of bias that might occur if exposure were determined after the outcome was already known. The most important scientific disadvantage of cohort studies (in fact, all observational studies) is that they are subject to a great many more potential biases than are experiments. People who are exposed to a certain risk factor in the natural course of events are likely to differ in a great many ways from a comparison group of people not exposed to the factor. If some of these other differences are also related to the disease in question, they could confound any association observed between the putative risk factor and the disease.
The uses, strengths, and limitations of the different types of cohort studies are summarized in Table 5.2. Several of the advantages and disadvantages apply regardless of type. However, the potential
66 Clinical Epidemiology: The Essentials
Table 5.2
Advantages and Disadvantages of Cohort Studies
All Cohort Study Types
Advantages:
- The only way of establishing incidence (i.e., absolute risk) directly
- Follows the same logic as the clinical question: If persons are exposed, do they get the disease?
- Exposure can be elicited without the bias that might occur if outcome were known before documentation of exposure
- Can assess the relationship between exposure and many diseases
Disadvantages:
- Susceptible to confounding and other biases
Prospective Cohort Studies
Advantages:
- Can study a wide range of possible risk factors
- Can collect lifestyle and demographic data not available in most medical records
- Can set up standardized ways of measuring exposure and degree of exposure to risk factors
Disadvantages:
- Inefficient because many more subjects must be enrolled than experience the event of interest; therefore, cannot be used for rare diseases
- Expensive because of resources necessary to study many people over time
- Results not available for a long time
- Assesses the relationship between disease and exposure to only relatively few factors (i.e., those recorded at the outset of the study)
Retrospective (Historical) Cohort Studies
Advantages:
- More efficient than prospective cohort studies because data have already been collected for another purpose (i.e., during patient care or for a registry)
- Cheaper than prospective cohort studies because resources not necessary to follow many people over time
- Faster than prospective cohort studies because patient outcomes have already occurred
Disadvantages:
- Range of possible risk factors that can be studied is narrower than that possible with prospective cohort studies
- Cannot examine patient characteristics not available in the data set used
- Measurement of exposure and degree of exposure may not be standardized
Case-cohort Studies
Advantages:
- All advantages of retrospective cohort studies apply
- Even more efficient than retrospective cohort studies because only a sample of unexposed group is analyzed
Disadvantages:
- All disadvantages of retrospective cohort studies apply
- Difficult for readers to understand weighting procedures used in the analysis
for difficulties with the quality of data is different for the three. In prospective studies, data can be collected specifically for the purposes of the study and with full anticipation of what is needed. It is thereby possible to avoid measurement biases and some of the confounders that might undermine the accuracy of the results. However, data for historical cohorts are usually gathered for other purposes, often as part of medical records for patient care. Except for carefully selected questions, such as the relationship between vaccination and autism, the data in historical cohort studies may not be of sufficient quality for rigorous research.
Prospective cohort studies can also collect data on lifestyle and other characteristics that might influence the results, and they can do so in standard ways. Many of these characteristics are not routinely available in retrospective and case-cohort studies, and those that are usually are not collected in standard ways.
Table 5.3
Measures of Effect
Table 5.4
Calculating Measures of Effect: Cigarette Smoking and Death from Lung Cancer in Men (a)
Simple Risks
Death rate (absolute risk or incidence) from lung cancer in smokers | 341.3/100,000/yr
Death rate (absolute risk or incidence) from lung cancer in non-smokers | 14.7/100,000/yr
Prevalence of cigarette smoking | 32.1%
Lung cancer mortality rate in population | 119.4/100,000/yr
Compared Risks
Attributable risk = 341.3/100,000/yr − 14.7/100,000/yr = 326.6/100,000/yr
Relative risk = 341.3/100,000/yr ÷ 14.7/100,000/yr = 23.2
Population-attributable risk = 326.6/100,000/yr × 0.321 = 104.8/100,000/yr
Population-attributable fraction = 104.8/100,000/yr ÷ 119.4/100,000/yr = 0.88
(a) Data from Thun MJ, Day-Lally CA, Calle EE, et al. Excess mortality among cigarette smokers: Changes in a 20-year interval. Am J Public Health 1995;85:1223–1230.
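The arithmetic in Table 5.4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `measures_of_effect` and its parameter names are assumptions made for this example, not part of the text, and the inputs are the simple risks listed in the table.

```python
def measures_of_effect(rate_exposed, rate_unexposed, prevalence, rate_population):
    """Compute the four compared-risk measures from simple risks.

    All rates must share one unit (here, deaths per 100,000 per year);
    prevalence is a proportion between 0 and 1.
    """
    attributable_risk = rate_exposed - rate_unexposed
    relative_risk = rate_exposed / rate_unexposed
    population_attributable_risk = attributable_risk * prevalence
    population_attributable_fraction = population_attributable_risk / rate_population
    return {
        "attributable_risk": attributable_risk,
        "relative_risk": relative_risk,
        "population_attributable_risk": population_attributable_risk,
        "population_attributable_fraction": population_attributable_fraction,
    }


# Simple risks as listed in Table 5.4 (hypothetical variable name).
table_5_4 = measures_of_effect(
    rate_exposed=341.3,     # lung cancer death rate in smokers
    rate_unexposed=14.7,    # lung cancer death rate in non-smokers
    prevalence=0.321,       # prevalence of cigarette smoking
    rate_population=119.4,  # lung cancer mortality rate in the population
)

for name, value in table_5_4.items():
    print(f"{name}: {value:.2f}")
```

Note that the two population measures depend on the prevalence of exposure, whereas attributable and relative risk use only the two incidence rates.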
100,000 (3 to 4 lung cancer deaths per 1,000 smokers per year).
Attributable Risk
One might ask, “What is the additional risk (incidence) of disease following exposure, over and above that experienced by people who are not exposed?” The answer is expressed as attributable risk, the absolute risk (or incidence) of disease in exposed persons minus the absolute risk in non-exposed persons. In Table 5.4, the attributable risk of lung cancer death in smokers is calculated as 326.6 per 100,000 per year. Attributable risk is the additional incidence of disease related to exposure, taking into account the background incidence of disease from other causes. Note that this way of comparing rates implies that the risk factor is a cause and not just a marker. Because of the way it is calculated, attributable risk is also called risk difference, the difference between two absolute risks.
ratios, discussed in Chapter 6) is the most commonly reported result in studies of risk, not only because of its computational convenience but also because it is a common metric in studies with similar risk factors but with different baseline incidence rates. Because relative risk indicates the strength of the association between exposure and disease, it is a useful measure of effect for studies of disease etiology.
Interpreting Attributable and Relative Risk
Although attributable and relative risk are calculated from the same two components (the incidence, or absolute risk, of an outcome in an exposed and an unexposed group), the resulting size of the risk may appear to be quite different depending on whether attributable or relative risk is used.
Table 5.5
Comparing Relative Risk and Attributable Risk in the Relationship of Bone Mineral
Density (BMD) T-scores, Fractures, and Age
[Figure 5.3 diagram: panel A, excess coronary heart disease attributable to elevated blood pressure; panel B, prevalence of elevated blood pressure at various levels (% of population); panel C, excess coronary heart disease (%).]
in the population (population-attributable fraction). Note how important the prevalence of the risk factor (smoking) is to these calculations. As smoking rates fall, the fraction of lung cancer due to smoking also falls.
As discussed in Chapter 4, if a relatively weak risk factor is very prevalent in a community, it could account for more disease than a very strong, but rare, risk factor. Figure 5.3 illustrates this for hypertension and the development of coronary heart disease. Figure 5.3A shows the attributable (excess) risk of coronary heart disease according to various levels of hypertension among a group of about 2,500 men followed for 10 years. Risk increased with increasing blood pressure. However, few men had very high blood pressure (Fig. 5.3B). As a result, the highest level of hypertension contributed only about a quarter of excess coronary heart disease in the population (Fig. 5.3C). Paradoxically, then, physicians could save more lives with effective treatment of lower, rather than higher, levels of hypertension. This fact, so counterintuitive to clinical thinking, has been termed “the prevention paradox” (8).
Measures of population risk are less frequently encountered in the clinical literature than are measures of absolute, attributable, and relative risks, but a particular clinical practice is as much a population for the doctor as is a community for health policymakers. In addition, how the prevalence of exposure affects community risk can be important in the care of individual patients. For instance, when patients cannot give a history or when exposure is difficult for them to recognize, physicians depend on the usual prevalence of exposure to estimate the likelihood of various diseases. When considering treatable causes of cirrhosis in a North American patient, for
[Figure 5.4 diagram: data sources (this study, other studies) and confounding variables (saturated fats, trans fatty acids, smoking, exercise).]
Figure 5.4 ■ Example of confounding. The relationship between folate intake and incidence of stroke was confounded by several cardiovascular risk and protective factors.
between groups. Controlling is a general term for any process aimed at removing the effects of extraneous variables while examining the independent effects of individual variables. A variety of methods can be applied during the design or analysis of research (summarized in Table 5.6 and described in the following text). One or more of these strategies should be applied in any observational study that attempts to describe the effect of one variable independent of other variables that might affect the outcome. The
Randomization
The best way to balance all extraneous variables between groups is to randomly assign patients to groups so that each patient has an equal chance of falling into the exposed or unexposed group (see Chapter 9). A special feature of randomization is that it balances not only variables known to affect outcome and included in the study, but also unknown or unmeasured confounders. Unfortunately, it is usually not possible to study risk or prognostic factors with randomized trials.
Restriction
Patients who are enrolled in a study can be confined to only those possessing a narrow range of characteristics, a strategy called restriction. When this is done, certain characteristics can be made similar in the groups being compared. For example, the effect of prior cardiovascular disease on prognosis after acute myocardial infarction could be studied in patients who had no history of cigarette smoking or hypertension. However, this approach is limiting. Although restriction on entry to a study can certainly produce homogeneous groups of patients, it does so at the expense of generalizability. In the course of excluding
Table 5.6
Methods for Controlling Confounding
Phase of Study
Table 5.7
Example of Stratification: Hypothetical Death Rates after Coronary Bypass Surgery in Two
Hospitals, Stratified by Preoperative Risk
Hospital A Hospital B
Preoperative Risk Patients Deaths Rate (%) Patients Deaths Rate (%)
High 500 30 6 400 24 6
Medium 400 16 4 800 32 4
Low 300 2 0.67 1,200 8 0.67
Total 1,200 48 4 2,400 64 2.7
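The comparison in Table 5.7 can be checked with a short Python sketch. The data structure and the helper names (`hospitals`, `crude_rate`, `stratum_rates`, `high_risk_share`) are assumptions made for this illustration; the counts are the hypothetical ones in the table.

```python
# (patients, deaths) per preoperative risk stratum, copied from Table 5.7.
hospitals = {
    "A": {"high": (500, 30), "medium": (400, 16), "low": (300, 2)},
    "B": {"high": (400, 24), "medium": (800, 32), "low": (1200, 8)},
}


def crude_rate(strata):
    """Overall death rate, ignoring preoperative risk."""
    patients = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / patients


def stratum_rates(strata):
    """Death rate within each risk stratum."""
    return {name: d / n for name, (n, d) in strata.items()}


def high_risk_share(strata):
    """Proportion of a hospital's patients who were high risk."""
    patients = sum(n for n, _ in strata.values())
    return strata["high"][0] / patients


for name, strata in hospitals.items():
    rates = ", ".join(f"{s}: {r:.2%}" for s, r in stratum_rates(strata).items())
    print(f"Hospital {name}: crude {crude_rate(strata):.1%}, "
          f"high-risk share {high_risk_share(strata):.0%}, by stratum ({rates})")
```

The crude rates differ while the stratum-specific rates agree, which is the pattern expected when case mix, rather than hospital care, drives the overall difference.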
potential subjects, cohorts may no longer be representative of most patients with the condition. Also, after restriction, it is no longer possible, in that study, to learn anything more about the effects of excluded variables.
Matching
Matching is another way of making patients in two groups similar. In its simplest form, for each patient in the exposure group, one or more patients with the same characteristics (except for the factor of interest) would be selected for a comparison group. Matching is typically done for variables that are so strongly related to outcome that investigators want to be sure they are not different in the groups being compared. Often, patients are matched for age and sex because these variables are strongly related to risk or prognosis for many diseases, but matching for other variables, such as stage or severity of disease and prior treatments, may also be useful.
Although matching is commonly done and can be very useful, it has limitations. Matching controls bias only for those variables involved in the match. Also, it is usually not possible to match for more than a few variables because of practical difficulties in finding patients who meet all of the matching criteria. Moreover, if categories for matching are relatively crude, there may be room for substantial differences between matched groups. For example, if women in a study of risk for birth of a child with Down syndrome were matched for maternal age within 10 years, there could be a nearly 10-fold difference in frequency related to age if most of the women in one group were 30 years old and most in the other 39 years old. Finally, as with restriction, once one matches on a variable, its effects on outcomes can no longer be evaluated in the study. For these reasons, although matching may be done for a few characteristics that are especially strongly related to outcome, investigators rely on other ways of controlling for bias as well.
Stratification
With stratification, data are analyzed and results presented according to subgroups of patients, or strata, of similar risk or prognosis (other than the exposure of interest). An example of this approach is the analysis of differences in hospital mortality for a common surgical procedure, coronary bypass surgery (Table 5.7). This is especially relevant today because of several high-profile examples of “report cards” for doctors and hospitals, and the concern that the reported differences may be related to patient rather than surgeon or hospital characteristics.
Suppose we want to compare the operative mortality rates for coronary bypass surgery at Hospitals A and B. Overall, Hospital A noted 48 deaths in 1,200 bypass operations (4%), and Hospital B experienced 64 deaths in 2,400 operations (2.7%).
The crude rates suggest that Hospital B is superior. But is it really superior if everything else is equal? Perhaps the preoperative risk among patients in Hospital A was higher than in Hospital B and that, rather than hospital care, accounted for the difference in death rates. To see if this possibility accounts for the observed difference in death rates, patients in each of these hospitals are grouped into strata of similar underlying preoperative risk based on age, prior myocardial function, extent of occlusive disease, and other characteristics. Then the operative mortality rates within each stratum of risk are compared.
Table 5.7 shows that when patients are divided by preoperative risk, the operative mortality rates in each risk stratum are identical in the two hospitals: 6% in high-risk patients, 4% in medium-risk patients, and 0.67% in low-risk patients. The crude rates were misleading because of important differences in the risk
characteristics of the patients treated at the two hospitals: 42% of Hospital A’s patients and only 17% of Hospital B’s patients were high risk.
An advantage of stratification is that it is a relatively transparent way of recognizing and controlling for bias.
Standardization
If an extraneous variable is especially strongly related to outcomes, two rates can be compared without bias related to this variable if they are adjusted to equalize the weight given to that variable. This process, called standardization (or adjustment), shows what the overall rate would be for each group if strata-specific rates were applied to a population made up of similar proportions of people in each stratum.
To illustrate this process, suppose the operative mortality in Hospitals A and B can be adjusted to a common distribution of risk groups by giving each risk stratum the same weight in the two hospitals. Without adjustment, the risk strata receive different weights in the two hospitals. The mortality rate of 6% for high-risk patients receives a weight of 500/1,200 in Hospital A and a much lower weight of 400/2,400 in Hospital B. The other risk strata were also weighted differently in the two hospitals. The result is a crude rate for Hospital A, which is the sum of the rate in each stratum times its weight: (500/1,200 × 0.06) + (400/1,200 × 0.04) + (300/1,200 × 0.0067) = 0.04. Similarly, the crude rate for Hospital B is (400/2,400 × 0.06) + (800/2,400 × 0.04) + (1,200/2,400 × 0.0067) = 0.027.
If equal weights were used when comparing the two hospitals, the comparison would be fair (free of the effect of different proportions in the various risk groups). The choice of weights does not matter as long as it is the same in the two hospitals. Weights could be based on those existing in either of the hospitals or any reference population. For example, if each stratum were weighted 1/3, then the standardized rate for Hospital A = (1/3 × 0.06) + (1/3 × 0.04) + (1/3 × 0.0067) = 0.036, which is exactly the same as the standardized rate for Hospital B. The consequence of giving equal weight to strata in each hospital is to remove the apparent excess risk of Hospital A.
Standardization is commonly used in relatively crude comparisons to adjust for a single variable such as age that is obviously different in groups being compared. For example, the crude results of the folate/stroke example were adjusted for age, as discussed earlier. Standardization is less useful as a stand-alone strategy to control for confounding when multiple variables need to be considered.
Multivariable Adjustment
In most clinical situations, many variables act together to produce effects. The relationships among these variables are complex. They may be related to one another as well as to the outcome of interest. The effect of one might be modified by the presence of others, and the joint effects of two or more might be greater than their individual effects taken together.
Multivariable analysis makes it possible to consider the effects of many variables simultaneously. Other terms for this approach include mathematical modeling and multivariable adjustment. Modeling is used to adjust (control) for the effects of many variables simultaneously to determine the independent effects of one. This method also can select, from a large set of variables, those that contribute independently to the overall variation in outcome. Modeling can also arrange variables in order of the strength of their contribution. There are several kinds of prototypic models, according to the design and data in the study. Cohort and case-control studies typically rely on logistic regression, which is used specifically for dichotomous outcomes. A Cox proportional hazards model is used when the outcome is the time to an event, as in survival analyses (see Chapter 7).
Multivariable analysis of observational studies is the only feasible way of controlling for many variables simultaneously during the analysis phase of a study. Randomization also controls for multiple variables, but during the design and conduct phases of a study. Matching can account for only a few variables at a time, and stratified analyses of many variables run the risk of having too few patients in some strata. The disadvantage of modeling is that for most of us it is a “black box,” making it difficult to recognize where the method might be misleading. At its best, modeling is used in addition to, not in place of, matching and stratified analysis.
Overall Strategy for Control of Confounding
Except for randomization, all ways of dealing with extraneous differences between groups share a limitation: They are effective only for those variables that are singled out for consideration. They do not deal with risk or prognostic factors that are not known at the time of the study or those that are known but not taken into account.
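The standardization arithmetic for Hospitals A and B, described in the Standardization section, can be sketched in Python as a weighted average of stratum-specific rates. The names `weighted_rate`, `stratum_rate`, `case_mix`, and `equal_weights` are assumptions made for this example; the rates and patient counts are the hypothetical ones used in the text.

```python
# Stratum-specific operative mortality, identical in both hospitals (Table 5.7).
stratum_rate = {"high": 0.06, "medium": 0.04, "low": 0.0067}

# Number of patients per stratum: each hospital's own case mix.
case_mix = {
    "A": {"high": 500, "medium": 400, "low": 300},
    "B": {"high": 400, "medium": 800, "low": 1200},
}


def weighted_rate(rates, weights):
    """Sum of each stratum rate times its share of the total weight."""
    total = sum(weights.values())
    return sum(rates[s] * weights[s] / total for s in rates)


# Crude rates: each hospital is weighted by its own case mix.
crude_a = weighted_rate(stratum_rate, case_mix["A"])
crude_b = weighted_rate(stratum_rate, case_mix["B"])

# Standardized rates: the same weights (here 1/3 each) for both hospitals.
equal_weights = {"high": 1, "medium": 1, "low": 1}
standardized_a = weighted_rate(stratum_rate, equal_weights)
standardized_b = weighted_rate(stratum_rate, equal_weights)

print(f"crude: A = {crude_a:.3f}, B = {crude_b:.3f}")
print(f"standardized: A = {standardized_a:.4f}, B = {standardized_b:.4f}")
```

Any common set of weights would serve, for example Hospital A's case mix applied to both hospitals; what matters is that the two hospitals are weighted identically.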
OBSERVATIONAL STUDIES AND CAUSE
Review Questions
For question 5.1, select the best answer.
5.1. Which of the following statements is not correct for both prospective and retrospective cohort studies?
A. They measure incidence of disease directly.
B. They allow assessment of possible associations between exposure and many diseases.
C. They allow investigators to decide beforehand what data to collect.
D. They avoid bias that might occur if measurement of exposure is made after the outcome of interest is known.
Questions 5.2–5.4 are based on the following example:
A study was done examining the relationship of smoking, stroke, and age (13). The 12-year incidence per 1,000 persons (absolute risk) of stroke according to age and smoking status was:
Age | Non-smokers | Smokers
45–49 | 7.4 | 29.7
65–69 | 80.2 | 110.4
5.2. What was the relative risk of stroke of smokers compared to non-smokers in their 40s?
A. 1.4
B. 4.0
C. 22.3
D. 30.2
E. 72.8
F. 80.7
5.3. What was the attributable risk per 1,000 people of stroke among smokers compared to non-smokers in their 60s?
A. 1.4
B. 4.0
C. 22.3
D. 30.2
E. 72.8
F. 80.7
5.4. Which of the following statements about the study results is incorrect?
A. To calculate population-attributable risk of smoking among people in their 60s, additional data are needed.
B. More cases of stroke due to smoking occurred in people in their 60s than in their 40s.
C. When relative risk is calculated, the results reflect information about the incidence in exposed and unexposed persons, whereas the results for attributable risk do not.
D. The calculated relative risk is a stronger argument for smoking as cause of stroke for persons in their 40s than the calculated risk for persons in their 60s.
E. Depending on the question asked, age could be considered either a confounding variable or an effect modifier in the study.
Questions 5.5–5.11 are based on the following example:
Deep venous thrombosis (DVT) is a serious condition that occasionally can lead to pulmonary embolism and death (14). The incidence of DVT is increased by several genetic and environmental factors, including oral contraceptives (OCs) and a genetic mutation, factor V Leiden. These two risk factors, OCs and factor V Leiden, interact. Heterozygotes for factor V Leiden have 4 to 10 times the risk of DVT of the general population. In women without the genetic mutation, incidence of DVT rises from about 0.8/10,000 women/yr among those not on OCs to 3.0/10,000 women/yr for those taking the pill. The baseline incidence of DVT in heterozygotes for factor V Leiden is 5.7/10,000 women/yr, rising to 28.5/10,000 women/yr among those taking OCs. Mutations for factor V Leiden occur in about 5% of whites but are absent in Africans and Asians.
For questions 5.5–5.10, select the one best answer.
5.5. What is the absolute risk of DVT in women who do not have the mutation and do not take OCs?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr
5.6. What is the attributable risk of DVT for women taking OCs who do not carry the mutation for factor V Leiden compared to those not taking OCs and not carrying the mutation?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr
5.7. What is the attributable risk of DVT for women taking OCs who carry factor V Leiden compared to women on OCs but not carrying the mutation?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr
5.8. In a population with 100,000 white women, all of whom take OCs, what is the population-attributable risk for DVT in women who are heterozygous for factor V Leiden?
A. 0.8/10,000/yr
B. 1.3/10,000/yr
C. 2.2/10,000/yr
D. 9.5/10,000/yr
E. 25.5/10,000/yr
5.9. What is the relative risk of DVT in women taking OCs and heterozygous for factor V Leiden compared to women who take OCs but do not carry the mutation?
A. 3.8
B. 7.1
C. 9.5
D. 28.5
E. 35.6
5.10. What is the relative risk of DVT in women taking OCs and without the mutation compared to women without the mutation who are not taking OCs?
A. 3.8
B. 7.1
C. 9.5
D. 28.5
E. 35.6
5.11. Given the information in this study and calculations for questions 5.5–5.10, which of the following statements about risk of developing DVT is incorrect?
A. Factor V Leiden modifies the effect of OCs on the annual risk of developing DVT by increasing risk from about 3 per 10,000 to about 30 per 10,000 women.
B. Being heterozygous for factor V Leiden confers about twice the risk for DVT as taking OCs.
C. Women heterozygous for factor V Leiden should be advised against taking OCs because of the high relative risk for DVT among such women.
For question 5.12, select the best answer.
5.12. In a study to determine if regularly taking aspirin prevents cardiovascular death, aspirin users died as often as non-users. However, aspirin users were sicker and had illnesses more likely to be treated with aspirin. Which of the following methods is the best way to account for the propensity of people to take aspirin?
A. Calculate the absolute risk of cardiovascular death in the two groups and the risk difference attributable to using aspirin.
B. Create subgroups of aspirin users and non-users with similar indications for using the medication and compare death rates among the subgroups.
C. For each person using aspirin, match a non-user on age, sex, and comorbidity and compare death rates in the two groups.
REFERENCES
1. Dawber TR. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Cambridge, MA: Harvard University Press; 1980.
2. Kannel WB, Feinleib M, McNamara PM, et al. An investigation of coronary heart disease in families. The Framingham Offspring Study. Am J Epidemiol 1979;110:281–290.
3. Wen CP, Wai JPM, Tsai MK, et al. Minimum amount of physical activity for reduced mortality and extended life expectancy: a prospective cohort study. Lancet 2011;378:1244–1253.
4. Madsen KM, Hviid A, Vestergaard M, et al. A population-based study of measles, mumps, and rubella vaccination and autism. N Engl J Med 2002;347:1477–1482.
5. The Editors of The Lancet. Retraction—ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet 2010;375:445.
6. Geiger AM, Yu O, Herrinton LJ, et al. (on behalf of the CRN PROTECTS Group). A case-cohort study of bilateral prophylactic mastectomy efficacy in community practices. Am J Epidemiol 2004;159:S99.
7. Siris ES, Brenneman SK, Barrett-Connor E, et al. The effect of age and bone mineral density on the absolute, excess, and relative risk of fracture in postmenopausal women aged 55–99: results from the National Osteoporosis Risk Assessment (NORA). Osteoporosis Int 2006;17:565–574.
8. Hofman A, Vandenbroucke JP. Geoffrey Rose’s big idea. Br Med J 1992;305:1519–1520.
9. Reulen RC, Winter DL, Frobisher C, et al. Long-term cause-specific mortality among survivors of childhood cancers. JAMA 2010;304:172–179.
10. Al-Delaimy WK, Rexrode KM, Hu FB, et al. Folate intake and risk of stroke among women. Stroke 2004;35:1259–1263.
11. Rothman KJ. Modern Epidemiology. Boston: Little Brown and Co.; 1986.
12. Patrono C, Rodriguez LAG, Landolfi R, et al. Low-dose aspirin for the prevention of atherothrombosis. N Engl J Med 2005;353:2373–2383.
13. Psaty BM, Koepsell TD, Manolio TA, et al. Risk ratios and risk differences in estimating the effect of risk factors for cardiovascular disease in the elderly. J Clin Epidemiol 1990;43:961–970.
14. Vandenbroucke JP, Rosing J, Bloemenkemp KW, et al. Oral contraceptives and the risk of venous thrombosis. N Engl J Med 2001;344:1527–1535.
Chapter 6
Chapter 6: Risk: From Disease to Exposure 81
[Figure diagram: CASES (have disease) and CONTROLS (do not have disease), each divided into YES and NO, with arrows labeled Time and Research.]
[Figure 6.2 diagram: skiers and snowboarders at 8 major Norwegian ski resorts; those with a head injury and those with no head injury are each classified as used helmet or did not use helmet; controlled for age, sex, nationality, skill level, equipment used, ski school attendance, and rented or owned equipment; the comparison yields an estimate of relative risk.]
Figure 6.2 ■ A case-control study of helmet use and head injuries among skiers and snowboarders. (Summary
of Sulheim S, Holme I, Ekeland A, et al. Helmet use and risk of head injuries in alpine skiers and snowboarders. JAMA
2006;295:919–924.)
The word control comes up in other situations, too. It is used in experimental studies to refer to people, animals, or biologic materials that have not been exposed to the study intervention. In diagnostic laboratories, “controls” refer to specimens that have a known amount of the material being tested for. As a verb, control is used to describe the process of taking into account, neutralizing, or subtracting the effects of variables that are extraneous to the main research question. Here, the term is used in the context of case-control studies to refer to people who do not have the disease or outcome under study.

DESIGN OF CASE-CONTROL STUDIES

The validity of case-control studies depends on the care with which cases and controls are selected, how well exposure is measured, and how completely potentially confounding variables are controlled.

Selecting Cases

The cases in case-control research should be new (incident) cases, not existing (prevalent) ones. The reasons are based on the concepts discussed in Chapter 2. The prevalence of a disease at a point in time is a function of both the incidence and duration of that disease. Duration is in turn determined by the rate at which patients leave the disease state (because of recovery or death) or persist in it (because of a slow course or successful palliation). It follows from these relationships that risk factors for prevalent disease may be risk factors for incidence, duration, or both; the relative contributions of the two cannot be determined. For example, if prevalent cases were studied, an exposure that caused a rapidly lethal form of the disease would result in fewer cases that were exposed, reducing relative risk and thereby suggesting that exposure is less harmful than it really is or even that it is protective.

At best, a case-control study should include all the cases or a representative sample of all cases that arise in a defined population. For example, the bisphosphonates study included all residents of Sweden in 2008 and the helmets study all skiers and snowboarders in eight major resorts in Norway (accounting for 55% of all ski runs in the country).

Some case-control studies, especially older ones, have identified cases in hospitals and referral centers where uncommon diseases are most likely to be found. This way of choosing cases is convenient, but it raises validity problems. These centers may attract particularly severe or atypical cases or those with unusual exposures, which is the wrong sample if the underlying research question in case-control studies is about ordinary occurrences of disease and exposures. Also, it is difficult in this situation to be confident that controls, however they are chosen, are truly similar to cases in all ways other than exposure, which is critical to the validity of this kind of study (see the Selecting Controls section). Fortunately, it is rarely necessary to take this scientific risk because there are many databases that make true population sampling possible.

However the cases might be identified, it should be possible for both them and controls to be exposed to the risk factor and to experience the outcome. For example, in a case-control study of exercise and sudden death, cases and controls would have to be equally able to exercise (if they chose to) to be eligible.

It goes without saying that diagnosis should be rigorously confirmed for cases (and excluded for controls), and the criteria made explicit. In the bisphosphonates study, investigators agreed on explicit criteria for atypical fractures of the femur and reviewed all radiographs, not just reports of them, to classify fracture type. One investigator then reviewed a random sample of radiographs for a second time without knowing how each had been classified, and there was complete agreement between the original and the second classifications.

Selecting Controls

Above all, the validity of case-control studies depends on the comparability of cases and controls. To be comparable, cases and controls should be members of the same base population and have an equal opportunity of being exposed. The best approach to meeting these requirements is to ensure that controls are a random sample of all non-cases in the same population or cohort that produced the cases.

The Population Approach

Studies in which cases and controls are a complete or random sample of a defined population are called population-based case-control studies. In practice, most of these populations are dynamic (that is, continually changing, with people moving in and out of the population), as described in Chapter 2 (3). This might bias the result, especially if cases and controls are sampled over a long period of time and exposure is changing rapidly during this time. This concern can be laid to rest if there is evidence that population turnover is in fact so small as to have little effect on the study results or if cases and controls are matched on calendar time; that is, controls are selected on the same date as the onset of disease in the cases.
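
Matching on calendar time can be pictured as a small sampling routine. The sketch below is an illustration only, under stated assumptions: the roster, its field names (`entered`, `left`, `onset`), and the functions `eligible_controls` and `sample_matched_controls` are hypothetical and are not taken from any study described here. For each case, controls are drawn from people who were in the population and free of the disease on the date of that case's onset.

```python
import random
from datetime import date

# Hypothetical roster of a dynamic population (all values invented).
# "onset" is None for people who never develop the disease.
population = [
    {"id": 1, "entered": date(2008, 1, 1), "left": date(2008, 12, 31), "onset": date(2008, 6, 15)},
    {"id": 2, "entered": date(2008, 1, 1), "left": date(2008, 3, 31), "onset": None},
    {"id": 3, "entered": date(2008, 5, 1), "left": date(2008, 12, 31), "onset": None},
    {"id": 4, "entered": date(2008, 7, 1), "left": date(2008, 12, 31), "onset": None},
    {"id": 5, "entered": date(2008, 1, 1), "left": date(2008, 12, 31), "onset": date(2008, 9, 1)},
]


def eligible_controls(population, case):
    """Return people present in the population and disease-free on the case's onset date."""
    day = case["onset"]
    return [
        p for p in population
        if p["id"] != case["id"]
        and p["entered"] <= day <= p["left"]            # in the population that day
        and (p["onset"] is None or p["onset"] > day)    # not yet a case that day
    ]


def sample_matched_controls(population, controls_per_case=1, seed=0):
    """Draw up to `controls_per_case` controls for each case, matched on calendar date."""
    rng = random.Random(seed)
    matched = {}
    for case in (p for p in population if p["onset"] is not None):
        pool = eligible_controls(population, case)
        k = min(controls_per_case, len(pool))
        matched[case["id"]] = [p["id"] for p in rng.sample(pool, k)]
    return matched


if __name__ == "__main__":
    print(sample_matched_controls(population, controls_per_case=2))
```

In this sketch, a person who develops the disease later can still serve as a control for an earlier case. That is one common convention and is an assumption here, not something stated in the text.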
The Cohort Approach

Another way of ensuring that cases and controls are comparable is to draw them from the same cohort. In this situation, the study is said to be a nested case-control study (it is “nested” in the cohort).

In the era of large databases and powerful computers, why not just analyze cohort data as a cohort study rather than a case-control study? After all, the inefficiency of including many exposed members of the cohort, even though few of them will experience the outcome, could be overcome by computing power. The usual reason for case-control analyses of cohort data is that some of the study variables, especially some covariates, may not be available in the cohort database and, therefore, have to be gathered from other sources for each patient in the study. Obtaining the missing information from medical records, questionnaires, genetic analyses, and linkage to other databases can be very expensive and time-consuming. Therefore, there is a practical advantage to having to assemble this information only for cases and a sample of non-cases in the cohort, not every member of the cohort.

With nested case-control studies, there is an opportunity to obtain both a crude measure of incidence from a cohort analysis and a strong estimate of relative risk, one that takes into account a rich set of covariates, from a case-control analysis. With this information one has the full set of risks described in Chapter 5: absolute risk for exposed and non-exposed people, relative risk, attributable risk, and population risks.

The bisphosphonate example illustrates the advantages of complementary cohort and case-control analyses. A cohort analysis, taking only age into account, showed that the increase in absolute risk of atypical fractures related to bisphosphonate use was five cases per 10,000 patient-years. Collection of data on covariates was done by linking to other databases and was presumably too resource-intensive to be done on the entire national sample. With these data for cases and controls, a much more credible estimate of relative risk was possible in the case-control analysis. The estimate of relative risk of 33 from the case-control analysis was consistent with the crude relative risk from the cohort analysis (not accounting for potential confounders other than age), which was 47. Because of the two analyses, both cohort and case-control, the authors could point out that the relative risk of atypical fracture was large but the absolute risk was small.

Hospital and Community Controls

If population- or cohort-based sampling is not possible, a fallback position is to select controls in such a way that the selection seems to produce controls that are comparable to cases. For example, if cases are selected from a hospital ward, the controls might be selected from patients with different diseases, apparently unrelated to the exposure and disease of interest, in the same hospital. As pointed out earlier, for most risk factors and diseases, case-control studies in health care settings are more fallible than population- or cohort-based sampling because hospitalized patients are usually a biased sample of all people in the community, the people to whom the results should apply.

Another approach is to obtain controls from the community served by the hospital. However, many hospitals do not draw patients exclusively from the surrounding community; some people in the community go to other hospitals, and some people in other communities pass up their own neighborhood hospital to go to the study hospital. As a result, cases and controls may be systematically different in ways that distort the exposure-disease relationship.

Multiple Control Groups

If none of the available control groups seems ideal, one can see how choice of controls affects results by selecting several control groups with apparently complementary scientific strengths and weaknesses. Similar estimates of relative risk obtained using different control groups are evidence against bias because it is unlikely that the same biases would affect otherwise dissimilar groups in the same direction and to the same extent. If the estimates of relative risk are different, it is a signal that one or more are biased and the reasons need to be investigated.

Example

In the helmets and head injury example (2), the main control group was uninjured people skiing or snowboarding on the same hills on the same days, but one could imagine disadvantages to these controls, such as their not having similar risk-taking behavior to cases. To examine the effect of choice of control group on results, the investigators repeated the analyses with a different control group, skiers with other injuries. The estimated relative risk was similar (a reduction in risk of 55% rather than 60% with the original control group), suggesting that choice of control group did not substantially affect results.
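
The bisphosphonate figures discussed in this section show how a large relative risk can coexist with a small absolute risk. As a rough illustration only, the sketch below solves the two defining equations (relative risk = exposed rate / unexposed rate, and risk difference = exposed rate - unexposed rate) for the two rates. It assumes that the crude relative risk of 47 and the excess of 5 cases per 10,000 patient-years describe the same comparison; the function name `implied_rates` is hypothetical, and the resulting rates are arithmetic implications of those two rounded figures, not values reported by the study.

```python
def implied_rates(relative_risk, risk_difference):
    """Solve Ie / Iu = RR and Ie - Iu = RD for the exposed and unexposed rates."""
    if relative_risk == 1:
        raise ValueError("rates cannot be recovered when the relative risk is 1")
    unexposed = risk_difference / (relative_risk - 1)
    exposed = relative_risk * unexposed
    return exposed, unexposed


# Figures quoted in the text: crude relative risk 47, excess risk 5 per 10,000 patient-years.
exposed, unexposed = implied_rates(relative_risk=47, risk_difference=5)

# Both rates are per 10,000 patient-years and are implied, not reported, values.
print(f"exposed: {exposed:.2f}, unexposed: {unexposed:.2f}")
# prints "exposed: 5.11, unexposed: 0.11"
```

Under these assumptions, nearly all of the small absolute rate among exposed people is excess risk, which is why the relative risk is so large even though the event remains uncommon.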
Multiple Controls per Case

Having several control groups per case group should not be confused with having several controls for each case. If the number of cases is limited, as is often so with rare diseases, the study can provide more information if there is more than one control per case. More controls produce a gain in the ability to detect an increase or decrease in risk if it exists, a property of a study called “statistical power” (see Chapter 11). As a practical matter, the gain is worthwhile up to about three or four controls per case, after which little is gained by including even more controls.

Matching

If some characteristics seem especially strongly related to either exposure or disease, such that one would want to be sure that they occur similarly in cases and controls, they can be matched. With matching, for each case with a set of characteristics, the study includes one or more controls that possess the same characteristics. Researchers commonly match for age and sex, because these are often strongly related to both exposure and disease, but matching may extend beyond these demographic characteristics (e.g., to risk profile or disease severity) when other factors are known to be strongly associated with an exposure or outcome. Matching increases the useful information obtainable from a set of cases and controls by reducing differences between groups in determinants of disease other than the one being considered, thereby allowing a more powerful (sensitive) measure of association.

Sometimes, cases and controls are made comparable by umbrella matching, matching on a variable such as hospital or community that is a proxy for many other variables that could confound the exposure–disease relationship and would be difficult to measure one at a time, if that were possible at all. Examples of variables that might be captured under an umbrella include social disadvantage related to income, education, race, and ethnicity; propensity to seek health care or follow medical advice; and local patterns of health care.

Matching can be overdone, biasing study results. Overmatching can occur if investigators match on variables so closely related to exposure that exposure rates in cases and controls become more similar than they are in the population. The result is to make the observed estimate of relative risk closer to 1 (no effect). There are many reasons why the matching variable might be related to exposure. It may be part of the chain of events leading from exposure to disease. Other variables might be highly related to each other because they have similar root causes; education, income, race, and ethnicity tend to be related to each other, so matching on one will obscure effects of the others. Matching on diseases with the same treatment would result in overmatching for studies of the effects of that treatment. For example, in a study of non-steroidal anti-inflammatory drugs (NSAIDs) and renal failure, if cases and controls were matched for the presence of arthritic symptoms, which are commonly treated with NSAIDs, matched pairs would have an artificially similar history of NSAID use.

A disadvantage of matching is that once a variable is matched for, and so made similar in cases and controls, it is no longer possible to learn how it affects the exposure–disease relationship. Also, for many studies it is not possible to find matched controls for more than a few case characteristics. This can be overcome, to some extent, if the number of potential controls is huge or if the matching criteria are relaxed (e.g., by matching age within a 5-year range rather than the same year). In summary, matching is a useful way of controlling for confounding, but it can limit the questions that can be asked in the study and can cause rather than remove bias.

Measuring Exposure

The validity of case-control studies also depends on avoiding misclassification when measuring exposure. The safest approach is to depend on complete, accurate records that were collected before disease developed. Examples include pharmacy records for studies of prescription drug risks, surgical records for studies of surgical complications, and stored blood specimens for studies of risk related to biomolecular abnormalities. With such records, knowledge of disease status cannot bias reporting of exposure.

However, many important exposures can only be measured by asking cases and controls or their proxies about them. Among these are exercise, diet, and over-the-counter and recreational drug use. The following example illustrates how investigators can “harden” data from interviews that are inherently vulnerable to bias.

Example

What are the risk factors for suicide in China? Investigators studied 519 people who had committed suicide and 536 people who had died from other injuries (4). Both groups came from 23 geographically representative sites in China. Exposure was measured by interviews with family members and close associates. The
Multiple Exposures

Thus far, we have described case-control studies of a single, dichotomous exposure, but case-control studies are an efficient means of examining a far richer array of exposures: the effects of multiple exposures, various doses of the same exposure, and exposures that are early symptoms (not risk factors) of disease.

[...] bleeding, loss of appetite, increased urinary frequency, abdominal pain, rectal bleeding, and abdominal bloating. After excluding symptoms reported in the 180 days before diagnosis (to get a better estimate of “early” symptoms), three remained independently associated with ovarian cancer: abdominal distension, urinary frequency, and abdominal pain.
                 Cases     Noncases
Exposed            a          b         a + b
Not exposed        c          d         c + d
                 a + c      b + d
Figure 6.3 ■ Calculation of relative risk from a cohort study and odds
ratio (estimated relative risk) from a case-control study.
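
The two calculations named in Figure 6.3 can be sketched in a few lines of Python. This is an illustration only: the function names and both sets of counts are invented for the example and do not come from any study in this chapter. The cell labels a, b, c, and d follow the table in Figure 6.3. The two hypothetical cohorts have the same relative risk, which lets the sketch show how closely the odds ratio tracks relative risk when disease is rare and how it drifts away when disease is common.

```python
def relative_risk(a, b, c, d):
    """Incidence in the exposed, a/(a+b), divided by incidence in the unexposed, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc from the same 2 x 2 table."""
    return (a * d) / (b * c)


# Hypothetical cohorts (invented counts), both with a relative risk of 2.
rare = dict(a=20, b=9980, c=10, d=9990)      # incidence 0.2% vs 0.1%
common = dict(a=400, b=600, c=200, d=800)    # incidence 40% vs 20%

for label, table in (("rare", rare), ("common", common)):
    rr = relative_risk(**table)
    odds = odds_ratio(**table)
    print(f"{label}: relative risk {rr:.2f}, odds ratio {odds:.2f}")
# prints:
# rare: relative risk 2.00, odds ratio 2.00
# common: relative risk 2.00, odds ratio 2.67
```

In a real case-control study only the odds ratio could be computed, because the numbers of cases and controls are set by the investigators; the relative risk appears here only because the invented data are complete cohorts.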
[a/(a + b)] and unexposed [c/(c + d)] members of the cohort. It was also possible to calculate the relative risk:

$$\text{Relative risk} = \frac{\text{Incidence of disease in the exposed}}{\text{Incidence of disease in the unexposed}} = \frac{a/(a+b)}{c/(c+d)}$$

The case-control study, on the other hand, began with the selection of a group of cases of atypical fracture (a + c) and a group of controls with other fractures (b + d). There is no way of knowing disease rates because these groups are determined not by nature but by the investigators’ selection criteria. Therefore, it is not possible to compute the incidence of disease among people exposed and not exposed to bisphosphonates; and it is not possible to calculate a relative risk. What does have meaning, however, is the relative frequency of exposure to bisphosphonates among cases and controls.

One approach to comparing the frequency of exposure among cases and controls provides a measure of risk that is conceptually and mathematically similar to relative risk. The odds ratio is defined as the odds that a case is exposed divided by the odds that a control is exposed:

$$\text{Odds ratio} = \frac{\left(\dfrac{a}{a+c}\right) \Big/ \left(\dfrac{c}{a+c}\right)}{\left(\dfrac{b}{b+d}\right) \Big/ \left(\dfrac{d}{b+d}\right)} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

The meaning of the odds ratio is analogous to that of relative risk obtained from cohort studies. If the frequency of exposure is higher among cases, the odds ratio will exceed 1, indicating increased risk. The stronger the association is between the exposure and disease, the higher the odds ratio. Conversely, if the frequency of exposure is lower among cases, the odds ratio will be <1, indicating protection. Because of the similarity of the information conveyed by an odds ratio and the relative risk, and the meaning more readily attached to relative risk, odds ratios are often reported as estimated relative risks.

An odds ratio is approximately equal to a relative risk when the incidence of disease is low. To see this mathematically, look at the formula for relative risk in Figure 6.3. If the number of cases in the exposed group (a) is small relative to the number of non-cases in that group (b), then a/(a + b) is approximately equal to a/b. Similarly, if the number of cases in the non-exposed group (c) is small relative to non-cases in that group (d), then c/(c + d) is approximated by c/d. Then, relative risk = a/b divided by c/d, which simplifies to ad/bc, the odds ratio.

How low must the rates be for the odds ratio to be an accurate estimate of relative risk? The answer depends on the size of the relative risk (7). In general, bias in the estimate of relative risk becomes large enough to [...] outcomes become more frequent, the odds ratio tends to overestimate the relative risk when it is >1 and underestimate the relative risk when it is <1. Fortunately, most
Example

A large outbreak of gastroenteritis, with many cases complicated by hemolytic-uremic syndrome (a potentially fatal condition with acute [...]

This example also illustrates how case-control and cohort studies, laboratory studies of the responsible organism, and “shoe-leather” epidemiology during trace-back acted in concert to identify the underlying cause of the epidemic.
[Figure 6.4 plot. Vertical axis: number of cases (0 to 250). Horizontal axis: date of disease onset (May through July).]
Figure 6.4 ■ Epidemic curve of an outbreak of Shiga-toxin-producing Escherichia coli infection in Germany.
(Redrawn with permission from Frank C, Werber D, Cramer JP, et al. Epidemic profile of shiga-toxin-producing
Escherichia coli O104:H4 outbreak in Germany. N Engl J Med 2011;365:1771–1780.)
Review Questions
Read the following and select the best response.

6.1. In a case-control study of oral contraceptives and myocardial infarction (heart attack), exposure to birth control pills was abstracted from medical records at the time of the myocardial infarction. Results might be biased toward finding an association by all of the following except:
A. Physicians might have asked about birth control pill use more carefully in cases.
B. Having a myocardial infarction might have led to oral contraceptive use.
C. Physicians might have been more likely to record birth control use in cases.
D. Medical record abstractors might have looked for evidence of oral contraceptive use more carefully if they knew a patient had had a myocardial infarction.
E. Patients might have recalled exposure more readily when they had a heart attack.

6.2. Investigators in Europe did a case-control study, nested in a multicountry cohort of more than 520,000 participants, of vitamin D concentration and the risk of colon cancer. They studied 1,248 cases of incident colon cancer arising in the cohort and an equal number of controls, sampled from the same cohort and matched by age, sex, and study center. Vitamin D was measured in blood samples taken years before diagnosis. Vitamin D levels were lower in patients with colon cancer, independent of a rich array of potentially confounding variables. The study results could be described by any of the following except:
A. Vitamin D levels were associated with colorectal cancer.
B. Vitamin D deficiency was a risk factor for colorectal cancer.
C. Nesting the study in a large cohort was a strength of the study.
D. The results might have been confounded with unmeasured variables related to vitamin D levels and colorectal cancer.
E. Vitamin D deficiency was a cause of colorectal cancer.
6.3. Which of the following is the most direct result of a case-control study?
A. Prevalence
B. Risk difference
C. Relative risk
D. Incidence
E. Odds ratio

6.4. The epidemic curve for an acute infectious disease describes:
A. The usual incubation period for the causal agent
B. A comparison of illness over time in exposed versus non-exposed people
C. The onset of illness in cases over time
D. The duration of illness, on average, in affected individuals
E. The distribution of time from infection to first symptoms

6.5. Which of the following is the best reason for doing a case-control analysis of a cohort study?
A. Case-control studies are a feasible way of controlling for confounders not found in the cohort dataset.
B. Case-control studies can provide all the same information more easily.
C. Case-control studies can determine incidence of disease in exposed and non-exposed members of the cohort.
D. Case-control studies are in general stronger than cohort studies.

6.6. The best way to identify cases is to obtain them from:
A. A sample from the general (dynamic) population
B. Primary care physicians’ offices
C. A community
D. A cohort representative of the population
E. A hospital

6.7. What is the best reason to include multiple control groups in a case-control study?
A. To obtain a stronger estimate of relative risk
B. There are a limited number of cases and an ample number of potential controls.
C. To control for confounding
D. To increase the generalizability of the result
E. The main control group may be systematically different from cases (other than on the exposure of interest).

6.8. Case-control studies can be used to study all of the following except:
A. The early symptoms of stomach cancer
B. Risk factors for sudden infant death syndrome
C. The incidence of suicide in the adult population
D. The protective effect of aspirin
E. Modes of transmission of an infectious disease

6.9. In a case-control study of exercise and sudden cardiac death, matching would be useful:
A. To control for all potential confounding variables in the study
B. To make cases and controls similar to each other with respect to a few major characteristics
C. To make it possible to examine the effects of the matched variables on estimated relative risk
D. To test whether the right controls were chosen for the cases in the study
E. To increase the generalizability of the study

6.10. In a case-control study of whether prolonged air travel is a risk factor for venous thromboembolism, 60 out of 100 cases and 40 out of 100 controls had prolonged air travel. What was the crude odds ratio from this study?
A. 0.44
B. 1.5
C. 2.25
D. 3.0
E. Not possible to calculate

6.11. A population-based case-control study would be especially useful for studying:
A. The population attributable risk of disease
B. Multiple outcomes (diseases)
C. The incidence of rare diseases
D. The prevalence of disease
E. Risk factors for disease

6.12. The prevalence odds ratio of rheumatoid arthritis provides an estimate of:
A. The relative risk of arthritis
B. The attributable risk of arthritis
C. Risk factors for the duration of arthritis
D. The association between a patient characteristic and prevalence of arthritis
E. Risk factors for the incidence of arthritis
6.13. In an outbreak of acute gastroenteritis, a case-control study would be especially useful for identifying:
A. Characteristics of the people affected
B. The number of people affected over time
C. The microbe or toxin causing the outbreak
D. The mode of transmission
E. Where the causative agent originated

6.14. Sampling cases and controls from a defined population or cohort accomplishes which of the following?
A. It is the only way of including incident (new) cases of disease.
B. It avoids the need for inclusion and exclusion criteria.
C. It tends to include cases and controls that are similar to each other except for exposure.
D. It matches cases and controls on important variables.
E. It ensures that the results are generalizable.

6.15. Case-control studies would be useful for answering all of the following questions except:
A. Do cholesterol-lowering drugs prevent coronary heart disease?
B. Are complications more common with fiberoptic cholecystectomy than with conventional (open) surgery?
C. Is drinking alcohol a risk factor for breast cancer?
D. How often do complications occur after fiberoptic cholecystectomy?
E. How effective are antibiotics for otitis media?

6.16. In a case-control study of airplane flight and thrombophlebitis, all of the following conditions should be met for the odds ratio to be a reasonable estimate of relative risk except:
A. Controls were sampled from the same population as cases.
B. Cases and controls met the same inclusion and exclusion criteria.
C. Other variables that might be related to air travel and thrombosis were controlled for.
D. Cases and controls were equally susceptible to developing thrombophlebitis (e.g., were similar in mobility, weight, recent trauma, and previous VTE) other than for air travel.
E. The incidence of thrombophlebitis was more than 5/100.

Answers are in Appendix A.
REFERENCES
1. Schilcher J, Michaelsson K, Aspenberg P. Bisphosphonate use and atypical fractures of the femoral head. N Engl J Med 2011;364:1728–1737.
2. Sulheim S, Holme I, Ekeland A, et al. Helmet use and risk of head injuries in alpine skiers and snowboarders. JAMA 2006;295:919–924.
3. Knol MJ, Vandenbroucke JP, Scott P, et al. What do case-control studies estimate? Survey of methods and assumptions in published case-control research. Am J Epidemiol 2008;168:1073–1081.
4. Phillips MR, Yang G, Zhang Y, et al. Risk factors for suicide in China: a national case-control psychological autopsy study. Lancet 2002;360:1728–1736.
5. Psaty BM, Koepsell TD, LoGerfo JP, et al. Beta-blockers and primary prevention of coronary heart disease in patients with high blood pressure. JAMA 1989;261:2087–2094.
6. Hamilton W, Peters TJ, Bankhead C, et al. Risk of ovarian cancer in women with symptoms in primary care: population-based case-control study. BMJ 2009;339:b2998. doi:10.1136/bmj.b2998
7. Feinstein AR. The bias caused by high value of incidence for p1 in the odds ratio assumption that 1-p1 is approximately equal to 1. J Chron Dis 1986;39:485–487.
8. Frank C, Werber D, Cramer JP, et al. Epidemic profile of shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med 2011;365:1771–1780.
9. Buchholz U, Bernard H, Werber D, et al. German outbreak of Escherichia coli O104:H4 associated with sprouts. N Engl J Med 2011;365:1763–1770.
Chapter 7
Prognosis
He, who would rightly distinguish those that will survive or die, as well as those that will
be subject to disease a longer or shorter time, ought, from his knowledge and attention,
to be able to form an estimate of all symptoms, and rationally to weigh their powers by
comparison.
—Hippocrates
460–375 B.C.
The Patients Are Different

Studies of risk factors usually deal with healthy people, whereas studies of prognostic factors are of sick people.

The Outcomes Are Different

For risk, the event being counted is usually the onset of disease. For prognosis, consequences of disease are counted, including death, complications, disability, and suffering.

The Rates Are Different

Risk factors are usually for low-probability events. Yearly rates for the onset of various diseases are on the order of 1/1,000 to 1/100,000 or less. As a result, relationships between exposure and disease are difficult to confirm in the course of day-to-day clinical experiences, even for astute clinicians. Prognosis, on the other hand, describes relatively frequent events. For example, several percent of patients with acute myocardial infarction die before leaving the hospital.

The Factors May be Different

Variables associated with an increased risk are not necessarily the same as those marking a worse prognosis. Often, they are considerably different for a given disease. For example, the number of well-established risk factors for cardiovascular disease (hypertension, smoking, dyslipidemia, diabetes, and family history of coronary heart disease) is inversely related to the risk of dying in the hospital after a first myocardial infarction (Fig. 7.1) (1).

Clinicians can often form good estimates of short-term prognosis from their own personal experience. However, they may be less able to sort out, without the assistance of research, the various factors that are related to long-term prognosis or the complex ways in which prognostic factors are related to one another.

CLINICAL COURSE AND NATURAL HISTORY OF DISEASE

Prognosis can be described as either the clinical course or natural history of disease. The term clinical course describes the evolution (prognosis) of a disease that has come under medical care and has been treated in a variety of ways that affect the subsequent course of events. Patients usually receive medical care at some time in the course of their illness when they have diseases that cause symptoms such as pain, failure to thrive, disfigurement, or unusual behavior. Examples include type 1 diabetes mellitus, carcinoma of the lung, and rabies. After such a disease is recognized, it is likely to be treated.

The prognosis of disease without medical intervention is termed the natural history of disease. Natural history describes how patients fare if nothing is done about their disease. A great many health conditions do not come under medical care, even in countries with advanced health care systems. Some remain unrecognized in life because they are asymptomatic (e.g., many cancers of the prostate are occult and slow growing). For others, such as osteoarthritis, mild depression, or low-grade anemia, people may consider their symptoms to be one of the ordinary discomforts of daily living, not a disease and, therefore, not seek medical care for them.
[Figure 7.2 diagram. Labels: COHORT; Exposed (YES/NO); Not exposed (YES/NO); Time.]
Figure 7.2 ■ Design of a cohort study of risk.
ELEMENTS OF PROGNOSTIC STUDIES
Figure 7.2 shows the basic design of a cohort study of prognosis. At best, studies of prognosis are of a defined clinical or geographic population, begin observation at a specified point in time in the course of disease, follow up all patients for an adequate period of time, and measure clinically important outcomes.

Patient Sample
The purpose of representative sampling from a defined population is to assure that study results have the greatest possible generalizability. It is sometimes possible to study prognosis in a complete sample of patients with new-onset disease in large regions. In some countries, the existence of national medical records makes population-based studies of prognosis possible.

Example
Dutch investigators studied the risk of complications of pregnancy in women with type 1 diabetes mellitus (4). The sample included all of the 323 women in the Netherlands with type 1 diabetes who had become pregnant during a 1-year period and had been under care in one of the nation’s 118 hospitals. Most pregnancies were planned, and during pregnancy, most women took folic acid supplements and had good control of their blood sugar. Nevertheless, complication rates in newborns were much higher than in the general population. Neonatal morbidity (one or more complications) occurred in 80% of infants, and rates of congenital malformations and unusually large newborns (macrosomia) were three-fold to 12-fold higher than in the general population. This study suggests that good control of blood sugar alone was not sufficient to prevent complications of pregnancy in women with type 1 diabetes.

Even without national medical records, population-based studies are possible. In the United States, the Network of Organ Sharing collects data on all patients with transplants, and the Surveillance, Epidemiology, and End Results (SEER) program collects incidence and survival data on all patients with new-onset cancers in several large areas of the country, comprising 28% of the U.S. population. For primary care questions, in the United States and elsewhere, individual practices in communities have banded together into “primary care research networks” to collect research data on their patients’ care.
Most studies of prognosis, especially for less common diseases, are of local patients. For these studies, it is especially important to provide the information that users can rely on to decide whether the results generalize to their own situation: patients’ characteristics (e.g., age, severity of disease, and comorbidity), the setting where they were found (e.g., primary care practices, community hospitals, or referral centers), and how they were sampled (e.g., complete, random,
96 Clinical Epidemiology: The Essentials
[Figure: four panels, each plotting percent surviving against years (0 to 5).]
Figure 7.3 ■ A limitation of 5-year survival rates: Four conditions with the same
5-year survival rate of 10%.
condition of interest and are at the same point in the course of their illness (e.g., onset of symptoms, diagnosis, or beginning of treatment), and then keep them all under observation until all experience the outcome or not. For a small cohort, one might then represent these patients’ clinical course, as shown in Figure 7.4A. The plot of survival against time displays steps corresponding to the death of each of the 10 patients in the cohort. If the number of patients were increased, the size of the steps would diminish. If a very large number of patients were studied, the figure would approximate a smooth curve (Fig. 7.4B). This information could then be used to predict the year-by-year, or even week-by-week, prognosis of similar patients.
Unfortunately, obtaining the information in this way is impractical for several reasons. Some of the patients might drop out of the study before the end of the follow-up period, perhaps because of another illness, a move from the study area, or dissatisfaction with the study. These patients would have to be excluded from the cohort even though considerable effort had been exerted to gather data on them until the point at which they dropped out. Also, it would be necessary to wait until all of the cohort’s members had reached each point in follow-up before the probability of surviving to that point could be calculated. Because patients ordinarily become available for a study over a period of time, at any point in calendar time, there would be a relatively long follow-up for patients who had entered the study first, but only brief experience with those who had entered more recently. The last patient who entered the study would have to reach each year of follow-up before any information on survival to that year would be available.

Survival Curves
To make efficient use of all available data from each patient in the cohort, survival analysis has been developed to estimate the survival of a cohort over time. The usual method is called Kaplan-Meier analysis, after its originators. Survival analysis can be applied to any outcomes that are dichotomous and occur only once during follow-up (e.g., time to
Chapter 7: Prognosis 99
[Figure: two panels, for a small and a large number of patients, with time (years, 0 to 5) on the horizontal axis.]
Figure 7.4 ■ Survival of two cohorts, small and large, when all members are
observed for the full period of follow-up.
coronary event or to recurrence of cancer). A more general term, useful when an event other than survival is described, is time-to-event analysis.
Figure 7.5 shows a simplified survival curve. On the vertical axis is the estimated probability of surviving, and on the horizontal axis is the period of time from the beginning of observation (zero time).
The probability of surviving to any point in time is estimated from the cumulative probability of surviving each of the time intervals that preceded it. Time intervals can be made as small as necessary; in Kaplan-Meier analyses, the intervals are between each new event, such as death, and the preceding one, however short or long that is. Most of the time,
[Figure: probability of survival (%) plotted against time (years, 1 to 5), with an inset for years 3 to 5 annotated “8 at risk,” “5 at risk,” and “1 died.”]
Figure 7.5 ■ Example of a survival curve, with detail for one part of the curve.
no one dies and the probability of surviving is 1. When a patient dies, the probability of surviving at that moment is calculated as the ratio of the number of patients surviving to the number at risk of dying at that point in time. Patients who have already died, dropped out, or have not yet been followed up to that point are not at risk of dying and are, therefore, not used to estimate survival for that time. The probability of surviving does not change during intervals in which no one dies, so it is recalculated only when there is a death. Although the probability at any given interval is not very accurate, because either nothing has happened or there has been only one event in a large cohort, the overall probability of surviving up to each point in time (the product of all preceding probabilities) is remarkably accurate. When patients are lost from the study at any point in time, they are referred to as censored and are no longer counted in the denominator from that point forward.
A part of the survival curve in Figure 7.5 (from 3 to 5 years after zero time) is presented in detail to illustrate the data used to estimate survival: patients at risk, patients no longer at risk (censored), and patients experiencing outcome events at each point in time.
Variations on basic survival curves increase the amount of information they convey. Including the numbers of patients at risk at various points in time gives some idea of the contribution of chance to the observed rates, especially toward the end of follow-up. The vertical axis can show the proportion with, rather than without, the outcome event; the resulting curve will sweep upward and to the right. The precision of survival estimates, which declines with time because fewer and fewer patients are still under observation, can be shown by confidence intervals at various points in time (see Chapter 11). Ticks are sometimes added to the survival curves to indicate each time a patient is censored.

Interpreting Survival Curves
Several points must be kept in mind when interpreting survival curves. First, the vertical axis represents the estimated probability of surviving for members of the cohort, not the cumulative incidence of surviving if all members of the cohort were followed up.
Second, points on a survival curve are the best estimate, for a given set of data, of the probability of survival for members of a cohort. However, the precision of these estimates depends on the number of patients on whom the estimate is based, as do all observations of samples. One can be more confident that the estimates on the left-hand side of the curve are sound, because more patients are at risk early in follow-up. But on the right-hand side, at the tail of the curve, the number of patients on whom estimates of survival are based may become relatively small because of deaths, dropouts, and late entrants to the study, so that fewer and fewer patients are followed for that length of time. As a result, estimates of survival toward the end of the follow-up period are imprecise and can be strongly affected by what happens to relatively few patients. For example, in Figure 7.5, only one patient was under observation at year 5. If that one remaining patient happened to die, the probability of surviving would fall from 8% to zero. Clearly, this would be an overly literal reading of the data. Therefore, estimates of survival at the tails of survival curves must be interpreted with caution.
Finally, the shape of many survival curves gives the impression that outcome events occur more frequently early in follow-up than later on, when the slope approaches a plateau. But this impression is deceptive. As time passes, rates of survival are being applied to a diminishing number of patients, causing the slope of the curve to flatten even if the rate of outcome events did not change.
As with any estimate, Kaplan-Meier estimates of time to event depend on assumptions. It is assumed that being censored is not related to prognosis. To the extent that this is not true, a survival analysis may yield biased estimates of survival in cohorts. The Kaplan-Meier method may not be accurate enough if there are competing risks (more than one kind of outcome event) and the outcomes are not independent of each other, such that one event changes the probability of experiencing the other. For example, patients with cancer who develop an infection related to aggressive chemotherapy and drop out for that reason may have had a different chance of dying of the cancer. There are other methods for estimating cumulative incidence in the presence of competing risks.

IDENTIFYING PROGNOSTIC FACTORS
Often, studies go beyond a simple description of prognosis in a homogeneous group of patients to compare prognosis in patients with different characteristics, that is, they identify prognostic factors. Multiple survival curves, one for patients with each of the characteristics, are represented on the same figure where they can be visually (and statistically) compared.
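The arithmetic behind each such survival curve, in which survival is recalculated only when a death occurs and censored patients leave the denominator, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the routine of any particular statistical package: the function name `kaplan_meier` and the 10-patient data set are hypothetical, and a patient censored at the same time as a death is counted as still at risk at that time.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], died: List[bool]) -> List[Tuple[float, int, int, float]]:
    """Return (time, number at risk, deaths, cumulative survival) at each death time.

    times[i] is the follow-up time of patient i; died[i] is True for a death
    and False if the patient was censored at that time.
    """
    records = sorted(zip(times, died))
    at_risk = len(records)   # everyone is at risk at zero time
    survival = 1.0           # running product of interval survival probabilities
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0          # deaths plus censored patients at time t
        while i < len(records) and records[i][0] == t:
            deaths += int(records[i][1])
            leaving += 1
            i += 1
        if deaths:
            # Survival changes only at a death: survivors / number at risk.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, at_risk, deaths, survival))
        # Dead and censored patients are removed from later denominators.
        at_risk -= leaving
    return curve


# Hypothetical cohort of 10 patients (years of follow-up; True = died).
times = [1, 2, 2, 3, 4, 4, 5, 5, 5, 5]
died = [True, True, False, True, False, True, True, False, False, False]

for t, n, d, s in kaplan_meier(times, died):
    print(f"year {t}: {n} at risk, {d} died, estimated survival {s:.3f}")
```

In this invented cohort, five patients die and five are censored. The estimated survival at year 5 is 3/7 (about 43%), whereas counting the five censored patients as survivors would give 50% and discarding them altogether would give 0%.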
Example
Patients with renal cell carcinoma, like many other cancers, have widely different chances of surviving over the several years after diagnosis. Prognosis varies according to characteristics of the cancer and patient, such as stage (how far the cancer has spread, from being limited to the kidney at one extreme to metastases to distant organs at the other), grade (how abnormal the cancer cells appear), and performance status (how well patients are able to care for themselves). A study combined these three characteristics into five prognostic groups (Fig. 7.6) (7). In the most favorable group, more than 90% of patients were alive at 8 years, whereas in the least favorable group, all were dead at 3 years. This information might be especially useful in helping patients and doctors understand what lies ahead, and it is much more informative than simply saying that “overall, 70% of patients with renal cell carcinoma survive 5 years.”

The effects of one prognostic factor relative to the effects of another can be summarized from data in a time-to-event analysis by a hazard ratio, which is analogous to a risk ratio (relative risk). Also, survival curves can be compared after taking into account other factors related to prognosis so that the independent effect of just one variable is examined.

[Figure: survival (%) plotted against months (0 to 96), with separate curves for Groups 1 through 5.]
Figure 7.6 ■ Example of prognostic stratification. Survival from surgery in a patient with renal cell cancer according to prognostic strata. (Redrawn with permission from Zisman A, Pantuck AJ, Dorey F, et al. Improved prognostification for renal cell carcinoma using an integrated staging system. J Clin Oncol 2001;19:1649–1657.)

CASE SERIES
A case series is a description of the course of disease in a small number of cases, a few dozen at most. An even smaller report, with fewer than 10 patients, is called a case report. Cases are typically found at a clinic or referral center and then followed forward in time to describe the course of disease and backward in time to describe what came earlier.
Such reports can make an important contribution to understanding of disease, primarily by describing experiences with newly defined syndromes or uncommon conditions. The reason for introducing case series into a chapter about prognosis is that they may masquerade as true cohort studies even though they do not have comparable strengths.

Example
Physicians in emergency departments see patients with bites from North American rattlesnakes. These bites are relatively uncommon at any one place, making it difficult to carry out large cohort studies of their clinical course, so physicians must rely mainly on case series. An example is a description of the clinical course of all 24 children managed at a children’s hospital in California during a 10-year period (8). Nineteen of the children were actually injected with venom, and they were managed with the aggressive use of antivenin. Three had surgical treatment to remove soft-tissue debris or relieve tissue pressure. There were no serious reactions to antivenin, and all patients left the hospital without functional impairment.

Physicians caring for children with rattlesnake bites would be grateful for this and other case series if there were no better information to guide their care, but that is not to say that the case series provided a complete and fully reliable picture of snakebite care. Of all children bitten in that region, some may have been doing so well after a bite that they were not sent to the referral center. Others might have been doing so badly that they were rushed to the nearest hospital or even died before reaching a hospital at all. In other words, the case series does not describe the clinical course of all children from
They are a separate consideration from confounding and effect modification (Chapter 5).
There are an almost infinite variety of systematic errors, many of which are given specific names, but some are more basic. They can be recognized more easily when one knows where they are most likely to occur in the course of a study. With that in mind, we describe some possibilities for bias in cohort studies and discuss them in relation to the following study of prognosis.

Example
Bell’s palsy is the sudden, one-sided, unexplained onset of weakness of the face in the area innervated by the facial nerve. Often, the cause is unknown, although some cases are associated with herpes simplex or herpes zoster infections, diabetes, pregnancy, and a variety of other conditions. What is the clinical course? Investigators in Denmark followed up 1,701 patients with Bell palsy once a month until function returned or 1 year had passed (11). Recovery reached its maximum within 3 weeks in 85% of patients and in 3 to 5 months in the remaining 15%. By that time, 71% of patients had recovered fully; for the rest, the remaining loss of function was slight in 12%, mild in 13%, and severe in 4%. Recovery was related to younger age, less initial palsy, and earlier onset of recovery.

When thinking about the validity of this study, one should consider at least the following.

Sampling Bias
Sampling bias has occurred when the patients in a study are not like other patients with the condition. Were patients in this study like others with Bell palsy? The answer depends on the user’s perspective. Patients were “from the Copenhagen area” and apparently under the care of an ear, nose, and throat specialist, so the results generalize to other referred patients (as long as we accept that Bell palsy in Denmark is similar to this condition in other parts of the world). However, mild cases might not have been included because they were managed by local clinicians and quickly recovered or not brought to medical attention at all, which limits the applicability in primary care settings.
Sampling bias can also be misleading when prognosis is compared across groups and sampling has produced groups that are systematically different with respect to prognosis, even before the factor of interest is considered. In the Bell palsy example, older patients might have had a worse prognosis because they are the ones who had an underlying herpes virus infection, not because of their age.
Is this not confounding? Strictly speaking, it is not because the study is for the purpose of prediction, not to identify independent “causes” of recovery. Also, it is not plausible to consider such phenomena as severity of palsy or onset of recovery as causes because they are probably part of the chain of events leading from disease to recovery. But to the extent that one wants to show that prognostic factors are independent predictors of outcome, the same approaches as are used for confounding (Chapter 5) can be used to establish independence.

Migration Bias
Migration bias is present when some patients drop out of the study during follow-up and they are systematically different from those who remain. It is often the case that some members of the original cohort leave a study over time. (Patients are assured that this is their right as part of the ethical conduct of research on humans.) If dropout occurs randomly, such that the characteristics of lost patients are on average similar to patients who remain, then there would be no bias. This is so regardless of whether the number of dropouts is large or similar in the cohorts being compared, but ordinarily the characteristics of lost patients are not the same as those who remain in a study. Dropping out tends to be related to prognosis. For example, patients who are doing especially well or badly with their Bell palsy may be more likely to leave the study, as would those who need care for other illnesses, for whom the extra visits related to the study would be burdensome. This would distort the main (descriptive) results of the study: rate and completeness of recovery. If the study also aims to identify prognostic factors (e.g., recovery in old versus young patients), that also could be biased by patients dropping out, for the same reasons.
Migration bias might be seen as an example of selection bias because patients who were still in the study when outcomes were measured were selected from all those who began in the study. Migration bias might also be considered an example of measurement bias because patients who migrate out of the study are no longer available when outcomes are measured.
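The difference between random dropout and dropout related to prognosis can be made concrete with a little arithmetic. The sketch below uses a hypothetical cohort of 1,000 patients in which 71% truly recover fully; the dropout percentages and the function name `observed_recovery` are assumptions chosen only for illustration and are not taken from the Bell palsy study.

```python
def observed_recovery(n_recovered: float, n_not_recovered: float,
                      dropout_recovered: float, dropout_not_recovered: float) -> float:
    """Proportion recovered among patients still in the study at outcome assessment.

    dropout_recovered and dropout_not_recovered are the fractions of each
    group that leave the study before the outcome is measured.
    """
    remaining_recovered = n_recovered * (1 - dropout_recovered)
    remaining_not = n_not_recovered * (1 - dropout_not_recovered)
    return remaining_recovered / (remaining_recovered + remaining_not)


# Hypothetical cohort: 710 of 1,000 patients truly recover fully (71%).
recovered, not_recovered = 710, 290

# Random dropout: 20% of both groups leave, so the observed rate is unchanged.
random_dropout = observed_recovery(recovered, not_recovered, 0.20, 0.20)

# Dropout related to prognosis: patients doing badly leave more often.
related_dropout = observed_recovery(recovered, not_recovered, 0.10, 0.40)

print(f"true rate 0.710, random dropout {random_dropout:.3f}, "
      f"prognosis-related dropout {related_dropout:.3f}")
```

With equal dropout in both groups the observed recovery rate stays at 71%, however many patients leave. When patients who have not recovered drop out four times as often, the same cohort appears to recover fully in about 79% of cases.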
Measurement Bias
Measurement bias is present when members of the cohort are not all assessed similarly for outcome. In the Bell palsy study, all members of the cohort were examined by a common protocol every month until they were no longer improving, ruling out this possibility. If it had been left to individual patients and physicians whether, when, and how they were examined, this would have diminished confidence in the description of time to and completeness of recovery. Measurement bias also comes into play if prognostic groups are compared and patients in one group have a systematically better chance of having outcomes detected than those in another. Some outcomes, such as death, cardiovascular catastrophes, and major cancers, are so obvious that they are unlikely to be missed. But for less clear-cut outcomes, including specific cause of death, subclinical disease, side effects, or disability, measurement bias can occur because of differences in the methods with which the outcome is sought or classified.
Measurement bias can be minimized in three general ways: (i) examine all members of the cohort equally for outcome events; (ii) if comparisons of prognostic groups are made, ensure that researchers are unaware of the group to which each patient belongs; and (iii) set up careful rules for deciding if an outcome event has occurred (and follow the rules). To help readers understand the extent of these kinds of biases in a given study, it is usual practice to include, with reports of the study, a flow diagram describing how the number of participants changed as the study progressed and why. It is also helpful to compare the characteristics of patients in and out of the study after sampling and follow-up.

Bias from “Non-differential” Misclassification
Until now, we have been discussing how the results of a study can be biased when there are systematic differences in how exposure or disease groups are classified. But bias can also result if misclassification is “non-differential,” that is, it occurs similarly in the groups being compared. In this case, the bias is toward finding no effect.

Example
When cigarette smoking is assessed by simply asking people whether they smoke, there is substantial misclassification relative to a gold standard such as the presence or absence of cigarette products in saliva. Yet in a cohort study of cigarette smoking and coronary heart disease (CHD), misclassification of smoking could not be different in people who did or did not develop CHD because the outcome was not known at the time exposure was assessed. Even so, to the extent that smoking is incorrectly classified, it reduces whatever difference in CHD rates between smokers and nonsmokers might have existed if all patients had been correctly classified, making a “null” effect more likely. At the extreme, if classifying smoking status were totally at random, there could be no association between smoking and CHD.

BIAS, PERHAPS, BUT DOES IT MATTER?
Clinical epidemiology is not an error-finding game. Rather, it is meant to characterize the credibility of a study so that clinicians can decide how much to rely on its results when making high-stakes decisions about patients. It would be irresponsible to ignore results of studies that meet high standards, just as clinical decisions need not be bound by the results of weak studies.
With this in mind, it is not enough to recognize that bias might be present in a study. One must go on to determine if bias is actually present in the particular study. Beyond that, one must decide whether the consequences of bias are sufficiently large that they change the conclusions in a clinically important way. If damage to the study’s conclusions is not very great, then the presence of bias is of little practical consequence and the study is still useful.

SENSITIVITY ANALYSIS
One way to decide how much bias might change the conclusions of a study is to do a sensitivity analysis, that is, to show how much larger or smaller the observed results might have been under various assumptions about the missing data or potentially biased measurements. A best-case/worst-case analysis tests the effects of the most extreme possible assumptions but is an unreasonably severe test for the effects of bias in most situations. More often, sensitivity analyses test the effects of somewhat unlikely values, as in the following example.
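The arithmetic of a best-case/worst-case analysis, and of a less extreme assumption, can be sketched in a few lines. The cohort below is hypothetical (200 patients enrolled, 160 followed to the end, 112 of those recovered, 40 lost to follow-up), and the function name `recovery_rate` is an assumption made for illustration; none of the numbers come from a study cited in this chapter.

```python
def recovery_rate(recovered_observed: int, n_observed: int,
                  n_lost: int, assumed_rate_in_lost: float) -> float:
    """Recovery rate in the full cohort under an assumed rate among lost patients."""
    total = n_observed + n_lost
    return (recovered_observed + assumed_rate_in_lost * n_lost) / total


# Hypothetical cohort: 160 patients followed (112 recovered), 40 lost to follow-up.
recovered, followed, lost = 112, 160, 40
observed = recovered / followed  # rate among patients with known outcomes

scenarios = {
    "best case (all lost patients recovered)": 1.0,
    "worst case (no lost patient recovered)": 0.0,
    "lost patients recover at half the observed rate": observed / 2,
}

print(f"observed among those followed: {observed:.2f}")
for label, rate in scenarios.items():
    print(f"{label}: {recovery_rate(recovered, followed, lost, rate):.2f}")
```

Here the observed rate of 70% could be as high as 76% or as low as 56% under the two extreme assumptions, and 63% if lost patients recovered at half the observed rate. Whether a spread of that size matters depends on whether it would change a clinical decision.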
Review Questions
Read the following and select the best response.

7.1. For a study of the risk of esophageal cancer in patients with Barrett esophagus (a precancerous lesion), which of the following times in the course of disease is the best example of zero time?
A. Diagnosis of Barrett esophagus for each patient
B. Death of each patient
C. Diagnosis of esophageal cancer for each patient
D. Calendar time when the first patient is enrolled in the study
E. Calendar time when no patient remains in the study

7.2. A cohort study describes the recurrence of seizure within 1 year in children hospitalized with a first febrile seizure. It compared recurrence in children who had infection versus immunization as an underlying cause for fever at the time of the first seizure. Some of the children were no longer in the study when outcome (a second seizure) was assessed at 1 year. Which of the following would have the greatest effect on study results?
A. Why the children dropped out
B. When in the course of follow-up the children left the study
C. Whether dropping out was related to prognosis
D. Whether the number of children dropping out is similar in the groups

7.3. A cohort study of prostate cancer care compares rates of incontinence in patients who were treated with surgery versus medical care alone. Incontinence is assessed from review of medical records. Which of the following is not an example of measurement bias?
A. Men were more likely to tell their surgeon about incontinence.
B. Surgeons were less likely to record complications of their surgery in the record.
C. Men who got surgery were more likely to have follow-up visits.
D. Chart abstractors used their judgment in deciding whether incontinence was present or not.
E. Rates of incontinence were higher in the men who got surgery.

7.4. A clinical prediction rule has been developed to classify the prognosis of community-acquired pneumonia. Which of the following is most characteristic of such a clinical prediction rule?
A. Calculating a score is simple.
B. The clinical data are readily available.
C. Multiple prognostic factors are included.
D. The results are used to guide further management of the patient.
E. All of the above.

7.5. A study describes the clinical course of patients who have an uncommon neurologic disease. Patients are identified at a referral center that specializes in this disease. Their medical records are reviewed for patients’ characteristics, treatments, and their current disease status. Which of the following best describes this kind of study?
A. Cohort study
B. Case-control study
C. Case series
D. Cross-sectional study
E. A randomized controlled trial

7.6. A study used time-to-event analysis to describe the survival from diagnosis of 100 patients with congestive heart failure. By the third year, 60 patients have been censored. Which of the following would not be a reason for one of these patients being censored?
A. The patient died of another cause before year 3.
B. The patient decided not to continue in the study.
C. The patient developed another disease that could be fatal.
D. The patient had been enrolled in the study for less than 3 years.

7.7. Which of the following kinds of studies cannot be used to identify prognostic factors?
A. Prevalence study
B. Time-to-event analysis
C. Case-control study
D. Cohort study

7.8. Which of the following best describes the information in a survival curve?
A. An unbiased estimate of survival even if some patients leave the study
B. The estimated probability of survival from zero time
C. The proportion of a cohort still alive at the end of follow-up
D. The rate at which original members of the cohort leave the study
E. The cumulative survival of a cohort over time

7.9. Which of the following is the most appropriate sample for a study of prognosis?
A. Members of the general population
B. Patients in primary care in the community
C. Patients admitted to a community hospital
D. Patients referred to a specialist
E. It depends on who will use the results of the study.

7.10. Investigators wish to describe the clinical course of multiple sclerosis. They take advantage of a clinical trial, already completed, in which control patients received usual care. Patients in the trial had been identified at referral centers, had been enrolled at the time of diagnosis, and had met rigorous entry criteria. After 10 years, all patients had been examined yearly and remained under observation, and 40% were still able to walk. Which of the following most limits the credibility of this study?
A. Inconsistent zero time
B. Generalizability
C. Measurement bias
D. Migration bias
E. Failure to use time-to-event methods
7.11. Many different clinical prediction rules have been developed to assess the severity of community-acquired pneumonia. Which of the following is the most important reason for choosing one to use?
A. The prediction rule classifies patients into groups with very different prognosis.
B. The prediction rule has been validated in different settings.
C. Many variables are included in the rule.
D. Prognostic factors include state-of-the-science diagnostic tests.
E. The score is calculated using computers.

7.12. In a study of patients who smoke and developed peripheral arterial disease, the hazard ratio for amputation in patients who continued to smoke, relative to those who quit smoking, is 5. Which of the following best characterizes the hazard ratio?
A. In this study, it is the rate of continued smoking divided by the rate of quitting.
B. It can be estimated from a case-control study of smoking and amputation.
C. It cannot be adjusted for the presence or absence of other factors related to prognosis.
D. It conveys information similar to relative risk.
E. It is calculated from the cumulative incidence of amputation in smokers and quitters.

7.13. In a time-to-event analysis, the event:
A. Can occur only once
B. Is dichotomous
C. Both A and B
D. Neither A nor B

Answers are in Appendix A.
REFERENCES
1. Canto JG, Kiefe CI, Rogers WJ, et al. Number of coronary heart disease risk factors and mortality in patients with first myocardial infarction. JAMA 2011;306:2120–2127.
2. Talley NJ, Zinsmeister AR, Van Dyke C, et al. Epidemiology of colonic symptoms and the irritable bowel syndrome. Gastroenterology 1991;101:927–934.
3. Ford AC, Forman D, Bailey AG, et al. Irritable bowel syndrome: A 10-year natural history of symptoms and factors that influence consultation behavior. Am J Gastroenterol 2007;103:1229–1239.
4. Evers IM, de Valk HW, Visser GHA. Risk of complications of pregnancy in women with type 1 diabetes: Nationwide prospective study in the Netherlands. BMJ 2004;328:915–918.
5. Feinstein AR, Sosin DM, Wells CK. The Will Rogers phenomenon: stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer. N Engl J Med 1985;312:1604–1608.
6. Chee KG, Nguyen DV, Brown M, et al. Positron emission tomography and improved survival in patients with lung cancer. The Will Rogers phenomenon revisited. Arch Intern Med 2008;168:1541–1549.
7. Zisman A, Pantuck AJ, Dorey F, et al. Improved prognostification for renal cell carcinoma using an integrated staging system. J Clin Oncol 2001;19:1649–1657.
8. Shaw BA, Hosalkar HS. Rattlesnake bites in children: antivenin treatment and surgical indications. J Bone Joint Surg Am 2002;84-A(9):1624.
9. Gage BF, Waterman AD, Shannon W, et al. Validation of clinical classification schemes for predicting stroke. Results from the National Registry of Atrial Fibrillation. JAMA 2001;285:2864–2870.
10. Go AS, Hylek EM, Chang Y, et al. Anticoagulation therapy for stroke prevention in atrial fibrillation. How well do randomized trials translate into clinical practice? JAMA 2003;290:2685–2692.
11. Peitersen E. Bell’s palsy: the spontaneous course of 2,500 peripheral facial nerve palsies of different etiologies. Acta Otolaryngol 2002;(549):4–30.
12. Ramlow J, Alexander M, LaPorte R, et al. Epidemiology of post-polio syndrome. Am J Epidemiol 1992;136:769–785.
Chapter 8
Diagnosis
Appearances to the mind are of four kinds. Things either are what they appear to be; or
they neither are, nor appear to be; or they are, and do not appear to be; or they are not,
yet appear to be. Rightly to aim in all these cases is the wise man’s task.
—Epictetus†
2nd century A.D.
Because it is almost always more costly, more dangerous, or both to use more accurate ways of establishing the truth, clinicians and patients prefer simpler tests to the rigorous gold standard, at least initially. Chest x-rays and sputum smears are used to determine the cause of pneumonia, rather than bronchoscopy and lung biopsy for examination of the diseased lung tissue. Electrocardiograms and blood tests are used first to investigate the possibility of acute myocardial infarction, rather than catheterization or imaging procedures. The simpler tests are used as proxies for more elaborate but more accurate or precise ways of establishing the presence of disease, with the understanding that some risk of misclassification results. This risk is justified by the safety and convenience of the simpler tests. But simpler tests are only useful when the risks of misclassification are known and are acceptably low. This requires a sound comparison of their accuracy to an appropriate standard.

Lack of Information on Negative Tests

The goal of all clinical studies aimed at describing the value of diagnostic tests should be to obtain data for all four of the cells shown in Figure 8.1. Without all these data, it is not possible to fully evaluate the accuracy of the test. Most information about the value of a diagnostic test is obtained from clinical, not research, settings. Under these circumstances, physicians are using the test in the care of patients. Because of ethical concerns, they usually do not feel justified in proceeding with more exhaustive evaluation when preliminary diagnostic tests are negative. They are naturally reluctant to initiate an aggressive workup, with its associated risk and expense, unless preliminary tests are positive. As a result, data on the number of false negatives and true negatives generated by a test (cells c and d in Fig. 8.1) tend to be much less complete in the medical literature than data collected about positive test results.

This problem can arise in studies of screening tests because individuals with negative tests usually are not subjected to further testing, especially if the testing involves invasive procedures such as biopsies. One method that can get around this problem is to make use of stored blood or tissue banks. An investigation of prostate-specific antigen (PSA) testing for prostate cancer examined stored blood from men who subsequently developed prostate cancer and men who did not develop prostate cancer (2). The results showed that for a PSA level of 4.0 ng/mL, sensitivity over the subsequent 4 years was 73% and specificity was 91%. The investigators were able to fill in all four cells without requiring further testing on people with negative test results. (See the following text for definitions of sensitivity and specificity.)

Lack of Information on Test Results in the Nondiseased

Some types of tests are commonly abnormal in people without disease or complaints. When this is so, the test’s performance can be grossly misleading when the test is applied to patients with the condition or complaint.

Example

Magnetic resonance imaging (MRI) of the lumbar spine is used in the evaluation of patients with low back pain. Many patients with back pain show herniated intervertebral discs on MRI, and the pain is often attributed to the finding. But how often are vertebral disc abnormalities found in people who do not have back pain? Several studies done on subjects without a history of back pain or sciatica have found herniated discs in 22% to 58% and bulging discs in 24% to 79% of asymptomatic subjects with mean ages of 35 to more than 60 years old (3). In other words, vertebral disc abnormalities are common and may be a coincidental finding in a patient with back pain.

Lack of Objective Standards for Disease

For some conditions, there are simply no hard-and-fast criteria for diagnosis. Angina pectoris is one of these. The clinical manifestations were described nearly a century ago, yet there is still no better way to substantiate the presence of angina pectoris than a carefully taken history. Certainly, a great many objectively measurable phenomena are related to this clinical syndrome, for example, the presence of coronary artery stenosis on angiography, delayed perfusion on a thallium stress test, and characteristic abnormalities on electrocardiograms both at rest and with exercise. All are more commonly found in patients believed to have angina pectoris, but none is so closely tied to the clinical syndrome that it can serve as the standard by which the condition is considered present or absent.

Other examples of medical conditions difficult to diagnose because of the lack of simple gold standard
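[Figure 8.2: two-by-two table of test result versus disease, with the measures derived from it.]

                         DISEASE
                    Present    Absent
TEST    Positive       a          b
        Negative       c          d

$\mathrm{Se} = \frac{a}{a+c}$, $\mathrm{Sp} = \frac{d}{b+d}$, $P = \frac{a+c}{a+b+c+d}$

$+\mathrm{PV} = \frac{a}{a+b}$, $-\mathrm{PV} = \frac{d}{c+d}$

$\mathrm{LR}+ = \frac{a/(a+c)}{b/(b+d)}$, $\mathrm{LR}- = \frac{c/(a+c)}{d/(b+d)}$

Se = Sensitivity; Sp = Specificity; P = Prevalence; LR = Likelihood ratio; PV = Predictive value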
[Figure: D-dimer assay for diagnosis of DVT, compared with the gold standard (compression ultrasonography and/or 3-month follow-up).]

                              DVT according to gold standard
                                 Present        Absent
D-dimer assay    Positive           55            198
                 Negative            1            302

$\mathrm{Se} = \frac{55}{56} = 98\%$, $\mathrm{Sp} = \frac{302}{500} = 60\%$, $P = \frac{56}{556} = 10\%$

$+\mathrm{PV} = \frac{55}{253} = 22\%$, $-\mathrm{PV} = \frac{302}{303} = 100\%$

$\mathrm{LR}+ = \frac{55/(55+1)}{198/(198+302)} = 2.5$, $\mathrm{LR}- = \frac{1/(55+1)}{302/(302+198)} = 0.03$

Se = Sensitivity; Sp = Specificity; P = Prevalence; LR = Likelihood ratio; PV = Predictive value
[ROC plot: sensitivity (true-positive rate, %) on the vertical axis against 1 − specificity (false-positive rate, %) on the horizontal axis, with specificity (%) on the reverse scale; BNP cutoff points of 50, 80, 100, 125, and 150 pg/mL are marked on the curve.]
Figure 8.4 ■ A receiver operator characteristic (ROC) curve. The accuracy of
B-type natriuretic peptide (BNP) in the emergency diagnosis of heart failure with vari-
ous cutoff levels of BNP between dyspnea due to congestive heart failure and other
causes. (Adapted with permission from Maisel AS, Krishnaswamy P, Nowak RM, et al.
Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart
failure. N Engl J Med 2002;347:161–167.)
[ROC plot: sensitivity (%) against 1 − specificity (%), with one curve for BNP and one for ejection fraction (EF) and cutoff points marked along each curve.]
Figure 8.5 ■ ROC curves for the BNP and left ventricular ejection fractions
by echocardiograms in the emergency diagnosis of congestive heart fail-
ure in patients presenting with acute dyspnea. Overall, BNP is more sensi-
tive and more specific than ejection fractions, resulting in more area under the
curve. (Redrawn with permission from Steg PG, Joubin L, McCord J, et al. B-type
natriuretic peptide and echocardiographic determination of ejection fraction in
the diagnosis of congestive heart failure in patients with acute dyspnea. Chest
2005;128:21–29.)
under the curve (0.89) than that for ejection fraction (0.78). It is also easier and faster to obtain in an emergency setting, test characteristics that are important in clinical situations when quick results are needed.

Obviously, tests that are both sensitive and specific are highly sought after and can be of enormous value. However, practitioners must frequently work with tests that are not both highly sensitive and specific. In these instances, they must use other means of circumventing the trade-off between sensitivity and specificity. The most common way is to use the results of several tests together, as discussed later in this chapter.

ESTABLISHING SENSITIVITY AND SPECIFICITY

Often, a new diagnostic test is described in glowing terms when first introduced, only to be found wanting later when more experience with it has accumulated. Initial enthusiasm followed by disappointment arises not from any dishonesty on the part of early investigators or unfair skepticism by the medical community later. Rather, it is related to limitations in the methods by which the properties of the test were established in the first place. Sensitivity and specificity may be inaccurately described because an improper gold standard has been chosen, as discussed earlier in this chapter. In addition, two other issues related to the selection of diseased and nondiseased patients can profoundly affect the determination of sensitivity and specificity as well. They are the spectrum of patients to which the test is applied and bias in judging the test’s performance. Statistical uncertainty, related to studying a relatively small number of patients, also can lead to inaccurate estimates of sensitivity and specificity.
Some people in whom disease is suspected may have other conditions that cause a positive test, thereby increasing the false-positive rate and decreasing specificity. In the example of guidelines for ovarian cancer evaluation, specificity was low for all cancer stages (60%). One reason for this is that levels of the cancer marker, CA-125, recommended by guidelines, are elevated by many diseases and conditions

Bias

The sensitivity and specificity of a test should be established independently of the means by which the true diagnosis is established. Otherwise, there could be a biased assessment of the test’s properties. As already pointed out, if the test is evaluated using data obtained during the course of a clinical evaluation of patients suspected of having the disease in question, a positive test may prompt the clinician to continue pursuing
a test are being assessed, the test result should not be part of the information used to establish the diagnosis. In studying DVT diagnosis by D-dimer assay (discussed earlier), the investigators made sure that the physicians performing the gold standard tests (ultrasonography and follow-up assessment) were unaware of the results of the D-dimer assays so that the results of the D-dimer assays could not influence (bias) the interpretation of ultrasonography (10).

In the course of routine clinical care, this kind of bias can be used to advantage, especially if the test result is subjectively interpreted. Many radiologic imaging interpretations are subjective, and it is easy to be influenced by the clinical information provided. All clinicians have experienced having imaging studies overread because of a clinical impression or, conversely, of going back over old studies in which a finding was missed because a clinical fact was not communicated at the time and, therefore, attention was not directed to a particular area. Both to minimize and to take advantage of these biases, some radiologists prefer to read imaging studies twice, first without, then with the clinical information.

All the biases discussed tend to increase the agreement between the test and the gold standard. That is, they tend to make the test seem more accurate than it actually is.

Chance

Values for sensitivity and specificity are usually estimated from observations on relatively small samples of people with and without the disease of interest. Because of chance (random variation) in any one sample, particularly if it is small, the true sensitivity and specificity of the test can be misrepresented, even if there is no bias in the study. The particular values observed are compatible with a range of true values, typically characterized by the “95% confidence intervals” (see Chapter 12).† The width of this range of values defines the precision of the estimates of sensitivity and specificity. Therefore, reported values for sensitivity and specificity should not be taken too literally if a small number of patients is studied.

Figure 8.6 shows how the precision of estimates of sensitivity increases as the number of people on which the estimate is based increases. In this particular example, the observed sensitivity of the diagnostic test is 75%. Figure 8.6 shows that if this estimate is based on only 10 patients, by chance alone, the true sensitivity could be as low as 45% and as high as nearly 100%. When more patients are studied, the 95% confidence interval narrows and the precision of the estimate increases.

[Plot: sensitivity (%) with 95% confidence intervals, against the number of people observed (0 to 50).]
Figure 8.6 ■ The precision of an estimate of sensitivity. The 95% confidence interval for an observed sensitivity of 75%, according to the number of people observed.

PREDICTIVE VALUE

Sensitivity and specificity are properties of a test that should be taken into account when deciding whether to use the test. However, once the results of a diagnostic test are available, whether positive or negative, the sensitivity and specificity of the test are no longer relevant because these values are obtained in persons known to have or not to have the disease. But if one knew the disease status of the patient, it would not be necessary to order the test! For the clinician, the dilemma is to determine if the patient has the disease, given the results of a test. (In fact, clinicians are usually more concerned with this question than the sensitivity and specificity of the test.)

Definitions

The probability of disease, given the results of a test, is called the predictive value of the test (see Fig. 8.2).

† The 95% confidence interval of a proportion is easily estimated by the following formula, based on the binomial theorem:

$$p \pm 2\sqrt{\frac{p(1-p)}{N}}$$

where p is the observed proportion and N is the number of people observed. To be more exact, multiply by 1.96 instead of 2.
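The footnote's approximation for the 95% confidence interval of a proportion can be sketched in a few lines of Python. This is an illustration only: the function name `approx_ci_95` is an assumption, and clipping the limits to the range 0 to 1 is a convenience added here that is not part of the footnote's formula. For very small samples the approximation is rough, so its limits need not match those read from Figure 8.6 exactly.

```python
import math


def approx_ci_95(p, n, z=1.96):
    """Approximate 95% confidence interval for an observed proportion.

    p: observed proportion (for example, a sensitivity of 0.75)
    n: number of people observed
    z: multiplier; 1.96 is the more exact value, 2 the rounded one
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    # A proportion cannot fall below 0 or exceed 1, so clip the limits.
    lower = max(0.0, p - half_width)
    upper = min(1.0, p + half_width)
    return lower, upper


# The interval narrows as the number of people observed increases.
for n in (10, 20, 50):
    low, high = approx_ci_95(0.75, n)
    print(f"N = {n}: {low:.2f} to {high:.2f}")
```

With an observed sensitivity of 75%, the interval computed this way is wide at N = 10 and much narrower at N = 50, which is the pattern Figure 8.6 describes.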
[Figure: posttest probability of CAD (%) after a positive stress test and after a negative stress test, shown for three patients: a 45-year-old asymptomatic man with no risk factors; a 45-year-old asymptomatic man with hypercholesterolemia, hypertension, and diabetes; and a 55-year-old man with typical angina.]
the intensity of their diagnostic evaluations may need to be adjusted to suit the specific setting.

Implications for Interpreting the Medical Literature

These terms should be familiar to most readers because they are used in everyday conversation. For example, we may say that the odds are 4:1 that the New England Patriots football team will win tonight or that they have an 80% probability of winning.
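The equivalence of odds of 4:1 and a probability of 80% can be checked with two small conversion functions. This is a minimal Python sketch; the function names are chosen here for illustration and do not come from the text.

```python
def probability_to_odds(probability):
    """Convert a probability (between 0 and 1, exclusive of 1) to odds."""
    return probability / (1 - probability)


def odds_to_probability(odds):
    """Convert odds (for example, 4 for 4:1) to a probability."""
    return odds / (1 + odds)


# Odds of 4:1 correspond to a probability of 0.8, and the reverse.
print(odds_to_probability(4))
print(probability_to_odds(0.8))
```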
Figure 8.10 ■ Parallel and serial testing. In parallel testing, all tests are done at the
same time. In serial testing, each subsequent test is done only when the previous test
result is positive.
When multiple different tests are performed and all are positive or all are negative, the interpretation is straightforward. All too often, however, some are positive and others are negative. Interpretation is then more complicated. This section discusses the principles by which multiple tests are applied and interpreted.

Multiple diagnostic tests can be applied in two basic ways (Fig. 8.10). They can be used in parallel testing (i.e., all at once), and a positive result of any test is considered evidence for disease. Or they can be done in serial testing (i.e., consecutively), with the decision to order the next test in the series based on the results of the previous test. For serial testing, all tests must give a positive result in order for the diagnosis to be made because the diagnostic process is stopped with a negative result.

Parallel Testing

Physicians usually order tests in parallel when rapid assessment is necessary, as in hospitalized or emergency patients, or for ambulatory patients who cannot return easily because they are not mobile or have come from a long distance for evaluation.

Multiple tests in parallel generally increase the sensitivity and, therefore, the negative predictive value for a given disease prevalence, above those of each individual test. On the other hand, specificity and positive predictive value are lower than for each individual test. That is, disease is less likely to be missed. (Parallel testing is probably one reason why referral centers seem to diagnose disease that local physicians miss.) However, false-positive diagnoses are also more likely to be made (thus, the propensity for overdiagnosing in such centers as well). In summary, parallel testing is particularly useful when the clinician is faced with the need for a very sensitive testing strategy but has available only two or more relatively insensitive tests that measure different clinical phenomena. By using the tests in parallel, the net effect is a more sensitive diagnostic strategy. The price, however, is further evaluation or treatment of some patients without the disease.

Example

Neither of two tests used for diagnosing ovarian cancer, CA-125 and transvaginal ultrasound, is a sensitive test when used alone. A study was done in which 28,506 women underwent both tests in a trial of ovarian cancer screening (15). If either test result was abnormal (a parallel testing strategy), women were referred for further evaluation. Positive predictive values were determined for each test individually as well as when both tests were abnormal (a serial testing strategy). The authors did not calculate sensitivities and specificities of the tests; to do so, they would have needed to include any cases of ovarian cancer that occurred in the
Table 8.4
Test Characteristics of CA-125 and Transvaginal Ultrasound (TVU)
Table 8.5
An Example of a Diagnostic Decision-Making Rule: Predictors of Group A Streptococcus (GAS) Pharyngitis. Modified Centor Score

Criteria                                   Points
Temperature >38°C (100.4°F)                   1
Absence of cough                              1
Swollen, tender anterior cervical nodes       1
Tonsillar swelling or exudate                 1
Age
  3–14 years                                  1
  15–44 years                                 0
  ≥45 years                                  –1

Score     Probability of a Positive Culture for GAS (%)
≤0        1–2.5
1         5–10
2         11–17
3         28–35
≥4        51–53

Adapted with permission from McIsaac WJ, Kellner JD, Aufricht P, et al. Empirical validation of guidelines for the management of pharyngitis in children and adults. JAMA 2004;291:1587–1595.

Serial Testing

Serial testing maximizes specificity and positive predictive value but lowers sensitivity and the negative predictive value, as demonstrated in Table 8.4. One ends up surer that positive test results represent disease but runs an increased risk that disease will be missed. Serial testing is particularly useful when none of the individual tests available to a clinician is highly specific.

Physicians most often use serial testing strategies in clinical situations where rapid assessment of patients is not required, such as in office practices and hospital clinics in which ambulatory patients are followed over time. (In acute care settings, time is the enemy of a serial testing approach.) Serial testing is also used when some of the tests are expensive or risky, with these tests being employed only after simpler and safer tests suggest the presence of disease. For example, maternal age, blood tests, and ultrasound are used to identify pregnancies at higher risk of delivering a baby with Down syndrome. Mothers found to be at high risk by those tests are then offered chorionic villus sampling or amniocentesis, both of which entail risk of fetal loss. Serial testing leads to less laboratory use than parallel testing because additional evaluation is contingent on prior test results. However, serial testing takes more time because additional tests are ordered only after the results of previous ones become available. Usually, the test that is less risky, less invasive, easier to do, and cheaper should be done first. If these factors are not in play, performing the test with the highest specificity is usually more efficient, requiring fewer patients to undergo both tests.

Serial Likelihood Ratios

When a series of tests is used, an overall probability can be calculated using the likelihood ratio for each test result, as shown in Figure 8.11. The prevalence of disease before testing is first converted to pretest odds. As each test is done, the posttest odds of one become the pretest odds for the next. In the end, a new probability of disease is found that takes into
Figure 8.11 ■ Use of likelihood ratios in serial testing. As each test is completed, its posttest
odds become the pretest odds for the subsequent test.
account the information contributed by all the tests in the series.

Assumption of Independence

When multiple tests are used, the accuracy of the final result depends on whether the additional information contributed by each test is somewhat independent of that already available from the preceding ones so that the next test does not simply duplicate known information. For example, in the diagnosis of endocarditis, it is likely that fever (an indication of inflammation), a new heart murmur (an indication of valve destruction), and Osler nodes (an indication of emboli) each add independent, useful information. In the example of pharyngitis, the investigators used statistical techniques to ensure that each diagnostic test included in the decision rule contributed independently to the diagnosis. To the degree that the two tests are not contributing independent information, multiple testing is less useful. For example, if two tests are used in parallel with 60% and 80% sensitivities, and the better test identifies all the cases found by the less sensitive test, the combined sensitivity cannot be higher than 80%. If they are completely independent of each other, then the sensitivity of parallel testing would be 92% (80% + [60% × 20%]).

The premise of independence underlies the entire approach to the use of multiple tests. However, it seems unlikely that tests for most diseases are fully independent of each other. If the assumption that the tests are completely independent is wrong, calculation of the probability of disease from several tests would tend to overestimate the tests’ value.
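Both calculations described in this section, the sensitivity of two independent tests used in parallel and the chaining of likelihood ratios in serial testing, can be sketched in Python as follows. The function names are assumptions made for this illustration. The pretest probability of 10% and the likelihood ratios of 2.5 and 4.0 in the serial example are hypothetical values, not results from any study cited here. Both functions assume the tests are fully independent, which, as noted above, tends to overstate their combined value.

```python
def parallel_sensitivity(sens_a, sens_b):
    """Sensitivity of two tests in parallel, assuming full independence.

    A case is missed only if both tests miss it.
    """
    return 1 - (1 - sens_a) * (1 - sens_b)


def serial_posttest_probability(pretest_probability, likelihood_ratios):
    """Chain likelihood ratios: the posttest odds of one test become the
    pretest odds of the next (assumes the tests are independent)."""
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# Sensitivities of 80% and 60% from the text: 80% + (60% x 20%) = 92%.
print(f"{parallel_sensitivity(0.80, 0.60):.2f}")

# Hypothetical serial example: pretest probability 10%, two positive
# results with illustrative likelihood ratios of 2.5 and 4.0.
print(f"{serial_posttest_probability(0.10, [2.5, 4.0]):.2f}")
```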
Review Questions

Questions 8.1–8.10 are based on the following example. For each question, select the best answer.

A study was made of symptoms and physical findings in 247 patients evaluated for sinusitis. The final diagnosis was made according to x-ray findings (gold standard) (17). Ninety-five patients had sinusitis, and 49 of them also had facial pain. One hundred fifty-two did not have sinusitis, and 79 of these patients had facial pain.

8.1. What is the sensitivity of facial pain for sinusitis in this study?
A. 38%
B. 48%
C. 52%
D. 61%

8.2. What is the specificity?
A. 38%
B. 48%
C. 52%
D. 61%

8.3. If the doctor thought the patient had sinusitis because the patient had facial pain, for what percent of patients would she be correct?
A. 38%
B. 48%
C. 52%
D. 61%

8.4. If the doctor thought the patient did not have sinusitis because the patient did not have facial pain, for what percent of patients would she be correct?
A. 38%
B. 48%
C. 52%
D. 61%

8.5. How common was sinusitis in this study?
A. 38%
B. 48%
C. 52%
D. 61%

8.6. What is the posttest probability of sinusitis in patients with facial pain in this study?
A. 38%
B. 48%
C. 52%
D. 61%

Both the positive and negative likelihood ratios for facial pain were 1.0. When clinicians asked several other questions about patient symptoms and took physical examination results into account, the likelihood ratios for their overall clinical impressions as to whether patients had sinusitis were as follows: “high probability,” LR 4.7; “intermediate probability,” LR 1.4; and “low probability,” LR 0.4.
8.7. What is the probability of sinusitis in patients assigned a “high probability” by clinicians?
A. ∼10%
B. ∼20%
C. ∼45%
D. ∼75%
E. ∼90%

8.8. What is the probability of sinusitis in patients assigned an “intermediate probability” by clinicians?
A. ∼10%
B. ∼20%
C. ∼45%
D. ∼75%
E. ∼90%

8.9. What is the probability of sinusitis in patients assigned a “low probability” by clinicians?
A. ∼10%
B. ∼20%
C. ∼45%
D. ∼75%
E. ∼90%

8.10. Given the answers for questions 8.7–8.9, which of the following statements is incorrect?
A. A clinical impression of “high probability” of sinusitis is more useful in the management of patients than one of “low probability.”
B. A clinical impression of “intermediate probability” is approximately equivalent to a coin toss.
C. Predictive value of facial pain will be higher in a clinical setting in which the prevalence of sinusitis is 10%.

For questions 8.11 and 8.12, select the best answer.

8.11. Which of the following statements is most correct?
A. Using diagnostic tests in parallel increases specificity and lowers positive predictive value.
B. Using diagnostic tests in series increases sensitivity and lowers positive predictive value.
C. Using diagnostic tests in parallel increases sensitivity and positive predictive value.
D. Using diagnostic tests in series increases specificity and positive predictive value.

8.12. Which of the following statements is most correct?
A. When using diagnostic tests in parallel or series, each test should contribute information independently.
B. When using diagnostic tests in series, the test with the lowest sensitivity should be used first.
C. When using diagnostic tests in parallel, the test with the highest specificity should be used first.

Answers are in Appendix A.
REFERENCES
1. Chobanian AV, Bakris GL, Black HR, et al. The Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure: The JNC 7 Report. JAMA 2003;289:2560–2571.
2. Gann PH, Hennekens CH, Stampfer MJ. A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 1995;273:289–294.
3. Wheeler SG, Wipf JE, Staiger TO, et al. Approach to the diagnosis and evaluation of low back pain in adults. In: Basow DS, ed. UpToDate. Waltham, MA; UpToDate; 2011.
4. Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med 2003;349:2191–2200.
5. Bates SM, Kearon C, Crowther M, et al. A diagnostic strategy involving a quantitative latex D-dimer assay reliably excludes deep venous thrombosis. Ann Intern Med 2003;138:787–794.
6. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161–167.
7. Steg PG, Joubin L, McCord J, et al. B-type natriuretic peptide and echocardiographic determination of ejection fraction in the diagnosis of congestive heart failure in patients with acute dyspnea. Chest 2005;128:21–29.
8. Dearking AC, Aletti GD, McGree ME, et al. How relevant are ACOG and SGO guidelines for referral of adnexal mass? Obstet Gynecol 2007;110:841–848.
9. Bobo JK, Lee NC, Thames SF. Findings from 752,081 clinical breast examinations reported to a national screening program from 1995 through 1998. J Natl Cancer Inst 2000;92:971–976.
10. Bates SM, Grand’Maison A, Johnston M, et al. A latex D-dimer reliably excludes venous thromboembolism. Arch Intern Med 2001;161:447–453.
11. Wells PS, Owen C, Doucette S, et al. Does this patient have deep vein thrombosis? The rational clinical examination. JAMA 2006;295:199–207.
12. Garber AM, Hlatky MA. Stress testing for the diagnosis of coronary heart disease. In: Basow DS, ed. UpToDate. Waltham, MA; UpToDate; 2012.
13. Heffner JE, Sahn SA, Brown LK. Multilevel likelihood ratios for identifying exudative pleural effusions. Chest 2002;121:1916–1920.
14. McGee S. Simplifying likelihood ratios. J Gen Intern Med 2002;17:646–649.
15. Buys SS, Partridge E, Greene M, et al. Ovarian cancer screening in the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial: findings from the initial screen of a randomized trial. Am J Obstet Gynecol 2005;193:1630–1639.
16. McIsaac WJ, Kellner JD, Aufricht P, et al. Empirical validation of guidelines for the management of pharyngitis in children and adults. JAMA 2004;291:1587–1595.
17. Williams JW, Simel DL, Roberts L, et al. Clinical evaluation for sinusitis. Making the diagnosis by history and physical examination. Ann Intern Med 1992;117:705–710.
Chapter 9
Treatment
Treatments should be given “not because they ought to work, but because they do work.”
—L.H. Opie
1980
KEY WORDS
Hypotheses
Treatment
Intervention
Comparative effectiveness
Experimental studies
Clinical trials
Randomized controlled trials
Equipoise
Inclusion criteria
Exclusion criteria
Comorbidity
Large simple trials
Practical clinical trials
Pragmatic clinical trials
Hawthorne effect
Placebo
Placebo effect
Random allocation
Randomization
Baseline characteristics
Stratified randomization
Compliance
Adherence
Run-in period
Cross-over
Blinding
Masking
Allocation concealment
Single-blind
Double-blind
Open label
Composite outcomes
Health-related quality of life
Health status
Efficacy trials
Effectiveness trials
Intention-to-treat analysis
Explanatory analyses
Per-protocol
Superiority trials
Non-inferiority trials
Inferiority margin
Cluster randomized trials
Cross-over trials
Trials of N = 1
Confounding by indication
Phase I trials
Phase II trials
Phase III trials
Postmarketing surveillance

After the nature of a patient’s illness has been established and its expected course predicted, the next question is, what can be done about it? Is there a treatment that improves the outcome of disease? This chapter describes the evidence used to decide whether a well-intentioned treatment is actually effective.

IDEAS AND EVIDENCE

The discovery of effective new treatments requires both rich sources of promising possibilities and rigorous ways of establishing that the treatments are, in fact, effective (Fig. 9.1).

Ideas

Ideas about what might be a useful treatment arise from virtually any activity within medicine. These ideas are called hypotheses to the extent that they are assertions about the natural world that are made for the purposes of empiric testing.

Some therapeutic hypotheses are suggested by the mechanisms of disease at the molecular level. Drugs for antibiotic-resistant bacteria are developed through knowledge of the mechanism of resistance, and hormone analogues are variations on the structure of native hormones. Other hypotheses about treatments have come from astute observations by clinicians, shared with their colleagues in case reports. Others are discovered by accident: The drug minoxidil, which was developed for hypertension, was found to improve male pattern baldness; and tamoxifen, developed for contraception, was found to prevent breast cancer in high-risk women. Traditional medicines, some of which are supported by centuries of experience,
[Figure 9.1 (diagram): ideas arising from case reports, biology, epidemiology, clinical observation, imagination, and reasoning become hypotheses (proposed answers); tests of hypotheses in observational studies and experimental studies yield evidence-based information.]
[Diagram: a sample is divided by randomization into a treated group and a control group; each group is followed to an outcome labeled YES or NO.]
Figure 9.2 ■ The structure of a randomized controlled trial.
interest. Using randomization, the patients are then divided into two (or more) groups of comparable prognosis. One group, called the experimental group, is exposed to an intervention that is believed to be better than current alternatives. The other group, called a control (or comparison) group, is treated the same in all ways except that its members are not exposed to the experimental intervention. Patients in the control group may receive a placebo, usual care, or the current best available treatment. The course of disease is then recorded in both groups, and differences in outcome are attributed to the intervention.

The main reason for structuring clinical trials in this way is to avoid confounding when comparing the respective effects of two or more kinds of treatments. The validity of clinical trials depends on how well they have created equal distribution of all determinants of prognosis, other than the one being tested, in treated and control patients.

Individual elements of clinical trials are described in detail in the following text.

Ethics

Under what circumstances is it ethical to assign treatment at random, rather than as decided by the patient and physician? The general principle, called equipoise, is that randomization is ethical when there is no compelling reason to believe that either of the randomly allocated treatments is better than the other. Usually it is believed that the experimental intervention might be better than the control but that has not been conclusively established by strong research. The primary outcome must be benefit; treatments cannot be randomly allocated to discover whether one is more harmful than the other. Of course, as with any human research, patients must fully understand the consequences of participating in the study, know that they can withdraw at any time without compromising their health care, and freely give their consent to participate. In addition, the trial must be stopped whenever there is convincing evidence of effectiveness, harm, or futility in continuing.

Sampling

Clinical trials typically require patients to meet rigorous inclusion and exclusion criteria. These are intended to increase the homogeneity of patients in the study, to strengthen internal validity, and to make it easier to distinguish the “signal” (treatment effect) from the “noise” (bias and chance).

Among the usual inclusion criteria is that patients really do have the condition being studied. To be on the safe side, study patients must meet strict diagnostic criteria. Patients with unusual, mild, or equivocal manifestations of disease may be left out in the process, restricting generalizability.

Of the many possible exclusion criteria, several account for most of the losses:

1. Patients with comorbidity (diseases other than the one being studied) are typically excluded because the care and outcome of these other diseases can muddy the contrast between experimental and comparison treatments and their outcomes.
2. Patients are excluded if they are not expected to live long enough to experience the outcome events of interest.
3. Patients with contraindications to one of the treatments cannot be randomized.
4. Patients who refuse to participate in a trial are excluded, for ethical reasons described earlier in the chapter.
5. Patients who do not cooperate during the early stages of the trial are also excluded. This avoids wasted effort and the reduction in internal validity that occurs when patients do not take their assigned intervention, move in and out of treatment groups, or leave the trial altogether.

For these reasons, patients in clinical trials are usually a highly selected, biased sample of all patients with the condition of interest. As heterogeneity is restricted, the internal validity of the study is improved; in other words, there is less opportunity for differences in outcome that are not related to treatment itself. However, exclusions come at the price of diminished generalizability: Patients in the trial are not like most other patients seen in day-to-day care.

Example

Figure 9.3 summarizes how patients were selected for a randomized controlled trial of asthma management (3). Investigators invited 1,410 patients with asthma in 81 general practices in Scotland to participate. Only 458 of those invited, about one-third, agreed to participate and could be contacted. An additional 199 were excluded, mainly because they did not meet eligibility criteria, leaving 259 patients (18% of those invited) to be randomized. Although the study invited patients from community practices, those who actually participated in the trial were highly selected and perhaps unlike most patients in the community.

care is otherwise the same as usual, without a great deal of extra testing that is part of some trials. Follow-up is for a simple, clinically important outcome, such as discharge from the hospital alive. This approach not only improves generalizability, it also makes it easier to recruit large numbers of participants at a reasonable cost so that moderate effect sizes (large effects are unlikely for most clinical questions) can be detected.

Practical clinical trials (also called pragmatic clinical trials) are designed to answer real-world questions in the actual care of patients by including the kinds of patients and interventions found in ordinary patient care settings.

Example

Severe ankle sprains are a common problem among patients visiting emergency departments. Various treatments are in common use. The Collaborative Ankle Support Trial Group enrolled 584 patients with severe ankle sprain in eight emergency departments in the United Kingdom in a randomized trial of four commonly used treatments: tubular compression bandage and three types of mechanical support (4). Quality of ankle function at 3 months was best after a below-the-knee cast was used for 10 days and worst when a tubular compression bandage was used; the tubular compression bandage was being used in 75% of centers in the United Kingdom at the time. Two less effective forms of mechanical support were several times more expensive than the cast. All treatment groups improved over time and there was no difference in outcome among them at 9 months.
Figure 9.3 ■ Sampling of patients for a randomized controlled trial of asthma management. The final step of the diagram shows 259 patients randomized. (Data from Hawkins G, McMahon AD, Twaddle S, et al. Stepping down inhaled corticosteroids in asthma: randomized controlled trial. BMJ 2003;326:1115–1121.)
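The selection figures quoted for the asthma trial in Figure 9.3 can be checked with simple arithmetic. The following is a minimal sketch in Python; the function name `flow_summary` and its parameter names are assumptions introduced here for illustration and do not come from the study itself.

```python
def flow_summary(invited, agreed, excluded):
    """Summarize a simple recruitment funnel for a trial.

    invited  -- number of patients asked to participate
    agreed   -- number who agreed and could be contacted
    excluded -- number then excluded (e.g., not meeting eligibility criteria)
    """
    randomized = agreed - excluded
    return {
        "randomized": randomized,
        "agreed_fraction": agreed / invited,
        "randomized_fraction": randomized / invited,
    }


# Counts as reported in the example text for the asthma trial.
summary = flow_summary(invited=1410, agreed=458, excluded=199)

print(summary["randomized"])                         # 259
print(round(summary["agreed_fraction"] * 100))       # 32 (about one-third)
print(round(summary["randomized_fraction"] * 100))   # 18
```

The same few lines can be reused for any trial report that gives the number invited, the number agreeing, and the number excluded, as a quick check on how selected the randomized sample is.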
to standardize the intervention so that it can be easily described and reproduced in other settings, investigators may cater to their scientific, not their clinical colleagues by studying treatments that are not feasible in usual practice.

Second, does the intervention reflect the normal complexity of real-world treatment? Clinicians regularly construct treatment plans with many components. Single, highly specific interventions make for tidy science because they can be described precisely and applied in a reproducible way, but they may have weak effects. Multifaceted interventions, which are often more effective, are also amenable to careful evaluation as long as their essence can be communicated and applied in other settings. For example, a randomized trial of fall prevention in acute care hospitals studied the effects of a fall risk assessment scale, with interventions tailored to each patient’s specific risks (5).

Third, is the intervention in question sufficiently different from alternative managements that it is reasonable to expect that the outcome will be affected? Some diseases can be reversed by treating a single, dominant cause. Treating hyperthyroidism with radioisotope ablation or surgery is one example. However, most diseases arise from a combination of factors acting in concert. Interventions that change only one of them, and only a small amount, cannot be expected to result in strong treatment effects. If the conclusion of a trial evaluating such interventions is that a new treatment is not effective when used alone, it should come as no surprise. For this reason, the first trials of a new treatment tend to enroll those patients who are most likely to respond to treatment and to maximize dose and compliance.

Comparison Groups

The value of an intervention is judged in relation to some alternative course of action. The question is not only whether a comparison is used, but also how appropriate it is for the research question. Results can be measured against one or more of several kinds of comparison groups.

■ No Intervention. Do patients who are offered the experimental treatment end up better off than those offered nothing at all? Comparing treatment with no treatment measures the total effects of care and of being in a study, both specific and nonspecific.
■ Being Part of a Study. Do treated patients do better than other patients who just participate in a study? A great deal of special attention is directed toward patients in clinical trials. People have a tendency to change their behavior when they are the target of special interest and attention because of the study, regardless of the specific nature of the intervention they might be receiving. This phenomenon is called the Hawthorne effect. The reasons are not clear, but some seem likely: Patients want to please them and make them feel successful. Also, patients who volunteer for trials want to do their part to see that “good” results are obtained.
■ Usual Care. Do patients given the experimental treatment do better than those receiving usual care (whatever individual doctors and patients decide)? This is the only meaningful (and ethical) question if usual care is already known to be effective.
■ Placebo Treatment. Do treated patients do better than similar patients given a placebo, an intervention intended to be indistinguishable (in physical appearance, color, taste, or smell) from the active treatment but does not have a specific, known mechanism of action? Sugar pills and saline injections are examples of placebos. It has been shown that placebos, given with conviction, relieve severe, unpleasant symptoms, such as postoperative pain, nausea, or itching, in about one-third of patients, a phenomenon called the placebo effect. Placebos have the added advantage of making it difficult for study patients to know which intervention they have received (see “Blinding” in the following text).
■ Another Intervention. The comparator may be the current best treatment. The point of a “comparative effectiveness” study is to find out whether a new treatment is better than the one in current use.

Changes in outcome related to these comparators are cumulative, as diagrammed in Figure 9.4.

Figure 9.4 ■ Total effects of treatment are the sum of spontaneous improvement (natural history) as well as nonspecific and specific responses. (Diagram: improvement is shown as stacked components labeled, from top to bottom, new intervention, placebo effect, usual care, Hawthorne effect, and natural history.)
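The additive picture in Figure 9.4 can be made concrete with a small worked example. The sketch below is illustrative only: the percentage-point values are hypothetical numbers chosen for easy arithmetic, not data from any study, and the names `COMPONENTS`, `cumulative_totals`, and `gain_over` are assumptions introduced here. It shows why the apparent benefit of the same treatment depends on which comparison group is used.

```python
from itertools import accumulate

# Hypothetical improvements (percentage points), listed from the bottom
# of the stack in Figure 9.4 upward. These values are made up.
COMPONENTS = [
    ("natural history", 10),
    ("Hawthorne effect", 5),
    ("usual care", 10),
    ("placebo effect", 10),
    ("new intervention", 15),
]


def cumulative_totals(components):
    """Running total of improvement after each component is added."""
    names = [name for name, _ in components]
    totals = accumulate(value for _, value in components)
    return dict(zip(names, totals))


def gain_over(comparator, components):
    """Improvement in fully treated patients beyond a comparison group
    that experiences every component up to and including `comparator`."""
    totals = cumulative_totals(components)
    top_name = components[-1][0]
    return totals[top_name] - totals[comparator]


totals = cumulative_totals(COMPONENTS)
print(totals["new intervention"])                 # 50: total improvement
print(gain_over("natural history", COMPONENTS))   # 40: versus no intervention
print(gain_over("placebo effect", COMPONENTS))    # 15: versus placebo
```

With these made-up numbers, comparison with no intervention credits the treatment with 40 points of improvement, whereas comparison with placebo isolates the 15 points attributed to the specific effect.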
Figure 9.5 ■ Diagram of stratified randomization. T = treated group, C = control group, and R = randomization. (Diagram: eligible patients are divided by stratification into strata 1, 2, and 3; within each stratum, randomization (R) assigns patients to T or C; the T and C patients from all strata make up the final study groups.)

they were thought to have when they entered the trial. Others drop out, do not take their medications, are taken out of the study because of side effects or other illnesses, or somehow obtain the other study treatment or treatments that are not part of the study at all. In this way, treatment groups that might have been comparable just after randomization become less so as time passes.

Patients May Not Have the Disease Being Studied

It is sometimes necessary (both in clinical trials and in practice) to begin treatment before it is certain whether the patient actually has the disease for which the treatment is designed.

Example

Hydrocortisone may improve survival from septic shock, especially in patients with abnormal adrenal response to shock, as measured by an inappropriately small rise in plasma cortisol after administration of corticotropin. Treatment must begin before the results of the test are available. Investigators in Israel and Europe studied whether hydrocortisone improved survival to 28 days in patients with septic shock (7). Patients were randomized to hydrocortisone or placebo, and treatment was begun before results of the corticotropin test were available. Of 499 patients with septic shock enrolled in the trial, 233 (47%) had adrenal insufficiency. The main analysis was of this subgroup, those who it was believed might respond to hydrocortisone, not of all patients randomized. Among patients with poor response to corticotropin, there was no difference in survival in patients treated with hydrocortisone, compared with placebo.

Because response to corticotropin was a characteristic that existed before randomization, it had been randomly allocated, so the advantages of a randomized controlled trial were preserved. However, the inefficiency of enrolling and gathering data on patients who would not contribute to the study’s results could not be avoided. Also, the number of patients in the study was reduced, making it more difficult to detect differences in survival if they existed. Nevertheless, this kind of trial has the important advantage of providing information on both the consequences of a decision that a clinician must make before all the relevant information is available as well as effectiveness of treatment in the subset of patients most likely to respond (see the Intention-to-Treat and Explanatory Trials section in this chapter).

Compliance

Compliance is the extent to which patients follow medical advice. The term adherence is preferred by some people because it connotes a less subservient relationship between patient and doctor. Compliance is another characteristic that comes into play after randomization.

Although noncompliance suggests a kind of willful neglect of good advice, other factors also contribute. Patients may misunderstand which drugs and doses are intended, run out of prescription medications, confuse various preparations of the same drug, or have no money or insurance to pay for drugs. Taken together, noncompliance may limit the usefulness of treatments that have been shown to work under favorable conditions.

In general, compliance marks a better prognosis, apart from treatment. Patients in randomized trials who were compliant with placebo had better outcomes than those who were not (8).

Compliance is particularly important in medical care outside the hospital. In hospitals, many factors act to constrain patients’ personal behavior and render
them compliant. Hospitalized patients are generally sicker and more frightened. They are in strange surroundings, dependent upon the skill and attention of the staff for everything, including their life. What is more, doctors, nurses, and pharmacists have developed a well-organized system for ensuring that patients receive what is ordered for them. As a result, clinical experience and medical literature developed on the wards may underestimate the importance of compliance outside the hospital, where most patients and doctors are and where following doctors’ orders is less common.

In clinical trials, patients are typically selected to be compliant. During a run-in period, in which placebo is given and compliance monitored, noncompliant patients can be detected and excluded before randomization.

Cross-over

Patients may move from one randomly allocated treatment to another during follow-up, a phenomenon called cross-over. If exchanges between treatment groups take place on a large scale, it can diminish the observed differences in treatment effect compared to what might have been observed if the original groups had remained intact.

Cointerventions

After randomization, patients may receive a variety of interventions other than the ones being studied. For example, in a study of asthma treatment, they may receive not only the experimental drug but also different doses of their usual drugs and make greater efforts to control allergens in the home. If these occur unequally in the two groups and affect outcomes, they can introduce systematic differences between the groups that were not present when the groups were formed.

Blinding

Participants in a trial may change their behavior or reporting of outcomes in a systematic way (i.e., be biased) if they are aware of which patients are receiving which treatment. One way to minimize this effect is by blinding, an attempt to make the various participants in a study unaware of the treatment group patients have been randomized to so that this knowledge cannot cause them to act differently, and thereby diminish the internal validity of the study. Masking is a more appropriate metaphor, but blinding is the time-honored term.

Blinding can take place in a clinical trial at four levels (Fig. 9.6). First, those responsible for allocating patients to treatment groups should not know which treatment will be assigned next, making it impossible for them to break the randomization plan. Allocation concealment is a term for this form of blinding. Without it, some investigators might be tempted to enter patients in the trial out of order to ensure that individuals get the treatment that seems best for them. Second, patients should be unaware of
Options for describing effect size in clinical trials are summarized in Table 9.3. The options are similar to summaries of risk and prognosis but related to change in outcome resulting from the intervention.

EFFICACY AND EFFECTIVENESS

Clinical trials may describe the results of an intervention in ideal or in real-world situations (Fig. 9.7).

First, can treatment help under ideal circumstances? Trials that answer this question are called efficacy trials. Elements of ideal circumstances include patients who accept the interventions offered to them, follow instructions faithfully, get the best possible care, and do not have care for other diseases. Most randomized trials are designed in this way.

Second, does treatment help under ordinary circumstances? Trials designed to answer this kind of question are called effectiveness trials. All the usual elements of patient care are part of effectiveness trials. Patients may not take their assigned treatment. Some may drop out of the study, and others find ways to take the treatment they were not assigned. The doctors and facilities may not be the best. In short, effectiveness trials describe results as most patients would experience them. The difference between efficacy and effectiveness has been described as the “implementation gap,” the gap between ideal care and ordinary care, and is a target for improvement in its own right.

(Diagram labels: Compliance, Types of patients, Practicality, Cost.)
Efficacy trials usually precede effectiveness trials. The rationale is that if treatment under the best circumstances is not effective, then effectiveness under ordinary circumstances is impossible. Also, if an effectiveness trial were done first and it showed no effect, the result could have been either because the treatment at its best is just not effective or because the treatment really is effective but was not received.

Intention-to-Treat and Explanatory Trials

A related issue is whether the results of a randomized controlled trial should be analyzed and presented according to the treatment to which the patients were randomized or according to the one they actually received (Fig. 9.8).

One question is: Which treatment choice is best at the time the decision must be made? To answer this question, analysis is according to the group to which the patients were assigned (randomized), regardless of whether these patients actually received the treatment they were supposed to receive. This way of analyzing trial results is called an intention-to-treat analysis. An advantage of this approach is that the question corresponds to the one actually faced by clinicians; they either offer a treatment or not. Also, the groups compared are as originally randomized, so this comparison has the full strength of a randomized trial. The disadvantage is that to the extent that many patients do not receive the treatment to which they were randomized, differences in effectiveness will tend to be obscured, increasing the chances of observing a misleadingly small effect or no statistical effect at all. If the study shows no difference, it will be uncertain whether the problem is the treatment itself or that it was not received.

Another question is whether the experimental treatment itself is better. For this question, the proper analysis is according to the treatment each patient actually received, regardless of the treatment
(Diagram: INTENTION TO TREAT: the sample is divided into experimental and control groups, with analysis according to treatment assigned. EXPLANATORY: the sample is divided into experimental and control groups, with analysis according to treatment received.)
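The difference between the two analyses can be illustrated with a toy calculation. The sketch below is not taken from any trial: the ten patient records are hypothetical, and the names `PATIENTS` and `event_rates` are assumptions introduced here. Each record holds the arm assigned at randomization, the arm actually received, and whether the patient had the outcome event.

```python
from collections import defaultdict

# Hypothetical records: (assigned arm, arm actually received, had outcome event)
PATIENTS = [
    ("experimental", "experimental", False),
    ("experimental", "experimental", False),
    ("experimental", "experimental", False),
    ("experimental", "experimental", True),
    ("experimental", "control", True),   # did not receive the assigned treatment
    ("control", "control", True),
    ("control", "control", True),
    ("control", "control", True),
    ("control", "control", False),
    ("control", "control", False),
]


def event_rates(records, key_index):
    """Event rate per arm, grouping by the field at `key_index`
    (0 = treatment assigned, 1 = treatment received)."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [events, patients]
    for record in records:
        arm = record[key_index]
        counts[arm][0] += int(record[2])
        counts[arm][1] += 1
    return {arm: events / total for arm, (events, total) in counts.items()}


intention_to_treat = event_rates(PATIENTS, key_index=0)
as_received = event_rates(PATIENTS, key_index=1)

print(intention_to_treat)  # experimental 2/5 = 0.4, control 3/5 = 0.6
print(as_received)         # experimental 1/4 = 0.25, control 4/6 (about 0.67)
```

In this made-up data set, the intention-to-treat comparison shows a smaller difference between arms than the comparison by treatment received, which is the obscuring of differences described in the text. The comparison by treatment received, however, no longer compares the groups exactly as they were randomized.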
events are counted in patients according to the treatment their cluster was assigned. Randomization of groups, not the individuals in them, may be preferable for several reasons. It may just be more practical to randomize clusters than individuals. Patients within clusters may be more similar to each other than to patients in other clusters, and this source of variation, apart from the study treatments themselves, should be taken into account in the study. If patients were randomized within clusters, they or their physicians might learn from each other about both treatments and this might affect their behaviors. For example, how successful could a physician be at randomly treating some patients with urinary tract infection one way and other patients another way? Similarly, could a hospital establish a new plan to prevent intravenous catheter infections in some of its intensive care units and not others when physicians see patients in both settings over time? For these reasons, randomizing clusters rather than patients can be the best approach in some circumstances.

There are other variations on the usual (“parallel group”) randomized controlled trials. Cross-over trials expose patients first to one of two randomly allocated treatments and later to the other. If it can be assumed that effects of the first exposure are no longer present by the time of the second exposure, perhaps because treatment is short-lived or there has been a “wash-out” period between exposures, then each patient will have been exposed to each treatment in random order. This controls for differences in responsiveness among patients not related to treatment effects.

TAILORING THE RESULTS OF TRIALS TO INDIVIDUAL PATIENTS

Clinical trials describe what happens on average. They involve pooling the experience of many patients who may be dissimilar, both to one another and to the patients to whom the trial results will be generalized. How can estimates of treatment effect be obtained that more closely match individual patients?

Subgroups

Patients in clinical trials can be sorted into subgroups, each with a specific characteristic (or combination of characteristics) such as age, severity of disease, and comorbidity that might cause a different treatment effect. That is, the data are examined for effect modification. The number of such subgroups is limited only by the number of patients in the subgroups, which has to be large enough to provide reasonably stable estimates. As long as the characteristics used to define the subgroups existed before randomization, patients in each subgroup have been randomly allocated to treatment groups. As a consequence, results in each subgroup represent, in effect, a small trial within a trial. The characteristics of a given patient (e.g., the patient might be elderly and have severe disease but no comorbidity) can be matched more specifically to those of one of the subgroups than they can to the trial as a whole. Treatment effectiveness in the matched subgroup will more closely approximate that of the individual patient and will be limited mainly by statistical risks of false-positive and false-negative conclusions, which are described in Chapter 11.

Effectiveness in Individual Patients

A treatment that is effective on average may not work on an individual patient. Therefore, results of valid clinical research provide a good reason to begin treating a patient, but experience with that patient is a better reason to continue or not continue. When managing an individual patient, it is prudent to ask the following series of questions:

■ Is the treatment known (by randomized controlled trials) to be efficacious for any patients?
■ Is the treatment known to be effective, on average, in patients like mine?
■ Is the treatment working in my patient?
■ Are the benefits worth the discomforts and risks (according to the patient’s values and preferences)?

By asking these questions and not simply following the results of trials alone, one can guard against ill-founded choice of treatment or stubborn persistence in the face of poor results.

Trials of N = 1

Rigorous clinical trials, with proper attention to bias and chance, can be carried out on individual patients, one at a time. The method, called trials of N = 1, is an improvement over the time-honored process of trial and error. A patient is given one treatment or another, such as an active treatment or placebo, in random order, each for a brief period of time. The patient and physician are blinded to which treatment is given. Outcomes, such as a simple preference for a treatment or a symptom score, are assessed after each period. After many repetitions, patterns of responses are analyzed statistically, much as one would for a more usual randomized controlled trial. This method is useful for deciding on the care of individual patients when activity of disease is unpredictable, response to treatment is prompt, and there is no carryover effect from period to period. Examples of diseases for which
the method can be used include migraine headaches, asthma, and fibromyalgia. For all their intellectual appeal, however, trials of N = 1 are rarely done and even more rarely published.

ALTERNATIVES TO RANDOMIZED CONTROLLED TRIALS

Randomized controlled trials are the gold standard for studies of the effectiveness of interventions. Only large randomized trials can definitively eliminate confounding as an alternative explanation for observed results.

Limitations of Randomized Trials

However, the availability of several well-conducted randomized controlled trials does not necessarily settle a question. For example, after 21 randomized controlled trials over three decades (one is described in an example earlier in this chapter), the effectiveness of corticosteroids for septic shock remains controversial. The heterogeneity of trial results seems partly related to differences in dose and duration of the drug, the proportion of patients with relative adrenal insufficiency, and whether the outcome is reversal of shock or survival. That is, the trials were of the same general questions but very different specific questions.

Clinical trials also suffer from practical limitations. They are expensive; in 2011, the average cost per patient in drug trials was estimated to be nearly $50,000, and some large trials have cost hundreds of millions of dollars. Logistics can be daunting, especially in maintaining similar methods across sites in multicenter trials and in maintaining the integrity of allocation concealment. Randomization itself remains a hindrance, if not to the conduct of trials at all then to full, unbiased participation. It is particularly difficult to convince patients to be randomized when a practice has become well established in the absence of conclusive evidence of its benefit.

For these reasons, clinical trials are not available to guide clinicians in many important treatment decisions, but clinical decisions must be made nonetheless. What are the alternatives to randomized controlled trials, and how credible are they?

Observational Studies of Interventions

In the absence of a consensus favoring one mode of treatment over others, various treatments are given according to the preferences of each individual patient and doctor. As a result, in the course of ordinary patient care, large numbers of patients are treated in various ways and go on to manifest the effects. When experience with these patients is captured and properly analyzed, it can complement the information available from randomized trials, suggest where new trials are needed, and provide answers where trials are not yet available.

Example

Non–small cell lung cancer is common in the elderly. Palliative chemotherapy for advanced disease prolongs life, yet the elderly receive chemotherapy less often than younger patients, perhaps because of concern that they would be more likely to develop adverse events such as fever and infections, nerve damage, or deep venous thrombosis. To see if elderly patients in the community really are more likely to experience adverse events, everything else being equal, investigators analyzed data from a cohort study of the care and outcomes of lung cancer (12). There was consistent evidence that the older patients who received chemotherapy were selected to be less likely to have adverse events: Fewer got chemotherapy, they had fewer adverse events before treatment, and they received less aggressive therapy. Even so, adverse event rates after chemotherapy were 34% to 70% more frequent in older patients than in younger ones. This effect could not be attributed to other diseases more common with age (it persisted after adjustment for comorbidity) and is more likely to be the result of age-related decrease in organ function. Whether the benefits of treatment outweigh the increased rate of adverse events is a separate question.

Unfortunately, it is difficult to be sure that observational studies of treatment are not confounded. Treatment choice is determined by a great many factors including severity of illness, concurrent diseases, local preferences, and patient cooperation. Patients receiving the various treatments are likely to differ not only in their treatment but in other ways as well.

Especially troubling is confounding by indication (sometimes called “reverse causation”), which occurs when whatever prompted the doctor to choose a treatment (the “indication”) is a cause of the observed outcome, not just the treatment itself. For
example, patients may be offered a new surgical procedure because they are a good surgical risk or have less aggressive disease and, therefore, seem especially likely to benefit from the procedure. To the extent that the reasons for treatment choice are known, they can be taken into account like any other confounders.
Example
Influenza can cause worsening of bronchospasm in children with asthma. Influenza vaccine is recommended, but many children with asthma were not getting vaccinated, perhaps because of concern that vaccination might cause an exacerbation of asthma, which was suggested by observational studies. The association might have been the result of confounding by indication, whereby children with worse asthma were both more likely to be vaccinated and more likely to experience exacerbations. To test this possibility, investigators did a population-based cohort study of the influenza vaccine and asthma exacerbations in children 1 to 6 years of age in four large health maintenance organizations during three influenza seasons (13). In crude (unadjusted) analyses, asthma exacerbation rates were two to three times higher in vaccinated children compared with those not vaccinated. Risk was smaller, but still elevated, after adjusting for age, sex, health care organization, calendar time, asthma severity (based on the use of drugs and medical visits for asthma), and preventive care practices. To achieve more precise control for severity of asthma, rates in the 2 weeks after vaccination were compared with rates in the same children in periods outside this interval. In this analysis, each child served as his or her own control. The resulting relative risks indicated protection with vaccination: 0.58, 0.74, and 0.98 in the three seasons. A summary of randomized controlled trials suggests that the vaccine is unlikely to precipitate asthma attacks (14).
Clinical Databases
Sometimes, databases are available that include baseline characteristics and outcomes for a large number of patients. Clinicians can match characteristics of a specific patient to similar patients in the database and see what their outcomes were. When use of the database is not part of formal research, there is no accounting for confounding and effect modification, but the predictions do have the advantage of being about real-world patients, not those as highly selected as in most clinical trials.
Randomized versus Observational Studies?
Are observational studies a reliable substitute for randomized controlled trials? With controlled trials as the gold standard, most observational studies of most questions get the right answer. However, there are dramatic exceptions. For example, observational studies have consistently shown that antioxidant vitamins are associated with lower cardiovascular risk, but large randomized controlled trials have found no such effect. Therefore, clinicians can be guided by observational studies of treatment effects when there are no randomized trials to rely on, but they should maintain a healthy skepticism.
Well-designed observational studies of interventions have some strengths that complement the limitations of usual randomized trials. They count the effects of actual treatment, not just of offering treatment, which is a legitimate question in its own right. They commonly include most people as they exist in naturally occurring populations, in either clinical or community settings, without severe inclusion and exclusion criteria. They can often accomplish longer follow-up than trials, matching the time it takes for disease and outcomes to develop. By taking advantage of treatments and outcomes as they happen, observational studies (especially case-control studies and historical cohort studies using health records) can answer clinical questions more quickly than it takes to complete a randomized trial. Of course, they are also less expensive.
Because an ideal randomized controlled trial is the standard of excellence, it has been suggested that observational studies of treatment be designed to resemble, as closely as possible, a randomized trial of the same question (15). One might ask, if the study had been a randomized trial, what would be the inclusion and exclusion criteria (e.g., excluding patients with contraindications to either intervention), how would exposure be precisely defined, and how should drop-outs and cross-overs be managed? The resulting observational study cannot be expected to avoid all vulnerabilities, but at least it would be stronger.
PHASES OF CLINICAL TRIALS
For studies of drugs, it is customary to define three phases of trials in the order they are undertaken.
Chapter 9: Treatment 149
Phase I trials are intended to identify a dose range that is well tolerated and safe (at least for high-frequency, severe side effects) and include very small numbers of patients (perhaps a dozen) without a control group. Phase II trials provide preliminary information on whether the drug is efficacious and the relationship between dose and efficacy. These trials may be controlled but include too few patients in treatment groups to detect any but the largest treatment effects. Phase III trials are randomized trials and can provide definitive evidence of efficacy and rates of common side effects. They include enough patients, sometimes thousands, to detect clinically important treatment effects and are usually published in biomedical journals.
Phase III trials are not large enough to detect differences in the rate, or even the existence, of uncommon side effects (see discussion of statistical power in Chapter 11). Therefore, it is necessary to follow up very large numbers of patients after a drug is in general use, a process called postmarketing surveillance.
Review Questions
Read the following and select the best response.
9.1. A randomized controlled trial compares two drugs in common use for the treatment of asthma. Three hundred patients were entered into the trial, and eligibility criteria were broad. No effort was made to blind patients to their treatment group after enrollment. Except for the study drugs, care was decided by each individual physician and patient. The outcome measure was a brief questionnaire assessing asthma-related quality of life. Which of the following best describes this trial?
A. Practical clinical trial
B. Large simple trial
C. Efficacy trial
D. Equivalence trial
E. Non-inferiority trial
9.2. A randomized controlled trial compared angioplasty with fibrinolysis for the treatment of acute myocardial infarction. The authors state that “analysis was by intention to treat.” Which of the following is an advantage of this approach?
A. It describes the effects of treatments that patients have actually received.
B. It is unlikely to underestimate treatment effect.
C. It is not affected by patients dropping out of the study.
D. It describes the consequences of offering treatments regardless of whether they are actually taken.
E. It describes whether treatment can work under ideal circumstances.
9.3. In a randomized trial, patients with meningitis who were treated with corticosteroids had lower rates of death, hearing loss, and neurologic sequelae. Which of the following is a randomized comparison?
A. The subset of patients who, at the time of randomization, were severely affected by the disease
B. Patients who experienced other treatments versus those who did not
C. Patients who remained in the trial versus those who dropped out after randomization
D. Patients who responded to the drug versus those who did not
E. Patients who took the drug compared with those who did not
9.4. A patient asks for your advice about whether to begin an exercise program to reduce his risk of sudden death. You look for randomized controlled trials but find only observational studies of this question. Some are cohort studies comparing sudden death rates in exercisers with rates of sudden death in sedentary people; others are case-control studies comparing exercise patterns in people who had experienced sudden death and matched controls. Which of the following is not an advantage of observational studies of treatments like these over randomized controlled trials?
A. Reported effects are for patients who have actually experienced the intervention.
B. It may be possible to carry out these studies by using existing data that was collected for other purposes.
C. The results can be generalized to more ordinary, real world settings.
D. Treatment groups would have had a similar prognosis except for treatment itself.
E. A large sample size is easier to achieve.
9.5. In a randomized controlled trial of a program to reduce lower extremity problems in patients with diabetes mellitus, patients were excluded if they were younger than age 40, were diagnosed before becoming 30 years old, took specific medication for hyperglycemia, had other serious illness or disability, or were not compliant with prescribed treatment during a run-in period. Which of the following is an advantage of this approach?
A. It makes it possible to do an intention-to-treat analysis.
B. It avoids selection bias.
C. It improves the generalizability of the study.
D. It makes an effectiveness trial possible.
E. It improves the internal validity of the study.
9.6. Which of the following is not accomplished by an intention-to-treat analysis?
A. A comparison of the effects of actually taking the experimental treatments
B. A comparison of the effects of offering the experimental treatments
C. A randomized comparison of treatment effects
9.7. You are reading a report of a randomized controlled trial and wonder whether stratified randomization, which the trial used, was likely to improve internal validity. For which of the following is stratified randomization particularly helpful?
A. The study includes many patients.
B. One of the baseline variables is strongly related to prognosis.
C. Assignment to treatment group is not blinded.
D. Many patients are expected to drop out.
E. An intention-to-treat analysis is planned.
9.8. A randomized controlled trial is analyzed according to the treatment each patient actually received. Which of the following best describes this approach to analysis?
A. Superiority
B. Intention-to-treat
C. Explanatory
D. Phase I
E. Open-label
9.9. In a randomized controlled trial, a beta-blocking drug is found to be more effective than placebo for stage fright. Patients taking the beta-blocker tended to have a lower pulse rate and to feel more lethargic, which are known effects of this drug. For which of the following is blinding possible?
A. The patients’ physicians
B. The investigators who assigned patients to treatment groups
C. The patients in the trial
D. The investigators who assess outcome
9.10. Which of the following best describes “equipoise” as the rationale for a randomized trial of two drugs?
A. The drugs are known to be equally effective.
B. One of the drugs is known to be more toxic.
C. Neither drug is known to be more effective than the other.
D. Although one drug is more effective, the other drug is easier to take with fewer side effects.
9.11. Antibiotic A is the established treatment for community-acquired pneumonia, but it is expensive and has many side effects. A new drug, antibiotic B, has just been developed for community-acquired pneumonia and is less expensive and has fewer side effects, but its efficacy, relative to drug A, is not well established. Which of the following would be the best kind of trial for evaluating drug B?
A. Superiority
B. Cross-over
C. Cluster
D. Non-inferiority
E. Equivalence
REFERENCES
1. Action to Control Cardiovascular Risk in Diabetes Study Group, Gerstein HC, Miller ME, et al. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med 2008;358:2545–2559.
2. Allen C, Glasziou P, Del Mar C. Bed rest: a potentially harmful treatment needing more careful evaluation. Lancet 1999;354:1229–1233.
3. Hawkins G, McMahon AD, Twaddle S, et al. Stepping down inhaled corticosteroids in asthma: randomized controlled trial. BMJ 2003;326:1115–1121.
4. Lamb SE, Marsh JL, Hutton JL, et al. Mechanical supports for acute, severe ankle sprain: a pragmatic, multicentre, randomized controlled trial. Lancet 2009;373:575–581.
5. Dykes PC, Carroll DL, Hurley A, et al. Fall prevention in acute care hospitals. A randomized trial. JAMA 2010;304:1912–1918.
6. Carson JL, Terrin ML, Noveck H, et al. Liberal or restrictive transfusion in high-risk patients after hip surgery. N Engl J Med 2011;365:2453–2462.
7. Sprung CL, Annane D, Keh D, et al. Hydrocortisone therapy for patients with septic shock. N Engl J Med 2008;358:111–124.
8. Avins AL, Pressman A, Ackerson L, et al. Placebo adherence and its association with morbidity and mortality in the studies of left ventricular dysfunction. J Gen Intern Med 2010;25:1275–1281.
9. The AIM-HIGH Investigators. Niacin in patients with low HDL cholesterol levels receiving intensive statin therapy. N Engl J Med 2011;365:2255–2267.
10. Feldman T, Foster E, Glower DG, et al. Percutaneous repair or surgery for mitral regurgitation. N Engl J Med 2011;364:1395–1406.
11. Mitja O, Hayes R, Ipai A, et al. Single-dose azithromycin versus benzathine benzylpenicillin for treatment of yaws in children in Papua New Guinea: an open-label, non-inferiority, randomized trial. Lancet 2012;379:342–347.
12. Chrischilles EA, Pendergast JF, Kahn KL, et al. Adverse events among the elderly receiving chemotherapy for advanced non-small cell lung cancer. J Clin Oncol 2010;28:620–627.
13. Kramarz P, DeStefano F, Gargiullo PM, et al. Does influenza vaccination exacerbate asthma. Analysis of a large cohort of children with asthma. Vaccine Safety Datalink Team. Arch Fam Med 2000;9:617–623.
14. Cates CJ, Jefferson T, Rowe BH. Vaccines for preventing influenza in people with asthma. Available at http://summaries.cochrane.org/CD000364/vaccines-for-preventing-influenza-in-people-with-asthma. Accessed July 26, 2012.
15. Feinstein AR, Horwitz RI. Double standards, scientific methods, and epidemiologic research. N Engl J Med 1982;307:1611–1617.
Chapter 10
Prevention
If a patient asks a medical practitioner for help, the doctor does the best he can. He is
not responsible for defects in medical knowledge. If, however, the practitioner initiates
screening procedures, he is in a very different situation. He should have conclusive evidence
that screening can alter the natural history of disease in a significant proportion of those
screened.
—Archie Cochrane and Walter Holland
1971
Immunization
Childhood immunizations to prevent 15 different diseases largely determine visit schedules to the pediatrician in the early months of life. Human papillomavirus (HPV) vaccination of adolescent girls has recently been added for prevention of cervical cancer. Adult immunizations include diphtheria, pertussis, and tetanus (DPT) boosters as well as vaccinations to prevent influenza, pneumococcal pneumonia, and hepatitis A and B.
Screening
Screening is the identification of asymptomatic disease or risk factors. Screening tests start in the prenatal period (such as testing for Down syndrome in the fetuses of older pregnant women) and continue throughout life (e.g., when inquiring about hearing in the elderly). The latter half of this chapter discusses scientific principles of screening.
Behavioral Counseling (Lifestyle Changes)
Clinicians can give effective behavioral counseling to motivate lifestyle changes. Clinicians counsel patients to stop smoking, eat a prudent diet, drink alcohol moderately, exercise, and engage in safe sexual practices. It is important to have evidence that (i) behavior change decreases the risk for the condition of interest, and (ii) counseling leads to behavior change before spending time and effort on this approach to prevention (see Levels of Prevention later in the chapter).
Chemoprevention
Chemoprevention is the use of drugs to prevent disease. It is used to prevent disease early in life (e.g., folate during pregnancy to prevent neural tube defects and ocular antibiotic prophylaxis in all newborns to prevent gonococcal ophthalmia neonatorum) but is also common in adults (e.g., low-dose aspirin prophylaxis for myocardial infarction, and statin treatment for hypercholesterolemia).
LEVELS OF PREVENTION
Merriam-Webster’s dictionary defines prevention as “the act of preventing or hindering” and “the act or practice of keeping something from happening” (2). With these definitions in mind, almost all activities in medicine could be defined as prevention. After all, clinicians’ efforts are aimed at preventing the untimely occurrences of the 5 Ds: death, disease, disability, discomfort, and dissatisfaction (discussed in Chapter 1). However, in clinical medicine, the definition of prevention has traditionally been restricted to interventions in people who are not known to have the particular condition of interest. Three levels of prevention have been defined: primary, secondary, and tertiary prevention (Fig. 10.1).
[Figure 10.1: three stages, each paired with a level of prevention. No disease: primary (remove risk factors). Asymptomatic disease: secondary (early detection and treatment). Clinical course: tertiary (reduce complications).]
Figure 10.1 ■ Levels of prevention. Primary prevention prevents disease from occurring. Secondary prevention detects and cures disease in the asymptomatic phase. Tertiary prevention reduces complications of disease.
Primary Prevention
Primary prevention keeps disease from occurring at all by removing its causes. The most common clinical primary care preventive activities involve immunizations to prevent communicable diseases, drugs, and behavioral counseling. Recently, prophylactic surgery has become more common, with bariatric surgery to prevent complications of obesity, and ovariectomy and mastectomy to prevent ovarian and breast cancer in women with certain genetic mutations.
Primary prevention has eliminated many infectious diseases from childhood. In American men, primary prevention has prevented many deaths from two major killers: lung cancer and cardiovascular disease. Lung cancer mortality in men decreased by 25% from 1991 to 2007, with an estimated 250,000 deaths prevented (3). This decrease followed smoking cessation trends among adults, without organized screening and without much improvement in survival after treatment for lung cancer. Heart disease mortality rates in men have decreased by half over the past several decades (4) not only because medical care has improved, but also because of primary prevention efforts such as smoking cessation and use of antihypertensive and statin medications. Primary prevention is now possible for cervical, hepatocellular, skin and breast cancer, bone fractures, and alcoholism.
A special attribute of primary prevention involving efforts to help patients adopt healthy lifestyles is that a single intervention may prevent multiple diseases. Smoking cessation decreases not only lung cancer but also many other pulmonary diseases, other cancers, and, most of all, cardiovascular disease. Maintaining an appropriate weight prevents diabetes and osteoarthritis, as well as cardiovascular disease and some cancers.
Primary prevention at the community level can also be effective. Examples include immunization requirements for students, no-smoking regulations in public buildings, chlorination and fluoridation of the water supply, and laws mandating seatbelt use in automobiles and helmet use on motorcycles and bicycles. Certain primary prevention activities occur in specific occupational settings (use of earplugs or dust masks), in schools (immunizations), or in specialized health care settings (use of tests to detect hepatitis B and C or HIV in blood banks).
For some problems, such as injuries from automobile accidents, community prevention works best. For others, such as prophylaxis in newborns to prevent gonococcal ophthalmia neonatorum, clinical settings work best. For still others, clinical efforts can complement community-wide activities. In smoking prevention efforts, clinicians help individual patients stop smoking, and public education, regulations, and taxes prevent teenagers from starting to smoke.
Secondary Prevention
Secondary prevention detects early disease when it is asymptomatic and when treatment can stop it from progressing. Secondary prevention is a two-step process, involving a screening test and follow-up diagnosis and treatment for those with the condition of interest. Testing asymptomatic patients for HIV and routine Pap smears are examples. Most secondary prevention is done in clinical settings.
As indicated earlier, screening is the identification of an unrecognized disease or risk factor by history taking (e.g., asking if the patient smokes), physical examination (e.g., a blood pressure measurement), laboratory test (e.g., checking for proteinuria in a diabetic), or other procedure (e.g., a bone mineral density examination) that can be applied reasonably rapidly to asymptomatic people. Screening tests sort out apparently well persons (for the condition of interest) who have an increased likelihood of disease or a risk factor for a disease from people who have a low likelihood. Screening tests are part of all secondary and some primary and tertiary preventive activities.
A screening test is usually not intended to be diagnostic. If the clinician and/or patient are not committed to further investigation of abnormal results and treatment, if necessary, the screening test should not be performed at all.
Tertiary Prevention
Tertiary prevention describes clinical activities that prevent deterioration or reduce complications after a disease has declared itself. An example is the use of beta-blocking drugs to decrease the risk of death in patients who have recovered from myocardial infarction. Tertiary prevention is really just another term for treatment, but treatment focused on health effects occurring not so much in hours and days but months and years. For example, in diabetic patients, good treatment requires not just control of blood glucose. Searches for and successful treatment of other cardiovascular risk factors (e.g., hypertension, hypercholesterolemia, obesity, and smoking) help prevent cardiovascular disease in diabetic patients as much as, and even more than, good control of blood glucose. In addition, diabetic patients need regular ophthalmologic examinations for detecting early diabetic retinopathy, routine foot care, and monitoring for urinary protein to guide use of angiotensin-converting enzyme inhibitors to prevent renal failure. All these preventive activities are tertiary in the sense that they prevent and reduce complications of a disease that is already present.
Confusion about Primary, Secondary, and Tertiary Prevention
Over the years, as more and more of clinical practice has involved prevention, the distinctions among primary, secondary, and tertiary prevention have become blurred. Historically, primary prevention was thought of as primarily vaccinations for infectious disease and counseling for healthy lifestyle behaviors, but primary prevention now includes prescribing antihypertensive medication and statins to prevent cardiovascular diseases, and performing prophylactic surgery to prevent ovarian cancer in women with certain genetic abnormalities. Increasingly, risk factors are treated as if they are diseases, even at a time when they have not caused any of the 5 Ds. This is true for a growing number of health risks, for example, low bone mineral density, hypertension, hyperlipidemia, obesity, and certain genetic abnormalities. Treating risk factors as disease broadens the definition of secondary prevention into the domain of traditional primary prevention.
In some disciplines, such as cardiology, the term secondary prevention is used when discussing tertiary prevention. “A new era of secondary prevention” was
declared when treating patients with acute coronary syndrome (myocardial infarction or unstable angina) with a combination of antiplatelet and anticoagulant therapies to prevent cardiovascular death (5). Similarly, “secondary prevention” of stroke is used to describe interventions to prevent stroke in patients with transient ischemic attacks.
Tests used for primary, secondary, and tertiary prevention, as well as for diagnosis, are often identical, another reason for confusing the levels of prevention (and confusing prevention with diagnosis). Colonoscopy may be used to find a cancer in a patient with blood in his stool (diagnosis); to find an early asymptomatic colon cancer (secondary prevention); to remove an adenomatous polyp, which is a risk factor for colon cancer (primary prevention); or to check for cancer recurrence in a patient treated for colon cancer (a tertiary preventive activity referred to as surveillance).
Regardless of the terms used, an underlying reason to differentiate levels among preventive activities is that there is a spectrum of probabilities of disease and adverse health effects from the condition(s) being sought and treated during preventive activities, as well as different probabilities of adverse health effects from interventions that are used for prevention at the various levels. The underlying risk of certain health problems is usually much higher in diseased than healthy people. For example, the risk of cardiovascular disease in diabetics is much greater than in asymptomatic non-diabetics. Identical tests perform differently depending on the level of prevention. Furthermore, the trade-offs between effectiveness and harms can be quite different for patients in different parts of the spectrum. False-positive test results and overdiagnosis (both discussed later in this chapter) among people without the disease being sought are important issues in secondary prevention, but they are less important in treatment of patients already known to have the disease in question. The terms primary, secondary, and tertiary prevention are ways to consider these differences conceptually.
SCIENTIFIC APPROACH TO CLINICAL PREVENTION
When considering what preventive activities to perform, the clinician must first decide with the patient which medical problems or diseases they should try to prevent. This statement is so clear and obvious that it would seem unnecessary to mention, but the fact is that many preventive procedures, especially screening tests, are performed without a clear understanding of what is being sought or prevented. For instance, physicians performing routine checkups on their patients may order a urinalysis. However, a urinalysis might be used to search for any number of medical problems, including diabetes, asymptomatic urinary tract infections, renal cancer, or renal failure. It is necessary to decide which, if any, of these conditions is worth screening for before undertaking the test. One of the most important scientific advances in clinical prevention has been the development of methods for deciding whether a proposed preventive activity should be undertaken (6). The remainder of this chapter describes these methods and concepts.
Three criteria are important when judging whether a condition should be included in preventive care (Table 10.1):
1. The burden of suffering caused by the condition.
2. The effectiveness, safety, and cost of the preventive intervention or treatment.
3. The performance of the screening test.
Table 10.1
Criteria for Deciding Whether a Medical Condition Should Be Included in Preventive Care
1. How great is the burden of suffering caused by the condition in terms of:
   Death
   Disease
   Disability
   Discomfort
   Dissatisfaction
   Destitution
2. How good is the screening test, if one is to be performed, in terms of:
   Sensitivity
   Specificity
   Simplicity
   Cost
   Safety
   Acceptability
3. A. For primary and tertiary prevention, how good is the therapeutic intervention in terms of:
   Effectiveness
   Safety
   Cost-effectiveness
   Or
   B. For secondary prevention, if the condition is found, how good is the ensuing treatment in terms of:
   Effectiveness
   Safety
   Early treatment after screening being more effective than later treatment without screening, when the patient becomes symptomatic
   Cost-effectiveness
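Table 10.1 lists sensitivity and specificity among the criteria for a screening test, and the text above notes that identical tests perform differently depending on the level of prevention. One way to see this is to compute positive predictive value at different prevalences. The sketch below is illustrative only: the function name is an assumption, and the sensitivity, specificity, and prevalence figures are hypothetical rather than taken from the text.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Proportion of positive test results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    sensitivity, specificity = 0.90, 0.95  # hypothetical test characteristics
    # Hypothetical prevalences: asymptomatic screening population versus
    # patients already suspected of having the disease.
    for label, prevalence in [("screening", 0.005), ("diagnosis", 0.30)]:
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"{label}: prevalence {prevalence:.1%}, PPV {ppv:.1%}")
```

With the same test characteristics, most positive results in the low-prevalence group are false positives, which is why false-positive results matter more in secondary prevention than in the care of patients already known to have the disease.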
determine if decades later it also was effective against cancer. A study was done in Taiwan, where nationwide HBV vaccination was begun in 1984. Rates of hepatocellular cancer were compared between people who were immunized at birth between 1984 and 2004 and those born between 1979 and 1984, when no vaccination program existed. (The comparison was made possible by thorough national health databases on the island.) Hepatocellular cancer incidence decreased almost 70% among young people in the 20 years after the introduction of HBV vaccination (8).
As pointed out in Chapter 5, observational studies are vulnerable to bias. The conclusion that HBV vaccine prevents hepatocellular carcinoma is reasonable from a biologic perspective and from the dramatic result. It will be on even firmer ground if studies of other populations who undergo vaccination confirm the results from the Taiwan study. Building the case for causation in the absence of randomized trials is covered in Chapter 12.
Safety
With immunizations, the occurrence of adverse effects may be so rare that randomized trials would be unlikely to uncover them. One way to study this question is to track illnesses in large datasets of millions of patients and to compare the frequency of an adverse effect linked temporally to the vaccination among groups at different time periods.
Example
Guillain-Barré syndrome (GBS) is a rare, serious immune-mediated neurological disorder characterized by ascending paralysis that can temporarily involve respiratory muscles so that intubation and ventilator support are required. A vaccine developed against swine flu in the 1970s was associated with an unexpected sharp rise in the number of cases of GBS and contributed to suspension of the vaccination program. In 2009, a vaccine was developed to protect against a novel influenza A (H1N1) virus of swine origin, and a method was needed to track GBS incidence. One way this was done was to utilize a surveillance system of the electronic health databases of more than 9 million persons. Comparing the incidence of GBS occurring up to 6 weeks after vaccination to that of later (background) GBS occurrence in the same group of vaccinated individuals, the attributable risk of the 2009 vaccine was estimated to be an additional five cases of GBS per million vaccinations. Although GBS incidence increased after vaccination, the effect in 2009 was about half that seen in the 1970s (9). The very low estimated attributable risk in 2009 is reassuring. If rare events, occurring in only a few people per million, are to be detected in near real time, population-based surveillance systems are required. Even so, associations found in surveillance systems are weak evidence for a causal relationship because they are observational in nature and electronic databases often do not have information on all important possible confounding variables.
Counseling
U.S. laws do not require rigorous evidence of effectiveness of behavioral counseling methods. Nevertheless, clinicians should require scientific evidence before incorporating routine counseling into preventive care; counseling that does not work wastes time, costs money, and may harm patients. Research has demonstrated that certain counseling methods can help patients change some health behaviors. Smoking cessation efforts have led the way with many randomized trials evaluating different approaches.
Example
Smoking kills approximately 450,000 Americans each year. But, what is the evidence that advising patients to quit smoking gets them to stop? Are some approaches better than others? These questions were addressed by a panel that reviewed all studies done on smoking cessation, focusing on randomized trials (10). They found 43 trials that assessed amounts of counseling contact and found a dose-response: the more contact time, the better the abstinence rate (Fig. 10.2). In addition, the panel found that randomized trials demonstrated that pharmacotherapy with bupropion (a centrally acting drug that decreases craving), varenicline (a nicotine
[Figure 10.2: bar chart. Vertical axis ticks at 0, 10, and 20. Horizontal axis: counseling sessions (No.) in groups 0–1, 2–3, 4–8, and >8, each shown without (–) and with (+) medication.]
Figure 10.2 ■ Dose response of smoking cessation rates according to the number of counseling sessions clinicians have with patients and use of medication. (Data from Fiore MC, Jaén CR, Baker TB, et al. Treating tobacco use and dependence: 2008 Update. Clinical Practice Guideline. Rockville, MD: U.S. Department of Health and Human Services. Public Health Service. May 2008.)
[Figure 10.4: three timelines, each marking the point of diagnosis (Dx): unscreened; screened, early treatment not effective; screened, early treatment is effective (improved survival).]
Figure 10.4 ■ How lead time affects survival time after screening; O = onset of disease. Pink-shaded areas indicate length of survival after diagnosis (Dx).
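The situation pictured in Figure 10.4 can be sketched with simple arithmetic. The example below is a hypothetical illustration, not data from the text: the class and function names, the ages, and the 3-year lead time are all assumptions, and early treatment is assumed to have no effect on age at death.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiseaseCourse:
    onset_age: float        # age at biologic onset of disease
    symptom_dx_age: float   # age at diagnosis if found through symptoms
    death_age: float        # age at death, assumed unchanged by early treatment


def survival_after_diagnosis(course: DiseaseCourse, lead_time: float = 0.0) -> float:
    """Years lived after diagnosis when screening advances diagnosis by lead_time."""
    diagnosis_age = course.symptom_dx_age - lead_time
    if diagnosis_age < course.onset_age:
        raise ValueError("diagnosis cannot precede onset of disease")
    return course.death_age - diagnosis_age


def five_year_survival(courses, lead_time: float = 0.0) -> float:
    """Fraction of patients alive 5 or more years after diagnosis."""
    alive = sum(1 for c in courses if survival_after_diagnosis(c, lead_time) >= 5.0)
    return alive / len(courses)


if __name__ == "__main__":
    # Hypothetical cohort; ages at death are the same with or without screening.
    cohort = [
        DiseaseCourse(60.0, 65.0, 68.0),
        DiseaseCourse(55.0, 61.0, 65.0),
        DiseaseCourse(62.0, 66.0, 72.0),
    ]
    print("5-year survival, unscreened:", five_year_survival(cohort))
    print("5-year survival, screened:  ", five_year_survival(cohort, lead_time=3.0))
```

Survival measured from diagnosis improves in the screened group even though no one dies later, which is the pattern lead-time bias produces.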
in the group drops after the prevalence screen. This means that the positive predictive value for test results will decrease after the first round of screening.

Special Biases

The following biases are most likely to be a problem in observational studies of screening.

Lead-Time Bias

Lead time is the period of time between the detection of a medical condition by screening and when it ordinarily would have been diagnosed because a patient experienced symptoms and sought medical care (Fig. 10.4). The amount of lead time for a given disease depends on the biologic rate of progression of the disease and how early the screening test can detect the disease. When lead time is very short, as is true with lung cancer, it is difficult to demonstrate that treatment of medical conditions picked up on screening is more effective than treatment after symptoms appear. On the other hand, when lead time is long, as is true for cervical cancer (on average, it takes 20 to 30 years for it to progress from carcinoma in situ into a clinically invasive disease), treatment of the medical condition found on screening can be very effective.

How can lead time cause biased results in a study of the efficacy of early treatment? As Figure 10.4 shows, because of screening, a disease is found earlier than it would have been after the patient developed symptoms. As a result, people who are diagnosed by screening for a deadly disease will, on average, survive longer from the time of diagnosis than people who are diagnosed after they develop symptoms, even if early treatment is no more effective than treatment at the time of clinical presentation. In such a situation, screening would appear to help people live longer, spuriously improving survival rates when, in reality, they would have been given not more “survival time” but more “disease time.”

An appropriate method of analysis to avoid lead-time bias is to compare age-specific mortality rates rather than survival rates in a screened group of people and a control group of similar people who do not get screened, as in a randomized trial (Table 10.2). Screening for breast, lung, and colorectal cancers are
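The lead-time argument can be made concrete with a small numerical sketch. The Python below is only an illustration under stated assumptions: the three patients, their ages, and the 2-year lead time are hypothetical values, and early treatment is assumed to have no effect on age at death. Under those assumptions, survival measured from diagnosis lengthens by exactly the lead time while age at death does not change.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age_at_clinical_dx: float  # age when symptoms would prompt diagnosis
    age_at_death: float        # age at death; early treatment assumed ineffective


def survival_after_dx(patient: Patient, lead_time: float = 0.0) -> float:
    """Years lived after diagnosis when screening advances diagnosis by lead_time."""
    age_at_dx = patient.age_at_clinical_dx - lead_time
    return patient.age_at_death - age_at_dx


def mean(values):
    return sum(values) / len(values)


# Hypothetical patients; the ages are made up for illustration only.
patients = [Patient(60, 64), Patient(65, 70), Patient(70, 73)]
LEAD_TIME = 2.0  # assumed years by which screening advances diagnosis

mean_unscreened = mean([survival_after_dx(p) for p in patients])
mean_screened = mean([survival_after_dx(p, LEAD_TIME) for p in patients])
mean_age_at_death = mean([p.age_at_death for p in patients])

print(f"Mean survival after diagnosis, unscreened: {mean_unscreened:.1f} years")
print(f"Mean survival after diagnosis, screened:   {mean_screened:.1f} years")
print(f"Mean age at death in both groups:          {mean_age_at_death:.1f} years")
```

The apparent gain in survival equals the assumed lead time, which is why the text recommends comparing mortality rates rather than survival from diagnosis.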
Table 10.2
Avoiding Bias in Screening
[Figure 10.6 graph residue: x-axis, time (onset, screened); y-axis, size/stage of tumor; curves for rapid growth and slow growth; levels marked for detection possible by screening and for diagnosis after symptoms.]
Figure 10.6 ■ Relationship between length-time bias and speed of tumor growth.
Rapidly growing tumors come to medical attention before screening is performed, whereas
more slowly growing tumors allow time for detection. D = diagnosis after symptoms; S =
detection after screening.
interested in their health and are generally healthier than non-compliant ones. For example, a randomized study that invited people for screening found that volunteers from the control group who were not invited but requested screening had better mortality rates than the invited group, which contained both compliant people who wanted screening and those who refused (15). The effect of patient compliance, as distinct from treatment effect, has primarily involved medication adherence in the placebo group, and has been termed placebo adherence.

Example

An analysis was done to determine if health outcomes differed in the placebo arm of a randomized trial among patients who demonstrated different degrees of placebo adherence (16). The original study was a tertiary prevention trial that randomized asymptomatic patients with left ventricular dysfunction (cardiac ejection fraction <35%) to active treatment (enalapril) or placebo. The analysis showed that after 3 years, patients randomized to placebo who took at least 75% of the placebo medication (“highly adherent” patients) had half the mortality rate of “lower adherent” patients randomized to placebo. Explanations for this strange result were sought. The difference in mortality was not changed after adjusting for several risk factors, including serious illness. The placebo medication itself was chemically inert. Placebo adherence appears to be a marker of healthier people even though the underlying mechanism remains to be determined.

Biases from length time and patient compliance can be avoided by relying on studies that have concurrent screened and control groups that are comparable. In each group, all people experiencing the outcomes of interest must be counted, regardless of the method of diagnosis or degree of participation (Table 10.2). Randomized trials are the strongest design because patients who are randomly allocated will have comparable numbers of slow- and fast-growing tumors and, on average, comparable levels of compliance. These groups then can be followed over time with mortality rates, rather than survival rates, to avoid lead-time bias. If a randomized trial is not possible, results of population-based observational studies can be valid. In such cases, the screened and control groups must be made up of similar populations, the control population should not have access to screening, and both populations must have careful follow-up to document all cases of the outcome being studied.

Because randomized controlled trials and prospective population-based studies are difficult to conduct, take a long time, and are expensive, investigators sometimes try to use other kinds of studies, such as historical cohort studies (Chapter 5) or case control studies (Chapter 6), to investigate preventive maneuvers.

Example

To test whether periodic screening with sigmoidoscopy reduces mortality from colorectal cancer within the reach of the sigmoidoscope, researchers investigated the frequency of screening sigmoidoscopy over the previous 10 years among patients dying of colorectal cancer and among well patients, matched for age and sex (17). To deal with lead-time and length-time biases, they investigated screening only in people who were known to have died (case group) or not to have died (control group) from colorectal cancer. To deal with compliance bias, they adjusted their results for the number of general periodic health examinations each person had. They also adjusted the results for the presence of medical conditions that could have led to both increased screening and increased likelihood of colorectal cancer. Patients dying of colorectal cancer in the rectum or distal sigmoid (where tumors could have been detected by sigmoidoscopy) were less likely to have undergone a screening sigmoidoscopy in the previous 10 years (8.8%) than those in the control group (24.2%). The investigators found that sigmoidoscopy followed by early therapy prevented almost 60% of deaths resulting from colorectal cancer arising in the distal colon or rectum, equivalent to about a 30% reduction in mortality from cancer of the colon and rectum as a whole. Also, by showing that there was no protection for colorectal cancers above the level reached by the sigmoidoscope, the authors suggested that “it is difficult to conceive of how such anatomical specificity of effect could be explained by confounding.” The findings of this study have been corroborated by two subsequent randomized trials of sigmoidoscopy.
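As a rough illustration of how a case-control comparison is summarized, the sketch below computes a crude odds ratio from the two screening percentages quoted in the sigmoidoscopy example (8.8% of cases, 24.2% of controls). It is an unadjusted calculation that ignores the matching and the adjustments the investigators made, so it should not be expected to reproduce their published estimate; the function names are illustrative, not taken from the study.

```python
def odds(proportion: float) -> float:
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return proportion / (1.0 - proportion)


def crude_odds_ratio(exposed_among_cases: float, exposed_among_controls: float) -> float:
    """Unadjusted odds ratio for exposure (here, prior screening) in cases vs. controls."""
    return odds(exposed_among_cases) / odds(exposed_among_controls)


# Percentages quoted in the example above.
screened_among_cases = 0.088     # died of distal colorectal cancer
screened_among_controls = 0.242  # matched controls

crude_or = crude_odds_ratio(screened_among_cases, screened_among_controls)
print(f"Crude odds ratio for screening: {crude_or:.2f}")
```

An odds ratio below 1 indicates that prior screening was less common among those who died of the cancer. The crude value is further from 1 than the adjusted effect reported in the example (almost 60% of deaths prevented), which is consistent with the authors' concern about compliance bias.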
Chapter 10: Prevention 163
PERFORMANCE OF SCREENING TESTS

Tests used for screening should meet the criteria for diagnostic tests laid out in Chapter 8. The following criteria for a good screening test apply to all types of screening tests, whether they are history, physical examination or laboratory tests.

High Sensitivity and Specificity

cancer (18). Thirteen of 18 men who were diagnosed with prostate cancer within a year after the blood sample had elevated PSA levels (>4.0 ng/mL) and would have been diagnosed after an abnormal PSA result; the other five had normal PSA results and developed interval cancers during the first year after a normal PSA test. Thus, sensitivity of PSA was calculated as 13 divided by (13 + 5), or 72%.
Table 10.3
Calculating Sensitivity of a Cancer Screening Test According to the Detection Method
and the Incidence Method
Theoretical Example
A new screening test is introduced for pancreatic cancer. In a screening group, cancer is detected in 200 people; over the
ensuing year, another 50 who had negative screening tests are diagnosed with pancreatic cancer. In a concurrent control
group with the same characteristics and the same size, members did not undergo screening; 100 people were diagnosed
with pancreatic cancer during the year.
Sensitivity of the Test Using the Detection Method
$$\text{Sensitivity} = \frac{\text{Number of screen-detected cancers}}{\text{Number of screen-detected cancers} + \text{number of interval cancers}} = \frac{200}{200 + 50} = 0.80 \text{ or } 80\%$$
Sensitivity of the Test Using the Incidence Method
$$\text{Sensitivity} = 1 - \frac{\text{interval rate in the screening group}}{\text{incidence rate in the control group}} = 1 - \frac{50}{100} = 0.50 \text{ or } 50\%$$
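The two calculations in Table 10.3 can be sketched in a few lines of Python. The function names are illustrative only. Because the table states that the screened and control groups have the same size, the sketch passes case counts where the incidence method calls for rates; with unequal group sizes, actual rates would be needed.

```python
def sensitivity_detection_method(screen_detected: int, interval_cancers: int) -> float:
    """Screen-detected cancers divided by screen-detected plus interval cancers."""
    return screen_detected / (screen_detected + interval_cancers)


def sensitivity_incidence_method(interval_rate: float, control_incidence_rate: float) -> float:
    """One minus the ratio of the interval rate (screened group) to the control incidence rate."""
    return 1 - interval_rate / control_incidence_rate


# Values from the theoretical example in Table 10.3.
detection = sensitivity_detection_method(screen_detected=200, interval_cancers=50)
incidence = sensitivity_incidence_method(interval_rate=50, control_incidence_rate=100)

# PSA example from the text: 13 screen-detected and 5 interval cancers.
psa = sensitivity_detection_method(screen_detected=13, interval_cancers=5)

print(f"Detection method: {detection:.0%}")
print(f"Incidence method: {incidence:.0%}")
print(f"PSA (detection method): {psa:.0%}")
```

The same theoretical data give 80% by the detection method and 50% by the incidence method, so the choice of method matters when sensitivity figures for screening tests are compared.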
[Figure 10.7 graph residue: x-axis, age (years): 40–44, 45–49, 50–54, 55–59, 60–69, 70–79, 80–89; y-axis scale 10 to 50.]
Figure 10.7 ■ Yield of abnormal screening mammograms according to patient age. Number of women without breast cancer for each woman diagnosed with breast cancer among women having an abnormal mammogram when screened for breast cancer. (Data from Carney PA, Miglioretti DL, Yankaskas BC, et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 2003;138:168–175.)

require an appointment and bowel preparation, are best suited for diagnostic testing in patients with symptoms and clinical indications. Nevertheless, screening colonoscopy has been found to be highly effective in decreasing colorectal mortality, and a negative test does not have to be repeated for several years. Other tests, such as visual field testing for the detection of glaucoma and audiograms for the detection of hearing loss, fall between these two extremes.

The financial “cost” of the test depends not only on the cost of (or charge for) the procedure itself but also on the cost of subsequent evaluations performed on patients with positive test results. Thus sensitivity, specificity, and predictive value affect cost. Cost is also affected by whether the test requires a special visit to the physician. Screening tests performed while the patient is seeing his or her physician for other reasons (as is frequently the case with blood pressure measurements) are much cheaper for patients than tests requiring special visits, extra time off work, and additional transportation. Cost also is determined by how often a screening test must be repeated.

Example

Several different tests can be used to screen for colorectal cancer. They include annual fecal occult blood tests, sigmoidoscopy every 5 years, and colonoscopy every 10 years. The tests vary greatly in their upfront costs, ranging from $20 for fecal occult blood tests to more than $1,000 for screening colonoscopy. However, the cost per year of life saved by screening is not very different for these tests; all were in an acceptable range by U.S. standards in 2011 (21). This is true for several reasons. First, the simpler, cheaper tests have to be done more often. Fecal occult blood tests are recommended yearly, whereas colonoscopy is recommended every 10 years. The cheaper tests also produce more false-positive results that lead to more testing (and, therefore, more costs). Finally, they produce more false-negative results and miss patients who actually have cancer. This leads to increased costs for care of patients with more advanced cancer.

Safety

It is reasonable and ethical to accept a certain risk for diagnostic tests applied to sick patients seeking help for specific complaints. The physician cannot avoid action when the patient is severely ill, and does his or her best. It is quite another matter to subject presumably well people to risks. In such circumstances, the procedure should be especially safe. This is partly because the chances of finding disease in healthy people are so low. Thus, although colonoscopy is hardly thought of as a dangerous procedure when used on patients with gastrointestinal complaints, it can cause bowel perforation. In fact, when colonoscopy, with a rate of two perforations per 1,000 examinations, is used to screen for colorectal cancer in people in their 50s, perforations occur more often than cancers are found.

Concerns have been raised about possible long-term risks with the increasing use of CT scans to screen for coronary artery disease or, in the case of whole-body scans, a variety of abnormalities. The radiation dose of CT scans varies by type, with a CT scan for coronary calcium on average being the equivalent of about 30, and a whole-body scan about 120, chest x-rays. One
[Figure 10.8 graph residue: x-axis, time; y-axis, size; labels: abnormal cell, nonprogressive, death from other causes.]
Figure 10.8 ■ Mechanism of overdiagnosis in cancer screening. Note that nonprogressive, as well as some very slow-growing, cancers will never cause clinical harm. When these cancers are found on screening, overdiagnosis has occurred. Overdiagnosis is an extreme form of length-time bias. (Redrawn with permission from Welch HG. Should I Be Tested for Cancer? Maybe Not and Here’s Why. Berkeley, CA: University of California Press; 2004.)
can help determine the amount of overdiagnosis, it is impossible to identify it in an individual patient.

Incidentalomas

Over the past couple of decades, using CT as a screening test has become more common. CT has been evaluated rigorously as “virtual colonoscopy” for colorectal cancer screening and also for lung cancer screening. It has been advocated as a screening test for coronary heart disease (with calcium scores) and for screening in general with full-body CT scans. Unlike most screening tests, CT often visualizes much more than the targeted area of interest. For example, CT colonography visualizes the abdomen and lower thorax. In the process, abnormalities are sometimes detected outside the colon. Masses or lesions detected incidentally by an imaging examination are called incidentalomas.

Example

A systematic review of 17 studies found that incidentalomas were common in CT colonography; 40% of 3,488 patients had at least one incidental finding, and 14% had further evaluation (31). Workups uncovered early stage noncolonic cancers in about 1% of patients and abdominal aortic aneurysms >5.5 cm in size in about 0.1%. Therefore, incidentalomas were very common but included some potentially important conditions. There was no information about how many patients who did not receive further evaluation went on to suffer serious consequences. Also, most of the detected cancers were ones for which screening is not recommended. With inadequate evidence about benefits and likelihood of increased costs in pursuit of extracolonic findings, the Center for Medicare and Medicaid Services declined to cover screening CT colonography in the Medicare population.

CHANGES IN SCREENING TESTS AND TREATMENTS OVER TIME

Careful evaluation of screening has been particularly difficult when tests approved for diagnostic purposes are then used as screening tests before evaluation in a screening study, a problem analogous to using therapeutic interventions for prevention without evaluating them in prevention. For example, CT scans and magnetic resonance imaging were developed for diagnostic purposes in patients with serious complaints or known disease and PSA was developed to determine whether treatment for prostate cancer was successful. All of these tests are now commonly used as screening tests, but most became common in practice without careful evaluation. Only low-dose CT scans for lung cancer screening underwent careful evaluation prior to widespread use. PSA screening became so common in the United States that when it was subjected to a careful randomized trial, more than half the men assigned to the control arm had a PSA test during the course of a trial. When tests are so commonly used, it is difficult to determine rigorously whether they are effective.

Over time, improvements in screening tests, treatments, and vaccinations may change the need for screening. As indicated earlier, effective secondary prevention is a two-step process; a good screening test followed by a good treatment for those found to have disease. Changes in either one may affect how well screening works in preventing disease. At one extreme, a highly accurate screening test will not help prevent adverse outcomes of a disease if there is no effective therapy. Screening tests for HIV preceded the development of effective HIV therapy; therefore, early in the history of HIV, screening could not prevent disease progression in people with HIV. With the development of increasingly effective treatments, screening for HIV has increased. At the other extreme, a highly effective treatment may make screening unnecessary. With modern therapy the 10-year survival rate of testicular cancer is about 85%, so high that it would be difficult to show improvement with screening for this rare cancer. As HPV vaccination for prevention of cervical cancer becomes more widespread and if it is able to cover all carcinogenic types of HPV, the need for cervical cancer screening should decrease over time. Some recent studies of mammography screening have not found the anticipated mortality benefits seen in earlier studies, partly because breast cancer mortality among women not screened was lower than in the past, probably due to improved treatments. Thus, with the introduction of new therapies and screening tests, effectiveness of screening will change and on-going re-evaluation is necessary.

WEIGHING BENEFITS AGAINST HARMS OF PREVENTION

How can the many aspects of prevention covered in this chapter be combined to make a decision whether to include a preventive intervention in clinical practice?
Conceptually, the decision should be based on weighing the magnitude of benefits against the magnitude of harms that will occur as a result of the action. This approach has become common when making treatment decisions; reports of randomized trials routinely include harms as well as benefits.

A straightforward approach is to present the benefits and harms for a particular preventive activity in some orderly and understandable way. Whenever possible, these should be presented using absolute, not relative risks. Figure 10.9 summarizes the estimated key benefits and harms of annual mammography for women in their 40s, 50s, and 60s (7,32,33). Such an approach can help clinicians and patients understand what is involved when making the decision to screen. It can also help clarify why different individuals and expert groups come to different decisions about a preventive activity, even when looking at the same set of information. Different people put different values on benefits and harms (34).

Another approach to weighing benefits and harms is a modeling process that expresses both benefits and harms in a single metric and then subtracts harms from benefits. (The most common metric used is the quality adjusted life year [QALY]). The advantage of this approach is that different types of prevention (e.g., vaccinations, colorectal cancer screening, and tertiary treatment of diabetes) can all be compared to each other, which is important for policymakers with limited resources. The disadvantage is that for most clinicians and policymakers, it is difficult to understand the process by which benefits and harms are handled.

Regardless of the method used in weighing the benefits and harms of preventive activities, the quality of the evidence for each benefit and harm must be evaluated to prevent the problem of “garbage in, garbage out.” Several groups making recommendations for clinical prevention have developed explicit methods to evaluate the evidence and take into account the strength of evidence when making their recommendations (see Chapter 14).

If the benefits of a preventive activity outweigh the harms, the final step is to determine the economic effect of using it. Some commentators like to claim that “prevention saves money,” but it does so only rarely. (One possible exception is screening for colorectal cancer. Chemotherapy for the disease has become so expensive that some analyses now find screening for this cancer saves money.) Even so, most preventive services recommended by groups who have carefully evaluated the data are as cost-effective as other clinical activities.

Cost-effectiveness analysis is a method for assessing the costs and health benefits of an intervention. All costs related to disease occurrence and treatment should be counted, both with and without the preventive activity, as well as all costs related to the preventive activity itself. The health benefits of the activity are then calculated, and the incremental cost for each unit of benefit is determined.

Example

Cervical cancer is caused by persistent infection of epithelial cells with certain HPV genotypes. Vaccines have been developed against carcinogenic HPV genotypes that cause approximately 70% of all cervical cancers. A randomized trial found the vaccine highly effective in preventing precursors of and early cervical cancer in women without evidence of prior HPV infection. A cost-effectiveness analysis was performed to assess the benefits and costs of vaccinating girls at age 12 who would then go on to be screened for cervical cancer beginning in early adulthood (35). The investigators estimated medical costs from a societal perspective that includes costs of medical care (e.g., vaccine and screening costs as well as diagnostic and treatment costs for cervical cancer) and indirect costs (e.g., work time taken off by patients for office visits). They expressed the benefit in terms of QALYs gained by the combination of vaccine plus screening versus no screening or vaccine. In the analysis, they varied the age and intervals at which screening would occur as well as the effectiveness of vaccine over time. Assuming a vaccine efficacy of 90%, the analysis estimated that starting conventional Pap smear screening at age 25 and repeating screening at a 3-year interval would reduce the lifetime cervical cancer risk by 94%, at a cost of $58,500 per QALY. (Upper limits of acceptable cost-effectiveness ratios in the United States are usually set at about $50,000 per QALY gained.) Screening begun at age 21 and repeated every 5 years would reduce the lifetime cancer risk by 90% at a cost of $57,400 per QALY. In summary, the results suggested that the combination of HPV vaccine and screening for cervical cancer can decrease the risk of cancer, is cost-effective, and permits delayed onset of screening and/or decreased frequency.
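The "incremental cost for each unit of benefit" described above can be sketched as a simple ratio. The Python below is a minimal illustration, not the method used in the cited analysis: the function name and the per-person costs and QALYs are hypothetical, and the $50,000 threshold is the approximate U.S. upper limit mentioned in the text.

```python
def incremental_cost_per_qaly(cost_with: float, cost_without: float,
                              qalys_with: float, qalys_without: float) -> float:
    """Extra cost of the preventive activity divided by the extra QALYs it yields."""
    delta_cost = cost_with - cost_without
    delta_qalys = qalys_with - qalys_without
    if delta_qalys <= 0:
        # No health gain: a cost-per-QALY ratio would be meaningless or misleading.
        raise ValueError("the preventive activity must yield a net QALY gain")
    return delta_cost / delta_qalys


# Hypothetical per-person lifetime values, chosen only to keep the arithmetic simple.
ratio = incremental_cost_per_qaly(
    cost_with=1500.0,     # all costs with the preventive activity
    cost_without=1000.0,  # all costs without it
    qalys_with=20.50,
    qalys_without=20.25,
)

THRESHOLD = 50_000  # approximate upper limit per QALY gained, as cited in the text
print(f"Incremental cost per QALY gained: ${ratio:,.0f}")
print("Within the cited threshold" if ratio <= THRESHOLD else "Above the cited threshold")
```

Both cost terms are meant to include disease-related costs as well as the cost of the preventive activity itself, as the text specifies; the sketch leaves that accounting to the caller.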
[Figure 10.9 graph residue. Panel A: number of women (y-axis) by years of age at beginning of the 10-year period (40, 50, 60; x-axis), shown for ≥1 false-positive mammogram, ≥1 needle or open biopsy, and development of breast cancer. Panel B: number of women (y-axis) by the same age groups, shown for development of breast cancer, breast cancer cured by treatment regardless of screening, diagnosis of ductal carcinoma in situ because of mammography, and life saved by screening mammography.]
The effort to gather all the information needed to make a decision whether to conduct a preventive activity in clinical practice is not something a single clinician can accomplish, but when reviewing recommendations about prevention, individual clinicians can determine if the benefits and harms of the activity are presented in an understandable way and if the recommendation has taken into account the strength of the evidence. They can also look for estimates of cost-effectiveness. With these facts, they should be able to share with their patients the information they need. Patients can then make an informed decision about preventive activities that takes into account the scientific information and their individual values.
Review Questions
For questions 10.1–10.6, read the related scenarios and select the best answer.

A study was conducted to determine whether a fecal occult blood screening test reduced mortality from colorectal cancer (11). People ages 50 to 60 years were randomized to the screening test or to a control group and followed for 13 years. Over this time, there were 323 cancer cases and 82 colorectal cancer deaths in the 15,570 people randomized to annual screening; there were 356 cancers and 121 colorectal cancer deaths in the 15,394 people randomized to the control group. Investigations of positive tests found that about 30% of the screened group had colon polyps. The sensitivity and specificity of the test for colon cancer were both about 90%.

B. The positive predictive value of the test was low.
C. The negative predictive value of the test was low.

In a randomized controlled trial of screening chest x-rays and sputum cytology for lung cancer, approximately 9,000 men were randomized to screening for 6 years or a control group (12). After 20 years, the lung cancer mortality was the same in both groups (4.4/1,000 person-years in the screened group and 3.9/1,000 person-years in the control group). However, the median survival for patients diagnosed with lung cancer was 1.3 years in the screened group and 0.9 years in the control group. Also, screening found more lung cancer: 206 cancers were diagnosed in the screened group and 160 in the control group.

10.6. What is the most likely reason for the fact that 206 lung cancers were found in the screened group and only 160 in the control group?
A. There were more smokers in the screened group.
B. Screening found cancers earlier and the number of cancers in the control group will catch up over time.
C. Screening picked up some cancers that would not have come to medical attention without screening.

For questions 10.7–10.11, choose the best answer.

10.7. When assessing a new vaccine, which of the following is least important:
A. Efficacy in preventing the disease
B. Safety of the vaccine
C. Danger of the disease
D. Cost of giving the vaccine

10.8. When the same test is used in diagnostic and screening situations, which of the following statements is most likely correct?
A. The sensitivity and specificity will likely be the same in both situations.
B. The positive predictive value will be higher in a screening situation.
C. Disease prevalence will be higher in the diagnostic situation.
D. Overdiagnosis is equally likely in both situations.

10.9. A study found that volunteers for a new screening test had better health outcomes than people who refused testing. Which of the following statements most likely explains the finding?
A. People who refuse screening are usually healthier than those who accept.
B. Volunteers for screening are more likely to need screening than those who refuse.
C. Volunteers tend to be more interested in their health than those who do not participate in preventive activities.

10.10. All of the following statements are correct except:
A. The gold standard for a test used for diagnosis may be different than that for the same test when used for screening.
B. The incidence method cannot be used to calculate sensitivity for cancer screening tests.
C. When a screening program is begun, more people with disease are found on the first round of screening than on later rounds.

10.11. For a cost-effectiveness analysis of a preventive activity, which kinds of costs should be included?
A. Medical costs, such as those associated with delivering the preventive intervention
B. All medical costs, including diagnostic follow up of positive tests and treatment for persons diagnosed with disease, with and without the preventive activity
C. Indirect costs, such as loss of income due to time off from work, among patients receiving the prevention and those who develop the disease
D. Indirect costs for both patients and care givers
E. All of the above

Answers are in Appendix A.
REFERENCES
1. Schappert SM, Rechtsteiner EA. Ambulatory medical care utilization estimates for 2007. National Center for Health Statistics. Vital Health Stat 2011;13(169). Available at http://www.cdc.gov/nchs/data/series/sr_13/sr13_169.pdf. Accessed January 11, 2012.
2. Prevention. 2011. In Merriam-webster.com. Available at http://www.merriam-webster.com/dictionary/prevention. Accessed January 13, 2012.
3. Siegel R, Ward E, Brawley O, et al. Cancer statistics, 2011. The impact of eliminating socioeconomic and racial disparities on premature cancer deaths. CA Cancer J Clin 2011;61:212–236.
4. National Center for Health Statistics. Health. United States. 2010; With special feature on death and dying. Hyattsville, MD. 2011.
5. Roe MT, Ohman EM. A new era in secondary prevention after acute coronary syndrome. N Engl J Med 2012;366:85–87.
6. Harris R, Sawaya GF, Moyer VA, et al. Reconsidering the criteria for evaluating proposed screening programs: reflections from 4 current and former members of the U.S. Preventive Services Task Force. Epidemiol Rev 2011;33:20–35.
7. Howlader N, Noone AM, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975-2008, National Cancer Institute. Bethesda, MD. Available at http://seer.cancer.gov/csr/1975_2008/, based on November 2010 SEER data submission, posted to the SEER Web site, 2011. Accessed January 13, 2012.
8. Chang MH, You SL, Chen CJ, et al. Decreased incidence of hepatocellular carcinoma in hepatitis B vaccinees: a 20 year follow-up study. J Natl Cancer Inst 2009;101:1348–1355.
9. Greene SK, Rett M, Weintraub ES, et al. Risk of confirmed Guillain-Barré Syndrome following receipt of monovalent inactivated influenza A (H1N1) and seasonal influenza vaccines in the Vaccine Safety Datalink Project, 2009-2010. Am J Epidemiol 2012;175:1100–1109.
10. Fiore MC, Jaén CR, Baker TB, et al. Treating tobacco use and dependence: 2008 Update. Clinical Practice Guideline. Rockville, MD: U.S. Department of Health and Human Services. Public Health Service. May 2008.
11. Mandel JS, Bond JH, Church TR, et al. (for the Minnesota Colon Cancer Control Study). Reducing mortality from colorectal cancer by screening for fecal occult blood. N Engl J Med 1993;328:1365–1371.
12. Marcus PM, Bergstralh EJ, Fagerstrom RM, et al. Lung cancer mortality in the Mayo Lung Project: impact of extended follow-up. J Natl Cancer Inst 2000;92:1308–1316.
13. The National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
14. Gæde P, Lund-Andersen H, Hans-Henrik P, et al. Effect of a multifactorial intervention on mortality in type 2 diabetes. N Engl J Med 2008;358:580–591.
15. Friedman GD, Collen MF, Fireman BH. Multiphasic health checkup evaluation: a 16-year follow-up. J Chron Dis 1986;39:453–463.
16. Avins AL, Pressman A, Ackerson L, et al. Placebo adherence and its association with morbidity and mortality in the studies of left ventricular dysfunction. J Gen Intern Med 2010;25:1275–1281.
17. Selby JV, Friedman GD, Quesenberry CP, et al. A case-control study of screening sigmoidoscopy and mortality from colorectal cancer. N Engl J Med 1992;326:653–657.
18. Gann PH, Hennekens CH, Stampfer MJ. A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 1995;273:289–294.
19. Delongchamps NB, Singh A, Haas GP. The role of prevalence in the diagnosis of prostate cancer. Cancer Control 2006;13:158–168.
20. Carney PA, Miglioretti DL, Yankaskas BC, et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 2003;138:168–175.
21. Lansdorp-Vogelaar I, Knudsen AB, Brenner H. Cost-effectiveness of colorectal cancer screening. Epidemiol Rev 2011;33:88–100.
22. Berrington de González A, Mahesh M, Kim K-P, et al. Radiation dose associated with common computed tomography examinations and the associated lifetime attributable risk of cancer. Arch Intern Med 2009;169:2071–2077.
23. D’Orsi CD, Shin-Ping Tu, Nakano C. Current realities of delivering mammography services in the community: do challenges with staffing and scheduling exist? Radiology 2005;235:391–395.
24. Buys SS, Partridge E, Black A, et al. Effect of screening on ovarian cancer mortality. The Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening randomized controlled trial. JAMA 2011;305:2295–2303.
25. Meador CK. The last well person. N Engl J Med 1994;330:440–441.
26. Croswell JM, Baker SG, Marcus PM, et al. Cumulative incidence of false-positive test results in lung cancer screening: a randomized trial. Ann Intern Med 2010;152:505–512.
27. Croswell JM, Kramer BS, Kreimer AR. Cumulative incidence of false-positive results in repeated multimodal cancer screening. Ann Fam Med 2009;7:212–222.
28. Fowler FJ, Barry MJ, Walker-Corkery BS. The impact of a suspicious prostate biopsy on patients’ psychological, socio-behavioral, and medical care outcomes. J Gen Intern Med 2006;21:715–721.
29. Schilling FH, Spix C, Berthold F, et al. Neuroblastoma screening at one year of age. N Engl J Med 2002;346:1047–1053.
30. Woods WG, Gao R, Shuster JJ, et al. Screening of infants and mortality due to neuroblastoma. N Engl J Med 2002;346:1041–1046.
31. Xiong T, Richardson M, Woodroffe R, et al. Incidental lesions found on CT colonography: their nature and frequency. Br J Radiol 2005;78:22–29.
32. Hubbard RA, Kerlikowske K, Flowers CI, et al. Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography. Ann Intern Med 2011;155:481–492.
33. Mandelblatt JS, Cronin KA, Bailey S, et al. Effects of mammography screening under different screening schedules: model estimates of potential benefits and harms. Ann Intern Med 2009;151:738–747.
34. Gillman MW, Daniels SR. Is universal pediatric lipid screening justified? JAMA 2012;307:259–260.
35. Goldie SJ, Kohli M, Grima D. Projected clinical benefits and cost-effectiveness of a human papillomavirus 16/18 vaccine. J Natl Cancer Inst 2004;96:604–615.
Chapter 11
Chance
It is a common practice to judge a result significant, if it is of such a magnitude that it
would have been produced by chance not more frequently than once in twenty trials. This
is an arbitrary, but convenient, level of significance for the practical investigator, but it
does not mean that he allows himself to be deceived once in every twenty experiments.
—Ronald Fisher
1929 (1)
(called the “null hypothesis”) that there is no difference. This traditional way of assessing the role of chance, associated with the familiar “P value,” has been popular since statistical testing was introduced at the beginning of the 20th century. The hypothesis testing approach leads to dichotomous conclusions: Either an effect is present or there is insufficient evidence to conclude an effect is present.

The other approach, called estimation, uses statistical methods to estimate the range of values that is likely to include the true value of a rate, measure of effect, or test performance. This approach has gained popularity recently and is now favored by most medical journals, at least for reporting main effects, for reasons described below.

HYPOTHESIS TESTING

In the usual situation, the principal conclusions of a trial are expressed in dichotomous terms, such as a new treatment is either better or not better than usual care, corresponding to the results being either statistically significant (unlikely to be due purely to chance) or not. There are four ways in which the statistical conclusions might relate to reality (Fig. 11.1).

Two of the four possibilities lead to correct conclusions: (i) The new treatment really is better, and that is the conclusion of the study; and (ii) the treatments really have similar effects, and the study concludes that a difference is unlikely.

False-Positive and False-Negative Statistical Results

There are also two ways of being wrong. The new treatment and usual care may actually have similar effects, but it is concluded that the new treatment is more effective. Error of this kind, resulting in a “false-positive” conclusion that the treatment is effective, is referred to as a type I error or α error, the probability of saying that there is a difference in treatment effects when there is not. On the other hand, the new treatment might be more effective, but the study concludes that it is not. This “false-negative” conclusion is called a type II error or β error, the probability of saying that there is no difference in treatment effects when there is. “No difference” is a simplified way of saying that the true difference is unlikely to be larger than a certain size, which is considered too small to be of practical consequence. It is not possible to establish that there is no difference at all between two treatments.

                              TRUE DIFFERENCE
CONCLUSION OF
STATISTICAL TEST              Present               Absent
Significant                   Correct               Type I (α) error
Not significant               Type II (β) error     Correct

Figure 11.1 ■ The relationship between the results of a statistical test and the true difference between two treatment groups. (Absent is a simplification. It really means that the true difference is not greater than a specified amount.)

Figure 11.1 is similar to 2 × 2 tables comparing the results of a diagnostic test to the true diagnosis (see Chapter 8). Here, the “test” is the conclusion of a clinical trial based on a statistical test of results from the trial’s sample of patients. The “gold standard” for validity is the true difference in the treatments being compared, if it could be established, for example, by making observations on all patients with the illness or a large number of samples of these patients. Type I error is analogous to a false-positive test result, and type II error is analogous to a false-negative test result. In the absence of bias, random variation is responsible for the uncertainty of a statistical conclusion.

Because random variation plays a part in all observations, it is an oversimplification to ask whether chance is responsible for the results. Rather, it is a question of how likely random variation is to account for the findings under the particular conditions of the study. The probability of error due to random variation is estimated by means of inferential statistics, a quantitative science that, given certain assumptions about the mathematical properties of the data, is the basis for calculations of the probability that the results could have occurred by chance alone.

Statistics is a specialized field with its own jargon (e.g., null hypothesis, variance, regression, power, and modeling) that is unfamiliar to many clinicians. However, leaving aside the genuine complexity of statistical methods, inferential statistics should be regarded by the non-expert as a useful means to an end. Statistical testing is a means by which the effects of random variation are estimated.

The next two sections discuss type I and type II errors and place hypothesis testing, as it is used to estimate the probabilities of these errors, in context.

Concluding That a Treatment Works

Most statistics encountered in the medical literature concern the likelihood of a type I error and are
expressed by the familiar P value. The P value is a quantitative estimate of the probability that differences in treatment effects in the particular study at hand could have happened by chance alone, assuming that there is in fact no difference between the groups. Another way of expressing this is that P is an answer to the question, “If there were no difference between treatment effects and the trial was repeated many times, what proportion of the trials would conclude that the difference between the two treatments was at least as large as that found in the study?”

In this presentation, P values are called Pα, to distinguish them from estimates of the other kind of error resulting from random variation, type II errors, which are referred to as Pβ. When a simple P is found in the scientific literature, it ordinarily refers to Pα.

The kind of error estimated by Pα applies whenever one concludes that one treatment is more effective than another. If it is concluded that the Pα exceeds some limit (see below) so there is no statistical difference between treatments, then the particular value of Pα is not as relevant; in that situation, Pβ (probability of type II error) applies.

Dichotomous and Exact P Values

It has become customary to attach special significance to P values below 0.05 because it is generally agreed that a chance of <1 in 20 is a small enough risk of being wrong. A chance of 1 in 20 is so small, in fact, that it is reasonable to conclude that such an occurrence is unlikely to have arisen by chance alone. It could have arisen by chance, and 1 in 20 times it will, but it is unlikely.

Differences associated with Pα < 0.05 are called statistically significant. However, setting a cutoff point at 0.05 is entirely arbitrary. Reasonable people might accept higher values or insist on lower ones, depending on the consequences of a false-positive conclusion in a given situation. For example, one might be willing to accept a higher chance of a false-positive statistical test if the disease is severe, there is currently no effective treatment, and the new treatment is safe. On the other hand, one might be reluctant to accept a false-positive test if usual care is effective and the new treatment is dangerous or much more expensive. This reasoning is similar to that applied to the importance of false-positive and false-negative diagnostic tests (Chapter 8).

To accommodate various opinions about what is and is not unlikely enough, some researchers report the exact probabilities of P (e.g., 0.03, 0.07, 0.11), rather than lumping them into just two categories (≤0.05 or >0.05). Users are then free to apply their own preferences for what is statistically significant. However, P values >1 in 5 are usually reported as simply P > 0.20, because nearly everyone can agree that a probability of a type I error >1 in 5 is unacceptably high. Similarly, below very low P values (e.g., P < 0.001) chance is a very unlikely explanation for an observed difference, and little further information is conveyed by describing this chance more precisely.

Another approach is to accept the primacy of P ≤ 0.05 and describe results that come close to that standard with terms such as “almost statistically significant,” “did not achieve statistical significance,” “marginally significant,” or “a trend.” These value-laden terms suggest that the finding should have been statistically significant but for some annoying reason was not. It is better to simply state the result and exact P value (or point estimate and confidence interval, see below) and let the reader decide for him or herself how much chance could have accounted for the result.

Statistical Significance and Clinical Importance

A statistically significant difference, no matter how small the P, does not mean that the difference is clinically important. A P value of <0.0001, if it emerges from a well-designed study, conveys a high degree of confidence that a difference really exists but says nothing about the magnitude of that difference or its clinical importance. In fact, trivial differences may be highly statistically significant if a large enough number of patients are studied.

Example

The drug donepezil, a cholinesterase inhibitor, was developed for the treatment of Alzheimer disease. In a randomized controlled trial to establish whether the drug produced worthwhile improvements, 565 patients with Alzheimer disease were randomly allocated to donepezil or placebo (3). The statistical significance of some trial end points was impressive: Both the mini-mental state examination and the Bristol Activities of Daily Living Scale were statistically different at P < 0.0001. However, the actual differences were small, 0.8 on a 30-point scale for the mini-mental state examination and 1 on a 60-point scale for the Bristol Activities of Daily Living Scale. Moreover, other outcomes, which more closely represented the burden of illness and care of these patients, were similar in the
donepezil and placebo groups. These included entering institutional care and progression of disability (both primary end points) as well as behavioral and psychological symptoms, caregiver psychopathology, formal care costs, unpaid caregiver time, and adverse events or death. The authors concluded that the benefits of donepezil were “below minimally relevant thresholds.”

On the other hand, very unimpressive P values can result from studies with strong treatment effects if there are few patients in the study.

Statistical Tests

Statistical tests are used to estimate the probability of a type I error. The test is applied to the data to obtain a numerical summary for those data called a test statistic. That number is then compared to a sampling distribution to come up with a probability of a type I error (Fig. 11.2). The distribution is under the null hypothesis, the proposition that there is no true difference in outcome between treatment groups. This device is for mathematical reasons, not because “no difference” is the working scientific hypothesis of the investigators conducting the study. One ends up rejecting the null hypothesis (concluding there is a difference) or failing to reject it (concluding that there is insufficient evidence in support of a difference). Note that not finding statistical significance is not the same as there being no difference. Statistical testing is not able to establish that there is no difference at all.

Some commonly used statistical tests are listed in Table 11.1. The validity of many tests depends on certain assumptions about the data; a typical assumption is that the data have a normal distribution. If the data do not satisfy these assumptions, the resulting P value may be misleading. Other statistical tests, called non-parametric tests, do not make assumptions about the underlying distribution of the data. A discussion of how these statistical tests are derived and calculated and of the assumptions on which they rest can be found in any biostatistics textbook.

Table 11.1
Some Statistical Tests Commonly Used in Clinical Research

Test                        When Used

To Test the Statistical Significance of a Difference
Chi square (χ2)             Between two or more proportions (when there are a large number of observations)
Fisher’s exact              Between two proportions (when there are a small number of observations)
Mann-Whitney U              Between two medians
Student t                   Between two means
F test                      Between two or more means

To Describe the Extent of Association
Regression coefficient      Between an independent (predictor) variable and a dependent (outcome) variable
Pearson’s r                 Between two variables

To Model the Effects of Multiple Variables
Logistic regression         With a dichotomous outcome
Cox proportional hazards    With a time-to-event outcome

The chi-square (χ2) test for nominal data (counts) is more easily understood than most and can be used to illustrate how statistical testing works. The extent to which the observed values depart from what would have been expected if there were no treatment effect is used to calculate a P value.

Example

Cardiac arrest outside the hospital has a poor outcome. Animal studies suggested that hypothermia might improve neurologic outcomes. To test this hypothesis in humans, 77 patients who remained unconscious after resuscitation from out-of-hospital cardiac arrest were

[Flow diagram: Data → Statistical test → Test statistic → Compare to standard distribution → Estimate of probability that observed value could be by chance alone]
Figure 11.2 ■ Statistical testing.
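The sequence in Figure 11.2 (data, test statistic, comparison with a standard distribution, P value) can be sketched in a few lines of Python for the simplest case, a 2 × 2 table of counts. This is only an illustration under stated assumptions: the function name `chi_square_2x2` and the counts are hypothetical and are not taken from any study in this chapter, and the sketch uses the Pearson chi-square statistic without a continuity correction, which is one common choice among several.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Rows are treatment groups; columns are outcome present / absent:
        group 1: a, b
        group 2: c, d
    Returns (test statistic, P value).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    # Test statistic: how far the observed counts depart from the counts
    # expected if there were no treatment effect (the null hypothesis).
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Compare to the chi-square distribution with 1 degree of freedom,
    # for which P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 patients with a good outcome in one group,
# 15 of 100 in the other.
statistic, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {statistic:.2f}, P = {p:.3f}")
```

When the observed counts match the expected counts exactly, the statistic is 0 and P is 1; the larger the departure from the expected counts, the smaller the P value.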
trials have misrepresented the truth because these particular studies had the bad luck to turn out in a relatively unlikely way?

Example

One of the examples in Chapter 9 was a randomized controlled trial of the effects on cardiovascular outcomes of adding niacin versus placebo in patients with lipid abnormalities who were already taking statin drugs (5). It was a “negative” trial: Primary outcomes occurred in 16.4% of patients taking niacin and 16.2% in patients taking placebo, and the authors concluded that “there was no incremental clinical benefit from the addition of niacin to statin therapy.” The statistical question associated with this assertion is: How likely was it that the study found no benefit when there really is one? After all, there were only a few hundred cardiovascular outcome events in the study so the play of chance might have obscured treatment effects. Figure 11.3 shows time-to-event curves for the primary outcome in the two treatment groups. Patients in the niacin and control groups had remarkably similar curves throughout follow-up, making a protective effect of niacin implausible.

[Figure 11.3: time-to-event curves for the niacin plus statin and placebo plus statin groups. Vertical axis: cumulative percent of patients with primary outcome (0 to 50). Horizontal axis: years (0 to 4).]

Number at risk          Year 0    Year 1    Year 2    Year 3    Year 4
Niacin plus statin      1,718     1,606     1,366     903       428
Placebo plus statin     1,696     1,581     1,381     910       436

Visual presentation of negative results can be convincing. Alternatively, one can examine confidence intervals (see Point Estimates and Confidence Intervals, below) and learn a lot about whether the study was large enough to rule out clinically important differences if they existed.

Of course, reasons for false-negative results other than chance also need to be considered: biologic reasons such as too short follow-up or too small dose of niacin, as well as study limitations such as non-compliance and missed outcome events.

Type II errors have received less attention than type I errors for several reasons. They are more difficult to calculate. Also, most professionals simply prefer things that work and consider negative results unwelcome. Authors are less likely to submit negative studies to journals and when negative studies are reported at all, the authors may prefer to emphasize subgroups of patients in which treatment differences were found. Authors may also emphasize reasons other than chance to explain why true differences might have been missed. Whatever the reason for not considering the probability of a type II error, it is the main question that should be asked when the results of a study are interpreted as “no difference.”

HOW MANY STUDY PATIENTS ARE ENOUGH?

Suppose you are reading about a clinical trial that compares a promising new treatment to usual care
and finds no difference. You are aware that random variation can be the reason for whatever differences are or are not observed, and you wonder if the number of patients in this study is large enough to make chance an unlikely explanation for what was found. Alternatively, you may be planning to do such a study and have the same question. Either way, you need to understand how many patients would be needed to make a strong comparison of the effects of the two treatments.

Statistical Power

The probability that a study will find a statistically significant difference when a difference really exists is called the statistical power of the study. Power and Pβ are complementary ways of expressing the same concept.

Statistical power = 1 – Pβ

Power is analogous to the sensitivity of a diagnostic test. One speaks of a study being powerful when it has a high probability of detecting differences when treatments really do have different effects.

Estimating Sample Size Requirements

From the point of view of hypothesis testing of nominal data (counts), an adequate sample size depends on four characteristics of the study: the magnitude of the difference in outcome between treatment groups, Pα and Pβ (the probability of the false-positive and false-negative conclusions you are willing to accept), and the underlying outcome rate.

These determinants of adequate sample size should be taken into account when investigators plan a study, to ensure that the study will have enough statistical power to produce meaningful results. To the extent that investigators have not done this well, or some of their assumptions were found to be inaccurate, readers need to consider the same issues when interpreting study results.

Effect Size

Sample size depends on the magnitude of the difference to be detected. One is free to look for differences of any magnitude and of course one hopes to be able to detect even very small differences, but more patients are needed to detect small differences, everything else being equal. Therefore, it is best to ask, “What is a sufficient number of patients to detect the smallest degree of improvement that would be clinically meaningful?” On the other hand, if one is interested in detecting only very large differences between treated and control groups (i.e., strong treatment effects) then fewer patients need to be studied.

Type I Error

Sample size is also related to the risk of a type I error (concluding that treatment is effective when it is not). The acceptable probability for a risk of this kind is a value judgment. If one is prepared to accept the consequences of a large chance of falsely concluding that the treatment is effective, one can reach conclusions with fewer patients. On the other hand, if one wants to take only a small risk of being wrong in this way, a larger number of patients will be required. As discussed earlier, it is customary to set Pα at 0.05 (1 in 20) or sometimes 0.01 (1 in 100).

Type II Error

The chosen risk of a type II error is another determinant of sample size. An acceptable probability of this error is also a judgment that can be freely made and changed to suit individual tastes. Pβ is often set at 0.20, a 20% chance of missing true differences in a particular study. Conventional type II error rates are much larger than type I error rates, reflecting a higher value placed on being sure an effect is really present when it is said to be.

Characteristics of the Data

The statistical power of a study is also determined by the nature of the data. When the outcome is expressed by counts or proportions of events or time-to-event, its statistical power depends on the rate of events: The larger the number of events, the greater the statistical power for a given number of people at risk. As Peto et al. (6) put it:

In clinical trials of time to death (or of the time to some other particular “event”: relapse, metastasis, first thrombosis, stroke, recurrence, or time to death from a particular cause), the ability of the trial to distinguish between the merits of two treatments depends on how many patients die (or suffer a relevant event), rather than on the number of patients entered. A study of 100 patients, 50 of whom die, is about as sensitive as a study with 1,000 patients, 50 of whom die.

If the data are continuous, such as blood pressure or serum cholesterol, power is affected by the
Table 11.2
Determinants of Sample Size

Sample size varies according to:
Determined by investigator:          1 / (Effect size, Pα, Pβ)
Determined by data type (means):     Variability
OR
Determined by data type (counts):    1 / (Outcome rate)
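As a rough illustration of how the determinants in Table 11.2 interact when the outcome is a count, the sketch below uses a widely taught normal-approximation formula for comparing two proportions. This is an assumption of the sketch, not necessarily the formula used for any figure or table in this chapter; the function name `patients_per_group` and the example rates are hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(control_rate, relative_reduction, p_alpha=0.05, p_beta=0.20):
    """Approximate number of patients needed in each of two equal groups.

    control_rate       -- outcome rate in the untreated group (0 to 1)
    relative_reduction -- proportional reduction in event rate to detect (0 to 1)
    p_alpha            -- accepted risk of a type I error (two-sided)
    p_beta             -- accepted risk of a type II error (power = 1 - p_beta)
    """
    treated_rate = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - p_alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - p_beta)
    # Sum of the binomial variances in the two groups.
    variance = control_rate * (1 - control_rate) + treated_rate * (1 - treated_rate)
    difference = control_rate - treated_rate
    n = (z_alpha + z_beta) ** 2 * variance / difference ** 2
    return math.ceil(n)


# Hypothetical scenarios: each line changes one determinant.
print(patients_per_group(0.20, 0.25))                # reference case
print(patients_per_group(0.20, 0.50))                # larger effect size
print(patients_per_group(0.05, 0.25))                # lower outcome rate
print(patients_per_group(0.20, 0.25, p_alpha=0.01))  # stricter P-alpha
```

Under these assumptions the required number falls as the effect size grows and rises as the outcome rate, Pα, or Pβ gets smaller, which is the pattern summarized in the table.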
For most of the therapeutic questions encountered today, a surprisingly large sample size is required. The value of dramatic, powerful treatments, such as antibiotics for pneumonia or thyroid replacement for hypothyroidism, was established by clinical experience or studying a small number of patients, but such treatments come along rarely and many of them are already well established. We are left with diseases, many of which are chronic and have multiple, interacting causes, for which the effects of new treatments are generally small. This makes it especially important to plan clinical studies that are large enough to distinguish real from chance effects.

Figure 11.4 shows the relationship between sample size and treatment difference for several baseline rates of outcome events. Studies involving fewer than 100 patients have a poor chance of detecting statistically significant differences for even large treatment effects. Looked at another way, it is difficult to detect effect sizes of <25%. In practice, statistical power can be estimated by means of readily available formulas, tables, nomograms, computer programs, or Web sites.

[Graph: horizontal axis, proportional reduction in event rate (%), 0 to 100.]
Figure 11.4 ■ The number of people required in each of two treatment groups (of equal size), for various rates of outcome events in the untreated group, to have an 80% chance of detecting a difference (P = 0.05) in reduction in outcome event rates in treated relative to untreated patients. (Calculated from formula in Weiss NS. Clinical epidemiology. The study of the outcome of illness. New York: Oxford University Press; 1986.)

POINT ESTIMATES AND CONFIDENCE INTERVALS

The effect size that is observed in a particular study (such as treatment effect in a clinical trial or relative risk in a cohort study) is called the point estimate of the effect. It is the best estimate from the study of the true effect size and is the summary statistic usually given the most emphasis in reports of research.

However, the true effect size is unlikely to be exactly that observed in the study. Because of random variation, any one study is likely to find a result higher or lower than the true value. Therefore, a summary measure is needed for the statistical precision of the point estimate, the range of values likely to encompass the true effect size.

Statistical precision is expressed as a confidence interval, usually the 95% confidence interval, around the point estimate. Confidence intervals are interpreted as follows: If the study is unbiased, there is a 95% chance that the interval includes the true effect size. The narrower the confidence interval, the more certain one can be about the size of the true effect. The true value is most likely to be close to the point estimate, less likely to be near the outer limits of the interval, and could (5 times out of 100) fall outside these limits altogether. Statistical precision increases with the statistical power of the study.

Example

The Women’s Health Initiative included a randomized controlled trial of the effects of estrogen plus progestin on chronic disease outcomes in healthy postmenopausal women (9). Figure 11.5 shows relative risk and confidence
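A point estimate and its 95% confidence interval can be computed from simple counts. The sketch below is a minimal illustration, assuming a relative risk from a two-group study and the standard large-sample approximation on the log scale; the function name `relative_risk_ci` and all counts are hypothetical and are not taken from any study cited in this chapter.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_treated, n_treated, events_control, n_control, level=0.95):
    """Relative risk with an approximate confidence interval (log method).

    Returns (point estimate, lower limit, upper limit).
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Approximate standard error of log(relative risk).
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical studies with the same point estimate but different sizes.
small_study = relative_risk_ci(30, 1000, 20, 1000)
large_study = relative_risk_ci(300, 10000, 200, 10000)
for label, (rr, lower, upper) in (("small", small_study), ("large", large_study)):
    print(f"{label} study: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts both studies give the same point estimate, but the interval from the smaller study includes 1.0 (no effect) while the interval from the study ten times larger does not, which illustrates how precision increases with the size of the study.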
[Figure 11.6: probability of detection (vertical axis, 0 to 1.0) for events with a risk of 1/100, 1/1,000, 1/10,000, and 1/100,000.]
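The kind of curve plotted in Figure 11.6 follows from a short calculation: if each person has the same risk of the event and people are independent of one another (both assumptions of this sketch), the probability of seeing at least one event among n people is 1 minus the probability of seeing none. The function name `probability_of_detection` is hypothetical.

```python
def probability_of_detection(risk, n_observed):
    """Probability of observing at least one event among n_observed people,
    assuming each person independently has the same risk of the event."""
    return 1 - (1 - risk) ** n_observed


# For an event with a risk of 1/1,000, compare several numbers of people observed.
for n in (500, 1000, 3000, 5000):
    print(n, round(probability_of_detection(1 / 1000, n), 3))
```

Observing 3,000 people for a 1/1,000 event gives a detection probability of about 95%, whereas observing 1,000 people gives only about 63%.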
is no longer a need to estimate effect size, outcome event rates, and variability among patients because they are all known. Rather, attention should be directed to point estimates and confidence intervals. With them, one can see the range of values that are consistent with the results and whether the effect sizes of interest are within this range or are ruled out by the data. In the niacin study, summarized earlier as an example of a negative trial, the hazard ratio was 1.02 and the 95% confidence interval was 0.87 to 1.21, meaning that the results were consistent with a small degree of benefit or harm. Whether this matters depends on the clinical importance attached to a difference in rates as large as represented by this confidence interval.

DETECTING RARE EVENTS

It is sometimes important to know how likely a study is to detect a relatively uncommon event (e.g., 1/1,000), particularly if that event is severe, such as bone marrow failure or life-threatening arrhythmia. A great many people must be observed in order to have a good chance of detecting even one such event, much less to establish a relatively stable estimate of its frequency. For most clinical research, sample size is planned to be sufficient to detect main effects, the answer sought for the primary research question. Sample size is likely to be well short of the number needed to detect rare events such as uncommon side effects and complications. For that, a different approach, involving many more patients, is needed. An example is postmarketing surveillance of a drug, in which thousands of users are monitored for side effects.

Figure 11.6 shows the probability of detecting an event as a function of the number of people under observation. A rule of thumb is: To have a good chance of detecting a 1/x event, one must observe 3x people (15). For example, to detect at least one event if the underlying rate is 1/1,000, one would need to observe 3,000 people.

MULTIPLE COMPARISONS

The statistical conclusions of research have an aura of authority that defies challenge, particularly by non-experts. However, as many skeptics have suspected, it is possible to “lie with statistics” even if the research is well designed, the mathematics flawless, and the investigators’ intentions beyond reproach.

Statistical conclusions can be misleading because the strength of statistical tests depends on the number of research questions considered in the study and when those questions were asked. If many comparisons are made among the variables in a large set of data, the P value associated with each individual comparison is an underestimate of how often the result of
Age
<65 yr                 1,714    19 (2.0)     7 (0.7)
65 to <75 yr           1,987    28 (2.7)    24 (2.0)
≥75 yr                 1,897    66 (6.1)    20 (2.0)
Sex
Female                 2,321    64 (4.9)    25 (1.9)
Male                   3,277    49 (2.7)    26 (1.4)
Estimated GFR
<50 mL/min             1,198    36 (5.8)    16 (2.5)
50 to <80 mL/min       2,374    59 (4.5)    22 (1.7)
≥80 mL/min             2,021    18 (1.6)    13 (1.1)
CHADS2 score
0–1                    2,026    18 (1.6)    10 (0.9)
2                      1,999    40 (3.7)    25 (2.1)
≥3                     1,570    55 (6.3)    16 (1.9)
Heart failure
No                     3,428    66 (3.6)    28 (1.5)
Yes                    2,171    45 (3.8)    23 (1.8)
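The effect of making many comparisons can be illustrated with a short calculation. The sketch below assumes that the comparisons are independent and that there is truly no difference in any of them; subgroup comparisons in a real study are usually related to one another, so the numbers are only a rough guide. The function name `chance_of_any_false_positive` is hypothetical.

```python
def chance_of_any_false_positive(n_comparisons, p_alpha=0.05):
    """Probability that at least one of n independent comparisons is
    'statistically significant' by chance alone when no true difference exists."""
    return 1 - (1 - p_alpha) ** n_comparisons


for k in (1, 5, 10, 20):
    print(k, round(chance_of_any_false_positive(k), 2))
```

With a single comparison the chance of a false-positive result is the familiar 0.05, but with 20 independent comparisons it rises to about 0.64, so at least one "significant" finding becomes more likely than not.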
Table 11.4
Guidelines for Deciding Whether Apparent Differences in Effects within Subgroups Are Real (a)

From the study itself:
• Is the magnitude of the observed difference clinically important?
• How likely is the effect to have arisen by chance, taking into account:
    the number of subgroups examined?
    the magnitude of the P value?
• Was a hypothesis that the effect would be observed made before its discovery (or was justification for the effect argued for after it was found)?
• Was it one of a small number of hypotheses?

From other information:
• Was the difference suggested by comparisons within rather than between studies?
• Has the effect been observed in other studies?
• Is there direct evidence that supports the existence of the effect?

(a) Adapted from Oxman AD, Guyatt GH. A consumer’s guide to subgroup analysis. Ann Intern Med 1992;116:78–84.

tend to be related to each other biologically (and as a consequence statistically), as is the case in the above example where stroke and systemic embolism are different manifestations of the same clinical phenomenon.

MULTIVARIABLE METHODS

Most clinical phenomena are the result of many variables acting together in complex ways. For example, coronary heart disease is the joint result of lipid abnormalities, hypertension, cigarette smoking, family history, diabetes, diet, exercise, inflammation, coagulation abnormalities, and perhaps personality. It is appropriate to try to understand these relationships by first examining relatively simple arrangements of the data, such as stratified analyses that show whether the effect of one variable is changed by the presence or absence of one or more of the other variables. It is relatively easy to understand the data when they are displayed in this way.

However, as mentioned in Chapter 7, it is usually not possible to account for more than a few variables using this method because there are not enough patients with each combination of characteristics to

taken into account, there would only be, at most, about 15 patients in each subgroup; if patients were unevenly distributed among subgroups, there would be even fewer in some.

What is needed then, in addition to tables showing multiple subgroups, is a way of examining the effects of several variables together. This is accomplished by multivariable modeling, that is, developing a mathematical expression of the effects of many variables taken together. It is “multivariable” because it examines the effects of multiple variables simultaneously. It is “modeling” because it is a mathematical construct, calculated from the data based on assumptions about characteristics of the data (e.g., that the variables are all normally distributed or all have the same variance).

Mathematical models are used in two general ways in clinical research. One way is to study the independent effect of one variable on outcome while taking into account the effects of other variables that might confound or modify this relationship (discussed under multivariable adjustment in Chapter 5). The second way is to predict a clinical event by calculating the combined effect of several variables acting together (introduced in concept under Clinical Prediction Rules in Chapter 7).

The basic structure of a multivariable model is:

Outcome variable = constant + (β₁ × variable₁) + (β₂ × variable₂) + . . .,

where β₁, β₂, . . . are coefficients determined by the data, and variable₁, variable₂, . . . are the variables that might be related to outcome. The best estimates of the coefficients are determined mathematically and depend on the powerful calculating ability of modern computers.

Modeling is done in many different ways, but some elements of the process are basic.

1. Identify all the variables that might be related to the outcome of interest either as confounders or effect modifiers. As a practical matter, it may not be possible to actually measure all of them and the missing variables should be mentioned explicitly as a limitation.
2. If there are relatively few outcome events, the number of variables to be considered in the model might need to be reduced to a manageable size, usually no more than several. Often this is done by selecting variables that, when taken one at a time, are most strongly related to outcome. If a statis-
allow stable estimates of rates. For example, if 120 tical criterion is used at this stage, it is usual to
patients were studied, 60 in each treatment group, err on the side of including variables, for example,
and just one additional dichotomous variable was by choosing all variables showing an association
190 Clinical Epidemiology: The Essentials
with the outcome of interest at a cutoff level of Some commonly used kinds of models are logis-
P < 0.10. Evidence for the biologic importance tic regression (for dichotomous outcome variables
of the variable is also considered in making the such as those that occur in case-control studies) and
selection. Cox proportional hazards models (for time-to-event
3. Models, like other statistical tests, are based on studies).
assumptions about the structure of the data. Inves- Multivariable modeling is an essential strat-
tigators need to check whether these assumptions egy for dealing with the joint effects of multiple
are met in their particular data. variables. There is no other way to adjust for or
4. As for the actual models, there are many kinds to include many variables at the same time. How-
and many strategies that can be followed within ever, this advantage comes at a price. Models tend
models. All variables—exposure, outcome, and to be black boxes, and it is difficult to “get inside”
covariates—are entered in the model, with the them and understand how they work. Their validity
order determined by the research question. For is based on assumptions about the data that may
example, if some are to be controlled for in a not be met. They are clumsy at recognizing effect
causal analysis, they are entered in the model modification. An exposure variable may be strongly
first, followed by the variable of primary inter- related to outcome yet not appear in the model
est. The model will then identify the independent because it occurs rarely—and there is little direct
effect of the variable of primary interest. On the information on the statistical power of the model
other hand, if the investigator wants to make a for that variable. Finally, model results are easily
prediction based on several variables, the relative affected by quirks in the data, the results of ran-
strength of their association to the outcome vari- dom variation in the characteristics of patients from
able is determined by the model. sample to sample. It has been shown, for example,
that a model frequently identified a different set of
predictor variables and produced a different order-
ing of variables on different random samples of the
same dataset (20).
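The basic structure of a multivariable model given earlier in this section (outcome = constant + β1 × variable1 + β2 × variable2 + . . .) can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it uses ordinary least squares rather than the logistic or Cox models named in the text, the eight "patients" are invented so that the arithmetic can be checked by hand, and the function names (`solve_linear_system`, `fit_linear_model`) are hypothetical helpers, not part of any library. It shows how a crude association between exposure and outcome can disappear once a confounder is entered in the model.

```python
# Illustrative sketch only: invented data, ordinary least squares,
# hypothetical helper names. Not the method of any study cited in the text.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without modifying the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_model(rows, outcome):
    """Least-squares fit of outcome = constant + b1*var1 + b2*var2 + ...

    rows is a list of predictor tuples, one per patient.
    Returns [constant, b1, b2, ...].
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented data: exposure and confounder tend to occur together.
exposure = [0, 0, 0, 1, 1, 1, 0, 1]
confounder = [0, 0, 1, 1, 1, 1, 0, 0]
# By construction the outcome depends only on the confounder.
outcome = [1 + 2 * c for c in confounder]

# Crude model: exposure alone.
crude = fit_linear_model([(e,) for e in exposure], outcome)
# Adjusted model: exposure and confounder entered together.
adjusted = fit_linear_model(list(zip(exposure, confounder)), outcome)

print("crude coefficient for exposure:   ", crude[1])
print("adjusted coefficient for exposure:", adjusted[1])
print("adjusted coefficient for confounder:", adjusted[2])
```

In this constructed example the crude coefficient for exposure is nonzero only because exposed patients more often have the confounder; with both variables in the model, the coefficient for exposure falls to zero. Real analyses involve random variation, so results are never this clean.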
For these reasons, the models themselves cannot be taken as a standard of validity and must be validated independently. Usually, this is done by observing whether or not the results of a model predict what is found in another, independent sample of patients. The results of the first model are considered a hypothesis that is to be tested with new data. If random variation is mainly responsible for the results of the first model, it is unlikely that the same random effects will occur in the validating dataset, too. Other evidence for the validity of a model is its biologic plausibility and its consistency with simpler, more transparent analyses of the data, such as stratified analyses.

Example

Gastric cancer is the second leading cause of cancer death in the world. Investigators in Europe analyzed data from a cohort recruited from 10 European countries to see whether alcohol was an independent risk factor for stomach cancer (19). They identified nine variables that were known risk factors or potential confounders of the association between the main exposure (alcohol consumption) and disease (stomach cancer): age, study center, sex, physical activity, education, cigarette smoking, diet, body mass index, and, in a subset of patients, serologic evidence of Helicobacter pylori infection. As a limitation, they mentioned that they would have included salt but did not have access to data on salt intake. Modeling was with the Cox proportional hazards model, so they checked that the underlying assumption, that relative risk (the hazard ratio) does not vary with time, was met. After adjustment for the other variables, heavy but not light alcohol consumption was associated with stomach cancer (hazard ratio 1.65, 95% confidence interval 1.06–2.58); beer was associated, but not wine or liquor.

BAYESIAN REASONING

An altogether different approach to the information contributed by a study is based on Bayesian inference. We introduced this approach in Chapter 8 where we applied it to the specific case of diagnostic testing.

Bayesian inference begins with prior belief about the answer to a research question, analogous to pretest probability of a diagnostic test. Prior belief is based on everything known about the answer up to the point when new information is contributed by a study. Then, Bayesian inference asks how much the results of the new study change that belief.
Some aspects of Bayesian inference are compelling. Individual studies do not take place in an information vacuum; rather, they are in the context of all other information available at the time. Starting each study from the null hypothesis (that there is no effect) is unrealistic because something is already known about the answer to the question before the study is even begun. Moreover, results of individual studies change belief in relation to both their scientific strengths and the direction and magnitude of their results. For example, if all preceding studies were negative and the next one, which is of comparable strength, is found to be positive, then an effect is still unlikely. On the other hand, a weak prior belief might be reversed by a single strong study. Finally, with this approach it is not so important whether a small number of hypotheses are identified beforehand and multiple comparisons are not as worrisome. Instead, prior belief depends on the plausibility of the assertion rather than on whether the assertion was established before or after the study was begun.

Although Bayesian inference is appealing, so far it has been difficult to apply because of poorly developed ways of assigning numbers to prior belief and to the information contributed by a study. Two exceptions are in cumulative summaries of research evidence (Chapter 13) and in diagnostic testing, in which “belief” is prior probability and the new information is expressed as a likelihood ratio. However, Bayesian inference is the conceptual basis for qualitative thinking about cause (see Chapter 12).
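The diagnostic-testing case mentioned above, in which prior belief is a probability and new information is a likelihood ratio, can be sketched in a few lines. This is a minimal illustration, not a method taken from the text: the function names are hypothetical, all numbers are invented, and treating each study's result as a single likelihood ratio is a simplifying assumption (the text itself notes that assigning such numbers to studies is poorly developed).

```python
# Minimal sketch of Bayesian updating on the odds scale.
# All names and numbers are hypothetical illustrations.

def probability_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update_belief(prior_probability, likelihood_ratio):
    """Posterior odds = prior odds x likelihood ratio."""
    posterior_odds = probability_to_odds(prior_probability) * likelihood_ratio
    return odds_to_probability(posterior_odds)


def update_sequence(prior_probability, likelihood_ratios):
    """Apply several pieces of evidence one after another."""
    belief = prior_probability
    for lr in likelihood_ratios:
        belief = update_belief(belief, lr)
    return belief


# The same evidence (likelihood ratio 9) moves a weak and a stronger prior
# to different places.
from_weak_prior = update_belief(0.10, 9.0)
from_even_prior = update_belief(0.50, 9.0)

# Three "negative" results (each assumed to carry a likelihood ratio of 0.25)
# followed by one "positive" result (assumed likelihood ratio 4): belief in
# an effect stays low.
after_mixed_evidence = update_sequence(0.50, [0.25, 0.25, 0.25, 4.0])

print(from_weak_prior, from_even_prior, after_mixed_evidence)
```

The last calculation mirrors the point made in the text: when preceding evidence has been negative, one positive result of comparable strength leaves the effect unlikely.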
Review Questions

Read the following and select the best response.

11.1. A randomized controlled trial of thrombolytic therapy versus angioplasty for acute myocardial infarction finds no difference in the main outcome, survival to discharge from hospital. The investigators explored whether this was also true for subgroups of patients defined by age, number of vessels affected, ejection fraction, comorbidity, and other patient characteristics. Which of the following is not true about this subgroup analysis?

A. Examining subgroups increases the chance of a false-positive (misleading statistically significant) result in one of the comparisons.
B. Examining subgroups increases the chance of a false-negative finding in one of these subgroups, relative to the main result.
C. Subgroup analyses are bad scientific practice and should not be done.
D. Reporting results in subgroups helps clinicians tailor information in the study to individual patients.

11.2. A new drug for hyperlipidemia was compared with placebo in a randomized controlled trial of 10,000 patients. After 2 years, serum cholesterol (the primary outcome) was 238 mg/dL in the group receiving the new drug and 240 mg/dL in the group receiving the old drug (P < 0.001). Which of the following best describes the meaning of the P value in this study?

A. Bias is unlikely to account for the observed difference.
B. The difference is clinically important.
C. A difference as big or bigger than what was observed could have arisen by chance one time in 1,000.
D. The results are generalizable to other patients with hypertension.
E. The statistical power of this study was inadequate.

11.3. In a well-designed clinical trial of treatment for ovarian cancer, remission rate at 1 year is 30% in patients offered a new drug and 20% in those offered a placebo. The P value is 0.4. Which of the following best describes the interpretation of this result?

A. Both treatments are effective.
B. Neither treatment is effective.
C. The statistical power of this study is 60%.
D. The best estimate of treatment effect size is 0.4.
E. There is insufficient information to decide whether one treatment is better than the other.
REFERENCES
1. Fisher R in Proceedings of the Society for Psychical Research, 1929, quoted in Salsburg D. The Lady Tasting Tea. New York: Henry Holt and Co; 2001.
2. Johnson AF. Beneath the technological fix: outliers and probability statements. J Chronic Dis 1985;38:957–961.
3. Courtney C, Farrell D, Gray R, et al for the AD2000 Collaborative Group. Long-term donepezil treatment in 565 patients with Alzheimer’s disease (AD2000): randomized double-blind trial. Lancet 2004;363:2105–2115.
4. Bernard SA, Gray TW, Buist MD, et al. Treatment of comatose survivors of out-of-hospital cardiac arrest with induced hypothermia. N Engl J Med 2002;346:557–563.
5. The AIM-HIGH Investigators. Niacin in patients with low HDL cholesterol levels receiving intensive statin therapy. N Engl J Med 2011;365:2255–2267.
6. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br J Cancer 1976;34:585–612.
7. Lind J. A treatise on scurvy. Edinburgh; Sands, Murray and Cochran, 1753 quoted by Thomas DP. J Royal Society Med 1997;80:50–54.
8. Weinstein SJ, Yu K, Horst RL, et al. Serum 25-hydroxyvitamin D and risks of colon and rectal cancer in Finnish men. Am J Epidemiol 2011;173:499–508.
9. Rossouw JE, Anderson GL, Prentice RL, et al. for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA 2002;288:321–333.
10. Braitman LE. Confidence intervals assess both clinical significance and statistical significance. Ann Intern Med 1991;114:515–517.
11. Mai PL, Wideroff L, Greene MH, et al. Prevalence of family history of breast, colorectal, prostate, and lung cancer in a population-based study. Public Health Genomics 2010;13:495–503.
12. Venge P, Johnson N, Lindahl B, et al. Normal plasma levels of cardiac troponin I measured by the high-sensitivity cardiac troponin I access prototype assay and the impact on the diagnosis of myocardial ischemia. J Am Coll Cardiol 2009;54:1165–1172.
13. McCormack K, Scott N, Go PMNYH, et al. Laparoscopic techniques versus open techniques for inguinal hernia repair. Cochrane Database Syst Rev 2003;(1):CD001785.
14. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994;121:200–206.
15. Sackett DL, Haynes RB, Gent M, et al. Compliance. In: Inman WHW, ed. Monitoring for Drug Safety. Lancaster, UK: MTP Press; 1980.
16. Armitage P. Importance of prognostic factors in the analysis of data from clinical trials. Control Clin Trials 1981;1:347–353.
17. Hunter DJ, Kraft P. Drinking from the fire hose – statistical issues in genomewide association studies. N Engl J Med 2007;357:436–439.
18. Connolly SJ, Eikelboom J, Joyner C, et al. Apixaban in patients with atrial fibrillation. N Engl J Med 2011;364:806–817.
19. Duell EJ, Travier N, Lujan-Barroso L, et al. Alcohol consumption and gastric cancer risk in European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Am J Clin Nutr 2011;94:1266–1275.
20. Diamond GA. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. J Am Coll Cardiol 1989;14:12A–22A.
Chapter 12
Cause
In what circumstances can we pass from observed association to a verdict of causation?
Upon what basis should we proceed to do so?
—Sir Austin Bradford Hill
1965
KEY WORDS
Web of causation
Aggregate risk studies
Ecological studies
Ecological fallacy
Time-series studies
Multiple time-series studies
Decision analysis
Cost-effectiveness analysis
Cost–benefit analysis

This book has been about three kinds of clinically useful information. One is description, a simple statement of how often things occur, summarized by metrics such as incidence and prevalence, as well as (in the case of diagnostic test performance) sensitivity, specificity, predictive value, and likelihood ratio. Another is prediction, evidence that certain outcomes regularly follow exposures without regard to whether the exposures are independent risk factors, let alone causes. The third is either directly or implicitly about cause and effect. Is a risk factor an independent cause of disease? Does treatment cause patients to get better? Does a prognostic factor cause a different outcome, everything else being equal? This chapter considers cause in greater depth.

Another word for the study of the origination of disease is “etiology,” now commonly used as a synonym for cause, as in “What is the etiology of this disease?” To the extent that the cause of disease is not known, the disease is said to be “idiopathic” or of “unknown etiology.”

There is a longstanding tendency to judge the legitimacy of a causal assertion by whether it makes sense according to beliefs at the time, as the following historical example illustrates.

Example

In 1843, Oliver Wendell Holmes (then professor of anatomy and physiology and later dean of Harvard Medical School), published a study linking hand washing habits by obstetricians and childbed (puerperal) fever, an often-fatal disease following childbirth. (Puerperal fever is now known to be caused by a bacterial infection.) Holmes’s observations led him to conclude that “the disease known as puerperal fever is so far contagious, as to be frequently carried from patient to patient by physicians and nurses (1).”

One response to Holmes’s assertion was that the findings made no sense. “I prefer to attribute them [puerperal fever cases] to accident, or Providence, of which I can form a conception, rather than to contagion of which I cannot form any clear idea, at least as to this particular malady,” wrote Charles Meigs, professor of midwifery and the diseases of women and children at Jefferson Medical College.

Around that time, a Hungarian physician, Ignaz Semmelweis, showed that disinfecting physicians’ hands reduced rates of childbed fever, and his studies were also dismissed because he had no generally accepted explanation for his findings. Holmes’s and Semmelweis’s assertions were made decades before pioneering work by Louis Pasteur, Robert Koch, and Joseph Lister established the germ theory of disease.
The importance attached to a cause-and-effect relationship “making sense,” usually in terms of a biologic mechanism, is still embedded in current thinking. For example, in the 1990s, studies showing that eradication of Helicobacter pylori infection prevented peptic ulcer disease were met with skepticism because everyone knew that ulcers of the stomach and duodenum were not an infectious disease. Now, H. pylori infection is recognized as a major cause of this disease.

In this chapter, we review concepts of cause in clinical medicine. We discuss the broader array of evidence, in addition to biologic plausibility, that strengthens or weakens the case that an association represents a cause-and-effect relationship. We also briefly deal with a kind of research design not yet considered in this book: studies in which exposure to a possible cause is known only for groups and not for the individuals in the groups.

BASIC PRINCIPLES

Single Causes

In 1882, 40 years after the Holmes-Meigs confrontation, Koch set forth postulates for determining that an infectious agent is the cause of a disease (Table 12.1). Basic to his approach was the assumption that a particular disease has one cause and that a particular cause results in one disease. This approach helped him to identify for the first time the bacteria causing tuberculosis, diphtheria, typhoid, and other common infectious diseases of his day.

Koch’s postulates contributed greatly to the concept of cause in medicine. Before Koch, it was believed that many different bacteria caused any given disease. The application of his postulates helped bring order out of chaos. They are still useful today. That a unique infectious agent causes a particular infectious disease was the basis for the discovery in 1977 that Legionnaire disease is caused by a gram-negative bacterium; the discovery in the 1980s that a newly identified retrovirus causes AIDS; and the discovery in 2003 that a coronavirus caused an outbreak of severe acute respiratory syndrome (SARS) (2).

Table 12.1
Koch’s Postulates

1. The organism must be present in every case of the disease.
2. The organism must be isolated and grown in pure culture.
3. The organism must cause a specific disease when inoculated into an animal.
4. The organism must then be recovered from the animal and identified.

Multiple Causes

For some diseases, one cause appears to be so dominant that we speak of it as the cause. We say that Mycobacterium tuberculosis causes tuberculosis or that an abnormal gene coding for the metabolism of phenylalanine, an amino acid, causes phenylketonuria. We may skip past the fact that tuberculosis is also caused by host and environmental factors and that the disease phenylketonuria develops because there is phenylalanine in the diet.

More often, however, various causes make a more balanced contribution to the occurrence of disease such that no one stands out. The underlying assumption of Koch’s postulates, one cause, one disease, is too simplified. Smoking causes lung cancer, coronary artery disease, chronic obstructive pulmonary disease, and skin wrinkles. Coronary artery disease has multiple causes, including cigarette smoking, hypertension, hypercholesterolemia, diabetes, inflammation, and heredity. Specific parasites cause malaria, but only if the mosquito vectors can breed, become infected, and bite people, and those people are not taking antimalarial drugs or are unable to control the infection on their own.

When many factors act together, it has been called the “web of causation” (3). A causal web is well understood in chronic degenerative diseases such as cardiovascular disease and cancer, but it is also the basis for infectious diseases, where the presence of a microbe is a necessary but not sufficient cause of disease. AIDS cannot occur without exposure to HIV, but exposure to the virus does not necessarily result in disease. For example, exposure to HIV rarely results in seroconversion after needlesticks (about 3/1,000) because the virus is not nearly as infectious as, say, the hepatitis B virus. Similarly, not everyone exposed to tuberculosis, in Koch’s day or now, becomes infected.

When multiple causes act together, the resulting risk may be greater or less than would be expected by simply combining the effects of the separate causes. That is, they interact; there is effect modification. Figure 12.1 shows the 10-year risk of cardiovascular disease in a 60-year-old man with no prior history of cardiovascular disease according to the presence or absence of several common risk factors. The risk is greater than the sum of the effects of each individual risk factor. The effect of low HDL is more in the presence of elevated total cholesterol, the effect
[Bar chart. Vertical axis: 10-year cardiovascular risk (%). Bars from left to right: no risk factors (total cholesterol 160 mg/dL, HDL 60 mg/dL, nonsmoker, systolic blood pressure 120 mm Hg, no diabetes mellitus), then the successive addition of total cholesterol 280 mg/dL, HDL 35 mg/dL, smoking, systolic blood pressure 160 mm Hg, and diabetes mellitus.]

Figure 12.1 ■ The interaction of multiple risk factors for cardiovascular disease. Ten-year cardiovascular risk (%) for a 60-year-old man with no risk factors (left bar) and with the successive addition of five risk factors (bars to the right). Each risk factor alone adds relatively little (several percent) to risk whereas adding them to each other increases risk almost 10-fold, far more than the sum of the individual risk factors acting independently, which is shown by the shaded area of the right-hand bar. (Data from The Framingham Risk Calculator in UpToDate, Waltham, MA according to formulae in D’Agostino RB Sr, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care. The Framingham Heart Study. Circulation 2008;117(6):743–753.)
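The comparison described in the caption, joint risk versus the sum of the individual risk factors acting independently, amounts to simple arithmetic. The sketch below uses invented percentages chosen only to resemble the pattern described ("several percent" per factor, a roughly 10-fold joint increase); they are not read from Figure 12.1 or from the Framingham calculator, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The risks below are invented numbers,
# not values taken from Figure 12.1.

def expected_additive_risk(baseline, risks_with_single_factor):
    """Risk expected if each factor's excess over baseline simply added up."""
    excess = sum(risk - baseline for risk in risks_with_single_factor)
    return baseline + excess


def interaction_excess(observed_joint, baseline, risks_with_single_factor):
    """Observed joint risk minus the additive expectation."""
    return observed_joint - expected_additive_risk(baseline, risks_with_single_factor)


# Hypothetical 10-year risks, in percent.
baseline_risk = 5                       # no risk factors
single_factor_risks = [8, 8, 8, 8, 8]   # each of five factors present alone
observed_joint_risk = 48                # all five factors together

additive = expected_additive_risk(baseline_risk, single_factor_risks)
excess = interaction_excess(observed_joint_risk, baseline_risk, single_factor_risks)

print("expected if effects simply added:", additive, "%")
print("observed with all factors:", observed_joint_risk, "%")
print("excess attributable to interaction:", excess, "percentage points")
```

A positive excess indicates that the factors modify one another's effects on the additive scale; an excess of zero would mean the factors act independently in this sense.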
[Figure 12.2 (diagram): SUSCEPTIBLE HOST → (exposure to Mycobacterium) → INFECTION → (tissue invasion and reaction) → TUBERCULOSIS, with crowding, malnutrition, vaccination, and genetics shown as risk factors.]
Figure 12.2 shows how both risk factors and pathogenesis of tuberculosis (distant and proximal causes) lead on a continuum to the disease. Exposure to M. tuberculosis depends on the host’s environment: close proximity to active cases. Infection depends on host susceptibility, which can be increased by malnutrition, decreased by vaccination, and altered by genetic endowment. Whether infection progresses to disease depends on these factors and others, such as immunocompetence, which can be compromised by HIV infection and age. Finally, active infection may be cured by antibiotic treatment.

Clinicians may be so intent on pathogenesis that they underestimate the importance of more remote causes of disease. In the case of tuberculosis, social and economic improvements influencing host susceptibility, such as less crowded living space and better nutrition, appear to have played a more prominent role in the decline in tuberculosis rates in developed countries than treatments. Figure 12.3 shows that the death rate from tuberculosis in England and Wales dropped dramatically before the tubercle bacillus was identified and a century before the first effective antibiotics were introduced in the 1950s.

[Figure 12.3 (line graph): death rate (per 100,000) by year, 1840 to 2000, with the introduction of antibiotics marked.]

The web of causation is continually changing, even for old diseases. Between 1985 and 1992, the number of tuberculosis cases in the United States, which had been falling for a century, began to increase (Fig. 12.4) (4). Why did this happen? There had been an influx of immigrants from countries with high rates of tuberculosis. The AIDS epidemic produced more people with a weakened immune system, making them more susceptible to infection with M. tuberculosis. When infected, their bodies allowed massive multiplication of the bacterium, making them more infectious. Rapid multiplication, especially in patients who did not follow prescribed drug regimens, favored the development of multidrug-resistant strains. People who were more likely to have both AIDS and tuberculosis (the socially disadvantaged, intravenous drug users, and prisoners) were developing multidrug-resistant disease and exposing others in the population to a difficult-to-treat strain. The interplay of environment, behavior, and molecular biology combined to reverse a declining trend in tuberculosis. To combat the new epidemic of tuberculosis, the public health infrastructure was rebuilt. Multidrug regimens (biologic efforts) and directly observed therapy to ensure compliance (behavioral efforts) were initiated and the rate of tuberculosis began to decline again.

[Line graph: number of cases by year, 1980 to 2010.]

Figure 12.4 ■ Tuberculosis cases in the United States, 1980 through 2010. A longstanding decline was halted in 1985. The number of cases reached a peak in 1992 and then began to decline again. (Adapted from Centers for Disease Control and Prevention. Reported tuberculosis in the United States, 2010. Available at http://www.cdc.gov/Features/dsTB2010Data/. Accessed February 8, 2012.)

Examining Individual Studies

One approach to evidence for cause-and-effect has been discussed throughout this book: in-depth analysis of the studies themselves. When an association has been observed, a causal relationship is established to the extent that the association cannot be accounted for by bias and chance. Figure 12.5 summarizes a familiar approach. One first looks for bias and how much it might have changed the result, and then whether the association is unlikely to be by chance. For observational studies, confounding is always a
possibility. Although confounding can be controlled in comprehensive, state-of-the-science ways, it is almost never possible to rule it out entirely; therefore, confounding remains the enduring challenge to causal reasoning based on observational research.

Randomized trials can deal definitively with confounding, but they are not possible for studies of risk (i.e., causes) per se. For example, it is unethical (and would be unsuccessful) to randomize non-smokers to cigarette smoking to study whether smoking causes lung cancer. However, randomized controlled trials can contribute to causal inference in two situations. One is when the trial is to treat a possible cause, such as elevated cholesterol or blood pressure, and the outcome is prevented. Another is when a trial is done for another purpose and the intervention causes unanticipated harms. For example, the fact that there was an excess of cardiovascular events in randomized trials of the cyclooxygenase-2 inhibitor rofecoxib, which had been given for other reasons (e.g., pain relief), is evidence that this drug may be a cause of cardiovascular events.

Hierarchy of Research Designs

The various research designs can be placed in a hierarchy of scientific strength for the purpose of establishing cause (Table 12.2). At the top of the hierarchy are systematic reviews of randomized controlled trials because they can deal definitively with confounding. Randomized trials are followed by observational studies, with little distinction between cohort and case-control studies in an era when case-control analyses are nested in cohorts sampled from defined populations. Lower still are uncontrolled studies, biologic reasoning, and personal experience. Of course, this order is only a rough guide to strength of evidence. The manner in which an individual study is performed can go a long way toward increasing or decreasing its validity, regardless of the type of design used. A bad randomized controlled trial contributes less to our understanding of cause than an exemplary cohort study.

With this hierarchy in mind, the strength of the evidence for cause and effect is sometimes judged according to the best studies of the question. Well-designed and well-executed randomized controlled trials trump observational studies, state-of-the-science observational studies trump case series, and so on. This is a highly simplified approach to evidence but is a useful shortcut.

Table 12.2
Hierarchy of Research Design Strength

Individual Risk Studies
Systematic reviews: consistent evidence from multiple randomized controlled trials
Randomized controlled trials
Observational studies
Cohort studies
Case-control, Case-cohort studies
Cross-sectional studies
Case series
Experience, expert opinion

THE BODY OF EVIDENCE FOR AND AGAINST CAUSE

What aspects of the research findings support cause and effect when only observational studies are available? In 1965, the British statistician Sir Austin Bradford Hill proposed a set of observations that taken together help to establish whether a relationship between an environmental factor and disease is causal or just an association (5) (Table 12.3). We review these “Bradford Hill criteria,” mainly using smoking and lung cancer as an example. Smoking is generally believed to cause lung cancer even though there are not randomized controlled trials of smoking or an undisputed biologic mechanism.

Table 12.3
Evidence That an Association Is Cause and Effect

Criteria               Comments
Temporality            Cause precedes effect
Strength               Large relative risk
Dose–response          Larger exposure to cause associated with higher rates of disease
Reversibility          Reduction in exposure is followed by lower rates of disease
Consistency            Repeatedly observed by different persons, in different places, circumstances, and times
Biologic plausibility  Makes sense according to biologic knowledge of the time
Specificity            One cause leads to one effect
Analogy                Cause-and-effect relationship already established for a similar exposure or disease

Adapted from Bradford Hill AB. The environment and disease: association and causation. Proc R Soc Med 1965;58:295–300.
looked when interpreting cross-sectional and case-control studies, in which both the purported cause and the effect are measured at the same points in time. Smoking clearly precedes lung cancer by several decades, but there are other examples where the order of cause and effect can be confused.

Example

“Whiplash” is the occurrence of neck pain following a forceful flexion/extension injury, typically in an auto accident. Many people with whiplash recover quickly, but some go on to chronic symptoms, litigation, and requests for disability awards. Many patients with whiplash have anxiety and depression, and epidemiologic studies have linked the two. The injury and associated pain are generally thought to cause the psychological symptoms. Investigators in Norway tested the hypothesis that cause and effect was in the other direction (6). They analyzed a cohort with measurements of psychological symptoms before injuries and follow-up for 11 years. People with anxiety and depression before the injury were more likely to report whiplash.

Finding that what was thought to be a cause actually follows an effect is powerful evidence against cause, but temporal sequence alone is only minimal evidence for cause.

[Bar chart. Vertical axis: lung cancer deaths/100,000 men/year. Horizontal axis: cigarettes smoked, number/day (0, 1–14, 15–24, 25+).]

Figure 12.6 ■ Example of a dose–response relationship: lung cancer deaths in male physicians according to dose (number) of cigarettes smoked. (Data from Doll R, Peto R. Mortality in relation to smoking: 20 years’ observations on male British doctors. Br Med J 1976;2:1525–1536.)

Dose–Response Relationships

A dose–response relationship is present if increasing exposure to the purported cause is followed by a larger and larger effect. In the case of cigarette smoking, “dose” might be the number of years of smoking, current packs per day, or “pack-years.” Figure 12.6 shows a clear dose–response curve when lung cancer death rates (responses) are plotted against the number of cigarettes smoked (doses).

Demonstrating a dose–response relationship strengthens the argument for cause and effect, but its absence is relatively weak evidence against causation because not all causal associations exhibit a dose–response relationship within the range observed and because confounding remains possible.

Strength of the Association
A strong association between a purported cause and
an effect, as expressed by a large relative or absolute
risk, is better evidence for a causal relationship than a
weak association. The reason is that unrecognized bias Example
could account for small relative risks but is unlikely to
result in large ones. Both the strong association between smok-
Thus, the 20 times higher incidence of lung ing and lung cancer and the dose–response
cancer among male smokers compared to non- relationship could be examples of confound-
smokers is much stronger evidence that smoking ing. According to this argument, an unknown
causes lung cancer than the finding that smoking is factor may exist that both causes people to
related to renal cancer, for which the relative risk is smoke and increases their risk of developing
much smaller (about 1.5). Similarly, a 10- to 100- lung cancer. The more the factor is present, the
fold increase in risk of hepatocellular carcinoma in more smoking and lung cancer occur, hence the
patients with hepatitis B infection is strong evidence dose–response relationship. Such an argument
that the virus is a cause of liver cancer.
Figure: ratio of mortality rate of ex-smokers to never smokers (vertical axis).
Biologic plausibility, when present, strengthens the case for causation, but the absence of biologic plausibility may just reflect the limitations of understanding of the biology of disease rather than the lack of a causal association.

Specificity
Specificity (one cause, one effect) is more often found for acute infectious diseases (e.g., poliomyelitis and tetanus) and for genetic diseases (e.g., familial adenomatous polyposis or ochronosis), although genetic effects are sometimes modified by gene–gene and gene–environment interactions. As mentioned earlier in this chapter, chronic, degenerative diseases often have many causes for the same effect and many effects from the same cause. Lung cancer is caused by cigarette smoking, asbestos, and radiation. Cigarette smoking not only causes lung cancer but also bronchitis, cardiovascular disease, periodontal disease, and wrinkled skin, to name a few. Thus, specificity is strong evidence for cause and effect, but the absence of specificity is weak evidence against it.

Analogy
The argument for a cause-and-effect relationship is strengthened when examples exist of well-established causes that are analogous to the one in question. Thus, the case that smoking causes lung cancer is strengthened by observations that other environmental toxins such as asbestos, arsenic, and uranium also cause lung cancer.
In a sense, applying the Bradford Hill criteria to cause is an example of Bayesian reasoning. For example, belief in causality based on strength of association and dose–response is modified (built up or diminished) by evidence concerning biologic plausibility or specificity, with each of the criteria contributing to a greater or lesser extent to the overall belief that an association is causal. The main difference from the Bayesian approach to diagnostic testing is that the various lines of evidence for cause (dose–response, reversibility, consistency, etc.) are being assembled concurrently in various scientific disciplines rather than in series by clinical research.

AGGREGATE RISK STUDIES
Until now, we have discussed studies in which exposure and disease are known for each individual in a study. To fill out the spectrum of research designs, we now consider a different kind of study, called aggregate risk studies, in which exposure to a risk factor is characterized by the average exposure of the group to which individuals belong. Another term is ecological studies, because people are classified by the general level of exposure in their environment, which may or may not correspond to their individual exposure. Examples are epidemiologic studies relating countries’ wine consumption to rates of cardiovascular mortality and studies of regional cancer or birth defect rates in relation to regional exposures such as chemical spills.
The main problem with studies that simply correlate average exposure with average disease rates in groups is the potential for an ecological fallacy, in which affected individuals in a generally exposed group may not have been the ones actually exposed to the risk factor. Also, exposure may not be the only characteristic that distinguishes people in the exposed group from those in the non-exposed group; that is, there may be confounding factors. Thus, aggregate risk studies like these are most useful in raising hypotheses, which should then be tested by more rigorous research.
Evidence from aggregate risk studies can be strengthened when observations are made over a period of time bracketing the exposure and even further strengthened if observations are in more than one place and calendar time.
In a time-series study, disease rates are measured at several points in time, both before and after the purported cause has been introduced. It is then possible to see whether a trend in disease rate over time changes in relation to the time of exposure. If changes in the purported cause are directly followed by changes in the purported effect, and not at some other time, the association is less likely to be spurious. An advantage of time-series analyses is that they can distinguish between changes already occurring over time (secular trends) and the effects of the intervention itself.

Example
Health care–associated infections with methicillin-resistant Staphylococcus aureus (MRSA) are a major cause of morbidity, mortality, and cost in acute care hospitals. The U.S. Veterans Affairs (VA) initiated a national program to prevent these infections in VA hospitals (8). The VA introduced a “MRSA bundle” comprising a set of interventions including hand hygiene, universal nasal surveillance for MRSA, contact precautions for patients colonized or infected with MRSA, and changes in institutional
Figure: health care–associated MRSA infections/1,000 patient-days (vertical axis, 0 to 1.8) by year (horizontal axis, 2005 to 2010).
Vertical axis: cervical cancer mortality rates/100,000; horizontal axis: year, 1955 to 1985.

Country    Targeted coverage for national population (%)
Denmark    40
Norway     5
Sweden     100
Finland    100
Iceland    100
Figure 12.9 ■ A multiple time-series study. Change in cervical cancer mortality rates according to the year organized Pap smear screening programs were implemented and targeted coverage. Arrows mark the year coverage was achieved for each country. (Redrawn with permission from Läärä E, Day NE, Hakama M. Trends in mortality from cervical cancer in the Nordic countries: association with organized screening programmes. Lancet 1987;1(8544):1247–1249.)
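The logic of a multiple time-series comparison can be sketched in a few lines: for each place, compare the average disease rate before and after the year the intervention began, then see whether the size of the change tracks the intensity of the intervention. The sketch below is illustrative only. The regions, years, and rates are hypothetical values invented for this example and are not the Nordic data in Figure 12.9.

```python
from statistics import mean


def change_after_intervention(rates_by_year, start_year):
    """Mean rate from the start year onward minus mean rate before it."""
    before = [r for y, r in rates_by_year.items() if y < start_year]
    after = [r for y, r in rates_by_year.items() if y >= start_year]
    if not before or not after:
        raise ValueError("need observations both before and after the start year")
    return mean(after) - mean(before)


# Hypothetical mortality rates per 100,000, with the year each program began.
series = {
    "Region A (full program)": ({1960: 10.0, 1965: 10.0, 1970: 8.0, 1975: 6.0}, 1970),
    "Region B (minimal program)": ({1960: 9.0, 1965: 9.0, 1970: 9.0, 1975: 8.0}, 1970),
}

for name, (rates, start) in series.items():
    print(f"{name}: change {change_after_intervention(rates, start):+.1f} per 100,000")
```

A larger fall where the program was fuller, timed to the start of the program, is the pattern that supports a causal interpretation. A similar fall everywhere would point to a secular trend instead.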
Curves shown: screening, treatment, and mortality; horizontal axis: year of death, 1975 to 2005.
Figure 12.10 ■ Causes for the decline in colorectal cancer deaths, 1975–2000. (Redrawn with permission from Edwards BK, Ward E, Kohler BA, et al. Annual report to the Nation on the status of cancer, 1975–2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer 2010;116:544–573.)
Figure 12.11 ■ Relative strength of evidence for and against a causal effect. Note that
with study designs, the strength of evidence for a causal relationship is a mirror image of that
against. With findings, evidence for a causal effect does not mirror evidence against an effect.
Figure 12.11 summarizes the different types of evidence for and against cause, depending on the research design, and results that strengthen or weaken the evidence for cause. The figure roughly indicates relative strengths in helping to establish or discard a causal hypothesis. Thus, a carefully done cohort study showing a strong association and a dose–response relationship that is reversible is strong evidence for cause, whereas a cross-sectional study finding no effect is weak evidence against cause.
Belief in a cause-and-effect relationship is a judgment based on both the scientific strength and results of all research bearing on the question. As a practical matter, at issue is whether the weight of the evidence is convincing enough for us to behave as if something were a cause, not whether it is established beyond all reasonable doubt.
Review Questions
Read the following statements and select the best response.

12.1. One of your patients read that cell phones cause brain cancer, and she wants to know your opinion. You discover that the incidence of malignant brain tumors is increasing in the United States. Results of several observational studies of cell phone use and brain cancer have not agreed with each other. A randomized controlled trial might resolve the question. What is the main reason why a randomized controlled trial (RCT) would be unlikely for this question?
A. It would cost too much.
B. People would not agree to be randomized to cell phone use.
12.10. You discover that case-control studies have been done to determine whether cell phone use is associated with the development of brain cancer. In one study, patients with brain cancer and matched controls without brain cancer were asked about cell phone use. The estimated relative risk for at least 100 hours of use compared to no use was 1.0 for all types of brain cancers combined (95% confidence interval 0.6–1.5). This finding is consistent with all of the following except:
A. Use of cell phones increases the incidence of brain cancers by 50%.
B. Use of cell phones protects against brain cancers.
C. Specific types of cancers might be associated with cell phone use.
D. The study has adequate statistical power to answer the research question.

Answers are in Appendix A.
REFERENCES
1. Holmes OW. On the contagiousness of puerperal fever. Med Classics 1936;1:207–268. [Originally published, 1843.]
2. Fouchier RA, Kuiken T, Schutten M, et al. Aetiology: Koch’s postulates fulfilled for SARS virus. Nature 2003;423:240.
3. MacMahon B, Pugh TF. Epidemiology: Principles and Methods. Boston: Little, Brown & Co.; 1970.
4. Burzynski J, Schluger NW. The epidemiology of tuberculosis in the United States. Sem Respir Crit Care Med 2008;29:492–498.
5. Bradford Hill A. The environment and disease: association or causation? Proc R Soc Med 1965;58:295–300.
6. Mykletun A, Glozier N, Wenzel HG, et al. Reverse causality in the association between whiplash and symptoms of anxiety and depression. The HUNT Study. Spine 2011;36:1380–1386.
7. Moertel CC, Fleming TR, Rubin J, et al. A clinical trial of amygdalin (Laetrile) in the treatment of human cancer. N Engl J Med 1982;306:201–206.
8. Jain R, Kralovic SM, Evans ME, et al. Veterans Affairs initiative to prevent methicillin-resistant Staphylococcus aureus infections. N Engl J Med 2011;364:1419–1430.
9. Läärä E, Day NE, Hakama M. Trends in mortality from cervical cancer in the Nordic countries: association with organized screening programmes. Lancet 1987;1(8544):1247–1249.
10. Edwards BK, Ward E, Kohler BA, et al. Annual report to the nation on the status of cancer, 1975–2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer 2010;116:544–573.
Chapter 13
Summarizing the Evidence
When the research community synthesizes existing evidence thoroughly, it is certain
that a substantial proportion of current notions about the effects of health care will be
changed. Forms of care currently believed to be ineffective will be shown to be effective;
forms of care thought to be useful will be exposed as either useless or harmful; and the
justification for uncertainty about the effects of many other forms of health care will be
made explicit.
—Ian Chalmers and Brian Haynes
1994
study design (e.g., randomized trial or cohort) to make PICOTS. Elements of targeted questions for other kinds of studies (e.g., studies of diagnostic test accuracy or observational studies of risk or prognostic factors) are less well defined but include many of the same features.

Finding All Relevant Studies
The first step in a systematic review is to find all the studies that bear on the question at hand. The review should include a complete sample of the best studies of the question, not just a biased sample of studies that happen to have come to attention. Clinicians who review topics less formally (for colleagues in rounds, morning report, and journal clubs) face a similar challenge and should use similar methods, although the process cannot be as exhaustive.
How can a reviewer be reasonably sure that he or she has found all the best studies, considering that the medical literature is vast and widely dispersed? No one method of searching is sufficient for this task, so multiple complementary approaches are used (Table 13.2).

Table 13.2
Approaches to Finding All the Studies Bearing on a Question
• Search online databases such as MEDLINE, EMBASE, and the Cochrane Database of Systematic Reviews.
• Read recent reviews and textbooks.
• Seek the advice of experts in the content area.
• Consider articles cited in the articles already found by other approaches.
• Review registries of clinical trials and funded research (to identify unpublished studies).

Most reviews start by searching online databases of published research, among them MEDLINE (the National Library of Medicine’s electronic database of published articles), EMBASE, and the Cochrane Database of Systematic Reviews. There are many others that can be identified with a librarian’s help. Some, such as MEDLINE, can be searched both for a content area (such as treatment of atrial fibrillation) and for a quality marker (e.g., randomized controlled trial). However, even in the best hands the sensitivity of MEDLINE searches (even for articles that are in MEDLINE) is far from perfect. Also, the contents of the various databases tend to complement each other. Therefore, database searching is useful but not sufficient.
Other ways of finding the right articles make up for what database searches might have missed. Recent reviews and textbooks (particularly electronic textbooks that are continually updated) are a source. Experts in the content area (e.g., rheumatic heart disease or Salmonella infection) may recommend studies that were not turned up by the other approaches. References cited in articles already found are another possibility. There are a growing number of registries of clinical trials and funded research that can be used to find unpublished results.
The goal of consulting all these sources is to avoid missing any important article, even at the expense of inefficiency. In diagnostic test terms, the reviewer uses multiple parallel tests to increase the sensitivity of the search, even at the expense of many false-positive results (i.e., unwanted or redundant citations), which need to be weeded out by examining the studies themselves.
In addition to exercising due diligence in finding articles, authors of systematic reviews explicitly describe the search strategy for their review, including search terms. This allows readers to see the extent to which the reviewer took into account all the studies that were available at the time.

Limit Reviews to Scientifically Strong, Clinically Relevant Studies
To be included in a systematic review, studies must meet a threshold for scientific strength. The assumption is that only the relatively strong studies should count. How is that threshold established? Various expert groups have proposed criteria for adequate scientific strength, and their advantages and limitations are discussed later in this chapter.
Usually only a small proportion of studies are selected from a vast number of potential articles on the topic. Many articles describe the biology of disease and are not ready for clinical application. Others communicate opinions or summaries of existing evidence, not original clinical research. Many studies are not scientifically strong, and the information they contain is eclipsed by stronger studies. Relatively few articles report evidence bearing directly on the clinical question and are both scientifically strong and clinically relevant. Table 13.3 shows how articles were selected for a systematic review of statin drugs for the prevention of infections; only 11 of 632 publications identified were included in the review.

Are Published Studies a Biased Sample of All Completed Research?
The articles cited in systematic reviews should include all scientifically strong studies of the question, regardless of whether they have been published. Publication bias is the tendency for published studies to be
Figure panel B: Possible publication bias. Horizontal axis: magnitude of the effect size (favor intervention to favor control); vertical axis: precision of the estimate of the effect.

Example
A systematic review summarized published reports of the effectiveness of the supplement chondroitin on pain intensity in patients with osteoarthritis of the hip or knee (2). The investigators found 20 published trials comparing chondroitin to placebo. Figure 13.2 shows the proportion of trials meeting several criteria for quality. (Ignore, for the moment, effect sizes, shown on the right.) A substantial proportion of trials did not meet basic standards for scientific quality such as concealment of allocation, intention-to-treat analysis, and sample size, so the evidence base for this question was not particularly strong.

Variable                      No. of trials   No. of patients randomly assigned
Concealment of allocation
  Adequate                    2               1,253
  Unclear                     18              2,593
Placebo control
  Yes                         17              3,091
  No                          3               755
Patient blinding
  Adequate                    12              1,952
  Unclear or no               8               1,894
Duration of follow-up
  >6 months                   11              2,430
  ≤6 months                   9               1,416
Effect size: plotted graphically in the original figure.
Example
Figure 13.2 shows the relationship between several markers of quality and effect size with confidence intervals for trials of chondroitin versus placebo or no intervention for relief of pain of osteoarthritis of the knee or hip (2). In general, effect sizes were larger in trials in which the quality measure was absent; this effect was statistically significant for concealment of allocation, intention-to-treat analysis, and larger studies. An analysis restricted to the three largest studies with intention-to-treat analyses found an effect size of –0.03 (95% confidence interval –0.13 to 0.07), where an effect size of –0.30 was considered minimally clinically relevant. That is, there was no clinically important effect.

Summarizing Results
The results of a systematic review are typically displayed as a forest plot showing the point estimate of effectiveness and confidence interval for each study in the review. Figure 13.3 illustrates a summary of studies comparing quinine to placebo for muscle cramps (6). The measure of effectiveness in this example is change in the number of leg cramps in a 2-week period. In other systematic reviews, it might be relative risk, attributable risk, or any other measure of effect. Point estimates are represented by boxes with their size proportional to the size of the study. A vertical line marks where neither quinine nor placebo was more effective.
The origin of the name “forest plot” is uncertain, but it is variously attributed to a researcher’s name or the appearance resembling a “forest of lines” (7). We believe they help readers “see the forest and the trees.”
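The elements of a forest plot described above (a point estimate, a confidence interval, and a line of no effect) can be sketched as a small text rendering. The study names and numbers below are hypothetical, made up for illustration rather than taken from Figure 13.3, and the layout is only one simple choice. Box size proportional to study size is left out to keep the sketch short.

```python
def includes_no_effect(low, high, null_value=0.0):
    """True if the confidence interval contains the value meaning 'no effect'."""
    return low <= null_value <= high


def forest_row(name, estimate, low, high, axis_min=-10.0, axis_max=10.0, width=41):
    """One text row: '-' spans the interval, '#' marks the estimate, '|' marks no effect."""
    def column(value):
        # Clip to the axis range, then map the value to a character position.
        clipped = min(max(value, axis_min), axis_max)
        return round((clipped - axis_min) / (axis_max - axis_min) * (width - 1))

    cells = [" "] * width
    for i in range(column(low), column(high) + 1):
        cells[i] = "-"
    cells[column(0.0)] = "|"
    cells[column(estimate)] = "#"
    return f"{name:<10}{''.join(cells)} {estimate:+.1f} ({low:+.1f} to {high:+.1f})"


# Hypothetical studies: (name, point estimate, lower 95% limit, upper 95% limit).
studies = [
    ("Study A", -3.0, -6.0, 0.5),
    ("Study B", -2.0, -3.5, -0.5),
    ("Study C", -4.5, -9.0, 0.0),
]

for name, est, low, high in studies:
    flag = "includes no effect" if includes_no_effect(low, high) else "excludes no effect"
    print(forest_row(name, est, low, high), flag)
```

Reading down the rows shows both the individual results and the overall pattern, which is the point of the plot.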
Figure 13.3 column headings: Study or subgroup; Cramp number; Weight; Cramp number (95% CI). Horizontal axis: −10 to 10.
Forest plots summarize a tremendous amount of information that would otherwise require a great deal of effort to find.

1. Number of studies. The rows show the number of studies meeting stringent criteria for quality, in this case, 13.
2. What studies and when. The first column identifies names and year of publication for the component studies so that readers can see how old the studies are and where they can be found. (Full references to studies are not shown in the figure but are included in the article.)
3. Pattern of effect sizes. The 13 point estimates, taken as a whole, show what the various studies reported for effect sizes. In the example, all of the 13 studies favored quinine, but the size of the effects varies.
4. Precision of estimates. Many studies (6 of 13) were “negative” (their confidence intervals included no effect). This would give the impression, in a simple accounting of the number of “positive” and “negative” studies, that treatment is not effective or at least that effectiveness is questionable. The forest plot gives a different impression: All point estimates favor quinine, and the negative studies tend to be imprecise yet consistent with effectiveness.
5. The effects for the big studies. The large, statistically precise studies (seen by both narrow confidence intervals and large boxes representing point estimates) deserve more weight than small ones. In the example, the confidence intervals for the two largest studies do not include “no change in muscle cramp rate” (although one touches it).

In these ways, a single picture conveys in a glance a lot of basic information about the very best studies of a question.

COMBINING STUDIES IN META-ANALYSES
Meta-analysis is the practice of combining (“pooling”) the results of individual studies, if they are similar enough to justify a quantitative summary effect size. When appropriate, meta-analyses provide more precise estimates of effect sizes than are available in any of the individual studies.

Are the Studies Similar Enough to Justify Combining?
It makes no sense to pool the results of very different studies, that is, studies of altogether different kinds of patients, interventions, doses, follow-ups, and outcomes. Treating “apples and oranges” as if they are all just fruits disregards useful information.
Investigators use two general approaches to decide whether it is appropriate to pool study results. One is to make an informed judgment about whether the research questions addressed by the trials are similar enough to constitute studies of the same question (or a set of reasonably similar questions).

Example
Do antioxidant supplements prevent gastrointestinal cancers? Investigators sought the answer to this question in a systematic review (8). The antioxidants were vitamins A, C, E, and B6, beta-carotene (which has vitamin A activity), and the minerals zinc and selenium, singly and in combination at various doses. Cancers were of the esophagus, stomach, colon, rectum, pancreas, and liver. The investigators calculated a pooled estimate of relative risk for the 14 randomized trials they found. The point estimate was close to 1 (no effect) and the confidence interval was narrow, indicating no overall effect. But should they have pooled these studies? On the face of it, the study questions were very different from each other. Although the various vitamins and minerals do have in common that they are antioxidants (as are countless other compounds in the diet), they have altogether different chemical structures, roles in metabolism, and deficiency syndromes. Similarly, the gastrointestinal cancers have very different genetics, risk factors, and pathogenesis. For example, stomach cancer follows infection with the bacterium Helicobacter pylori, whereas liver cancer is often a complication of viral hepatitis. It seems improbable that these supplements and these cancers have enough in common to justify treating them as simply “antioxidant supplements” and “gastrointestinal cancers” and finding a meaningful result. That the investigators found no effect may be because of the differences among the studies and does not rule out effects of specific supplements on specific cancers.

Reasonable people might disagree on how similar studies must be to justify combining them. We have been critical of pooling in this study, but other capable
Panel A (fixed effect model): results of multiple clinical trials randomly distributed around the true treatment effect. Horizontal axis: treatment effects.
Panel B (random effects model): pooled result, single estimated treatment effect; multiple true treatment effects (distribution of treatment effects); results of multiple clinical trials randomly distributed around the true treatment effect. Horizontal axis: treatment effects.
Figure 13.4 ■ Models for combining studies in a meta-analysis. A. Fixed effect model. B. Random effects model. (Redrawn with permission from UpToDate, Waltham, MA.)
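The two models in Figure 13.4 can be compared numerically. The text does not name an estimation method, so the sketch below makes its own assumptions: inverse-variance weighting for the fixed effect model and the DerSimonian–Laird estimate of between-study variance for the random effects model, both common choices. The effect sizes and standard errors are hypothetical, and the function names are invented for this illustration.

```python
import math


def fixed_effect(effects, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def random_effects(effects, std_errors):
    """DerSimonian-Laird pooled estimate, its standard error, and tau^2 (between-study variance)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled_fixed, _ = fixed_effect(effects, std_errors)
    # Cochran's Q measures how far the trials scatter around the fixed effect estimate.
    q = sum(w * (e - pooled_fixed) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2


def ci95(estimate, se):
    """Approximate 95% confidence interval using the normal distribution."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical effect sizes and standard errors for four trials.
effects = [-0.60, -0.10, -0.45, 0.05]
std_errors = [0.15, 0.10, 0.20, 0.12]

fe, fe_se = fixed_effect(effects, std_errors)
re, re_se, tau2 = random_effects(effects, std_errors)
print("Fixed effect:   %.2f, 95%% CI %.2f to %.2f" % ((fe,) + ci95(fe, fe_se)))
print("Random effects: %.2f, 95%% CI %.2f to %.2f (tau^2 = %.3f)" % ((re,) + ci95(re, re_se) + (tau2,)))
```

With heterogeneous inputs such as these, the random effects interval comes out wider than the fixed effect interval. When the trials agree closely, the estimated between-study variance is zero and the two models give the same result.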
the width of the summary confidence intervals calculated by the fixed effect model tends to imply a greater degree of precision than is actually the case. Also, by combining dissimilar studies, one loses useful information that might have resulted from contrasting them. The fixed effect model is used when the studies in a systematic review are clearly quite similar to each other.
The random effects model (Fig. 13.4B) assumes that the studies address somewhat different questions but that they form a closely related family of studies of a similar question. The studies are considered a random sample of all studies bearing on that question. Even if clinical judgment and statistical tests both suggest heterogeneity, it may still be reasonable to combine studies using a random effects model, as long as the studies are similar enough to one another (which is obviously a value judgment). Random effects models produce wider confidence intervals than fixed effect models and for this reason are thought to be more realistic. However, it is uncertain how the family of similar studies is defined and whether the studies are really a random sample of all such studies of a question. Nevertheless, because random effects models at least take heterogeneity into account and are, therefore, less likely to overestimate precision, they are the model used when (as is often the case) heterogeneity is present.
When an overall effect size is calculated, it is usually displayed at the bottom of the forest plot of
component studies as a diamond representing the summary point estimate and confidence interval (see Fig. 13.3). The summary effect is a more precise and formalized presentation of what might have been concluded from the pattern of results available in the forest plot.

Identifying Reasons for Heterogeneity
Random effects models are a way of taking heterogeneity into account when calculating a summary effect size, but a separate need is to identify characteristics of patients or treatments that are responsible for the variation in effects.
The most straightforward way to identify reasons for heterogeneity is to do subgroup analyses. This is possible in patient-level meta-analyses, as described in the example about nitric oxide treatment of preterm infants. However, if trials, not patients, are pooled, one must rely on less direct methods.
Another approach to understanding the reasons for heterogeneity is to do a sensitivity analysis, as discussed in Chapter 7. Summary effects are examined with and without trials that seem, either for clinical or statistical reasons, to be different from the others. For example, investigators might look at summary effects (or statistical tests for heterogeneity) after removing relatively weak trials or those in which the dose of drug was relatively small to see if study strength or drug dose accounts for differences in results across studies.
A modeling approach called meta-regression, similar to multivariable analysis (discussed in Chapter 11), can be used to explore reasons for heterogeneity when trials, not patients, are pooled. The independent variables are those reported in aggregate in each individual trial (e.g., the average age or proportion of men and women in those trials) and the outcomes are the reported treatment effect for each of those trials. The number of observations is the number of trials in the meta-analysis. This approach is limited by the availability of data on the covariates of interest in the individual trials, the compatibility of the data across trials, and the stability of models based on just a few observations (the number of trials in the meta-analysis). Another limitation, as with any aggregate-risk study, is the possibility of an “ecological fallacy,” as discussed in Chapter 12.
The various studies in a systematic review sometimes address the effectiveness of a set of interrelated interventions, not just a single comparison between an intervention and control group. For example, there are several techniques for bariatric surgery such as jejunoileal bypass, sleeve gastrectomy, adjustable gastric banding, among others. Randomized trials have compared various combinations of these to each other and to usual care, but no study has compared each technique to all the others. Network meta-analysis is a mathematical way of estimating the comparative effectiveness of interventions that are not directly compared in actual studies but can be indirectly compared by use of modeling. Using a network meta-analysis, investigators were able to estimate the respective effects of each bariatric surgery method compared to usual care, showing that each was effective and identifying the hierarchy of effectiveness (10).

CUMULATIVE META-ANALYSES
Usually the studies in a forest plot are represented separately in alphabetical order by first author or in chronological order. Another way to look at the same information is to present a cumulative meta-analysis. Component studies are put in chronological order, from oldest to most recent, and a new summary effect size and confidence interval are calculated each time the results of a new study became available. In this way, the figure represents a running summary of all the studies up to the time of each new trial. This is a Bayesian approach, as described in Chapter 11, where each new trial modifies prior belief in comparative effectiveness, established by the trials that went before.
The following example illustrates the kind of insights a cumulative meta-analysis can provide and also shows how meta-analyses in general and cumulative meta-analyses in particular are useful for establishing harmful effects. Individual trials, which are powered to detect effectiveness, are usually underpowered to detect harms because harms occur at a substantially lower rate. Pooling data may accumulate enough events to detect harmful effects.

Example
Rofecoxib is a non-steroidal anti-inflammatory drug (NSAID) that is less likely to cause gastrointestinal complications than older NSAIDs. Studies of patients taking this drug reported an increased rate of cardiovascular events, but this was not attributed to the drug because confidence intervals were wide, including no effect, and also because the risk in the largest study was explained as a protective effect
of the comparator, naproxen. Investigators did a meta-analysis of 16 randomized trials of rofecoxib versus control (a placebo or another NSAID) (11). All but one of the studies were small, with only 1 to 6 cardiovascular events per trial, but there were 24 cardiovascular events in the one large trial. The summary relative risk of rofecoxib for myocardial infarction, including data from the large trial, was 2.24 (95% confidence interval 1.24–4.02). When was risk first apparent? A cumulative meta-analysis (Fig. 13.5) shows that a statistically significant effect was apparent in 2000, mainly because of the information contributed by the one large study. Subsequent studies tended to consolidate this finding and increase statistical precision. Rofecoxib was taken off the market in 2004, 4 years after a cumulative meta-analysis would have shown cardiovascular risk was present at conventional levels of statistical significance.

Cumulative meta-analyses have been used to show when the research community could have known about effectiveness or harm if it had available a meta-analysis of the evidence, but now it
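[Figure 13.5. Cumulative meta-analysis of randomized trials of rofecoxib (11). Horizontal axis: logarithmic scale marked at 0.1, 1, and 10. Values extracted from the figure, one row per cumulative step (column labels not recoverable): 1,399 / 5 / 0.828; 2,208 / 6 / 0.996; 2,983 / 8 / 0.649; 3,324 / 9 / 0.866; 5,059 / 13 / 0.881; 13,269 / 40 / 0.070; 14,247 / 44 / 0.034; 15,156 / 46 / 0.025; 20,742 / 52 / 0.010; 20,742 / 63 / 0.007; 21,432 / 64 / 0.007.]

[Residue of a second figure listing individual studies (column labels not recoverable): Albeck 1996: 51, 15, 10, 4; Charnley 1951: 63, 8, 11, 6; Kerr 1988: 98, 20, 2, 16; Kosteljanetz 1984: 44, 23, 12, 21; Kosteljanetz 1988: 40, 6, 5, 1.]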
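
The logic of a cumulative meta-analysis can be sketched in a few lines of Python. This is an illustration only: the trial counts are hypothetical (they are not the rofecoxib data), the pooling uses a simple fixed-effect, inverse-variance method on the log relative risk, and all function and variable names are assumptions made for this example. Published cumulative meta-analyses may use other pooling methods.

```python
import math

# Hypothetical trials in chronological order:
# (year, events_treated, n_treated, events_control, n_control)
TRIALS = [
    (1997, 3, 400, 2, 400),
    (1998, 4, 500, 2, 500),
    (1999, 5, 600, 3, 600),
    (2000, 20, 4000, 8, 4000),
    (2001, 6, 700, 3, 700),
]

def log_rr_and_variance(a, n1, c, n2):
    """Log relative risk and its approximate variance for one trial."""
    log_rr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, variance

def cumulative_meta_analysis(trials):
    """Re-pool all trials available up to and including each new trial."""
    results = []
    sum_w = 0.0   # running sum of inverse-variance weights
    sum_wy = 0.0  # running sum of weight * log relative risk
    for year, a, n1, c, n2 in trials:
        y, v = log_rr_and_variance(a, n1, c, n2)
        w = 1 / v
        sum_w += w
        sum_wy += w * y
        pooled = sum_wy / sum_w
        se = math.sqrt(1 / sum_w)
        low, high = pooled - 1.96 * se, pooled + 1.96 * se
        results.append((year, math.exp(pooled), math.exp(low), math.exp(high)))
    return results

for year, rr, low, high in cumulative_meta_analysis(TRIALS):
    flag = "excludes 1" if low > 1 or high < 1 else "includes 1"
    print(f"{year}: RR {rr:.2f} (95% CI {low:.2f} to {high:.2f}), {flag}")
```

Each printed row is the summary estimate that would have been available at that point in time. Because every new trial adds weight, the confidence interval narrows as the evidence accumulates.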
misleading estimate of effects and directing attention away from why differences in effects exist. Meta-analyses do not include information based on the biology of disease, clinical experience, and the practical application of best evidence to patient care. These are other dimensions of care that may have as large an influence on patient-centered outcomes as does the choice of drugs or procedures.

Which comes closer to the truth, the best individual research or meta-analyses? It is a false choice. Meta-analyses cannot be better than the scientific strength of the individual studies that they summarize. Sometimes most of the weight in a meta-analysis is vested in a single trial, so there is relatively little difference between the information in that trial and in a summary of all studies of the same question. For the most part, the two (large strong studies and meta-analyses of all related studies) complement each other rather than compete. When they disagree, the main issue is why they disagree, which should be sought by examination of the studies themselves, not the methods in general.
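
To see how a single large trial can carry most of the weight in a summary estimate, the following sketch computes each trial's share of the total inverse-variance weight. The standard errors are hypothetical, and fixed-effect (inverse-variance) weighting is an assumption chosen for simplicity; the trial names are placeholders.

```python
# Hypothetical standard errors of the log relative risk for five trials.
# The last trial is much larger, so its standard error is much smaller.
STANDARD_ERRORS = {
    "Trial A": 0.90,
    "Trial B": 0.80,
    "Trial C": 0.85,
    "Trial D": 0.95,
    "Large trial": 0.20,
}

def weight_shares(standard_errors):
    """Return each trial's fraction of the total inverse-variance weight."""
    weights = {name: 1 / se ** 2 for name, se in standard_errors.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

shares = weight_shares(STANDARD_ERRORS)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total weight")
```

With these made-up inputs the large trial contributes more than four-fifths of the weight, so the pooled estimate would differ little from that trial's own result.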
Review Questions
Read the following and select the best response.

13.1. A systematic review of observational studies of antioxidant vitamins to prevent cardiovascular disease combined the results of 12 studies to obtain a summary effect size and confidence interval. Which of the following would be the strongest rationale for combining study results?
A. You wish to obtain a more generalizable conclusion.
B. A statistical test shows that the studies are heterogeneous.
C. Most of the component studies are statistically significant.
D. The component studies have different, and to some extent complementary, biases.
E. You can obtain a more precise estimate of effect size.

13.2. You are asked to critique a review of the literature on whether alcohol is a risk factor for breast cancer. The reviewer has searched MEDLINE and found several observational studies of this question but has not searched elsewhere. All of the following are limitations of this search strategy except:
A. Studies with negative results tend to not be published.
B. MEDLINE searches typically miss some articles, even those included in the database.
C. MEDLINE does not include all of the world’s journals.
D. MEDLINE can be simultaneously searched for both content area and methods.

13.3. A systematic review of antiplatelet drugs for cardiovascular disease prevention combined individual patients, not trials. Which of the following is an advantage of this approach?
A. It is more efficient for the investigator.
B. Subgroup analyses are possible.
C. It is not necessary to choose between fixed and random effects models when combining data.
D. Publication bias is less likely.

13.4. Which of the following is not generally used to define a specific clinical question studied by randomized controlled trials?
A. Covariates that were taken into account
B. Interventions (e.g., exposure or experimental treatment)
C. Comparison group (e.g., patients taking placebo in a randomized trial)
D. Outcomes
E. Patients in the trials

13.5. Which of the following best describes ways of measuring study quality?
A. The validity of summary measures of study quality is well established.
B. A description of study quality is a useful part of systematic reviews.
C. In a summary measure of quality, strengths of the study can make up for weaknesses.
D. Individual measures of quality such as randomization and blinding are not associated with study results.

13.6. Which of the following kinds of studies is least likely to be published?
A. Small positive studies
B. Large negative studies
C. Large positive studies
D. Small negative studies

13.7. Which of the following is a comparative advantage of traditional (“narrative”) reviews over systematic reviews?
A. Readers can confirm that evidence cited is selected without bias.
B. Narrative reviews can review a broad range of questions bearing on the care of a condition.
C. They rely on the experience and judgment of an expert in the field.
D. They provide a quantitative summary of effects.
E. The scientific strength of the studies cited is explicitly evaluated.

13.8. Which of the following is not always part of a typical forest plot?
A. The number of studies that meet high standards for quality
B. A summary or pooled effect size with confidence interval
C. Point estimates of effect size for each study
D. Confidence intervals for each study
E. The size or weight contributed by each study

13.9. Which of the following cannot be used for identifying severity of illness as a reason for heterogeneity in a systematic review with meta-analysis?
A. Controlling for severity of illness in a study-level meta-analysis
B. A mathematical model relating average severity of illness to outcome across the component studies
C. A comparison of summary effect sizes in studies stratified by mean severity of illness
D. Subgroup analyses of patient-level data

13.10. Which of the following is the best justification for combining the results of five studies into a single summary effect?
A. The patients, interventions, and outcomes are relatively similar.
B. All studies are of high quality.
C. A statistical test does not detect heterogeneity.
D. Publication bias has been ruled out by a funnel plot.
E. The random effects model will be used to calculate the summary effect size.

13.11. Which of the following is an advantage of the random effects model over the fixed effect model?
A. It can be used for studies with time-to-event analyses.
B. It describes the summary effect size for a single, narrowly defined question.
C. It is better suited for meta-analyses of diagnostic test performance.
D. It gives more realistic results when there is heterogeneity among studies.
E. Confidence intervals tend to be narrower.

REFERENCES
1. Sackett DL, Richardson WS, Rosenberg W, et al. Evidence-based medicine. How to practice and teach EBM. New York: Churchill Livingstone; 1997.
2. Reichenbach S, Sterchi R, Scherer M, et al. Meta-analysis: chondroitin for osteoarthritis of the knee and hip. Ann Intern Med 2007;146:580–590.
3. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 1996;17:1–12.
4. Balk EM, Bonis PAL, Moskowitz H, et al. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 2002;287:2973–2982.
5. Verdon F, Burnand B, Stubi CL, et al. Iron supplementation for unexplained fatigue in nonanemic women: double-blind, randomized placebo controlled trial. BMJ 2003;326:1124–1228.
6. El-Tawil S, Al Musa T, Valli H, et al. Quinine for muscle cramps. Cochrane Database Syst Rev 2010;(12):CD005044.
7. Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001;322:1479–1480.
8. Bjelakovic G, Nikolova D, Simonetti RG, et al. Antioxidant supplements for prevention of gastrointestinal cancers: a systematic review and meta-analysis. Lancet 2004;364:1219–1228.
9. Askie LM, Ballard RA, Cutter GR, et al. Inhaled nitric oxide in preterm infants: an individual-level data meta-analysis of randomized trials. Pediatrics 2011;128:729–739.
10. Padwal R, Klarenbach S, Wiebe N, et al. Bariatric surgery: a systematic review and network meta-analysis of randomized trials. Obes Rev 2011;12:602–621.
11. Juni P, Nartey L, Reichenbach S, et al. Risk of cardiovascular events and rofecoxib: cumulative meta-analysis. Lancet 2004;364:2021–2029.
12. Douketis J, Tosetto A, Marcucci M, et al. Risk of recurrence after venous thromboembolism in men and women: patient level meta-analysis. BMJ 2011;342:d813.
13. van der Windt DA, Simons E, Riphagen II, et al. Physical examination for lumbar radiculopathy due to disc herniation in patients with low-back pain. Cochrane Database Syst Rev 2010;(2):CD007431.

Chapter 14
Knowledge Management
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
—T.S. Eliot
1934
Table 14.1
Grading Recommendations for Treatment According to the Quality of Evidence
(Confidence in Estimate of Effect, A–C) and Strength of Recommendation (1–2) with
Implications. Based on GRADE Guidelines
Example
A patient sees you because he will be traveling to Ghana and wants advice on malaria prophylaxis. You are aware that the malaria parasite’s

Solutions
Clinical Colleagues
A network of colleagues with various and complementary expertise is a time-honored way of getting point-of-care information. Many clinicians have identified
local opinion leaders for this purpose. Of course, those opinion leaders must have their own sources of information, presumably more than just other colleagues.

Electronic Textbooks
Textbooks, even libraries, are on the Internet and made available to clinicians by their medical schools, health systems, and professional societies. For example, UpToDate (http://www.uptodate.com) is an electronic information resource for clinicians, the product of thousands of physician–authors and editors covering 9,000 topics in the equivalent of 90,000 printed pages (if it were ever printed).† Information is continually updated, peer reviewed, searchable, and linked to abstracts of the original research, and recommendations are graded. UpToDate is available at the point of care throughout the world wherever the Internet can be accessed by computers or mobile platforms.

† Robert and Suzanne Fletcher are among the hundreds of editors of UpToDate.

Example
One author was seeing patients in Boston during the anthrax scare in 2001. Around that time, biologic terrorists spread anthrax spores through the U.S. postal system, resulting in dozens of cases and five deaths. A young woman came to urgent care because she was worried that a recent skin lesion was caused by anthrax. When told it was not, she responded (somewhat impolitely), “How do you know, you’ve never seen it?” To which her doctor responded, “Of course not, none of us has, but we know what it looks like. Come with me and I’ll show you.” Using UpToDate on an office computer, he was able to show the patient several pictures of anthrax skin lesions, which looked quite different from hers, and she gained confidence that she was not, in fact, the next anthrax victim.

Other textbooks, such as ACP Medicine, Harrison’s Online, and many subspecialty textbooks, are also available in electronic form.

Clinical Practice Guidelines
Clinical practice guidelines are advice to clinicians about the care of patients with specific conditions. In addition to giving recommendations, good guidelines also make explicit the evidence base and rationale for those recommendations. Like evidence-based medicine, guidelines are meant to be a starting place for decision making about individual patients, to be modified by clinical judgment; that is, they are guidelines, not rules. High-quality guidelines represent the wise application of research evidence to the realities of clinical care, but guidelines vary in quality. Table 14.3

Table 14.3
Standards for Trustworthy Clinical Practice Guidelines

Standard: Explanation
Transparency: How the guideline was developed and funded has been made explicit and is publicly accessible.
Conflict of Interest: Group members’ conflicts of interest related to financial, intellectual, institutional, and patient/public activities bearing on the guideline are disclosed.
Group Composition: Group membership was multidisciplinary and balanced, comprising a variety of methodological experts and clinicians, and populations expected to be affected by the guideline.
Systematic Review: Recommendations are based on systematic reviews that met high standards for quality.
Evidence and Strength of Recommendation: Each recommendation is accompanied by an explanation for its underlying reasoning, the level of confidence in the evidence, and the strength of the recommendation.
Description of Recommendations: The guideline states precisely what the recommended action is and under what circumstances it should be performed.
External Review: The guideline has been reviewed by the full spectrum of relevant stakeholders (e.g., scientific and clinical experts, organizations, and patients).
Updating: The guideline reports the date of publication and evidence review and plans for updating when there is new evidence that would substantially change the guideline.

Modified from Institute of Medicine. Clinical Practice Guidelines We Can Trust. Washington, DC: National Academies Press; 2011. The standards were for developing guidelines and have been modified to guide users in recognizing guidelines they can trust.
[Figure 14.1 plot. Vertical axis: percent of articles (0 to 100). Horizontal axis: number of journals (5 to 40). Plotted values, in ascending order: 17.3, 28.8, 38.9, 48.3, 55.5, 62.0, 66.3, 69.9, 72.8, 79.3, 86.5, 100.0.]

Figure 14.1 ■ How many journals would you have to read to keep up with the literature in your field? The proportion of scientifically strong, clinically relevant articles in internal medicine according to the number of journals, in descending order of yield. (Data from ACP Journal Club, 2011).

review by editors guided by peer review, comments by experts in the article’s content area and methods who provide advice on whether to publish and how a manuscript (the term for the article before it is published) could be improved. The reviewers are advisors to the editor (or editorial team), not the ones who directly decide the fate of the manuscript. Peer-review and editing practices, along with the evidence base and rationale for them, are summarized on the official Web sites of the World Association of Medical Editors (http://www.wame.org) and the International Committee of Medical Journal Editors (http://www.icmje.org). Peer review and editing improve manuscripts, but published articles are far from perfect (5); therefore, readers should be grateful for the journals’ efforts to make articles better but also maintain a healthy skepticism about the quality of the end result.

Working groups have defined the information that should be in a complete research article according to the type of study (randomized controlled trial, diagnostic test evaluation, systematic review, etc.) (Table 14.4). Readers can use these checklists to see if all the necessary information is included in an article, just as investigators use them to ensure that their articles are complete.

Journals themselves are not particularly helpful for some elements of knowledge management. Reading individual journals is not a reliable way of keeping up with new scientific developments in a field or for looking up the answers to clinical questions.

But journals do add another dimension: exposing readers to the full breadth of their profession. Opinions, stories, untested hypotheses, commentary on published articles, expressions of professional values, as well as descriptions of the historical, social, and political context of current-day medicine and much more, reflect the full nature of the profession (Table 14.5). The richness of this information completes the clinical picture for many readers. For example, when Annals of Internal Medicine began publishing stories about being a doctor (6), many readers remarked that while reports

Table 14.4
Guidelines for Reporting Research Studies
Figure 14.2 ■ Reading a journal article in layers. Individual readers can progress deeper into an article or stop and go on to another, according to its scientific strength and clinical importance to them.
disease, the clinical presentations of illness, the difference between isolated observations and consistent patterns of evidence, and much more. All of this is a valuable complement to what patients bring to the encounter: intense interest in a specific clinical question and the willingness to spend lots of time searching for answers.

PUTTING KNOWLEDGE MANAGEMENT INTO PRACTICE

Clinical epidemiology, as described in this book, is intended to make clinicians’ professional lives easier and more satisfying. Armed with a sound grounding in the principles by which the validity and generalizability of clinical information are judged, clinicians can more quickly and accurately detect whether the scientific basis for assertions is sound. For example, they can see when confidence intervals are consistent with clinically important benefit or harm or that a study of the effects of an intervention includes neither randomization nor other efforts to deal with confounding. They are better prepared to participate in discussions with colleagues outside their specialty about patient care decisions. They have a better basis for deciding how to delegate some aspects of their information needs. They can gain more confidence and experience greater satisfaction with the intellectual aspects of their work.

Beyond that, every clinician should have a plan for knowledge management, one that fits his or her particular needs and resources. The Internet must be an important part of the plan because no other medium is so comprehensive, up-to-date, and flexible. Much of the information needed to guide patient care decisions should be available at the point of care so that it can be brought to bear on the patient at hand. There is no reason why the information you use cannot be the best available in the world at the time, as long as you have access to the Internet.

A workable approach to knowledge management must be active. Clinicians should set aside time periodically to revisit their plan, to learn about new opportunities as they arise, and to acquire new skills as they are needed. There has never been a time when the evidence base for clinical medicine was so strong and accessible. Why not make the most of it?

Review Questions
Read the following statements and select the best answer.

14.1. You are finishing residency and will begin practice. You want to establish a plan for keeping up with new developments in your field even though there are few professional colleagues in your community. All of the following might be useful, but which will be most useful to you?
A. Subscribe to a few good journals.
B. Buy new editions of printed textbooks.
C. Subscribe to a service that reviews the literature in your field.
D. Search MEDLINE at regular intervals.
E. Keep up contacts with colleagues in your training program by e-mail and telephone.

14.2. You can rely on the best general medical journals in your field to:
A. Provide answers to clinical questions.
B. Assure that you have kept up with the medical literature.
C. Guarantee that the information they contain is beyond reproach.
D. Expose you to the many dimensions of your profession.

14.3. Many children in your practice have attacks of otitis media. You want to base your management on the best available evidence. Which of the following is the least credible source of information on this question?
A. A clinical practice guideline by a major medical society
B. A systematic review published in a major journal
C. The Cochrane Database of Systematic Reviews
D. The most recent research article on this question

14.4. A search of MEDLINE is especially useful for which of the following?
A. Finding all of the best articles bearing on a clinical question
B. An efficient strategy for finding the good articles
C. Looking for reports of rare events
D. Keeping up with the medical literature
E. Being familiar with the medical profession as a whole

14.5. Which of the following is accomplished by peer review of research manuscripts before they are published?
A. Exclude articles by authors with a conflict of interest.
B. Make the published article accurate and trustworthy.
C. Relieve readers of the need to be skeptical about the study.
D. Decide for the editors about whether they should publish the manuscript.

14.6. An author of an article showing that screening colonoscopy is more effective in preventing colorectal cancer than other forms of screening would have conflicts of interest if he or she had any of the following except:
A. Clinical income from performing colonoscopies
B. Investment in a company that makes colonoscopes
C. Investment in medical products in general
D. Publications of articles that have consistently advocated colonoscopy as the best screening test
E. Rivalry with other scholars who advocate another screening test

14.7. Which of the following is the least useful way of looking up answers to clinical questions at the point of care?
A. Subscribing to several journals and keeping them available where you see patients
B. Guidelines on http://www.guidelines.gov
C. The Cochrane Library on the Internet
D. A continually updated electronic textbook

14.8. Which of the following should be least reassuring to a patient about the quality of a Web site providing information about HIV?
A. The site is sponsored by a governmental agency and names its advisory board members.
B. The site provides facts, not opinions.
C. The primary source of information is stated.
D. The author is a well-known expert in the field.
E. The date of the last revision is posted and recent.

14.9. Which of the following is not part of grading clinical recommendations using the GRADE system?
A. Deciding whether to use a diagnostic test
B. Takes into account the balance of benefits and harms
C. Rates the quality of scientific evidence separately
D. Suggests how commonly and how forcefully a treatment should be recommended
E. Rates the strength of the evidence and of recommendations separately

14.10. A comprehensive approach to managing knowledge in your field would include which of the following?
A. Subscribing to some journals and browsing them
B. Establishing a plan for looking up information at the point of care
C. Finding a publication that helps you keep up with new developments in your field
D. Identifying Web sites you can recommend to your patients
E. All of the above

Answers are in Appendix A.

REFERENCES
1. Laine C, Horton R, DeAngelis CD, et al. Clinical trial registration: looking back and moving ahead. Lancet 2007;369:1909–1911.
2. Fletcher RH, Black B. “Spin” in scientific writing: scientific mischief and legal jeopardy. Med Law 2007;26(3):511–525.
3. Shariff SZ, Sontrop JM, Haynes RB, et al. Impact of PubMed search filters on the retrieval of evidence for physicians. CMAJ 2012;184:303.
4. Losanoff JE, Sauter ER, Rider KD. Cat scratch disease presenting with abdominal pain and retroperitoneal lymphadenopathy. J Clin Gastroenterol 2004;38:300–301.
5. Goodman SN, Berlin J, Fletcher SW, et al. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med 1994;121:11–21.
6. Lacombe MA, ed. On Being a Doctor. Philadelphia: American College of Physicians; 1995.

Appendix A
1.1 D. Samples can give a misleading impression of the situation in the parent population, especially if the sample is small.

1.2 E. Generalizing the results of a study of men to the care of a woman assumes that the effectiveness of surgery for low back pain is the same for men and women.

1.3 B. The two treatment groups did not have an equal chance of having pain measured.

1.4 A. The difference in recovery between patients who received surgery versus medical care may be the result of some other factor, such as age, that is different between the two treated groups and not the result of surgery itself.

1.5 B. These are biases related to measurement of the outcome (recovery from pain).

1.6 C. The other medical conditions confound the relationship between treatment and outcome; that is, they are related to both treatment and recovery and might be the reason for the observed differences.

1.7 C. The observation that histamines mediate inflammation in hay fever leads to a promising hypothesis that blocking histamines will relieve symptoms, but the hypothesis needs to be tested in people with hay fever. The other answers all assume more about the causes of symptoms than is actually stated. For example, histamine is only one of many mediators of inflammation in hay fever.

1.8 C. Samples may misrepresent populations by chance, especially when the samples are small.

1.9 A. Generalizing from younger to older patients is a matter of personal judgment based on whatever facts there are that bear on whether older patients respond to treatment the same as younger patients. Internal validity is about whether the results are correct for the patients in the study, not about whether they are correct for other kinds of patients. Both bias and chance affect internal validity.

1.10 B. If volunteers had the same amount of exercise as the non-volunteers, a difference in the rate of CHD could not be explained by this variable. Differences in the groups (selection bias) or in the methods used to determine CHD (measurement bias) could account for the finding.

1.11 C. The drug’s effect on mortality is the most important thing to determine. It is possible that some arrhythmia-suppressing drugs can increase the rate of sudden death (in fact, this has happened). In such situations, the intermediate biologic outcome of decreased arrhythmias is an unreliable marker of the clinical outcome, sudden death.

1.12 B. Measurement bias is frequently an issue when patients are asked to recall something that may be related to the illness, because those with illness may have heightened recall for preceding events that they think might be related to their illness.

1.13 B. The questioning (measurement) about contraceptive use was not the same in the two groups.

1.14 D. Small study numbers increase the possibility that chance accounts for differences between groups.

1.15 A. Different neighborhoods often are surrogates for socioeconomic variables that are related to numerous health outcomes, which may mean the two groups are different in terms of important covariates.
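
Several of the answers above (1.1, 1.8, and 1.14) rest on the idea that small samples can misrepresent a population by chance. One rough way to see this is the standard error of a sample proportion, which shrinks with the square root of the sample size. The sketch below assumes simple random sampling and a hypothetical true proportion of 0.30; the function name is an assumption made for this example.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

TRUE_PROPORTION = 0.30  # hypothetical value for illustration

for n in (10, 100, 1000, 10000):
    se = standard_error(TRUE_PROPORTION, n)
    # Approximate range that covers about 95% of sample proportions.
    low = max(0.0, TRUE_PROPORTION - 1.96 * se)
    high = min(1.0, TRUE_PROPORTION + 1.96 * se)
    print(f"n={n:>5}: standard error {se:.3f}, typical range {low:.2f} to {high:.2f}")
```

A hundredfold increase in sample size narrows the typical range about tenfold, which is why estimates from small samples are far more likely to be misleading.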
238 Appendix A: Answers to Review Questions
CHAPTER 2 FREQUENCY
2.1 C. The population at risk is dynamic: people are entering and leaving it continually, so the rates from cancer registries should be reported as person-years.

2.2 A. This is point prevalence because cases are described for a point in time in the course of their life.

2.3 A. There is no follow-up (no time dimension) in a prevalence study.

2.4 D. A random or probability sample is representative of the population sampled in the long run, if enough people in the population are sampled.

2.5 B. In a steady state, duration of disease = prevalence/incidence. In this case, it is 1/100 divided by 40/100,000/year = 25 years.

2.6 D. A study beginning with existing patients with a disease and looking back at their previous experience with the disease would not be a cohort study, which begins with patients with something in common (e.g., new onset of disease) and follows them forward in time for subsequent health events.

2.7 C. Even though sampling every 10th patient might produce a representative sample, it would not be random and could misrepresent the population sampled.

2.8 C. Children with another seizure were all part of the original cohort of children with a first seizure.

2.9 E. A larger sample would produce an estimate of incidence that is closer to the true one by reducing random error. It would not introduce a systematic difference in incidence as A–D would.

2.10 A. It is endemic because it is confined to a geographic area.

2.11 E. The cases existed any time in a 3-month period.

2.12 C. Dynamic populations, such as residents of the State of North Carolina, are continually turning over because of births and deaths as well as out-migration and in-migration.

2.13 A. Children in the cohort have in common being born in North Carolina in 2012 and are followed up for scoliosis as it develops over time.

2.14 D. Prevalence studies are not useful for diseases of short duration because there will be few cases at a point in time. They do not measure incidence and cannot provide much evidence bearing on cause.

2.15 E. A number like 800,000 without an associated denominator is not a rate and is, therefore, not any of the incidence and prevalence rates given in A–D.
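
The steady-state relationship used in answer 2.5 (duration = prevalence/incidence) can be checked with a short calculation. In this sketch the prevalence of 1/100 is taken from the answer, while the incidence of 40 per 100,000 per year is an assumption, chosen because it is the value that is arithmetically consistent with the stated result of 25 years. The function name is hypothetical.

```python
def average_duration_years(prevalence, incidence_per_year):
    """Steady-state relationship: duration = prevalence / incidence."""
    return prevalence / incidence_per_year

prevalence = 1 / 100      # 1 case per 100 persons
incidence = 40 / 100_000  # assumed: 40 new cases per 100,000 persons per year

duration = average_duration_years(prevalence, incidence)
print(f"Average duration: {duration:.1f} years")
```

The same function shows how sensitive the result is to the incidence: an incidence ten times higher, with the same prevalence, would imply a duration ten times shorter.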
CHAPTER 3 ABNORMALITY
are usually not immediately prescribed medication to lower cholesterol.

3.11 B. The figure shows one mode (hump). The median and mean are similar to each other, and are both below 4,000 g.

3.12 C. Two standard deviations encompass 95% of the values. The distribution is not skewed. Range is sensitive to extreme values.

3.13 B. One standard deviation encompasses about 2/3 of values around the mean. Looking at the figure, 2/3 of the values around the mean would be approximately 3,000 to 4,000 g.

3.14 D. Panel A of Figure 3.13 shows approximately even dispersion of measurements above and below the true value, suggesting chance variation. The effect of different observers making the measurements (interobserver variability) could also be at play.

3.15 G. Panel B shows skewed measurements to the right of the true value. In other words, the hospital staff tended to overestimate the electronic monitor results and record normal fetal heart rates when they were abnormal on electronic monitoring, thus demonstrating biased measurements. Chance and interobserver variability may also be involved because not all the measurements are the same.

3.16 G. Panel C shows similar results to Panel B except that the bias is in the other direction, that is, the hospital staff tended to underestimate the electronic monitor results. In both Panels B and C, the hospital staff measurements tended to “normalize” what were abnormal electronic monitor measurements.
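
Answers 3.12 and 3.13 rely on rules of thumb for how much of a distribution lies within one or two standard deviations of the mean. The sketch below checks those rules for an exactly normal distribution, which is an assumption; real measurements such as birth weights only approximate it. It uses `NormalDist` from the Python standard library, and the helper name is hypothetical.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2):
    print(f"Within {k} SD of the mean: {fraction_within(k):.1%}")
```

For a normal distribution this gives roughly two-thirds of values within one standard deviation and roughly 95% within two, in line with the figures quoted in the answers.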
4.1 A. When a risk factor is common in a population, it is important to compare its prevalence in people with and without disease. There is no comparison in the example. Lung cancer is a common cancer, and smoking is a strong, not weak, risk factor for cancer. The fact that there are other risk factors for lung cancer is not relevant.
4.2 B. Risk factors are sought when a new disease appears, as was the case with HIV.
4.3 D. Risk prediction models are used in all situations.
4.4 C. Because so many more women were assigned to the low-risk stratum, the largest number of cases will likely come from that group. Risk prediction models give probabilities but do not predict which persons in a group will develop disease. The figure does not give information about incorrect assignments to the strata.
4.5 C. The patient is unlikely to develop colorectal cancer in the next 5 years because he is a member of the group with a low probability, but he could be one of the 2% who develops disease. Even for a common cancer, there are subgroups of people with a low probability of developing it. For example, few young people develop colorectal cancer.
4.6 B. Well-constructed risk models are more likely to be highly calibrated (able to predict the percentage of a group that develops disease) than have good discrimination (able to predict which individuals in the group will develop disease).
4.7 A. Markers help identify groups with increased risk of disease, but because they are not causes of disease, removing the marker does not prevent disease. Markers are usually confounded with cause of disease (see Chapters 1 and 5).
4.8 B. Symptoms, physical examination findings, and laboratory tests are generally more important than risk factors when diagnosing disease.
4.9 D. The overlap of the two groups demonstrates that the risk model does not discriminate well between women who did and did not develop breast cancer over 5 years. The figure gives no direct information about calibration and stratification. Few women who developed breast cancer were at high risk.
4.10 C. Good calibration does not impair discrimination. However, risk models can discriminate poorly even if they are highly calibrated, as the example in the text shows.
240 Appendix A: Answers to Review Questions
want to be sure that the patient does not have other indications of increased risk of thrombosis such as age, smoking, or a family or personal history of clotting problems. This kind of decision requires careful judgment from both patient and clinician. However, using absolute or attributable risk will clarify the risk for the patient better than using relative risk when discussing clinical consequences.
5.12 B. The degree of illness may have confounded the results in this study so that sicker patients are more likely to take aspirin. One way to examine this possibility and adjust for confounding if it is present is to stratify the users and non-users into groups with similar indications for using aspirin and compare death rates in the subgroups.
6.1 B. If exposure to oral contraceptives was recorded just after the myocardial infarction, it could not have been a cause of it. All of the other choices could have artificially increased exposure in cases, resulting in a falsely elevated odds ratio.
6.2 E. Even an exemplary case-control study such as this one should not claim that it has identified a cause because unmeasured confounding is always possible.
6.3 E. Case-control studies produce only odds ratios, which can be used to estimate relative risk.
6.4 C. Epidemic curves describe the rise and fall in the number of cases over the time of the outbreak.
6.5 A. One can always obtain a crude relative risk from a cohort study. However, if the cohort data do not contain all of the variables that should be controlled for, a case-control analysis of the cohort study is a more efficient approach to including the additional data because the data needs to be collected only for cases and controls, not for the entire cohort.
6.6 D. It would be better to sample cases and controls from a cohort rather than a dynamic population, especially if exposure or disease is changing rapidly over time and if controls are not matched to the date of onset of disease for cases.
6.7 E. Multiple control groups (not to be confused with multiple controls per case) are a way of examining whether the results are “sensitive” to the types of controls chosen, that is, whether the results using the different control groups are substantially different, calling the results into question.
6.8 C. Case-control studies do not provide information on incidence (although if they are nested in a cohort, a cohort analysis of the same data can).
6.9 B. Matching is used to control for variables that might be strongly related to exposure or disease, to be sure that at least the cases and controls do not differ on those variables.
6.10 C. The crude odds ratio is obtained by creating a 2 × 2 table relating the number of cases versus controls to the number of exposed versus non-exposed people and dividing the cross products. In this case, the odds ratio is 60 × 60 divided by 40 × 40 = 2.25.
6.11 E. Case-control studies cannot study multiple outcomes because they begin with the presence or absence of only one disease, cannot report incidence, and cases should be incident (new onset), not prevalent.
6.12 D. Odds ratios based on prevalent cases can provide a rough measure of association but not a comparison of risk, which is about incident (new-onset) cases.
6.13 D. During the early phases of an outbreak, the offending microbe or toxin is usually known, but even if it is not, the most pressing question is how the disease is being spread. This information can then be used to stop the outbreak and identify the source.
6.14 C. If a case-control study is based on all or a random sample of cases and a random sample of controls from a population or cohort, the cases and controls should be similar to each other on characteristics other than exposure.
6.15 D. Case-control studies do not provide information on incidence.
6.16 E. The odds ratio approximates the relative risk when the disease is rare; a rule of thumb is <1/100.
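The cross-product calculation in answer 6.10 can be sketched as follows. The assignment of the counts to cells (60 exposed cases, 40 exposed controls, 40 unexposed cases, 60 unexposed controls) is an assumption inferred from "60 × 60 divided by 40 × 40"; the function name is hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude odds ratio from a 2 x 2 table, computed as the ratio of cross products.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    return (a * d) / (b * c)


# Cell values assumed from the arithmetic quoted in answer 6.10.
crude_or = odds_ratio(a=60, b=40, c=40, d=60)
print(crude_or)  # 2.25
```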
CHAPTER 7 PROGNOSIS
7.1 A. Zero time is at the onset of disease (in this case, Barrett’s esophagus), not the time of outcome events (in this case, esophageal cancer).
7.2 C. Different rates of dropping out would bias results only if those who dropped out had a systematically different prognosis from those who remained.
7.3 E. Responses A–D are all possible reasons for measurement bias, whereas a true difference in incontinence rates between patients treated with surgery versus medical care would not be a systematic error.
7.4 E. Responses A–D are all features of clinical prediction rules.
7.5 C. Case series are a hybrid research strategy, not describing clinical course in a cohort nor relative risk with a case-control approach nor studying a representative sample of prevalent cases.
7.6 C. Patients would be censored for any reason that removes them from the study or causes them not to be in the study for as long as other 3 years. If the outcome is survival, having another potentially fatal disease would not matter if they had not yet died of it.
7.7 A. Prognosis is about disease outcomes over time, and prevalence studies do not measure events over time.
7.8 B. Survival curves estimate survival in a cohort, after taking into account censored patients, and do not directly describe the proportion of the original cohort who survived.
7.9 E. All of the responses might be correct depending on the research question. For example, it would be useful to know prognosis both among patients in primary care and also among those referred to specialists.
7.10 B. Because patients had to meet stringent criteria to be included in the trial, the clinical course of those assigned to usual care would not be representative of patients with the disease, as they occur in the general population or in a defined clinical setting. Therefore, the results would not be generalizable to any naturally occurring group of patients with multiple sclerosis.
7.11 B. Even if clinical prediction rules are constructed using the best available methods, the strongest test of their ability to predict is that they have been shown to do so in patients other than the ones used to develop the rule.
7.12 D. A hazard ratio is calculated from information in a survival curve and is similar but not identical to relative risk.
7.13 C. Outcome events in time-to-event analyses are either/or events that occur only once.
CHAPTER 8 DIAGNOSIS
First, determine the numbers for each of the four cells in Figure 8.2: a = 49; b = 79; c = 95 – 49 (or 46); d = 152 – 79 (or 73). Add the numbers to determine the column and row totals.
8.1 C. 52% Sensitivity = 49/95 = 52%
8.2 B. 48% Specificity = 73/152 = 48%
8.3 A. 38% Positive predictive value = 49/128 = 38%
8.4 D. 61% Negative predictive value = 73/119 = 61%
8.5 A. 38% Prevalence of sinusitis in this practice = 95/247 = 38%
8.6 A. 38%
1. Calculate LR of a positive test (patient has facial pain)
LR+ = [49/(49 + 46)] / [79/(79 + 73)] = 1.0
2. Convert pretest probability to pretest odds
Pretest odds = prevalence/(1 – prevalence) = 0.38/(1 – 0.38) = 0.61
3. Calculate posttest odds by multiplying pretest odds by LR+
Posttest odds = 0.61 × 1.0 = 0.61
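The calculations for answers 8.1 to 8.6 can be reproduced with a brief sketch using the four cell counts given above. The variable names are illustrative, and the last line (converting posttest odds back to a probability with odds/(1 + odds)) is the standard conversion rather than a step shown in this excerpt. Because the sketch keeps unrounded intermediate values, the pretest odds come out as 0.625 rather than the 0.61 obtained from the rounded prevalence of 0.38.

```python
# Cell counts from Figure 8.2 as stated in the answer key.
a, b = 49, 79            # test positive (facial pain): with sinusitis, without sinusitis
c, d = 95 - a, 152 - b   # test negative: with sinusitis (46), without sinusitis (73)

sensitivity = a / (a + c)               # 49/95
specificity = d / (b + d)               # 73/152
ppv = a / (a + b)                       # 49/128
npv = d / (c + d)                       # 73/119
prevalence = (a + c) / (a + b + c + d)  # 95/247

lr_pos = sensitivity / (1 - specificity)           # likelihood ratio of a positive test
pretest_odds = prevalence / (1 - prevalence)       # unrounded, so 0.625 rather than 0.61
posttest_odds = pretest_odds * lr_pos
posttest_prob = posttest_odds / (1 + posttest_odds)  # standard odds-to-probability step

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("prevalence", prevalence),
                    ("posttest probability", posttest_prob)]:
    print(f"{name}: {value:.0%}")
```

With an LR+ of about 1.0 the test result leaves the probability of sinusitis essentially unchanged, so the posttest probability matches both the prevalence and the positive predictive value at about 38%.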
CHAPTER 9 TREATMENT
9.1 A. Every effort was made to make this trial as true to life as possible by comparing drugs in common use, having broad eligibility criteria, not blinding participants, allowing care to proceed as usual, and relying on a patient-centered outcome rather than a laboratory measurement. Although the trial is for efficacy, it is better described as practical and it is certainly not large.
9.2 D. Intention-to-treat analysis, counting outcomes according to the treatment group that patients were randomized to, tests the effects of offering treatment, regardless of whether patients actually take it. It is, therefore, about effectiveness in ordinary circumstances and the measure of effect, like usual care, is affected by drop-outs, reducing the observed treatment effect over what it would have been if everyone took the treatment they were assigned to.
9.3 A. All characteristics at the time of randomization, such as severity of disease, are randomly allocated. Characteristics arising after randomization, such as retention, response to treatment, and compliance, are not.
9.4 D. The greatest advantage of randomized trials over observational studies is prevention of confounding. Randomization creates comparison groups that would have the same outcome rates, on average, were it not for intervention effects.
9.5 E. The study had extensive inclusion and exclusion criteria. This would increase the extent to which patients in the trial were similar to each other, making it easier to detect treatment differences if they exist, but at the expense of generalizability, the ability to extrapolate from study results to ordinary patient care.
9.6 A. Intention-to-treat analyses describe the effects of being offered treatments, not necessarily taking them. To describe the effects of actually receiving the intervention, one would have to treat the data as if they were from a cohort study and use a variety of methods to control for confounding.
9.7 B. Stratified randomization is one approach to control of confounding, especially useful when a characteristic is strongly related to outcome, and also when the study is small enough that one worries that randomization might not create groups with similar prognosis.
9.8 C. In explanatory analyses, outcomes are attributed to the treatment patients actually receive, not the treatment group they were randomized to, which is an intention-to-treat analysis.
9.9 B. Side effects of the drug, both symptoms and signs, would alert patients and doctors to who is taking the active drug but could not affect random allocation, which was done before drug was begun.
9.10 C. For a randomized controlled trial to be ethical, there should not be conclusive evidence that one of the experimental treatments is better or worse than the other; that is, the scientific community should be in a state of “equipoise” on that issue. There may be opinions as to which is better, but no consensus. There may be some evidence bearing on the comparison, but the evidence should not be conclusive.
9.11 D. Because the new drug has advantages over the old one, but comparative effectiveness is unknown, the appropriate randomized trial comparing the two would be a non-inferiority trial to establish whether the new drug is no less efficacious.
9.12 A. Making the primary outcome of a trial a composite of clinically important and related outcomes increases the number of outcome events and, therefore, the ability of the trial to detect an effect if it is present. The disadvantage is that the intervention may affect the component outcomes differently, and reliance on the composite outcome alone might mask this effect.
9.13 C. Both bad luck in randomization and breakdown in allocation concealment (by bad methods or cheating) would show up as differences in baseline characteristics of patients in a trial. Small differences are expected and the challenge is to decide how large the differences must be for them to raise concern.
9.14 C. The usual drug trials reported in the clinical literature are “Phase III” trials, intended to establish efficacy or effectiveness. Further study, with postmarketing surveillance, is needed to detect uncommon side effects. Responses A and D are about what Phase I and Phase II trials are meant to establish.
9.15 B. Prevention of confounding is the main advantage of randomized controlled trials, which is why they are valued despite being ethically complicated, slower, and more expensive. Whether they resemble usual care depends on how the trial is designed.
CHAPTER 10 PREVENTION
10.1 A. 33%. The relative risk reduction of colorectal cancer mortality is the absolute risk reduction divided by the cancer mortality rate in the control group. (See Chapter 9.)
The colorectal cancer mortality rate in the screened group = 82/15,570 = 0.0053
The colorectal cancer mortality rate in the control group = 121/15,394 = 0.0079
Absolute risk reduction = 0.0079 – 0.0053 = 0.0026
Relative risk reduction of colorectal cancer mortality due to screening = 0.0026/0.0079 = 0.33 or 33% reduction.
An alternative approach calculates the complement of relative risk. In the example, the relative risk of colorectal mortality in the screened compared to the control group was 0.0053/0.0079 = 0.67. The complement of 0.67 = 1.00 – 0.67 = 0.33 or 33%.
10.2 C. 385. The number needed to screen is the reciprocal of absolute risk reduction, 1/0.0026, or 385.
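The arithmetic behind answers 10.1 and 10.2 can be sketched in a few lines. To match the figures in the answer key, the sketch rounds the mortality rates to four decimal places before subtracting, as the answer does; carrying unrounded rates through would give a number needed to screen of about 386 instead of 385. Variable names are illustrative.

```python
# Deaths and group sizes as given in answer 10.1.
screened_rate = round(82 / 15_570, 4)    # 0.0053
control_rate = round(121 / 15_394, 4)    # 0.0079

# Rounded to four decimals to mirror the answer key's intermediate values.
absolute_risk_reduction = round(control_rate - screened_rate, 4)    # 0.0026
relative_risk_reduction = absolute_risk_reduction / control_rate    # about 0.33

# Alternative route: the complement of the relative risk.
relative_risk = screened_rate / control_rate                        # about 0.67
rrr_via_complement = 1 - relative_risk                              # about 0.33

number_needed_to_screen = 1 / absolute_risk_reduction               # about 385

print(f"ARR = {absolute_risk_reduction:.4f}")
print(f"RRR = {relative_risk_reduction:.0%} (via complement: {rrr_via_complement:.0%})")
print(f"NNS = {number_needed_to_screen:.0f}")
```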
10.3 C. If 30% of the screened group were found to have polyps, at least 4,671 (0.3 × 15,570) had a false-positive test for colon cancer. If the sensitivity of the test was 90% and the number of cancers was 82, about 74 people (0.9 × 82) had a true-positive result. The positive predictive value of the test, therefore, would be about 74/(74 + 4,671), or 1.6%. (Using exact numbers, the authors calculated a positive predictive value of 2.2%). Negative predictive value was not calculated, but it would be high because the incidence of cancer over 13 years was low (323/15,570 or 21 per thousand) and about 90% of tests were negative.
10.4 B. Lead time (the period of time between the detection of a disease on screening and when it would ordinarily be diagnosed because the patient seeks medical care due to symptoms) can be associated with what appears to be an improvement in survival. The fact that mortality did not improve after screening in this randomized trial raises the possibility that lead-time bias is responsible for the result. Another possible cause of the finding is overdiagnosis.
10.5 A. See answer for 10.4.
10.6 C. Overdiagnosis, the detection of lesions that would not have caused clinical symptoms or morbidity, is likely because, even after 20 years, the number of cancers in the control group remained fewer than that in the screened group. Increased numbers of cancers in the screened group could occur if there were more smokers in the screened group, but randomization should have made that possibility unlikely.
10.7 D. Several important questions are to be considered when assessing a new vaccine, including: (i) Has the vaccine been shown to be efficacious (did it work under ideal circumstances when everyone in the intervention group received the vaccine and no one in a comparable control group did) and was it effective (under everyday circumstances, in which some people in the intervention group did not receive the vaccine)? (ii) Is the vaccine safe? (iii) Is the burden of suffering of the condition the vaccine protects against important enough to consider a preventive measure? and (iv) Is it cost-effective? Cost of the vaccine is only one component of an analysis of cost-effectiveness.
10.8 C. Disease prevalence is lower in presumably well people (screening) than in symptomatic patients (diagnosis). As a result, positive predictive value will be lower in screening. Because screening is aimed at picking up early disease, the sensitivity of most tests is lower in screening than in patients with more advanced disease. Overdiagnosis is less likely in diagnostic situations when symptomatic patients have more late-stage disease.
10.9 C. Volunteers for preventive care are more compliant with advice about medical care and usually have better health outcomes than those who reject preventive care. This effect is so strong that it is seen even when volunteers are taking placebo medications.
10.10 B. The incidence method is particularly useful when calculating sensitivity for screening tests because it takes into account the possibility of overdiagnosis. The gold standard for screening tests almost always involves a follow-up interval. The first round of screening picks up prevalent as well as incident cases, inflating the number compared to later screening rounds.
10.11 E. Cost-effectiveness should be estimated in a way that captures all costs of the preventive activity and all the costs associated with diagnosis and treatment, wherever they occur, to determine the costs of the preventive activity from a societal perspective. All costs that occur when prevention is not done are subtracted from all costs when prevention is done to estimate the cost for a given health effect.
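The rough positive predictive value estimate in answer 10.3 can be reproduced as a sketch. It uses the same approximations as the answer (30% of the screened group counted as false positives, 90% sensitivity applied to 82 cancers); variable names are illustrative.

```python
screened = 15_570

# Approximations taken from answer 10.3.
false_positives = round(0.3 * screened)   # 4,671 people with polyps but no cancer
true_positives = round(0.9 * 82)          # about 74 cancers detected at 90% sensitivity

ppv = true_positives / (true_positives + false_positives)
cancer_incidence_per_1000 = 1000 * 323 / screened   # cancers over 13 years

print(f"approximate PPV = {ppv:.1%}")
print(f"incidence = {cancer_incidence_per_1000:.0f} per thousand")
```

The answer notes that the authors, using exact numbers, reported 2.2%; the sketch only mirrors the approximate calculation.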
CHAPTER 11 CHANCE
11.1 C. Although subgroup analyses risk false-positive and false-negative conclusions, they provide information that can help clinicians as long as their limitations are kept in mind.
11.2 C. The P value describes the risk of a false-positive conclusion and has nothing directly to do with bias and statistical power or the generalizability or clinical importance of the finding.
11.3 E. The P value is not small enough to establish that a treatment effect exists and provides no information on whether a clinically important effect could have been missed because of inadequate statistical power.
11.4 D. The result was statistically significant, if one wanted to think of the role of chance in that way, because it excluded a relative risk of 1.0. Response D described the information the confidence interval contributes.
11.5 B. Calling P ≤ 0.05 “statistically significant” is a useful convention but otherwise has no particular mathematical or clinical meaning.
11.6 A. Models do depend on assumptions about the data, are not done in a standard way, and are meant to complement stratified analyses, not replace them. Although they might be used in large randomized trials, they are not particularly useful in that situation because randomization of a large number of patients has already made confounding very unlikely.
11.7 C. Of the 12,000 people in the trial, about 6,000 would be in the chemoprevention arm of the trial. Applying the rule of thumb mentioned in this chapter (but solving for event rate, not sample size), 6,000 divided by 3 or a 1/2,000 event rate could be detected with 6,000 people under observation.
11.8 E. Statistical power depends on the joint effects of all of the factors mentioned in A–D. It may vary a bit with the statistical test used to calculate it, but this is not the main determinant of sample size.
11.9 B. Bayesian reasoning is about how new information affects prior belief and has nothing to do with inferential statistics or the ethical rationale for randomized trials.
11.10 D. The results are consistent with a 1% up to a 46% higher death rate, with the best estimate being 22% higher, and excludes a hazard ratio of 1.0 (no effect), so it is “statistically significant.” Confidence intervals provide more information than a P value for the same data because they include the point estimate and range of values that is likely to include the true effect.
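The rule of thumb in answer 11.7 can be illustrated with a short sketch. It assumes the rule is the common "rule of three" (about 3x people must be observed to have a good chance of seeing at least one event that occurs at a rate of 1/x); the probability check is an added illustration, not part of the answer, and the function name is hypothetical.

```python
def detectable_event_rate(n_observed: int) -> float:
    """Rule of thumb solved for event rate: people observed divided by 3 gives x in 1/x."""
    return 3 / n_observed


n = 6_000                          # people in the chemoprevention arm
rate = detectable_event_rate(n)    # 1 event per 2,000 people

# Added check (an assumption about what "detect" means here): the chance of
# observing at least one event among n people if the true rate is 1/2,000.
p_at_least_one = 1 - (1 - rate) ** n

print(f"detectable rate = 1/{1 / rate:.0f}")
print(f"chance of at least one event = {p_at_least_one:.0%}")
```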
CHAPTER 12 CAUSE
12.1 B. It would be unethical to randomly allocate a potentially harmful activity, and people would probably not accept long-term restrictions on their use of cell phones.
12.2 E. All responses, which reflect Bradford Hill criteria, would be useful in deciding whether cell phones cause brain cancer, but E, a lack of specificity, is weaker than the rest. (Consider how many diseases cigarette smoking causes.)
12.3 C. Figure 12.1 shows how several risk factors act together to cause coronary heart disease. Nearly all diseases are caused by the joint effect of genes and environment. Figure 12.3 shows a huge decline in tuberculosis rate before effective treatment was established in the 1950s.
12.4 B. In the absence of bias, statistical significance establishes that an association is unlikely to be by chance whereas the Bradford Hill criteria go beyond that to establish whether an association is cause-and-effect.
12.5 C. Cost-effectiveness analyses describe financial costs per clinical effects, such as year of life saved, as in this example, for alternative tests and treatments.
12.6 D. When exposure and disease are measured in groups, not individuals, it is possible that the individuals who got the disease were not the ones who were exposed and that people who were exposed were not the ones who got the disease, a problem called the “ecological fallacy.”
12.7 B. In a time-series study, a change in the rate of disease after an intervention might be caused by other changes in local conditions at about the same time. This possibility needs to be ruled out before one can have confidence that the study exposure caused the outcome.
12.8 B. Randomized controlled trials, if well designed and carried out, are the strongest evidence for cause and effect. But they are not possible for many suspected causes because it is not ethical to involve people in studies of harm and because they may not be willing to cooperate with such a trial.
12.9 E. The pattern of evidence from all of the Bradford Hill criteria is more powerful than evidence for any one of them.
12.10 D. The confidence interval is so wide that it is consistent with either harm or benefit and, therefore, does not even establish whether there is an association between cell phones and brain cancer, let alone whether cell phones are a cause.
13.1 E. The main advantage of combining (pooling) study results in a systematic review is to have a larger sample size and as a result a more stable and precise estimate of effect size. Although it could be argued that the results are somewhat more generalizable because they come from more than one time and place, generalizability is not the main advantage.
13.2 D. MEDLINE can be searched using both content and methods terms, but it does not include all of the world’s journals, searches miss some of the articles in MEDLINE, and they are not a remedy for publication bias since all citations in it are published. These limitations are why reviewers need to use other, complementary ways of searching for articles in addition to MEDLINE.
13.3 B. Although individual studies usually have limited statistical power in subgroups, pooling of several studies may overcome this problem, as long as patients, not trials, are pooled.
13.4 A. B–E are elements of PICO, core components of a specific research question. Covariates are a critical aspect of the research, especially for observational studies and to identify effect modification in clinical trials, but they are about how successfully the question can be answered, not the question itself.
13.5 B. Systematic reviews should include a description of the methodologic strength of the studies to help users understand how strong the conclusions are. Combining individual elements of quality into a summary measure is less well established, perhaps because it is not meaningful to give each element the same weight.
13.6 D. The usual situation is that small negative studies (ones that find no effect) are less likely to be published than small studies that find effects. Large studies are likely to be published no matter what they find.
13.7 B. Narrative reviews can take a comprehensive approach to the set of questions clinicians must answer to manage a patient but that is at the expense of a transparent description of the scientific basis for the evidence cited.
13.8 B. Forest plots summarize the raw evidence in a systematic review. Pooled effect size and confidence interval may or may not be included, depending on whether it is appropriate to combine the study results.
13.9 A. It is not possible to control for a covariate in a study-level meta-analysis, as one might in one of the parent studies or in a patient-level meta-analysis.
13.10 A. Pooling is justified if there is relatively little heterogeneity across studies, as determined by an informed review of the patients, interventions, and outcomes. A statistical test for heterogeneity is also useful in general, but not in this case, with so few studies, because of low statistical power.
13.11 D. Random effects models take heterogeneity into account, at least if it is not extreme, and, therefore, result in wider confidence intervals, which are more likely to be accurate than the narrower confidence intervals that would result from a fixed effect model.
14.4 C. PubMed searches are indispensable for finding whether a rare event has been reported. They are one important part of an effort to find all published articles on a specific clinical question and are useful for researchers, but they are too inefficient for most questions at the point of care.
14.5 B. Although peer review and editing make articles better (more readable, accurate, and complete), articles are far from perfect when the process is over and they are published.
14.6 C. Conflict of interest is in relation to a specific activity and does not exist in the general case of investing in medical products not specifically related to colonoscopy.
14.7 A. B–E are all valuable resources for looking up the answers to clinical questions at the point of care (if computers and the relevant programs are available). Medical journals have great value for other reasons, but they are not useful for this purpose.
14.8 D. Patients and clinicians alike might well respect famous experts but should look for more solid footing (the organization that sponsors them, facts, and the source of those facts) when deciding whether to believe them.
14.9 A. GRADE has been developed for treatment recommendations, but not other clinical questions.
14.10 E. All are basic elements of a comprehensive knowledge management plan, as described in this chapter.
Appendix B
Additional Readings
1. INTRODUCTION
Clinical Epidemiology
Feinstein AR. Why clinical epidemiology? Clin Res 1972;20:821–825.
Feinstein AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia: WB Saunders; 1985.
Feinstein AR. Clinimetrics. New Haven, CT: Yale University Press; 1987.
Hulley SB, Cummings SR. Designing Clinical Research. An Epidemiologic Approach, 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007.
Riegelman RK. Studying a Study and Testing a Test, 5th ed. Philadelphia: Lippincott Williams & Wilkins; 2005.
Sackett DL. Clinical epidemiology. Am J Epidemiol 1969;89:125–128.
Sackett DL, Haynes RB, Guyatt GH, et al. Clinical Epidemiology: A Basic Science for Clinical Medicine, 2nd ed. Boston: Little, Brown and Company; 1991.
Weiss NS. Clinical Epidemiology: The Study of the Outcomes of Illness, 3rd ed. New York: Oxford University Press; 2006.
Evidence-Based Medicine
Guyatt G, Rennie D, Meade M, et al. User’s Guide to the Medical Literature: Essentials of Evidence-Based Clinical Practice, 2nd ed. Chicago: American Medical Association Press; 2008.
Hill J, Bullock I, Alderson P. A summary of the methods that the National Clinical Guideline Centre uses to produce clinical guidelines for the National Institute for Health and Clinical Excellence. Ann Intern Med 2011;154:752–757.
Jenicek M, Hitchcock D. Evidence-Based Practice: Logic and Critical Thinking in Medicine. Chicago: American Medical Association Press; 2005.
Straus SE, Glasziou P, Richardson WS, et al. Evidence-Based Medicine: How to Practice and Teach It, 4th ed. New York: Elsevier; 2011.
Epidemiology
Friedman GD. Primer of Epidemiology, 5th ed. New York: Appleton and Lange; 2004.
Gordis L. Epidemiology, 4th ed. Philadelphia: Elsevier/Saunders; 2009.
Greenberg RS, Daniels SR, Flanders W, et al. Medical Epidemiology, 4th ed. New York: Lange Medical Books/McGraw Hill; 2005.
Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little, Brown and Company; 1987.
Jekel JF, Elmore JG, Katz DL. Epidemiology, Biostatistics and Preventive Medicine, 3rd ed. Philadelphia: Elsevier/Saunders; 2007.
Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press; 2002.
Related Fields
Brandt AM, Gardner M. Antagonism and accommodation: interpreting the relationship between public health and medicine in the United States during the 20th century. Am J Public Health 2000;90:707–715.
Kassirer JP, Kopelman RI. Learning Clinical Reasoning. Baltimore: Williams & Wilkins; 1991.
Sox HC, Blatt MA, Higgins MC, et al. Medical Decision Making. Philadelphia: American College of Physicians; 2006.
White KL. Healing the Schism: Epidemiology, Medicine, and the Public’s Health. New York: Springer-Verlag; 1991.
2. FREQUENCY
Morgenstern H, Kleinbaum DG, Kupper LL. Measures of disease incidence used in epidemiologic research. Int J Epidemiol 1980;9:97–104.
3. ABNORMALITY
Feinstein AR. Clinical Judgment. Baltimore: Williams & Wilkins; 1967.
Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use, 3rd ed. New York: Oxford University Press; 2003.
Yudkin PL, Stratton IM. How to deal with regression to the mean in intervention studies. Lancet 1996;347:241–243.
4. RISK: BASIC PRINCIPLES
Diamond GA. What price perfection? Calibration and discrimination of clinical prediction models. J Clin Epidemiol 1992;45:85–89.
Steiner JF. Talking about treatment: the language of populations and the language of individuals. Ann Intern Med 1999;130:618–622.
5. RISK: EXPOSURE TO DISEASE
Samet JM, Munoz A. Evolution of the cohort study. Epidemiol Rev 1998;20:1–14.
6. RISK: FROM DISEASE TO EXPOSURE
Grimes DA, Schulz KF. Compared to what? Finding controls for case-control studies. Lancet 2005;365:1429–1433.

7. PROGNOSIS
Dekkers OM, Egger M, Altman DG, et al. Distinguishing case series from cohort studies. Ann Intern Med 2012;156:37–40.
Jenicek M. Clinical Case Reporting in Evidence-Based Medicine, 2nd ed. New York: Oxford University Press; 2001.
Laupacis A, Sekar N, Stiell IG. Clinical prediction rules: a review and suggested modifications of methodologic standards. JAMA 1997;277:488–494.
Vandenbroucke JP. In defense of case reports. Ann Intern Med 2001;134:330–334.

8. DIAGNOSIS
McGee S. Evidence-Based Physical Diagnosis. New York: Elsevier; 2007.
Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–930.
Whiting P, Rutjes AWS, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189–202.

9. TREATMENT
Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 3rd ed. New York: Springer-Verlag; 1998.
Kaul S, Diamond GA. Good enough: a primer on the analysis and interpretation of noninferiority trials. Ann Intern Med 2006;145:62–69.
Pocock SJ. Clinical Trials: A Practical Approach. Chichester: Wiley; 1983.
Sackett DL, Gent M. Controversy in counting and attributing events in clinical trials. N Engl J Med 1979;301:1410–1412.
The James Lind Library. http://www.jameslindlibrary.org
Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA 2004;291:1624–1632.
Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984;3:409–420.

10. PREVENTION
Goodman SN. Probability at the bedside: the knowing of chances or the chances of knowing? Ann Intern Med 1999;130:604–606.
Harris R, Sawaya GF, Moyer VA, et al. Reconsidering the criteria for evaluating proposed screening programs: reflections from 4 current and former members of the U.S. Preventive Services Task Force. Epidemiol Rev 2011;33:20–25.
Rose G. Sick individuals and sick populations. Int J Epidemiol 2001;30:427–432.
Wald NJ, Hackshaw AK, Frost CD. When can a risk factor be used as a worthwhile screening test? BMJ 1999;319:1562–1565.

11. CHANCE
Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201–210.
Goodman SN. Toward evidence-based statistics. 1: the P value fallacy. Ann Intern Med 1999;130:995–1004.
Goodman SN. Toward evidence-based statistics. 2: the Bayes factor. Ann Intern Med 1999;130:1005–1013.
Rothman KJ. A show of confidence. N Engl J Med 1978;299:1362–1363.

12. CAUSE
Buck C. Popper’s philosophy for epidemiologists. Int J Epidemiol 1975;4:159–168.
Chalmers AF. What Is This Thing Called Science?, 2nd ed. New York: University of Queensland Press; 1982.
Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Ann Rev Public Health 1995;16:61–81.

13. SUMMARIZING THE EVIDENCE
Goodman S, Dickersin K. Metabias: a challenge for comparative effectiveness research. Ann Intern Med 2011;155:61–62.
Lau J, Ioannidis JPA, Schmid CH. Summing up the evidence: one answer is not always enough. Lancet 1998;351:123–127.
Leeflang MMG, Deeks JJ, Gatsonis C, et al. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008;149:889–897.
Norris SL, Atkins D. Challenges in using nonrandomized studies in systematic reviews of treatment interventions. Ann Intern Med 2005;142:1112–1119.
Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 2010;340:c221.

14. KNOWLEDGE MANAGEMENT
Cook DA, Dupras DM. A practical guide to developing effective Web-based learning. J Gen Intern Med 2004;19:698–707.
Shiffman RN, Shekelle P, Overhage JM, et al. Standardized reporting of clinical practice guidelines: a proposal from the conference on guideline standardization. Ann Intern Med 2003;139:493–498.
Index
Note: Page numbers in italics denote figures; those followed by a t denote tables.
Randomized controlled trials, 134, 135
  alternatives to, 147–148
  assessment of outcomes, 142–143, 143t
  blinding, 141–142, 141
  cluster, 145
  comparison groups, 138, 138
  crossover, 146
  differences arising after, 139–141
  ethics, 135
  intervention, 136, 138
  limitations of, 147
  non-inferiority, 145
  versus observational studies, 148
  sampling, 135–136, 137
  variations on, 145–146
Randomized trials, 156
Range, 34
Rare events, detecting, 185, 185
Reading journals, 233–234, 233, 234t
Recall bias, 86
Receiver operator characteristic (ROC) curve, 57, 114–115, 115
Reference standard, 109
Referral process, in increasing the pretest probability of disease, 120–122
Regression to the mean, 45–46
Relative risk, 68, 68t
Reliability, 34, 35
Reproducibility, 34
Responsiveness, 34–35
Restriction, 73
Retrospective/historical cohort studies, 63
Reverse causation, 147
Reversible associations, 201, 201
Reviews
  narrative, 209
  systematic, 210
  traditional, 209
Risk
  absolute, 67–68, 67t, 68t
  attributable, 68, 68t
  confounding, 71–72
    control of, 72–76
  controlling for extraneous variables, 88–89
  defined, 50
  difference, 67t, 68
  effect modification, 76–77
  of false-positive result, 166–167
  interpreting attributable and relative, 68–69, 69t
  of negative labeling effect, 167
  observational studies and cause, 76
  odds ratio, 87–88, 87
  overdiagnosis (pseudo disease) in cancer screening, 167–169
  population, 69–71
  population attributable, 69
  predicting, 54–56
  ratio, 67t, 68
  recognizing, 51–54
  relative, 67t, 68, 68t
  simple descriptions of, 71
  studies of, 61–67
  taking other variables into account, 71
  ways to express and compare, 67–71
Risk assessment tool, 54
Risk difference, 68
Risk factors, 51
  causality and, 53–54
  to choose treatment, 58
  common exposure to, 52
  in establishing pretest probability for diagnostic testing, 58
  to predict risk, 54
  to prevent disease, 59
Risk prediction tool, 54
  calibration, 56
  clinical uses of, 58–59
  discrimination, 56
  risk stratification, 57
  sensitivity and specificity of, 56–57, 57
Risk ratio, 68
Risk stratification, 54
  for screening programs, 58
Run-in period, 141

S
Safety, 157
Sample, 6, 6
Sample size, 181
Sampling bias, 12, 103
Sampling fraction, 25, 36
Scales, 33
Scientific misconduct, 226
Screening, 153
  acceptable to patients and clinicians, 166
  changes in, 169
  low positive predictive value, 164
  safety, 165–166
  sensitivity, 163
    calculating, 163–164, 164t
  simplicity and low cost, 164–165
  specificity, 163
Secondary prevention, 154
  treatments in, 158
Selection bias, 7, 14, 16, 237
Sensitive tests, use of, 113
Sensitivity, 111, 112
  defined, 113
  establishing, 115–117
  of risk prediction tool, 56–57
  trade-offs between, 113, 113t
Sensitivity analysis, 104–105
Serial likelihood ratios, 128–129
Serial testing, 128, 128
Shared decision making, 12
Single-blind, 142
Single causes, 195, 195t
Skewed distribution, 39
Specificity, 111, 112, 202
  defined, 113
  establishing, 115–117
  of risk prediction tool, 56–57
  trade-offs between, 113, 113t
Spectrum of patients, 116
Stage migration, 96
Standardization, 75
Statistical power, 181, 184–185
Statistical precision, 183
Statistical tests, 176, 178–179, 178t
Statistically significant, 176, 177
Stratification, 74
Stratified randomization, 139
Structured abstract, 234, 234t
Subgroup analysis, 187–189, 188, 189t
Subgroups, 146
Superiority trials, 145
Surveillance, 155
Surveys, 21
Survival analysis, 97
Survival curves, 98–100, 99
  interpreting, 100
Survival of a cohort, 97–98, 99
Survival rate, 20
Systematic review, 210–216, 210t, 212t

T
Tertiary prevention, 154
  treatments in, 158
Test set, 102
Time, distribution of disease by, 26–27, 26
Time-series studies, 202, 203
Time-to-event analysis, 99
Training set, 102
Treatment
  allocating, 139, 139t
  effectiveness trials in, 143–144, 143
  efficacy trials in, 143–144, 143
  equivalence trial, 145
  explanatory trials, 144–145, 144
  ideas and evidence, 132–134
  intention-to-treat, 144–145, 144
  non-inferiority trials, 145
  observational studies of interventions, 147–148
  phases of clinical trials, 148–149
  randomized controlled trials, 134, 135
    alternatives to, 147–148
    assessment of outcomes, 142–143, 143t
    blinding, 141–142, 141
    comparison groups, 138, 138
    differences arising after, 139–141
    ethics, 135
    intervention, 136, 138
    limitations of, 147
    versus observational studies, 148
    sampling, 135–136, 137
    variations on, 145–146
  studies of, 134
  superiority trials, 145
  tailoring the results of trials to individual patients, 146–147
Treatment effect studies of, 134
Trials of N = 1, 146–147
True negative, 109
True positive, 109
Two-tailed, 179