Locating and Appraising Systematic Reviews
Series Editors:
Cynthia Mulrow, MD, MSc
Deborah Cook, MD, MSc
In this article, we describe the strengths and weaknesses of several methods of locating systematic reviews, including electronic databases such as MEDLINE, Best Evidence (the electronic version of ACP Journal Club and Evidence-Based Medicine), and the Cochrane Library (a regularly updated source of reviews and controlled trials produced by the Cochrane Collaboration). We also present steps that can be used to critically appraise review articles; as an example, we use a systematic review that evaluates the gastrointestinal toxicity of various nonsteroidal anti-inflammatory drugs in the context of a clinical scenario.
which agent has the lowest rate of serious gastrointestinal complications, such as hemorrhage. You suspect that many original studies have been published that discuss the risks of different NSAIDs, but you would like to have a succinct and accurate summary of the study results rather than having to do all of the searching, selecting, and synthesizing yourself. Because this question is important to your patient and common in your practice, you proceed to look for a systematic review.
* Adapted from Henry and colleagues (3) with permission of BMJ.
How can we determine whether the results of trials included in a meta-analysis are similar? The size of the treatment effect (and its CI) from each trial can be graphed. If the magnitude or direction of the effect sizes differs greatly among studies and if the CIs do not substantially overlap, one could question whether it is appropriate to pool the results.
Another common approach is to use a statistical test to ascertain whether the study results differ more than would be expected by chance. If the studies measure approximately the same effect and any differences occur because of chance (that is, if the results are consistent with a common effect size), the test for homogeneity (sometimes, unfortunately, called the test of heterogeneity) is not significant (usually reported as P > 0.05). A significant test result means that the difference in results among the individual studies is not likely to have been caused by chance. This calls into question whether it is appropriate to pool the results; it may also suggest that a priori subgroup analyses may be appropriate. However, when the results of large trials are pooled, the test for homogeneity may indicate that statistically significant (but perhaps clinically unimportant) differences exist in the results. In this situation, it may still be reasonable to pool the results statistically.
Henry and colleagues established that the results of their included studies were consistent. They calculated the risk for gastrointestinal complications associated with each NSAID relative to the risk associated with ibuprofen and then tested whether the relative risk for each drug was consistent across the studies. The Table, originally published in the systematic review by Henry and colleagues, shows the relative risk, CIs, and P values for each of these tests for consistency (homogeneity). Each P value is greater than 0.05.
7. What are the overall results and how precise are they? We have considered the key methodologic questions to be asked when appraising a review article and believe that the methods used by Henry and colleagues are satisfactory. Because a future article in the systematic review series will focus on measures of effect, we only briefly address this issue here.
Henry and colleagues identified 12 studies that were relevant and met their inclusion criteria. They then abstracted the data in duplicate, calculated the relative risks associated with each NSAID, and pooled the relative risk estimates. They found that each NSAID was associated with a higher risk for gastrointestinal complications than was ibuprofen and ranked the drugs in order of increasing size of risk (ranging from 1.6 for fenoprofen to 9.2 for azapropazone). The authors also calculated CIs around the pooled estimates. All NSAIDs except fenoprofen were associated with an increased risk for serious gastrointestinal hemorrhage compared with ibuprofen.
8. Will the results help in caring for patients? Determining this involves asking several questions: Can I apply the results to my patients? Did the studies consider all the clinically important outcomes? Are the benefits worth any associated risks or costs?
It is important to consider the patients in the individual studies and to ascertain whether your patient is similar with regard to age, comorbid conditions, or other risk factors (such as smoking and family history). Does he or she have a comparable baseline risk for the outcome of interest, or is the risk higher or lower in a clinically meaningful way? A systematic review that finds that a new treatment delays death but that does not address any of the potential adverse events associated with use of the treatment may prompt us to seek additional information from other sources or to refer back to some of the more detailed original articles. We would want to discuss these issues with our patient (or we may choose not to offer the intervention in the first place).
We decide that the review by Henry and col-
536 1 April 1997 • Annals of Internal Medicine • Volume 126 • Number 7
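The pooling of relative risks and the test for homogeneity discussed in this article can be illustrated with a short sketch. The article does not say which statistic Henry and colleagues used, so the code below assumes one common choice: fixed-effect, inverse-variance pooling of log relative risks, followed by Cochran's Q statistic compared against a chi-square distribution. The function names (`chi2_sf`, `pool_and_test`) and all input numbers are hypothetical and are not taken from the review.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with x = q / 2.
    """
    x = q / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-x)                 # chi-square with 2 df
    else:
        a, tail = 0.5, math.erfc(math.sqrt(x))      # chi-square with 1 df
    while a < df / 2.0:
        tail += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def pool_and_test(log_rrs, std_errs):
    """Pool log relative risks (assumed fixed-effect model) and test homogeneity.

    log_rrs  : natural log of each study's relative risk
    std_errs : standard error of each log relative risk
    """
    weights = [1.0 / se ** 2 for se in std_errs]    # inverse-variance weights
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    return {
        "pooled_rr": math.exp(pooled),
        "ci95": (math.exp(pooled - 1.96 * pooled_se),
                 math.exp(pooled + 1.96 * pooled_se)),
        "q": q,
        "df": df,
        "p": chi2_sf(q, df),
    }


if __name__ == "__main__":
    # Hypothetical relative risks and standard errors, for illustration only.
    rrs = [1.8, 2.3, 2.0, 2.6]
    ses = [0.25, 0.30, 0.20, 0.35]
    result = pool_and_test([math.log(rr) for rr in rrs], ses)
    print(result)
```

Under these assumptions, a P value above 0.05 would be read as "no significant heterogeneity," which is the situation in which pooling is least controversial. With few or small studies, the test has low power, so a nonsignificant result should be read alongside a graph of the individual effect sizes and their CIs.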
Implausibly precise statistics . . . are often bogus. Consider a number that is well
known to generations of parents and doctors: the normal human body temperature of
98.6° Fahrenheit. Recent investigations involving millions of measurements have
revealed that this number is wrong; normal human body temperature is actually 98.2°
Fahrenheit. The fault, however, lies not with Dr. Wunderlich's original measurements;
they were averaged and sensibly rounded to the nearest degree: 37° Celsius.
When this temperature was converted to Fahrenheit, however, the rounding was
forgotten, and the 98.6 was taken to be accurate to the nearest tenth of a degree.
Had the original interval between 36.5° Celsius and 37.5° Celsius been translated, the
equivalent Fahrenheit temperatures would have ranged from 97.7° to 99.5°.
Apparently, dyscalculia can even cause fevers.
Submitted by:
Donald Venes, MD
Portland, OR 97207