
[ CLINICAL COMMENTARY ]
J. TIMOTHY NOTEBOOM, PT, PhD, SCS¹
STEPHEN C. ALLISON, PT, PhD²
JOSHUA A. CLELAND, PT, PhD, OCS, FAAOMPT³
JULIE M. WHITMAN, PT, DSc, OCS, FAAOMPT⁴
A Primer on Selected Aspects of Evidence-Based Practice Relating to Questions of Treatment, Part 2: Interpreting Results, Application to Clinical Practice, and Self-Evaluation
SYNOPSIS: The process of evidence-based practice (EBP) guides clinicians in the integration of individual clinical expertise, patient values and expectations, and the best available evidence. Becoming proficient with this process takes time and consistent practice, but should ultimately lead to improved patient outcomes. The EBP process entails 5 steps: (1) formulating an appropriate question, (2) performing an efficient literature search, (3) critically appraising the best available evidence, (4) applying the best evidence to clinical practice, and (5) assessing outcomes of care. This second commentary in a 2-part series will review principles relating to steps 3 through 5 of this 5-step model. The purpose of this commentary is to provide a perspective to assist clinicians in interpreting results, applying the evidence to patient care, and evaluating proficiency with EBP skills in studies of interventions for orthopaedic and sports physical therapy. J Orthop Sports Phys Ther 2008;38(8):485-501. doi:10.2519/jospt.2008.2725

KEY WORDS: critical appraisal, physical therapy, treatment effectiveness

¹Associate Professor, Department of Physical Therapy, Regis University, Denver, CO.
²Professor, Rocky Mountain University of Health Professions, Provo, UT; Associate Professor, Baylor University, Waco, TX.
³Assistant Professor, Department of Physical Therapy, Franklin Pierce College, Concord, NH; Research Coordinator, Rehabilitation Services, Concord Hospital, Concord, NH.
⁴Assistant Professor, Department of Physical Therapy, Regis University, Denver, CO; Faculty, Regis University Manual Therapy Fellowship, Regis University, Denver, CO.
Address correspondence to Dr Tim Noteboom, Regis University, 3333 Regis Blvd, G-4, Denver, CO 80212. E-mail: noteboom@regis.edu

journal of orthopaedic & sports physical therapy | volume 38 | number 8 | august 2008 | 485
Copyright © 2008 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

Critical appraisal using evidence-based practice (EBP) methods permits clinicians to make independent professional judgments about the validity, strength, and relevance of evidence. Independent judgments are necessary because the interpretations and conclusions of authors in published studies should not be accepted without close scrutiny by the reader. The EBP approach facilitates extraction of critical information from studies on treatment, including patient demographics and reported treatment effects. Because clinicians are attempting to apply results from current best evidence to clinical practice, a key question to be answered is, "Are the patients in this study similar to the patient I am managing?" Therefore, patient demographic data, such as age, diagnostic classification, level of impairment/dysfunction at baseline, and level of acuity, are just some of the characteristics that are typically reported in the methods or results section of the study. The clinician may even wish to determine if previously published prognostic studies have specific patient demographic data that may predict which patients are more likely to achieve a successful outcome regardless of what treatment is applied. If the clinician determines that patients from the study sufficiently resemble the patient of interest, then the clinician can proceed to a critical appraisal of the study design and results. The EBP approach identifies a finite set of key validity issues to consider and facilitates decisions about clinical meaningfulness of reported treatment effects. Critical appraisal enables a clinician to answer 3 questions37 after the foreground question is posed and the best evidence is found: (1) Are the results valid?, (2) What are the results?, and (3) How can I apply the results to patient care? The first of these 3 questions was addressed in part 1 of this series. The remaining 2 questions will be addressed in this commentary.

STEP 3B. CRITICALLY APPRAISING THE LITERATURE: WHAT ARE THE RESULTS?
Readers should understand statistical analyses and the presentation of quantitative results when critically appraising an article.26 While an extensive review of data analysis techniques is beyond the scope of this commentary, we will
describe a number of statistical concepts and procedures commonly used in physical therapy literature. Bandy6 conducted a 2-year review of the literature published in the journal Physical Therapy and identified 10 statistical procedures that were used in 80% of the articles reviewed. These were descriptive statistics, 1-way analysis of variance (ANOVA), t tests, factorial ANOVA, intraclass correlation, post hoc analyses, Pearson correlation, regression, chi-square, and nonparametric tests analogous to t tests.6 In this commentary we will review some basic statistical concepts that we feel are important for readers performing critical appraisals. We will also discuss statistical methods used to identify between-group differences in clinical trials that use both continuous scale outcomes and dichotomous scale outcomes, with illustrations from orthopaedic and sports physical therapy literature.

Reporting of Results in Treatment Studies
Results published in studies of physical therapy interventions typically include a summary of the findings from a wide variety of tests and measures that quantify the outcome variables selected by the authors to determine the effects of the intervention being studied. In some instances, such as with case reports or case series, raw data from each subject in the study may be presented. However, this approach is not realistic or warranted in studies with larger samples. More commonly, data are analyzed and reported as aggregated group results. Numerical indices are then used to describe attributes of the aggregated data. The mean or average is a measure that describes central tendency in a distribution of scores, and is most useful for variables that are on an interval or ratio scale.69 If data exhibit outliers such that the value of the mean would be distorted, the median is often reported as the measure of central tendency. The median might also be preferred over the mean when sample sizes are so small that they may not represent the target population. For example, in a case series including 7 patients with hip osteoarthritis, MacDonald et al57 reported medians rather than means for all baseline attribute variables and for all outcome variables. If data are from nominal or ordinal scales, the mode or median scores, respectively, are reported to describe central tendency.

A comparison of means is frequently used to make judgments about differences between groups or across various time points in a study. However, means are incomplete descriptors of data because they give information only about central tendency. A more complete description of the data includes an indication of the variability in the distribution of scores (dispersion of the individual data points). The more variable the data, the more dispersed the scores will be. Among several available measures of variability, the SD is the statistic most frequently reported, together with the mean, so that data are characterized according to both central tendency and variability.69 Results are commonly reported as the mean ± SD. For example, Hall et al42 compared headache index results at 4 weeks for their treatment group (31 ± 9) and their placebo group (51 ± 15), revealing a between-group mean difference of 20 points, with somewhat greater variance in the placebo group. If the median is used as the measure of central tendency, the range or interquartile range should be used to describe variability of the data, because the median may not lie at the center of the range, especially when the distribution of scores is skewed.

Statistical Analyses Using Hypothesis Testing and P Values
Although descriptive statistics such as the mean and SD of a sample may be useful in comparing 2 different treatment groups or different time points for 1 group, such as pretreatment to posttreatment scores, clinicians also want to know whether observed sample differences represent true differences in the target population of patients. Therefore, it is necessary to apply inferential statistical tests, such as the t test, ANOVA, or analysis of covariance (ANCOVA), to determine if between-group differences are statistically significant. These tests are examples of parametric tests, which are more powerful tests for identifying significant differences in group means. However, there are assumptions that need to be met to apply parametric tests, which typically include normal distribution of data, equal variances across group data, and independence of data.69 Alternatively, when assumptions underlying parametric statistical tests are not met, nonparametric analogs of these tests should be used, although nonparametric tests are generally less powerful. For example, Hale et al41 decided in their study on postural control in patients with chronic ankle instability to use Kruskal-Wallis tests and Mann-Whitney U tests instead of ANOVAs and t tests, because their outcome data were not normally distributed.

The traditional approach for making the decision about statistical significance is hypothesis testing. Taking a comparison of means as one example, hypothesis testing attempts to determine with statistical methods whether differences between or among means are due to chance or reflect a true difference in the target population. A central concept in hypothesis testing is the null hypothesis, one form of which states that there is no mean difference in the target population, thereby implying that any observed differences in sample means are due to chance. Therefore, if we reject the null hypothesis based on results of a statistical test, then we consider it unlikely that an observed difference is due to chance, and the difference is said to be statistically significant. However, statistical tests provide estimates of probability along a continuum, which is why researchers either express a specific threshold value or accept the default value (.05 or 5%) for statistical significance. This threshold probability is the alpha level, or α, which indicates the maximum level of risk tolerance for falsely rejecting the null hypothesis (a type I error).69 The alpha level, sometimes expressed as the "level of significance," is established by
the researcher prior to data collection. When the alpha level is .05, P values less than .05 permit rejection of the null hypothesis, leading us to infer that true mean differences exist in the target population. When P values are greater than .05, we conclude that the risk of committing a type I error exceeds our predetermined threshold (the alpha level). Therefore, when P is greater than .05, we do not consider observed differences to be statistically significant and conclude that such differences between groups may be due to chance. However, the set point for alpha is somewhat arbitrary, and P values can be influenced by sample size. Therefore, while researchers may set a specific alpha to accept or reject the null hypothesis, the savvy reader should still examine the results, confidence intervals (CIs), and sample size to determine whether or not a P value greater than .05 may be potentially meaningful.

For example, in a recent clinical trial67 comparing 2 types of exercises for increasing strength in women with chronic neck pain, both groups increased strength from pretreatment to posttreatment (P<.01). In other words, within-group improvements were significant. However, the improvements did not differ significantly between the 2 groups ("between-group differences") (P = .97). Based on the 2 P values for within-group and between-group differences, we can reject the null hypothesis for within-group improvements in the target population and conclude that each treatment group achieved statistically significant gains in strength from pretreatment to posttreatment assessment points. In contrast, we cannot reject the null hypothesis for the between-group comparison: the observed between-group difference in improvement may be attributable to chance (sampling error) rather than to a true difference in effectiveness of the exercise programs in the target population.

Confidence Intervals
Confidence interval analysis is an essential skill for the evidence-based practitioner and will form an important part of almost every critical appraisal of evidence. Montori62 and others3,76 have argued that because P values are not helpful in providing clinicians with information about the magnitude of the treatment effect, other statistics should be used. In contrast to P values, CIs provide information on the magnitude of the treatment effect in a form that pertains directly to the process of deciding whether to administer a therapy to patients. Whereas a sample statistic is only a point estimate of the true population value, the CI is a range of values within which the population value is likely to be found at a given level of confidence.35 Sim and Reid76 have reported that because CIs focus attention on the magnitude and the probability of a treatment effect, they assist in determining the clinical usefulness and importance (as well as the statistical significance) of the findings.76 Most often the 95% CI is used. This is commonly interpreted to represent the range of values within which we can be 95% certain that the true population value actually lies.3 For example, Gerber et al34 reported the mean visual analog scale (VAS) score for knee pain after 15 weeks of postoperative exercise training for the experimental treatment group: 0.77 cm (95% CI: 0.19 to 1.35 cm). At a 95% level of confidence we conclude that the true posttreatment population mean pain value for patients receiving this type of exercise training is no less than the lower limit of the CI (0.19 cm) and no greater than the upper limit of the CI (1.35 cm). Readers should note that not all values within the CI are equally likely to be the true population value. The point estimate from the sample (0.77 cm) is considered the single best estimate of the population parameter, with values becoming increasingly less likely when approaching either limit of the CI.33 The convention of using a 95% CI is arbitrary, similar to setting the alpha level to .05.

The level of precision or imprecision expressed by CI width is affected by the sample size and the variance in the distribution of scores. Small sample sizes and greater variance result in wider CIs.73 Wide CIs reflect imprecision in the data and uncertainty associated with the magnitude of the treatment effect.33,44 In contrast, the narrower the CI around the point estimate of the treatment effect, the more confident one can be that the true effect and its point estimate are similar, allowing the clinician to make more confident decisions from the data.

Although journals are increasingly requiring authors to report CIs, readers will often find published evidence with no CIs around the point estimates of treatment effects. Even when authors do report CIs, they commonly fail to interpret them.29 Readers performing critical appraisals of evidence can often compute CIs themselves from published details. A helpful and easy-to-use spreadsheet for computation of CIs (PEDro Confidence Interval Calculator) is freely downloadable from the PEDro website.1 As an illustration, we can extract means, sample sizes, and SDs from a recent randomized controlled trial (RCT)21 wherein authors found significantly better improvements (P = .009) in an experimental treatment group compared to a control group. Pretreatment to posttreatment improvements in shoulder internal rotation were 20° ± 12.9° in the experimental group (n = 15) compared to 5.9° ± 9.4° in the control group (n = 24). Although the authors did not report a 95% CI around the between-group difference, we can easily compute it using the PEDro Confidence Interval Calculator.1 FIGURE 1 shows results for this computation. From these results we see that the point estimate for the difference between mean group improvements was 14.1° in favor of the treatment group. The 95% CI does not include a zero difference, which is compatible with the statistically significant result (P = .009). Furthermore, we estimate the true population difference for mean improvement to be no less than 6.9° and no more than 21.3° favoring the treatment.
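The FIGURE 1 computation can also be sketched in a few lines of code. The sketch assumes the interval is the classic pooled-variance (equal-variance) Student t interval for a difference between 2 independent means. That assumption is ours, chosen because it reproduces the 6.9° to 21.3° limits reported above; it is not a description of how the PEDro spreadsheet is built. To stay dependency-free, the t critical value is found by integrating the t density numerically (Simpson's rule plus bisection) rather than by calling a statistics library. All function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF by Simpson's rule on [0, |x|], using symmetry of the density about 0."""
    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value t with P(|T| <= t) = confidence, via bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """mean1 - mean2 with a CI. Assumption: pooled-variance (equal-variance) t interval."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    margin = t_critical(confidence, df) * se
    return diff, diff - margin, diff + margin


# Improvement in shoulder internal rotation (degrees), from the example above:
# experimental 20 ± 12.9 (n = 15) vs control 5.9 ± 9.4 (n = 24).
diff, lower, upper = diff_means_ci(20.0, 12.9, 15, 5.9, 9.4, 24)
print(f"difference = {diff:.1f}°, 95% CI {lower:.1f}° to {upper:.1f}°")
```

A library quantile function such as SciPy's `scipy.stats.t.ppf` could replace the hand-rolled critical value; the numerical version is used here only so the sketch runs with the Python standard library alone.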
FIGURE 1. Results from the PEDro Confidence Interval Calculator* for computation of a 95% confidence interval (CI) around a difference between 2 group means. Note that the sign of the difference and signs on upper and lower limits of the CI are arbitrary; differences and confidence limits must be interpreted in light of the nature of the scales and the relative outcomes between groups. *From Physiotherapy Evidence Database (PEDro). Available at: http://www.pedro.fhs.usyd.edu.au/index.html. Accessed July 11, 2008.

Results for Continuous Scale Outcomes: Differences Between Means
If randomization in an RCT was effective in creating reasonably equivalent groups at baseline, the pretreatment group means for outcomes on continuous scales will be close to equal. Therefore, when group means are not meaningfully different at baseline, the magnitude of the between-group treatment effect, when statistically significant, can be most easily conceptualized as the posttreatment difference between group means for these outcome scales. However, clinicians should critically assess the within-group variability, because markedly different variances between groups could make this comparison somewhat misleading. In cases where groups are not equivalent at baseline for important prognostic factors, ANCOVA methods can statistically adjust the posttreatment means to account for baseline differences.69 For example, Rydeard et al72 found in a recent RCT that mean scores for the functional disability outcome were significantly different between groups at baseline in spite of randomization. Therefore, they used baseline functional disability outcome scores as a covariate in the statistical analyses, then found that the between-group difference in posttreatment means for functional disability, as adjusted by the ANCOVA method, was statistically significant.

If the treatment under consideration (the experimental treatment) is more effective than the comparison (no treatment, placebo, or a competing treatment), the posttreatment experimental group mean will show greater improvement than the comparison group mean(s). For a scale on which a higher score is a better outcome (eg, muscle strength), the experimental group posttreatment mean will be greater than the comparison group mean if the treatment is effective. For a scale on which a lower score is a better outcome (eg, VAS for pain), the experimental group posttreatment mean will be less than the comparison group mean if the treatment is effective. The magnitude of this posttreatment between-group difference is a measure of the treatment effect and is sometimes called the raw effect size.22 Computation of the raw between-group effect size is the simple subtraction of one group mean from another and is expressed in the relevant units of the outcome scale. Therefore, this point estimate of the raw effect size is conceptually intuitive and is crucial for deciding whether the magnitude of a statistically significant treatment effect is clinically meaningful. For example, Butcher et al13 reported vertical jump takeoff velocity in a control group (2.29 ± 0.35 m/s) and in a trunk stability training group (2.38 ± 0.39 m/s) after 3 weeks of exercise, and found this difference to be statistically significant (P<.05). The raw between-group effect size is, therefore, 0.09 m/s (2.38 – 2.29 m/s). Knowing this value, the clinician can proceed to determine the clinical relevance of the treatment effect.

In contrast to using raw posttreatment scores to calculate the between-group effect size, authors will sometimes use change scores to represent average improvements over time by computing the difference between baseline, or pretreatment, means and posttreatment means. Between-group differences in average change scores are then computed to represent the magnitude of the between-group treatment effect. This approach was used by Johnson et al49 when they reported that the improvement from baseline to posttreatment in shoulder external rotation in the experimental treatment group (31.3° ± 7.4°) was significantly better (P<.001) than that in the comparison treatment group (3.0° ± 10.8°).

Raw effect sizes are commonly transformed into unitless effect size indices, such as d for the t test and f for ANOVA, which are examples of standardized effect size indices.22 The most common approach in rehabilitation research is to divide the raw effect size by the combined (pooled) SD. This method has the benefit of accounting for both the magnitude of the treatment effect and the variability of scores within the groups. For example, using values from the between-group comparison in the Butcher et al study13 reported above, the raw effect size was 0.09 m/s, whereas the effect size index (d) was 0.24 (0.09 m/s divided by the pooled SD of 0.37 m/s). Effect size indices provide a general indication of the relative magnitudes of treatment effects. For example, Cohen22 characterized effect size indices for a comparison of 2 means as follows: 0.2, small; 0.5, medium; 0.8, large. Although unitless effect size indices are helpful for comparing the magnitude of effect sizes among studies using different outcome measures, these transformed indices of treatment effect are not as intuitive or as helpful as raw effect sizes for making the crucial comparisons that allow clinicians to judge whether treatment effects exceed thresholds for clinical meaningfulness, as discussed below. However, if variance is much different between or among groups, raw effect sizes may be misleading. In addition, effect size indices can be useful for comparing treatment effects across more than one experiment. For these reasons, readers may wish to consider both raw effect sizes and the standardized effect size indices when critically appraising evidence for therapy.
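The standardized effect size described above can be sketched as follows using the Butcher et al values. Because group sizes are not given in this section, the sketch divides by the pooled SD of 0.37 m/s stated in the text. The `pooled_sd` helper shows the usual formula that weights each group's variance by n - 1; the equal group sizes passed to it are hypothetical (with equal n, the result does not depend on n). The cut points follow Cohen's benchmarks quoted above, and the "below small" label for |d| < 0.2 is our own placeholder rather than one of Cohen's categories.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled SD of 2 independent groups, weighting each variance by n - 1."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean_treatment: float, mean_control: float, sd_pooled: float) -> float:
    """Standardized effect size: raw between-group difference divided by pooled SD."""
    return (mean_treatment - mean_control) / sd_pooled


def cohen_label(d: float) -> str:
    """Cohen's benchmarks for 2 means (0.2 small, 0.5 medium, 0.8 large).

    "below small" is a placeholder label of this sketch, not Cohen's term.
    """
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Vertical jump takeoff velocity (m/s), Butcher et al as reported above:
# trunk stability 2.38 ± 0.39, control 2.29 ± 0.35.
raw_effect = 2.38 - 2.29                # raw effect size, about 0.09 m/s
d = cohens_d(2.38, 2.29, 0.37)          # pooled SD of 0.37 m/s as stated in the text
print(f"raw effect = {raw_effect:.2f} m/s, d = {d:.2f} ({cohen_label(d)})")

# Hypothetical equal group sizes (n = 20 each): pooled SD is about 0.37 m/s.
print(f"pooled SD with equal n = {pooled_sd(0.39, 20, 0.35, 20):.2f} m/s")
```

Keeping the raw effect (in m/s) alongside d mirrors the recommendation above to consider both when appraising evidence for therapy.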
The Minimal Detectable Change and Minimal Clinically Important Difference Properties
Decisions about clinical meaningfulness of results involve judgments about thresholds distinguishing trivial effects from clinically important effects. Although any such judgment can be subject to debate and will depend on multiple contextual considerations and local circumstances, these judgments are essential in any critical appraisal of evidence. Because clinicians are frequently interested in identifying the amount of change over time, measurement properties such as minimal detectable change (MDC) are important to consider. Similar to other measures of reliability, such as the standard error of measurement (SEM), the MDC is the smallest real difference, which represents the smallest change in score that likely reflects true change rather than measurement error alone.74,77 For example, Stratford and colleagues78 have reported that the Roland-Morris Questionnaire, a commonly used outcome measure for patients with low back pain, has an MDC of 4 points. Therefore, to be confident that 2 scores taken across time represent a true change, the scores would need to differ by more than 4 points. However, the MDC only provides an indication of the minimum change that is detectable by the instrument, and not necessarily the amount of change that could be considered clinically meaningful to the patient. Jaeschke et al46 defined the minimal clinically important difference (MCID) as "the smallest difference in score in the domain of interest which patients perceive as beneficial." There is a growing body of literature outlining methods for determining MCID values,8,46 reporting MCIDs for specific scales, and using MCIDs to make judgments about clinical meaningfulness of treatment effects in clinical trials. Although published MCID values must be considered in the context of the varying methods and intended purposes for their derivations or estimations,8 clinicians unfamiliar with specific scales will often find it helpful to be aware of published MCID values when critically appraising evidence. No single published value for an MCID can be applied uncritically in all circumstances or for all purposes.59 Rather, a published MCID can provide an initial reference point when applying personal clinical expertise to make independent judgments about what distinguishes trivial from clinically important treatment effects in a local context. An illustrative patient scenario integrating patient values with published MCIDs to make patient-relevant judgments in a critical appraisal is given below in the section titled "Step 4. Incorporating Evidence Into Clinical Practice."

Published MCID values for selected outcome scales commonly used in orthopaedic and sports physical therapy are displayed in TABLE 1.

TABLE 1. Published Values for Minimal Clinically Important Differences (MCIDs) on Select Outcome Scales

| Outcome Scale | Suggested MCID* | Clinical Context | Published Study |
|---|---|---|---|
| 6-minute walk test | 54 m | Patients with chronic obstructive pulmonary disease | Wise and Brown, 2005⁸⁶ |
| 10-cm pain visual analog scale | 3.0 cm | Emergency room patients with acute pain | Lee et al, 2003⁵⁵ |
| 11-point numeric pain rating scale | 2 | Patients with chronic pain | Farrar et al, 2001²⁸ |
| American Shoulder and Elbow Surgeons Standardized Shoulder Form, patient self-report section | 6.4 | Patients with musculoskeletal shoulder pathologies | Michener et al, 2002⁶⁰ |
| Functional rating index | 9 | Patients with low back pain | Childs et al, 2005¹⁵ |
| Gait speed | 0.10 m/s | Patients recovering from hip fracture | Palombaro et al, 2006⁶⁸ |
| General function score | 12 | Patients with chronic low back pain | Hagg et al, 2003⁴⁰ |
| Lower Extremity Functional Scale | 9 | Patients with lower extremity musculoskeletal dysfunction | Binkley et al, 1999¹¹ |
| Modified Low Back Pain Disability Questionnaire | 6 | Patients with low back pain | Fritz and Irrgang, 2001³¹ |
| Neck Disability Index | 7.0 | Patients with cervical radiculopathy | Cleland et al, 2006¹⁹ |
| Neck Disability Index | 5.0 | Physical therapy outpatients with musculoskeletal neck pain | Stratford et al, 1999⁸⁰ |
| Oswestry Disability Index | 10 | Patients with chronic low back pain | Hagg et al, 2003⁴⁰ |
| Patient-Specific Functional Scale | 2.0 | Patients with cervical radiculopathy | Cleland et al, 2006¹⁹ |
| Quebec Back Pain Disability Scale | 15 | Patients with low back pain | Fritz and Irrgang, 2001³¹ |
| Roland-Morris Back Pain Questionnaire | 2 (baseline, 0-8); 4 (baseline, 5-12); 5 (baseline, 9-16); 8 (baseline, 13-20); 8 (baseline, 17-24) | Patients with low back pain (duration, 6 wk) | Stratford et al, 1998⁷⁹ |
| SF-36 bodily pain subscale | 7.8 | Patients with hip or knee osteoarthritis | Angst et al, 2001⁴ |
| SF-36 physical function subscale | 3.3 | Patients with hip or knee osteoarthritis | Angst et al, 2001⁴ |
| SF-36 physical component summary | 2.0 | Patients with hip or knee osteoarthritis | Angst et al, 2001⁴ |
| Simple shoulder test | 10 | Patients undergoing physical therapy treatment for shoulder pain of musculoskeletal, neurogenic, or undetermined origin | Michener & McClure, 2002⁶⁰ |
| Visual analogue scale (VAS) of back pain | 18 | Patients with chronic low back pain | Hagg et al, 2003⁴⁰ |
| Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) | 20% | Patients with hip or knee osteoarthritis | Barr et al, 1994⁷ |
| Zung Depression Scale | 8 | Patients with chronic low back pain | Hagg et al, 2003⁴⁰ |

* Units are scale points unless otherwise indicated.

Although the definition of the MCID above suggests application to an individual patient, MCID values are commonly employed to make judgments about the clinical meaningfulness of averaged group treatment effects, both for within-group effects25,53 and for between-group effects.20,24,63 Indeed, Jaeschke et al46 explicitly anticipated use of the MCID to make judgments both for individual and group differences. If the observed raw effect size is equal to or greater than the MCID, the treatment effect is considered clinically meaningful. Otherwise, the treatment effect is deemed trivial regardless of whether statistical significance is achieved. For example, Hyland et al45 found in an RCT that the posttreatment pain VAS outcome in a calcaneal taping group (2.7 ± 1.8) was significantly better (P<.001) than that of the control group (6.2 ± 1.0). Inasmuch as the point estimate for the treatment effect was a posttreatment between-group difference of 3.5 cm favoring calcaneal taping, we can compare this value to an MCID for the pain VAS. If we accept a suggestion of 3.0 cm as a reasonable value for the MCID for the pain VAS,55 we consider the treatment effect in the study sample to be a clinically meaningful benefit because the point estimate of the effect (3.5 cm) is greater than the MCID (3.0 cm).

If a reader is not sufficiently familiar with an outcome scale to make an intuitive judgment about clinical meaningfulness of a treatment effect size, and if no published MCID can be found for that outcome scale, it is often helpful to convert the effect size to a percent difference. Following the example from Hyland et al45 above, the percent difference between groups in posttreatment pain VAS (10-cm scale) means was calculated as follows: (6.2 – 2.7) ÷ 6.2 ≈ 56%. Therefore, the mean pain VAS score for the treatment group was 56% lower (better) than that of the control group. Most clinicians would judge a 56% average reduction in pain to be clinically meaningful, even without being familiar with a particular pain outcome scale.

Interpretations of Apparently Positive Trials: MCID, Effect Size, and CI Limits
A clinical trial is termed "positive" when the null hypothesis is rejected by formal hypothesis testing. In a positive trial, authors conclude that results are statistically significant and that the experimental treatment is more effective than the comparison. Guyatt et al38 use the phrase "apparently positive trial" to communicate the idea that critical appraisal requires an evidence-based practitioner to look beyond statistical significance. Additional judgments must be made about clinical meaningfulness of the treatment effect
and the level of precision in the point estimate of the effect size. These judgments are accomplished by comparing the raw effect size, with its accompanying CI, to the MCID. Even when we conclude that results are clinically meaningful because the point estimate for the raw effect size is greater than the MCID, we must recognize that the true size of the treatment effect may be more or less than the point estimate from sample data. The upper and lower limits of the 95% CI around that point estimate for the effect size give us an indication of just how small or how large the true treatment effect might be in the population of interest. Therefore, we consider the 95% CI to determine whether the MCID is within that interval. If the MCID is within the 95% CI, then we cannot rule out at a 95% level of confidence that the true population treatment effect might be trivial (less than the MCID). On the other hand, if the raw effect size is greater than the MCID and the MCID is excluded from the 95% CI, then we are 95% confident that there is a clinically meaningful benefit of treatment in the population, even if the true magnitude of that benefit is at the limit of the CI suggesting the smallest benefit of treatment. Guyatt et al38 characterize a positive trial in which the 95% CI excludes the MCID as "a definitive trial."
on the pain VAS) in the context of its 95% CI and the MCID (3.0 cm). This study had a small sample of subjects in the 2 groups considered here: 10 patients in the control group and 11 patients in the calcaneal taping group. Entering those sample sizes and the posttreatment pain VAS means and SDs for the 2 groups into the PEDro Confidence Interval Calculator spreadsheet (FIGURE 1), we find that the 95% CI is 2.2 to 4.9. We conclude from this CI that the true treatment effect size in the target population is no less than 2.2 cm on the pain VAS, and no greater than 4.9 cm. Inasmuch as the MCID (3.0 cm) is not excluded by the 95% CI, we cannot rule out a trivial treatment effect in the target population. This is because the study results are compatible with true treatment effects as small as 2.3, 2.5, or 2.7 (etc), which are all smaller than the MCID and are therefore not clinically meaningful. This analysis does not change the fact that a statistically significant treatment effect was found favoring the experimental treatment, nor does it change the fact that the best estimate33 of the population treatment effect (3.5 cm) is clinically meaningful. Rather, this illustration demonstrates the imprecision inherent in studies with small sample sizes. It also suggests that additional evidence with larger samples and correspondingly greater precision (less variability) is required before we consider this finding definitive.38
Adequate precision to rule out a trivial treatment effect in a positive trial is illustrated in a study of radial shock wave therapy for calcific tendinitis of the shoulder.14 Posttreatment pain VAS scores (mean ± SD) were significantly better (P = .004) in the treatment group (0.90 ± 0.99) than in the control group (5.85 ± 2.23). The between-group difference was 4.96 cm (95% CI: 4.23 to 5.67). If we accept the MCID value of 3.0 cm for the pain VAS,55 we consider this study to be convincing evidence for a clinically meaningful benefit of treatment, inasmuch as the study results suggest an average treatment effect no less than 4.23 cm in the target population. In other words, the trial is definitive for this outcome, because the 95% CI around the point estimate for the treatment effect excludes the MCID.
Interpretations of Apparently Negative Trials: MCID, Effect Size, and CI Limits
A clinical trial is termed “negative” when we fail to reject the null hypothesis. In a negative trial, authors conclude that the results are not statistically significant and that the experimental treatment is no more effective than the comparison. Guyatt et al38 use the phrase “apparently negative trial” to communicate the idea that critical appraisal requires an evidence-based practitioner to be wary of results from negative trials unless adequate statistical power can be demonstrated. The danger is that an underpowered trial might fail to find statistical significance in sample data even when there is a meaningful benefit of treatment in the target population (a type II error). Authors will frequently attempt to address this issue by revealing details of the statistical power analysis used to estimate the required sample size before the study was conducted. This approach is unsatisfying in part because a priori power computations require estimations of variance that may or may not reflect the observed variance in sample data. Guyatt et al38 suggest a different method for determining whether a negative trial has sufficient statistical power. Here again we consider the 95% CI around the point estimate of the raw effect size, to determine whether the MCID is within that interval. If the MCID is within the 95% CI, then we cannot rule out at a 95% level of confidence that the true population treatment effect might be clinically meaningful (greater than MCID), even though the authors failed to reject the null hypothesis.50 This circumstance would reveal inadequate statistical power in the study, suggesting that we should not accept any conclusion that the treatment is ineffective. On the other hand, if the MCID is excluded from the 95% CI, then we are 95% confident that there is no clinically meaningful benefit of treatment in the population. This holds even if the true magnitude of the treatment effect is at the limit of the CI suggesting the largest between-group difference. Guyatt et al38 characterize a negative trial in which the 95% CI excludes the MCID as “definitely negative.” A reader critically appraising a negative trial in which the 95% CI around the treatment effect excludes the MCID can be confident that the failure to find a statistically significant difference is not attributable to a type II error. In other words, if precision in the study is sufficient for the 95% CI to exclude the MCID, the study has adequate statistical power to detect a clinically meaningful difference if one exists in the target population.
Authors in a recent RCT23 found no statistically significant difference (P = .33) for knee flexion range of motion outcomes among 3 groups:
(1) a control group receiving no time on a continuous passive motion (CPM) machine;
(2) a treatment group receiving CPM treatments of 35 minutes' duration once daily; and
(3) another treatment group receiving CPM treatments of 2 hours' duration once daily.
The authors considered 10° to be the MCID for this outcome. FIGURE 2 provides a graphical display of 95% CIs for each of the 3 between-group comparisons at the time of discharge from hospital. The 2 dotted vertical lines represent MCIDs of 10° favoring either of the 2 groups for each plotted comparison. The solid vertical line represents the null value (0°) for the between-group differences. The 95% CIs around each of the between-group effect sizes are represented by horizontal lines with vertical anchors at each end, reflecting upper and lower limits of the CIs. Each 95% CI includes the null value, suggesting no statistically significant differences, a finding consistent with results from the traditional hypothesis test (P = .33). However, only 1 of the three 95% CIs excludes the MCID. Therefore, statistical power in this study was adequate to rule out a clinically meaningful treatment effect in the target population for 1 between-group comparison (CTL-EXP1); but the study power was insufficient to
rule out a small but potentially meaningful difference for 2 of the 3 between-group comparisons.
[FIGURE 2 graphic: knee flexion amplitude (deg) for groups CTL, EXP1, and EXP2. The lower panel plots 95% CIs for between-group differences against the null value (0°) and MCIDs (–10° and 10°): CTL–EXP1, –5.8° to 9.2°; CTL–EXP2, –10.3° to 4.5°; EXP1–EXP2, –12.1° to 2.9°.]
FIGURE 2. Results from a negative trial showing 95% confidence intervals in relation to the minimal clinically important difference. From Denis M, Moffet H, Caron F, Ouellet D, Paquet J, Nolet L. Effectiveness of continuous passive motion and conventional physical therapy after total knee arthroplasty: a randomized clinical trial. Phys Ther. 2006 Feb;86(2):174-85. Modified with permission.
Adequate precision and statistical power are illustrated in a negative trial comparing arthroscopy to placebo arthroscopy in patients with knee osteoarthritis.63 Authors determined the MCID for the pain subscale of the Arthritis Impact Measurement Scales (AIMS) to be 10 points. At the 6-week follow-up measurement, the average pain score for the arthroscopy with debridement group (49.9 ± 23.3) was not significantly different (P = .85) from the placebo group (50.8 ± 23.2). The difference between means was 0.9 (95% CI: –7.7 to 9.4). Therefore, the largest treatment effect favoring arthroscopy in the target population consistent with results from this study would be 9.4 points on the AIMS pain subscale: a trivial difference. Given that the MCID was excluded from the 95% CI, we conclude that the study had adequate precision and sufficient statistical power to have found a clinically meaningful difference, if one existed, in the target population. This interpretation is the same as that expressed by the authors: “If the 95 percent confidence interval around the estimated size of the effect does not include the minimal important difference, one can reject the hypothesis that the arthroscopic procedures have a small but clinically important benefit.”63
Results for Dichotomous Outcomes: Risk Reduction and Number Needed to Treat
Although authors of clinical trials in physical therapy most often select continuous outcome variables, there are many important naturally dichotomous outcomes that should be included in studies of orthopaedic and sports physical therapy. Dichotomous outcomes are those that patients either experience or do not experience. Examples are recurrent dislocations, failure to achieve complete recovery, failure to return to competition, recurrence of low back pain, receiving injections, and subsequent surgery. The statistical methods for analyzing dichotomous outcomes quantify reduction in risk. For that reason, dichotomous outcomes are usually operationalized as negative outcomes (numbers of patients who did have a recurrent dislocation, patients who were not able to return to sport, etc). Important continuous scale outcomes can be dichotomized using the MCID to report numbers of patients who achieve or fail to achieve clinically meaningful improvements in motion, strength, pain reduction, etc.65 For example, Clegg et al18 dichotomized their primary outcome: the WOMAC pain subscale, with raw scores ranging from 0 to 500. The authors dichotomized this scale by reporting the percentages of patients in each study group who achieved at least 20% improvement after treatment. This cut score is the MCID recommended by developers of the WOMAC.7 Results for dichotomous outcomes can be reported as odds ratios61 but are frequently reported as absolute risk reduction (ARR), relative risk reduction (RRR), and number needed to treat (NNT).65
Deyle et al25 reported the number of patients who had knee replacement surgery by the time of a 1-year follow-up in each of 2 groups with knee osteoarthritis. In the placebo group, 8 of 41 patients (20%) had surgery, compared to only 2 of 42 patients (5%) receiving manual therapy and exercise. The ARR is the difference between these 2 proportions: 20% – 5% = 15% (95% CI: 1% to 28%). The RRR is the reduction in risk relative to that in the comparison group: (20% – 5%) ÷ 20% = 75% (95% CI: 5% to 100%). The NNT is computed by taking the reciprocal of the ARR: 1.0 ÷ 0.15 ≈ 6.7, which is rounded up to 7 (95% CI: 4 to 105). Reporting results in this way reveals the following. The risk of needing surgery within 1 year was 20% in the placebo group. Providing manual therapy and exercise reduced that risk by 15% in absolute terms and by 75% in relative terms. The wide 95% CIs around the point estimates reveal considerable imprecision in the results. The principles discussed above for appraising “apparently” positive and negative trials apply equally to assessing dichotomous outcomes. However, there is no MCID to compare with point estimates and CIs for mean effect sizes. Instead, a clinical judgment is required (depending on multiple considerations of context) to determine the minimal clinically important amount of risk reduction. That amount is then compared with the point estimates and associated CIs for ARR, RRR, and NNT. For example, if a clinician considers a 5% RRR for needing total knee arthroplasty to be clinically meaningful, the evidence from Deyle et al25 would be considered “definitive.” On the other hand, suppose a clinician judges a 30% RRR to be minimally clinically meaningful. The point estimate from Deyle et al25 (75% RRR) would then be considered promising, but the wide CI around that treatment effect would lead the clinician to seek additional evidence, perhaps from a larger trial with greater precision.
The NNT is defined as the number of patients who would need to be treated, on average, to prevent 1 bad outcome or achieve 1 desirable outcome in a given period of time.54 Therefore, when a low NNT is associated with a treatment, this indicates that relatively few patients need to receive this treatment in order to avoid 1 bad outcome. Accordingly, NNT values are used as a measure of treatment effectiveness and are helpful in cost-benefit calculations. However, NNT values alone are not sufficient to determine whether an intervention approach should be implemented. Patient values and preferences, the severity of the outcome that would be avoided, and the cost and side effects of the intervention are important determinants that should be considered when making treatment decisions. Thus, the threshold NNT will almost certainly be different for different patients, and there is no simple answer to the question of when an NNT is sufficiently low to justify a treatment. TABLE 2 lists several physical therapy-related interventions with associated outcomes, NNTs, and 95% CI values.
TABLE 2
Examples of Number Needed to Treat (NNT) Values for Various Physical Therapy-Related Interventions*
Clinical Question | NNT | 95% Confidence Interval
How effective is early cardiac rehabilitation on health-related quality-of-life score in patients experiencing a cardiovascular incident? (Comparison treatment: usual care)66 | 6 | 3 to 21
How effective is vitamin D supplementation in preventing falls in ambulatory or institutionalized older adults? (Comparison treatment: calcium or placebo)12 | 15 | 8 to 53
How effective is a multidisciplinary intensive diabetes education program on improving glycemic control or decreasing diabetes-related distress in patients with diabetes? (Comparison group: standard care)52 | 1.8 | 1.5 to 2.4
How effective is adding 3 stretching sessions to a typical weekly infantry training program on reducing incidence of overuse injury in military basic trainees? (Comparison treatment: typical infantry training)43 | 8 | 4.6 to 33.9
How effective is range-of-motion exercise and joint mobilization on improving wrist extension following Colles fracture? (Comparison treatment: home exercise program)83 | 2.3 | 2 to 17
How effective is combined cervico-thoracic manipulation and exercise therapy in reducing headache frequency in patients with persistent headache? (Comparison treatment: self-care instruction)51 | 1.9 | 1 to 3
How effective is a stabilization program in decreasing pain and disability in patients with low back pain who are categorized as being hypermobile in the lumbar spine? (Comparison treatment: lumbopelvic manipulation)32 | 1.6 | 1.2 to 10.2
How effective is combined manual physical therapy, exercise, and unloaded treadmill walking on perceived recovery for patients with lumbar spinal stenosis? (Comparison treatment: flexion exercises and treadmill walking program)84 | 2.6 | 1.8 to 7.8
How effective is combined manual physical therapy and exercise for avoiding total knee arthroplasty surgery up to 1 year posttreatment? (Comparison treatment: placebo ultrasound)25 | 7 | 4 to 105
* Results are presented without regard for levels of evidence or the extent to which the referenced studies were protected against validity threats. These factors varied widely among the studies cited.
Synthesized Results From Multiple Clinical Trials: Systematic Reviews
In spite of language used above to characterize results from a qualifying clinical trial as “definitive,” a single trial will rarely provide final or completely conclusive evidence for treatment effectiveness. This is why multiple RCTs with consistent results provide stronger evidence than a single RCT. Consequently, a single systematic review with homogeneity of results from multiple RCTs provides a higher level of evidence (level 1a) than a single RCT with good protections against validity threats (level 1b).2
Systematic reviews are at the top of the evidence hierarchy because they typically use meta-analysis methods, when appropriate, to synthesize evidence from multiple individual clinical trials.38 In this way, results from the overall body of best evidence, filtered and selected by explicit methodological quality criteria, are synthesized to provide an overall estimate of treatment effectiveness. Meta-analysis methods allow pooling of sample sizes from among included studies, resulting in substantial advantages:
(1) increased statistical ability to detect significant treatment effects; and
(2) enhanced precision in estimates of effect sizes, reflected in narrower CIs around point estimates.
FIGURE 3. Forest plot demonstrating presentation of results from a meta-analysis in a systematic review (Hagen, 2004). From Hagen KB, Hilde G, Jamtvedt G, Winnem M. Bed rest for acute low-back pain and sciatica. Cochrane Database of Systematic Reviews. 2004; Issue 4. Art. No.: CD001254. Reproduced with permission. Copyright © 2004 John Wiley & Sons Ltd.
Results of meta-analyses are typically presented in forest plots. FIGURE 3 shows a forest plot representing the simplest case: a meta-analysis based on only 2 original studies. Note that results from individual trials are represented by point estimates (squares in this example), with horizontal lines representing the CIs. Effect sizes for continuous scale outcomes in a meta-analysis can be expressed as a weighted mean difference (WMD). A WMD retains the original units when all studies use the same outcome measure. Alternatively, they can be expressed as a standardized mean difference, which transforms results to a common, normalized scale when studies use different measures. For dichotomous outcomes, effect sizes in a meta-analysis are typically reported as relative risks or odds ratios. The null value for the treatment effect (0 for mean differences; 1 for relative risks and odds ratios) is represented as a central vertical line in a forest plot. Point estimates for effect sizes plotted on one side of the vertical reference line favor the experimental treatment; points plotted on the other side of the line favor the comparison. If the CI around the point estimate crosses the vertical line, results are not statistically significant because those results are consistent with no treatment effect in the target population. FIGURE 3 illustrates results from a systematic review39 for an outcome of pain intensity at 12 weeks, comparing patients treated with bed rest to patients who stayed active. Two RCTs were included in the meta-analysis. One showed a statistically significant effect favoring the recommendation to stay active, and the other showed no statistically significant difference between groups. Without meta-analysis, the overall accumulation of evidence might appear to be equivocal, with one study suggesting benefit and another suggesting no benefit. The synthesized result pooling data in a meta-analysis from both studies is represented by the diamond shape labeled “subtotal” in FIGURE 3. This meta-analysis result from
aggregated evidence reveals a statistically significant benefit in favor of the recommendation to stay active. This is quite a different conclusion from the equivocal judgment suggested by a simple count of positive trials versus negative trials.
Synthesized Results From Multiple Clinical Trials: Clinical Practice Guidelines
Clinical practice guidelines integrate synthesized evidence with broader cultural, societal, and patient-interest considerations. Results in practice guidelines come in the form of recommendations supported by specified levels of evidence. Readers performing a critical appraisal of a practice guideline should determine the method used by panel members to grade treatment recommendations, and then consider the relative strength of each recommendation. A common scheme for grading recommendations in clinical practice guidelines is reproduced in TABLE 3.
TABLE 3
Grading Scheme for Treatment Recommendations in a Clinical Practice Guideline
Grade of Recommendation* | Clarity of Risk/Benefit | Methodological Strength of Supporting Evidence | Implications
1A | Clear | RCTs without important limitations | Strong recommendation; can apply to most patients in most circumstances without reservation
1B | Clear | RCTs with important limitations (inconsistent results, methodological flaws†) | Strong recommendation, likely to apply to most patients
1C+ | Clear | No RCTs directly addressing the question, but results from closely related RCTs can be unequivocally extrapolated, or evidence from observational studies may be overwhelming | Strong recommendation; can apply to most patients in most circumstances
1C | Clear | Observational studies | Intermediate-strength recommendation; may change when stronger evidence is available
2A | Unclear | RCTs without important limitations | Intermediate-strength recommendation; best action may differ depending on circumstances or patients' or societal values
2B | Unclear | RCTs with important limitations (inconsistent results, methodological flaws) | Weak recommendation; alternative approaches likely to be better for some patients under some circumstances
2C | Unclear | Observational studies | Very weak recommendations; other alternatives may be equally reasonable
Abbreviation: RCT, randomized controlled trial.
* Since grade B and C studies are flawed, it is likely that most recommendations in these classes will be level 2. Several considerations will bear on whether the recommendation is grade 1 or 2. These include the magnitude and precision of the treatment effect; patients’ risk of the target event being prevented; the nature of the benefit and the magnitude of the risk associated with treatment; variability in patient preferences; variability in regional resource availability and health care delivery practices; and cost considerations. Inevitably, weighing these considerations involves subjective judgment.
† These situations include RCTs with both lack of blinding and subjective outcomes, where the risk of bias in measurement of outcomes is high, and RCTs with large loss to follow-up.
Adapted with permission from Guyatt G, Hayward R, Richardson WS, et al. Moving from evidence to action. In Guyatt G, Rennie D. User’s Guide to the Medical Literature: Essentials of Evidence-Based Practice. Chicago: American Medical Association, 2002.
STEP 3C. CRITICALLY APPRAISING THE LITERATURE: HOW CAN I APPLY THE RESULTS TO PATIENT CARE?
The final question in a critical appraisal of evidence involves a series of deliberate judgments about the relevance and applicability of the evidence to a specific patient in the context of a specific clinical setting. An evidence-based practitioner will need to decide whether the patient under consideration is sufficiently similar to the patients in the study or group of studies for the results to be relevant. For example, the clinician should determine whether the patients enrolled in the study were similar to his/her own patient. Relevant factors include the inclusion and exclusion criteria, age, gender, race, sociodemographics, stage of illness, comorbidity and disability status, and prognosis. Next, the practitioner must integrate patient values, preferences, and expectations in shared decision making when selecting a particular treatment. Also, the evidence will be relevant to a given patient only if outcomes measured in the clinical trial are consistent with the individual patient’s goals. Consideration must be given to whether the treatment as structured in the research study is acceptable to the patient. Many issues must be considered:
- anticipated frequency and duration of patient visits;
- cost of the treatment;
- possible discomfort or other adverse effects of the intervention of interest and of competing interventions (such as injections, surgery, or other noninvasive interventions); and
- how consistent the treatment is with patient expectations.
This final question also prompts the practitioner to integrate personal clinical expertise. Some treatments require specialty skills or specific equipment that may not be currently available and may
not be obtainable in a reasonable amount of time to help a particular patient.
Critical appraisal is an essential skill for an evidence-based practitioner. Although applying the principles outlined above for critical appraisal may be difficult to master initially, the process becomes much easier with practice. Critical appraisal using these principles is the best method to facilitate independent professional judgments about the validity, strength, and relevance of evidence for therapy. A checklist to organize key judgments during a critical appraisal of an RCT is included in APPENDIX A.
STEP 4. INCORPORATING EVIDENCE INTO CLINICAL PRACTICE
Once it has been determined through critical appraisal that a particular study or group of studies provides valid, applicable evidence that a treatment yields clinically meaningful benefits, the clinician should integrate the evidence into clinical practice. If a given patient is reasonably similar to those in the study, a clinician should be able to integrate valid evidence with considerable confidence. However, any given patient will have a unique set of prognostic attributes. Clinicians must recognize that treatments typically are not uniformly effective, inasmuch as reported results are for average treatment effects.10 This is another reason why the clinician must integrate the best available evidence with clinical expertise and the goals, values, and expectations of the patient when determining which interventions are preferable for a particular individual.
Many perceived barriers may prevent successful integration of EBP into physical therapist practice.47,58 One barrier is excessive reliance on clinical expertise, which can be associated with failure to acknowledge and incorporate current best evidence into clinical practice. Expertise in physical therapist practice has been described as possession of professional values, decision-making processes, communication styles or skills, specialty certifications, and years of practice in physical therapy.47,75 A study by Childs and colleagues16 found that experienced physical therapists with orthopaedic or sports certifications demonstrate greater knowledge in managing musculoskeletal conditions than therapists without specialty certification. Despite these findings, one cannot infer that patients cared for by expert clinicians will achieve superior outcomes when compared to the outcomes of patients treated by novice clinicians.71,85 In fact, it has been demonstrated that expert clinicians are often resistant to changing their practice behaviors even when their treatment approaches have been disproven.5 Hence, while clinical expertise is important, it is insufficient to ensure optimal outcomes. Reliance on clinical experience without including knowledge and application of evidence to clinical care is inconsistent with the principles of EBP.16,85 Therefore, seeking and incorporating the best available evidence should be an integral part of the clinical decision-making process.
Instituting behavior change among practicing clinicians is one of the foremost barriers to successful integration of EBP.17,36,85 While some clinicians are quick to adopt change, many others are unfortunately resistant to change and rely predominantly on their clinical experience rather than incorporating evidence into their practice.9 The volume and quality of emerging evidence in many areas of physical therapist practice are mounting rapidly. Even so, we acknowledge that there are still many areas where evidence is sparse and inconclusive. In these instances, clinicians should not wait for “perfect evidence.” They should act on the research evidence that is currently available and then use patient-centered outcomes tools to determine which interventions are effective for a particular patient and which are not.70 Critical appraisals for lower-level evidence, such as cohort studies, case series, and case reports, can be performed using the same principles outlined above and in part I of this series. However, when appraising lower-level evidence, it becomes immediately apparent that unprotected validity threats in these types of studies permit substantial bias and severely limit confidence in reported results. The hierarchy of evidence does not exclude expert opinion (level 5 evidence), but opinion should be considered best evidence only with specific knowledge that higher-level evidence does not yet exist. Finally, it should be recognized that higher levels of evidence, such as systematic reviews, might conclude that there is currently insufficient evidence to support one intervention option over another. In these instances, treatment decisions based on clinician expertise and experience may in fact be the most appropriate form of guidance to inform clinical decision making, although these are lower forms of evidence in most evidence hierarchies.
Consider how knowledge of current best evidence, combined with critical appraisal skills, can guide clinical decision making in the following case. A 74-year-old female with a history of spinal stenosis and cardiovascular disease indicated that she developed her most recent bout of low back pain 3 weeks previously, after injuring her back while playing with her great-granddaughter. Her Modified Low Back Pain Disability Questionnaire score was 20%. She indicated that her goals were to complete household activities without making her back pain worse and to be able to play with her great-granddaughter in 2 weeks. The most impressive findings from the physical exam include generalized stiffness and loss of motion in both hips and the lumbar spine in flexion. In consultation with the patient, you indicate that her goals seem realistic. You also indicate that you wish to reassess her Modified Low Back Pain Disability score in 2 weeks and expect her to demonstrate at least a 6-point change. Your intervention strategy includes patient education, joint mobilization to the hips and lumbar spine, and implementation of a body weight-supported walking program.
This patient case illustrates several important issues. Although this patient has 2
potentially negative prognostic factors—a Recognition of personal and profession- evidence (ie, peer-reviewed, published lit-
history of recurrent back pain and cardio- al limitations can be difficult and may result erature), while accounting for differences
vascular disease—her modified low back in avoidance of the issues, regardless of the between clinical and research settings and
pain disability score of 20 indicates a mild internal drive and motivation of the thera- contexts. It is ultimately through these qual-
level of disability. Because the MCID for pist.27 Developing competence in the EBP ity measurement processes and account-
the Modified Low Back Pain Disability process will require clinicians to acknowl- ability to EBP principles that therapists
Questionnaire is 6 points,31 this is chosen edge times of uncertainty and the need become clinicians of excellence.9
as the quantitative goal that seems to best for gathering information. Competence
match those described by the patient. The includes self-awareness on behalf of the SUMMARY
intervention strategy is based on a recently therapist and the ability to recognize per-
published clinical trial by Whitman and sonal limitations, which can be very diffi-
D
etermining the source, valid-
colleagues84 that used a program of patient cult. Straus et al81 have developed a series of ity, strength, and relevance of evi-
education, body-weight supported tread- questions (APPENDIX B) to facilitate introspec- dence for treatment decisions re-
mill training, and joint mobilization to the tive self-evaluation for the evidence-based quires successful integration of the EBP
spine and hip joints. The typical subjects in practitioner. Therapists should reflect sub- process. The goal of EBP is to improve
the clinical trial were women with an aver- jectively on their ability to proceed through efficiency and assist clinicians in selecting
age age of 69 years and a baseline Modi- the first 4 steps, but should also assess pa- interventions that will maximize patient
fied Oswestry score of 36, which seem to tient outcomes objectively and formally in outcomes rather than erroneously select-
closely match the characteristics of this the context of best available evidence. Phys- ing interventions with little or no demon-
patient. In addition, the average modified ical therapists should use reliable and valid strated effectiveness.56
low back pain disability score reduction at outcome measures for every patient they see The identification of appropriate fore-
6 weeks of the intervention program was in clinical practice to ascertain if true and ground questions, performing literature
approximately 10 points. Therefore, the clinically meaningful changes in patient sta- searches, critically analyzing the best
goal of a 6-point change in 2 weeks seems tus occurred (ie, did patient improvements available evidence, applying the best evi-
realistic. As discussed in a previous sec- exceed the outcome scale’s MDC and MCID dence to clinical practice, and ultimately
tion, however, MCIDs that are established scores). The data obtained through the use assuring the proficiency of the process
based on group data can be misleading if of valid and reliable outcomes tools, along will ultimately lead to optimal care for
applied to individual patients. Therefore, with the self-evaluation of effectiveness and our patients. Developing proficiency in the
a more conservative approach of establish- efficiency with the 4 steps, will enhance 5-step process to EBP requires strong ded-
ing goals that exceed the MCID threshold clinical practice. Clinicians may find it help- ication and effort from students as well as
might be a better guideline to ensure that ful to read one of the several case studies or practicing therapists, and at times can be
self-report measures represent true clini- case series where clinicians provide detailed quite challenging. However, as healthcare
cally important change. description of applying current best evi- providers, therapists should approach the
dence in managing patients with a variety challenge of successful integration of EBP
STEP 5. EVALUATING PERFOR- of conditions. For example, MacDonald and with enthusiasm, as the overall goal is to
MANCE ON STEPS 1 THROUGH 4 colleagues57 reported on the management provide the best quality of care and maxi-
of a series of patients with hip dysfunction mize positive outcomes for their patients.
who responded positively to novel manual They should embrace and not retreat from
A
lthough most of this commen-
tary addresses critical appraisal of therapy interventions. Similarly, Cleland the challenge of integrating the best avail-
evidence, this fifth and final step in et al21 and Waldrop82 have published case able evidence, clinical expertise, and pa-
the process of achieving successful imple- series that apply recently developed clinical tient values into clinical decisions for each
mentation of EBP is arguably the most im- prediction rules to patients. individual patient.37 T
portant. Self-assessment of practice begins As proposed by Flynn and colleagues,30 REFERENCES
as a student in the form of self-observation the use of minimal data collection forms
1. Physiotherapy Evidence Database (PEDro).
and judgmental processing and should that include key examination findings and Available at: http://www.pedro.fhs.usyd.edu.au/
continue through one’s professional ca- appropriate patient-centered outcome mea- calculator.html . Accessed July 11, 2008
reer.64 The skills of self-awareness assist sures will allow students as well as practic- 2. Levels of evidence and grades of recommen-
dations. Available at: http://www.cebm.net/
clinicians in identifying personal strengths ing clinicians to monitor their individual
levels_of_evidence.asp. Accessed July 11, 2008.
as well as limitations.27 It is with reflective clinical performance. With this informa- 3. Altman DG. Confidence intervals. In: Straus SE,
practice that physical therapists will refine tion, clinicians can compare average patient Richardson WS, Glasziou P, Haynes RB, eds.
their efficiency with integrating the best improvements in clinical settings to average Evidence-based Medicine: How to Practice and
Teach EBM. Edinburgh, UK: Churchill Living-
available evidence into clinical practice. patient improvements in the current best
stone; 2005. 6-32 into clinical practice. J Orthop Sports Phys Ther.
4. Angst F, Aeschlimann A, Stucki G. Smallest 17. Choudhry NK, Fletcher RH, Soumerai SB. Sys- 2006;36:577-587. http://dx.doi.org/10.2519/
detectable and minimal clinically important tematic review: the relationship between clinical jospt.2006.2159
differences of rehabilitation intervention experience and quality of health care. Ann Intern 31. Fritz JM, Irrgang JJ. A comparison of a modified
with their implications for required sample Med. 2005;142:260-273. Oswestry Low Back Pain Disability Question-
sizes using WOMAC and SF-36 quality of life 18. Clegg DO, Reda DJ, Harris CL, et al. Glu- naire and the Quebec Back Pain Disability Scale.
measurement instruments in patients with cosamine, chondroitin sulfate, and the two in Phys Ther. 2001;81:776-788.
osteoarthritis of the lower extremities. Arthri- combination for painful knee osteoarthritis. N 32. Fritz JM, Whitman JM, Childs JD. Lumbar spine
tis Rheum. 2001;45:384-391. http://dx.doi. Engl J Med. 2006;354:795-808. http://dx.doi. segmental mobility assessment: an examination
org/10.1002/1529-0131(200108)45:4<384::AID- org/10.1056/NEJMoa052771 of validity for determining intervention strate-
ART352>3.0.CO;2-0 19. Cleland JA, Fritz JM, Whitman JM, Palmer JA. gies in patients with low back pain. Arch Phys
5. Antman EM, Lau J, Kupelnick B, Mosteller F, The reliability and construct validity of the Neck Med Rehabil. 2005;86:1745-1752. http://dx.doi.
Chalmers TC. A comparison of results of meta- Disability Index and patient specific functional org/10.1016/j.apmr.2005.03.028
analyses of randomized control trials and rec- scale in patients with cervical radiculopa- 33. Gardner MJ, Altman DG. Statistics With Confi-
ommendations of clinical experts. Treatments thy. Spine. 2006;31:598-602. http://dx.doi. dence. London, UK: BMJ Books; 2005.
for myocardial infarction. JAMA. 1992;268:240- org/10.1097/01.brs.0000201241.90914.22 34. Gerber JP, Marcus RL, Dibble LE, Greis PE,
248. 20. Cleland JA, Glynn P, Whitman JM, Eberhart Burks RT, Lastayo PC. Safety, feasibility, and
6. Bandy W. Use of statistics in physical therapy SL, MacDonald C, Childs JD. Short-term ef- efficacy of negative work exercise via eccentric
over a 2-year period 2000-2002: implications fects of thrust versus nonthrust mobilization/ muscle activity following anterior cruciate liga-
for educators. J Phys Ther Ed. 2006;17:67-70. manipulation directed at the thoracic spine in ment reconstruction. J Orthop Sports Phys
7. Barr S, Bellamy N, Buchanan WW, et al. A patients with neck pain: a randomized clinical Ther. 2007;37:10-18. http://dx.doi.org/10.2519/
comparative study of signal versus aggregate trial. Phys Ther. 2007;87:431-440. http://dx.doi. jospt.2007.2362
methods of outcome measurement based on the org/10.2522/ptj.20060217 35. Greenfield ML, Kuhn JE, Wojtys EM. A statistics
WOMAC Osteoarthritis Index. Western Ontario 21. Cleland JA, Whitman JM, Fritz JM, Palmer JA. primer. Confidence intervals. Am J Sports Med.
and McMaster Universities Osteoarthritis Index. Manual physical therapy, cervical traction, and 1998;26:145-149.
J Rheumatol. 1994;21:2106-2112. strengthening exercises in patients with cervical
36. Grimshaw JM, Shirran L, Thomas R, et al.

8. Beaton DE, Boers M, Wells GA. Many faces radiculopathy: a case series. J Orthop Sports Changing provider behavior: an overview of
of the minimal clinically important difference Phys Ther. 2005;35:802-811. http://dx.doi. systematic reviews of interventions. Med Care.
(MCID): a literature review and directions org/10.2519/jospt.2005.2077 2001;39:II2-45.
for future research. Curr Opin Rheumatol. 22. Cohen L. Statistical Power Analysis for the 37. Guyatt G, Rennie D. User’s Guide to the Medical
2002;14:109-114. Behavioral Sciences. 2nd ed. Hillsdale, NJ: Law- Literature: Essentials of Evidence-Based Prac-
9. Berwick DM. Disseminating innovations in rence Erlbaum Associates; 1988. tice. Chicago, IL: American Medical Association;
health care. JAMA. 2003;289:1969-1975. http:// 23. Denis M, Moffet H, Caron F, Ouellet D, Paquet 2002.
dx.doi.org/10.1001/jama.289.15.1969 J, Nolet L. Effectiveness of continuous passive 38. Guyatt G, Walter S, Cook D, Jaeschke R. Therapy
10. Bhandari M, Haynes RB. How to appraise motion and conventional physical therapy after and understanding the results: confidence inter-
the effectiveness of treatment. World J Surg. total knee arthroplasty: a randomized clinical vals. In: Guyatt G, Rennie D, eds. User’s Guide to
2005;29:570-575. http://dx.doi.org/10.1007/ trial. Phys Ther. 2006;86:174-185. the Medical Literature: Essentials of Evidence-
s00268-005-7915-9 24. Deyle GD, Allison SC, Matekel RL, et al. Physical Based Practice. Chicago, IL: American Medical
11. Binkley JM, Stratford PW, Lott SA, Riddle DL. therapy treatment effectiveness for osteoarthri- Association; 2002:
The Lower Extremity Functional Scale (LEFS): tis of the knee: a randomized comparison of 39. Hagen KB, Hilde G, Jamtvedt G, Winnem M.
scale development, measurement properties, supervised clinical exercise and manual therapy Bed rest for acute low-back pain and sciatica.
and clinical application. North American Ortho- procedures versus a home exercise program. Cochrane Database Syst Rev. 2004;CD001254.
paedic Rehabilitation Research Network. Phys Phys Ther. 2005;85:1301-1317. http://dx.doi.org/10.1002/14651858.CD001254.
Ther. 1999;79:371-383. 25. Deyle GD, Henderson NE, Matekel RL, Ryder MG, pub2
12. Bischoff-Ferrari HA, Dawson-Hughes B, Willett Garber MB, Allison SC. Effectiveness of manual 40. Hagg O, Fritzell P, Nordwall A. The clinical im-
WC, et al. Effect of Vitamin D on falls: a meta- physical therapy and exercise in osteoarthritis portance of changes in outcome scores after
analysis. JAMA. 2004;291:1999-2006. http:// of the knee. A randomized, controlled trial. Ann treatment for chronic low back pain. Eur Spine
dx.doi.org/10.1001/jama.291.16.1999 Intern Med. 2000;132:173-181. J. 2003;12:12-20. http://dx.doi.org/10.1007/
13. Butcher SJ, Craven BR, Chilibeck PD, Spink 26. Domholdt E. Physical Therapy Research. 2nd s00586-002-0464-0
KS, Grona SL, Sprigings EJ. The effect of trunk ed. Philadelphia, PA: W.B. Saunders Co; 2000. 41. Hale SA, Hertel J, Olmsted-Kramer LC. The ef-
stability training on vertical takeoff velocity. 27. Epstein RM. Mindful practice. JAMA. fect of a 4-week comprehensive rehabilitation
J Orthop Sports Phys Ther. 2007;37:223-231. 1999;282:833-839. program on postural control and lower extremity
http://dx.doi.org/10.2519/jospt.2007.2331 28. Farrar JT, Young JP, Jr., LaMoreaux L, Werth JL, function in individuals with chronic ankle insta-
14. Cacchio A, Paoloni M, Barile A, et al. Effective- Poole RM. Clinical importance of changes in bility. J Orthop Sports Phys Ther. 2007;37:303-
ness of radial shock-wave therapy for calcific chronic pain intensity measured on an 11-point 311. http://dx.doi.org/10.2519/jospt.2007.2322
tendinitis of the shoulder: single-blind, random- numerical pain rating scale. Pain. 2001;94:149- 42. Hall T, Chan HT, Christensen L, Odenthal B,
ized clinical study. Phys Ther. 2006;86:672-682. 158. Wells C, Robinson K. Efficacy of a C1-C2 self-
15. Childs JD, Piva SR, Fritz JM. Responsiveness of 29. Fidler F, Thomason N, Cumming G, Finch S, sustained natural apophyseal glide (SNAG) in
the numeric pain rating scale in patients with Leeman J. Editors can lead researchers to confi- the management of cervicogenic headache. J
low back pain. Spine. 2005;30:1331-1334. dence intervals, but can’t make them think: sta- Orthop Sports Phys Ther. 2007;37:100-107.
16. Childs JD, Whitman JM, Sizer PS, Pugia ML, tistical reform lessons from medicine. Psychol 43. Hartig DE, Henderson JM. Increasing hamstring
Flynn TW, Delitto A. A description of physical Sci. 2004;15:119-126. flexibility decreases lower extremity overuse
therapists’ knowledge in managing musculosk- 30. Flynn TW, Wainner RS, Fritz JM. Spinal manipu- injuries in military basic trainees. Am J Sports
eletal conditions. BMC Musculoskelet Disord. lation in physical therapist professional degree Med. 1999;27:173-176.
2005;6:32. http://dx.doi.org/10.1186/1471-2474- education: A model for teaching and integration 44. Herbert RD. How to estimate treatment effects
from reports of clinical trials. II: Dichotomous high-quality evidence on therapy. Phys Ther. 74. Schmitt JS, Di Fabio RP. Reliable change and
outcomes. Aust J Physiother. 2000;46:309-313. 2004;84:644-654. minimum important difference (MID) propor-
45. Hyland MR, Webber-Gaffney A, Cohen L, Licht- 59. Make B. How can we assess outcomes tions facilitated group responsiveness compari-
man PT. Randomized controlled trial of calca- of clinical trials: the MCID approach. sons using individual threshold criteria. J Clin
neal taping, sham taping, and plantar fascia COPD. 2007;4:191-194. http://dx.doi. Epidemiol. 2004;57:1008-1018. http://dx.doi.
stretching for the short-term management of org/10.1080/15412550701471231 org/10.1016/j.jclinepi.2004.02.007
plantar heel pain. J Orthop Sports Phys Ther. 60. Michener LA, McClure PW, Sennett BJ. Ameri- 75. Shepard KF, Hack LM, Gwyer J, Jensen GM.
2006;36:364-371. http://dx.doi.org/10.2519/ can Shoulder and Elbow Surgeons Standardized Describing expert practice in physical therapy.
jospt.2006.2078 Shoulder Assessment Form, patient self-report Qual Health Res. 1999;9:746-758.
46. Jaeschke R, Singer J, Guyatt GH. Measurement section: reliability, validity, and responsiveness. 76. Sim J, Reid N. Statistical inference by confi-
of health status. Ascertaining the minimal clini- J Shoulder Elbow Surg. 2002;11:587-594. http:// dence intervals: issues of interpretation and
cally important difference. Control Clin Trials. dx.doi.org/10.1067/mse.2002.127096 utilization. Phys Ther. 1999;79:186-195.
1989;10:407-415. 61. Moiler K, Hall T, Robinson K. The role of fibular 77. Stratford PW, Binkley FM, Riddle DL. Health
47. Jensen GM, Gwyer J, Shepard KF. Expert prac- tape in the prevention of ankle injury in basket- status measures: strategies and analytic meth-
tice in physical therapy. Phys Ther. 2000;80:28- ball: A pilot study. J Orthop Sports Phys Ther. ods for assessing change scores. Phys Ther.
43; discussion 44-52. 2006;36:661-668. http://dx.doi.org/10.2519/ 1996;76:1109-1123.
48. Jette DU, Bacon K, Batty C, et al. Evidence- jospt.2006.2259 78. Stratford PW, Binkley J, Solomon P, Finch E, Gill
based practice: beliefs, attitudes, knowledge, 62. Montori VM, Kleinbart J, Newman TB, et al. C, Moreland J. Defining the minimum level of
and behaviors of physical therapists. Phys Ther. Tips for learners of evidence-based medicine: detectable change for the Roland-Morris ques-
2003;83:786-805. 2. Measures of precision (confidence inter- tionnaire. Phys Ther. 1996;76:359-365; discus-
49. Johnson AJ, Godges JJ, Zimmerman GJ, vals). CMAJ. 2004;171:611-615. http://dx.doi. sion 366-358.
Ounanian LL. The effect of anterior versus org/10.1503/cmaj.1031667 79. Stratford PW, Binkley JM, Riddle DL, Guyatt
posterior glide joint mobilization on external 63. Moseley JB, O’Malley K, Petersen NJ, et al. GH. Sensitivity to change of the Roland-Morris
rotation range of motion in patients with shoul- A controlled trial of arthroscopic surgery Back Pain Questionnaire: part 1. Phys Ther.
der adhesive capsulitis. J Orthop Sports Phys for osteoarthritis of the knee. N Engl J Med.
1998;78:1186-1196.
2002;347:81-88. http://dx.doi.org/10.1056/
Ther. 2007;37:88-99. http://dx.doi.org/10.2519/

80. Stratford PW, Riddle DL, Binkley JM, Spadoni
jospt.2007.2307 NEJMoa013259
G, Westaway MD, Padfield B. Using the Neck
50. Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to 64. Musolino GM. Fostering reflective practice: self-
Disability Index to make decisions concerning
assess equivalence: the importance of rigorous assessment abilities of physical therapy stu-
individual patients. Physiother Can. 1999;51:107-
methods. BMJ. 1996;313:36-39. dents and entry-level graduates. J Allied Health.
112.
51. Jull G, Trott P, Potter H, et al. A randomized con- 2006;35:30-42.
81. Straus SE, Richardson WS, Glasziou P, Haynes
trolled trial of exercise and manipulative therapy 65. Newman D, Allison SC. Risk and physical thera-
RB. Evidence-Based Medicine. 3rd ed. Edin-
for cervicogenic headache. Spine. 2002;27:1835- py? J Orthop Sports Phys Ther. 2007;37:287-289.
burgh, UK: Elsevier/Churchill Livingstone; 2005.
1843; discussion 1843. http://dx.doi.org/10.2519/jospt.2007.0106
82. Waldrop MA. Diagnosis and treatment of cervi-
52. Keers JC, Groen H, Sluiter WJ, Bouma J, Links 66. Oldridge N, Perkins A, Marchionni N, Fumagalli
cal radiculopathy using a clinical prediction rule
TP. Cost and benefits of a multidisciplinary S, Fattirolli F, Guyatt G. Number needed to treat
and a multimodal intervention approach: a case
intensive diabetes education programme. J in cardiac rehabilitation. J Cardiopulm Rehabil.

series. J Orthop Sports Phys Ther. 2006;36:152-
Eval Clin Pract. 2005;11:293-303. http://dx.doi. 2002;22:22-30.
org/10.1111/j.1365-2753.2005.00536.x 67. O’Leary S, Jull G, Kim M, Vicenzino B. Specificity 159. http://dx.doi.org/10.2519/jospt.2006.2056
53. Koumantakis GA, Watson PJ, Oldham JA. Trunk in retraining craniocervical flexor muscle perfor- 83. Watt CF, Taylor NF, Baskus K. Do Colles’ fracture
muscle stabilization training plus general exer- mance. J Orthop Sports Phys Ther. 2007;37:3-9. patients benefit from routine referral to phys-
cise versus general exercise only: randomized http://dx.doi.org/10.2519/jospt.2007.2237 iotherapy following cast removal? Arch Orthop
controlled trial of patients with recurrent low 68. Palombaro KM, Craik RL, Mangione KK, Tom- Trauma Surg. 2000;120:413-415.
back pain. Phys Ther. 2005;85:209-225. linson JD. Determining meaningful changes 84. Whitman JM, Flynn TW, Childs JD, et al. A
54. Laupacis A, Sackett DL, Roberts RS. An as- in gait speed after hip fracture. Phys Ther. comparison between two physical therapy
sessment of clinically useful measures of the 2006;86:809-816. treatment programs for patients with lum-
consequences of treatment. N Engl J Med. 69. Portney LG, Watkins MP. Foundations of Clini- bar spinal stenosis: a randomized clinical
1988;318:1728-1733. cal Research: Applications to Practice. Upper trial. Spine. 2006;31:2541-2549. http://dx.doi.
55. Lee JS, Hobden E, Stiell IG, Wells GA. Clinically Saddle River, NJ: Prentice Hall Health; 2000. org/10.1097/01.brs.0000241136.98159.8c
important change in the visual analog scale 70. Reinertsen JL. Zen and the art of physician 85. Whitman JM, Fritz JM, Childs JD. The influence
after adequate pain control. Acad Emerg Med. autonomy maintenance. Ann Intern Med. of experience and specialty certifications on
2003;10:1128-1130. 2003;138:992-995. clinical outcomes for patients with low back
56. MacDermid JC. An introduction to evidence- 71. Resnik L, Hart DL. Using clinical outcomes to pain treated within a standardized physical
based practice for hand therapists. J Hand Ther. identify expert physical therapists. Phys Ther. therapy management program. J Orthop Sports
2004;17:105-117. http://dx.doi.org/10.1197/j. 2003;83:990-1002. Phys Ther. 2004;34:662-672; discussion 672-
jht.2004.02.001 72. Rydeard R, Leger A, Smith D. Pilates-based 665. http://dx.doi.org/10.2519/jospt.2004.1535
57. MacDonald CW, Whitman JM, Cleland JA, Smith therapeutic exercise: effect on subjects with 86. Wise RA, Brown CD. Minimal clinically important
M, Hoeksma HL. Clinical outcomes following nonspecific chronic low back pain and func- differences in the six-minute walk test and
manual physical therapy and exercise for hip tional disability: a randomized controlled trial. the incremental shuttle walking test. COPD.
osteoarthritis: A case series. J Orthop Sports J Orthop Sports Phys Ther. 2006;36:472-484. 2005;2:125-129.
Phys Ther. 2006;36:588-599. http://dx.doi. http://dx.doi.org/10.2519/jospt.2008.2669
org/10.2519/jospt.2006.2233 73. Sackett DL, Straws SE, Richardson WS, Rosen-
@
58. Maher CG, Sherrington C, Elkins M, Herbert RD, berg W, Haynes RB. Evidence-Based Medicine:
Moseley AM. Challenges for evidence-based How to Practice and Teach EBM. London, UK:
MORE INFORMATION
physical therapy: accessing and interpreting Harcourt Publishers Limited; 2000. WWW.JOSPT.ORG
APPENDIX A
CHECKLIST FOR CRITICAL APPRAISAL OF A RANDOMIZED CONTROLLED TRIAL

Yes   No   Can’t Tell   Not Applicable
Are the results valid?
Was a randomization procedure explicitly reported? ___ ___ ___ ___
Was group assignment concealed from those enrolling patients? ___ ___ ___ ___
Were groups reasonably homogeneous at baseline? ___ ___ ___ ___
Were the patients blinded to the treatment they received? ___ ___ ___ ___
Were treating clinicians blinded to group membership? ___ ___ ___ ___
Were data collectors blinded to group membership? ___ ___ ___ ___
Was the follow-up period sufficiently long? ___ ___ ___ ___
Did any patients drop out or switch group assignment? ___ ___ ___ ___
If there were dropouts or switchover patients, was an intention-to-treat analysis performed? ___ ___ ___ ___
Was the overall research experience equivalent for groups, other than the treatment(s) of interest? ___ ___ ___ ___
What are the results?

Are the treatment effects statistically significant (a positive trial)? ___ ___ ___ ___
In a positive trial, is the treatment effect size clinically meaningful (equal to or larger than the MCID*)? ___ ___ ___ ___
In a positive trial, does the 95% confidence interval around the point estimate of the treatment effect exclude the MCID? ___ ___ ___ ___
In a negative trial, does the 95% confidence interval around the point estimate of the treatment effect exclude the MCID? ___ ___ ___ ___
How can I apply the results to patient care?
Is my patient sufficiently similar to patients in the treatment group? ___ ___ ___ ___
Are the outcomes measured in the study relevant to my patient’s goals? ___ ___ ___ ___
Is the treatment compatible with my patient’s values, preferences, and expectations? ___ ___ ___ ___
Are the anticipated benefits worth the costs and potential for any adverse effects? ___ ___ ___ ___
Do I have the clinical skills and any required equipment to provide the treatment? ___ ___ ___ ___
Abbreviation: MCID, minimal clinically important difference.
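The “What are the results?” items in the checklist above reduce to a few comparisons between a trial’s 95% confidence interval, zero, and the MCID. The Python sketch below encodes those comparisons for a single continuous outcome. It is an illustration, not part of the published checklist: the `TreatmentEffect` and `appraise_results` names are hypothetical, the sign convention (positive values favor the experimental treatment) and the rule that a trial is “positive” when its whole 95% CI lies above zero are assumptions, and the example trial results are invented. Only the 6-point MCID echoes the value the commentary cites for the Modified Low Back Pain Disability Questionnaire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentEffect:
    """Between-group difference on one outcome (hypothetical helper type).

    Assumed sign convention: positive values favor the experimental group,
    e.g. a larger reduction in a disability score.
    """
    point_estimate: float
    ci_lower: float  # lower bound of the 95% confidence interval
    ci_upper: float  # upper bound of the 95% confidence interval


def appraise_results(effect: TreatmentEffect, mcid: float) -> dict:
    """Answer the checklist's 'What are the results?' items for one outcome."""
    if not effect.ci_lower <= effect.point_estimate <= effect.ci_upper:
        raise ValueError("point estimate must lie within its confidence interval")

    # Assumption: a trial is 'positive' when the whole 95% CI lies above zero.
    positive = effect.ci_lower > 0
    # The CI excludes the MCID when the MCID falls outside [ci_lower, ci_upper].
    excludes_mcid = effect.ci_lower > mcid or effect.ci_upper < mcid

    answers = {"positive_trial": positive, "ci_excludes_mcid": excludes_mcid}
    if positive:
        answers["effect_at_least_mcid"] = effect.point_estimate >= mcid
        if effect.ci_lower > mcid:
            answers["interpretation"] = "definitive: even the smallest plausible effect exceeds the MCID"
        elif effect.ci_upper < mcid:
            answers["interpretation"] = "significant but smaller than the MCID"
        else:
            answers["interpretation"] = "significant; clinical importance uncertain"
    elif effect.ci_upper < mcid:
        # Not positive, and even the largest plausible effect is below the MCID.
        answers["interpretation"] = "definitive negative: an effect as large as the MCID is implausible"
    else:
        answers["interpretation"] = "inconclusive: a clinically important effect cannot be ruled out"
    return answers


if __name__ == "__main__":
    # Invented trial results, in points, judged against a 6-point MCID.
    examples = {
        "A": TreatmentEffect(9.0, 7.0, 11.0),   # definitive positive
        "B": TreatmentEffect(2.0, -1.0, 5.0),   # definitive negative
        "C": TreatmentEffect(4.0, -2.0, 10.0),  # inconclusive
        "D": TreatmentEffect(3.0, 1.0, 5.0),    # significant but below the MCID
    }
    for label, effect in examples.items():
        print(label, appraise_results(effect, mcid=6.0)["interpretation"])
```

Case D shows why the checklist asks about clinical meaningfulness separately from statistical significance: a trial can be positive while its entire confidence interval lies below the MCID.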
APPENDIX B
SELF-EVALUATION QUESTIONS FOR EVIDENCE-BASED PRACTITIONERS*

A self-evaluation in asking answerable questions
1. Am I asking any clinical questions at all?
2. Am I asking well-formulated questions:
   Two-part questions about “background” knowledge?
   Four- (or three-) part questions about “foreground” (diagnosis, management, etc.)?
3. Am I using a “map” to locate my knowledge gaps and articulate questions?
4. Can I get myself “unstuck” when asking questions?
5. Do I have a working method to save my questions for later answering?

A self-evaluation in finding the best external evidence
1. Am I searching at all?
2. Do I know the best sources of current evidence for my clinical discipline?
3. Have I achieved immediate access to searching hardware, software, and the best evidence for my clinical discipline?
4. Am I finding useful external evidence from a widening array of sources?
5. Am I becoming more efficient in my searching?
6. Am I using truncations, Booleans, MeSH headings, thesaurus, limiters, and intelligent free text when searching MEDLINE?
7. How do my searches compare with those of research librarians or other respected colleagues who have a passion for providing best current patient care?

A self-evaluation in critically appraising the evidence for its validity and potential usefulness
1. Am I critically appraising external evidence at all?
2. Are the critical appraisal guides becoming easier for me to apply?
3. Am I becoming more accurate and efficient in applying some of the critical appraisal measures (such as likelihood ratios, NNTs, and the like)?
4. Am I creating any appraisal summaries?

A self-evaluation in integrating the critical appraisal with clinical expertise and applying the result in clinical practice
1. Am I integrating my critical appraisals into my practice at all?
2. Am I becoming more accurate and efficient in adjusting some of the critical appraisal measures to fit my individual patients (pretest probabilities, NNT/f, etc.)?
3. Can I explain (and resolve) disagreements about management decisions in terms of this integration?

A self-evaluation of changing practice behavior
1. When new evidence suggests a change in practice, am I identifying barriers to this change?
2. Have I carried out any check, such as audits of my diagnostic, therapeutic, or other EBM performance?
* Reproduced with permission from Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-Based Medicine: How to Practice and Teach EBM. 3rd ed. Edinburgh, UK: Churchill
Livingstone; 2005. © 2005 Elsevier.
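The audit question in Appendix B, together with the Step 5 recommendation to check whether each patient’s improvement reached the outcome scale’s MDC and MCID and to compare average improvements with those reported in published trials, can be sketched as a small caseload summary. This is a hypothetical Python illustration, not a published tool: `audit_caseload` and `CaseloadAudit` are invented names, the change scores are invented, and the MDC of 5 points is a placeholder rather than a published value for any questionnaire. The 6-point MCID and the 10-point benchmark reuse the figures from the patient case (the approximate 6-week mean reduction reported for the Whitman and colleagues trial). Improvement is assumed to be baseline minus follow-up, so positive values mean less disability, and “reaching” a threshold is taken to mean a change at least as large as it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CaseloadAudit:
    n_patients: int
    mean_change: float
    prop_reaching_mdc: float   # share of patients whose change is >= the MDC
    prop_reaching_mcid: float  # share of patients whose change is >= the MCID
    gap_vs_benchmark: float    # caseload mean change minus the published mean change


def audit_caseload(changes, mdc, mcid, benchmark_mean_change):
    """Summarize improvement scores (baseline minus follow-up; positive = better)."""
    if not changes:
        raise ValueError("need at least one patient")
    n = len(changes)
    avg = mean(changes)
    return CaseloadAudit(
        n_patients=n,
        mean_change=avg,
        # Summing booleans counts the patients who met each threshold.
        prop_reaching_mdc=sum(c >= mdc for c in changes) / n,
        prop_reaching_mcid=sum(c >= mcid for c in changes) / n,
        gap_vs_benchmark=avg - benchmark_mean_change,
    )


if __name__ == "__main__":
    # Invented 6-week change scores for 8 patients, in points.
    changes = [12, 4, 9, 15, 5, 2, 11, 7]
    # mdc=5.0 is a placeholder; mcid=6.0 and the 10-point benchmark come from the patient case.
    result = audit_caseload(changes, mdc=5.0, mcid=6.0, benchmark_mean_change=10.0)
    # Mean change 8.125; 75% reach the MDC, 62.5% reach the MCID; 1.875 points below benchmark.
    print(result)
```

A caseload mean below the benchmark is not by itself evidence of poorer care; as Step 5 notes, differences between clinical and research settings and contexts (case mix, follow-up interval) should be accounted for before drawing conclusions.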
