Enhancing Causal Interpretations of Quality Improvement Interventions
Viewpoint
G Cable
Abstract
In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change. A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes. A re-evaluation of a quality improvement programme designed to reduce the percentage of bilateral cardiac catheterisations for the period from January 1991 to October 1996 in three catheterisation laboratories in a north eastern state in the USA was performed using an interrupted time series design with switching replications. The accuracy and causal interpretability of the findings were considerably improved compared with the original evaluation design. Moreover, the re-evaluation provided tangible evidence in support of the suggestion that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts. Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.
(Quality in Health Care 2001;10:179–186)
www.qualityhealthcare.com

Keywords: causal interpretations; quality improvement; interrupted time series design; implementation fidelity

Key messages
+ In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change.
+ A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes.
+ Many other quasi-experimental designs can be employed in most contexts instead of single group pre-test/post-test designs, and provide much greater causal interpretability of the findings of the quality improvement project evaluation. One of the most powerful of these is the interrupted time series design.

Implications of these findings for quality improvement
The adoption of more rigorous research designs to evaluate quality improvement efforts will enable quality improvement professionals to know with greater confidence whether the efforts actually work. In turn, projects determined to be successful based on this more compelling evidence can more confidently be diffused as best practices. A latent benefit of improving the level of rigour in the evaluation research design is that the perception of quality improvement projects as "scientific" will be enhanced.

The Quality Institute, Atlantic Health System, 325 Columbia Turnpike, Florham Park, NJ 07932, USA
G Cable, director of research

Correspondence to: Dr G Cable, drcableg@hotmail.com

Accepted 30 March 2001

Actuating beneficial change is the raison d'être of quality improvement efforts. In an era when private entities and governments are increasingly less willing to pay for care, quality improvement initiatives must additionally be able to demonstrate superiority over competing strategies, or the option of doing nothing at all.1–4 In the current milieu, every major quality improvement project should include an evaluation research design that permits rigorous testing of the extent to which improvement efforts actually cause measured change. The most commonly employed research designs—single group pre-test/post-test designs and one shot case studies (that is, a single group design with a post-test only)—provide little evidence of the causal impact of improvement activities.5 Consequently, methodological rigour is often unnecessarily sacrificed when designs are available which can improve the causal interpretability of the results. The purpose of this paper is to make a case for improving the rigour of research designs in order to enhance
the causal interpretability of quality improvement efforts.

The first section of the paper provides background regarding the comparative virtues of different research designs for making causal statements about intervention effects. This section also contains a description of a rigorous quasi-experimental design that can often be employed in quality improvement studies instead of less rigorous single group pre-test/post-test designs. The second part of the paper contains a comparison of the results of a re-evaluation of a quality improvement project conducted to reduce the rates of bilateral cardiac catheterisations with the results of the original evaluation. The original evaluation used a single group pre-test/post-test design, while the quasi-experimental design described in the first part of the paper is employed in the re-evaluation. The comparison shows the value of improving the rigour of the evaluation research design for enhancing the causal interpretability of improvement efforts.

Comparing the internal validity of research designs: randomised trials, single group pre-test/post-test designs, and quasi-experiments
The primary purpose of research design is to provide investigators with evidence regarding the degree to which an intervention causes measured outcomes. A causal relationship is one in which four conditions are met: (1) a measurable cause precedes a measurable effect and the timing of the effect is consistent with the nature of the mechanism behind the cause; (2) the magnitude (including duration) of the effect is proportional to that of the cause; (3) the effect is not present in the absence of the action; and (4) all plausible competing explanations for the effect can be ruled out. The degree to which an investigator can infer cause is reflected in the level of internal validity of the research design. Designs with high internal validity provide the investigator with great confidence, ceteris paribus, that a manipulated variable—for example, in this context, the activities of a quality improvement effort—caused the observed change in the outcome variable(s).5 6

Randomised controlled trials are widely recognised as having high levels of internal validity, primarily due to the creation by the investigator of two or more mathematically equivalent comparison groups.5–7 Group equivalence is achieved through the process of random assignment, which assures that each case has the same probability of being assigned to the two (or more) arms of the trial. When sample sizes are sufficiently large, random assignment of cases to treatment arms provides great confidence, before the implementation of the intervention, that the arms are equivalent on all known and unknown factors that might affect the outcome measures employed in the trial. Hence, upon normal completion of the trial, investigators can have great confidence, quantifiable by statistical intervals and other measures, that differences between the arms of the trial are due largely to variables they themselves have manipulated.5–7 In many contexts, however, it is not feasible to design and implement a randomised controlled trial because investigators have little control over how cases are assigned to treatment arms, or an appropriate comparison group simply cannot be assembled. Unfortunately, in quality improvement research this often leads investigators to use the single group pre-test/post-test design, or one closely related, as the primary fall back design, resulting in a large decrease in the level of internal validity.

In contrast to randomised controlled designs, single group pre-test/post-test designs have comparatively low internal validity in part because they provide the investigator with no counterfactual—that is, no evidence regarding what would have happened in the absence of the intervention. In the absence of a relevant counterfactual, changes in the outcome measure from pre-test to post-test can be attributed to many factors other than the intervention.8 These factors include, but are not limited to, external events, cyclical variations in the outcome measure, and undiscovered changes in the instrumentation employed to collect outcome data. The fall off in the level of internal validity is wholly unnecessary in most instances, given the availability of several rigorous quasi-experimental designs.5 6 8

One family of quasi-experimental designs with generally high levels of internal validity is interrupted time series designs. Time series are data collected at equal intervals over time. These data may be collected for any unit of analysis. In medicine, time series data might be collected to capture weekly medication ordering errors in a hospital, to monitor variations in hourly temperature readings for a septic patient, or as monthly rates of bilateral cardiac catheterisations (as in the example presented below). Time series data can then be represented through graphical techniques and mathematical modelling in a way that permits the identification of systematic (and potentially causal) factors and non-systematic factors.5 9–11

Interrupted time series, as the name implies, are time series during which an event occurs that is thought to "interrupt" the existing numerical variation in the series in some systematic manner, as when administration of antibiotics (the event) interrupts the hourly variation in the temperature of the septic patient. Following the event the series is expected to change in level, slope, and/or shape based on some a priori understanding of the mechanism through which the event causes change. An interrupted time series design can be depicted as follows with a commonly used notation of research design5 6:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

In this example the series is 18 equal interval periods long. The "O"s depict the collection of data for each period (O for observation). The position of the "X" indicates that the event occurred between periods 6 and 7.

Interrupted time series designs can be used to determine whether the empirical post-event
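To make the notation concrete, the sketch below encodes a hypothetical 18 period series (all values are invented for illustration) in which the event falls between periods 6 and 7, and compares the pre-event and post-event levels. This is only the simplest reading of an "interruption"; a full analysis would also model trend and autocorrelation.

```python
# Hypothetical 18-period interrupted time series: the event (X) falls
# between periods 6 and 7, so the first six observations are O1..O6
# (pre-event) and the remaining twelve are O7..O18 (post-event).
series = [52, 50, 51, 53, 52, 51,                  # O1..O6  (pre-event)
          44, 43, 42, 41, 42, 40,                  # O7..O12 (post-event)
          41, 40, 39, 40, 38, 39]                  # O13..O18

event_period = 6                                   # X occurs after O6

pre = series[:event_period]
post = series[event_period:]

pre_mean = sum(pre) / len(pre)
post_mean = sum(post) / len(post)

# A sustained fall in level after the event is consistent with an
# interruption; ruling out trend, cycles, and chance needs modelling.
print(f"pre-event mean:  {pre_mean:.2f}")
print(f"post-event mean: {post_mean:.2f}")
print(f"level change:    {post_mean - pre_mean:+.2f}")
```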
[Figure: line plot; y axis % bilateral catheterisations; vertical lines mark Lab A intervention (July), Lab B intervention (Oct), and Lab C intervention (Dec)]
Figure 1 Percentage bilateral catheterisations at laboratory A monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory A, and laboratories B and C.
SAS ETS version 6.12. Graphics were produced in Excel version 5.0.

RESULTS OF THE RE-EVALUATION
Monthly bilateral catheterisation data are depicted for laboratories A, B, and C in figs 1, 2, and 3, respectively. Three vertical lines were drawn in each figure to indicate the timing of the intervention in each laboratory. Laboratory A's data (fig 1) reveal a steep fall in bilateral catheterisation rates immediately following intervention, a decline that had no precedent during the 42 month pre-intervention period in this laboratory. Laboratory B's data (fig 2) suggest that the post-intervention declines in this laboratory are a continuation of the decrease in bilateral catheterisation rates that began in about November 1993, 11 months before implementation of the intervention in October and 8 months before the July implementation of the intervention in laboratory A. Figure 3 shows that the decline in bilateral catheterisation rates in laboratory C appears to start as early as June or July 1994 and level off in the subsequent months leading up to the intervention in December. However, the declines appear to have resumed at an accelerated rate following implementation of the intervention in laboratory C. At this point it is clear that the original evaluation could have used a simple plot of monthly bilateral catheterisation rates in each laboratory to reveal systematic elements of variation that may have been present before implementation of the quality improvement activities. This alone would have significantly improved the internal validity of the original evaluation design.
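A plot of this kind is straightforward to produce. The sketch below (Python with matplotlib, assumed available) draws one line per laboratory with a dashed vertical line at each intervention month; the rate values are simulated for illustration, and only the intervention months (July, October, and December 1994) come from the study.

```python
# Sketch of the simple diagnostic plot described above: monthly rates
# per laboratory with a vertical line at each intervention date. The
# rates are simulated, not the study data. Jan 1991-Oct 1996 spans 70
# months, so index 0 = Jan 1991 and index 69 = Oct 1996.
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
months = np.arange(70)
interventions = {"Lab A": 42, "Lab B": 45, "Lab C": 47}  # Jul/Oct/Dec 1994

fig, ax = plt.subplots()
for lab, event in interventions.items():
    rate = 85 + rng.normal(0, 3, months.size)   # simulated pre-intervention level
    rate[event:] -= 40                          # simulated post-intervention drop
    ax.plot(months, rate, label=lab)
    ax.axvline(event, linestyle="--", linewidth=0.8)

ax.set_xlabel("Months since January 1991")
ax.set_ylabel("% bilateral catheterisations")
ax.legend()
fig.savefig("bilateral_rates.png")
```

Eyeballing such a plot before any modelling makes pre-existing trends, like laboratory B's decline from November 1993, immediately visible.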
[Figure: line plot; y axis % bilateral catheterisations; annotations mark the beginning of the downward trend (Nov 1993) and the Lab A (July), Lab B (Oct), and Lab C (Dec) interventions]
Figure 2 Percentage bilateral catheterisations at laboratory B monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory B, and laboratories A and C.
[Figure: line plot; y axis % bilateral catheterisations; vertical lines mark the laboratory interventions]
Figure 3 Percentage bilateral catheterisations at laboratory C monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory C, and laboratories A and B.
Some evidence exists of a "local" history threat to the internal validity of the re-evaluation, in which an external event or events has affected bilateral catheterisation rates in only one of the laboratories.5 8 Specifically, in laboratory B, but not A or C, something other than the intervention appears to have initiated a long monotonic decline in rates beginning in about November 1993 that continued throughout the remainder of the period of study. This could be in part the result of a change in the composition of patients in the service area of laboratory B (for example, a younger healthier population), practice changes within the laboratory, or a change in how bilateral catheterisations were measured. Since we do not have these data for laboratory B, or any data for laboratories near it, we are unable to determine whether this downward trend was something unique to laboratory B or the result of event(s) common to other laboratories in proximity to it. Regardless of this, it is evident that the identification of this change in bilateral catheterisation rates in laboratory B beginning in November 1993 was also not possible with the original PRO research design.

The results of Box and Tiao modelling indicate that the intervention in laboratory A was associated with a mean decrease in bilateral catheterisations of almost 2% each month (t=–0.40, p=0.04) in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.95; that is, we fail to reject the hypothesis that the model residuals are not autocorrelated, meaning the model fits the data well). The data from laboratory B were first transformed into natural logarithmic form to achieve homogeneity of variance in the series (a necessary prerequisite in Box–Jenkins models).10 11 The intervention in laboratory B was associated with a small and not statistically significant percentage decrease of 0.0005 per month in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.80). The data from laboratory C were also modelled using a natural logarithm transformation. The intervention at laboratory C was associated with a mean percentage decrease of 0.03 per month (t=–4.81, p=0.02) in an ARIMA (0,1,1) model without a constant (Ljung–Box Q, p=0.99).

Finally, laboratories A and C are in close geographical proximity to each other (a few miles apart within the same city), raising the possibility that "cross talk" regarding laboratory A's intervention occurred which began a de facto intervention in laboratory C before the implementation of the formal intervention in laboratory C. We therefore examined whether the implementation of laboratory A's intervention had a measurable effect on laboratory C's bilateral catheterisation rates, independent of laboratory C's formal intervention. Although fig 3 provides some graphical evidence in support of this hypothesis, the evidence from the Box–Tiao modelling process suggests that no statistically significant decrease occurred in the data in laboratory C as a result of the implementation of the intervention in laboratory A. Moreover, no statistical evidence existed that the October intervention in laboratory B had an independent effect on the rates in laboratory C. Similarly, no evidence was seen that the intervention in laboratory A had an independent effect on bilateral catheterisation rates in laboratory B.

DISCUSSION OF RE-EVALUATION RESULTS
Our results strongly suggest that the quality improvement interventions reduced bilateral catheterisation rates in two of the three laboratories, laboratories A and C. The evidence indicates that the intervention in laboratory A resulted in an immediate fall in bilateral catheterisation rates that continued during the period of study. The effect of the intervention in laboratory C was somewhat delayed and smaller in magnitude. Visual evidence suggests
that near the end of the period of study the declines might have ceased in laboratory C.

Laboratory A was the laboratory that augmented the activities common to all of the laboratory interventions by changing the packaging of the catheterisation trays, thus requiring that cardiologists make special requests to catheterise both sides of the heart. The immediate large magnitude of the effect resulting from intervention in laboratory A suggests that this innovation may have increased the success of the intervention. Perhaps of greater significance for our purposes is the fact that the success of the change in packaging of catheterisation trays could not have been discovered using the original single group pre-test/post-test design. Hence, the importance of using an appropriately rigorous research design is brought into greater relief.

Our findings contrast clearly with those of the original evaluation, which described large numerical decreases "following" the quality improvement efforts—that is, between 1993 and 1995—in all of the catheterisation laboratories.14 The original analysis did not, however, tell the entire story of the quality improvement efforts. Rates did, indeed, fall in all the laboratories between 1993 and 1995, but more rigorous evidence from the re-evaluation suggests that the improvement efforts were efficacious in only two of the three laboratories.

The design employed in our re-evaluation therefore provides demonstrably greater confidence in the causal effect of the improvement efforts in laboratories A and C, as well as evidence regarding the magnitude and duration of the effects (through the Box and Tiao models). Moreover, in contrast to the findings of the original evaluation, the switching replications design provides greater evidence that the intervention in laboratory B was not effective.

Potentially undetected external events threaten our ability to attribute cause with even greater confidence to the interventions in laboratories A and C. Specifically, there may have been events external to the quality improvement intervention that occurred ahead of, or at the same time as, the implementation of the improvement activities in laboratories A and C that can explain post-intervention changes in the series. For example, in March 1994 the US government's Agency for Health Care Policy and Research and National Heart, Lung, and Blood Institute jointly released practice guidelines for the diagnosis and management of unstable angina.17 The guideline contains recommendations regarding the use of cardiac catheterisation. We cannot therefore rule out the possibility that some part of the decline in bilateral catheterisation rates in both laboratories can be attributed to the momentum for change in practice created by the release of these guidelines. However, this threat also would have affected the original evaluation. Moreover, the single group pre-test/post-test design used in the original evaluation is subject to a panoply of additional threats.5 6 8 In sum, we believe our evaluation research design has radically improved the accuracy and the causal interpretability of the findings. Moreover, the re-evaluation provides tangible evidence in support of our position that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts.

Conclusions
This paper has attempted to make a case for improving the rigour of evaluation designs used in quality improvement projects in order to enhance the causal interpretability of the results. We have focused on a particular interrupted time series design—switching replications—as one time series design that can be used in many contexts in place of designs with lower internal validity. We recognise that in some contexts a switching replications design is not feasible because the intervention cannot be implemented in a second group. When working under this constraint, a single group interrupted time series will still provide greater ability to make causal inferences than the seemingly ubiquitous single group pre-test/post-test design.5 6

Two especially compelling interrupted time series alternatives to a single group pre-test/post-test design are the single group interrupted design with a removed treatment and the single group interrupted design with a non-equivalent dependent variable.5 The removed treatment design can be used when the investigators have the ability to institute the intervention and then subsequently remove it, preferably at a randomly selected post-intervention period.5 8 An example of the design is depicted below for a time series of 18 periods:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 R O13 O14 O15 O16 O17 O18

Again, "X" indicates the timing of the implementation of the quality improvement intervention. "R" depicts the timing of the removal of the intervention. The design increases causal interpretability over a simple interrupted time series by providing the investigator with a more powerful test of both the presence and absence of the intervention.5

The interrupted time series design with a non-equivalent dependent variable can be depicted as follows:

Outcome measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18
Related measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

This is a single group interrupted time series design because both series are data collected from the same unit of analysis—for example, a floor, patient, or hospital. The first series are data for the outcome measure expected to be affected by the quality improvement intervention. The second is a time series coeval with the first that normally varies in the same way as the outcome measure, but is not expected to change after the intervention. Thus, if the improvement effort works as designed, investigators would expect to see changes only in the post-intervention series of the outcome measure.

These variants of the interrupted time series design provide powerful alternatives to single
group pre-test/post-test designs. Given the near chronic paucity of resources available to provide care and conduct research, it is imperative that quality improvement projects are able to demonstrate their effectiveness.1 2 It is not sufficient to show that measures changed for the better "following" interventions and then to assume that the change was caused by the intervention, post hoc, ergo propter hoc. The presence of favourable post-intervention changes in outcome measures may lead an organisation to believe that the quality improvement effort was successful, regardless of the internal validity of the evaluation design, and to continue to expend resources on the "successful" quality improvement programme. As long as outcomes are favourable, the inference that the improvement effort worked will have no real consequences. If outcomes deteriorate, however, no mechanism would exist to identify the causes of the deterioration since the original evaluation design was ill suited to determine whether the intervention initially worked.

Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as we have shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.

References
1 Smith S, Freeland M, Heffler S, et al. The next ten years of health spending: what does the future hold? The Health Expenditures Projection Team. Health Affairs 1998;17:128–40.
2 Iglehart J. The American health care system—expenditures. N Engl J Med 1999;340:70–6.
3 Kuttner R. The American health care system—employer-sponsored health coverage. N Engl J Med 1999;340:248–52.
4 Kuttner R. The American health care system—Wall Street and health care. N Engl J Med 1999;340:664–8.
5 Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin, 1979.
6 Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin, 1963.
7 Meinert C. Clinical trials: design, conduct and analysis. New York: Oxford University Press, 1986.
8 Mohr L. Impact analysis for program evaluation. Chicago: The Dorsey Press, 1988.
9 Tukey J. Exploratory data analysis. Reading, MA: Addison-Wesley, 1977.
10 Box GEP, Tiao GC. Intervention analysis with applications to economic and environmental problems. J Am Stat Assoc 1975;70:70–9.
11 Pankratz A. Forecasting with dynamic regression models. New York: John Wiley and Sons, 1983.
12 Bodenheimer T. The American health care system: the movement for improved quality in health care. N Engl J Med 1999;340:488–92.
13 Jencks SF, Wilensky GR. The health care quality improvement initiative: a new approach to quality assurance in Medicare. JAMA 1992;268:900–3.
14 New Jersey Peer Review Organization (PRO). Reducing the use of combined right/left heart catheterisation in Medicare patients with uncomplicated coronary heart disease. New Jersey: PRO of New Jersey, 1997.
15 Pepine CJ, Allen HD, Bashore TM, et al. American College of Cardiology/American Heart Association guidelines for cardiac catheterization and cardiac catheterization laboratories. Ad Hoc Task Force on Cardiac Catheterization. Circulation 1991;84:2213–47.
16 Ljung GM, Box GEP. On a measure of lack of fit in time series models. Biometrika 1978;65:297–303.
17 Agency for Health Care Policy and Research/National Heart, Lung and Blood Institute. Unstable angina: diagnosis and management. Clinical practice guideline. Washington: Public Health Service, 1994.