Enhancing Causal Interpretations of Quality Improvement Interventions
Viewpoint
G Cable
Abstract
In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change. A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes. A re-evaluation of a quality improvement programme designed to reduce the percentage of bilateral cardiac catheterisations for the period from January 1991 to October 1996 in three catheterisation laboratories in a north eastern state in the USA was performed using an interrupted time series design with switching replications. The accuracy and causal interpretability of the findings were considerably improved compared with the original evaluation design. Moreover, the re-evaluation provided tangible evidence in support of the suggestion that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts. Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.
(Quality in Health Care 2001;10:179–186)
www.qualityhealthcare.com

Keywords: causal interpretations; quality improvement; interrupted time series design; implementation fidelity

Key messages
+ In an era of chronic resource scarcity it is critical that quality improvement professionals have confidence that their project activities cause measured change.
+ A commonly used research design, the single group pre-test/post-test design, provides little insight into whether quality improvement interventions cause measured outcomes.
+ Many other quasi-experimental designs can be employed in most contexts instead of single group pre-test/post-test designs, and provide much greater causal interpretability of the findings of the quality improvement project evaluation. One of the most powerful of these is the interrupted time series design.

Implications of these findings for quality improvement
The adoption of more rigorous research designs to evaluate quality improvement efforts will enable quality improvement professionals to know with greater confidence whether the efforts actually work. In turn, projects determined to be successful based on this more compelling evidence can more confidently be diffused as best practices. A latent benefit of improving the level of rigour in the evaluation research design is that the perception of quality improvement projects as "scientific" will be enhanced.

The Quality Institute, Atlantic Health System, 325 Columbia Turnpike, Florham Park, NJ 07932, USA
G Cable, director of research

Correspondence to: Dr G Cable, drcableg@hotmail.com

Accepted 30 March 2001

Actuating beneficial change is the raison d'être of quality improvement efforts. In an era when private entities and governments are increasingly less willing to pay for care, quality improvement initiatives must additionally be able to demonstrate superiority over competing strategies, or the option of doing nothing at all.1–4 In the current milieu, every major quality improvement project should include an evaluation research design that permits rigorous testing of the extent to which improvement efforts actually cause measured change. The most commonly employed research designs—single group pre-test/post-test designs and one shot case studies (that is, a single group design with a post-test only)—provide little evidence of the causal impact of improvement activities.5 Consequently, methodological rigour is often unnecessarily sacrificed when designs are available which can improve the causal interpretability of the results. The purpose of this paper is to make a case for improving the rigour of research designs in order to enhance
the causal interpretability of quality improvement efforts.

The first section of the paper provides background regarding the comparative virtues of different research designs for making causal statements about intervention effects. This section also contains a description of a rigorous quasi-experimental design that can often be employed in quality improvement studies instead of less rigorous single group pre-test/post-test designs. The second part of the paper contains a comparison of the results of a re-evaluation of a quality improvement project conducted to reduce the rates of bilateral cardiac catheterisations with the results of the original evaluation. The original evaluation used a single group pre-test/post-test design, while the quasi-experimental design described in the first part of the paper is employed in the re-evaluation. The comparison shows the value of improving the rigour of the evaluation research design for enhancing the causal interpretability of improvement efforts.

Comparing the internal validity of research designs: randomised trials, single group pre-test/post-test designs, and quasi-experiments
The primary purpose of research design is to provide investigators with evidence regarding the degree to which an intervention causes measured outcomes. A causal relationship is one in which four conditions are met: (1) a measurable cause precedes a measurable effect and the timing of the effect is consistent with the nature of the mechanism behind the cause; (2) the magnitude (including duration) of the effect is proportional to that of the cause; (3) the effect is not present in the absence of the action; and (4) all plausible competing explanations for the effect can be ruled out. The degree to which an investigator can infer cause is reflected in the level of internal validity of the research design. Designs with high internal validity provide the investigator with great confidence, ceteris paribus, that a manipulated variable—for example, in this context, the activities of a quality improvement effort—caused the observed change in the outcome variable(s).5 6

Randomised controlled trials are widely recognised as having high levels of internal validity, primarily due to the creation by the investigator of two or more mathematically equivalent comparison groups.5–7 Group equivalence is achieved through the process of random assignment, which assures that each case has the same probability of being assigned to the two (or more) arms of the trial. When sample sizes are sufficiently large, random assignment of cases to treatment arms provides great confidence, before the implementation of the intervention, that the arms are equivalent on all known and unknown factors that might affect the outcome measures employed in the trial. Hence, upon normal completion of the trial, investigators can have great confidence, quantifiable by statistical intervals and other measures, that differences between the arms of the trial are due largely to variables they themselves have manipulated.5–7 In many contexts, however, it is not feasible to design and implement a randomised controlled trial because investigators have little control over how cases are assigned to treatment arms, or an appropriate comparison group simply cannot be assembled. Unfortunately, in quality improvement research this often leads investigators to use the single group pre-test/post-test design, or one closely related, as the primary fall back design, resulting in a large decrease in the level of internal validity.

In contrast to randomised controlled designs, single group pre-test/post-test designs have comparatively low internal validity in part because they provide the investigator with no counterfactual—that is, no evidence regarding what would have happened in the absence of the intervention. In the absence of a relevant counterfactual, changes in the outcome measure from pre-test to post-test can be attributed to many factors other than the intervention.8 These factors include, but are not limited to, external events, cyclical variations in the outcome measure, and undiscovered changes in the instrumentation employed to collect outcome data. The fall off in the level of internal validity is wholly unnecessary in most instances, given the availability of several rigorous quasi-experimental designs.5 6 8

One family of quasi-experimental designs with generally high levels of internal validity is interrupted time series designs. Time series are data collected at equal intervals over time. These data may be collected for any unit of analysis. In medicine, time series data might be collected to capture weekly medication ordering errors in a hospital, to monitor variations in hourly temperature readings for a septic patient, or as monthly rates of bilateral cardiac catheterisations (as in the example presented below). Time series data can then be represented through graphical techniques and mathematical modelling in a way that permits the identification of systematic (and potentially causal) factors and non-systematic factors.5 9–11

Interrupted time series, as the name implies, are time series during which an event occurs that is thought to "interrupt" the existing numerical variation in the series in some systematic manner, as when administration of antibiotics (the event) interrupts the hourly variation in the temperature of the septic patient. Following the event the series is expected to change in level, slope, and/or shape based on some a priori understanding of the mechanism through which the event causes change. An interrupted time series design can be depicted as follows with a commonly used notation of research design5 6:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

In this example the series is 18 equal interval periods long. The "O"s depict the collection of data for each period (O for observation). The position of the "X" indicates that the event occurred between periods 6 and 7.

Interrupted time series designs can be used to determine whether the empirical post-event
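To make the notation concrete, the sketch below encodes a hypothetical 18 period series (all values are invented for illustration) in which the event falls between periods 6 and 7, and compares the pre-event and post-event levels. This is only the simplest reading of an "interruption"; a full analysis would also model trend and autocorrelation.

```python
# Hypothetical 18-period interrupted time series: the event (X) falls
# between periods 6 and 7, so the first six observations are O1..O6
# (pre-event) and the remaining twelve are O7..O18 (post-event).
series = [52, 50, 51, 53, 52, 51,                  # O1..O6  (pre-event)
          44, 43, 42, 41, 42, 40,                  # O7..O12 (post-event)
          41, 40, 39, 40, 38, 39]                  # O13..O18

event_period = 6                                   # X occurs after O6

pre = series[:event_period]
post = series[event_period:]

pre_mean = sum(pre) / len(pre)
post_mean = sum(post) / len(post)

# A sustained fall in level after the event is consistent with an
# interruption; ruling out trend, cycles, and chance needs modelling.
print(f"pre-event mean:  {pre_mean:.2f}")
print(f"post-event mean: {post_mean:.2f}")
print(f"level change:    {post_mean - pre_mean:+.2f}")
```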
[Figure: line plot; y axis % bilateral catheterisations; vertical lines mark Lab A intervention (July), Lab B intervention (Oct), and Lab C intervention (Dec)]
Figure 1 Percentage bilateral catheterisations at laboratory A monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory A, and laboratories B and C.
SAS ETS version 6.12. Graphics were produced in Excel version 5.0.

RESULTS OF THE RE-EVALUATION
Monthly bilateral catheterisation data are depicted for laboratories A, B, and C in figs 1, 2, and 3, respectively. Three vertical lines were drawn in each figure to indicate the timing of the intervention in each laboratory. Laboratory A's data (fig 1) reveal a steep fall in bilateral catheterisation rates immediately following intervention, a decline that had no precedent during the 42 month pre-intervention period in this laboratory. Laboratory B's data (fig 2) suggest that the post-intervention declines in this laboratory are a continuation of the decrease in bilateral catheterisation rates that began in about November 1993, 11 months before implementation of the intervention in October and 8 months before the July implementation of the intervention in laboratory A. Figure 3 shows that the decline in bilateral catheterisation rates in laboratory C appears to start as early as June or July 1994 and level off in the subsequent months leading up to the intervention in December. However, the declines appear to have resumed at an accelerated rate following implementation of the intervention in laboratory C. At this point it is clear that the original evaluation could have used a simple plot of monthly bilateral catheterisation rates in each laboratory to reveal systematic elements of variation that may have been present before implementation of the quality improvement activities. This alone would have significantly improved the internal validity of the original evaluation design.
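A plot of this kind is straightforward to produce. The sketch below (Python with matplotlib, assumed available) draws one line per laboratory with a dashed vertical line at each intervention month; the rate values are simulated for illustration, and only the intervention months (July, October, and December 1994) come from the study.

```python
# Sketch of the simple diagnostic plot described above: monthly rates
# per laboratory with a vertical line at each intervention date. The
# rates are simulated, not the study data. Jan 1991-Oct 1996 spans 70
# months, so index 0 = Jan 1991 and index 69 = Oct 1996.
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
months = np.arange(70)
interventions = {"Lab A": 42, "Lab B": 45, "Lab C": 47}  # Jul/Oct/Dec 1994

fig, ax = plt.subplots()
for lab, event in interventions.items():
    rate = 85 + rng.normal(0, 3, months.size)   # simulated pre-intervention level
    rate[event:] -= 40                          # simulated post-intervention drop
    ax.plot(months, rate, label=lab)
    ax.axvline(event, linestyle="--", linewidth=0.8)

ax.set_xlabel("Months since January 1991")
ax.set_ylabel("% bilateral catheterisations")
ax.legend()
fig.savefig("bilateral_rates.png")
```

Eyeballing such a plot before any modelling makes pre-existing trends, like laboratory B's decline from November 1993, immediately visible.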
[Figure: line plot; y axis % bilateral catheterisations; annotations mark the beginning of the downward trend (Nov 1993) and the Lab A (July), Lab B (Oct), and Lab C (Dec) interventions]
Figure 2 Percentage bilateral catheterisations at laboratory B monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory B, and laboratories A and C.
[Figure: line plot; y axis % bilateral catheterisations; vertical lines mark the laboratory interventions]
Figure 3 Percentage bilateral catheterisations at laboratory C monthly from January 1991 to October 1996, with delineation of pre-intervention and post-intervention periods for laboratory C, and laboratories A and B.
Some evidence exists of a "local" history threat to the internal validity of the re-evaluation, in which an external event or events has affected bilateral catheterisation rates in only one of the laboratories.5 8 Specifically, in laboratory B, but not A or C, something other than the intervention appears to have initiated a long monotonic decline in rates beginning in about November 1993 that continued throughout the remainder of the period of study. This could be in part the result of a change in the composition of patients in the service area of laboratory B (for example, a younger healthier population), practice changes within the laboratory, or a change in how bilateral catheterisations were measured. Since we do not have these data for laboratory B, or any data for laboratories near it, we are unable to determine whether this downward trend was something unique to laboratory B or the result of event(s) common to other laboratories in proximity to it. Regardless of this, it is evident that the identification of this change in bilateral catheterisation rates in laboratory B beginning in November 1993 was also not possible with the original PRO research design.

The results of Box and Tiao modelling indicate that the intervention in laboratory A was associated with a mean decrease in bilateral catheterisations of almost 2% each month (t=–0.40, p=0.04) in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.95; that is, we fail to reject the hypothesis that the model residuals are not autocorrelated, meaning the model fits the data well). The data from laboratory B were first transformed into natural logarithmic form to achieve homogeneity of variance in the series (a necessary prerequisite in Box–Jenkins models).10 11 The intervention in laboratory B was associated with a small and not statistically significant percentage decrease of 0.0005 per month in the post-intervention period as part of an ARIMA (0,1,1) model (Ljung–Box Q, p=0.80). The data from laboratory C were also modelled using a natural logarithm transformation. The intervention at laboratory C was associated with a mean percentage decrease of 0.03 per month (t=–4.81, p=0.02) in an ARIMA (0,1,1) model without a constant (Ljung–Box Q, p=0.99).

Finally, laboratories A and C are in close geographical proximity to each other (a few miles apart within the same city), raising the possibility that "cross talk" regarding laboratory A's intervention occurred which began a de facto intervention in laboratory C before the implementation of the formal intervention in laboratory C. We therefore examined whether the implementation of laboratory A's intervention had a measurable effect on laboratory C's bilateral catheterisation rates, independent of laboratory C's formal intervention. Although fig 3 provides some graphical evidence in support of this hypothesis, the evidence from the Box–Tiao modelling process suggests that no statistically significant decrease occurred in the data in laboratory C as a result of the implementation of the intervention in laboratory A. Moreover, no statistical evidence existed that the October intervention in laboratory B had an independent effect on the rates in laboratory C. Similarly, no evidence was seen that the intervention in laboratory A had an independent effect on bilateral catheterisation rates in laboratory B.

DISCUSSION OF RE-EVALUATION RESULTS
Our results strongly suggest that the quality improvement interventions reduced bilateral catheterisation rates in two of the three laboratories, laboratories A and C. The evidence indicates that the intervention in laboratory A resulted in an immediate fall in bilateral catheterisation rates that continued during the period of study. The effect of the intervention in laboratory C was somewhat delayed and smaller in magnitude. Visual evidence suggests
that near the end of the period of study the declines might have ceased in laboratory C.

Laboratory A was the laboratory that augmented the activities common to all of the laboratory interventions by changing the packaging of the catheterisation trays, thus requiring that cardiologists make special requests to catheterise both sides of the heart. The immediate large magnitude of the effect resulting from intervention in laboratory A suggests that this innovation may have increased the success of the intervention. Perhaps of greater significance for our purposes is the fact that the success of the change in packaging of catheterisation trays could not have been discovered using the original single group pre-test/post-test design. Hence, the importance of using an appropriately rigorous research design is brought into greater relief.

Our findings contrast clearly with those of the original evaluation, which described large numerical decreases "following" the quality improvement efforts—that is, between 1993 and 1995—in all of the catheterisation laboratories.14 The original analysis did not, however, tell the entire story of the quality improvement efforts. Rates did, indeed, fall in all the laboratories between 1993 and 1995, but more rigorous evidence from the re-evaluation suggests that the improvement efforts were efficacious in only two of the three laboratories.

The design employed in our re-evaluation therefore provides demonstrably greater confidence in the causal effect of the improvement efforts in laboratories A and C, as well as evidence regarding the magnitude and duration of the effects (through the Box and Tiao models). Moreover, in contrast to the findings of the original evaluation, the switching replications design provides greater evidence that the intervention in laboratory B was not effective.

Potentially undetected external events threaten our ability to attribute cause with even greater confidence to the interventions in laboratories A and C. Specifically, there may have been events external to the quality improvement intervention that occurred ahead of, or at the same time as, the implementation of the improvement activities in laboratories A and C that can explain post-intervention changes in the series. For example, in March 1994 the US government's Agency for Health Care Policy and Research and National Heart, Lung, and Blood Institute jointly released practice guidelines for the diagnosis and management of unstable angina.17 The guideline contains recommendations regarding the use of cardiac catheterisation. We cannot therefore rule out the possibility that some part of the decline in bilateral catheterisation rates in both laboratories can be attributed to the momentum for change in practice created by the release of these guidelines. However, this threat also would have affected the original evaluation. Moreover, the single group pre-test/post-test design used in the original evaluation is subject to a panoply of additional threats.5 6 8 In sum, we believe our evaluation research design has radically improved the accuracy and the causal interpretability of the findings. Moreover, the re-evaluation provides tangible evidence in support of our position that more rigorous designs can and should be more widely employed to improve the causal interpretability of quality improvement efforts.

Conclusions
This paper has attempted to make a case for improving the rigour of evaluation designs used in quality improvement projects in order to enhance the causal interpretability of the results. We have focused on a particular interrupted time series design—switching replications—as one time series design that can be used in many contexts in place of designs with lower internal validity. We recognise that in some contexts a switching replications design is not feasible because the intervention cannot be implemented in a second group. When working under this constraint, a single group interrupted time series will still provide greater ability to make causal inferences than the seemingly ubiquitous single group pre-test/post-test design.5 6

Two especially compelling interrupted time series alternatives to a single group pre-test/post-test design are the single group interrupted design with a removed treatment and the single group interrupted design with a non-equivalent dependent variable.5 The removed treatment design can be used when the investigators have the ability to institute the intervention and then subsequently remove it, preferably at a randomly selected post-intervention period.5 8 An example of the design is depicted below for a time series of 18 periods:

O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 R O13 O14 O15 O16 O17 O18

Again, "X" indicates the timing of the implementation of the quality improvement intervention. "R" depicts the timing of the removal of the intervention. The design increases causal interpretability over a simple interrupted time series by providing the investigator with a more powerful test of both the presence and absence of the intervention.5

The interrupted time series design with a non-equivalent dependent variable can be depicted as follows:

Outcome measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18
Related measure: O1 O2 O3 O4 O5 O6 X O7 O8 O9 O10 O11 O12 O13 O14 O15 O16 O17 O18

This is a single group interrupted time series design because both series are data collected from the same unit of analysis—for example, a floor, patient, or hospital. The first series are data for the outcome measure expected to be affected by the quality improvement intervention. The second is a time series coeval with the first that normally varies in the same way as the outcome measure, but is not expected to change after the intervention. Thus, if the improvement effort works as designed, investigators would expect to see changes only in the post-intervention series of the outcome measure.

These variants of the interrupted time series design provide powerful alternatives to single
group pre-test/post-test designs. Given the near chronic paucity of resources available to provide care and conduct research, it is imperative that quality improvement projects are able to demonstrate their effectiveness.1 2 It is not sufficient to show that measures changed for the better "following" interventions and then to assume that the change was caused by the intervention, post hoc, ergo propter hoc. The presence of favourable post-intervention changes in outcome measures may lead an organisation to believe that the quality improvement effort was successful, regardless of the internal validity of the evaluation design, and to continue to expend resources on the "successful" quality improvement programme. As long as outcomes are favourable, the inference that the improvement effort worked will have no real consequences. If outcomes deteriorate, however, no mechanism would exist to identify the causes of the deterioration since the original evaluation design was ill suited to determine whether the intervention initially worked.

Evaluation designs for quality improvement projects should be constructed to provide a reasonable opportunity, given available time and resources, for causal interpretation of the results. Evaluators of quality improvement initiatives may infrequently have access to randomised designs. Nonetheless, as we have shown here, other very rigorous research designs are available for improving causal interpretability. Unilateral methodological surrender need not be the only alternative to randomised experiments.

References
1 Smith S, Freeland M, Heffler S, et al. The next ten years of health spending: what does the future hold? The Health Expenditures Projection Team. Health Affairs 1998;17:128–40.
2 Iglehart J. The American health care system—expenditures. N Engl J Med 1999;340:70–6.
3 Kuttner R. The American health care system—employer-sponsored health coverage. N Engl J Med 1999;340:248–52.
4 Kuttner R. The American health care system—Wall Street and health care. N Engl J Med 1999;340:664–8.
5 Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin, 1979.
6 Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin, 1963.
7 Meinert C. Clinical trials: design, conduct and analysis. New York: Oxford University Press, 1986.
8 Mohr L. Impact analysis for program evaluation. Chicago: The Dorsey Press, 1988.
9 Tukey J. Exploratory data analysis. Reading, MA: Addison-Wesley, 1977.
10 Box GEP, Tiao GC. Intervention analysis with applications to economic and environmental problems. J Am Stat Assoc 1975;70:70–9.
11 Pankratz A. Forecasting with dynamic regression models. New York: John Wiley and Sons, 1983.
12 Bodenheimer T. The American health care system: the movement for improved quality in health care. N Engl J Med 1999;340:488–92.
13 Jencks SF, Wilensky GR. The health care quality improvement initiative: a new approach to quality assurance in Medicare. JAMA 1992;268:900–3.
14 New Jersey Peer Review Organization (PRO). Reducing the use of combined right/left heart catheterisation in Medicare patients with uncomplicated coronary heart disease. New Jersey: PRO of New Jersey, 1997.
15 Pepine CJ, Allen HD, Bashore TM, et al. American College of Cardiology/American Heart Association guidelines for cardiac catheterization and cardiac catheterization laboratories. Ad Hoc Task Force on Cardiac Catheterization. Circulation 1991;84:2213–47.
16 Ljung GM, Box GEP. On a measure of lack of fit in time series models. Biometrika 1978;65:297–303.
17 Agency for Health Care Policy and Research/National Heart, Lung and Blood Institute. Unstable angina: diagnosis and management. Clinical practice guideline. Washington: Public Health Service, 1994.