
Medical Care Research and Review
Supplement to Volume 65, Number 6, December 2008, 5S-35S
© 2008 Sage Publications
DOI: 10.1177/1077558708324236
http://mcr.sagepub.com hosted at http://online.sagepub.com

Lessons From Evaluations of Purchaser Pay-for-Performance Programs:
A Review of the Evidence

Jon B. Christianson, University of Minnesota
Sheila Leatherman, University of North Carolina
Kim Sutherland, University of Cambridge

There has been a growing interest in the use of financial incentives to encourage
improvements in the quality of health care. Several articles have reviewed past studies
of the impact of specific incentive arrangements, but these studies addressed small-
scale experiments, making their findings arguably of limited relevance to current
improvement efforts. In this article, the authors review evaluations of more recent pay-
for-performance initiatives instituted by health plans or by provider organizations in
cooperation with health plans. Findings show improvement in selected quality mea-
sures in most of these initiatives, but the contribution of financial incentives to that
improvement is not clear; the incentives typically were implemented in conjunction
with other quality improvement efforts, or there was not a convincing comparison
group. However, the literature relating to purchaser pay-for-performance initiatives
does underscore several important issues relating to the design and implementation
of such initiatives that deserve attention going forward.

Keywords: pay for performance; evaluation; implementation

The use of financial incentives to influence behavior is common in all areas of
commerce, including health care, where managed care plans and public programs
have employed, for more than 30 years, different types of risk-sharing arrangements
with contracting providers (Bodenheimer & Grumbach, 1996; Robinson, 2001).

Authors’ Note: This article, submitted to Medical Care Research and Review on June 21, 2008, was
revised and accepted for publication on July 9, 2008.
The Health Foundation in the United Kingdom and the Robert Wood Johnson Foundation in the United
States provided financial support for the development of this article. This supplemental theme issue of
Medical Care Research and Review was supported by the Health Foundation, an independent charitable foun-
dation based in the UK. The findings and conclusions in this publication are those of the authors
and do not necessarily represent the views of the Health Foundation.

For the most part, the intent of these arrangements has been to control expenditures
on care or at least moderate rates of increase, although quality-related rewards were
part of payment arrangements in many HMOs as early as 15 years ago (Palsbo et al.,
1993). Recently, however, there has been growing interest in the use of finan-
cial incentives to encourage improvements in the quality of care. Rosenthal,
Fernandopulle, Song, and Landon (2004) summarized the largest pay-for-performance
(P4P) programs underway in the United States in 2003, and Rosenthal, Landon,
Normand, Frank, and Epstein (2006) documented programs implemented by HMOs.
McElduff et al. (2004), Roland (2004), and Smith and York (2004) addressed P4P ini-
tiatives in Great Britain, and Pink, Brown, Studer, Reiter, and Leatt (2006) contrasted
P4P programs in the United States, the United Kingdom, and Australia. Several
Medicaid programs in the United States are experimenting with incentive arrangements
that reward quality (Felt-Lisk, Gimm, & Peterson, 2007), while the Medicare program
has mounted demonstrations that test different quality-related incentive programs as
well (e.g., see Kahn, Ault, Isenstein, Potetz, & Van Gelder, 2006). (We refer to pay-
ment arrangements that specifically reward quality as P4P. We use the term providers
in reference to a care delivery entity—physician practice, medical group, hospital,
integrated delivery system—that has its performance measured as part of a P4P
program. We use the term purchasers in reference to private sector employers,
health plans, and public entities that sponsor P4P initiatives.)
The current interest in P4P on the part of health care purchasers in the United
States likely reflects several factors, including the evolution of the science of qual-
ity measurement in health care, research suggesting that there are substantial oppor-
tunities for quality improvement (e.g., McGlynn et al., 2003), the endorsement of
P4P by the Institute of Medicine (2006) as a component of a larger strategy for qual-
ity improvement, and the support given P4P by large private employers as a com-
plement to their “consumerism” strategy for health care reform (Galvin & Milstein,
2002). However, survey and interview data provide a basis for concerns about
whether financial incentives will motivate institutional and individual providers to
invest in quality of care improvements or whether financial incentives are even nec-
essary to improve care. For example, a physician survey conducted in Canada
(Anderson et al., 2006, p. 470) found that physicians considered participation in a
specific P4P program to be “burdensome and time-consuming.” And in-depth inter-
views with physician practice executives revealed that a desire to be a “good doctor”
may be a more powerful incentive for performance improvement than financial
rewards (Bokhour et al., 2006). Overall, the literature based on physician surveys
and interviews suggests that physicians have very mixed views concerning P4P
(e.g., Bodenheimer, May, Berenson, & Coughlan, 2005; Casalino, Alexander, Jin,
& Konetzka, 2007; Colman, Wynn, Stevenson, & Cheater, 2001; Keating, Landon,
Ayanian, Borbas, & Guadagnoli, 2004; Young et al., 2007). Without physician sup-
port for, or at least acceptance of, P4P, the success of P4P programs directed at insti-
tutions or physician practices is anything but guaranteed. At this point, what can we
conclude about the impact of purchaser-driven P4P initiatives on quality of care?
What lessons can be learned from the implementation of these initiatives to date?

New Contribution
Past literature reviews relating to P4P have summarized findings from small-scale
experiments in rewarding physicians for improvements in preventive care (Achat,
McIntyre, & Burgess, 1999; Armour et al., 2001; Dudley et al., 2004; Petersen,
Woodard, Urech, Daw, & Sookanan, 2006; Rosenthal & Frank, 2006; Scott & Hall,
1995; Town, Kane, Johnson, & Butler, 2005). These reviews included essentially the
same set of studies, reflecting the status of the literature at the time the reviews were
undertaken and the decision on the part of authors to include only studies that employed
randomized designs. The reviews all noted the limited research findings on this topic
and found few significant impacts on quality attributable to financial rewards.
The small-scale experiments described in these reviews differ markedly from P4P
initiatives currently being implemented. These experiments were directed only at
physicians and rewarded only a limited number of preventive care measures. And
they involved relatively small numbers of physician practices that served primarily
low-income populations.
This review makes a new contribution to the literature in two ways. First, it
focuses on recent peer-reviewed evaluations of “real-world” purchaser P4P initia-
tives, including initiatives directed at hospitals as well as physicians. An assessment
of these evaluations has direct relevance for purchasers considering the implementa-
tion of P4P initiatives. Second, the review also identifies and discusses implementa-
tion issues evident in published evaluations of purchaser P4P initiatives. This
contrasts with previous reviews that did not systematically assess findings related to
implementation.

Literature Search Method

To identify articles for this review, electronic searches were performed by the
Centre for Reviews and Dissemination at the University of York, with supplementary
searches undertaken by the research team. We conducted electronic searches of
MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, Database of
Reviews of Effects, Econlit, the Agency for Healthcare Research and Quality, the
Organisation for Economic Co-operation and Development, and the World Health
Organization. Broad inclusion criteria were adopted because of the methodological
challenges inherent in assessing the impact of incentives on outcomes and processes
of health care. (For a complete listing of keywords used in the search, see
Christianson, Leatherman, & Sutherland, 2008). Our formal search extended
through June 2007, but we subsequently added a small number of published articles
that came to our attention through August 2007.
We selected articles that were empirical in nature and that focused on evaluating
and understanding the impact of purchaser P4P programs on quality. Unlike some past
reviews, we did not require that articles employ a randomized design for inclusion in
this review. While randomized trials can provide valuable evidence concerning the
impact of an intervention on outcomes, they are less instructive regarding the mecha-
nisms by which change occurs. Berwick (2008) pointed out that classic randomized
designs are generally not adequate for studying the “complex, unstable, nonlinear
social change” (p. 1183) that characterizes most quality improvement efforts in health
care. They also are limited in their ability to identify issues that can arise as programs
are implemented in real-world settings. Results from evaluations of purchaser-driven
P4P initiatives are likely to be regarded by other purchasers as more relevant to their
own decisions concerning the design and launching of new P4P initiatives.
For each appropriate article identified through our search process, we drafted an
article summary using a standard abstract format. The majority of the articles that
are included in this review analyzed P4P initiatives set in the United States. This no
doubt reflects the multiplicity of payers in the United States, which creates more
opportunities to evaluate P4P initiatives as well as the relatively large number of
U.S. academic health services researchers.

Conceptual Overview

When assessing evaluations of the impact of P4P initiatives on quality of care, we
argue that it is important to understand both the context in which the P4P initiative is
implemented and the structure of the financial incentives employed (Pawson, 2003;
Pawson & Tilley, 1997). With regard to the latter, in the experimental studies discussed
in previous review articles the structure of the financial incentives was relatively straight-
forward. In contrast, the payment incentives in evaluations of purchaser P4P initiatives,
the focus of this review, reflect the more complicated reimbursement arrangements that
exist in practice, which we view as a potential strength of these studies. However, the
effectiveness of any financial incentive scheme in eliciting changes in provider behavior
depends not only on the amount and type of payment (in the Pawson & Tilley [1997]
framework, the “mechanism” of change) but also on the context in which the payment
arrangements are implemented (see Town, Wholey, Kralewski, & Dowd, 2004), includ-
ing the characteristics of the providers receiving payments (e.g., whether incentives are
directed at large physician groups or small practices). The same payment structure,
employed in different contexts, could yield quite different results relating to quality of
care. In the remainder of this section, we describe several general issues related to con-
text and to the structure of financial incentives in P4P programs that have been raised in
the literature as factors that could influence outcomes of real-world P4P programs.

Contextual Factors
Entity receiving payment. Much of the general discussion of P4P speaks of finan-
cial incentives for “providers” or “paying providers” to improve various aspects of per-
formance, without addressing what this might mean in practice. For instance, when the
“provider” is the hospital and the reward is for improvements in hospital procedures,
it is expected that the payment will motivate hospital administrators to restructure
processes or take other steps to encourage change in the desired direction. However, it
is not clear that the financial incentive actually will be passed through to physicians,
nurses, or others delivering care to patients. Hospitals could respond in any number of
ways to the financial incentive. The results could reflect the effectiveness of the hos-
pital as an organization in managing change processes as much as the type or level of
the incentive payment. Similarly, when physicians are rewarded for improvement in
chronic care processes, the impact of the reward could depend on whether the physi-
cian practices in a solo setting or as part of a group. If the reward is paid to the group,
then how the group decides to “pass through” the monies (if at all) to individuals in the
group could have a major impact on the type or amount of behavioral change that
occurs (Christianson, Knutson, & Mazze, 2006; Young & Conrad, 2007).

Concurrent incentive programs. In countries such as the United States, with plural-
istic health care systems featuring many different purchasers, it is common for
providers to be paid in multiple ways that differ in their incentives to improve quality
and their quality goals. For example, one scheme might reward a physician group for
achieving benchmarks on diabetes treatment, while another might emphasize medica-
tion management for heart failure patients in its reward structure. Some of the finan-
cial incentives implemented by different payers may reinforce each other, but others
may not. This situation becomes even more complicated when individual purchasers
have incentive schemes that reward cost and utilization control along with quality
enhancement or when medical care organizations reward provider productivity along
with achievement of quality benchmarks (Reschovsky & Hadley, 2007). Providers
often must decide where to allocate resources and attention in response to a compli-
cated set of financial signals from a large number of purchasers. In this context, one
might expect a different response than if the same incentive were implemented in a
system where there was one payer or a dominant payer whose actions others mimic-
ked and where providers were rewarded based on a single set of quality measures.

The problem of small numbers. Small providers may not have sufficient numbers
of patients with specific medical problems (e.g., diabetes) to construct a reliable
measure of performance. When there are a relatively small number of patients asso-
ciated with a specific provider, performance on quality metrics is likely to reflect, to
a significant degree, random variation or factors other than the efforts of physicians
(Hofer et al., 1999). For physicians practicing in medical groups, or for hospitals that
are part of local systems, this small numbers problem could be addressed by aggre-
gating the performance of individual providers to the group or system level.
However, the “incentive effect” of the payment at the individual provider level then
becomes a matter of organizational policy (Christianson et al., 2006), and it will not
always be the case that providers that perform the best on P4P metrics receive higher
payments (Bokhour et al., 2006). Where multiple payers are involved, the small
numbers problem can be addressed in P4P initiatives by pooling patient data across
payers. This may or may not yield numbers of patients “large enough” for a valid
assessment of provider performance. And this approach requires that payers share
data and agree on a common set of performance measures to reward, both likely
to be daunting tasks. Where the small numbers problem cannot be effectively
addressed, the resulting performance measures may lack credibility and weight with
providers, who therefore may resist allocating resources to performance improve-
ment relating to those measures.
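
The consequences of small panels for measurement reliability can be illustrated with a short simulation (our own construction; the 70% underlying performance rate and the panel sizes are hypothetical). Even when every physician delivers identical care, measured rates scatter widely when panels are small:

```python
import random

def simulated_scores(true_rate, panel_size, n_physicians=1000, seed=7):
    """Simulate measured compliance rates for physicians who all deliver
    identical care (true_rate) but are scored on panels of a given size."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_physicians):
        hits = sum(rng.random() < true_rate for _ in range(panel_size))
        scores.append(hits / panel_size)
    return scores

for panel in (20, 100, 500):
    s = sorted(simulated_scores(0.70, panel))
    lo, hi = s[25], s[-26]  # roughly the middle 95% of simulated physicians
    print(f"panel={panel:4d}: measured rates span {lo:.2f} to {hi:.2f}")
```

With 20 patients per physician, measured rates in this simulation span roughly 50% to 90% even though true performance is identical, which is precisely why such scores may lack credibility with providers.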

Structure of Payment Arrangements


Size of P4P payments relative to provider revenues. P4P initiatives can range
from small-scale programs, where relatively little of a provider’s revenue is affected
by financial incentives, to efforts such as the P4P initiative in the United Kingdom
targeted at general practitioners, where reward dollars could potentially result in
major increases in practice revenues (Roland, 2004). In response to small-scale
programs, providers may determine that it makes little financial sense to invest in
practice reforms necessary to achieve the designated improvements or benchmarks.
In contrast, where the P4P initiative has a potentially significant effect on provider
incomes, providers may respond in fundamentally different ways. Financial incen-
tives that are structured the same with respect to goal achievement in a specific clin-
ical area could have quite different effects depending on the importance of P4P
payments relative to overall practice revenues.

Number of measures on which rewards are based. Linking a portion of provider
payment to performance on a limited number of predetermined metrics may encour-
age providers to “manage to the metric,” reconfiguring practice resources to improve
P4P scores, with the possibility that quality of care in other areas languishes or even
declines. Managing to the metric, in theory, can stimulate providers to seek out
patients who are likely to do well, all else equal, in the areas being measured while
referring less attractive patients to other providers when possible. A concern often
expressed is that less educated, or possibly less motivated, patients will find it more
difficult to access care because providers may view them as less dependable in
“doing their part” to manage their illnesses. Also, Berwick (1995) has noted that P4P
can discourage innovation in areas not addressed by P4P metrics.

Risk-adjusting P4P payments. Patients are not likely to be randomly distributed
across providers with respect to the severity of health conditions. Some providers
become expert at caring for “difficult cases” and consequently accumulate a dispro-
portionate number of these patients in their case loads. Where this occurs, these
providers could be graded poorly on metrics related to the percentage of patients in
compliance with P4P standards. The solution is to use statistical methods to “risk
adjust” performance measures to even the playing field across providers with respect
to severity of patient mix. However, risk-adjustment techniques can be difficult to
explain, require sophisticated statistical methods to implement, and may not be
entirely successful. Also, providers may view them as arbitrary “black boxes” and be
suspicious of their validity.
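
One common approach, indirect standardization, compares a provider's observed rate to the rate expected given its case mix. A minimal sketch (the severity strata, benchmark rates, and patient panel below are all hypothetical):

```python
# Hypothetical benchmark compliance rates by patient severity stratum.
BENCHMARK = {"low": 0.85, "medium": 0.70, "high": 0.50}

def risk_adjusted_score(patients):
    """patients: list of (severity, met_standard) tuples for one provider.
    Returns the observed rate, the rate expected given case mix, and the
    observed-to-expected (O/E) ratio used as the adjusted score."""
    observed = sum(met for _, met in patients) / len(patients)
    expected = sum(BENCHMARK[severity] for severity, _ in patients) / len(patients)
    return observed, expected, observed / expected

# A provider with a difficult case mix: the raw rate (55%) looks poor,
# but it is close to what the benchmarks predict for these patients.
panel = ([("high", True)] * 30 + [("high", False)] * 30 +
         [("medium", True)] * 25 + [("medium", False)] * 15)
obs, exp, ratio = risk_adjusted_score(panel)
print(f"observed {obs:.2f}, expected {exp:.2f}, O/E ratio {ratio:.2f}")
```

An O/E ratio near 1.0 signals performance in line with expectations despite a low raw score; the difficulty, as noted above, is persuading providers that the benchmark model itself is fair.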

Sources of P4P funds. An obvious cost of any P4P scheme consists of the funds
paid out to providers who achieve improvements in quality of care or perform at tar-
geted levels with respect to P4P metrics. If these payouts are not “new money,” then
providers are likely to view P4P as an attempt to redistribute, with the risk of reduc-
ing, reimbursement (Ferman, 2004) rather than an effort to improve quality of care.
However, P4P programs that funnel substantial additional funds into the health care
system may entail costs (including administrative costs) that are not acceptable or
sustainable by purchasers who believe expenditures are already too high. In this
case, purchasers may be tempted to set payments at a level that is not sufficient to
induce change in provider behaviors or sustain their P4P programs over time.

Participation costs incurred by providers. The general premise that individuals will
work harder at, or provide more of, a particular behavior if rewarded to do so is intu-
itively appealing. It also seems reasonable that higher rewards will result in greater
effort, up to a point. However, the size of the payment necessary to elicit the desired
behavioral change will be related to the transaction costs incurred by providers in rais-
ing their performance (Fisher, 2006; Young & Conrad, 2007), and this in turn will be
influenced by provider characteristics and the nature of P4P performance metrics.
Therefore, there is no single “ideal” payment level needed to bring about the results
desired by purchasers. In some instances, P4P will be ineffective because the perfor-
mance reward is “too small,” while in other cases the size of the reward will be “more
than necessary” to bring about change. It is likely that some P4P programs will not be
effective simply because of the difficulty in specifying appropriate reward levels.

Description of Findings: Evaluations Addressing Impacts on Quality Measures

In this section, we focus on findings related to the impact of purchaser P4P initia-
tives on measures of quality. In synthesizing this literature, we follow the general
“context-mechanism-outcome” framework suggested by Pawson, Greenhalgh,
Harvey, and Walshe (2005) without adopting all of its specifics. That is, for these arti-
cles, we contrast the contexts in which the P4P initiatives were implemented, the
mechanisms by which the financial incentives were transmitted from purchasers to
providers (which we call the structure of payments), and the findings of the evalua-
tions relating to the impacts of P4P on targeted measures of quality and other aspects
of care delivery. We do this first for evaluations of P4P programs directed at physi-
cians and then turn to evaluations of programs that target hospitals. First, however, we
discuss the basic designs that are used in the articles to evaluate impacts, irrespective
of whether the P4P initiative seeks to influence physician or hospital behaviors.
Almost all of the “impact” evaluations of P4P initiatives employ either a quasi-
experimental or a before–after study design. Each design has its limitations. In quasi-
experimental designs, the evaluator seeks to identify a contemporaneous comparison
group that has characteristics that mirror, as closely as possible, the characteristics of
providers receiving P4P payments. The performance in the comparison group then is
compared to that of the P4P group of providers over time. Because no comparison
group is ever ideal, most studies use data on provider characteristics in the P4P and
comparison groups, along with statistical methods, to adjust for differences between
the two groups. Even then, there is the possibility that the groups differ on character-
istics that are either not observable or that, while observable in theory, cannot be mea-
sured with existing data sources. As a result, when interpreting estimated impacts of
P4P initiatives, evaluators using quasi-experimental designs usually are careful to
note that any differences they observe could reflect underlying differences in the
groups rather than the impact of the P4P initiative. Where no significant differences
are found, there also is the possibility that real differences exist but are obscured by
unmeasured, uncontrolled differences in group characteristics.
“Before–after” evaluation designs also are used frequently to evaluate the impact
of P4P, but they typically are viewed as weaker than quasi-experimental designs
because the control group they employ is not as convincing. In these evaluation
designs, changes in quality are tracked before and after implementation of the P4P
initiative, but only for the providers affected by P4P. Essentially, the outcomes prior
to P4P (sometimes projected forward based on past trends in the data) are taken to
represent what would have happened in the absence of P4P; in effect, the providers
in the P4P initiative serve as their own control group. The credibility of this approach
is greatest where no obvious environmental changes that could have influenced
provider behaviors relative to the P4P performance metrics occurred simultaneously
with the implementation of the P4P initiative. Researchers adopting the before–after
approach in assessing P4P impacts generally do so because they were not able to
employ a contemporaneous control group. They acknowledge the limitations inher-
ent in this design and sometimes attempt to address them by comparing their find-
ings to national trend data on performance indicators. Alternatively, they restrict
their analysis to relatively short time periods to minimize the potential impact of
external environmental changes and the need to fit statistical trend lines to past
performance data.
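
The logic of the two designs can be reduced to a stylized calculation (all numbers are invented for illustration). A before–after design attributes the entire observed change to P4P, whereas the difference-in-differences contrast typical of quasi-experimental evaluations nets out the secular trend captured by the comparison group:

```python
# Mean compliance rates before and after a hypothetical P4P launch.
p4p_before, p4p_after = 0.62, 0.74      # providers receiving P4P payments
comp_before, comp_after = 0.60, 0.68    # contemporaneous comparison group

# Before-after estimate: the full change, including any secular trend.
before_after = p4p_after - p4p_before                                  # 0.12

# Difference-in-differences estimate: change net of the comparison trend.
diff_in_diff = (p4p_after - p4p_before) - (comp_after - comp_before)   # 0.04

print(f"before-after: {before_after:.2f}; difference-in-differences: {diff_in_diff:.2f}")
```

In this example, two thirds of the apparent improvement reflects a trend affecting all providers, which is why before–after designs are generally viewed as weaker.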

Case-Specific Evaluations of Physician P4P Initiatives


Our literature search identified nine evaluations where the primary intent was to
assess the relationship between a single P4P initiative and quality improvement or
achievement of quality benchmarks by physicians or physician groups (Table 1).

Context. The nine evaluations addressed physician P4P initiatives that were imple-
mented between 1987 (Morrow, Gooding, & Clark, 1995) and 2004 (Campbell et al.,
2007). One initiative was undertaken by a large integrated delivery system that owned
a managed care plan (Larsen, Cannon, & Towner, 2003), while another was developed
by an HMO in collaboration with a large contracting physician network (Levin-Scherz,
DeVita, & Timbie, 2006). Six were implemented by managed care organizations (a
PPO, two IPA model HMOs, two network model HMOs, and a group practice HMO).
There was one initiative sponsored by the National Health Service (NHS) in the United
Kingdom (Campbell et al., 2007). The scope of these P4P initiatives ranged from the
NHS program, which encompassed all general practitioners in the United Kingdom, to
a relatively small pilot program in upstate New York (Beaulieu & Horrigan, 2005). The
entities receiving P4P payments included individual physicians or small physician
practices (Beaulieu & Horrigan, 2005; Campbell et al., 2007; Chung, Chernicoff,
Nakao, Nickel, & Legorreta, 2003; Greene et al., 2004; Morrow et al., 1995), medical
groups (Amundson, Solberg, Reed, Martini, & Carlson, 2003; Rosenthal, Frank, Li, &
Epstein, 2005), and large physician networks (Levin-Scherz et al., 2006).

Structure of payment arrangements. The P4P initiatives also varied considerably
in the number and types of measures that they targeted with financial incentives. One
program addressed only compliance with a treatment guideline for acute sinusitis
(Greene et al., 2004), while others used a small number of measures related to treat-
ment of diabetes (Beaulieu & Horrigan, 2005; Larsen et al., 2003) or the encour-
agement of smoking cessation (Amundson et al., 2003). Three P4P programs used a
variety of measures of quality in the treatment of different conditions or the provi-
sion of preventive care (Levin-Scherz et al., 2006; Morrow et al., 1995; Rosenthal
et al., 2005). In two cases, the P4P initiatives were very broad in their scope, employ-
ing point systems that encompassed medical quality, patient satisfaction, and busi-
ness operations (Campbell et al., 2007; Chung et al., 2003). Metrics for diabetes care
were included in six of the nine programs. As Beaulieu and Horrigan (2005) noted,
this likely reflects the fact that there are documented gaps in the treatment of dia-
betes, accepted guidelines for treatment, and credible quality measures.
Seven of the nine P4P initiatives offered bonus payments of some type for achiev-
ing quality benchmarks or performing at relatively high levels, while two returned a
(text continues after Table 1)
Table 1
Evaluations of the Impact of Programs That Provide Financial Incentives to Physicians for Quality

Amundson, Solberg, Reed, Martini, and Carlson (2003)
- Geographic scope: Minnesota
- Physicians: Physicians in 19 to 20 medical groups participating in a network model HMO
- Data analyzed: Audits of 14,489 ambulatory patient records from 1996 to 1998
- Quality measures: Documentation of tobacco use and discussion of tobacco use, with medical group targets of 80% for each
- Financial incentives: Bonus pools established for each medical group, with a portion of the bonus payment directed to performance on the tobacco quality measures
- Effect of financial incentives: Documentation increased significantly for 13 of 20 groups, and discussion improved for 7 groups
- Comments: No contemporaneous comparison group was present, and increases in "discussion" may be in part because of better documentation

Beaulieu and Horrigan (2005)
- Geographic scope: Upstate New York
- Physicians: 21 physicians and 624 diabetic patients
- Data analyzed: Performance self-reported by physicians three times in study year
- Quality measures: Composite measure of performance in delivering diabetes care according to best practices
- Financial incentives: Physicians receive $3 PMPM for Medicare patients and $.75 for commercial patients for a composite score above 6.86; $1.50 and $.37 for a score above 6.23; and $.75 and $.18 for a 50% improvement with a score below 6.23
- Effect of financial incentives: Composite scores increased by 48%
- Comments: It is not possible to separate the effect of other changes introduced at the same time from the effect of financial incentives, and physician participants were volunteers

Campbell et al. (2007)
- Geographic scope: United Kingdom
- Physicians: 42 primary care practices
- Data analyzed: 3 years of clinical records, 2 years before the P4P program and the 1st year of the program
- Quality measures: Measures of asthma, diabetes, and coronary heart disease
- Financial incentives: New funds awarded based on points for 146 quality indicators relating to 10 chronic illnesses as well as organization of care and patient experience; payments of $133 per point, with 1,050 possible points
- Effect of financial incentives: Significant but relatively small increase in the trend rate for asthma and diabetes indicators after P4P
- Comments: Quality improvement efforts were underway prior to P4P and continued in the 1st year of P4P

Chung, Chernicoff, Nakao, Nickel, and Legorreta (2003)
- Geographic scope: Hawaii
- Physicians: 800 of approximately 1,500 eligible physicians in a PPO volunteered to participate in the first year
- Data analyzed: Claims data from 1997 to 2000
- Quality measures: Use of ACE inhibitors or ARB in heart failure, measurement of HbA1c in diabetes, and rates of childhood immunizations
- Financial incentives: 3.5% of base fees were earned, on average, by participating physicians; 17 physicians received maximum rewards in 2001, with 14 receiving $10,000 and 3 receiving $13,000
- Effect of financial incentives: Consistent, statistically significant improvement in use of ACE inhibitors or ARB and in measurement of HbA1c
- Comments: No contemporaneous control group

Felt-Lisk, Gimm, and Peterson (2007)
- Geographic scope: California
- Physicians: Physician practices in 5 different Medicaid managed care plans that received payments for performance and practices in 2 plans that did not
- Data analyzed: Encounter data from 2002 to 2005, Medicaid administrative data, meeting notes, and interviews
- Quality measures: Percentage of plan members meeting HEDIS well-baby visit guidelines
- Financial incentives: From 2003 to 2005, four of five plans paid bonuses to contracting practices based on the proportion of children who met well-baby visit guidelines; the fifth made payments directly to physicians from a bonus pool
- Effect of financial incentives: There was overall improvement in performance related to guidelines for well-baby care, but there was no effect in 2 plans and possible small effects in 2 plans; there appeared to be substantial improvement in only one plan
- Comments: Little information was provided regarding statistical methods used to estimate difference-in-differences effects; more successful programs offered greater rewards and had better communication with physicians about program characteristics

Greene et al. (2004)
- Geographic scope: Rochester, New York
- Physicians: 500 internists, 200 family practitioners, and 200 pediatricians
- Data analyzed: Medical claims organized using episode treatment group methods
- Quality measures: Treatment exceptions per episode in treatment of sinusitis
- Financial incentives: Amount withheld from capitation payment decreased from 15% to 10% for the top 5% of performers and increased to 20% for the bottom 5%
- Effect of financial incentives: Mean overall exceptions per episode decreased because of a decrease in use of less effective or inappropriate antibiotics
- Comments: The financial incentives were part of a multifaceted intervention, and it is not possible to determine the specific effect of financial incentives; no contemporaneous control group

Larsen, Cannon, and Towner (2003)
- Geographic scope: Utah
- Physicians: 400 employed physicians who were part of an integrated health system
- Data analyzed: Laboratory data, health plan claims, physician billing, clinical information systems
- Quality measures: Six different performance measures related to the treatment of diabetes
- Financial incentives: Overall financial incentive totaled 0.5% to 1.0% of compensation, with about half directed at diabetes care
- Effect of financial incentives: There was statistically significant improvement in all six indicators
- Comments: Financial incentives were part of a complex, multifaceted quality improvement intervention, so it is not possible to determine the effect of the financial incentives by themselves; no contemporaneous control group

Levin-Scherz, DeVita, and Timbie (2006)
- Geographic scope: Massachusetts
- Physicians: Physicians participating in a provider network that contracts with health plans are compared to physicians not in the network
- Data analyzed: Medical claims
- Quality measures: Performance relative to benchmarks for diabetes and asthma care, selected from a larger group of performance measures
- Financial incentives: Bonus payments and the return of a portion of withholds in managed care contracts; magnitude of the incentives not stated
- Effect of financial incentives: Significantly greater improvement in diabetes measures, relative to the comparison group; no significant improvement in asthma care
- Comments: It is not possible to isolate the effect of financial incentives from other quality improvement efforts implemented at the same time

Morrow, Gooding, and Clark (1995)
- Geographic scope: Northeastern United States
- Physicians: Primary care physicians contracting with an IPA model HMO
- Data analyzed: Audited medical chart data
- Quality measures: Rates of childhood MMR immunization, cholesterol screening for adults, and appropriate charting of information
- Financial incentives: Payment in addition to base capitation payment to primary care physicians; amount of payment for quality indicators not specified
- Effect of financial incentives: Significant improvements in all measures
- Comments: Longitudinal study with no contemporaneous control group

Rosenthal, Frank, Li, and Epstein (2005)
- Geographic scope: California and the Pacific Northwest
- Physicians: Analytic sample included 134 medical groups contracting with a health plan in California that were exposed to a financial incentive and 33 groups in the Northwest contracting with the same plan but not exposed to the incentive
- Data analyzed: Claims-based performance data aggregated to the physician group level
- Quality measures: Cervical cancer screening, mammography, and HbA1c testing (selected from 10 measures subject to financial incentives)
- Financial incentives: Payments of $0.23 PMPM for each target achieved; a group with 10,000 plan members could potentially earn $270,000 per year
- Effect of financial incentives: Significant improvement in cervical cancer screening relative to the control group; no significant improvement in the other two measures
- Comments: Most (75%) of the dollars were earned by groups that had achieved the benchmarks prior to the incentive program; however, there was substantial improvement in low-performance groups

Note: PMPM = per member per month; P4P = pay for performance; ARB = angiotensin receptor blocker; HEDIS = Healthcare Effectiveness Data and Information Set. Some of the contents of Table 1 appeared in Christianson, Leatherman, and Sutherland (2007).

percentage of withheld funds (Greene et al., 2004; Levin-Scherz et al., 2006). Thus,
in the majority of cases health plans rewarded performance with “new money.”
There was a penalty for poor performance in only one instance (Levin-Scherz et al.,
2006), and in only one initiative was improvement in performance rewarded along
with exceptional performance (Beaulieu & Horrigan, 2005). With the exception of
the P4P initiative in the United Kingdom and one program in the United States, the
amount of potential P4P payment was not large relative to overall practice revenues,
although this information was not provided in all of the evaluations. Larsen et al.
(2003) reported that P4P payments constituted 0.5% to 1% of total compensation,
and Levin-Scherz et al. (2006) noted that a moderate-sized practice could earn an
additional $3,000 to $5,000 annually in P4P payments. Rosenthal et al. (2005) esti-
mated that an average-sized medical group with 10,000 health plan patients could
earn as much as $270,000 annually in P4P funds, equivalent to 5% of the capitation
payments made by the health plan to the group but less than 1% of a group’s over-
all revenues. On the other hand, in the P4P initiative evaluated by Beaulieu and
Horrigan (2005), practices could earn a bonus equal to 12% of per member per
month payments from the plan. Without question, the greatest potential for increased
physician income from P4P was present in the U.K. initiative, evaluated by
Campbell et al. (2007). The NHS committed about $3.2 billion to the initiative over
3 years, beginning in 2004. The maximum annual payment that a practice could
receive if it achieved all available points was $139,400 (Doran et al., 2006).
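
As a rough check on these figures (a stylized sketch; actual payments under the U.K. initiative also depended on factors such as disease prevalence and practice list size, which are not modeled here), the maximum payment is approximately the product of the points available and the payment per point:

```python
POINTS_AVAILABLE = 1_050    # maximum points in the U.K. program (from the text)
DOLLARS_PER_POINT = 133     # approximate payment per point (from the text)

def practice_p4p_revenue(points_achieved):
    """Annual P4P revenue implied by a simple points-times-rate rule."""
    return points_achieved * DOLLARS_PER_POINT

print(practice_p4p_revenue(POINTS_AVAILABLE))  # 139,650 -- close to the reported $139,400 maximum
```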

Effects of P4P initiatives. Although payments were structured in a variety of ways
and there was considerable variation in the contexts of the P4P initiatives, the evalu-
ations reported quite similar outcomes. In every initiative, there was significant
improvement with respect to at least one quality metric. Evaluations that reported
improvement in a larger number of metrics typically used a before–after research
design. There also would appear to be the potential for “volunteer bias” in reported
outcomes (e.g., Chung et al., 2003), although the procedures used to recruit physician
practices to P4P initiatives and the latitude given practices to “opt out” were not
always described. In three study settings where a before–after design was used, the
P4P initiative was implemented as part of a larger quality improvement effort, so that
factors other than financial incentives may have influenced the results (Amundson
et al., 2003; Greene et al., 2004; Larsen et al., 2003). In the U.K. P4P analysis, the
physician practices included in the study had been participating in quality improve-
ment efforts prior to P4P and continued to do so after P4P was implemented. In this
situation, it was reasonable for the evaluators to interpret their findings as represent-
ing the incremental effect of the P4P program. In the three conditions they studied
(asthma, coronary heart disease, and type 2 diabetes), there was a modest increase in
the trend rate for improvement for asthma and diabetes care after implementation of
P4P but no significant increase with respect to coronary heart disease, where perfor-
mance was quite high prior to P4P. In an effort to determine if there were unintended
negative impacts of P4P in physician practices, the evaluators also tracked trends in
17 quality indicators not covered under P4P. They found no differences between trends
in these indicators and trends in the P4P measures. Overall, they concluded that P4P could make
a useful contribution to quality improvement efforts (Campbell et al., 2007).
Three evaluations employed a quasi-experimental research design. Beaulieu and
Horrigan (2005) reported significant improvement in five of six process measures for
diabetes care and two of three outcome measures. However, P4P payments were
coupled with other quality improvement efforts implemented by the managed care
organization, making the contribution of financial incentives to improvements in
quality unclear. Levin-Scherz et al. (2006) found improvements in claims-based dia-
betes measures from 2001 to 2003 when compared to an index plan. There were
smaller improvements in asthma measures, but these measures were at a high level
in the baseline year, leaving less room for improvement. Again, the financial rewards
were combined with other efforts to improve care (e.g., patient outreach activities,
the development of a registry, physician profiling), so that it was impossible to deter-
mine the incremental impact of the incentives alone.
The evaluation design employed by Rosenthal et al. (2005) held the greatest
promise for determining the incremental impact of P4P on measures of quality. Out
of 10 measures in the health plan’s P4P initiative, they selected 3 for evaluation: cer-
vical cancer screening, mammography, and hemoglobin A1c testing. Within the
plan, some medical groups received P4P payments, while groups in another region
did not, permitting the use of a relatively strong quasi-experimental design. Data on
performance measures had been reported to medical groups in both regions for sev-
eral years before the program was implemented, so that changes after P4P was
implemented were not likely to reflect a “reporting effect.” Compared to the groups
not receiving a P4P payment, the groups receiving a payment demonstrated better
performance only in cervical cancer screening.

Cross-Cutting Analyses of Physician P4P Initiatives


Two studies analyzed data from multiple health plans but had different objectives
and employed different analytic approaches. Using data from 2000 to 2001, Ettner
et al. (2006) examined the association between reimbursement incentives in 10
health plans and a variety of measures of the treatment of diabetes. They found that
care processes were better for providers who were reimbursed on a salary basis with
quality and satisfaction scores determining a portion of physician payment. In their
study design, it was not possible to determine if the financial incentives resulted in
better performance or if use of particular payment approaches was more common in
situations where physicians performed better on the metrics.
Felt-Lisk et al. (2007) studied a Medicaid P4P demonstration involving contract-
ing health plans in California. Five health plans rewarded physicians for achieving
benchmarks for well-baby care, with the structure of the payment approach varying
by health plan. Four of the five plans offered bonuses of varying magnitudes to a
contracting physician entity based on the number of children whose care met well-
baby care guidelines, while the fifth plan made payments directly to physicians from
an existing bonus funds pool. Felt-Lisk et al. reported favorable trends in well-baby
visits from 2002 to 2005, using a "difference-in-differences" analytic approach.
However, they did not present the details of this analysis. They also found substan-
tial variation in the experiences of the five P4P programs, with the favorable overall
results reflecting primarily the success of a single program.

Institutional P4P Initiatives


Five studies addressed the impact of payments to institutional providers for
improving quality or reaching quality targets, with three of these studies addressing
impacts on quality of the Centers for Medicare & Medicaid Services (CMS) Premier
Hospital Quality Initiative in the United States.

Context. The earliest of the five studies examined the impact of bonus payments,
beginning in 1995, to 21 public emergency departments in Victoria, Australia
(Cameron, Kennedy, & McNeil, 1999). A U.S. study that did not focus on the CMS
initiative evaluated a financial incentive program implemented by a health plan
in Hawaii. Fourteen hospitals participated in this plan in 2001, with the number
increasing to 17 in 2003. The CMS Premier Hospital Quality Initiative took advan-
tage of an ongoing program in which more than 600 hospitals voluntarily reported
information on the quality of their care. Of these reporting hospitals, 207 were also
participants in the CMS P4P initiative. A control group was constructed from reporting
hospitals that did not participate in the CMS Premier initiative.

Structure of payment arrangements. In the emergency department study, the depart-
ments received bonus payments at the beginning of each year and were required to
return varying portions of the bonus if they did not achieve performance targets relat-
ing to ambulance bypass, waiting time for patients at different levels of emergency, and
patients waiting more than 12 hours for hospital admission. Payments started at $7.2
million in total and rose to $17 million by 1997-1998. P4P payments for hospitals in
the Hawaii P4P program were based on points accumulated in 4 areas: process mea-
sures of care (20 points), outcomes measures (45 points), service satisfaction measures
(25 points), and business operations measures (10 points). An individual hospital’s
reward was determined as the product of the total award budget times the hospital’s
percentage of the health plan’s inpatient care expenses in a given year times the per-
centage of the maximum points (100) achieved by the hospital in that year.
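
A short sketch of this award formula (the 8% inpatient share and 75-point score below are hypothetical; the $9 million figure is the 2004 combined payout reported in Table 2):

```python
def hospital_award(total_budget, inpatient_share, points_achieved, max_points=100):
    """Award = total budget x hospital's share of the plan's inpatient
    spending x fraction of maximum points achieved."""
    return total_budget * inpatient_share * (points_achieved / max_points)

# Hypothetical hospital: 8% of the plan's inpatient expenses, 75 of 100 points.
print(f"${hospital_award(9_000_000, 0.08, 75):,.0f}")  # $540,000
```
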
Payment in the CMS Premier program was based on performance on 33 quality
measures relating to five clinical conditions. If a hospital fell into the top decile of par-
ticipating hospitals in a composite measure of quality in a given year, it received a 2%
bonus payment added to its reimbursement. Hospitals in the second decile received a
1% payment. In a relatively unique aspect of the payment structure, hospitals that
failed to surpass, by the 3rd year, the performance level of hospitals in the lowest two
deciles, as established during the 1st year, incurred payment penalties of 1% to 2%.
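
The decile-based payment rule can be sketched as follows (the composite scores and decile cutoffs below are hypothetical; in the demonstration they were computed from participating hospitals' own data):

```python
def premier_adjustment(composite, cutoffs, year1_20th_percentile, year=1):
    """Percentage payment adjustment under a rule like the CMS Premier demo:
    top decile +2%, second decile +1%, and, from year 3 on, a 1% to 2%
    penalty for falling below the year-1 20th-percentile floor."""
    top_decile, second_decile = cutoffs  # 90th and 80th percentile scores
    if composite >= top_decile:
        return 2.0
    if composite >= second_decile:
        return 1.0
    if year >= 3 and composite < year1_20th_percentile:
        return -1.5  # penalty of 1% to 2%; midpoint shown for simplicity
    return 0.0

print(premier_adjustment(0.95, (0.92, 0.88), 0.70))          # 2.0
print(premier_adjustment(0.65, (0.92, 0.88), 0.70, year=3))  # -1.5
```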

Effects of P4P incentives. The evaluation of the emergency department P4P initiative,
using a before–after study design, reported significant improvement in two of the three
measures, with improvements sustained for 3 years. The evaluators attributed the success
of the program to the fact that it was designed collaboratively with the emergency
departments. Evaluators of the Hawaii health plan program tracked performance mea-
sures over a 4-year period, reporting improvement in aggregated rates of risk-adjusted
surgical complications and reduced lengths of stay for several surgical procedures but
mixed results for patient satisfaction. There were several limitations to the research
design they used to assess effects on outcomes, including voluntary hospital participa-
tion, lack of preprogram data, and absence of a contemporaneous comparison group.
Evaluations of the impact of the CMS Premier initiative all used research designs
with contemporaneous control groups, but hospitals could voluntarily choose to be
part of the P4P initiative, raising questions about the possibility of “volunteer bias”
in these findings as well. Grossbart (2006), analyzing outcomes within a single hos-
pital system in Ohio, found slightly greater improvement in P4P hospitals over time
in measures relating to acute myocardial infarction, heart failure, and pneumonia.
Glickman et al. (2007) evaluated the performance of 500 hospitals that were part of
a quality improvement program relating to treatment of acute myocardial infarction,
with 54 of these hospitals also participating in the P4P initiative. To assess possible
unintended consequences of the P4P Initiative, the evaluators tracked eight measures
of care not subject to P4P incentives, in addition to six measures that were. They
found slightly higher rates of improvement in two of the six measures that were
addressed by P4P: aspirin at discharge and smoking cessation counseling. There was
no significant difference in a composite score based on all six measures, nor was
there a significant effect on a composite of the eight other measures. Overall, the
authors concluded that P4P added little to the existing quality improvement effort.
The most comprehensive evaluation of the CMS Premier initiative was carried
out by Lindenauer et al. (2007), who compared 207 hospitals that voluntarily agreed
to participate in the initiative to other hospitals that chose not to participate. The
evaluators were able to compare before–after changes in performance of participat-
ing and nonparticipating hospitals, using multivariate methods to control for differ-
ences in hospital characteristics. They found significant improvements of 2.6%
to 4.1% in composite performance measures over 2 years that were attributable to
the P4P initiative. Hospitals at all levels of performance at baseline demonstrated
improvement, but most of the bonus dollars went to hospitals with the highest per-
formance at baseline. Overall, the evaluators characterized the improvements under
the CMS Premier Hospital Quality Initiative as “modest” (see Table 2).
(text continues after Table 2)
Table 2
Evaluations of the Impact of Programs That Provide Financial Incentives to Hospitals for Quality

Cameron, Kennedy, and McNeil (1999)
- Geographic scope: Victoria, Australia
- Hospitals: 21 public hospital emergency departments
- Data analyzed: 3 years of performance data collected from the Victoria Emergency Department Minimum Dataset
- Quality measures: Occasions of "ambulance bypass"; emergency waiting times for 3 different classifications of patients compared to national performance thresholds; greater than 12 hours waiting time before hospital admission ("access block")
- Financial incentives: Bonus payment based on hospital throughput made at the beginning of each year; bonus pool initially $7.2 million in 1995, increasing to $17 million by 1997-1998; bonus is reduced depending on the degree to which targets are missed
- Effect of financial incentives: Significant and relatively large improvements in all areas except patients waiting more than 12 hours in emergency departments
- Comments: Success of program attributed in part to collaboration with emergency departments in its design

Glickman et al. (2007)
- Geographic scope: United States
- Hospitals: 500 hospitals participating in a quality improvement initiative for acute myocardial infarction, 54 of which were in a Medicare P4P program
- Data analyzed: Patient data abstracted from 2003 to 2006
- Quality measures: Six process of care measures were primary outcome measures; 8 measures not included in the Medicare P4P initiative were also tracked
- Financial incentives: Hospitals in the top decile received a 2% bonus; those in the second decile received a 1% bonus; those below the measures for the lowest 2 deciles (established in the 1st year) were penalized 1% to 2%
- Effect of financial incentives: Slightly higher rates of improvement for 2 of the 6 measures rewarded by Medicare: aspirin at discharge and smoking cessation counseling; no significant difference in a composite score of all 6 measures rewarded by Medicare
- Comments: There was no significant difference in a composite consisting of measures not rewarded by Medicare, suggesting that care in other areas was not adversely affected

Grossbart (2006)
- Geographic scope: Ohio, Kentucky, Pennsylvania, and Tennessee
- Hospitals: 4 acute care hospitals that are part of the same hospital system and participated in a Medicare P4P demonstration were compared to 6 system hospitals not in the demonstration
- Data analyzed: Performance data from a hospital database on 28,925 patients from 2002 to 2004
- Quality measures: Composite quality scores in 3 clinical areas: AMI, heart failure, and pneumonia
- Financial incentives: Hospitals in the top decile received a 2% bonus; those in the second decile received a 1% bonus; those below the measures for the lowest 2 deciles (established in the 1st year) were penalized 1% to 2%
- Effect of financial incentives: The improvement in the hospitals in the P4P program was slightly greater overall than in the comparison group hospitals, with large improvements in heart failure care
- Comments: All system hospitals were participating in systemwide quality improvement initiatives; the P4P program reportedly generated strong support on the part of CEOs for improvement efforts

Lindenauer et al. (2007)
- Geographic scope: United States
- Hospitals: 612 hospitals voluntarily reporting information as part of a national public reporting initiative, 207 of which participated in a Medicare P4P program as well
- Data analyzed: Hospital self-reported performance data from 2003 to 2006
- Quality measures: Individual process measures of acute myocardial infarction, heart failure, and pneumonia, and composite scores for AMI, heart failure, pneumonia, and all combined
- Financial incentives: Hospitals in the top decile received a 2% bonus; those in the second decile received a 1% bonus; those below the measures for the lowest two deciles (established in the 1st year) were penalized 1% to 2%; bonuses averaged $71,960 per year
- Effect of financial incentives: Adjusting for hospital characteristics and baseline performance, P4P hospitals improved from 2.6% to 4.1% over 2 years
- Comments: Most of the bonus payments went to hospitals that were high performers at baseline; improvements characterized as "modest"; possible volunteer bias

Berthiaume, Chung, Ryskina, Walsh, and Legorreta (2006)
- Geographic scope: Hawaii
- Hospitals: 14 (2001) to 17 (2003) hospitals
- Data analyzed: Claims data, data submitted by hospitals in quality initiatives, and patient survey data
- Quality measures: Hospitals could accumulate up to 100 points for performance in 4 areas: process measures (20 pts.), outcomes measures (45 pts.), service satisfaction measures (25 pts.), and business operations measures (10 pts.); not all hospitals were eligible to be scored on every measure
- Financial incentives: In 2004, the combined payout across all hospitals was $9 million; a hospital's award was the product of the total award budget specified by the health plan times the hospital's percentage of the health plan's inpatient care payout in a given year times the percent of maximum points achieved by the hospital in that year
- Effect of financial incentives: Over a 4-year period there were improvements in aggregated rates of risk-adjusted surgical complications and decreases in risk-adjusted average length of stay for several surgical procedures; results for patient satisfaction were mixed
- Comments: Participation in the program was voluntary, there were no preprogram data available, and there was no comparison group in the analysis

Note: P4P = pay for performance; AMI = acute myocardial infarction.



Implementation Issues

The primary objective of the evaluations reviewed to this point was determining
if P4P initiatives had an effect on quality of care measures. However, several of these
“impact” evaluations mounted supplemental studies in an attempt to answer ques-
tions regarding how the P4P program achieved its impact or why it was not more
effective. In this section, we combine their findings with the results of other studies
that responded to opportunities to investigate specific, often unexpected, provider
responses to P4P initiatives. We organize our discussion of this literature according
to specific implementation decisions and issues.

Level and Type of Payment


One implementation issue of interest to purchasers is the level at which P4P pay-
ments need to be set to achieve desired outcomes. There were no quantitative results
in the literature that directly addressed this question. In a cross-comparison of
Medicaid P4P programs, Felt-Lisk et al. (2007) found that the most successful plans
with respect to quality improvement paid the highest rewards.
Also of interest to purchasers are the implications of different methods for struc-
turing payments in P4P initiatives. Rosenthal et al. (2005) found that, when physi-
cian payments were based on achieving benchmarks, most (75%) of the dollars paid
out by the health plan went to medical groups that had achieved the benchmarks
prior to when the P4P program was implemented. In effect, the program rewarded
physicians for their historical performance. Lindenauer et al. (2007) reported a sim-
ilar pattern for hospitals in the CMS Premier initiative.

Risk Adjustment
There were no comparative analyses of the implications of different types of risk adjustment for P4P programs. However, an analysis of the U.K. P4P program by Doran et al. (2006) pointed out the influence of a particular method of risk adjustment on the distribution of P4P dollars. They focused on 1st-year evidence regarding factors influencing the performance of physicians relative to the program's indicators, finding that the median practice achieved 95.5% of the available points on which payments were based (75.0% achievement had been anticipated). A key driver of these high scores was exception reporting: physicians who were able to exclude larger proportions of patients from the payment calculations, primarily because these patients suffered from relatively complicated medical conditions (a form of risk adjustment allowed under program regulations), scored higher than did other physicians. In fact, the rate of such exclusions was the best predictor of the amount of dollars received by primary care practices in the U.K. P4P program.
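As a stylized illustration of this mechanism (a sketch of the general logic of exception reporting, not the program's exact scoring rules), suppose an indicator is scored as the fraction of eligible patients who receive the indicated care, with excluded patients removed from the denominator:

\[
\text{reported achievement} = \frac{n_{\text{treated}}}{n_{\text{eligible}} - n_{\text{excluded}}}.
\]

A practice that treats 80 of 100 eligible patients reports 80.0% achievement if it excludes no one, but 80/85, or approximately 94.1%, if it excludes 15 patients, which is why exclusion rates could be so strong a predictor of payments.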

Communication of Incentives
An important part of implementing any P4P initiative is communicating with par-
ticipating providers regarding the nature of the payments. If providers are not aware
of the incentive program, or do not understand how its rewards are structured, it is unlikely that improvements will be observed or, where they are observed, that they can be attributed to the financial incentives rather than to other components of a
quality improvement effort. In their analysis of several P4P efforts implemented in
Medicaid programs, Felt-Lisk et al. (2007) concluded that better results relative to
conformance with guidelines for well-baby care were achieved in Medicaid P4P
programs where there was better communication with physicians. In two early experimental studies, Hillman and colleagues (1998, 1999) found no significant effect of financial payments on measures of preventive care. In a follow-up analysis, they found that only about half of the participating physician practices were aware
of the incentives. In the United States, providers’ practices may receive rewards from
several different health plans, based on achievement of different quality of care indi-
cators. In this environment, effectively communicating with providers, especially
individual physicians, about the (often) complicated reward structures in specific P4P
initiatives would appear to pose a substantial challenge for purchasers.

Cost and Cost-Effectiveness


Purchasers certainly are concerned about whether P4P programs generate a return
on their investment. This has been hard to establish because most P4P programs (as
summarized above) have not conclusively established the contribution of financial
incentives to improvements in care processes or outcomes. Nevertheless, two pub-
lished studies have addressed this issue and found positive returns. Curtin, Beckman,
Pankow, Milillo, and Greene (2006), addressing diabetes care only, calculated the
return on investment in a P4P program from a health plan perspective. The health plan withheld a portion of its payments to an independent practice association (IPA), returning the withheld dollars if the IPA met target performance levels. Each year, about $15 million of these withheld funds were distributed
to 3,700 participating physicians, specialists as well as generalists. An average
primary care physician’s distribution ranged from $6,000 to $18,000 annually across
all performance measures. Diabetes care was one component of the overall perfor-
mance score on which the payout was based. Historical trend data (2000 to 2002)
were used to estimate what the costs of care would have been for diabetes patients in
2003-2004 in the absence of the P4P program, and this was compared to the cost of
the diabetes program. Claims data provided by the health plan were used in the analy-
sis, and savings were calculated from the perspective of the plan. The authors found
a positive return on investment of 1.6 to 1.0 in 2003 and 2.5 to 1.0 in 2004. The most
significant cost reductions occurred in the area of hospital care. The authors pointed
out that the P4P initiative essentially rewarded physicians for providing more care, in
most instances, for their patients with diabetes, presumably adding to direct treatment costs. Thus, the positive rate of return is more impressive than it would have been had achieving the performance goals required no additional treatment, or reductions in treatment.
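The return-on-investment arithmetic itself is straightforward: estimated savings (costs projected from the historical trend minus observed costs) are divided by the cost of the program. With purely hypothetical figures chosen only to illustrate the calculation (not values from the study):

\[
\text{ROI} = \frac{\text{projected costs} - \text{observed costs}}{\text{program costs}} = \frac{\$8.0\text{ million} - \$6.0\text{ million}}{\$1.25\text{ million}} = 1.6,
\]

that is, a return of 1.6 to 1.0.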
Nahra, Reiter, Hirth, Shermer, and Wheeler (2006) estimated the quality-adjusted
life years (QALYs) gained by patients hospitalized for heart treatment compared to
the money spent by a health insurer in performance payments. In a program involv-
ing 85 hospitals in Michigan, an insurer paid $22 million over 4 years to supplement its DRG-based reimbursements, with the maximum additional payment ranging from 1% to 2% over 2002 to 2003. The cost per QALY gained was between
$13,000 and $30,000, which, the authors noted, is well under the consensus value for
a QALY, suggesting that the insurer’s P4P initiative was cost-effective.
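The underlying cost-effectiveness arithmetic divides total incentive spending by estimated QALYs gained:

\[
\text{cost per QALY} = \frac{\text{total incentive payments}}{\text{QALYs gained}}.
\]

Taking the reported figures at face value, $22 million at $13,000 to $30,000 per QALY implies, by back-calculation (the study's own estimates may differ), roughly 22,000,000/30,000 ≈ 730 to 22,000,000/13,000 ≈ 1,690 QALYs gained over the program period.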

Practice Impacts
P4P programs also can have unanticipated impacts, and several studies have documented effects of this nature. For example, a very early study by
Langham, Gillam, and Thorogood (1995) examined changes in the distribution of
financial incentive payments for health promotion in the United Kingdom, focusing
on cardiovascular disease performance payments related to screening and the record-
ing of risk factors. Before implementing the incentive payment program, practices
had been reimbursed for holding health promotion clinics. After the shift to the new
approach, funds related to health promotion were found to be more evenly distrib-
uted across practices, but practices in areas with the highest measured need lost
funds relative to other practices. In general, the resulting distribution of payments
was unrelated to need or treatment.
Also in the United Kingdom, Srirangalingam, Sahathevan, Lasker, and Chowdhury
(2006) analyzed how referral patterns for diabetes care changed after introduction of the
NHS’s new financial reward system. Referrals from primary care to a hospital-based
diabetes service were tracked from November 2003 through November 2004, before
and after implementation of the new incentive system. The study setting was a poor sec-
tion of London. There was no significant impact on the total number of referrals to the
specialty clinic, but there was a significant increase in referrals for poor glycaemic con-
trol. The authors concluded that the P4P initiative led to an increase in referrals of patients with unacceptable glycaemic control, along with a lower threshold for referral.
Sutton and McLean (2006) assessed factors related to quality scores under a new
U.K. primary medical care contract. Data were analyzed for 60 practices in two NHS
areas in Scotland serving a population of 367,000. Linear regression analysis was
used to relate quality scores to various characteristics of the population, practitioner,
and practice. The most relevant finding is that practices with higher incomes from
other sources had lower quality scores. The authors speculated that the incentive
effect of the new contract was weaker when income from other sources makes up a
larger portion of practice income.

Documentation Improvements
Simpson, Hannaford, Lefevre, and Williams (2006) analyzed the impact of a new
payment scheme for general practitioners on recording of quality indicators for
patients with stroke. The new payment system, introduced in Scotland in 2004, pro-
vided payments to practices for developing an accurate register of stroke patients and for recording smoking habits, blood pressure, and cholesterol levels. There
were also payments for reaching targets with respect to blood pressure control and
other outcomes. Retrospective data from 310 (self-selected) of Scotland’s 850 prac-
tices were obtained from a central database in 2005, including data for 1 year before
the new incentive system was introduced and 1 year after. Binary logistic regression
was used to calculate odds ratios for the recording of data. Overall documentation increased from 32.3% to 52.1%, with particularly large increases in record keeping among the oldest and the most affluent patients; women had larger increases in documentation than did men. The authors noted that inequitable recording nevertheless persisted, with lower levels of recording for women, older patients, and more deprived patients.

Effect on Physician Motivation or Satisfaction


In an ethnographic study of two physician practices, McDonald, Harrison,
Checkland, Campbell, and Roland (2007) reported that the financial incentives in the
P4P program in the United Kingdom did not damage the internal motivation of gen-
eral practitioners, nor did physicians question the quality targets or their implications.
Gene-Badia et al. (2007) examined the impact of a financial incentive program
that rewarded primary care teams in Catalonia, Spain, for achieving a set of clinical
objectives and also for participating in a professional development program. They
found limited effects on the quality of professional life and on patient satisfaction.

Implications

In this article, we reviewed published studies that assessed the effectiveness of relatively large-scale, purchaser-driven P4P programs intended to improve quality of
care. In some studies, there was improvement in selected outcome measures.
Overall, however, the findings from these evaluations are difficult to interpret: most reported significant impacts, but it was often the case that P4P initiatives were
part of larger quality improvement efforts, making it difficult to assess the indepen-
dent effect of the financial incentives. And the small number of evaluations of dif-
ferent P4P initiatives directed at hospitals cautions against drawing definitive
conclusions regarding the potential for P4P to improve the quality of inpatient care.
While the findings regarding the impact of purchaser P4P initiatives on quality
measures are somewhat equivocal, these initiatives do provide guidance to public
and private purchasers regarding the design and implementation of P4P initiatives
(Christianson et al., 2008). For instance, while most initiatives rewarded perfor-
mance based on the relative rankings of providers, the experience of initiatives
where rewards were based on predetermined benchmarks is instructive for pur-
chasers for at least two reasons. First, when benchmarks are used, the bulk of reward
dollars, at least initially, is likely to flow to providers who have historically provided
high-quality care. In effect, these providers are being rewarded for past performance
as opposed to improvements in the quality of care they provide. It is interesting,
however, that in some P4P initiatives providers not reaching the benchmark levels of
performance did demonstrate improvement. Paying rewards based on benchmarks
also raises a second issue for purchasers. While benchmarks are relatively simple
to administer and explain to providers, they introduce budgeting uncertainty.
For instance, in the U.K. P4P initiative, the performance of general practitioners
exceeded the predetermined benchmarks by a substantial amount in the first year, straining the NHS budget that had been allocated to P4P (Galvin, 2006). One interpretation was that the benchmarks were set too low because the data used to establish them were somewhat out of date (Galvin, 2006).
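The arithmetic of the overshoot is instructive. If a budget is set on the assumption that practices will earn a fraction a of the available points but they actually earn a′, and payments are roughly proportional to points earned (a simplifying assumption made here for illustration), then spending exceeds the budget by the factor

\[
\frac{a'}{a} = \frac{0.955}{0.750} \approx 1.27,
\]

or roughly 27% more than planned, using the anticipated and (median) actual 1st-year achievement figures reported by Doran et al. (2006).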
The evaluations of P4P initiatives also underscore the importance for purchasers
of allocating sufficient time and resources to ongoing management of their P4P ini-
tiatives. Adequate funds need to be allocated for communication with providers
regarding how their performance will be measured and how reward monies will be
distributed (Felt-Lisk et al., 2007). Also, performance metrics need to be monitored
on a continuous basis. As providers improve their performance, some metrics may
need to be replaced to avoid “ceiling effects,” while others may need to be adjusted
to conform to new information that alters recommended treatment processes. And
unless adequate attention is devoted to risk adjustment of performance metrics, the
credibility of P4P programs with providers can suffer.
The evaluations of P4P initiatives also suggest that initial improvements in per-
formance relative to quality measures may not always reflect actual improvements
in quality. For instance, improved performance on measures can be the result of
better documentation by providers of care they are delivering already. In fact, at the
beginning of any P4P initiative, this may be the least expensive and most effective
response on the part of providers seeking to share in P4P rewards. The “gaming” of
P4P rules also can be expected during early periods of P4P initiatives. In the United
Kingdom, there is suggestive evidence that some physicians took inappropriate
advantage of rules that allowed them to exempt certain patients from their panels for
measurement purposes and that this had a significant positive impact on their scores
and the P4P funds that they received (Doran et al., 2006).
There are large P4P programs underway in the United States and United Kingdom,
with more evaluations likely to appear in the peer-reviewed literature in the near
future. Because of the variation in the way in which these programs have been
designed and implemented, synthesizing their findings to provide useful guidance to decision makers will be challenging (Dudley, 2005). It will be especially important to
have comprehensive reporting of results in future studies (not limiting results to a sub-
set of quality measures rewarded by payers), accompanied by complete descriptions
of the study context, structure of payments, and possible confounding factors. Even
so, it seems likely that determining the marginal effect of payment incentives will not
be possible for most P4P initiatives. Purchasers typically implement multiple components simultaneously in their efforts to influence quality. Unless there is some se-
sequencing in the implementation of these components, there will be limited oppor-
tunity for evaluations to assess the impact of financial incentives. Even when there
is sequencing, the estimated incremental impact of financial incentives is likely to
depend on which components already have been implemented. There is great poten-
tial for purchasers to learn from process evaluations of ongoing P4P efforts with par-
ticular attention to accurate documentation of costs incurred by all parties as well as
continued tracking of outcomes. Studies that compare implementation experiences across similar purchaser P4P initiatives could be especially productive.

References
Achat, H., McIntyre, P., & Burgess, M. (1999). Health care incentives in immunization. Australian and
New Zealand Journal of Public Health, 23, 285-288.
Amundson, G., Solberg, L. I., Reed, M., Martini, E. M., & Carlson, R. (2003). Paying for quality
improvement: Compliance with tobacco cessation guidelines. Joint Commission Journal on Quality
& Safety, 29, 59-65.
Anderson, K. K., Sebaldt, R. J., Lohfeld, L., Burgess, K., Donald, F. C., & Kaczorowski, J. (2006). Views
of family physicians in southwestern Ontario on preventive care services and performance incentives.
Family Practice, 23, 469-471.
Armour, B. S., Pitts, M. M., Maclean, R., Cangialose, C., Kishel, M., Imai, H., et al. (2001). The effect
of explicit financial incentives on physician behavior. Archives of Internal Medicine, 161, 1261-1266.
Beaulieu, N. D., & Horrigan, D. R. (2005). Organizational processes and quality. Putting smart money to
work for quality improvement. HSR: Health Services Research, 40, 1318-1334.
Berthiaume, J. T., Chung, R. S., Ryskina, K. L., Walsh, J., & Legorreta, A. P. (2006). Aligning financial incentives with quality of care in the hospital setting. Journal for Healthcare Quality, 28, 36-50.
Berwick, D. M. (1995). The toxicity of pay for performance. Quality Management in Health Care, 4, 27-33.
Berwick, D. M. (2008). The science of improvement. Journal of the American Medical Association, 299,
1182-1184.
Bodenheimer, T., & Grumbach, K. (1996). Keeping your head in changing times. Journal of the American
Medical Association, 276, 1025-1031.
Bodenheimer, T., May, J. H., Berenson, R. A., & Coughlan, J. (2005). Can money buy quality? Physician
response to pay for performance (Issue Brief No. 102). Washington, DC: Center for Studying Health
System Change.
Bokhour, B. G., Burgess, J. F., Jr., Hook, J. M., White, B., Berlowitz, D., Gulden, M. R., et al. (2006).
Incentive implementation in physician practices: A qualitative study of practice executive perspectives
on pay for performance. Medical Care Research and Review, 63, 73S-95S.
Cameron, P. A., Kennedy, M. P., & McNeil, J. J. (1999). The effects of bonus payments on emergency
service performance in Victoria. Medical Journal of Australia, 171, 243-246.
Campbell, S., Reeves, D., Kontopantelis, E., Middleton, E., Sibbald, B., & Roland, M. (2007). Quality of
primary care in England with the introduction of pay for performance. The New England Journal of
Medicine, 357, 181-190.
Casalino, L. P., Alexander, G. C., Jin, L., & Konetzka, R. T. (2007). General internists’ views on pay-for-
performance and public reporting of quality scores: A national survey. Health Affairs, 26, 492-499.
Christianson, J. B., Knutson, D. J., & Mazze, R. S. (2006). Physician pay-for-performance:
Implementation and research issues. Journal of General Internal Medicine, 21, S9-S13.
Christianson, J. B., Leatherman, S., & Sutherland, K. (2007). Paying for quality: Understanding and
assessing physician pay-for-performance initiatives (Research Synthesis Report No. 13). Princeton,
NJ: Synthesis Project, Robert Wood Johnson Foundation.
Christianson, J. B., Leatherman, S., & Sutherland, K. (2008). Financial incentives for health care
providers and quality improvement. London: Health Foundation.
Chung, R. S., Chernicoff, H. O., Nakao, K. A., Nickel, R. C., & Legorreta, A. P. (2003). A quality-driven
physician compensation model: Four-year follow-up study. Journal for Healthcare Quality, 25, 31-37.
Coleman, T., Wynn, A. T., Stevenson, K., & Cheater, F. (2001). Qualitative study of pilot payment aimed at
increasing general practitioners’ antismoking advice to smokers. British Medical Journal, 323, 432-435.
Curtin, K., Beckman, H., Pankow, G., Milillo, Y., & Greene, R. A. (2006). Return on investment in pay
for performance: A diabetes case study. Journal of Healthcare Management, 51, 365-376.
Doran, T., Fullwood, C., Gravelle, H., Reeves, D., Kontopantelis, E., Hiroeh, U., et al. (2006). Pay-for-
performance programs in family practices in the United Kingdom. New England Journal of Medicine,
355, 375-384.
Dudley, R. A. (2005). Pay-for-performance research. How to learn what clinicians and policy makers need
to know. Journal of the American Medical Association, 294, 1821-1823.
Dudley, R. A., Frolich, A., Robinowitz, D. L., Talavera, J. A., Broadhead, P., Luft, H. S., et al. (2004).
Strategies to support quality-based purchasing: A review of the evidence (Technical Review No. 10,
AHRQ Publication No. 04-0057). Rockville, MD: Agency for Healthcare Research and Quality.
Ettner, S. L., Thompson, T. J., Stevens, M. R., Mangione, C. M., Kim, C., Steers, W. N., et al. (2006). Are
physician reimbursement strategies associated with processes of care and patient satisfaction for
patients with diabetes in managed care? HSR: Health Services Research, 41, 1221-1241.
Felt-Lisk, S., Gimm, G., & Peterson, S. (2007). Making pay-for-performance work in Medicaid. Health
Affairs—Web Exclusive, 26, w516-w527.
Ferman, J. H. (2004). Pay for performance: Obstacles/implications. Despite challenges, pay-for-
performance programs are here to stay. Healthcare Executive, 19, 44, 46.
Fisher, E. S. (2006). Paying for performance—Risks and recommendations. New England Journal of
Medicine, 355, 1845-1847.
Galvin, R. (2006). Pay-for-performance: Too much of a good thing? A conversation with Martin Roland.
Health Affairs—Web Exclusive, 25, w412-w419.
Galvin, R., & Milstein, A. (2002). Large employers’ new strategies in health care. New England Journal
of Medicine, 347, 939-942.
Gene-Badia, J., Escaramis-Babiano, G., Sans-Corrales, M., Sampietro-Colom, L., Aguado-Menguy, F.,
Cabezas-Peña, C., et al. (2007). Impact of economic incentives on quality of professional life and on
end-user satisfaction in primary care. Health Policy, 80, 2-10.
Glickman, S. W., Ou, F.-S., DeLong, E. R., Roe, M. T., Lytle, B. L., Mulgund, J., et al. (2007). Pay for
performance, quality of care, and outcomes in acute myocardial infarction. Journal of the American
Medical Association, 297, 2373-2380.
Greene, R. A., Beckman, H., Chamberlain, J., Partridge, G., Miller, M., Burden, D., et al. (2004).
Increasing adherence to a community-based guideline for acute sinusitis through education, physician
profiling and financial incentives. American Journal of Managed Care, 10, 670-678.
Grossbart, S. R. (2006). What’s the return? Assessing the effect of “pay-for-performance” initiatives on
the quality of care delivery. Medical Care Research and Review, 63, 29S-48S.
Hillman, A. L., Ripley, K., Goldfarb, N., Nuamah, I., Weiner, J., & Lusk, E. (1998). Physician financial
incentives and feedback: Failure to increase cancer screening in Medicaid managed care. American
Journal of Public Health, 88, 1699-1701.
Hillman, A. L., Ripley, K., Goldfarb, N., Weiner, J., Nuamah, I., & Lusk, E. (1999). The use of physician
financial incentives and feedback to improve pediatric preventive care in Medicaid managed care.
Pediatrics, 104, 931-935.
Hofer, T. P., Hayward, R. A., Greenfield, S., Wagner, E. H., Kaplan, S. H., & Manning, W. G. (1999).
The unreliability of individual physician “report cards” for assessing the costs and quality of care of
a chronic disease. Journal of the American Medical Association, 281, 2098-2105.
Institute of Medicine. (2006). Pathways to quality health care. Rewarding provider performance. Aligning
incentives in Medicare. Washington, DC: National Academies Press.
Kahn, C. N., III, Ault, T., Isenstein, H., Potetz, L., & Van Gelder, S. (2006). Snapshot of hospital quality
reporting and pay-for-performance under Medicare. Health Affairs, 25, 148-162.
Keating, N. L., Landon, B. E., Ayanian, J. Z., Borbas, C., & Guadagnoli, E. (2004). Practice, clinical man-
agement, and financial arrangements of practicing generalists: Are they associated with satisfaction?
Journal of General Internal Medicine, 19, 410-418.
Langham, S., Gillam, S., & Thorogood, M. (1995). The carrot, the stick and the general practitioner: How
have changes in financial incentives affected health promotion activity in general practice? British
Journal of General Practice, 45, 665-668.
Larsen, D. L., Cannon, W., & Towner, S. (2003). Longitudinal assessment of a diabetes care management
system in an integrated health network. Journal of Managed Care Pharmacy, 9, 552-558.
Levin-Scherz, J., DeVita, N., & Timbie, J. (2006). Impact of pay-for-performance contracts and network
registry on diabetes and asthma HEDIS measures in an integrated delivery network. Medical Care
Research and Review, 63, 14S-28S.
Lindenauer, P. K., Remus, D., Roman, S., Rothberg, M. B., Benjamin, E. M., Ma, A., et al. (2007). Public
reporting and pay for performance in hospital quality improvement. New England Journal of
Medicine, 356, 486-496.
McDonald, R., Harrison, S., Checkland, K., Campbell, S. M., & Roland, M. (2007). Impact of financial
incentives on clinical autonomy and internal motivation in primary care: Ethnographic study. British
Medical Journal, 334, 1357-1362.
McElduff, P., Lyratzopoulos, G., Edwards, R., Heller, R. F., Shekelle, P., & Roland, M. (2004). Will
changes in primary care improve health outcomes? Modeling the impact of financial incentives intro-
duced to improve quality of care in the UK. Quality and Safety in Health Care, 13, 191-197.
McGlynn, E. A., Asch, S. M., Adams, J., Keesey, J., Hicks, J., DeCristofaro, A., et al. (2003). The qual-
ity of health care delivered to adults in the United States. New England Journal of Medicine, 348,
2635-2645.
Morrow, R. W., Gooding, A. D., & Clark, C. (1995). Improving physicians’ preventive health care behav-
ior through peer review and financial incentives. Archives of Family Medicine, 4, 165-169.
Nahra, T. A., Reiter, K. L., Hirth, R. A., Shermer, J. E., & Wheeler, J. R. C. (2006). Cost-effectiveness of
hospital pay-for-performance incentives. Medical Care Research and Review, 63, 49S-72S.
Palsbo, S. E., Miller, V. P., Pan, L., Bergsten, C., Hodges, D. N., & Barnes, C. (1993). HMO industry pro-
file 1993 edition. Washington, DC: Group Health Association of America.
Pawson, R. (2003). Nothing as practical as a good theory. Evaluation, 9, 471-490.
Pawson, R., Greenhalgh, T., Harvey, G., & Walshe, K. (2005). Realist review—A new method of sys-
tematic review designed for complex policy interventions. Journal of Health Services Research and
Policy, 10, S1:21-S1:34.
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London: Sage.
Petersen, L. A., Woodard, L. D., Urech, T., Daw, C., & Sookanan, S. (2006). Does pay-for-performance
improve the quality of health care? Annals of Internal Medicine, 145, 265-272.
Pink, G. H., Brown, A. D., Studer, M. L., Reiter, K. L., & Leatt, P. (2006). Pay-for-performance in pub-
licly financed healthcare: Some international experience and considerations for Canada. Healthcare
Papers, 6, 8-26.
Reschovsky, J. D., & Hadley, J. (2007). Physician financial incentives: Use of quality incentives inches
up, and productivity still dominates (Issue Brief No. 108). Washington, DC: Center for
Studying Health System Change. Retrieved January 5, 2007, from http://www.hschange.org/
CONTENT/905/?PRINT=1
Robinson, J. C. (2001). Theory and practice in the design of physician payment incentives. Milbank
Quarterly, 79, 149-177.
Roland, M. (2004). Linking physicians’ pay to the quality of care—A major experiment in the United
Kingdom. New England Journal of Medicine, 351, 1448-1454.
Rosenthal, M. B., Fernandopulle, R., Song, H. R., & Landon, B. (2004). Paying for quality: Providers’
incentives for quality improvement. Health Affairs, 23, 127-141.
Rosenthal, M. B., & Frank, R. G. (2006). What is the empirical basis for paying for quality in health care?
Medical Care Research and Review, 63, 135-157.
Rosenthal, M. B., Frank, R. G., Li, Z., & Epstein, A. M. (2005). Early experience with pay-for-perfor-
mance: From concept to practice. Journal of the American Medical Association, 294, 1788-1793.
Rosenthal, M. B., Landon, B. E., Normand, S.-L. T., Frank, R. G., & Epstein, A. M. (2006). Pay for per-
formance in commercial HMOs. New England Journal of Medicine, 355, 1895-1902.
Scott, A., & Hall, J. (1995). Evaluating the effects of GP remuneration: Problems and prospects. Health
Policy, 21, 183-195.
Simpson, C. R., Hannaford, P. C., Lefevre, K., & Williams, D. (2006). Effect of the UK incentive-based contract on the management of patients with stroke in primary care. Stroke, 37, 2354-2360.
Smith, P., & York, N. (2004). Quality incentives: The case of UK general practitioners—An ambitious UK
quality improvement initiative offers the potential for enormous gains in the quality of primary health
care. Health Affairs, 23, 112-118.
Srirangalingam, U., Sahathevan, S. K., Lasker, S. S., & Chowdhury, T. A. (2006). Changing pattern of
referral to a diabetes clinic following implementation of the new UK GP contract. British Journal of
General Practice, 56, 624-626.
Sutton, M., & McLean, G. (2006). Determinants of primary medical care quality measured under the new
UK contract: Cross sectional study. British Medical Journal, 332, 389-390.
Town, R., Kane, R., Johnson, P., & Butler, M. (2005). Economic incentives and physicians’ delivery of
preventive care. A systematic review. American Journal of Preventive Medicine, 28, 234-240.
Town, R., Wholey, D. R., Kralewski, J., & Dowd, B. (2004). Assessing the influence of incentives on
physicians and medical groups. Medical Care Research and Review, 61, 80S-118S.
Young, G. J., & Conrad, D. A. (2007). Practical issues in the design and implementation of pay-for-
quality programs. Journal of Healthcare Management, 52, 10-19.
Young, G. J., Meterko, M., White, B., Bokhour, B. G., Sautter, K. M., Berlowitz, D., et al. (2007).
Physician attitudes toward pay-for-quality programs: Perspectives from the front line. Medical Care
Research and Review, 64, 331-343.
