Reviews
Evaluation
Copyright © 1998 SAGE Publications (London, Thousand Oaks and New Delhi)
[1356–3890 (199804) 4:2; 234–246; 005133]
Vol 4(2): 234–246

The Reviews section features in occasional issues of the journal. We take a pluralistic and transdisciplinary approach, including within the coverage not only books but also 'grey' material of contemporary relevance to the diverse evaluation community. 'Classic' texts may also be revisited for their contemporary significance.
We encourage readers to enter into dialogue and exchange—whether
through providing feedback on the coverage of evaluation texts and the
extent to which the selection caters for the diverse interests of different
evaluation communities, or by suggesting texts for review. We hope that
review articles will provoke reflection and debate among the readership,
which will be given space in the journal.

Reviews Editor

Utilization-Focused Evaluation: The New Century Text, 3rd edn, Michael Quinn Patton. London: Sage, 1997. £25.00. ISBN: 0–8039–5265–1.

This is the third edition of Michael Quinn Patton’s Utilization-Focused Evaluation.


However, potential readers familiar with other editions (the first was published in 1978
based on research conducted in 1975) should be in no doubt that the third edition is
worth reading in its entirety, or, at the very least, worth an intellectual ‘graze’. For
those unfamiliar with the other editions, I will review the book as if coming to Patton’s
work for the first time. I will mention aspects of the general tone of the book and go on
to outline its structure and underlying theme to give a flavour of the way the book
works and what it can offer the reader. The book glows with Patton’s evaluation
experience. It is theoretical, evocative and persuasive, yet able to furnish an evaluator
with valuable tips for Monday morning.
The book is generous to both experienced and novice evaluators alike, but in
strikingly different ways. The book allows an insight into the approaches, methods and
considerations that have characterized Patton’s work over the last 25 years. Through a
range of styles and inclusions, Patton offers advice on and embodiments of evaluation
practice across a wide spectrum of experiences in both public and private domains.
These include resources from evaluation projects in education, health,
criminal justice, agriculture, energy conservation, community development, corporate
planning, human services, poverty programmes, leadership development, wilderness experience, housing, staff training and mental health. The book is generous to the
established evaluation community in the way it presents and uses the work of
evaluators with quite different perspectives and operational criteria. It does not score
points or resort to caricature in order to persuade the reader of the strength of its
position over others. In places it leaves a debate in the air in order to allow the reader to
arrive at a conclusion on the relative persuasiveness of the arguments. Finally, while
the book is replete with the authority of Patton’s experience, the reader does not get an
impression of self-importance or pomposity. In many ways, Patton’s style is self-
effacing and down-to-earth. The book does not intimidate but encourages.

Tone and Style


The process of engagement with the text of this book is a ‘readerly’ experience. It is
written in a variety of styles. Each form of writing fulfils a different function in the text.
Written in first person, almost conversational style, the discursive core of the book
provides a framework in which literature is synthesized, experience is outlined and
main arguments unfold. At the outset of each section, Patton includes a homily,
anecdote or parable, sometimes from the mysterious Halcolm, which sets the scene in
non-technical metaphoric language. Within the text, Patton uses two communicative
devices to aid understanding and to contribute to the professional development of
evaluation. The first is the Exhibit. The fifty or so Exhibits are ‘evaluation goods’
which bring the text alive by embodying an idea, a schema, some principles, a
newspaper report, a short self-contained description, some data, a table or a figure. It
helps the reader situate a point in real time and place. It also helps the reader to keep an
eye on Patton’s ball, which is always how to enrich and enhance the use of evaluation
products. An example will illustrate this situating process. In a subsection titled
‘Beyond Technical Expertise’, Patton argues that decisions about methods are never
purely technical. They have a technical dimension but the ‘sociology of method’
intervenes with issues of constrained resources, conflicting priorities and value
preferences. His point is that the only reason to debunk that particular myth is to
enhance use. We can argue with the view that this is the ‘only reason’ but he notes that
use is:

. . . enhanced when practitioners, decision makers, and other users fully understand the
strengths and weaknesses of evaluation data, and that such understanding is increased by
being involved in making methods decisions. (p. 243)

This argument is supported by Exhibit 11.1, titled ‘Reasons Primary Users Should be
Involved in Methods Decisions’. Eight reasons are outlined which capture the
argument, and which could be used as a summary and for evaluation training
purposes.
The second device is the Menu, which is described as a tool for working with
stakeholders in selecting evaluation decision options. The audience for the menu is
ultimately the evaluation user but it is provided in the text as a contribution to the
evaluating armoury of the reader. Essentially, menus offer the reader the range of options in a domain of evaluation practice. For example Menu 13.1 provides a list of
ten points of comparison that the outcomes of a programme might make for evaluation
purposes, the first three of which are reproduced below:

(1) The outcomes of selected 'similar' programmes.
(2) The outcomes of the same programme the previous year (or any other trend period).
(3) The outcomes of a representative or random sample of programmes in the field (p. 314).

The Menus provide the reader with useful starting points for professional decision
making. Alongside these communicative devices are visual jokes and cartoons, each of
which makes a point about evaluation use. The different forms of writing make
Utilization-Focused Evaluation an unusually communication-rich text.

Structure and Central Theme


The book is structured in four parts; each part has five chapters, and each chapter has up to twelve subsections. The preface of the book sets out Patton's stall. Patton
mentions (p. xiv) other editions of the book and uses the metaphor of a youngster
maturing to adulthood. The first book was hectoring and slightly shrill in tone because,
as he puts it, ‘nobody seemed to be paying attention’. The second is written more
confidently but is still alternately 'brash and shy, assertive and uncertain, like an
adolescent coming of age’. This third edition attempts the ‘more mature tone of the
elder’.
Patton suggests that while the evaluation profession might be characterized by its
rapidity of growth and a ‘clear track record of contributions to show’, the central
challenge to professional evaluation practice remains ‘doing evaluations that are useful
and actually used’. This is the core message of the book, it infuses both content and
style (as I suggest above). However, it does not dodge the central theoretical and
practical problems associated with defining the balance between producing useful
evaluations in order to have an ‘effect’ and sustaining what constitutes a professionally
appropriate relationship with key stakeholders during the process of evaluation design,
evidence gathering, analysis, reporting and use. If an evaluator adopts the view, as
Patton does, that evaluations ought to be useful and that much of what has passed for
programme evaluation has not been very useful, what are the repercussions for the
evaluation community? It is to this question the book turns in its final pages. However,
I jump ahead.
Part One, titled ‘Toward More Useful Evaluations’, contextualizes the thrust of the
book by reference to some of the key transformative processes in recent years. These
processes are from both within and without the professional evaluation domain. He
charts the promise and the disappointment of evaluation as calls for system and
programme accountability become endemic. This crisis of ‘given’ authority which
gives evaluation its mandate is accompanied by the professionalization of evaluation
through the development of 'professional' values, standards and codes of ethics which provide an occupational boundary for its practice. To some extent, this professional
self-consciousness is more advanced in the US and Canada than in other countries.
However, with the establishment of Evaluation Societies worldwide and the global rise
in public accountability processes, the professionalization to which Patton refers is
becoming part of the landscape.
The scene set, action moves quickly through a plot which is shaped by some clear
operational parameters. To some extent Part One and Part Four titled ‘Realities and
Practicalities of Utilization-Focused Evaluation’ both deal with the ethical, political
and strategic contexts for the realization of this approach. Patton concludes the book
with 14 fundamental premises which underscore utilization-focused evaluation.
I will offer a flavour of the feast by referring to an issue which emerges at the
beginning of the book, hovers in the background in the middle two parts, then comes
crashing back into the foreground in the last part. In Patton’s vision, utilization-focused
evaluation is the supreme ‘underlabourer’. It does not imply or advocate any evaluation
model, method, theory or use. Its shaping rationale is provided by its relationship with
‘primary intended users’. Its mission is to help users to select an evaluation most
appropriate for their particular situation. It can be moulded by any purpose, carried out
by any agency, conducted at any time, targeted at any focus, involve the collection of
any kind of evidence and under the influence of any design theory. What is not an
option is that it is not utilized. These precepts are captured in the following extract:
Utilization-focused evaluation is a process for making decisions about these issues [cf. those
mentioned above] in collaboration with an identified group of primary users focusing on their
intended uses of evaluation. (p. 22)

At first sight being a ‘supreme underlabourer’ presents a serious ethical problem for a
utilization-focused evaluator. As Patton himself acknowledges, this issue needs to be
given ‘serious consideration’. Ernest House (quoted on p. 365) accuses utilization-
focused evaluation as having an unacceptable ethic if taken to its logical conclusion. He
has called this position ‘clientism’. House demands ‘What if the client is Adolph
Eichman, and he wants the evaluator to increase the efficiency of his concentration
camps?’. How does Patton respond to this predicament? House argues that he must
have other criteria at work than the intended uses of primary users. Patton does,
although they remain rather shadowy and ill-defined. His response connects with an
important dimension of his approach to the notion of audience. Critical of an abstract or
vague interpretation of audience, Patton emphasizes what might be called the sociology
of evaluation use. His use of the concept of audience is highly situated with particular
people at particular times. To use yet another metaphor, use is ‘nested’ in the concerns
of people with specific interests. It is the quality of the work with these people ‘which is
the key to specifying and achieving intended uses’ (p. 43). Patton goes further by
suggesting in Menu 3.1 (p. 54) that evaluators should find and cultivate people who
want to learn. Given the centrality of what Patton calls the ‘personal factor’, he
acknowledges the importance of project and stakeholder selection, presumably by the
evaluator. Further, he acknowledges the evaluator as a moral and ethical stakeholder
who brings professional standards and ‘his or her own sense of morality and integrity’
(p. 364) to the negotiation with primary users. In a transcript of a debate with colleagues on these issues, Patton includes an exchange with Ross Connor, a former
president of the American Evaluation Association, which problematizes this issue:
'You can pick and choose clients, right?' I affirmed that I did. 'My concern' he replied, 'would be those who don't have the luxury of picking and choosing who they work with'. (p. 364)

Patton, then, makes his ethical and moral stand at the point at which he is deciding
whether or not to work with a client. Once he makes that decision, although he does not
state explicitly on what basis that decision is made, the primacy of the user comes into
play. Is it uncharitable to suggest that by the time these various selections and
negotiations have been completed, work with the primary users and their inclinations
are shaped by ethical and moral considerations which are not derived from them but
from the evaluator? In the world I inhabit, the luxury of client selection has not quite
arrived. However, Patton’s position is operationalizable, perhaps in muted form, by all
of us in our selection of bids to make, invitations to tender we choose to follow-up and
estimations of the extent to which we can negotiate the evaluation brief with a user
such that they can get the most out of its use and we can exercise ‘professional
standards’ whatever they may be. It is typical of the generosity of the book that the
resources for this debate are present for readers to generate their own perspectives on
this dimension of utilization-focused evaluation. However, this ethical dimension is not
the real ‘Achilles’ heel’ of utilization-focused evaluation. Much more of a problem,
Patton argues, is the turnover of primary intended users. The ‘personal factor’ is
important for utilization-focused evaluation, in which the relationship between the primary users and the evaluators is nurtured by 'getting them to commit time and
attention . . . dealing with the political dynamics, building credibility and conducting
the evaluation in an ethical manner’. When there is primary user turnover, all these
negotiations might need to be recast involving adjustments to timing, reporting and use.
Before leaving this issue of ‘audience’, I should mention a further worry with the
highly situated and narrow definition of intended user and intended use. What does it
say about the ‘responsibility’ of the evaluator to a wider constituency, the public, for
example? It seems entirely reasonable that the evaluation of a programme in the public
domain, funded by public money, should have a sense of evaluation use which is
beyond the immediate and specific users of the evaluation and connects to the
responsibility of an evaluation to provide a public account of the effects of using public
money in a particular way. I suppose it might be argued that such issues are established
before the evaluation begins at the ‘client selection’ stage or that the ‘public’ is
represented by proxy in such cases by public officials. However, the thrust and vigour
of Patton’s position leads me to believe that, in reality, such distinctions will be lost.
The central two parts of the book provide frameworks, examples and operating
principles for the evaluation process. I will end this review by giving some tasters from
the extensive list of ingredients. Part Two, titled ‘Focusing Evaluations: Choices,
Options and Decisions’, covers questions associated with managing the evaluation
process. It ranges from issues associated with the way in which evaluations might be
oriented in the context of shifting situational factors, through considerations of the way
programme goals might shape evaluation processes, theories of action are incorporated into evaluation assumptions and programme implementation might be charted to how
evaluation evidence might be connected to causal explanation.
Part Three is titled ‘Appropriate Methods’ and Patton is unequivocal about his
guiding principle. It is not derived from some methodological ideology or personal
inclination or an emotional/intellectual comfort zone masquerading as a philosophy of
social science. It is encapsulated in the following extract:
In utilization-focused evaluation, methods decisions, like decisions about focus and priority issues, are guided and informed by our evaluation goal: intended use by intended users. (p. 241)
As I mention above, one of the great services this book offers is the demythologizing of
the technical dimension of method by the inclusion of users in decisions about the way
method is linked to evaluation outcomes and thus use. Patton considers it axiomatic
that he should work with primary stakeholders to consider the strengths and weaknesses of major design and measurement possibilities. This frame enables a sanguine
and optimistic stance on the so-called paradigm debate or paradigm wars which have
entered evaluation folklore. He offers a 10-part analysis of the reasons for the
‘withering’ of this debate. I will identify two developments from the list which
appealed to me and give readers a feel for the judicious tone of the discussion:
Evaluation has emerged as a genuinely interdisciplinary and multi method field of professional practice.
Advances in methodological sophistication and diversity within both paradigms [sic quantitative/experimental methods and qualitative/naturalistic methods] have strengthened diverse applications to evaluation problems. (p. 293)
Rather than sterile debates concerning methods, as if it were possible to discuss them in the abstract, Patton situates the question of 'What method?' in the context of 'Which
questions leading to what data leading to what uses by whom?’.
Utilization-Focused Evaluation is a big book, metaphorically and literally. Its range
is encyclopaedic but its conversations with the reader can be quite intimate. A reader
can dip into its pages to get a starting point on just about any issue, debate, example,
tool, approach or concept in the programme evaluation domain. It is an essential read
for anyone remotely interested in furthering their understanding of evaluation.
Murray Saunders
Centre for the Study of Education and Training
Lancaster University, UK

Statistics as Principled Argument, R. P. Abelson. Hillsdale, NJ: Lawrence Erlbaum Associates, 1995. $49.95 (hbk). $24.95 (pbk). ISBN: 0–8058–0528–1.

It’s a shame the title begins with ‘statistics’. First, statistics is viewed by many, perhaps
not completely without reason, as a partner of the devil. It is hardly surprising, therefore, that many otherwise intrepid researchers feel like crossing to the other side of
the street when such an ominous figure approaches. But in the present case, such a
response may not be necessary, or even advisable. For example, once past the title
page, a reader will discover that the book contains nary a formula of any kind, much
less any formulas of the grotesque, operating room, close-your-eyes variety. Second,
the majority of the volume’s content is almost as relevant to qualitative as to
quantitative data analysis and, in any case, is highly relevant to far broader topics than
statistics, such as program evaluation, science and critical thinking. Although the
volume teaches much about statistical practice, the volume’s primary goal is not so
much to teach statistical computations as to teach how to devise persuasive empirical
arguments, regardless of the number of numbers they contain.
The book’s main theses are that the purpose of empirical research is to create
principled arguments that are persuasive and that persuasiveness depends on five factors
labeled with the acronym MAGIC, which stands for magnitude, articulation, generality,
interestingness and credibility. The first of these five factors, magnitude, plays an
important role in principled argument because results (especially of statistical tests) are
likely to be misleading without information about effect size. Although several candidate measures of magnitude are available, Abelson forwards a novel measure he labels
‘causal efficacy’, which is equal to the size of an effect divided by the size of the cause.
Unlike traditional measures of magnitude, causal efficacy is directly related to the
‘rhetorical impact of a research result’ (p. 48). A large effect that results from a small
cause has the largest causal efficacy and therefore the greatest rhetorical impact, while a
small effect that results from a large input has the smallest causal efficacy and therefore
the least rhetorical impact. For example, the result found by Isen and Levin (1972) that
students given a cookie will volunteer an average of 69 minutes of their time compared
with an average of only 17 minutes volunteered by students not given a cookie has a
relatively high causal efficacy because the input is arguably smaller than the output (or as
Abelson, pp. 47–8, puts it ‘Working 52 extra minutes per cookie does not translate into
a living wage!’). On the other hand, the result that saccharin causes cancer in rats loses
much of its impact once we learn that, if humans were to consume as much saccharin as
was consumed in the experimental manipulation in the study, they would have to drink
800 cans of soda a day. While cancer is certainly a large and important outcome, 800
cans of soda is also a very large input.
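Abelson's 'causal efficacy' ratio lends itself to a one-line computation. The sketch below (Python, purely for illustration) plugs in the cookie figures quoted above; the function name and the choice of 'one cookie' as the unit of cause are this review's assumptions rather than Abelson's notation, and the ratio is a rhetorical rather than a standardized measure.

```python
# Illustrative only: Abelson's 'causal efficacy' is the size of an effect divided
# by the size of its cause. Function name and units are assumptions for this sketch.

def causal_efficacy(effect_size: float, cause_size: float) -> float:
    """Ratio of the size of an effect to the size of its cause."""
    return effect_size / cause_size

# Cookie study (figures quoted in the review): 69 vs 17 minutes volunteered,
# i.e. 52 extra minutes produced by a single cookie -- a tiny cause, so the
# ratio, and hence the rhetorical impact, is large.
print(causal_efficacy(69 - 17, 1))  # 52.0 extra minutes per cookie

# The saccharin result points the other way: a serious outcome, but the cause is
# the equivalent of 800 cans of soda a day, so the effect per unit of cause --
# and the rhetorical impact -- is small.
```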
The second of the five factors is articulation, which Abelson explicates by
distinguishing among ‘ticks’, ‘buts’ and ‘blobs’. In essence, a ‘tick’ (derived from tick
marks and articulation) is a ‘specific comparative difference’ (p. 105) such as a main
effect; a ‘but’ is a result, such as an interaction, that qualifies or delimits the generality
of a tick; and a ‘blob’ is a ‘cluster of undifferentiated research results’ (p. 105) such as
a statistically significant omnibus ANOVA which doesn’t reveal where differences
among means lie. To maximize clarity and parsimony, Abelson explains how to
describe sets of results in order to emphasize ticks, reduce buts and eliminate blobs. For
example, Abelson demonstrates how interactions can sometimes be turned into main
effects by either rearranging a table of results or by altering the types of comparisons
that are drawn. Based on the ease of the articulation of the results, Abelson also wisely
argues in favor of analyzing multiple dependent measures separately before, if ever, analyzing them together (such as with a multivariate procedure like MANOVA). Break
data into simple components and understand these components well, before aggregating the pieces and trying to understand the mass as a whole.
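The idea of turning a 'but' into a 'tick' by altering the comparisons drawn can be shown with a toy table. The numbers below are invented for illustration and are not Abelson's own example: two groups each rate two options, the raw means show a crossover interaction, and re-expressing each cell as 'own option minus other option' leaves a single uniform main effect.

```python
import numpy as np

# Invented 2x2 table of mean ratings (rows = respondent group, columns = option rated).
means = np.array([[7.0, 3.0],   # group A favours option A
                  [3.0, 7.0]])  # group B shows the mirror image: an interaction (a 'but')

# Re-express each group's result as 'rating of own option minus the other option'.
own_minus_other = np.array([means[0, 0] - means[0, 1],
                            means[1, 1] - means[1, 0]])
print(own_minus_other)  # [4. 4.] -> one uniform difference, i.e. a single 'tick'
```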
The third factor, generality, refers to the range of types of respondents, contexts and
treatments across which results hold. Assessing generality requires incorporating (or
uncovering) variations in respondents, contexts and treatments, and looking for
interactions in effects across these variations. In assessing interactions, one analytic
approach is to assume that treatment or context variations are ‘fixed’. The drawback to
this assumption is that the results can thereby be legitimately generalized only to the
specific treatment and context variations in the study (which is often far narrower than
desired). If the researcher wants to generalize to a broader population of treatment or
context variations, the analyst must assume that these variations are ‘random’. The
drawback here is that an analysis assuming random variations has far less power and
precision than an analysis with the fixed-effects assumption. Abelson discusses ways to
minimize this dilemma and demonstrates how the discovery of interactions can
sometimes lead to increased, rather than diminished, generality. An example comes
from research on the ‘risky-shift’ phenomenon wherein it was hypothesized that,
compared with individuals, groups prefer riskier actions. Subsequent inspection of the
data from which this phenomenon was initially inferred revealed that a risky shift
occurred in only 10 of the original 12 scenarios of risky choices, which reduced the
generality of the result. However, further exploration of this interaction led to the
discovery that, although not all of the 12 cases revealed a shift toward greater risk, they
could all be characterized as revealing a shift toward the most socially desirable action.
This discovery suggested the subsequent hypothesis that, compared with individuals,
groups shift preferences toward actions that are most socially favored and led to
research showing that such shifts occur in many other group decisions besides those
involving risky courses of action. Thus, what started out as a general result (the risky-
shift hypothesis) was subsequently shown to have less generality (when interactions
were discovered), but ultimately resulted in a theory of even greater generality.
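A rough sense of the fixed-versus-random trade-off can be had from a small simulation. The sketch below is not from Abelson; it assumes simulated data, the statsmodels library and hypothetical variable names. The fixed-effects fit conditions on the particular contexts in the data, while the mixed model treats contexts, and the treatment effect within them, as draws from a wider population, which is what licenses broader generalization, typically at the cost of a larger standard error for the treatment effect.

```python
# Hypothetical sketch: fixed- vs random-effects views of a treatment effect that
# varies across contexts. Assumes numpy, pandas and statsmodels are available.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for context in range(10):                 # ten contexts (sites, settings, ...)
    baseline = rng.normal(10, 2)          # each context has its own baseline
    effect = rng.normal(1.5, 1.0)         # ... and its own treatment effect
    for treated in (0, 1):
        for y in baseline + effect * treated + rng.normal(0, 3, size=20):
            rows.append({"context": context, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Fixed-effects view: inference applies only to these ten contexts.
fixed = smf.ols("y ~ treated + C(context)", data=df).fit()

# Random-effects view: contexts (and their treatment effects) are a sample from a
# wider population, so the estimate generalizes more broadly but is less precise.
mixed = smf.mixedlm("y ~ treated", data=df, groups=df["context"],
                    re_formula="~treated").fit()

print("fixed :", fixed.params["treated"], fixed.bse["treated"])
print("random:", mixed.params["treated"], mixed.bse["treated"])
```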
Interestingness, the fourth factor that influences persuasiveness, is a function of both
surprise and importance. To be interesting, a new result must have the ability to change
one’s beliefs which, in turn, requires that the new result must differ from old beliefs. In
other words, the new result must be surprising. In addition, for a result to be interesting,
a potential change in beliefs must also have substantial consequences on other issues,
which means the new result must also be important. Interestingness decreases as the
number of replications of a study increases, because surprise decreases with each
replication, except to the extent that more interesting ‘buts’ (e.g. interactions) are added
and subsequently integrated. Therefore, interest in a research program is maintained
when results first produce a thesis, then an antithesis, and finally a synthesis. The risky-
shift example presented above follows this pattern. Another example comes from
Tesser (1990) in which the original thesis was the discovery that we are jealous of good
performances by our friends. The subsequent antithesis was the result that we can also
bask in the glory of our friend’s accomplishments. Finally, the synthesis was that we
are jealous when our self-esteem expects us to perform as well as our friends, but we
feel good for our friends if our performance would be irrelevant to our self-esteem.

The last of the five factors is credibility, which is largely a function of the
(presumed) presence of methodological artifact. For those that can’t be prevented,
methodological artifacts can be ruled out by putting into competition the implications
of the artifact and the implications of the desired explanation (see also Reichardt and
Mark, 1997). For example, a study by Lord et al. (1984) ruled out demand
characteristics as an alternative explanation for attitude change by including an extra
comparison condition in which the experimental treatment was absent but demand
characteristics were strongly induced, and subsequently showing that this comparison
condition exhibited far less attitude change than the experimental condition. In the
absence of a well articulated alternative hypothesis, the credibility of an argument can
also be bolstered by the ‘method of signatures’ wherein ‘the specification of a
recognizable signature enhances the credibility of claims that particular underlying
processes are operative, much as would a coroner’s report (Scriven, 1974) of seven
different signs of a heart attack (and no signs of any other cause of death)’ (p. 184) (see
also Cochran, 1965; Cook and Campbell, 1979; Rosenbaum, 1995; Scriven, 1976; and
Webb et al., 1966). For example, the claim by Phillips (1977) that widely publicized
suicides increase the number of subsequent suicides via car crashes was strengthened
by showing, among other things, that more heavily publicized suicides were followed
by a greater number of fatal car crashes; and that car crashes involving a lone driver
increased, but crashes with passengers did not increase.
Both the teaching and the use of empirical research methods are often overly
mechanical and rule-bound. As a result, common sense and informed judgment are all
too often missing from data analysis and interpretation. Certainly, it is important to
understand well how to perform computations and implement research techniques, but
it is also important to have a thorough appreciation of an overarching logic of the
intelligent generation and interpretation of empirical evidence. Based on his discerning
analysis of such an overarching logic, Abelson suggests that a proficient empirical
researcher ‘must combine the skills of an honest lawyer, a good detective, and a good
storyteller’ (p. 16). Abelson’s volume is filled with much good insight and advice that
will help researchers hone these skills.

References
Cochran, W. G. (1965) ‘The Planning of Observational Studies of Human Populations (with
Discussion)’, Journal of the Royal Statistical Society, Series A 128: 234–66.
Cook, T. D. and D. T. Campbell (1979) Quasi-Experimentation: Design and Analysis Issues for
Field Settings. Chicago, IL: Rand McNally.
Isen, A. M. and P. F. Levin (1972) ‘The Effect of Feeling Good on Helping: Cookies and
Kindness’, Journal of Personality and Social Psychology 21: 384–8.
Lord, C. G., M. Lepper and E. Preston (1984) ‘Considering the Opposite: A Corrective Strategy
for Social Judgement’, Journal of Personality and Social Psychology 47: 1231–43.
Phillips, D. P. (1977) ‘Motor Vehicle Fatalities Increase Just After Publicized Suicide Stories’,
Science 196: 1464–5.
Reichardt, C. S. and M. M. Mark (1997) ‘Quasi-Experimentation’, in L. Bickman and D. J. Rog
(eds) The Handbook of Applied Social Research Methods. Thousand Oaks, CA: Sage.
Rosenbaum, P. R. (1995) Observational Studies. New York: Springer-Verlag.
Scriven, M. (1974) 'Evaluation Perspectives and Procedures', in W. J. Popham (ed.) Evaluation
in Education: Current Applications, pp. 68–84. Berkeley, CA: McCutchan.
Scriven, M. (1976) ‘Maximizing the Power of Causal Investigations: The Modus Operandi
Method’, in G. V. Glass (ed.) Evaluation Studies Review Annual. Thousand Oaks, CA: Sage.
Tesser, A. (1990) ‘Interesting Models in Social Psychology: A Personal View’, invited address
presented at the meeting of the American Psychological Association (August), Boston.
Webb, E. J., D. T. Campbell, R. D. Schwartz and L. Sechrest (1966) Unobtrusive Measures:
Nonreactive Research in the Social Sciences. Chicago, IL: Rand McNally.

Charles S. Reichardt
Jenny A. Novotny
University of Denver, CO, USA

Realistic Evaluation, Ray Pawson and Nick Tilley. London: Sage, 1997. 235 pp. £45.00 (hbk). £14.99 (pbk). ISBN 0–7619–5008–7 (hbk); 0–7619–5009–5 (pbk).

If anyone dares to elaborate a Canon of evaluation books, such as the one produced by H. Bloom for literary works, Ray Pawson and Nick Tilley's Realistic Evaluation will most likely be one of the books included in it. This is an original, ambitious,
provocative, witty and useful book which proposes a new and practical approach to
evaluation, providing also its epistemological foundations and a historical perspective
on evaluation research and practice. All this in less than 250 pages! The book is
described by its authors as ‘a stock-taking exercise and a manifesto concerning
evaluation’s quest for scientific status’ (p. xi) and it consists of an introduction and nine
chapters, the first two devoted to a critique and the rest to a detailed presentation of
‘realistic evaluation’.
The first chapter presents a brief history of evaluation research. This light history is
the way through which the authors ‘position’ their perspective. In the process, they
provide a useful overview but also some misunderstandings: the point of the method in
Campbell and Stanley is not to exclude ‘every conceivable rival’ causal agent (p. 5) but
those that are plausible, and their ‘whole point’ is not ‘to make the basic causal
inference secure’ (p. 6) but as solid as possible (Campbell, like Popper, was a
fallibilist). Furthermore, the valuable discussion could greatly benefit from Albert
Hirschman’s work on reactionary and progressive rhetoric (Hirschman, 1991).
Chapter 2 is a ‘constructive critique’ of the experimental tradition in evaluation. The
authors attack the ‘successionist’ (or external) approach to causality used by experi-
mental evaluation, contrasting it with the ‘generative’ metatheory of causation. They
develop their critique through a detailed analysis of two relevant examples drawn from
the field of criminology. This distinction between different views of causality
corresponds to a philosophical discussion that was started by Aristotle. Pawson and
Tilley follow Harré's version (although not mentioning the fundamental work of Piaget in this field). What is emphasized is the real connection between causally connected events and the importance of contextual factors.
The authors’ argument is that by its very logic, experimental evaluation (including
quasi-experimental designs) does not (and cannot) take into account in an appropriate
way either the key mechanisms linking programs with outcomes or the richness of
heterogeneous contexts. Given these limitations, experimental evaluation, according to
the authors, yields very little in terms of learning about programs (their interest is much
more directed towards formative rather than summative evaluation). This critique,
which might not be fully applicable to some recent versions of the quasi-experimental
approach (e.g. Mohr, 1996), paves the way for the authors’ ‘particular contribution to
evaluation’, i.e. ‘realistic evaluation’, which is presented in the remaining chapters of
the book.
Chapter 3 introduces what the authors consider as their epistemological foundation:
realism. It is interesting to note that, almost simultaneously with the publication of this
book, one of the few economists who is a well respected specialist in methodological
issues has also published a book on economics and methodology which uses the same
variant of realist philosophy of science, inspired by the works of R. Bhaskar (Lawson,
1997). Economics is a discipline to which practically no reference is made in Pawson
and Tilley’s book, although some elementary cost-benefit analysis and utility maxi-
mization are mentioned (p.122). Ray Rist has suggested that ‘evaluation and economics
will have to find new common ground’ (Rist, 1995: 171). Perhaps this variant of
realism can help to establish a bridge between the two disciplines.
Although the discussion about the realist foundations might appear somewhat
abstract for the reader unfamiliar with this type of literature, the presentation is well
illustrated by the development of an example, and it boils down to a simple and
powerful scheme for the explanation of the way(s) programs work: ‘causal outcomes
follow from mechanisms acting in contexts’ (p. 58). Pawson and Tilley sum up their
argument using the neat formula:
Outcomes = Mechanisms + Context
It might be more appropriate, though perhaps less clear for some readers, to express it
in functional form, with outcomes (O) depending not necessarily in an additive way on
both the mechanisms (M) and the context (C):
O = f(M, C)
One achievement of Pawson and Tilley is that, in this way, they have been able to
provide useful guidance to the process of hypothesis making, to the ‘context of
discovery’, which for Popper and others remained as a given, as the realm of creativity,
corresponding at best to psychology. Their orientation might seem simple but, as
shown in the detailed examples of the book, it leads the evaluator to focus their
attention on the mechanisms, contexts and outcomes, and on their relationships. The
context–mechanisms–outcomes patterns (or ‘CMO configurations’) are the core of the
realistic evaluation approach proposed by Pawson and Tilley, and are considered to be
the type of ‘theory’ that leads realistic evaluation research. Realistic evaluation is
presented as 'theory-driven' rather than 'method-driven'. The usage of the term 'theory' to refer to these 'configurations' might surprise some readers. But, as the
authors point out, ‘theory’ has been used with different meanings (13 meanings are
given on p. 120 and another appears on p. 192, including ‘conceptual framework’,
which might be the one to which the CMO patterns correspond more closely). The
principal task of the realistic evaluator would be to generate hypotheses about potential
CMO configurations (p. 104). In order to do this, Pawson and Tilley recommend
drawing on the folk wisdom of practitioners and the formal knowledge contained in the
academic literature.
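Since CMO configurations are the working 'theory' of a realistic evaluation, it can help to see them as structured records. The sketch below is this review's illustration only: Pawson and Tilley describe CMO configurations conceptually, not as a data format, and the field names and the paraphrased car-park CCTV hypotheses (in the spirit of the mechanisms they list on pp. 78-79) are assumptions.

```python
# Illustration only: one way an evaluator might record CMO configuration hypotheses.
# Field names are not Pawson and Tilley's notation; the CCTV hypotheses are
# paraphrases in the spirit of their example, not quotations.
from dataclasses import dataclass

@dataclass
class CMOConfiguration:
    context: str      # for whom, in what circumstances
    mechanism: str    # what it is about the programme that is expected to work
    outcome: str      # the pattern of results that mechanism should produce

hypotheses = [
    CMOConfiguration(
        context="car parks with little natural surveillance",
        mechanism="offenders perceive a higher risk of being caught on camera",
        outcome="fewer thefts of and from vehicles",
    ),
    CMOConfiguration(
        context="car parks where staff can respond to what the cameras show",
        mechanism="quicker, better-targeted deployment of security staff",
        outcome="more offences interrupted in progress",
    ),
]

for h in hypotheses:
    print(f"In {h.context}: if {h.mechanism}, expect {h.outcome}")
```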
It is interesting to observe that the authors see themselves as ‘whole-heartedly
pluralists when it comes to the choice of method', proposing the use of multi-method
data collection and analysis on Mechanisms, Contexts and Outcomes (p. 85). A
fundamental question in realistic evaluation is ‘what works for whom in what
circumstances’, the three Ws (pp. 125, 210 and 144, where this question is referred to
as ‘the realist mantra’) or ‘what is it about the program that works for whom?’ (p. 109).
This is preferred to questions concerning the program as a whole, such as ‘does it
work?’. This is an important reminder that programs should not be considered as a
‘black box’ (frequently having mixed outcomes rather than uniform results), and that
this diversity is important from a learning perspective. It should also be recognized
that, in some contexts (for example, in the case of programs funded by financial
institutions), it is important to be able to make statements about overall program
performance which would require some sort of quantitative assessment and aggregation of partial results. Furthermore, although the authors point out the importance of
looking at the differences in levels of success achieved within programs for different
subgroups (p. 114), it might also be the case that more than one mechanism is at work
for each subgroup, thus results might be mixed even within the same subgroup. For
example, the case of the CCTV program described on pp. 78–79 includes eight
mechanisms, and therefore there could be subgroups for which some of the eight
mechanisms work and some do not. Thus, there may also be a need for aggregation
even at the subgroup level.
Pawson and Tilley apply their framework to the important issue of creating
cumulative bodies of evaluation research as a way to feed evaluation knowledge into
improvements in policy and practice. ‘Realistic cumulation’ consists of progressively
refining the understanding of the CMO patterns. Cumulation proceeds by abstracting
patterns of contexts, mechanisms and outcomes at different levels, thus developing a
typology of CMO configurations. Cumulation of this kind is presented as ‘theory
development’.
In the context of cumulation, ‘replication’ is discussed through the use of detailed
and interesting examples drawn from criminology. The authors do not elaborate on the
important practical consequences of replication. With the increasing use of pilot
programs in order to replicate and/or ‘scale-up’ what works well, an appropriate
understanding of replication becomes most relevant. The discussion in the book serves
as a warning against the rather widespread naive view of replication. As Pawson and
Tilley write, strict replication (a sort of ‘cloning’) is impossible.
However, after approvingly quoting Popper on ‘approximate repetitions’, they draw
the (unpopperian) conclusion that if one simply relies on observable similarities and differences between projects, decisions about replication, replicability and generalization become no more than a matter of taste (p. 133), whereas they would conjecture
that replications would lead to the same outcomes. Therefore, although Pawson and
Tilley are right in pointing out the limitations and dangers of naive replications, it could
still make sense to carry out ‘quasi-replications’ with an awareness that they might not
work out as expected, so that it is important to continue monitoring and evaluating the
way in which the same mechanisms operate in almost similar contexts.
One of the noteworthy features of this book is that in addition to discussing issues at
the evaluation research frontier, it also contains some practical advice presented in
handbook form. For example, it includes a whole chapter on 'how to construct realistic data' and how to conduct 'the realist(ic) interview'. There is also a chapter on 'a realistic
consultation’, a very pedagogic (and amusing) dialogue between an evaluator and
policy makers on a smoking-cessation program.
After the original ‘consultation’, the relationship of evaluators and policy makers is
presented as a teacher–learner relationship, within a ‘teacher–learner cycle’. In the
opinion of this reviewer, such an approach might be counterproductive and jeopardize
the acceptance of their findings and recommendations. Surely it is better to stress the
‘learning mode’, and the need to establish an appropriate communication process, for
which the evaluator should explore different means to convey the messages, taking into
account the characteristics of the audience(s), policy-makers at different levels and
various professional and personality profiles.
The book concludes with a chapter in which ‘the new rules of realistic evaluation’
are presented as a summary of the argument. Figures and tables are also used in the text
to clarify and summarize the arguments. Although they do not always achieve their
purpose, these visual aids complement the text and illustrate the convenience of using
different methods to ensure better communication.
Without doubt, the CMO framework developed by Pawson and Tilley provides a
simple, yet powerful, scheme to guide evaluators in designing and implementing
evaluations, as well as in storing the knowledge gained from evaluations. While reading the book, one sometimes feels as if one has always been a realistic evaluator
without knowing it. But Realistic Evaluation provides an excellent and enjoyable
opportunity to reflect upon several key evaluation issues and, at the same time, it also
serves as a practical source of evaluation advice.

References
Hirschman, A. (1991) The Rhetoric of Reaction. Cambridge, MA: Harvard University Press.
Lawson, T. (1997) Economics & Reality. London: Routledge.
Mohr, L. (1996) Impact Analysis for Program Evaluation. London: Sage.
Rist, R. (1995) 'Postscript: Development Questions and Evaluation Answers', in R. Picciotto and
R. Rist (eds) Evaluating Country Development Policies and Programs: New Approaches for a
New Agenda, New Directions for Evaluation, no.67. San Francisco, CA: Jossey-Bass.

Osvaldo Néstor Feinstein
UN International Fund for Agricultural Development
Rome, Italy
