
Evaluation and Program Planning 79 (2020) 101787


The whole elephant: Defining evaluation


Amy M. Gullickson
Centre for Program Evaluation, University of Melbourne, Melbourne, Australia

ARTICLE INFO

Keywords: Evaluation; Evaluation education; Evaluation theory; Evaluation practice

ABSTRACT

Definitions help us understand the characteristics of an object or phenomenon and are a necessary precursor to understanding what a good version of it looks like. Evaluation as a field has resisted a common definition (Crane, 1988; Morell & Flaherty, 1978; M. F. Smith, 1999), which has implications for marketing, training, practice, and quality assurance. In this position paper, I describe the benefits and challenges of not having a clear, agreed-upon definition, then propose and explore the implications of two definitions for the evaluation profession based on values and valuation as the core of evaluation practice. The purpose is to describe a possible way forward through definition that would increase our professional profile, power, and contribution to social justice. The paper concludes with implications for evaluator competencies and evaluation education and questions for further research.

Good definitions are at the heart of understanding what something is (and isn’t); how it should look, feel, taste, smell, and operate (or not); what, if anything, it is supposed to produce; and what characteristics make it a good (or not) version of that kind of something (Scriven, 1967). Different perspectives on a given something will often lead to different definitions, as illustrated in the tale of the blind men of Indostan, who disputed furiously about whether the same elephant was a wall, snake, rope, tree, spear or fan because they were each touching a different part (Saxe, 2020). I posit that evaluation, as an emerging profession, is the elephant in a similar situation, with sectors, practitioners, and academics all defining it differently based on their perspectives. For example, economists may see evaluation only as cost-benefit analysis (CBA), health and community sectors as social return on investment (SROI), government policy makers as randomised controlled trials (RCTs) (Gluckman, 2013), Indigenous people as an extension of colonial values, and social scientists as a type of applied research. As with many aspects of evaluation, the answer to “What is the definition of evaluation?” is “It depends.” If evaluation is to emerge as a distinct profession, a more robust and agreed-upon definition is needed; herein, I make a case for why a clear, unique definition is important, offer possible definitions, and discuss their implications.

1. The case for a definition

Scholars researching and reporting on evaluation for more than forty years have agreed that for evaluation to emerge as a profession, it must have a unique body of knowledge and a logical boundary of the problem area it is addressing (Crane, 1988; Morell & Flaherty, 1978; Worthen, 1994). The way the profession defines its work provides the boundaries for it, determining what areas of expertise are necessary for its practitioners, and enabling descriptions of quality performance. Without a definition, or with multiple definitions, those areas are unclear, and the criteria for quality will shift accordingly. As Fournier (1995) expressed it: “[H]ow the phenomenon is defined (that is, socially constructed)… influences the source or locus of the values from which criteria are selected…[which] affects the validity of the conclusions” (p. 22). Take, for example, the different definitions of evaluation and their implications for expertise presented in Table 1.

Since evaluation is a market-based activity (Lemire, Nielsen, & Christie, 2018; Smith, 1998), the profession needs clarity about expertise and quality not only for ourselves, but for our clients and other stakeholders. All parties need a shared understanding of what constitutes evaluation and evaluation quality to know whether they are delivering it/getting it. Understanding quality in evaluation begins at the same place as it does for any other evaluand – with the definition.

To date, evaluation has lacked a common definition that claims a unique domain of expertise, although it has been called for (Smith, 1999). Discussions and publications on the topic provide evidence that the field has resisted efforts in this direction, particularly if summative judgement is included (Migotsky et al., 1997; Morell & Flaherty, 1978; Poth, Lamarche, Yapp, Sulla, & Chisamore, 2014; Smith, 1999; Stake et al., 1997).

E-mail address: amy.gullickson@unimelb.edu.au.

https://doi.org/10.1016/j.evalprogplan.2020.101787
Received 7 August 2019; Accepted 20 January 2020
Available online 27 January 2020

Table 1
Example definitions and their implications.

Source: Rossi, Lipsey, and Freeman (2004, p. 2)
Definition: “a social science activity directed at collecting, analyzing, interpreting, and communicating information about the workings and effectiveness of social programs”
Primary Expertise/Quality: Research methodology and methods.

Source: Scriven (1991, p. 1)
Definition: “Evaluation is the process of determining the merit, worth, and value of things, and evaluations are the products of that process”
Primary Expertise/Quality: Values and valuation.

Source: Mayne and Rist (2006, p. 97)
Definition: “In our view, evaluation and evaluators should be playing key roles in all aspects of evaluative information in an organization: in building RBM [results based management] capacity, in managing evaluative knowledge systems, and in creating evaluative information and knowledge, including through the conduct of evaluation studies…. We use the term ‘evaluative information’ and its counterparts to include empirical information on and about the results of a service, program, or policy…information from evaluation studies, results monitoring systems, other one-off results-focused research, and performance studies (Mayne & Rist, 2006), as well as evaluative information derived from these various information sources through analysis, aggregation, and synthesis (Rist & Stame, 2006).”
Primary Expertise/Quality: Information and knowledge systems management; synthesis of empirical evaluation, research, monitoring and performance studies.

Source: American Evaluation Association (2004)
Definition: “bettering products, personnel, programs, organizations, governments, consumers and the public interest; contributing to informed decision making and more enlightened change; precipitating needed change; empowering all stakeholders by collecting data from them and engaging them in the evaluation process; and experiencing the excitement of new insights”
Primary Expertise/Quality: Unclear. Could be change management, consulting skills, research, social justice, public administration, organisational theory and development, advocacy, human resources, decision science, etc.

Unique domain is also a problem, as demonstrated by the perpetual EvalTalk question, “What is the difference between research and evaluation?” The question has appeared every year since the listserv began more than 10 years ago – and answers vary by respondent. In an example response, Rogers (RealEvaluation, accessed 2019-02-12)1 posted four different ways to think about the answer: i) as a dichotomy, ii) as mutually independent, iii) evaluation as a subset of research, or iv) research as a subset of evaluation. While the global evaluation community does seem to have agreement, for the most part, on five to seven common competency domains (AES Professional Learning Committee, 2013; American Evaluation Association, 2018; Aoteroa New Zealand Evaluation Association, 2011; Canadian Evaluation Society, 2010), these also do not clarify what makes evaluation unique in the marketplace.

1 http://betterevaluation.org/blog/framing_the_difference_between_research_and_evaluation

The benefit of the current “it depends” definitions is their inclusivity. Broad definitions like that of the 2004 AEA Guiding Principles and the competency sets referenced above demonstrate a desire to include the various activities in which evaluators engage and the various contexts in which they work. Inclusiveness is even stated as an intent for the 2018 AEA competencies (American Evaluation Association, 2018). However, ensuring that all are included tends to conflate the activities of people who do evaluation and the contexts in which they do it with evaluation itself. Usually evaluators are multi-purpose kinds of people, accustomed to adapting what they do to their clients’ needs using a variety of skills. Rogers’ four relationships between evaluation and research provide further evidence of this inclusiveness issue, adapting evaluation to suit the diversity of contexts and roles in which evaluators work. By ensuring that evaluation encompasses all the activities evaluators do, we include everyone. By ensuring that evaluation fits in all contexts and roles, we make evaluation a more marketable activity and make evaluators more marketable for diverse roles. However, when evaluation includes everything that evaluators do, it makes our definitional boundaries fuzzy and, thus, leaves evaluation vulnerable to co-opting by other disciplines, who may define it in specific ways to meet their needs. It does not make our profession distinct and recognisable as one that provides a unique contribution in the marketplace.

2. How might we define evaluation?

Along with a variety of other authors (Crane, 1988; Schwandt, 2008; Scriven, 1993), I propose that to make evaluation distinct, we need to claim the professional space of judgement based on values by applying the logic of evaluation – to set criteria and standards, measure performance, and arrive at a judgement about that performance (Fournier, 1995; Scriven, 1991). Values and criteria are an essential conversation in today’s global political milieu, where political actors have often overlooked sustainability and equity in favour of economic or other short-term gains with increasingly visible negative consequences (House, 2014; Mertens, 1994; Symonette, Mertens, & Hopson, 2014). Claiming the process of value judgements differentiates us from researchers, auditors, and organisational consultants and provides a distinct set of knowledge and skills to pursue for those seeking to become professionals.

Historically, however, the field has resisted the explicit requirement for a summative judgement – or really any judgement at all that accompanies this type of definition (Chelimsky, 2013; Weiss, 1988). Another challenge is the technical language, which has been difficult for non-philosophers to grasp and use. As a colleague recently expressed it, “I can’t talk with a client about the logic of evaluation – it’s too academic. Our clients won’t understand it, and they will not want to engage with it.” While we need to have technical language as part of our profession (Morell & Flaherty, 1978; Worthen, 1994), we do not need it to be a barrier to people understanding what evaluation is and what evaluators do. What is needed, therefore, is a definition of evaluation that would (a) still make us distinct from other similar activities, like research and consultancy; (b) include many of the activities evaluators do so as to maintain the desired inclusiveness; (c) provide the defining characteristics necessary for us to take the next step toward evaluating ourselves; and (d) do it in plain language.

A combination of Stake’s (1977) definition (“fully describe, fully judge”) and an expanded version of Scriven’s logic of evaluation (Fournier, 1995; Scriven, 1991) would meet these needs. Stake (1977) articulated the task of evaluation as follows: “Both description and judgement are essential – in fact, they are the two basic acts of evaluation… To be fully understood, the educational program must be fully described and fully judged” (Stake, 1977, p. 374). Under this definition of “fully describe, fully judge,” the task of evaluation is to provide a full description of the evaluand, a justified set of criteria and standards by which to judge it, and a process that enables generation of a judgement. Or, in simpler terms: deciding what makes something a something, deciding how to know that something is good, and then deciding how good a specific something is. To flesh it out, I have expanded Scriven’s original logic (Gullickson, 2018) based on Nunns (2016) to make the defining, judging, and the inherent reasoning aspects more visible (Table 2).

2
A.M. Gullickson Evaluation and Program Planning 79 (2020) 101787

Table 2
The Expanded Logic of Evaluation.

Step 0. Clarify evaluation purpose and assess evaluability
Mapping: –
Purpose: Elucidates the reasons for conducting an evaluation and determines whether it is appropriate to do so.
Potential activities: Documenting the needs for the evaluation, its stakes and stakeholders, the level of certainty required in the conclusions, and the general state of the evaluand and its fitness for the process.

Step 1. Define the evaluand
Mapping: Fully describe
Purpose: Describes the phenomenon to be evaluated (Fournier, 1995) and sets the boundaries for the evaluation.
Potential activities: Documenting (as appropriate) the needs or problem the evaluand is addressing, the evaluand’s program logic and/or theory and costs; acquiring specific context and evaluand information, such as stakeholders and their power relationships, purpose, goals, intended audience/consumers, commissioners, required level of certainty for findings, stakes for the evaluation and/or the program, etc.

Step 2. Define the group to which the evaluand belongs
Mapping: Fully describe
Purpose: Enables search for existing theories, norms, research, criteria, standards, frameworks, taxonomies, and evaluations to inform the evaluation process.
Potential activities: Defining the evaluand’s field (e.g., program, policy, etc.); discipline (e.g., education, political science, social work); sector (government, NGO, private business, education); and context (power dynamics, political pressures, relevant policy), etc.

Step 3. Identify criteria / delineate evaluation questions
Mapping: Fully judge
Purpose: Sets the parameters for how we can know if the something defined in Steps 1 and 2 is a good something.
Potential activities: Identifying potential criteria and/or evaluation questions (which have criteria embedded in them); negotiating agreement on them and weighting of them (if needed); and delineation of indicators (qualitative or quantitative) through literature review, collaboration, stakeholder consultation, negotiation, needs assessment and engagement with normative ethical theories, theories of justice, values and valuing.

Step 4. Identify performance standards
Mapping: Fully judge
Purpose: Defines the threshold of performance that equates to “good” for this evaluand.
Potential activities: Setting thresholds on indicators, determining what kind of evaluative operations are best suited to each (grading, ranking, scoring, apportioning, attributing), assessing context (e.g., literature, culture, etc.), and determining required level of sophistication, from complicated (multiple gradations of good/not) to simple (e.g., acceptable or not, does no harm, pass/fail).

Step 5. Justify the criteria and standards
Mapping: Fully judge
Purpose: Sets up support for the definitions of “good” in the evaluation process; addresses potential claims of evaluation as subjective.
Potential activities: Reasoning, argumentation, warranting, literature searches, consultation, facilitation, negotiation, discussions on contextual values and valuing, including understanding what counts as evidence in this context.

Step 6. Measure; observe the evaluand’s performance
Mapping: Fully describe and fully judge
Purpose: Provides the evidence about what the evaluand does. Primarily descriptive, but informed by values, e.g., what counts as evidence, what data sources and indicators have been determined as important.
Potential activities: Evaluation and research design, including economic analyses, selection of data sources, selection and implementation of evaluation methods (e.g., grading, ranking, scoring, apportioning) and research methods, data collection and analysis, measurement, adherence to relational and procedural ethics, documentation and/or attribution of impact and effectiveness of causal chains, etc.

Step 7. Justify the measures
Mapping: Fully describe and fully judge
Purpose: Addresses threats to quality in Step 6.
Potential activities: Providing a rationale for the types of evidence (indicators) selected and a warrant for their connection to the criteria and standards, description of the accuracy and level of certainty of the evidence, documenting trade-offs, discussions and analysis related to ethics, validity, reliability, trustworthiness, generalisability/transferability, and other relevant principles.

Step 8. Synthesise an evaluative judgement
Mapping: Fully judge
Purpose: Combines the descriptions and aspects of judgement into a verdict about the goodness of the evaluand.
Potential activities: Rendering a judgement (formative or summative), at one or several levels (individual components, criteria, or overall), providing different weighting of data sources or criteria (Steps 4–7), using evaluation-specific methods such as qualitative or numerical weight and sum, rubrics, algorithms, decision trees, etc.

Step 9. Justify the synthesis method
Mapping: Fully judge
Purpose: Provides support for the choices made in Step 8.
Potential activities: Describing the rationale behind how the judgements were made, at what level and how, including the choice of synthesis methods, providing reasons based on evidence from the evaluand’s description and the evaluation requirements. A clear and transparent process in Step 8 will streamline this step.

Step 10. Report judgement
Mapping: –
Purpose: Communicates judgement to audiences. Considers all the possible ways in which the process and findings of an evaluation will be reported and the audiences to whom they are reported.
Potential activities: Data visualisation, writing and verbal communication, technology (e.g., web based, software tools), qualitative and quantitative reporting, storyboarding, storytelling, video production, etc., which address needs for use.
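Step 8 of the table names numerical weight and sum among the evaluation-specific synthesis methods. As a purely illustrative sketch of how transparent that operation can be made (the criteria, weights, thresholds, and scores below are invented for this example and do not come from the article), the core arithmetic is only a few lines of Python:

from dataclasses import dataclass

@dataclass
class Criterion:
    name: str          # Step 3: criterion of merit
    weight: float      # Step 3: negotiated importance weighting
    threshold: float   # Step 4: performance standard ("good enough" on a 0-4 scale)

def synthesise(criteria, scores):
    """Steps 6 and 8: combine observed performance scores (0-4 per criterion)
    into a weighted judgement and flag criteria that fall below standard."""
    total_weight = sum(c.weight for c in criteria)
    weighted = sum(c.weight * scores[c.name] for c in criteria) / total_weight
    below_standard = [c.name for c in criteria if scores[c.name] < c.threshold]
    return weighted, below_standard

criteria = [
    Criterion("Reach of the most vulnerable", weight=3, threshold=3.0),
    Criterion("Size of impact on wellbeing", weight=2, threshold=2.5),
    Criterion("Cost per participant", weight=1, threshold=2.0),
]
scores = {"Reach of the most vulnerable": 3.5,
          "Size of impact on wellbeing": 2.0,
          "Cost per participant": 3.0}

overall, flags = synthesise(criteria, scores)
print(f"Weighted performance: {overall:.2f} / 4")  # 2.92 / 4 for these illustrative scores
print("Below standard:", flags)                    # ['Size of impact on wellbeing']

Laying the weights and thresholds out explicitly in this way also serves Step 9, because every synthesis choice is visible and available for justification.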

In the expanded logic, the steps are listed in a linear order for the sake of simplicity and referencing; however, working through the logic may be a more iterative process that begins in a variety of places. For instance, I often discuss reporting (Step 10) when talking with clients about the evaluation purpose (Step 0) and criteria (Step 3). Discovering which steps, if any, are inappropriate starting points would need further research and discussion.
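Because the logic can be entered at different points, it can also help to track which steps a given engagement has actually touched. The sketch below is a hypothetical illustration of such a checklist (the step names and mappings are taken from Table 2; the function, its name, and the example coverage set are assumptions added here for illustration):

# Table 2 steps with their "fully describe" / "fully judge" mapping (None = neither).
STEPS = {
    0: ("Clarify evaluation purpose and assess evaluability", None),
    1: ("Define the evaluand", "fully describe"),
    2: ("Define the group to which the evaluand belongs", "fully describe"),
    3: ("Identify criteria / delineate evaluation questions", "fully judge"),
    4: ("Identify performance standards", "fully judge"),
    5: ("Justify the criteria and standards", "fully judge"),
    6: ("Measure; observe the evaluand's performance", "fully describe and fully judge"),
    7: ("Justify the measures", "fully describe and fully judge"),
    8: ("Synthesise an evaluative judgement", "fully judge"),
    9: ("Justify the synthesis method", "fully judge"),
    10: ("Report judgement", None),
}

def place_in_process(steps_covered):
    """Report which Table 2 steps were addressed and which remain open."""
    for number, (name, mapping) in STEPS.items():
        status = "done" if number in steps_covered else "open"
        label = f" ({mapping})" if mapping else ""
        print(f"Step {number:>2}: {name}{label} - {status}")

# e.g. an engagement that stayed mostly on the "fully describe" side
place_in_process(steps_covered={0, 1, 2, 6, 10})

A summary like this mirrors the practice, described later in the paper, of placing a project within the overall process and showing clients which “fully judge” steps remain open.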


In the table I have used the term evaluand, assuming a technical audience, but that could be replaced with “the object” or “the something” to make it more accessible. Steps 0 and 10 acknowledge the nature of evaluation as a professional activity done for an audience—and that the act of evaluation is nested in a negotiation of the task and reporting of the findings. These two steps are not part of the logical process of arriving at a judgment about the goodness of the evaluand, although they can profoundly affect what happens in the evaluation.

As demonstrated in Table 2, using “fully describe, fully judge” as the general definition of evaluation means that activities that have been categorised as evaluative, but are, in fact, descriptive, can be delineated in the process. This provides several advantages, including: (a) clarifying the difference between evaluators and evaluation, (b) resolving some long-running debates in evaluation, (c) providing an orienting frame to place cost-benefit analysis, social return on investment, randomised controlled trials, and other activities within an evaluative process; and (d) enabling us to be clearer with our clients and stakeholders.

Consideration of a few key questions from our history will flesh out these advantages. First, is judgement necessarily the job of the evaluator? Under this definition, no, but it is necessarily the task of an evaluation, which fully judges. With that distinction clarified, we can explain to clients how what we have done, if it is more on the “fully describe” side, fits into the general process of evaluation. If getting to judgement is not within the means or expectations of our work with a client, the specialised knowledge evaluators would have under this definition allows us to explicitly prepare them to do it on their own.

For example, in a recent evaluation, I used a much more basic version of this framework with a client. Because their data had not been set up for evaluation, we spent much of our time and budget cleaning up and analysing their existing data and providing specific feedback on what they could do to improve their data collection tools. We spent very little time on establishing criteria and no time on standards or detailed evaluative synthesis. Therefore, when reporting, we simply placed the project in this process, explaining that we spent most of our time in the fully describe space (program logic, observing performance), and, based on what we reported, they could engage in ongoing conversations about whether the performance was good enough – a process with which we would be happy to assist. In this case, we did not “do” evaluation, but we prepared the clients to do it by explaining where what we had done fit into the overall process of conducting an evaluation, and flagged for them the importance of explicitly defining what good looked like.

Second, are RCTs evaluation? Under this definition, RCTs alone are not, because they report on the size and significance of an impact, but not its value. Thus, RCTs would fall on the “fully describe” side of the definition, contributing information towards fully judging the goodness of the evaluand. RCTs could inform answers to evaluative questions, such as: were the impacts big enough to make a difference in people’s lives? Did they benefit the most vulnerable in the population?

Third, are economic analyses like CBA evaluation? Again, under this definition, no, because they are not a comprehensive description and judgement of all the aspects of the program that relate to its value – instead, they focus on one criterion to describe and judge. In both the case of RCT and CBA, the values focus on just one aspect of the evaluand’s goodness (i.e., its causality or its economic performance). Either may be critically important, but alone they lack the comprehensive status of a good criteria of merit list, which is required to fully judge (Scriven, 1994). The “fully describe, fully judge” definition allows us to clearly explain how these important descriptive activities fit into the overall task of evaluation or, in and of themselves, judge one aspect of an evaluand, enabling us to operate on a more level playing field with colleagues from business and research by clarifying that our distinctive contribution is about value in the broader sense of merit, worth, and significance (Scriven, 1991).

Finally, what is the relationship between research and evaluation? In this case, the answer is clear. Under this definition, it is Rogers’ fourth relationship: research in the service of evaluation. Research questions and findings are necessary to fully describe the evaluand, but to fully judge requires additional tasks and skills.

3. Getting technical

Within the profession, a more technical, single-sentence definition might be valuable to clarify the knowledge and skills embedded in “fully describe, fully judge,” which make evaluation distinctive and highlight areas for both education and evaluation of quality practice. To that end, consider a hybrid definition of evaluation based on Scriven (1991) and Fitzpatrick, Sanders and Worthen (2011):

Evaluation is the generation of a credible and systematic determination of merit, worth, and/or significance of an object through the application of defensible criteria and standards to demonstrably relevant empirical facts.

Scriven’s logic of evaluation provides the foundation of the definition because it is simultaneously differentiating and encompassing. It is differentiating because it subsumes Scriven’s (1991) claim, “What distinguishes evaluation from other applied research is at most that it leads to evaluative conclusions, and to get to them requires identifying standards and performance data, and the integration of the two” (pp. 143–144). It is encompassing because it implicitly underpins all evaluation activity (Shadish, Cook, & Leviton, 1991) no matter the evaluand or its context. This definition states that what makes something an evaluation is the synthesis of a judgement about an evaluand’s value based on criteria, performance standards, and factual evidence. The addition of the adjectives “credible,” “systematic,” “defensible,” and “relevant” from Fitzpatrick, Sanders, and Worthen provides criteria by which we can judge the quality of evaluative activity and differentiates personal preference evaluation from professional evaluation. What makes a professional evaluation good is that those criteria apply to the evidence, reasoning, and judgements involved.

4. Implications for educating evaluators

The combination of “fully describe, fully judge” and the more technical definition has implications for competencies and evaluator education, i.e., what makes an evaluator an evaluator instead of something else. Based on the definitions above, an evaluator is someone with the skills and knowledge to systematically and credibly determine the merit, worth, and significance of an evaluand based on evidence. No matter the context of the evaluation or title of the person conducting it, the task and logic of evaluation remain the same. This has implications for what professional evaluators need to know and be able to do. I present an analysis of potential gaps below, informed by evaluator competency sets and literature on evaluation education, beginning with some overall observations, and then progressing to specific areas that require attention.

Overall, in the competency sets and evaluation education literature, most attention has been given to the skills and knowledge needed for the “fully describe” aspect of evaluation. As a result, there are fewer gaps here. However, the “fully describe” activities for Steps 1 and 2 demonstrate that the profession may need a more in-depth investigation and use of existing theories and models from disciplines and contexts in which our evaluands reside. Evaluation education would need to attend to these theories along with evaluation approaches.

Interpersonal skills are present in all the competency sets, but not present in evaluation texts or often taught in formal curriculum; there is some assumption that evaluators already have them (Dewey, Montrosse, Schröter, Sullins, & Mattox, 2008). Analysis of the activities column in Table 2 demonstrates that evaluators will frequently need negotiation, facilitation, and communication skills.


A more sophisticated version of the public speaking idea of reading the audience could subsume these into a higher order competency in the interpersonal domain. Reading the audience in evaluation could mean understanding the listeners’ or readers’ paradigms about evidence, the view they have of evaluation, the political climate, the social norms, the cultural influences, etc. that will influence the way the evaluator needs to engage with them about evaluation. It would combine situational analysis and awareness, negotiation, facilitation, and communication, summarising the interaction of these key competencies for evaluation practice.

As stated above, the competencies related to the “fully judge” side of these evaluation definitions have generally had the least attention in publication, competency sets, and curriculum. Based on the technical requirements of “fully judge,” what knowledge, skills, abilities and other characteristics align specifically with making systematic, credible, relevant and defensible determinations of merit, worth, and significance based on defensible criteria? An analysis of the existing AEA competencies (2018) shows that many of these key competencies are already covered, for example, good evaluation questions (2.2), ethical practice (1.1), and using systematic evidence to make evaluative judgements (1.4). Yet, based on the above definitions, we would need to be a bit more specific and detailed about the skills and knowledge essential to define those who would engage in professional evaluation, as follows.

4.1. The logic of evaluation and evaluation-specific methods

At the heart of practice based on these definitions is the logic of evaluation: setting criteria of merit, determining performance standards, measuring performance, and conducting evaluative synthesis to generate a value judgement. Therefore, an evaluator must know and demonstrate the skills to answer questions such as: What are criteria of merit? What are standards? What are criteria and standards for this evaluand? Are any more important than the others? What methods and resources can we use to generate them? What are the methods we can use to synthesise evaluative conclusions from values and data? Which methods are the best in what circumstances? How can we engage in these methods as transparently as possible? What makes a synthesis method a good choice in an evaluation? If we are about producing defensible judgements of merit, worth and significance, then this knowledge must be our stock in trade. Several scholars have noted the importance of these questions to the discipline (e.g., Greene, 2005; Mark, Henry, & Julnes, 2000; Schwandt, 2002), yet we have dedicated very little time to these questions, and the requisite knowledge and skills to address them, in the literature and in evaluation textbooks. A recent review of research on evaluation published between 2005 and 2014 identified that few empirical investigations address questions related to values and valuing (Coryn et al., 2017). A recent study of American Evaluation Association members showed that nearly three-fourths were not at all familiar or only a little familiar with the logic of evaluation (Ozeki, Coryn, & Schröter, 2019). To date, Davidson’s (2005) Evaluation Methodology Basics is the only English text that discusses criteria, standards, and evaluative synthesis explicitly, and it is 14 years old.

4.2. Critical thinking and argumentation

Under these definitions, the task of evaluation involves reasoning and critical thinking to shape arguments about the goodness of something (Patton, 2008; Shadish et al., 1991). Yet how many of our formal training programs and professional development opportunities involve in-depth work in the discipline of critical thinking and decision theory, on understanding and using their tools and ideas? Based on the analyses of LaVelle (2014) and Dewey et al. (2008), not many. Buckley and colleagues (Buckley, Archibald, Hargraves, & Trochim, 2015) explored how understanding and implementing the principles of critical thinking could improve evaluation capacity building efforts, but valuing is conspicuously absent from their definition and discussion on evaluative thinking. This leaves work to be done. How can we deliberately use the tools of critical thinking to build defensible, credible, and justified arguments about the value of something? If evaluation is higher order thinking (Bloom & Krathwohl, 1956), what are the lower orders that need to be grasped first by the learner? How much practice and of what type is needed?

4.3. Worldviews and values

The culture(s) of an evaluator’s upbringing and disciplinary education will affect the way she sees the world, just as those same influences affect the clients and stakeholders involved in the evaluation. These worldviews determine a person’s perspective on what constitutes knowledge or evidence, appropriate ways to develop and discover it, whether values are acceptable in seeking knowledge, the prioritisation of individuals or community in that process, and what good looks like. Philosophical traditions from Western ethics, for example, provide normative descriptions of what is good, based on aspects such as consequences, duty, or an ethic of care (Newman & Brown, 1996). Some scientific traditions require that for research to be good, it must be objective (Creswell, 2003). Indigenous communities, in contrast, prioritise relationship with people, community, and environment and co-creation of knowledge, rather than objective investigation (Cram, 2018). Worldviews and values underlie what counts as credible and relevant data, what constitutes culturally appropriate approaches or procedures, and may constitute an area for an evaluator’s professional development; however, recognizing and navigating worldviews is not explicitly named in the AEA competencies.

If we accept the above definitions of evaluation as a profession, then we claim that knowledge about values is central to the task. Thus, it would not be acceptable for an evaluator to operate exclusively and unknowingly within her own worldview and accompanying values, or that of her client, or to assume that all parties involved share the same values. Evaluation education, therefore, must provide not only an introduction to the basic research paradigms, but also introduce Western normative ethical philosophies and other worldviews that underpin perspectives on what makes something good. Evaluators would need to learn to recognise their own worldviews and the worldviews of others and understand the implications of those for what good looks like for the evaluation process and the evaluand. This is likely a gap in current evaluation practice; according to Newman (1999), most evaluators do not know that the Western normative ethical principles are sources of values, nor do they understand the depth of tradition behind them or how they are likely influencing value choices in evaluation. The recent New Directions for Evaluation issue on Indigenous Evaluation (Cram et al., 2018) demonstrates the consequences of not attending to the worldview issue for Indigenous communities, and provides key insights on what constitutes good in those communities. These resources provide a starting point, but further work needs to be done to determine what level of understanding evaluators need to have in this area to engage in quality practice.

4.4. Deep understanding about subjective/objective claims

Related to worldviews, and particularly important for evaluation, is that at the heart of a credible argument (above) about merit, worth, and significance is assembling evidence in support of the evaluative claim. Yet a major challenge in evaluation is that stakeholders will have differing epistemologies or may, in fact, make claims that the use of values renders judgements invalid. Scriven’s arguments about the fallacy of the value-free doctrine are well documented (Scriven, 1991, 2003, 2007, 2016); however, as evaluators, we need to be prepared to counter the challenge of “that’s just subjective.”


Petrie (1995) has given us a potential start with his discussion of purpose, context, and synthesis through the lens of perceptual control theory, as does Scriven’s (2005) delineation of different types of value claims (personal, market, contextual, essential). But we have several questions yet to answer. Are value sets always subjective? Or contextual? Is a research lens of external validity and/or transferability sufficient when we need to consider whether an evaluation’s findings will generalise to other contexts? How can we, in clear and compelling ways, articulate a defence against claims of subjectivity and help clients see their own assumptions about truth and value as part of the picture, rather than the whole picture (Kegan & Lahey, 2001, 2009)?

4.5. Ethical conduct related to values

While the competency sets include ethical practice, the ethical implications of evaluation go beyond observing research ethics protocols and compliance with the Program Evaluation Standards (Yarbrough, Caruthers, Shulha, & Hopson, 2010) or other guidelines. Current research on ethics in evaluation has delineated a variety of issues that evaluators face (Azzam, 2016; Morris, 2015). Several of these, like subjectivity, normative ethical traditions, and types of value claims, can be addressed through research and exploration of the suggested topics above. However, the definition of evaluation as a determination of merit, worth, or significance adds ethical challenges. For instance, what are the ethical challenges inherent in working in communities who have competing value sets related to the evaluand? Or where stakeholders have opposing epistemologies and different types of evidence offer conflicting perspectives? Much of the attention to culture and advocacy has begun to address this, but very little has made it into the field’s conversation around ethics. For instance, is it ethical for an evaluation to be conducted by a team of people external to the community, using the values of an external funder or values implicit in the literature, rather than on what the community holds to be of value?

I propose that these five areas of competency (the logic of evaluation and evaluation-specific methods, critical thinking and argumentation, deep understanding about subjective/objective claims, worldviews and values, and ethical conduct related to values) in combination are what make evaluation distinct as a profession. Certainly, other skills and knowledge are important – even necessary – to conducting evaluation, as Table 2, the existing competency sets, and the discussion above delineate. However, research methods, project management, and interpersonal and consultancy skills are not what differentiate evaluation from other fields. Placing evaluation activities within the frame of making value judgements means that this frame shapes the activities and expertise of evaluators. Under this definition, it would be our aim to make these judgements, or to support others in the making of them. If we are going to be able to judge what makes someone an evaluator and what makes something an evaluation, we must have a clear picture of what differentiates us. The definitions proposed above and this list of core competencies can provide that differentiation.

5. Visualising evaluation competence

With that in mind, I propose a visualisation of the set of competencies recently approved by AEA (Fig. 1), which orients the competencies in order of importance. The core evaluation-specific competencies are in the centre because they delineate evaluation as a distinct transdiscipline (Scriven, 2003) and provide the foundation for all the other competencies. Their marked absence from texts, literature, and most competency sets indicates the emphasis is necessary. In addition, evaluators need these skills so they can make judgements about their choices (or others’ recommendations) in the rest of the domains. Professional practice becomes a second core set of competencies containing standards, guidelines, approaches, and the work of evaluation theorists – the methods for engaging with the logic of evaluation in practice. Meta-evaluation could be placed within this domain or the core set as an evaluation-specific competency. It seems likely that there are essential aspects of the other domains, such as basic inquiry design, cost analysis, reading the audience, understanding contextual values and theories, and contracting, that would form a third layer, indicated by the dotted circle around professional practice. The outer layers of the diagram are the aspects of evaluation that an evaluator could contract out, like project management, or that might be discipline specific, like SROI.

Fig. 1. Competency wheel, based on the 2018 AEA competencies, with the definitionally differentiating and professional practice domains at the centre (adapted from Stevahn, King, Ghere & Minnema, 2004). [The wheel places evaluation-specific logic, methods and ethics; worldview, values, and subjectivity; and accompanying skills at the centre, surrounded by the Interpersonal, Context, Methodology (incl. cost analysis), and Project Planning and Management domains. © 2019 A. M. Gullickson, University of Melbourne.]

6. Lessons learned

Definitions create boundaries and priorities by determining what something is and what aspects of it are related to whether it is good. For instance, if this paper had built instead on the Rossi, Lipsey, and Freeman (2004) definition of evaluation as applied social science shown in Table 1, the visualisation would have put research methods, rather than evaluative reasoning and methods, in the middle. Building on Mayne and Rist (2006) might have put knowledge management at the centre. Research on evaluation degree programs indicates that a diagram depicting their implicit definition would include whatever research methods are appropriate to the discipline in which the course is being taught, plus some evaluation approaches and professional practice items like ethics and the Program Evaluation Standards (Davies & Mackay, 2014; Dewey et al., 2008; LaVelle, 2014). However, those depictions and accompanying definitions would be less likely to clearly differentiate evaluation as its own field, separate from research and/or the discipline in which evaluation is being practiced.

A definition that clearly describes evaluation as including judgements of merit, worth or significance is not necessarily as limiting as previous discussions in the literature have indicated. Using Stake’s “fully describe, fully judge” as the broad frame and elaborating carefully the steps within it related to description, judgement, and the reasoning that connects them may resolve ongoing debates, provide clarity for ourselves, our clients, and our competitors, and enable us to unpack and prioritise the knowledge and skills evaluators need to do the work of evaluation. Such a definition provides a way for evaluation to position itself more clearly as a profession by defining its contribution.

The “fully judge” aspect of the definition places our work squarely in the realm of value judgements. Organisations and governments need a way to make stakeholders’ values visible, negotiate when they are competing, and gather evidence of impact created by programs in relation to those values.


Claiming a definition that includes “fully judge” gives us access to this space, to follow the research on it, and to bring our various skills and tools to bear on the criteria that are used for making decisions not only about what makes an existing policy or program good, but what criteria should be used for designing programs. This emphasis can increase the significance of our contribution. For while improving evaluation designs (Bamberger, Rugh, & Mabry, 2012; Campbell & Stanley, 1963; Campbell, 1969) and increasing evaluation use and influence have had much attention historically (Kirkhart, 2000; Patton, 2008; Weiss, 1988), history and a systems perspective show that they are low leverage points that lack the capacity to change the system (Meadows & Wright, 2008; Pawson & Tilley, 1997) because they are too far away from key decision-making points.

Criteria are part of the blueprint of an evaluand and, thus, have greater potential for impact, i.e., better leverage. Changing the criteria changes the whole understanding of the evaluand’s value. When values and criteria are discussed at the beginning of program design, it is much easier to make changes, just as it is much easier to change a blueprint than a building (Carlis, 2013). Developmental evaluation (Patton, 2011) and evaluators’ recent engagement with design thinking (Chen, Gargani, Stead, & Norman, 2015; Dart, Webb, & Tolmer, 2017) indicate that many evaluators are already onto this idea. Claiming the “fully judge” definition provides us a clear pathway to become experts in valuation and criteria, thereby laying the foundation to potentially increase our impact on social justice. Consider the difference, for instance, in a school building project where environmental sustainability is the primary criterion, rather than ease for dropping off children by car (Rowe, 2017).

There are risks inherent in adopting the “fully describe, fully judge” framework. First, it may neutralise long-standing debates, which have been personally important and professionally defining for evaluation theorists. Second, placing evaluative reasoning and judgement at the heart of the evaluation reduces the importance of evaluation approaches. Approaches have been personally important and professionally defining to many, providing a pathway for individual evaluators to gain recognition and reputation. Under these definitions, while approaches are still relevant, they are not central to understanding what evaluation is and what it does. Instead they are strategies for engaging with the “fully describe, fully judge” process (Fournier, 1995; Smith, 1995). Third, if this definition is applied retrospectively, it will reveal that many activities that have been called evaluation were simply “fully describe” activities – critically important and valuable, but incomplete as evaluations because they lacked the judgement aspect. Other activities may be shown to have provided a judgment with scant attention to the “fully judge” steps and thus fall short of what is required to deliver a defensible evaluation (Nunns, 2016). Fourth, it would reveal that many of us currently practicing evaluation have more to learn, particularly in evaluation-specific reasoning and methods. The gap may feel threatening and contrary to the inclusive and consensus-based nature of competency efforts globally. Fifth, it would follow that those teaching evaluation courses would have significant work to do revising their curricula, which have focused on approaches and research methods. This has time and cost implications, which are particularly significant for higher education as it loses public funding internationally. Individuals and professional development organisations may be more agile and able to access the expertise needed to respond quickly. Finally, it points us to a significant gap in our research regarding values and valuing, particularly evaluation-specific reasoning and methods (Coryn et al., 2017; Szanyi, Azzam, & Galen, 2012).

Agreement on a definition of evaluation has previously seemed unlikely at best. Agreeing on this definition for our profession would have significant implications for teaching and practice. However, it meets the needs of (a) being specific enough to differentiate and position the task of evaluation in comparison to other similar activities, which is essential to claiming our space in the market; (b) being broad enough to encompass the variety of relevant activities, which resolves a variety of ongoing conflicts in the field; and (c) bounding, clarifying and prioritising the key knowledge and skills required for those who would engage in it, making it a specialist activity (Morell & Flaherty, 1978). If we choose to pursue professionalisation, these steps will pave the way to certification, licensure and exclusion of unqualified persons – an essential aspect of that pursuit (Worthen, 1994). It would enable us to access the benefits that come with a clearly bounded profession: power to control quality, leverage to define and keep a place in the market, and leverage to create the kind of influence on public good that evaluation has been claiming for years, but not achieving (Campbell, 1969; Cronbach, 1982; Pawson & Tilley, 2014).

Like the blind men of Indostan describing the elephant, academics and practitioners have disputed about evaluation, arguing over methods, values, use, and positivist, constructivist, and other viewpoints. Ultimately, the result is a diffuse definition of evaluation that consumes resources on internal debates that are not particularly of interest outside our field, and that reduces our ability to create influence on society because we lack a cohesive definition and framework for our transdiscipline and, thus, cannot emerge fully as a profession. Without a clear definition, we will continue to dispute loud and long while other professions and evaluation commissioners actively determine, based on their own perspectives, what evaluation is and the kind of information it should provide for decision making (Crane, 1988; Lemire et al., 2018). Evaluation will continue to sacrifice its seat at the table and its potential to create the social change it purports to seek in favour of internal debate. Instead, we could adopt the “fully describe, fully judge” definition proposed herein, which covers the whole elephant. It clarifies what makes evaluation a something, it provides a map for the various activities we engage in related to the task of evaluation, points us in the direction of what makes evaluation and evaluators good, and provides a pathway for the research, education, and practices we need to get us to good.

Funding

Funding for this article was provided by the Melbourne Graduate School of Education under the conference travel grant scheme and the Research Development Award [grant number 01-4600-06-xxxx-000319-RDQ-12-01].

Acknowledgments

The author acknowledges Kate McKegg and Donna Podems for the invitation to participate in an AEA Conference panel in 2014, which was the impetus for early thinking on this article; Jean King, who did the presentation when I was unable to attend and then mentored me in moving it to publication; Ghislain Arbour for discussions that clarified my thinking; Lori Wingate for the elephant metaphor associated with evaluation; Mathea Roorda, Kelly Hannum and the reviewers for their critical review and comments; and Arlen Gullickson, for recommending Stake’s 1977 article to me at the start of my PhD studies.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.evalprogplan.2020.101787.


References

AES Professional Learning Committee (2013). Evaluators’ professional learning competency framework. Melbourne, Australia: Australasian Evaluation Society.
American Evaluation Association (2004). American Evaluation Association guiding principles for evaluators.
American Evaluation Association (2018). AEA evaluator competencies.
Aoteroa New Zealand Evaluation Association (2011). Evaluator competencies. Retrieved from http://www.anzea.org.nz/wp-content/uploads/2013/05/110801_anzea_evaluator_competencies_final.pdf.
Azzam, T. (2016). Ethical issues in evaluation practice: An empirical study. European Evaluation Society Conference.
Bamberger, M., Rugh, J., & Mabry, L. (2012). RealWorld evaluation: Working under budget, time, data, and political constraints (2nd ed.). Thousand Oaks, CA: SAGE.
Bloom, B. S., & Krathwohl, D. R. (1956). Taxonomy of educational objectives, the classification of educational goals by a committee of college and university examiners. Handbook I: Cognitive domain. New York: Longmans.
Buckley, J., Archibald, T., Hargraves, M., & Trochim, W. M. (2015). Defining and teaching evaluative thinking: Insights from research on critical thinking. American Journal of Evaluation, 36(3), 375–388.
Campbell, D. T. (1969). Reforms as experiments. The American Psychologist, (4), 409.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally College Publishing Company.
Canadian Evaluation Society (2010). Competencies for Canadian evaluation practice. Retrieved from https://evaluationcanada.ca/txt/2_competencies_cdn_evaluation_practice.pdf.
Carlis, J. V. (2013). Design: The key to writing a one-draft thesis. Self-published.
Chelimsky, E. (2013). Balancing evaluation theory and practice in the real world. American Journal of Evaluation, 34(1), 91–98. https://doi.org/10.1177/1098214012461559.
Chen, H. T., Gargani, J., Stead, B., & Norman, C. (2015). Program design business meeting: Program design – Evaluation’s new frontier? American Evaluation Association Annual Conference.
Coryn, C. L. S., Wilson, L. N., Westine, C. D., Hobson, K. A., Ozeki, S., Fiekowsky, E. L., ... Schröter, D. C. (2017). A decade of research on evaluation: A systematic review of research on evaluation published between 2005 and 2014. American Journal of Evaluation, 38(3), 329–347.
Cram, F. (2018). Conclusion: Lessons about Indigenous evaluation. In F. Cram, K. A. Tibbetts, & J. LaFrance (Eds.), Indigenous Evaluation. New Directions for Evaluation, 159, 121–133.
Cram, F., Tibbetts, K. A., & LaFrance, J. (Eds.) (2018). Indigenous Evaluation. New Directions for Evaluation, 159.
Crane, J. A. (1988). Evaluation as scientific research. Evaluation Review, 12(5), 467–482. https://doi.org/10.1177/0193841X8801200501.
Creswell, J. W. (2003). Research design: Qualitative, quantitative, and mixed methods approaches (2nd ed.). Thousand Oaks, CA: SAGE.
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.
Dart, J., Webb, S., & Tolmer, Z. (2017). Stepping out: Evaluators working as designers. Australasian Evaluation Society Annual Conference.
Davidson, E. J. (2005). Evaluation methodology basics: The nuts and bolts of sound evaluation. Thousand Oaks, CA: Sage Publications.
Davies, R., & Mackay, K. (2014). Evaluator training: Content and topic valuation in university evaluation courses. American Journal of Evaluation, 35(3), 419. https://doi.org/10.1177/1098214013520066.
Dewey, J. D., Montrosse, B. E., Schröter, D. C., Sullins, C. D., & Mattox, J. R. (2008). Evaluator competencies: What’s taught versus what’s sought. American Journal of Evaluation, 29(3), 268–287. https://doi.org/10.1177/1098214008321152.
Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation: Alternative approaches and practical guidelines (4th ed.). Upper Saddle River, NJ: Pearson Education.
Fournier, D. M. (1995). Establishing evaluative conclusions: A distinction between general and working logic. New Directions for Program Evaluation, 1995(68), 15–32. https://doi.org/10.1002/ev.1017.
Gluckman, P. (2013). The role of evidence in policy formation and implementation.
Greene, J. C. (2005). A value-engaged approach for evaluating the Bunche-Da Vinci Learning Academy. In M. C. Alkin, & C. A. Christie (Vol. Eds.), Theorists’ models in action. New Directions for Evaluation: 2005 (pp. 27–45). Jossey-Bass.
Gullickson, A. M. (2018). Doing evaluation: Task analysis as a pathway to progress evaluation education. Australasian Evaluation Society Conference.
House, E. R. (2014). Evaluation: Values, biases, and practical wisdom. Charlotte, NC: Information Age Publishing.
Kegan, R., & Lahey, L. L. (2001). How the way we talk can change the way we work: Seven languages for transformation. San Francisco: Jossey-Bass.
Kegan, R., & Lahey, L. L. (2009). Immunity to change: How to overcome it and unlock potential in yourself and your organization (1st ed.). Boston, MA: Harvard Business School.
Kirkhart, K. E. (2000). Reconceptualizing evaluation use: An integrated theory of influence. New Directions for Evaluation, 2000(88), 5–23. https://doi.org/10.1002/ev.1188.
LaVelle, J. M. (2014). In S. I. Donaldson (Ed.), An examination of evaluation education programs and evaluator skills across the world. United States – California: The Claremont Graduate University.
Lemire, S., Nielsen, S. B., & Christie, C. A. (2018). Toward understanding the evaluation market and its industry—Advancing a research agenda. In S. B. Nielsen, S. Lemire, & C. A. Christie (Vol. Eds.), The evaluation marketplace: Exploring the evaluation industry. New Directions for Evaluation: Vol. 2018 (pp. 145–163). https://doi.org/10.1002/ev.20339.
Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. The Jossey-Bass nonprofit and public management series. San Francisco: Jossey-Bass.
Mayne, J., & Rist, R. (2006). Studies are not enough: The necessary transformation of evaluation. The Canadian Journal of Program Evaluation, 21(3), 93.
Meadows, D. H., & Wright, D. (2008). In D. Wright, & Sustainability Institute (Eds.), Thinking in systems: A primer. White River Junction, VT: Chelsea Green Publishing.
Mertens, D. M. (1994). Training evaluators: Unique skills and knowledge. New Directions for Program Evaluation, 62(Summer), 17–27.
Migotsky, C., Stake, R. E., Davis, R., Williams, B., DePaul, G., Cisneros, E. J., ... Feltovich, J. (1997). Probative, dialectic, and moral reasoning in program evaluation. Qualitative Inquiry, 3(4), 453–467. https://doi.org/10.1177/1098214006296430.
Morell, J. A., & Flaherty, E. W. (1978). The development of evaluation as a profession: Current status and some predictions. Evaluation and Program Planning, 1(1), 11–17. https://doi.org/10.1016/0149-7189(78)90003-4.
Morris, M. (2015). Research on evaluation ethics: Reflections and an agenda. In P. R. Brandon (Vol. Ed.), Research on evaluation. New Directions for Evaluation, 2015(148), 31–42. https://doi.org/10.1002/ev.20155.
Newman, D. L. (1999). Education and training in evaluation ethics. New Directions for Evaluation, 82, 67–76. https://doi.org/10.1002/ev.1138.
Newman, D. L., & Brown, R. D. (1996). Applied ethics for program evaluation. Thousand Oaks: Sage Publications.
Nunns, H. (2016). The practice of evaluative reasoning in the Aotearoa New Zealand public sector. Massey University.
Ozeki, S., Coryn, C. L. S., & Schröter, D. C. (2019). Evaluation logic in practice: Findings from two empirical investigations of American Evaluation Association members. Evaluation and Program Planning, 76(March), 101681. https://doi.org/10.1016/j.evalprogplan.2019.101681.
Patton, M. Q. (2008). Utilization-focused evaluation. Thousand Oaks: Sage Publications.
Patton, M. Q. (2011). Developmental evaluation: Applying complexity concepts to enhance innovation and use. New York: Guilford Press.
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London: Sage.
Pawson, R., & Tilley, N. (2014). Realistic evaluation (2nd ed.). London: Sage Publications, Ltd.
Petrie, H. G. (1995). Purpose, context, and synthesis: Can we avoid relativism? New Directions for Evaluation, 1995(68), 81–91. https://doi.org/10.1002/ev.1021.
Poth, C., Lamarche, M. K., Yapp, A., Sulla, E., & Chisamore, C. (2014). Toward a definition of evaluation within the Canadian context: Who knew this would be so difficult? Canadian Journal of Program Evaluation, 29(1), 87–103. https://doi.org/10.3138/cjpe.29.1.87.
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage.
Rowe, A. (2017). Evaluation for the Anthropocene. AES International Evaluation Conference.
Saxe, J. G. (n.d.). The blind men and the elephant. Retrieved from https://www.allaboutphilosophy.org/blind-men-and-the-elephant.htm.
Schwandt, T. A. (2002). Evaluation practice reconsidered. New York: Peter Lang.
Schwandt, T. A. (2008). Educating for intelligent belief in evaluation. American Journal of Evaluation, 29(2), 139–150. https://doi.org/10.1177/1098214006296430.
Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & M. Scriven (Eds.), Perspectives of curriculum evaluation (AERA Monograph Series on Evaluation, No. 1, pp. 39–83). Chicago: Rand McNally.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.
Scriven, M. (1993). The nature of evaluation. New Directions for Program Evaluation, 1993(58), 5–48. https://doi.org/10.1002/ev.1640.
Scriven, M. (1994). The final synthesis. American Journal of Evaluation, 15(3), 367. https://doi.org/10.1177/109821409401500317.
Scriven, M. (2003). Evaluation in the new millennium: The transdisciplinary vision. In S. I. Donaldson, & M. Scriven (Eds.), Evaluating social programs and problems: A vision for the new millennium (pp. 19–41). Mahwah, NJ: Lawrence Erlbaum.
Scriven, M. (2005). Logic of evaluation. In S. Mathison (Ed.), Encyclopedia of evaluation [electronic resource] (pp. 235–238). Thousand Oaks, CA; London: SAGE.
Scriven, M. (2007). The logic of evaluation. In H. V. Hansen (Vol. Ed.), Dissensus & the search for common ground (Vol. 138, pp. 1–16). Windsor, ON: Ontario Society for the Study of Argumentation.
Scriven, M. (2016). Roadblocks to recognition and revolution. American Journal of Evaluation, 37(1), 27–44.
Shadish, W. R. J., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice. Newbury Park, CA: Sage Publications.
Smith, M. F. (1999). Should AEA begin a process for restricting membership in the profession of evaluation? American Journal of Evaluation, 20(3), 521. https://doi.org/10.1177/109821409902000311.
Smith, N. L. (1995). The influence of societal games on the methodology of evaluative inquiry. New Directions for Program Evaluation, 1995(68), 5–14. https://doi.org/10.1002/ev.1016.
Smith, N. L. (1998). Professional reasons for declining an evaluation contract. American Journal of Evaluation, 19(2), 177.
Stake, R. E. (1977). The countenance of educational evaluation. In A. A. Bellack, & H. M. Kliebard (Eds.), Curriculum and evaluation (pp. 372–390). Berkeley, CA: McCutchan.
Stake, R. E., Migotsky, C., Davis, R., Cisneros, E., Depaul, G., Dunbar, C., ... Williams, B. (1997). The evolving syntheses of program value. Evaluation Practice, 18(2), 89–103.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2004). Essential competencies for program evaluators self-assessment. Retrieved 31 January 2020, from https://www.youtube.com/watch?v=8E0aZ387M_I.
Symonette, H., Mertens, D. M., & Hopson, R. (2014). The development of a diversity initiative: Framework for the graduate education diversity internship (GEDI) program. In P. M. Collins, & R. K. Hopson (Eds.), New Directions for Evaluation (pp. 9–22).
Szanyi, M., Azzam, T., & Galen, M. (2012). Research on evaluation: A needs assessment. Canadian Journal of Program Evaluation, 27(1), 39–64.
Weiss, C. H. (1988). Reports on topic areas: Evaluation for decisions: Is anybody there? Does anybody care? American Journal of Evaluation, 9(1), 5–19. https://doi.org/10.1177/109821408800900101.
Worthen, B. R. (1994). Is evaluation a mature profession that warrants the preparation of evaluation professionals? New Directions for Program Evaluation, 62(Summer), 3–15.
Yarbrough, D. B., Caruthers, F. A., Shulha, L. M., & Hopson, R. K. (2010). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.

Amy is a Senior Lecturer at the University of Melbourne Centre for Program Evaluation. Her evaluation and research experience spans academia, health, education, business, not for profit, international development, and faith communities. She has taught and developed more than 10 different evaluation courses, she served as academic lead for the University of Melbourne's first online graduate programs, and helped to lead formation of the International Society for Evaluation Education, which includes more than 90 people across 12 countries. Her current research projects include evaluator education, evaluative synthesis and evaluation mainstreaming.
