The Development of Theory-Driven Evaluation in the Military
Theory on the Front Line

Andrew P. Williams
John C. Morris
Old Dominion University

American Journal of Evaluation, Volume 30, Number 1, March 2009, pp. 62-79
© 2009 American Evaluation Association
DOI: 10.1177/1098214008329522

The use of theory-driven evaluation is an emerging practice in the military—an aspect generally
unknown in the civilian evaluation community. First developed during the 1991 Gulf War and
applied in both the Balkans and Afghanistan, these techniques are now being examined in the North
Atlantic Treaty Organisation (NATO) as a means to evaluate the effects of military operations in
complex, asymmetric conflict environments. In spite of these practices, theory-driven evaluation
in the military is still in the developmental stages. This article traces the development to date of
theory-driven evaluation in NATO and assesses its strengths and weaknesses in the military con-
text. We conclude that a cross-pollination of ideas between military and civilian evaluators is
urgently needed to improve the quality and effectiveness of military evaluation.

Keywords: theory-driven evaluation; military evaluation; effects-based approach to operations; assessment; utilization; NATO

From the development of methodologies to test the effectiveness of various educational
initiatives in schools and universities in the early 20th century, the field of evaluation
has evolved and grown into a discipline of study in its own right, and one that interlinks
with many research disciplines including: Social science, public administration, public policy,
and education. Throughout the world, it is now commonplace to find whole evaluation depart-
ments integrated with government agencies, nonprofit, nongovernmental, and international
organizations. Furthermore, within each respective organization, the role of evaluation typi-
cally has a place throughout the entire policy and service delivery process, from the identifi-
cation of a public need to ensuring the value and quality of final, implemented solutions
(Rogers, 2007).
The processes involved in planning, executing, and evaluating a military operation bear
many similarities to those of planning, implementing, and evaluating projects and services in
civilian organizations. At the simplest level, in both civilian and military domains, a problem
is identified that requires some action to be taken to alter the situation and rectify the problem.
Although the idea of assessing both the progress and the impact of this action has been rela-
tively well understood and extensively studied in civilian settings, the concept is surprisingly
underdeveloped in the military domain.

Authors’ Note: Please address correspondence to Andrew Paul Williams, Constant Hall, Suite 2084, Norfolk, VA
23529; (757) 577-2921; e-mail: awill123@odu.edu. The views expressed in this article are the views of the authors
and do not necessarily represent the views of NATO.


Historically, most military evaluations were carried out as simple performance monitoring
known as Battle Damage Assessment, which involved collecting data on ordnance expenditure
rates, target hit rates, target damage assessment, casualty rates, and area of ground captured
(Diehl & Sloan, 2005). There was generally no effort systematically to measure outcomes or
impacts in a comprehensive manner. It would be incumbent upon a few key individuals in
leadership positions to combine all information received from various sources and ‘‘con-
struct’’ an evaluation of progress and results, which would subsequently inform further plan
refinement and operational management (Curry, 2004; Rauch, 2002).1 In recent years the con-
cept of an ‘‘effects-based approach to operations’’ (EBAO) has gained prominence in military
theory. In essence, EBAO is a theory-based construct that, among other aspects, calls for the
explicit measurement of task or ‘‘action’’ accomplishment and result or ‘‘effect’’ achievement.
Certain militaries, especially those of the United States, the United Kingdom, and the North
Atlantic Treaty Organisation (NATO), have spurred the development of EBAO and initiated
operational use to some extent. The part of EBAO that is concerned with the measurement of
program implementation, progress toward outcomes, and creation of impacts came to be
known as ‘‘effects-based assessment’’ (EBA). As the military community is still undecided
on the terminology, with similar terms such as ‘‘campaign assessment,’’ and ‘‘engagement
space assessment’’ in use, for simplicity we shall refer to EBA simply as ‘‘Assessment’’ from
hereon.2
In general, the military has endorsed this new addition to its toolbox of operations manage-
ment resources; however, the techniques have been swiftly adopted without detailed theoretical
or philosophical examination. The current reality is that military commanders on the ground are
using results from Assessment. The danger is that the military personnel using these methods do
not have a full appreciation of the difficulties and limitations, nor the manner in which they may
be most usefully employed. This is important, especially as military operations and programs
involve significant amounts of public money, and often feature direct loss of life.
Given the current use of Assessment by American and Allied troops and civilians in Iraq,
Afghanistan and the Balkans, it is our hope to stimulate discussion in the evaluation commu-
nity on a particularly challenging and ambitious case of program evaluation and to initiate the
cross-pollination of ideas and, most importantly, experience, which could improve the quality
and use of military evaluation. There is an important role for civilian evaluators to play in the
continued development and implementation of theory-driven evaluation in the military.
This article has four sections. First, we trace the development of and describe the salient
points of theory-driven evaluation in the civilian domain. Second, we examine theory-
driven thinking in military planning and evaluation, up to the current state of development
in NATO. Third, we describe Assessment in detail, showing the equivalence to Chen’s
(1990) evaluation work. We conclude with an evaluation perspective, considering the
strengths and weaknesses of a theory-driven approach in the military context.

Theory-Driven Evaluation

As noted, the new military operations management construct of EBAO and its evaluation
counterpart, Assessment, are theory-based concepts. The use of theory as a basis for evaluation
has a long history in the civilian domain, although the idea only gained prominence in the
1980s, before which the evaluation literature was rarely concerned with the incorporation of
theory into evaluation processes (Chen, 1990; Gargani, 2003). A review by Lipsey, Crosse,
Dunkle, Pollard, and Stobart (1985) of 175 evaluation studies noted that ‘‘most of the
programs . . . evaluated did not provide simple structured treatments that could be expected
to work through readily apparent mechanisms’’ (p. 20). In fact, one of the main reasons for the
growing interest in theory-driven evaluation was ‘‘the usual inability of even the most sophis-
ticated experimental evaluations to explain what factors were responsible for the program’s
success—or failure’’ (Weiss, 1997a, p. 502).
Although some evaluators argued that the focus of any evaluation should be on the quality,
value, and success of any implemented program (Scriven, 1994), theorists recognized that
failing to identify the underlying causal mechanisms that led to program success would
not allow deficiencies in either program implementation or program design to be suitably
identified and addressed (Chen, 1994). Thus, program improvement was seen as a key goal
in any program evaluation (Chen, 1994; Posavac & Carey, 2006; Rogers, 2000). The defining
characteristic of theory-driven evaluation is that the assumptions and mechanisms behind a
program can be expressed in a logical sequence of cause and effect statements—also known
as a program theory (Weiss, 1997a).
The concept of program theory was seen as key to both the successful design of programs
and their subsequent evaluation (Bickman, 1987). Chen and Rossi (1992, p. 43) described it
as ‘‘a specification of what must be done to achieve the desired goals, what other important
impacts may also be anticipated, and how these goals and impacts could be generated.’’
Program theory acts as a base upon which variables may be operationalized to design and
conduct an evaluation—resulting in many varied benefits (Bickman, 1987; Scheirer,
1987). Donaldson (2007) notes that theory-driven evaluation provides a useful tool to probe
the validity of an evaluation and can assist in identifying and controlling extraneous sources
of variance. Furthermore, it is method neutral, freeing the evaluator from methodological
constraints. The advantages are wider than solely evaluation design and practice. For exam-
ple, Chen and Rossi (1981) argue that the successful development and evaluation of program
theory contributes to advancing fundamental social science knowledge. Chen (2005) and
Sullivan and Stewart (2006) further note that program theory provides a strategy to incorpo-
rate stakeholder views in the program design and evaluation. The evaluation literature
provides many more extensive examples of the benefits of performing evaluations based on
program theory.
Even given the wide interest and acclaim in the literature, theory-driven evaluation is not
without its disadvantages and its critics. Weiss (1997a)—a leading proponent of the tech-
nique—described key challenges in theory-driven evaluation: Program theory is notoriously
difficult to construct and the level of granularity to which the evaluator must go is often uncer-
tain; multiple theories are possible—often requiring multiple sets of measurement indicators—
thus, the technique is resource and data intensive; and by focusing on the program theory, the
evaluator may ignore many other important effects and causes (see also Shaw & Crompton,
2003). Others have noted that although program theory may provide focus to an evaluation,
it provides a focal point for politicization of the evaluation (English & Kaleveld, 2003).
Finally, it has been pointed out that in reality, few social science theories actually exist for the
program areas that evaluators typically face, thus the evaluator often begins with a grounded
theory approach in developing program theory (Stufflebeam, 2001).
As the popularity of theory-driven approaches grew in the 1980s, Chen and Rossi (1981,
1983, 1987) progressively developed an increasingly detailed framework for performing
theory-driven evaluation, culminating in Chen’s (1990) seminal textbook, Theory-Driven
Evaluations,3 of which we now summarize the key concepts. In this work, Chen presented
a comprehensive framework for program evaluation, in which program theory was integrated
from start to finish. He conceptualized two main domains of program theory on which evalua-
tions were performed: Normative and causative. Normative domain theory specifies the
‘‘goals or outcomes (that) should be pursued or examined, and how the treatment should be
designed and implemented’’ (Chen, 1990, p. 43). In this sense, Chen expanded the role of pro-
gram evaluation to assist program planners in designing and planning programs by providing
‘‘the rationale and justification for the program structure and activities’’ (p. 43). Causative
domain theory aims to be empirically based and describes the ‘‘underlying causal mechanisms
that link, mediate or condition the causal relationship between treatment variables and out-
come variables in a program’’ (p. 44). Program evaluation of the causative theory is evaluation
in the ‘‘usual’’ sense—that is, assessing the impacts of the program—however, how those
impacts were created is also an important area of consideration.
These two domain theories are broken down into a structured typology of theories on which
evaluations can be focused:

1. Normative treatment evaluation.
2. Normative implementation environment evaluation.
3. Normative outcome evaluation.
4. Impact evaluation.
5. Intervening mechanism evaluation.
6. Generalization evaluation.

Program evaluators may typically begin with normative outcome evaluation which ‘‘assist(s)
stakeholders in identifying, clarifying, or developing the goals or outcomes of a program’’
(p. 91). In fact, three types of outcome evaluations are described that see the evaluator:
Develop new goals or perform ‘‘goal revelation’’ by working closely with program stake-
holders; prioritize existing goals in the context of the program situation; and provide direction
and structure in unfocused programs with politically developed goals. This evaluation is often
an important stage of program theory development (Chen, 2005).
Once a program is underway, the remaining evaluations can take place. The normative treat-
ment evaluation seeks to determine the congruency between the planned program (the normative
treatment theory) and the actual implemented program—with the aim to gather recommenda-
tions for improvement of program operations. Especially in the case where implementation went
wrong, the treatment evaluation may be closely aligned with a normative implementation envi-
ronment evaluation which assesses environmental factors that affect program implementation.
Causative program theory evaluations are the key to theory-driven evaluations. Impact
evaluations are ‘‘usual’’ in the sense that they assess the impact of the implemented program
on the defined outcomes; however, the measurement variables will be derived from a prespe-
cified theory. By considering the causal processes between the treatment and the outcome, the
intervening mechanism evaluation expands the scope of what was once considered ‘‘usual’’
evaluation. It is this stage where the program theory is most critically tested. Finally, the gen-
eralization evaluation will consider issues of validity and generalizability to other programs,
where the focus is on broadening and generalizing the program theory. Several composite
forms are possible, such as normative treatment-impact evaluation, which focuses on the
impacts of specific treatment components.
The work of Chen (1990, 1994, 2005), Rossi (1981, 1983, 1987, 1992), Bickman (1987, 2000),
Donaldson (2007), Weiss (1997a, 1997b), and others has brought the evaluation community a
rigorous and comprehensive approach to conducting theory-driven evaluation that facilitates
incorporation of the needs of program planners, decision makers, stakeholders, and evaluators.
It encourages evaluators to have technical and methodological expertise; moreover, the incor-
poration of program theory encourages, and indeed requires, the evaluator to have expert and
substantive knowledge in the program itself—a fact that is critical in the military situations to
be described.


Following Chen’s (1990) conception of evaluation, in this article we demonstrate that
military Assessment is an example of theory-driven evaluation and consider the advantages and
disadvantages of its use in the military context. This comparison is striking in the respect that
the two examples were developed in complete isolation; we conclude that this is a reflection
of the strengths and appropriateness of the theory-driven method. In the positivist culture of
the military with a very distinct organizational and operational context, it is significant for the
civilian evaluation community to note that, from the arguments developed in this article,
theory-driven evaluation is considered the ‘‘best fit’’ in that context. We now turn to a description
of the military development of EBAO and Assessment.

The Development of EBAO and Assessment in the Military

The Changing Security Environment


The development of EBAO has been spurred by the changing security environment since
the end of the Cold War. The focus of military power has shifted from symmetric force-
on-force combat with fixed logistics on home territory, to a much larger spectrum of expedi-
tionary operations that may involve mixes of war fighting, counter-insurgency, civil support,
reconstruction, and humanitarian aid. Although these particular missions would have featured
in many Cold War scenarios, they were consequences of the government policy of the time,
which was simply to overwhelm the Soviet Union in direct military confrontation, thus resulting
in massive destruction followed by necessary but unintended ‘‘nonkinetic’’ humanitarian
operations. The difference now is that these nonkinetic missions have become actual policy,
in that governments explicitly seek to engage in such endeavors, the chance of massive
military confrontation being significantly reduced.
This shift in focus has increased the complexity to which the military commander must
adapt and engage. Many Cold War conflict scenarios were very complicated, but the aims were
generally simple: Deterrence or defeat; any missions resulting as a consequence or aftermath
of military action were secondary. Now that the new mission types (such as the reconstruction
of Iraq or the NATO missions in the Balkans) address particularly complex goals, the need for
a new management system that accounts for nonmilitary considerations in addition to purely
military aims was recognized. Furthermore, in environments where massive military destruc-
tion is not expected, there are likely to be many other actors operating, including civilian gov-
ernment agencies, international organizations, nongovernmental organizations, and charities.
It is desirable, therefore, to operate synergistically with these actors to produce a cumulative,
beneficial effect (Alberts & Hayes, 2007).
The most recent NATO documentation on EBAO states the following: ‘‘EBAO is the
coherent and comprehensive application of the various instruments of the Alliance combined
with the practical cooperation with involved non-NATO actors, to create desired effects
necessary to achieve objectives and ultimately the NATO end state’’ (NATO, 2007a, p. 10).
Underlying this statement is a philosophy that has emerged in warfare over millennia: Military
action alone cannot defeat an enemy—a combined and coordinated application of all instru-
ments of power in the political, civil, economic, and military domains produces maximum
results. From the turbulent 1800s—in which the military strategists that founded modern
military doctrine recognized the inseparability of military action from political action, and the
interrelationship of actors in any situation (Clausewitz, 1832/1968; Moltke, 1871/1993)—this
concept can be traced from its first major applications in World War II (WWII) to today’s
current operations (Jobbagy, 2005; Smith, 2003, 2006).


On defining EBAO, NATO (2007b) documentation states further that

the central idea of EBAO is the use of different instruments to create effects that alter the beha-
viour and capabilities of different actors in the engagement space to achieve our objectives and
end-state. Therefore, EBAO requires a clear understanding of these different instruments and of
the nature of the different systems we seek to influence. (p. 4)

We see that the requirement for a comprehensive systemic understanding of the operational
environment is recognized. Again, this concept is not new, but what differentiates EBAO from
other operations management techniques is that systemic understanding—the development of
a theory of the operational environment—is a foundation for the whole of planning, manage-
ment, and assessment of operations.

EBAO—A New Management Construct


Perhaps somewhat unintentionally, EBAO began under the same philosophy as results-
based management (RBM). The air-power military strategists that first developed EBAO
during and after the 1991 Gulf War had realized that by focusing on the consequences of
destruction of Iraqi air-defense rather than the actual destruction, coalition planners were able
dramatically to increase the air strike assets available4 (Deptula, 2001a, 2001b; Murray &
Scales, 2003). This led to the realization that the way in which the accomplishment of objec-
tives was assessed needed to be rethought, as focusing on percentage damage was misleading
when infrastructure did not need to be damaged to produce a certain effect. What the military
strategists had emulated was an RBM construct of planning, managing and assessing
operations.
The RBM philosophy was first described in Drucker’s (1954) seminal management text,
The Practice of Management, in which he delineates the process of management by objectives.
His techniques were used by private sector management before being adopted by the United
States Agency for International Development, which developed the concept of a ‘‘Logframe’’
or Logical Framework—an analytical tool used to plan, monitor, and evaluate projects
(Rogers, 2007; Sartorius, 1991). The name arises from the logical linkages set out by the plan-
ners to connect a project’s means with its ends (UNESCO, 2007). The technique was adopted
by many democratic governments in the 1980s as the philosophy of ‘‘New Public Manage-
ment’’ emerged, which was driven, in part, by a recognized need for governments to be more
accountable and responsive to the public, and generally more efficient.
In the civilian context, the core aim of using RBM is to shift the focus of planning, man-
aging, and decision making, from inputs and processes, to the results to be met. In a similar
vein, the change from ‘‘traditional’’ military planning to EBAO shifts the focus from inputs,
methods and targets, to outcomes (effects). Furthermore, EBAO places an emphasis on effects
at all levels of the campaign. The traditional military planning system considered results at the
strategic level, whereas at the operational level planning was focused on methods. However, it
is important to note that the development of EBAO has taken it from the common logframe
model to one requiring intervening mechanism theory (Chen, 1990) as a core component. The
developers of EBAO realized that a military plan constructed from a list of treatments and
outcomes, as specified in a logframe, did not provide the ‘‘operational art’’ gained by under-
standing the detailed intervening mechanisms and causal relationships in a plan.
EBAO begins, first, with a theoretical model derived from systemic analysis of the current
state of the operational environment and its expected state after the application of a set of
actions; second, after the execution of these actions, it calls for a periodic assessment of their success to
confirm or refute the theory. This process of Assessment is fundamental to the concept of a
theory-driven evaluation. By persistently assessing the congruency between theoretical models
(plans) and reality (results), the military commander is given a powerful tool to determine the
success of the implementation of the plan, the impact of the planned actions, the accuracy of
his or her situational understanding, and the understanding of the causative factors in the
environment.
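
Read as a control loop, the cycle just described can be sketched in schematic form. The sketch below is our own rendering of the plan-execute-assess iteration; the interfaces (planner, executor, assessor, and the fields of the assessor's findings) are hypothetical illustrations, not part of any NATO system or doctrine.

```python
def ebao_cycle(model, planner, executor, assessor, max_periods: int):
    """Schematic EBAO loop: plan from a systemic model, execute the plan,
    periodically assess the results, and refine the model (and the plan)
    whenever theory and reality diverge. All interfaces are hypothetical."""
    plan = planner(model)                   # effects-based plan derived from the model
    for period in range(max_periods):
        results = executor(plan)            # implement this period's actions
        findings = assessor(plan, results)  # congruency of plan (theory) vs. reality
        if findings.theory_refuted:
            model = findings.revised_model  # refine systemic understanding
            plan = planner(model)           # replan against the revised theory
    return model, plan
```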

Assessment

For the benefit of the majority of readers, who are unlikely to be familiar with the military
system of Assessment, we take the liberty of elaborating on some basics. The reader should note
that we describe the ‘‘theory’’ of EBAO and Assessment, as intended by the developers. There
is some difference in the way in which it is currently practiced, which we allude to several
times in the remaining discussions.
The primary purpose of Assessment is to increase the effectiveness of the execution of mil-
itary operations. By continually monitoring and analyzing the implementation of actions and
accomplishment of effects, the intention of Assessment is to guide the operational commander
in making informed adjustments to the plan being executed. Assessment aims to provide a
validation of causality in the plan design by confirming that the actions performed are indeed
creating the desired effects, and to improve understanding of the workings of the operational
environment. Assessment also plays an important role in providing situational awareness rela-
tive to the plan.
Although the developers of Assessment call for an independent evaluation unit in the Head-
quarters (HQ) structure, in practice to date the Assessment staff is usually a team of military
officers and civilian analysts drawn from other areas of the HQ organization; certain nations
and NATO do have specific job billets for Assessment, but very few.
the civilian analysts are specialists with specific training in analysis techniques, although the
authors have not yet encountered any that have awareness of the ‘‘civilian’’ evaluation field—
a fact that is quite surprising, given the level and quantity of development of evaluation
technique and theory in the civilian domain. The military analysts are not specialists and rely
a great deal on their civilian counterparts for the actual analysis. NATO Assessment literature
(NATO, 2006a, 2007a, 2007b) notes that the Assessment staff should work closely with the
planning staff and intelligence and systems analysis staff, who are responsible for creating
‘‘models’’ of the operational environment; however, it is the authors’ impression that this
formal relationship is not yet fully enacted in current HQ, mainly due to the infancy of systems
analysis capabilities.5
Assessment is based around three distinct areas: Assessment design, data collection and
analysis, and assessment reporting. We will cover each area in turn, demonstrating the connec-
tions with Chen’s (1990) theory-driven evaluation work. The reader is invited to use Table 1 as
a guide to understand the connections between military and civilian evaluation terminology.
The focus will be on assessment design as the issues discussed in this article relate principally
to this stage.

Assessment Design in Planning


There are several precursor activities involving plans development that must be considered
before the discussion on Assessment design. The operational HQ’s intelligence and systems
analysis staffs produce a model of the operational environment.


Table 1
Mapping Between Civilian and Military Evaluation Terminology

Civilian (Chen, 1990) | Military (NATO)
Program goals | End-state
Descriptive program theory | Systemic analysis
Normative program theory | Effects-based plan
Generalization theory | No direct comparison
Intervening mechanism theory | Effects development and linkage with actions
Impact theory | Change in effects
Outcome theory | Effects
Implementation environment theory | Considered as part of actions
Treatment theory | Actions
Generalization evaluation | Currently no direct comparison, but considered implicitly
Intervening mechanism evaluation | Effect–action analysis
Impact evaluation | MOE analysis
Outcome evaluation | Effects development
Implementation environment evaluation | Considered as part of MOP analysis or effect–action analysis
Treatment evaluation | MOP analysis

Notes: MOE = measure of effectiveness; MOP = measure of performance; NATO = North Atlantic Treaty
Organisation.

The creation of the model begins by breaking down the key elements of the operational environment into categories,
usually political, military, economic, social, information and infrastructure. Within each
category, the analysis may focus on: Political institutions, parties and personalities; public
administration institutions and personalities; military force composition, dispositions and per-
sonnel; economic centers and markets; social actors and institutions; media and communications
institutions, and national infrastructure. Relationships are then identified between these consti-
tuents and positive and negative influences are incorporated, where known. NATO and certain
nations, notably the United States and Germany, are currently investigating the use of computer
modeling tools involving influence diagrams and system dynamics to perform systems analysis.
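
As a toy illustration of such a model, a signed influence graph over a handful of elements might look like the sketch below. The elements, relationships, and signs are entirely our own invented examples, offered only to make the idea of a systemic model concrete.

```python
# A minimal signed influence graph (systemic model sketch).
# Nodes are elements of the operational environment; edge values of
# +1 / -1 mark positive / negative influence. All content is hypothetical.
influences = {
    ("police_capacity", "perceived_security"): +1,
    ("insurgent_activity", "perceived_security"): -1,
    ("perceived_security", "market_activity"): +1,
    ("market_activity", "support_for_government"): +1,
}

def influences_on(target: str) -> list[tuple[str, int]]:
    """List the elements that directly influence a target element, with sign."""
    return [(src, sign) for (src, dst), sign in influences.items() if dst == target]

print(influences_on("perceived_security"))
# [('police_capacity', 1), ('insurgent_activity', -1)]
```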
Such a model is essentially the first stage of a descriptive program theory: That is, a description of
the current state of affairs as it is believed to exist, based on the information available at the
time. This descriptive program theory is the starting point for the creation of the effects-based
plan. The plan is designed principally by the operational commander and his planning staff,
with the advice of the intelligence and systemic analysis staff. Usually, plan development is
top-down—the planning staff take the strategic-level or overall mission goals or ‘‘end-state’’
from senior political and military leaders and cascade the outcome variables, that is effects,
from these top level goals. There is an immediate and explicit assumption that the creation
of these effects in the operational environment is causally linked to the achievement of the
end-state.6 As indicated in Table 1, the effects part of the plan can be considered as normative
outcome domain theory.
Effects are derived from, in principle, detailed analysis of the descriptive program theory,
from which key elements and relationships existing in the operational environment are
elicited. Effect statements are outcome variables that state the commander’s intended view
on what these key elements and relationships should be. Effect statements may also include
impacts that detail the expected change. Typically, an effects-based plan contains hierarchies
of effects and subeffects, depending on the complexity of the operational environment and
intended mission. The lower level subeffects are linked to actions, thus providing the impor-
tant intervening mechanism program theory. Although the process of creating this stage
represents a form of normative outcome evaluation, in the military system it is considered a
part of planning and is not identified as a separate activity.
It is at the effects-development stage where the Assessment staff in the HQ becomes
involved. For each effect, analysts are responsible for attributing one or more ‘‘measures of
effectiveness’’ (MOE) that allow the accomplishment of the effect to be measured quantita-
tively. Each MOE must have a corresponding ‘‘threshold of success’’ that determines the
criterion for a successfully accomplished effect or for failure. The analysis using MOE is
equivalent to impact evaluation. Experience over the past 10 years has led to the identification
of key attributes an MOE must possess. According to Lambert (2002), Defence, Science and
Technology Laboratory [DSTL] (2005), and NATO (2007c), an MOE must be: Relatable to
the mission and directly tied to an effect; meaningful given the context to describe the
expected change in the effect; measurable consistently over time and reducible to a quantita-
tive value; descriptive of only one variable; sensitive to change in a realistic period of time; cost-
effective and not burdensome to the data collectors; and culturally and locally relevant such
that thresholds are derived from local standards, expectations, and cultures. The theory of
EBAO places great importance on MOE, to the extent that measurability is a core criterion
in the development of effects (NATO, 2007a). However, the Assessment staff is not
constrained to use only MOE as their prime means of evaluation; often data collection occurs
outside the formal structures of the plan (Evans, 2003; Hopkin, 2004; Lambert, 2002; Neighbour
et al., 2002).
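
To make the MOE construct concrete, the sketch below models an MOE as a simple data structure carrying a threshold of success, in the spirit of the attributes just listed. It is illustrative only: the field names, the example effect identifier, and the survey-based MOE are our own hypothetical constructions, not drawn from NATO doctrine.

```python
from dataclasses import dataclass

@dataclass
class MOE:
    """A measure of effectiveness tied to a single effect (illustrative only)."""
    effect_id: str          # the effect this MOE is directly tied to
    description: str        # one variable, meaningful in the local context
    threshold: float        # threshold of success for the effect
    higher_is_better: bool  # direction of desired change

    def met(self, observed: float) -> bool:
        """Apply the threshold of success to an observed value."""
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold

# Hypothetical example: perceived security measured by survey, scored 0-100.
perceived_security = MOE(
    "E-3.1", "Share of surveyed residents reporting they feel safe", 60.0, True)
print(perceived_security.met(72.5))  # True: threshold of success reached
```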
The effects-development stage will be paralleled by an activity between the planning and
Assessment staff to develop ‘‘actions,’’ which can be equated to normative treatment program
theory. Actions prescribe what the military forces will actually do to achieve the planned
effects and therefore a causal linkage between the planned actions and intended effects is
implied. Typically, the effects-based plan produced at the operational level of command
specifies the actions to a limited level of detail, instead leaving the subordinate commands
to elaborate on detailed task descriptions and resource allocations. As with the
effects, the Assessment staff creates so-called ‘‘measures of performance’’ (MOP) to allow
measurement of action accomplishment, which, as indicated in Table 1, is equivalent to
treatment evaluation. It should be noted that although actions and MOP are required in EBAO
theory, to the authors’ best knowledge, no military forces have successfully been able to
perform a treatment evaluation against actions. However, this appears to be a consequence
of limited resources devoted to Assessment. Nevertheless, the treatment component
of Assessment has been tested in controlled studies on several occasions (NATO, 2006b,
2007d, 2008b; Swedish Armed Forces Headquarters, 2007).7
The Assessment staff is responsible for producing an assessment plan, which can be con-
sidered equivalent to an evaluation design. Once the MOE and MOP are developed from the
corresponding effects and actions, attention turns to consideration of data collection, analysis
strategies and reporting methods. However, before these are discussed several observations
must be noted about the effect-based plan or normative program theory.
Taking Bickman’s (1987) definition of program theory as ‘‘a plausible and sensible model
of how a program is supposed to work’’ (p. 5), an effects-based plan can be considered as a
normative program theory, in that it explicitly spells out the actions that must be taken to create
certain effects in the operational environment and lead to the end-state, based on an underlying
theory or set of theories. The plan can be thought of as a theoretical sequence of hypotheses: If
actions a1–ai are performed on elements n1–nj then effects e1–ek will be created. These hypoth-
eses are derived from the descriptive and causative theory in the systemic analysis. The aim of
Assessment is to verify these hypotheses, refine the theoretical model should the desired
effects not be created, and to investigate the underlying causal mechanisms of change. This
point is important and provides the distinctive difference between program theory and logic
models, as noted in Chen (2005, p. 36). The ultimate goal of Assessment is not just to measure
progress in implementation of the program and progress in achievement of outcomes, but to
find the causative factors and why the program is or is not successful.
In addition to the above causative assumptions is the consideration of prioritization of actions
and effects. Although EBAO literature does not specifically mention the concept of ‘‘weight-
ing,’’ the practical reality is that some effects will be more important than others. Recent devel-
opments in Assessment theory call for the explicit relative weighting of effects and their
corresponding MOE (NATO, 2007c). These weightings can be derived from the foundational
descriptive program theory, although some subjective judgment of importance is also likely.
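
As one plausible rendering of this weighting idea, the sketch below aggregates normalized MOE scores into a single effect score as a weighted mean. The MOE names, scores, and weights are invented, and NATO (2007c) prescribes no particular aggregation formula that we are aware of, so the weighted mean is simply an assumption made for illustration.

```python
def effect_score(moe_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of normalized MOE scores (0.0-1.0) for one effect.
    Weights express relative importance and are normalized here.
    All names and values below are hypothetical."""
    total_weight = sum(weights[m] for m in moe_scores)
    return sum(moe_scores[m] * weights[m] for m in moe_scores) / total_weight

# Hypothetical: three MOE contributing to a "perceived security" effect.
scores = {"survey": 0.7, "incident_rate": 0.4, "market_activity": 0.6}
weights = {"survey": 3.0, "incident_rate": 2.0, "market_activity": 1.0}
print(round(effect_score(scores, weights), 2))  # 0.58
```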
Although the comparison between military operations and public service–orientated programs
has been made in passing, it is worth noting the comparison at this stage, especially as the reader
might wonder how the activities of military forces can be related to public programs. The initial
attack on Afghanistan—Operation Enduring Freedom—was certainly a typical military operation
involving a significant air and ground campaign. However, if one considers the past 7 years in
Afghanistan and the operations of military-led provincial reconstruction teams which exist essen-
tially to provide basic public service programs, the comparison is more valid (Lane & Sky, 2006;
Maley, 2007). Furthermore, current and past military plans explicitly call for actions in nonmili-
tary domains, such as provision of public services, health services, reconstruction activities and
development of democratic governance (NATO/ISAF, 2006b).8

Data Collection and Analysis


The Assessment staff is responsible for creating a data collection and analysis plan. The
MOP and MOE employed in a military operation may call for a wide scope of data collection,
ranging from surveys of local populations, to compilation of secret signals intelligence
intercepts from enemy communications or radar. Depending on the threat present, it has been
commonplace in past operations for civilian analysts to accompany military personnel on
patrols to perform their data collection activities (Lambert, 2002; Neighbour et al., 2002);
however, this is one point of practical difficulty that military evaluation endures. It is likely
that evaluation teams currently in Iraq, for example, are confined to the HQ, severely con-
straining the evaluation possible.
There are three types of evaluation performed at the analysis stage of the Assessment
process: Treatment evaluation, impact evaluation, and intervening mechanism evaluation.
As we see in Table 1, the normative treatment evaluation corresponds to the analysis of MOP
data, to determine the success in accomplishing the planned actions, or as Chen (1990) notes
‘‘. . . (to assess) the congruency between normative treatment and implemented treatment’’
(p. 104). Typically, the MOP analysis may involve confirming: That bomber aircraft carried
out the planned number of sorties against the intended target set; that infantry patrols were car-
ried out as planned; that information leaflets on mine awareness were distributed in provincial
towns; or even that women’s health clinics were established in certain areas. As has been
noted often in the literature (Chen, 1990, p. 104; see also Posavac & Carey, 2006; Pressman
& Wildavsky, 1984), discrepancies between planned program implementation and actual
program implementation are not uncommon; the same applies to military operations, and it
is important to confirm that what was planned was accomplished correctly.
The analysis of MOE is currently the most practiced evaluation in theory-driven Assess-
ment and corresponds to an impact evaluation. Impact evaluation seeks to assess the impact
of the treatment on outcomes, by assessing the major intended and unintended outcomes
(Chen, 1990, pp. 144, 167). Thus, the analysis gives an indicator of progress toward achieving
the overall mission goals. Following the MOP example above, MOE analysis would seek to
confirm: That the air bombardments did indeed reduce militant activity in the targeted areas;
that the infantry patrols increased townspeople’s feeling of security; that the information given
to locals prevented them from picking up old mines or ordnance; and that the infant mortality
rate declined overall as a result of better access to medical care.
The key component of the Assessment process is the intervening mechanism evaluation.
Intervening mechanism evaluations identify the intervening and contextual factors through
which the treatment affects the outcome, discovering the causal factors underlying program
results (Chen, 1990, p. 191). The first stage of this process is to determine the correlation
between MOP and MOE data. For example, if the MOE criteria for a certain effect were
not met (meaning the effect was not achieved), yet the MOP criteria were met for the actions
linking to the effect (meaning that the actions were implemented correctly), this implies that
the intervening mechanism theory is incorrect. Should the situation be reversed and the data
analysis indicates that the actions were not implemented correctly yet the effect was achieved,
this once again demonstrates an issue with the program theory, even though beneficial effects
were still created. The second stage involves determining the correlation between MOE data
for linked effects, that is, the intervening mechanisms that precede the top-level effects. In fact,
although not explicitly labeled as such, MOE are often intervening mechanisms in themselves.
For example, if the end effect was feeling of security, a possible MOE would involve simply
questioning locals about their perception of security. However, it is unlikely this MOE would
accurately reflect the true perception of security, especially if the survey questions were
not relative to any baseline. To accurately determine this effect, it may be necessary to gauge
people’s perceptions of local police performance, and crime and criminal prosecution rates,
rather than attributing security only to the military patrols because security is inherently linked
to other nonmilitary factors.
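
The first stage of this reasoning, comparing MOP results against MOE results for linked actions and effects, can be summarized as a simple decision table. The sketch below is our own schematic rendering of that logic, not an algorithm taken from Assessment doctrine.

```python
def diagnose(actions_implemented: bool, effect_achieved: bool) -> str:
    """Schematic first-stage intervening mechanism check for one
    action-effect linkage: MOP criteria versus MOE criteria."""
    if actions_implemented and effect_achieved:
        return "Consistent: theory supported so far; continue monitoring."
    if actions_implemented and not effect_achieved:
        return "Theory problem: actions done but effect absent; revisit the intervening mechanism."
    if effect_achieved:
        return "Theory problem: effect achieved without the planned actions; other causes at work."
    return "Implementation problem: treatment evaluation needed before judging the theory."

print(diagnose(actions_implemented=True, effect_achieved=False))
```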
The Assessment staff must determine the issues behind the program theory and equally, the
issues in the implemented program. The intervening mechanism analysis reveals areas where
closer inspection is required. For example, suppose the perception of security in an Afghan
town was negative, even though the security patrols were performed as planned; the
MOP-MOE correlation analysis would reveal this discrepancy. Intervening mechanism analysis
might then reveal that local townspeople are troubled by seeing overt military displays, as the country’s
long history of conflict is still fresh in many minds. The Assessment staff, on interviewing the
patrols’ leaders, may discover that instead of patrolling on foot, wearing light armor, and stopping
to talk to locals or hand out candy and water bottles, most patrols were implemented with soldiers
wearing full combat gear driving very fast through town in armored vehicles (NATO, 2007c).
Another vitally important role of theory-based plans and evaluation in EBAO is the iden-
tification of undesired effects—the above case being a very pertinent example. Effects-based
plans explicitly identify possible undesired consequences of planned actions, and Assessment
staff may explicitly seek to measure the ‘‘progress’’ toward undesired effects.

Assessment Reporting
The Assessment staff is responsible for providing the results of progress to the military
commander and the detailed theory-driven evaluations to the planning and systemic analysis
staff. Depending on the nature of the operations, regular cycles of reporting are established, the
leadership receiving monthly, quarterly or even 6-monthly progress updates. Currently, the
reporting methods are simplistic and based almost entirely on ‘‘traffic-light’’ charts that report
the progress of each effect as red, yellow, or green, depending on the effect’s MOE data and
the thresholds of success.
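
For illustration, such a chart can be generated by reducing each effect’s aggregated MOE progress to one of three states. The two-threshold scheme and the cutoff values in the sketch below are our own assumptions; the doctrine we have seen specifies only the red, yellow, or green output, not how it is computed.

```python
def traffic_light(progress: float, success: float = 0.8, warning: float = 0.5) -> str:
    """Map an effect's aggregated MOE progress (0.0-1.0) to a report color.
    The two-threshold scheme is a hypothetical rendering of the practice."""
    if progress >= success:
        return "GREEN"
    if progress >= warning:
        return "YELLOW"
    return "RED"

print(traffic_light(0.58))  # YELLOW: partial progress toward the threshold of success
```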
A particular challenge in the implementation of Assessment lies in the scope of the
evaluation. The current International Security Assistance Force (ISAF) military operation in
Afghanistan consists of a multinational HQ in Kabul with five Regional Commands, totaling
about 53,000 personnel (‘‘ISAF mission,’’ 2008), in addition to an air task force, forward
support bases, and several nationally controlled provincial reconstruction teams. The NATO
component of the mission is overseen at the strategic level of command by Allied Command
Operations in Belgium. The tactical, operational, and strategic command levels all essentially
govern and implement interdependent programs of varying complexity and scope that feed
into the overall ISAF mission.
Assessment literature calls for separate evaluations at each level of command, with the
results from a subordinate level aggregated in quantitative and qualitative ways, together
with other political or external guidance, to form part of the inputs to the higher
command. In practice in ISAF, only the operational and strategic levels actually conduct
Assessment.9 However, the issue of how individual missions and the overall campaign are
assessed is important, as missions in certain regions may be highly successful, while other
regions may be facing difficulty or failing dramatically. Currently, this problem is solved
simply by staff coordination in the command structure through regular Assessment working
groups with multilevel, multiorganizational representation. Evaluations from each regional
mission are briefed, with the operational command producing an overall evaluation. The
strategic-level commander may be briefed once or twice per year on the ‘‘campaign’’ evalua-
tion, or more regularly if the situation requires.10
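
The multilevel roll-up can be pictured as a tree of missions whose evaluations feed upward. The sketch below is purely schematic and uses a naive mean as the aggregation rule; in practice, as noted above, the combination of quantitative and qualitative inputs is left to staff judgment in Assessment working groups, and the mission names and values here are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Mission:
    """A node in the command hierarchy with its own evaluation (hypothetical)."""
    name: str
    own_progress: float  # this level's assessed progress, 0.0-1.0
    subordinates: list["Mission"] = field(default_factory=list)

    def campaign_progress(self) -> float:
        """Naive mean roll-up of subordinate evaluations plus this level's own."""
        values = [m.campaign_progress() for m in self.subordinates] + [self.own_progress]
        return sum(values) / len(values)

isaf = Mission("ISAF HQ", 0.6, [Mission("RC North", 0.8), Mission("RC South", 0.3)])
print(round(isaf.campaign_progress(), 2))  # 0.57
```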
Certainly, the most difficult aspect of the reporting process is ensuring utilization. The
Assessment staff will probably gain a deeper understanding of the implemented operations and
their causal mechanisms than anyone else in the staff (Weiss, 1997a). The importance of interstaff
collaboration has been noted in several studies on EBAO methodology (NATO, 2006b, 2007d,
2008a, 2008b; Swedish Armed Forces HQ, 2007). Therefore, it is important that the Assess-
ment staff work with their counterparts in plans, intelligence, and systemic analysis to ensure
that the results are properly fed back into the operational planning and management process.
Although the theory of EBAO notes that the direction of the operation should be primarily
informed by the Assessment process, in practice, the military commander has many other outside
influences affecting decision making. On a positive note, however, given the traditionally
process-orientated nature of the military, once an Assessment process is accepted and written
into doctrine, utilization may be easier to ensure than in civilian evaluations of public programs.

An Evaluation Perspective
We have demonstrated that EBAO is a theory-driven construct: An effects-based plan is
equivalent to program theory and Assessment is equivalent to theory-driven evaluation.
Although the advantages and disadvantages of theory-driven evaluation have been well docu-
mented in the literature (see for example, Chen, 1990; Weiss, 1997b), we now briefly consider
them in the context of the military setting.

Benefits of Assessment for the Military


The actress, Dame Judi Dench, in the recent film adaptation of the James Bond novel
Casino Royale, uttered the sardonic phrase in response to a particularly troublesome terrorist
situation: ‘‘God, I miss the Cold War!’’ The intention was not to understate the severity of that
period of history, but to highlight the fact that focusing solely on a single common enemy was
in many ways simpler than dealing with today’s complex world of stateless terrorist organiza-
tions, multipolar international politics and the wide spectrum of Western military missions.
Military missions are focusing increasingly on complex stabilization and reconstruction
operations, which are typically overlaid with anti-insurgency campaigns and security opera-
tions. The advantage of using a program theory approach to planning and evaluation is that
military leadership is encouraged to think about the complex interrelationships that exist in
the operational environment. Although it has been cynically noted that any good military
commander would do this anyway, the benefit of EBAO is that development of theory and
systemic thinking are made explicit in the process. Furthermore, the advantage of using
Assessment at all stages of planning and implementation is that judgments on progress are
based on rational thinking (as opposed to guessing) and are aimed at testing and validating the
planning staff’s estimate of the complex interrelationships.
As military forces are involved in more complex operations with a variety of international
actors, there is an obvious necessity for all the actions of these actors to be synergistic and
complementary. This concept of policy coherence, as it is known in the field of humanitarian
development, has become a necessity for military leaders to consider (see for example,
Clements, Chianca, & Sasaki, 2008; OECD, 2003; Picciotto, 2005, 2007). One noted advan-
tage for the military of theory-driven planning and evaluation methods is that the majority of
international development agencies use theory-based techniques, thus facilitating their integra-
tion with military activities and improving military—nonmilitary collaboration in the field
(NATO, 2007d, 2008b). Many militaries are investigating becoming more closely aligned with
the Development Assistance Committee’s evaluation methods as part of the incorporation of
EBAO into doctrine.
A key issue in evaluation is always the question of utilization. The military organizational
system is founded upon strong command and control relationships and hierarchy. Although
more flexible command and control structures have been the subject of much study (e.g.,
Alberts & Hayes, 2003), especially for combat situations, the fact remains that in terms of
planning and operational design—established doctrine and bureaucracy still dominate. This
is a feature that can be exploited by military evaluators: As Assessment theory and processes
become doctrinal, Assessment staffs will see that their military customers are expecting
their products because the process is incorporated in planning handbooks and taught in classrooms.
Assessment staff, in addition to providing operationally relevant information to aid plan
refinement through a mechanism that is actually specified on paper, will also provide new
information for more traditional management purposes previously absent from the military
such as: Motivation of staff; celebration of progress; budgeting for resource allocation; learn-
ing for future operations; and ensuring accountability (Behn, 2003).

Dangers of Assessment for the Military


More often than not, the military commanders in charge of major multinational operations
are not experts in planning or Assessment theory. They are made aware of the top-level con-
cepts, and make decisions based on highly summarized and generalized information. Assess-
ment is a prime example: The commander is presented with a series of evaluation results each
month or week that are couched in scientific language and presentation, yet may actually be
entirely subjective in construction or completely ignore error estimation (Bärtl, 2007). It is
therefore vital that the practitioners and developers of Assessment theory put in significant
effort to manage ‘‘hyperrationalism’’ (Weiss, 1997a) and ensure that scientific objectivity is
labeled where deserved, and that subjectivity and error are highlighted where necessary. If
program theory and evaluation is to be an effective tool, the benefits and limitations alike must
be understood.
A typical problem encountered in planning for military operations is that generally the
scope of planned ‘‘programs’’ is very broad and involves and affects a wide variety of actors.
For large operations in Iraq and Afghanistan where the operational scope is national, the pro-
gram theory used to develop the plan has to be very broad and holistic—a very ambitious task
for any planner. The planner has a difficult task in choosing the level of granularity of their
program theory, let alone selecting a theory from the many alternatives possible. A similar
problem is thus faced by the Assessment staff in deciding the indicators of progress. These
concerns reveal the general situations in which theory-based methods are best used. Given the
complexity of military programs, using the results from Assessment as a primary decision
driver may be unrealistic. It is far more realistic to expect that Assessment, in the case of rap-
idly evolving offensive military conflicts, will be conducted as a postoperational activity for
the purposes of review and capturing lessons learnt. However, in slow-moving, humanitarian,
reconstruction and peace-support operations, more time will be allowed for the application of
theory-based methods.
Although strong command and control may allow for improved use in comparison to
civilian examples, this may also present a difficulty in the actual relevance of the results being
presented. Military culture is not yet suited to theoretical concepts in planning: Current
military officers are trained to collect data, analyze options, decide and move on. The point
of Assessment is that postulated program theory (i.e., the effects-based plan) is a hypothesis
to be tested—and then refined. There must be a certain amount of iteration involved. Current
military thinking requires that several ‘‘courses-of-action’’ are analyzed in planning, from
which the commander selects one. A decision made by a senior staff officer is usually binding
to some extent. There is a danger that Assessment results will be constrained to inform only
within the framework of the existing or chosen course-of-action, rather than allow the creation
of a whole new one.

Conclusion

EBAO has brought many beneficial concepts to the military’s way of operating. The use of
program-theory-driven planning: Reinforces the necessity to think holistically about causal
mechanisms from treatment to outcome; increases the consideration of actors, events and their
relationships outside the traditional military domain; and allows Assessment models and
continuous monitoring of progress to be rationally derived from theory. The practice of
program-theory-driven Assessment: Provides a holistic evaluation of operational progress;
allows the identification of inappropriate assumptions underlying a plan; facilitates the iden-
tification of unintended positive and negative effects; and allows improved use as Assessment
results contribute to the development and refinement of theory based plans—which constitute
‘‘fundamental’’ social science knowledge in military programs. As noted by Bickman (2000),
‘‘the strength of program theory depends on substantive knowledge in the field’’ (p. 112). Any
evaluator knows that the mechanism of developing program theory and performing an evalua-
tion often leaves them with a better understanding of the program than the original planners. It
is this improvement in foundation, social science theory and knowledge that we hope increased
Assessment activities can bring to military operations.
EBAO is certainly an improvement on previous methods of operating, which focused only on mechanical and superficial counts of targets destroyed and did not consider the environment in a holistic sense. However, before it is adopted wholeheartedly and codified in doctrine, the Assessment and evaluation communities should consider the warnings stated in this article. The issues noted above create a situation in which theory-driven evaluation gives results couched in a ‘‘scientific,’’ rationally determined manner, concealing the subjectivity and uncertainty inherent in modeling the social world. Furthermore, the scientific and positivistic management culture of the military may overemphasize and prioritize evaluation results because of their implied objectivity and rationality.
EBAO and Assessment have the unique characteristic of being a form of program theory and theory-driven evaluation developed entirely independently of the civilian body of knowledge, by intelligent and experienced military staff.11 The military context of evaluation is very different from that known to the civilian evaluation community: the advantages of clear lines of authority and identifiable customers for evaluations are offset by difficult data collection and dangerous operating circumstances. Nevertheless, theory-driven evaluation has been shown to be flexible enough to operate in both the military and the civilian context. Although many advocates of the theory approach will take heart in the manner in which the military conducts this form of evaluation, many challenges remain. The military evaluation community is small and underresourced. To the best of the authors’ knowledge, no academic departments of military colleges operate research programs into Assessment, and no work has been performed on this particularly challenging and important case of theory-driven evaluation in civilian universities. The development of military evaluation rests on the isolated work of a relatively small number of individuals. We hope to point the military evaluation community toward the vast body of civilian evaluation knowledge and, more important, to stimulate interest within the civilian evaluation community in this special and very important case of program evaluation.

Notes
1. This thinking was central to Secretary McNamara’s widely criticized introduction of the Planning Programming Budgeting System (PPBS) in the US Department of Defense (Schick, 1969; Wildavsky, 1966, 1969), and its application in the Vietnam ‘‘body count’’ progress indicators was identified as one of the major causes of failure in the war (Gartner & Myers, 1995; Perrin, 1998). In addition to the rational-model assumptions necessary for PPBS, many were offended by the crassness of the ‘‘body count’’ statistics regularly touted by the Department of Defense. We thank an anonymous reviewer for this point.
2. It should be noted that the processes and terminology described in this article are not yet formal policy or doc-
trine of North Atlantic Treaty Organisation (NATO) forces; however, at the time of writing, several studies examining
effects-based approach to operations (EBAO) in NATO are ongoing. EBAO’s future is more troubled in the United
States: A recent and very unpopular decision by General Mattis, US Joint Forces Command, specifically orders all development of program theory concepts, including effects-based operations, to cease (Mattis, 2008). At the time of writing, within NATO, EBAO concepts are still in use—to varying extents—in Afghanistan and the Balkans.
3. An anonymous reviewer noted that there is a subtle distinction between theory-based and theory-driven evaluation. From the large amount of literature reviewed in preparation of this article, the authors confidently conclude that in the majority of journal articles on theory-based/-driven evaluation, these terms are used interchangeably. However, some authors do explicitly make a distinction. For example, Gargani (2003) argues that theory-based applies to the general application of theory to evaluation, whereas theory-driven corresponds to Chen’s (1990) 6-stage model of evaluation. In this article, we choose theory-driven to highlight the intended close connection with Chen’s model. We suggest, however, that this may be a valuable discussion topic for the ‘‘Dialogue’’ section of AJE.
4. Where previously 10 bunker-busting bombs, requiring several aircraft, would be planned to completely destroy an air defense station, the air operations staff soon realized that the effect of a single bomb was to cause the Iraqi operators to shut down that station immediately to avoid detection—thus the same effect, denying the use of enemy air defense, was created, while the military ‘‘footprint’’ in strike aircraft and supporting logistics was significantly reduced.
5. Most military operational headquarters are organized by functional branches such as Plans, Intelligence, Logistics, Engineering, Communications, etc. The study of program theory planning methods has resulted in the concept of system-of-systems analysis, in which specialized analysts create detailed models of the operational environment (i.e., prescriptive program theory), using techniques based on systems concepts such as social networks, influence diagrams, and network analysis. In experimental trials in the Balkans and Afghanistan, the systems analysis staff have been part of the Plans branch, although there is some debate about whether they should fall under Intelligence.
6. A debate is currently ongoing in the NATO Assessment community about whether the high-level end states should have their own indicators, to determine whether achieving all the effects did indeed achieve the end state.
7. The reader will observe that the references to these studies mention ‘‘experiments.’’ In the original article submission we included this word in the text—attracting great interest from an anonymous reviewer. Some clarification is needed: Although NATO and the United States have tested the concepts of Assessment and EBAO in fairly large-scale, controlled environments, these were generally one-group, single-trial events rather than experiments in the formal sense. Unfortunately, the military community is not so rigorous with the term ‘‘experiment.’’
8. There are ongoing political debates about whether or not NATO, as a military alliance, should engage in pro-
grams outside of the military domain.
9. The approach described here is an approximation to reality. The full process and organizational structure cannot
be described due to security restrictions.
10. It is the authors’ judgment that performing a campaign evaluation by simple amalgamation of results from individual missions is an unsatisfactory solution, although this is partially caused by the lack of comprehensive and integrative program theory from the strategic to the operational level. Individual treatment evaluations (which can be amalgamated) at the mission level are appropriate for capturing the success of implemented programs; however, the impact and intervening-mechanism evaluations should be comprehensive across the entire campaign and closely aligned with the overall program theory.
11. The authors conducted an extensive literature search and found no reference to civilian evaluation literature in military Assessment publications.

References
Alberts, D., & Hayes, R. (2003). Power to the edge—command and control in the information age. Washington, DC:
Command and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/
Alberts_Power.pdf.
Alberts, D., & Hayes, R. (2007). Planning for complex endeavors. Washington, DC: Command and Control Research
Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/Alberts_Planning.pdf.
Bärtl, M. (2007, March). Practices, challenges, and issues to resolve in campaign assessment—operational HQ
experiences from ISAF. Paper presented at NATO Allied Command Transformation Operational Analysis Confer-
ence, Norfolk, VA.
Behn, R. (2003). Why measure performance? Different purposes require different measures. Public Administration
Review, 63, 586-606.
Bickman, L. (1987). The functions of program theory. In L. Bickman (Ed.), Using program theory in evaluation (pp. 5-19). New Directions for Program Evaluation, Vol. 33. San Francisco, CA: Jossey-Bass.
Bickman, L. (2000). Summing up program theory. In L. Bickman (Ed.), Program theory in evaluation: Challenges and opportunities (pp. 103-112). New Directions for Evaluation, Vol. 87. San Francisco, CA: Jossey-Bass.
Chen, H.-T. (1990). Theory-driven evaluations. Newbury Park, CA: Sage.
Chen, H.-T. (1994). Theory-driven evaluations: Needs, difficulties, and options. Evaluation Practice, 15, 79-82.
Chen, H.-T. (2005). Practical program evaluation. Assessing and improving planning, implementation, and effective-
ness. Thousand Oaks, CA: Sage.
Chen, H.-T., & Rossi, P. H. (1981). The multi-goal, theory-driven approach to evaluation: A model linking basic and applied social science. In H. E. Freeman & M. A. Solomon (Eds.), Evaluation Studies Review Annual, Vol. 6. Beverly Hills, CA: Sage.
Chen, H.-T., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7,
283-302.
Chen, H.-T., & Rossi, P. H. (1987). The theory-driven approach to validity. Evaluation and Program Planning, 10,
95-103.
Chen, H.-T., & Rossi, P. H. (1992). Using theory to improve program and policy evaluations. New York, NY: Green-
wood Press.
Clausewitz, C. von. (1968). On war (J. J. Graham, Trans.). London, Great Britain: Penguin. (Original work published 1832).
Clements, P., Chianca, T., & Sasaki, R. (2008). Reducing world poverty by improving evaluation of development aid.
American Journal of Evaluation, 29, 195-214.
Curry, H. (2004). The current battle damage assessment paradigm is obsolete. Air and Space Power Journal, 18, 13-17.
Deptula, D. (2001a). Effects-based operations: A change in the nature of warfare. Arlington, VA: Defence and
Airpower Series, Aerospace Education Foundation. Retrieved December 14, 2008, from http://www.airforce-
magazine.com/MagazineArchive/Documents/2001/April%202001/0401effects.pdf.
Deptula, D. (2001b). Firing for effects. Air Force, 84, 45-53. Retrieved September 10, 2008, from http://www.afa.org/
magazine/April2001/0401effects.pdf.
Diehl, J. E., & Sloan, C. E. (2005). Battle damage assessment: The ground truth. Joint Force Quarterly, 37, 59-64.
Donaldson, S. I. (2007). Program theory-driven evaluation science: Strategies and applications. New York, NY: Tay-
lor and Francis.
Drucker, P. F. (1954). The practice of management. New York, NY: Harper and Row.
Defence Science and Technology Laboratory. (2005). Code of best practice for the use of measures of effectiveness (MOE) (Report No. Dstl/CR14304v1.1). UK Ministry of Defence.
English, B., & Kaleveld, L. (2003). The politics of program logic. Evaluation Journal of Australasia, 3, 35-42.
Evans, D. (2003). Operational analysis in support of HQ ISAF, Kabul, Afghanistan, 2002. In A. Woodcock & D. Davis (Eds.), Analysis for governance and stability (pp. 198-226). Cornwallis, Canada: The Canadian Peacekeeping Press.
Gargani, J. (2003, November). The history of theory-based evaluation: 1909 to 2003. Paper presented at the American
Evaluation Association annual conference, Reno, NV.
Gartner, S. S., & Myers, M. E. (1995). Body counts and ‘‘success’’ in the Vietnam and Korean wars. Journal of Inter-
disciplinary History, 25, 377-395.
Hopkin, A. J. (2004). Operational analysis in support of HQ MND(SE), Basrah, Iraq, 2003. In A. Woodcock & G. Rose (Eds.), Analysis for stabilization and counter-terrorist operations. Cornwallis, Canada: The Canadian Peacekeeping Press.
ISAF mission. (2008, July). ISAF Mirror, 49. ISAF Public Affairs Office. Retrieved September 10, 2008, from http://
www.nato.int/isaf/docu/mirror/2008/mirror_49_200807.pdf.
Jobbagy, Z. (2005). Powered flight, strategic bombing and military coercion: Study on the origins of effects-based
operations. Den Haag, The Netherlands: The Clingendael Centre for Strategic Studies. Retrieved December 14,
2008, from http://www.hcss.nl/en/publication/37/Powered-Flight,-Strategic-Bombing-and-Military-Coe.html.
Lambert, N. J. (2002). Measuring the success of the NATO operation in Bosnia and Herzegovina 1995-2000.
European Journal of Operational Research, 140, 459-481.
Lane, R., & Sky, E. (2006). The role of provincial reconstruction teams in stabilization. Royal United Services Institute
Journal, 151, 46-51.
Lipsey, M. W., Crosse, S., Dunkle, J., Pollard, J., & Stobart, G. (1985). Evaluation: The state of the art and the sorry state of the science. In D. S. Cordray (Ed.), Utilizing prior research in evaluation planning (pp. 7-28). New Directions for Program Evaluation, Vol. 27. San Francisco, CA: Jossey-Bass.
Maley, W. (2007). Provincial reconstruction teams in Afghanistan—how they arrived and where they are going.
NATO Review, 2007(3), NATO. Retrieved September 10, 2008, from http://www.nato.int/docu/review/2007/
issue3/english/art2.html.
Mattis, J. N. (2008). USJFCOM commander’s guidance for effects-based operations. Joint Force Quarterly, 51, 105-108.
Moltke, H. von. (1993). On the art of war: Selected writings (D. J. Hughes & H. Bell, Trans.; D. J. Hughes, Ed.). New York, NY: Random House. (Original work published 1871).
Murray, W., & Scales, R. H. (2003). The Iraq war: A military history. Cambridge, MA: Harvard University.
NATO. (2006a). MC position on an effects-based approach to operations. North Atlantic Military Committee (signed
06 June 2006). (Document No. MCM 0052-2006/IMSWM-0147-2006 SD3).
NATO. (2006b). Multinational experiment 4—NATO analysis report. H.Q. Supreme Allied Command Transforma-
tion, Norfolk, VA.
NATO. (2007a). BiSC draft pre-doctrinal EBAO handbook version 4.2, October 4, 2007. Bi-Strategic Command
Effects-Based Approach to Operations Working Group.
NATO. (2007b). Bi-Strategic Command Discussion Paper on EBAO. (Document No. J5PLANS/2920-036/07). H.Q.
Supreme Allied Powers Europe and H.Q. Supreme Allied Command Transformation.
NATO. (2007c). Draft effects-based assessment handbook version 1.0. H.Q. Supreme Allied Command Transforma-
tion, Norfolk, VA.
NATO. (2007d). Effects-based assessment NATO analysis report. Limited objective experiment conducted as a por-
tion of the multinational experiment series 5, 04-08 June 2007. H.Q. Supreme Allied Command Transformation,
Norfolk, VA.
NATO. (2008a). Enabler ’08 experiment report. H.Q. Supreme Allied Command Transformation, Norfolk, VA.
NATO. (2008b). Multinational experiment 5 analysis report. H. Q. Supreme Allied Command Transformation,
Norfolk, VA.
NATO/ISAF. (2006). HQ ISAF KABUL—operational plan. Kabul, Afghanistan: Author.
Neighbour, M., Bailey, P., Hawthorn, M., Lensing, C., Robson, H., Smith, S., et al. (2002). Providing operational anal-
ysis to a peace support operation: The Kosovo experience. Journal of the Operational Research Society, 53,
523-543.
Organisation for Economic Cooperation and Development. (2003, July). Policy coherence: Vital for global development. Retrieved September 10, 2008, from http://www.oecd.org/dataoecd/11/35/20202515.pdf.
Perrin, B. (1998). Effective use and misuse of performance measurement. American Journal of Evaluation, 19,
367-379.
Picciotto, R. (2005). The evaluation of policy coherence for development. Evaluation, 11, 311-330.
Picciotto, R. (2007). The new environment for development evaluation. American Journal of Evaluation, 28, 509-521.
Posavac, E. J., & Carey, R. G. (2006). Program evaluation: Methods and case studies (7th ed.). Upper Saddle River, NJ: Pearson.
Pressman, J. L., & Wildavsky, A. B. (1984). Implementation. Berkeley, CA: University of California.
Rauch, J. T. (2002). Assessing airpower’s effects: Capabilities and limitations of real-time battle damage assessment.
(Masters dissertation, School of Advanced Airpower, Air University, 2002). Retrieved September 10, 2008, from
http://handle.dtic.mil/100.2/ADA420587.
Rogers, P. J. (2000). Program theory: Not whether programs work but how they work. In D. L. Stufflebeam, G. F. Madaus, & T. Kellaghan (Eds.), Evaluation models (pp. 209-232). Boston, MA: Kluwer Academic.
Rogers, P. J. (2007). Theory-based evaluation: Reflections ten years on. In S. Mathison (Ed.), Enduring issues in evaluation: The 20th anniversary of the collaboration between NDE and AEA. New Directions for Evaluation, Vol. 114. San Francisco, CA: Jossey-Bass.
Sartorius, R. H. (1991). The logical framework approach to project design and management. American Journal of Eva-
luation, 12, 139-147.
Scheirer, M. A. (1987). Program theory and implementation theory: Implications for evaluators. In L. Bickman (Ed.), Using program theory in evaluation (pp. 59-76). New Directions for Program Evaluation, Vol. 33. San Francisco, CA: Jossey-Bass.
Schick, A. (1969). System politics and systems budgeting. Public Administration Review, 29, 137-151.
Scriven, M. (1994). The fine line between evaluation and explanation. Evaluation Practice, 15, 75-77.
Shaw, I., & Crompton, A. (2003). Theory, like mist on spectacles, obscures vision. Evaluation, 9, 192-204.
Smith, E. (2003). Effects-based operations: Applying network centric warfare in peace, crisis and war. Washington,
DC: Command and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/
files/Smith_EBO.PDF.
Smith, E. (2006). Complexity, networking, and effects-based approaches to operations. Washington, DC: Command
and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/Smith_
Complexity.pdf.
Stufflebeam, D. L. (Ed.). (2001). Evaluation models. New Directions for Evaluation, Vol. 89. San Francisco, CA: Jossey-Bass.
Sullivan, H., & Stewart, M. (2006). Who owns the theory of change? Evaluation, 12, 179-199.
Swedish Armed Forces Headquarters. (2007). Swedish EBAO development after the autumn experiment 2006.
Enkoping, Sweden: Swedish Armed Forces.
UNESCO. (2007). Results-based programming, management and monitoring (RBM) at UNESCO: Guiding principles. Bureau of Strategic Planning, UNESCO, Paris, France. Retrieved September 10, 2008, from http://portal.unesco.org/fr/files/40194/11924654571BSP_RBM_guiding_principles_October_2007__2_.pdf/BSP+RBM+guiding+principles+October+2007+_2_.pdf.
Weiss, C. H. (1997a). How can theory-based evaluations make greater headway? Evaluation Review, 21, 501-524.
Weiss, C. H. (1997b). Theory-based evaluation: Past, present and future. In D. J. Rog & D. Fournier (Eds.), Progress and future directions in evaluation: Perspectives on theory, practice, and methods. New Directions for Evaluation, Vol. 76. San Francisco, CA: Jossey-Bass.
Wildavsky, A. B. (1966). The political economy of efficiency: Cost-benefit analysis, systems analysis, and program
budgeting. Public Administration Review, 26, 292-310.
Wildavsky, A. B. (1969). Rescuing policy analysis from PPBS. Public Administration Review, 29, 189-202.