Published by:
http://www.sagepublications.com
On behalf of:
American Evaluation Association
Additional services and information for American Journal of Evaluation can be found at http://aje.sagepub.com.
The use of theory-driven evaluation is an emerging practice in the military—an aspect generally
unknown in the civilian evaluation community. First developed during the 1991 Gulf War and
applied in both the Balkans and Afghanistan, these techniques are now being examined in the North
Atlantic Treaty Organisation (NATO) as a means to evaluate the effects of military operations in
complex, asymmetric conflict environments. Despite this history, theory-driven evaluation
in the military is still in its developmental stages. This article traces the development to date of
theory-driven evaluation in NATO and assesses its strengths and weaknesses in the military con-
text. We conclude that a cross-pollination of ideas between military and civilian evaluators is
urgently needed to improve the quality and effectiveness of military evaluation.
Authors’ Note: Please address correspondence to Andrew Paul Williams, Constant Hall, Suite 2084, Norfolk, VA
23529; (757) 577-2921; e-mail: awill123@odu.edu. The views expressed in this article are the views of the authors
and do not necessarily represent the views of NATO.
Williams, Morris / The Development of Theory-Driven Evaluation in the Military 63
Historically, most military evaluations were carried out as simple performance monitoring
known as Battle Damage Assessment, which involved collecting data on ordnance expenditure
rates, targets hit rates, target damage assessment, casualty rates, and area of ground captured
(Diehl & Sloan, 2005). There was generally no systematic effort to measure outcomes or
impacts in a comprehensive manner. Instead, it fell to a few key individuals in leadership
positions to combine all the information received from various sources and ‘‘construct’’ an
evaluation of progress and results, which would subsequently inform further plan
refinement and operational management (Curry, 2004; Rauch, 2002).1 In recent years the
concept of an ‘‘effects-based approach to operations’’ (EBAO) has gained prominence in military
theory. In essence, EBAO is a theory-based construct that, among other aspects, calls for the
explicit measurement of task or ‘‘action’’ accomplishment and result or ‘‘effect’’ achievement.
Certain militaries, especially those of the United States, the United Kingdom, and the North
Atlantic Treaty Organisation (NATO), have spurred the development of EBAO and initiated
operational use to some extent. The part of EBAO that is concerned with the measurement of
program implementation, progress toward outcomes, and creation of impacts came to be
known as ‘‘effects-based assessment’’ (EBA). As the military community is still undecided
on the terminology, with similar terms such as ‘‘campaign assessment’’ and ‘‘engagement
space assessment’’ in use, for simplicity we shall refer to EBA simply as ‘‘Assessment’’ from
hereon.2
In general, the military has endorsed this new addition to its toolbox of operations manage-
ment resources; however, the techniques have been swiftly adopted without detailed theoretical
or philosophical examination. The current reality is that military commanders on the ground are
using results from Assessment. The danger is that the military personnel using these methods do
not have a full appreciation of the difficulties and limitations, nor the manner in which they may
be most usefully employed. This is important, especially as military operations and programs
involve significant amounts of public money, and often entail direct loss of life.
Given the current use of Assessment by American and Allied troops and civilians in Iraq,
Afghanistan and the Balkans, it is our hope to stimulate discussion in the evaluation commu-
nity on a particularly challenging and ambitious case of program evaluation and to initiate the
cross-pollination of ideas, and most importantly—experience, which could improve the quality
and use of military evaluation. There is an important role for civilian evaluators to play in the
continued development and implementation of theory-driven evaluation in the military.
This article has four sections. First, we trace the development of and describe the salient
points of theory-driven evaluation in the civilian domain. Second, we examine theory-
driven thinking in military planning and evaluation, up to the current state of development
in NATO. Third, we describe Assessment in detail, showing the equivalence to Chen’s
(1990) evaluation work. We conclude with an evaluation perspective, considering the
strengths and weaknesses of a theory-driven approach in the military context.
Theory-Driven Evaluation
As noted, the new military operations management construct of EBAO and its evaluation
counterpart, Assessment, are theory-based concepts. The use of theory as a basis for evaluation
has a long history in the civilian domain, although the idea only gained prominence in the
1980s; before then, the evaluation literature was rarely concerned with incorporating
theory into evaluation processes (Chen, 1990; Gargani, 2003). A review by Lipsey, Crosse,
Dunkle, Pollard, and Stobart (1985) of 175 evaluation studies noted that ‘‘most of the
programs. . . evaluated did not provide simple structured treatments that could be expected
to work through readily apparent mechanisms’’ (p. 20). In fact, one of the main reasons for the
growing interest in theory-driven evaluation was ‘‘the usual inability of even the most sophis-
ticated experimental evaluations to explain what factors were responsible for the program’s
success—or failure’’ (Weiss, 1997a, p. 502).
Although some evaluators argued that the focus of any evaluation should be on the quality,
value, and success of any implemented program (Scriven, 1994), theorists recognized that
failing to identify the underlying causal mechanisms that led to program success would
not allow deficiencies in either program implementation or program design to be suitably
identified and addressed (Chen, 1994). Thus, program improvement was seen as a key goal
in any program evaluation (Chen, 1994; Posavac & Carey, 2006; Rogers, 2000). The defining
characteristic of theory-driven evaluation is that the assumptions and mechanisms behind a
program can be expressed in a logical sequence of cause-and-effect statements—also known
as a program theory (Weiss, 1997a).
The concept of program theory was seen as key to both the successful design of programs
and their subsequent evaluation (Bickman, 1987). Chen and Rossi (1992, p. 43) described it
as ‘‘a specification of what must be done to achieve the desired goals, what other important
impacts may also be anticipated, and how these goals and impacts could be generated.’’
Program theory acts as a base upon which variables may be operationalized to design and
conduct an evaluation—resulting in many varied benefits (Bickman, 1987; Scheirer,
1987). Donaldson (2007) notes that theory-driven evaluation provides a useful tool to probe
the validity of an evaluation and can assist in identifying and controlling extraneous sources
of variance. Furthermore, it is method neutral, freeing the evaluator from methodological
constraints. The advantages are wider than solely evaluation design and practice. For exam-
ple, Chen and Rossi (1981) argue that the successful development and evaluation of program
theory contributes to advancing fundamental social science knowledge. Chen (2005) and
Sullivan and Stewart (2006) further note that program theory provides a strategy to incorpo-
rate stakeholder views in the program design and evaluation. The evaluation literature
provides many more extensive examples on the benefits of performing evaluations based
from program theory.
Even given the wide interest and acclaim in the literature, theory-driven evaluation is not
without its disadvantages and its critics. Weiss (1997a)—a leading proponent of the tech-
nique—described key challenges in theory-driven evaluation: Program theory is notoriously
difficult to construct and the level of granularity to which the evaluator must go is often uncer-
tain; multiple theories are possible—often requiring multiple sets of measurement indicators—
thus, the technique is resource and data intensive; and by focusing on the program theory, the
evaluator may ignore many other important effects and causes (see also Shaw & Crompton,
2003). Others have noted that although program theory may provide focus to an evaluation,
it provides a focal point for politicization of the evaluation (English & Kaleveld, 2003).
Finally, it has been pointed out that in reality, few social science theories actually exist for the
program areas that evaluators typically face, thus the evaluator often begins with a grounded
theory approach in developing program theory (Stufflebeam, 2001).
As the popularity of theory-driven approaches grew in the 1980s, Chen and Rossi (1981,
1983, 1987) progressively developed an increasingly detailed framework for performing
theory-driven evaluation, culminating in Chen’s (1990) seminal textbook, Theory-Driven
Evaluations,3 of which we now summarize the key concepts. In this work, Chen presented
a comprehensive framework for program evaluation, in which program theory was integrated
from start to finish. He conceptualized two main domains of program theory on which evalua-
tions were performed: Normative and causative. Normative domain theory specifies the
‘‘goals or outcomes (that) should be pursued or examined, and how the treatment should be
designed and implemented’’ (Chen, 1990, p. 43). In this sense, Chen expanded the role of
program evaluation to assist program planners in designing and planning programs by providing
‘‘the rationale and justification for the program structure and activities’’ (p. 43). Causative
domain theory aims to be empirically based and describes the ‘‘underlying causal mechanisms
that link, mediate or condition the causal relationship between treatment variables and out-
come variables in a program’’ (p. 44). Program evaluation of the causative theory is evaluation
in the ‘‘usual’’ sense—that is assessing the impacts of the program—however, how those
impacts were created is also an important area of consideration.
These two domain theories are broken down into a structured typology of theories on which
evaluations can be focused.
Program evaluators may typically begin with normative outcome evaluation, which ‘‘assist(s)
stakeholders in identifying, clarifying, or developing the goals or outcomes of a program’’
(p. 91). In fact, three types of outcome evaluations are described that see the evaluator:
Develop new goals or perform ‘‘goal revelation’’ by working closely with program stake-
holders; prioritize existing goals in the context of the program situation; and provide direction
and structure in unfocused programs with politically developed goals. This evaluation is often
an important stage of program theory development (Chen, 2005).
Once a program is underway, the remaining evaluations can take place. The normative treat-
ment evaluation seeks to determine the congruency between the planned program (the normative
treatment theory) and the actual implemented program—with the aim to gather recommenda-
tions for improvement of program operations. Especially in the case where implementation went
wrong, the treatment evaluation may be closely aligned with a normative implementation envi-
ronment evaluation which assesses environmental factors that affect program implementation.
Causative program theory evaluations are the key to theory-driven evaluations. Impact
evaluations are ‘‘usual’’ in the sense that they assess the impact of the implemented program
on the defined outcomes; however, the measurement variables will be derived from a prespe-
cified theory. By considering the causal processes between the treatment and the outcome, the
intervening mechanism evaluation expands the scope of what was once considered ‘‘usual’’
evaluation. It is this stage where the program theory is most critically tested. Finally, the gen-
eralization evaluation will consider issues of validity and generalizability to other programs,
where the focus is on broadening and generalizing the program theory. Several composite
forms are possible, such as normative treatment-impact evaluation, which focuses on specific
aspects of treatment components on impacts.
The work of Chen (1990, 1994, 2005), Rossi (1981, 1983, 1987, 1992), Bickman (1987, 2000),
Donaldson (2007), Weiss (1997a, 1997b), and others has brought the evaluation community a
rigorous and comprehensive approach to conducting theory-driven evaluation that facilitates
incorporation of the needs of program planners, decision makers, stakeholders, and evaluators.
It encourages evaluators to have technical and methodological expertise; moreover, the incor-
poration of program theory encourages, and indeed requires, the evaluator to have expert and
substantive knowledge of the program itself—a fact that is critical in the military situations to
be described.
the central idea of EBAO is the use of different instruments to create effects that alter the beha-
viour and capabilities of different actors in the engagement space to achieve our objectives and
end-state. Therefore, EBAO requires a clear understanding of these different instruments and of
the nature of the different systems we seek to influence. (p. 4)
We see that the requirement for a comprehensive systemic understanding of the operational
environment is recognized. Again, this concept is not new, but what differentiates EBAO from
other operations management techniques is that systemic understanding—the development of
a theory of the operational environment—is a foundation for the whole of planning, manage-
ment, and assessment of operations.
confirm or refute the theory. This process of Assessment is fundamental to the concept of a
theory-driven evaluation. By persistently assessing the congruency between theoretical mod-
els (plans) to reality (results) the military commander is given a powerful tool to determine the
success of the implementation of the plan, the impact of the planned actions, the accuracy of
his or her situational understanding, and the understanding of the causative factors in the
environment.
Assessment
For the majority of readers, who are unlikely to be familiar with the military system of
Assessment, we take the liberty of elaborating on some basics. The reader should note
that we describe the ‘‘theory’’ of EBAO and Assessment, as intended by the developers. There
is some difference in the way in which it is currently practiced, which we allude to several
times in the remaining discussions.
The primary purpose of Assessment is to increase the effectiveness of the execution of mil-
itary operations. By continually monitoring and analyzing the implementation of actions and
accomplishment of effects, the intention of Assessment is to guide the operational commander
in making informed adjustments to the plan being executed. Assessment aims to provide a
validation of causality in the plan design by confirming that the actions performed are indeed
creating the desired effects, and to improve understanding of the workings of the operational
environment. Assessment also plays an important role in providing situational awareness rela-
tive to the plan.
Although the developers of Assessment call for an independent evaluation unit in the
Headquarters (HQ) structure, in practice to date the Assessment staff is usually a team of
military officers and civilian analysts drawn from other areas of the HQ organization; a few
nations and NATO do maintain dedicated Assessment job billets, but these remain rare. Generally,
the civilian analysts are specialists with specific training in analysis techniques, although the
authors have not yet encountered any that have awareness of the ‘‘civilian’’ evaluation field—
a fact that is quite surprising, given the extent to which evaluation theory and technique
have developed in the civilian domain. The military analysts are not specialists and rely
a great deal on their civilian counterparts for the actual analysis. NATO Assessment literature
(NATO, 2006a, 2007a, 2007b) notes that the Assessment staff should work closely with the
planning staff and intelligence and systems analysis staff, who are responsible for creating
‘‘models’’ of the operational environment; however, it is the authors’ impression that this
formal relationship is not yet fully enacted in current HQ, mainly due to the infancy of systems
analysis capabilities.5
Assessment is based around three distinct areas: Assessment design, data collection and
analysis, and assessment reporting. We will cover each area in turn, demonstrating the connec-
tions with Chen’s (1990) theory-driven evaluation work. The reader is invited to use Table 1 as
a guide to understand the connections between military and civilian evaluation terminology.
The focus will be on assessment design as the issues discussed in this article relate principally
to this stage.
Table 1
Mapping Between Civilian and Military Evaluation Terminology
Civilian (Chen, 1990) Military (NATO)
Notes: MOE ¼ measure of effectiveness; MOP ¼ measure of performance; NATO ¼ North Atlantic Treaty
Organisation.
begins by breaking down the key elements of the operational environment into categories,
usually political, military, economic, social, information and infrastructure. Within each
category, the analysis may focus on: Political institutions, parties and personalities; public
administration institutions and personalities; military force composition, dispositions and per-
sonnel; economic centers and markets; social actors and institutions; media and communications
institutions, and national infrastructure. Relationships are then identified between these consti-
tuents and positive and negative influences are incorporated, where known. NATO and certain
nations, notably the United States and Germany, are currently investigating the use of computer
modeling tools involving influence diagrams and system dynamics to perform systems analysis.
This essentially is the first stage of a descriptive program theory: That is, a description of
the current state of affairs as it is believed to exist, based on the information available at the
time. This descriptive program theory is the starting point for the creation of the effects-based
plan. The plan is designed principally by the operational commander and his planning staff,
with the advice of the intelligence and systemic analysis staff. Usually, plan development is
top-down—the planning staff take the strategic-level or overall mission goals or ‘‘end-state’’
from senior political and military leaders and cascade the outcome variables, that is effects,
from these top level goals. There is an immediate and explicit assumption that the creation
of these effects in the operational environment is causally linked to the achievement of the
end-state.6 As indicated in Table 1, the effects part of the plan can be considered as normative
outcome domain theory.
Effects are derived from, in principle, detailed analysis of the descriptive program theory,
from which key elements and relationships existing in the operational environment are
elicited. Effect statements are outcome variables that state the commander’s intended view
on what these key elements and relationships should be. Effect statements may also include
impacts that detail the expected change. Typically, an effects-based plan contains hierarchies
of effects and subeffects, depending on the complexity of the operational environment and
intended mission. The lower level subeffects are linked to actions, thus providing the impor-
tant intervening mechanism program theory. Although the process of creating this stage
point is important and provides the distinctive difference between program theory and logic
models, as noted in Chen (2005, p. 36). The ultimate goal of Assessment is not just to measure
progress in implementation of the program and progress in achievement of outcomes, but to
find the causative factors and why the program is or is not successful.
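The hierarchical plan structure described above (an end-state cascading to effects, subeffects, and linked actions) can be sketched as a simple tree. This is purely our illustration; the class and field names are assumptions, not NATO terminology:

```python
# Hypothetical sketch of an effects-based plan's structure: an end-state
# cascades to effects and subeffects, with actions linked at the lowest
# level. All names here are our own illustration, not doctrine.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str  # a concrete task, whose implementation is measured by MOP

@dataclass
class Effect:
    statement: str                                  # intended outcome, measured by MOE
    subeffects: list = field(default_factory=list)  # lower-level Effects
    actions: list = field(default_factory=list)     # Actions linked to this effect

# A toy two-level hierarchy: the top-level effect is supported by a
# subeffect, which in turn is linked to a concrete action.
plan = Effect(
    "Population feels secure",
    subeffects=[Effect("Militant activity reduced",
                       actions=[Action("Conduct infantry patrols")])],
)
```

The links from subeffects to actions are what carry the intervening mechanism program theory: each edge in the tree encodes a causal assumption that Assessment later tests.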
In addition to the above causative assumptions is the consideration of prioritization of actions
and effects. Although EBAO literature does not specifically mention the concept of ‘‘weight-
ing,’’ the practical reality is that some effects will be more important than others. Recent devel-
opments in Assessment theory call for the explicit relative weighting of effects and their
corresponding MOE (NATO, 2007c). These weightings can be derived from the foundational
descriptive program theory, although some subjective judgment of importance is also likely.
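The relative weighting of effects amounts to a weighted aggregation of MOE scores. A minimal sketch, under our own assumptions (the function name, the example weights, and the scores are all illustrative, not values from NATO doctrine):

```python
# Illustrative sketch (our construction, not NATO doctrine): combining the
# MOE scores of several effects into a single weighted progress figure.

def weighted_progress(effects):
    """effects: list of (weight, moe_score) pairs, with moe_score in [0, 1]."""
    total_weight = sum(w for w, _ in effects)
    return sum(w * s for w, s in effects) / total_weight

# A commander might weight one effect (3) above another (1); the
# aggregate then leans toward the heavily weighted effect.
score = weighted_progress([(3, 0.6), (1, 0.9)])  # roughly 0.675
```

In practice the weights themselves would come from the descriptive program theory, with the subjective judgments of importance that the text notes.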
Although the comparison between military operations and public service–orientated programs
has been made in passing, it is worth making it explicit at this stage, especially as the reader
might wonder how the activities of military forces can be related to public programs. The initial
attack on Afghanistan—Operation Enduring Freedom—was certainly a typical military operation
involving a significant air and ground campaign. However, if one considers the past 7 years in
Afghanistan and the operations of military-led provincial reconstruction teams which exist essen-
tially to provide basic public service programs, the comparison is more valid (Lane & Sky, 2006;
Maley, 2007). Furthermore, current and past military plans explicitly call for actions in nonmili-
tary domains, such as provision of public services, health services, reconstruction activities and
development of democratic governance (NATO/ISAF, 2006b).8
(Chen, 1990, p. 144, 167). Thus, the analysis gives an indicator of progress toward achieving
the overall mission goals. Following the MOP example above, MOE analysis would seek to
confirm: That the air bombardments did indeed reduce militant activity in the targeted areas;
that the infantry patrols increased townspeople’s feeling of security; that the information given
to locals prevented them from picking up old mines or ordnance; and that the infant mortality
rate declined overall as a result of better access to medical care.
The key component of the Assessment process is the intervening mechanism evaluation.
Intervening mechanism evaluations identify the intervening and contextual factors through
which the treatment affects the outcome, discovering the causal factors underlying program
results (Chen, 1990, p. 191). The first stage of this process is to determine the correlation
between MOP and MOE data. For example, if the MOE criteria for a certain effect were
not met (meaning the effect was not achieved), yet the MOP criteria were met for the actions
linking to the effect (meaning that the actions were implemented correctly), this implies that
the intervening mechanism theory is incorrect. Should the situation be reversed and the data
analysis indicates that the actions were not implemented correctly yet the effect was achieved,
this once again demonstrates an issue with the program theory, even though beneficial effects
were still created. The second stage involves determining the correlation between MOE data
for linked effects, that is, the intervening mechanisms that precede the top-level effects. In fact,
although not explicitly labeled as such, MOE are often intervening mechanisms in themselves.
For example, if the end effect was feeling of security, a possible MOE would involve simply
questioning locals about their perception of security. However, it is unlikely this MOE would
accurately reflect the true perception of security, especially if the survey questions were
not anchored to any baseline. To accurately determine this effect, it may be necessary to gauge
people’s perceptions of local police performance, and crime and criminal prosecution rates,
rather than attributing security only to the military patrols because security is inherently linked
to other nonmilitary factors.
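The first-stage MOP–MOE comparison described above is essentially a two-by-two decision table. It can be sketched as follows (the function name and the verdict labels are ours, purely illustrative):

```python
# A minimal two-by-two decision table for the first stage of intervening
# mechanism analysis: actions implemented (MOP met) versus effect
# achieved (MOE met). Labels are our own shorthand, not doctrine.

def congruency(mop_met, moe_met):
    if mop_met and moe_met:
        return "theory supported: actions implemented and effect achieved"
    if mop_met and not moe_met:
        return "theory suspect: actions implemented but effect not achieved"
    if not mop_met and moe_met:
        return "theory suspect: effect achieved without the planned actions"
    return "implementation failure: fix the treatment before judging the theory"
```

Note that three of the four cells point at the program theory or the implementation, which is why this comparison, rather than outcome measurement alone, drives the closer inspection the text goes on to describe.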
The Assessment staff must determine the issues behind the program theory and equally, the
issues in the implemented program. The intervening mechanism analysis reveals areas where
closer inspection is required. For example, supposing security in an Afghan town
was perceived negatively, even though the security patrols were performed as planned, the
MOP-MOE correlation analysis would reveal this discrepancy. Intervening mechanism analysis
reveals that local townspeople are troubled by seeing overt military displays as the country’s
long history of conflict is still fresh in many minds. The Assessment staff, on interviewing the
patrols’ leaders may discover that instead of patrolling on foot, wearing light armor, and stopping
to talk to locals or hand out candy and water bottles, most patrols were implemented with soldiers
wearing full combat gear driving very fast through town in armored vehicles (NATO, 2007c).
Another vitally important role of theory-based plans and evaluation in EBAO is the iden-
tification of undesired effects—the above case being a very pertinent example. Effects-based
plans explicitly identify possible undesired consequences of planned actions, and Assessment
staff may explicitly seek to measure the ‘‘progress’’ toward undesired effects.
Assessment Reporting
The Assessment staff is responsible for providing the results of progress to the military
commander and the detailed theory-driven evaluations to the planning and systemic analysis
staff. Depending on the nature of the operations, regular cycles of reporting are established,
with the leadership receiving monthly, quarterly, or even 6-monthly progress updates. Currently, the
reporting methods are simplistic and based almost entirely on ‘‘traffic-light’’ charts that report
the progress of each effect as red, yellow, or green, depending on the effect’s MOE data and
the thresholds of success.
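The traffic-light rule can be sketched as a simple thresholding function. The threshold values here are our assumptions for illustration; actual thresholds of success would be set per effect in the Assessment design:

```python
# Sketch of the "traffic-light" reporting rule described above; the
# default threshold values are illustrative assumptions, not doctrine.

def traffic_light(moe_score, green=0.7, yellow=0.4):
    """Map an aggregated MOE score in [0, 1] to a report color."""
    if moe_score >= green:
        return "green"
    if moe_score >= yellow:
        return "yellow"
    return "red"
```

The simplicity of this mapping is exactly the concern raised in the text: a single color collapses the underlying MOE data and hides why an effect is or is not being achieved.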
A particular challenge in the implementation of Assessment lies in the scope of the
evaluation. The current International Security Assistance Force (ISAF) military operation in
Afghanistan consists of a multinational HQ in Kabul with five Regional Commands, totaling
about 53,000 personnel (‘‘ISAF mission,’’ 2008), in addition to an air task force, forward
support bases, and several nationally controlled provisional reconstruction teams. The NATO
component of the mission is overseen at the strategic level of command by Allied Command
Operations in Belgium. The tactical, operational, and strategic command levels all essentially
govern and implement interdependent programs of varying complexity and scope that feed
into the overall ISAF mission.
Assessment literature calls for separate evaluations at each level of command, with the
results from a subordinate level aggregating in quantitative and qualitative ways, in addition
with other political or external guidance, to contribute as part of the inputs to the higher
command. In practice in ISAF, only the operational and strategic levels actually conduct
Assessment.9 However, the issue of how individual missions and the overall campaign are
assessed is important, as missions in certain regions may be highly successful, while other
regions may be facing difficulty or failing dramatically. Currently, this problem is solved
simply by staff coordination in the command structure through regular Assessment working
groups with multilevel, multiorganizational representation. Evaluations from each regional
mission are briefed, with the operational command producing an overall evaluation. The
strategic-level commander may be briefed once or twice per year on the ‘‘campaign’’ evalua-
tion, or more regularly if the situation requires.10
Certainly, the most difficult aspect of the reporting process is ensuring utilization. The
Assessment staff will probably gain a deeper understanding of the implemented operations and
their causal mechanisms than anyone else on the staff (Weiss, 1997a). The importance of interstaff
collaboration has been noted in several studies on EBAO methodology (NATO, 2006b, 2007d,
2008a, 2008b; Swedish Armed Forces HQ, 2007). Therefore, it is important that the Assess-
ment staff work with their counterparts in plans, intelligence, and systemic analysis to ensure
that the results are properly fed back into the operational planning and management process.
Although the theory of EBAO notes that the direction of the operation should be primarily
informed by the Assessment process, in practice, the military commander has many other outside
influences affecting decision making. On a positive note, however, given the traditionally
process-orientated nature of the military, once an Assessment process is accepted and written
into doctrine, utilization may be easier to ensure than in civilian evaluations of public programs.
An Evaluation Perspective
We have demonstrated that EBAO is a theory-driven construct: An effects-based plan is
equivalent to program theory and Assessment is equivalent to theory-driven evaluation.
Although the advantages and disadvantages of theory-driven evaluation have been well docu-
mented in the literature (see for example, Chen, 1990; Weiss, 1997b), we now briefly consider
them in the context of the military setting.
situation: ‘‘God, I miss the Cold War!’’ The intention was not to understate the severity of that
period of history, but to highlight the fact that focusing solely on a single common enemy was
in many ways simpler than dealing with today’s complex world of stateless terrorist organiza-
tions, multipolar international politics and the wide spectrum of Western military missions.
Military missions are focusing increasingly on complex stabilization and reconstruction
operations which are typically overlaid with antiinsurgency campaigns and security opera-
tions. The advantage of using a program theory approach to planning and evaluation is that
military leadership is encouraged to think about the complex interrelationships that exist in
the operational environment. Although it has been cynically noted that any good military
commander would do this anyway, the benefit of EBAO is that development of theory and
systemic thinking are made explicit in the process. Furthermore, the advantage of using
Assessment at all stages of planning and implementation is that judgments on progress are
based on rational thinking (as opposed to guessing) and are aimed at testing and validating the
planning staff’s estimate of the complex interrelationships.
As military forces are involved in more complex operations with a variety of international
actors, there is an obvious necessity for all the actions of these actors to be synergistic and
complementary. This concept of policy coherence, as it is known in the field of humanitarian
development, has become a necessity for military leaders to consider (see for example,
Clements, Chianca, & Sasaki, 2008; OECD, 2003; Picciotto, 2005, 2007). One noted advan-
tage for the military of theory-driven planning and evaluation methods is that the majority of
international development agencies use theory-based techniques, thus facilitating their integration with military activities and improving military-nonmilitary collaboration in the field
(NATO, 2007d, 2008b). Many militaries are investigating becoming more closely aligned with
the Development Assistance Committee’s evaluation methods as part of the incorporation of
EBAO into doctrine.
A key issue in evaluation is always the question of utilization. The military organizational
system is founded upon strong command and control relationships and hierarchy. Although
more flexible command and control structures have been the subject of much study (e.g.,
Alberts & Hayes, 2003), especially for combat situations, the fact remains that in terms of
planning and operational design, established doctrine and bureaucracy still dominate. This
is a feature that can be exploited by military evaluators: As Assessment theory and processes
become doctrinal, Assessment staffs will find that their military customers expect their
products, because the process is incorporated in planning handbooks and taught in classrooms.
Assessment staff, in addition to providing operationally relevant information to aid plan
refinement through a mechanism that is actually specified on paper, will also provide new
information for more traditional management purposes previously absent from the military,
such as: motivation of staff; celebration of progress; budgeting for resource allocation;
learning for future operations; and ensuring accountability (Behn, 2003).
effort to manage ‘‘hyperrationalism’’ (Weiss, 1997a) and ensure that scientific objectivity is
labeled where deserved, and that subjectivity and error are highlighted where necessary. If
program theory and evaluation are to be effective tools, their benefits and limitations alike
must be understood.
A typical problem encountered in planning for military operations is that the scope of planned ''programs'' is generally very broad, involving and affecting a wide variety of actors.
For large operations in Iraq and Afghanistan, where the operational scope is national, the
program theory used to develop the plan has to be very broad and holistic—a very ambitious
undertaking for any planner. The planner faces a difficult task in choosing the level of
granularity of the program theory, let alone selecting a theory from the many possible
alternatives. A similar
problem is thus faced by the Assessment staff in deciding the indicators of progress. These
concerns reveal the general situations in which theory-based methods are best used. Given the
complexity of military programs, using the results from Assessment as a primary decision
driver may be unrealistic. It is far more realistic to expect that Assessment, in the case of rap-
idly evolving offensive military conflicts, will be conducted as a postoperational activity for
the purposes of review and capturing lessons learned. However, in slow-moving, humanitarian,
reconstruction and peace-support operations, more time will be allowed for the application of
theory-based methods.
Although strong command and control may allow for improved utilization in comparison to
civilian settings, it may also limit the relevance of the results being presented. Current
military culture is not yet suited to theoretical concepts in planning: Military officers are
trained to collect data, analyze options, decide, and move on. The point
of Assessment is that postulated program theory (i.e., the effects-based plan) is a hypothesis
to be tested—and then refined. There must be a certain amount of iteration involved. Current
military thinking requires that several ''courses-of-action'' be analyzed in planning, from
which the commander selects one. A decision made by a senior staff officer is usually binding
to some extent. There is a danger that Assessment results will be constrained to inform only
within the framework of the existing or chosen course-of-action, rather than allow the creation
of a whole new one.
Conclusion
EBAO has brought many beneficial concepts to the military’s way of operating. The use of
program-theory-driven planning: Reinforces the necessity to think holistically about causal
mechanisms from treatment to outcome; increases the consideration of actors, events and their
relationships outside the traditional military domain; and allows Assessment models and
continuous monitoring of progress to be rationally derived from theory. The practice of
program-theory-driven Assessment: Provides a holistic evaluation of operational progress;
allows the identification of inappropriate assumptions underlying a plan; facilitates the
identification of unintended positive and negative effects; and allows improved use as
Assessment results contribute to the development and refinement of theory-based plans—which
constitute ''fundamental'' social science knowledge in military programs. As noted by Bickman (2000),
‘‘the strength of program theory depends on substantive knowledge in the field’’ (p. 112). Any
evaluator knows that the process of developing program theory and performing an evaluation
often leaves them with a better understanding of the program than the original planners had. It
is this improvement in foundational social science theory and knowledge that we hope increased
Assessment activities can bring to military operations.
Notes
1. This thinking was central to Secretary McNamara's widely criticized introduction of the Planning Programming
Budgeting System (PPBS) in the US Department of Defense (Schick, 1969; Wildavsky, 1966, 1969), and its application
in the Vietnam ''body count'' progress indicators was cited as one of the major causes of failure in the
war (Gartner & Myers, 1995; Perrin, 1998). In addition to the rational-model assumptions necessary for PPBS, many
were offended by the crassness of the ‘‘body count’’ statistics regularly touted by the Department of Defense. We
thank an anonymous reviewer for this point.
2. It should be noted that the processes and terminology described in this article are not yet formal policy or doctrine
of North Atlantic Treaty Organisation (NATO) forces; however, at the time of writing, several studies examining
the effects-based approach to operations (EBAO) in NATO are ongoing. EBAO's future is more troubled in the United
States: A recent and very unpopular decision by General Mattis, US Joint Forces Command, specifically orders all
development of program theory concepts, including effects-based operations, to cease (Mattis, 2008). At the time
of writing, within NATO, EBAO concepts are still in use—to varying extents—in Afghanistan and the Balkans.
3. An anonymous reviewer noted that there is a subtle distinction between theory-based and theory-driven evalua-
tion. From the large amount of literature reviewed in preparation of this article, the authors confidently conclude that
in the majority of journal articles on theory-based/-driven evaluation, these terms are used interchangeably. However,
some do explicitly make a distinction. For example, Gargani (2003) argues that theory-based applies to the general
application of theory to evaluation, whereas theory-driven corresponds to Chen's (1990) six-stage model of evaluation.
In this article, we choose theory-driven to highlight the intended close connection with Chen’s model. We suggest,
however, that this may be a valuable discussion topic for the ''Dialogue'' section of AJE.
4. Where previously 10 bunker-busting bombs, requiring several aircraft, would have been planned to completely destroy
an air defense station, the air operations staff soon realized that the effect of a single bomb was to cause the Iraqi
operators to immediately shut down that station to prevent detection—thus, the same effect was created: deny use of
enemy air defense. However, the military ''footprint'' in strike craft and supporting logistics had been significantly
reduced.
5. Most military operational headquarters are organized by functional branches such as Plans, Intelligence,
Logistics, Engineering, Communications, etc. The study of program theory planning methods has resulted in the concept
of system of systems analysis, in which specialized analysts create detailed models of the operational environment
(i.e., prescriptive program theory), using techniques based on system concepts such as social networking, influence
diagrams, and network analysis. In experimental trials in the Balkans and Afghanistan, the systems analysis staff has
been part of the Plans branch, although there is some debate about whether they should fall under Intelligence.
6. A debate is currently ongoing in the NATO Assessment community about whether the high-level end states should
have their own indicators to determine whether achieving all the effects did indeed achieve the end state.
7. The reader will observe that the references to these studies mention ''experiments.'' In the original article
submission we included this word in the text—attracting great interest from an anonymous reviewer. Some clarification
is needed: Although NATO and the United States have tested the concepts of Assessment and EBAO in fairly large-scale,
controlled environments, these were generally one-group, single-trial events rather than experiments in the formal
sense. Unfortunately, the military community is not so rigorous with the term ''experiment.''
8. There are ongoing political debates about whether or not NATO, as a military alliance, should engage in pro-
grams outside of the military domain.
9. The approach described here is an approximation to reality. The full process and organizational structure cannot
be described due to security restrictions.
10. It is the authors' judgment that performing a campaign evaluation by simple amalgamation of results from
individual missions is an unsatisfactory solution, partially because of the lack of comprehensive and integrative
program theory from strategic to operational levels. Individual treatment evaluations (which can be amalgamated)
at the mission level are appropriate to capture the success of implemented programs; however, the impact and
intervening mechanism evaluations should be comprehensive across the entire campaign and closely aligned with
overall program theory.
11. The authors conducted an extensive literature search and found no reference to any civilian evaluation in
military Assessment literature.
References
Alberts, D., & Hayes, R. (2003). Power to the edge—command and control in the information age. Washington, DC:
Command and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/
Alberts_Power.pdf.
Alberts, D., & Hayes, R. (2007). Planning for complex endeavors. Washington, DC: Command and Control Research
Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/Alberts_Planning.pdf.
Bärtl, M. (2007, March). Practices, challenges, and issues to resolve in campaign assessment—operational HQ
experiences from ISAF. Paper presented at NATO Allied Command Transformation Operational Analysis Confer-
ence, Norfolk, VA.
Behn, R. (2003). Why measure performance? Different purposes require different measures. Public Administration
Review, 63, 586-606.
Bickman, L. (1987). The functions of program theory. In L. Bickman (Ed.), Using program theory in evaluation
(pp. 5-19). New Directions for Program Evaluation, Vol. 33. San Francisco, CA: Jossey-Bass.
Bickman, L. (2000). Summing up program theory. In L. Bickman (Ed.), Program theory in evaluation: Challenges
and opportunities (pp. 103-112). New Directions for Evaluation, Vol. 87. San Francisco, CA: Jossey-Bass.
Chen, H.-T. (1990). Theory-driven evaluations. Newbury Park, CA: Sage.
Chen, H.-T. (1994). Theory-driven evaluations: Needs, difficulties, and options. Evaluation Practice, 15, 79-82.
Chen, H.-T. (2005). Practical program evaluation. Assessing and improving planning, implementation, and effective-
ness. Thousand Oaks, CA: Sage.
Chen, H.-T., & Rossi, P. H. (1981). The multi-goal, theory-driven approach to evaluation: A model linking basic and
applied social science. In H. E. Freeman & M. A. Solomon (Eds.), Evaluation Studies Review Annual, Vol. 6.
Beverly Hills, CA: Sage.
Chen, H.-T., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7,
283-302.
Chen, H.-T., & Rossi, P. H. (1987). The theory-driven approach to validity. Evaluation and Program Planning, 10,
95-103.
Chen, H.-T., & Rossi, P. H. (1992). Using theory to improve program and policy evaluations. New York, NY: Green-
wood Press.
Clausewitz, C. von. (1968). On war (J. J. Graham, Trans.). London, England: Penguin. (Original work published
1832).
Clements, P., Chianca, T., & Sasaki, R. (2008). Reducing world poverty by improving evaluation of development aid.
American Journal of Evaluation, 29, 195-214.
Curry, H. (2004). The current battle damage assessment paradigm is obsolete. Air and Space Power Journal, 18, 13-17.
Deptula, D. (2001a). Effects-based operations: A change in the nature of warfare. Arlington, VA: Defence and
Airpower Series, Aerospace Education Foundation. Retrieved December 14, 2008, from http://www.airforce-
magazine.com/MagazineArchive/Documents/2001/April%202001/0401effects.pdf.
Deptula, D. (2001b). Firing for effects. Air Force, 84, 45-53. Retrieved September 10, 2008, from http://www.afa.org/
magazine/April2001/0401effects.pdf.
Diehl, J. E., & Sloan, C. E. (2005). Battle damage assessment: The ground truth. Joint Force Quarterly, 37, 59-64.
Donaldson, S. I. (2007). Program theory-driven evaluation science: Strategies and applications. New York, NY: Tay-
lor and Francis.
Drucker, P. F. (1954). The practice of management. New York, NY: Harper and Row.
Defence Science and Technology Laboratory. (2005). Code of best practice for the use of measures of effectiveness
(MOE) (Report No. Dstl/CR14304v1.1). UK Ministry of Defence.
English, B., & Kaleveld, L. (2003). The politics of program logic. Evaluation Journal of Australasia, 3, 35-42.
Evans, D. (2003). Operational analysis in support of HQ ISAF, Kabul, Afghanistan, 2002. In A. Woodcock & D. Davis
(Eds.), Analysis for governance and stability (pp. 198-226). Cornwallis, Canada: The Canadian Peacekeeping Press.
Gargani, J. (2003, November). The history of theory-based evaluation: 1909 to 2003. Paper presented at the American
Evaluation Association annual conference, Reno, NV.
Gartner, S. S., & Myers, M. E. (1995). Body counts and ‘‘success’’ in the Vietnam and Korean wars. Journal of Inter-
disciplinary History, 25, 377-395.
Hopkin, A. J. (2004). Operational analysis in support of HQ MND(SE), Basrah, Iraq, 2003. In A. Woodcock & G. Rose
(Eds.), Analysis for stabilization and counter-terrorist operations. Cornwallis, Canada: The Canadian Peacekeeping
Press.
ISAF mission. (2008, July). ISAF Mirror, 49. ISAF Public Affairs Office. Retrieved September 10, 2008, from http://
www.nato.int/isaf/docu/mirror/2008/mirror_49_200807.pdf.
Jobbagy, Z. (2005). Powered flight, strategic bombing and military coercion: Study on the origins of effects-based
operations. Den Haag, The Netherlands: The Clingendael Centre for Strategic Studies. Retrieved December 14,
2008, from http://www.hcss.nl/en/publication/37/Powered-Flight,-Strategic-Bombing-and-Military-Coe.html.
Lambert, N. J. (2002). Measuring the success of the NATO operation in Bosnia and Herzegovina 1995-2000.
European Journal of Operational Research, 140, 459-481.
Lane, R., & Sky, E. (2006). The role of provincial reconstruction teams in stabilization. Royal United Services Institute
Journal, 151, 46-51.
Lipsey, M. W., Crosse, S., Dunkle, J., Pollard, J., & Stobart, G. (1985). Evaluation: The state of the art and the sorry
state of the science. In D. S. Cordray (Ed.), Utilizing prior research in evaluation planning (pp. 7-28). New Directions
for Program Evaluation, Vol. 27. San Francisco, CA: Jossey-Bass.
Maley, W. (2007). Provincial reconstruction teams in Afghanistan—how they arrived and where they are going.
NATO Review, 2007(3), NATO. Retrieved September 10, 2008, from http://www.nato.int/docu/review/2007/
issue3/english/art2.html.
Mattis, J. N. (2008). USJFCOM commander's guidance for effects-based operations. Joint Force Quarterly, 51, 105-108.
Moltke, H. von. (1993). On the art of war: Selected writings (D. J. Hughes & H. Bell, Trans.; D. J. Hughes, Ed.).
New York, NY: Random House. (Original work published 1871).
Murray, W., & Scales, R. H. (2003). The Iraq war: A military history. Cambridge, MA: Harvard University.
NATO. (2006a). MC position on an effects-based approach to operations. North Atlantic Military Committee (signed
06 June 2006). (Document No. MCM 0052-2006/IMSWM-0147-2006 SD3).
NATO. (2006b). Multinational experiment 4—NATO analysis report. H.Q. Supreme Allied Command Transforma-
tion, Norfolk, VA.
NATO. (2007a). BiSC draft pre-doctrinal EBAO handbook version 4.2, October 4, 2007. Bi-Strategic Command
Effects-Based Approach to Operations Working Group.
NATO. (2007b). Bi-Strategic Command Discussion Paper on EBAO. (Document No. J5PLANS/2920-036/07). H.Q.
Supreme Allied Powers Europe and H.Q. Supreme Allied Command Transformation.
NATO. (2007c). Draft effects-based assessment handbook version 1.0. H.Q. Supreme Allied Command Transforma-
tion, Norfolk, VA.
NATO. (2007d). Effects-based assessment NATO analysis report. Limited objective experiment conducted as a por-
tion of the multinational experiment series 5, 04-08 June 2007. H.Q. Supreme Allied Command Transformation,
Norfolk, VA.
NATO. (2008a). Enabler ’08 experiment report. H.Q. Supreme Allied Command Transformation, Norfolk, VA.
NATO. (2008b). Multinational experiment 5 analysis report. H. Q. Supreme Allied Command Transformation,
Norfolk, VA.
NATO/ISAF. (2006). HQ ISAF KABUL—operational plan. Kabul, Afghanistan: Author.
Neighbour, M., Bailey, P., Hawthorn, M., Lensing, C., Robson, H., Smith, S., et al. (2002). Providing operational anal-
ysis to a peace support operation: The Kosovo experience. Journal of the Operational Research Society, 53,
523-543.
Organisation for Economic Cooperation and Development. (2003, July). Policy coherence: Vital for global development. Retrieved September 10, 2008, from http://www.oecd.org/dataoecd/11/35/20202515.pdf.
Perrin, B. (1998). Effective use and misuse of performance measurement. American Journal of Evaluation, 19,
367-379.
Picciotto, R. (2005). The evaluation of policy coherence for development. Evaluation, 11, 311-330.
Picciotto, R. (2007). The new environment for development evaluation. American Journal of Evaluation, 28, 509-521.
Posavac, E. J., & Carey, R. G. (2006). Program evaluation: Methods and case studies (7th ed.). Upper Saddle River,
NJ: Pearson.
Pressman, J. L., & Wildavsky, A. B. (1984). Implementation. Berkeley, CA: University of California.
Rauch, J. T. (2002). Assessing airpower’s effects: Capabilities and limitations of real-time battle damage assessment.
(Masters dissertation, School of Advanced Airpower, Air University, 2002). Retrieved September 10, 2008, from
http://handle.dtic.mil/100.2/ADA420587.
Rogers, P. J. (2000). Program theory: Not whether programs work but how they work. In D. L. Stufflebeam, G. F. Madaus,
& T. Kellaghan (Eds.), Evaluation models (pp. 209-232). Boston, MA: Kluwer Academic.
Rogers, P. J. (2007). Theory-based evaluation: Reflections ten years on. In S. Mathison (Ed.), Enduring issues in
evaluation: The 20th anniversary of the collaboration between NDE and AEA. New Directions for Evaluation,
Vol. 114. San Francisco, CA: Jossey-Bass.
Sartorius, R. H. (1991). The logical framework approach to project design and management. American Journal of Eva-
luation, 12, 139-147.
Scheirer, M. A. (1987). Program theory and implementation theory: Implications for evaluators. In L. Bickman (Ed.),
Using program theory in evaluation (pp. 59-76). New Directions for Program Evaluation, Vol. 33. San Francisco, CA:
Jossey-Bass.
Schick, A. (1969). System politics and systems budgeting. Public Administration Review, 29, 137-151.
Scriven, M. (1994). The fine line between evaluation and explanation. Evaluation Practice, 15, 75-77.
Shaw, I., & Crompton, A. (2003). Theory, like mist on spectacles, obscures vision. Evaluation, 9, 192-204.
Smith, E. (2003). Effects-based operations: Applying network centric warfare in peace, crisis and war. Washington,
DC: Command and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/
files/Smith_EBO.PDF.
Smith, E. (2006). Complexity, networking, and effects-based approaches to operations. Washington, DC: Command
and Control Research Program. Retrieved September 10, 2008, from http://www.dodccrp.org/files/Smith_
Complexity.pdf.
Stufflebeam, D. L. (Ed.). (2001). Evaluation models. New Directions for Evaluation, Vol. 89. San Francisco, CA: Jossey-Bass.
Sullivan, H., & Stewart, M. (2006). Who owns the theory of change? Evaluation, 12, 179-199.
Swedish Armed Forces Headquarters. (2007). Swedish EBAO development after the autumn experiment 2006.
Enkoping, Sweden: Swedish Armed Forces.
UNESCO. (2007). Results-based programming, management and monitoring (RBM) at UNESCO: Guiding principles.
Bureau of Strategic Planning, UNESCO, Paris, France. Retrieved September 10, 2008, from http://portal.unesco.org/fr/files/40194/11924654571BSP_RBM_guiding_principles_October_2007__2_.pdf/BSP+RBM+guiding+principles+October+2007+_2_.pdf.
Weiss, C. H. (1997a). How can theory-based evaluations make greater headway? Evaluation Review, 21, 501-524.
Weiss, C. H. (1997b). Theory-based evaluation: Past, present and future. In D. J. Rog & D. Fournier (Eds.), Progress
and future directions in evaluation: Perspectives on theory, practice, and methods. New Directions for Evaluation,
Vol. 76. San Francisco, CA: Jossey-Bass.
Wildavsky, A. B. (1966). The political economy of efficiency: Cost-benefit analysis, systems analysis, and program
budgeting. Public Administration Review, 26, 292-310.
Wildavsky, A. B. (1969). Rescuing policy analysis from PPBS. Public Administration Review, 29, 189-202.