
Article

The Mechanistic Rewards of Data and Theory Integration for Theory-Based Evaluation

Corrado Matta1, Jannika Lindvall2, and Andreas Ryve2,3

American Journal of Evaluation, 1-23
© The Author(s) 2023
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/10982140221122764
journals.sagepub.com/home/aje

1 Linnaeus University, Växjö, Sweden
2 Mälardalen University, Västerås, Sweden
3 Østfold University College, Halden, Norway

Corresponding Author:
Corrado Matta, Department of Pedagogy and Learning, Linnaeus University, Universitetsplatsen 1, 351 95 Växjö, Sweden.
Email: corrado.matta@lnu.se

Abstract
In this article, we discuss the methodological implications of data and theory integration for Theory-
Based Evaluation (TBE). TBE is a family of approaches to program evaluation that use program the-
ories as instruments to answer questions about whether, how, and why a program works. Some of
the groundwork about TBE has expressed the idea that a proper program theory should specify the
intervening mechanisms underlying the program outcome. In the present article, we discuss in what
way data and theory integration can help evaluators in constructing and refining mechanistic pro-
gram theories. The paper argues that a mechanism is both a network of entities and activities
and a network of counterfactual relations. Furthermore, we argue that although data integration
typically provides information about different parts of a program, it is the integration of theory
that provides the most important mechanistic insights.

Keywords
data integration, theory integration, mechanism, theory-based evaluation, process tracing

Introduction
In this article, we discuss the methodological implications of data and theory integration for
Theory-Based Evaluation (TBE). We take as a point of departure the interest—expressed in some
of the main groundwork for TBE—in investigating the intervening mechanisms that contribute to
a program outcome. We discuss the claims concerning mechanisms in TBE and whether these
claims can justify methodological choices connected to data and theory integration. In addition,
we concretize our discussion with a case: a research project conceptualizing and evaluating a
national-scale professional development (PD) program for mathematics teachers in Sweden.
By delving into the complex methodological details of our case, we investigate what kinds of
mechanisms are studied in our examples and how different forms of integration contribute to the
construction and refinement of a mechanistic program theory, that is, a theory that accounts for how a
program generates its target outcome.
In the paper, we argue for three main claims. First, researchers in the field of TBE should conceive
mechanisms as both networks of counterfactual relations and networks of theoretically specified enti-
ties and activities. Secondly, assuming this concept of mechanisms, it emerges from the analysis of
our cases that data integration provides useful information about the parts of a program—that is, the
activities and materials that constitute the program’s content—but provides no further contribution to
the construction and refinement of a mechanistic program theory. Thirdly, the integration of theories
seems to be the main driver of mechanistic theorizing, and the use of theoretical resources seems to be
pervasive in the process of the construction and refinement of a mechanistic program theory.
The paper begins by introducing the concept of TBE and clarifying the role of program theories in
this family of approaches. We then set the stage for our discussion by showcasing how the concept of
a mechanism has been discussed in the context of TBE. In the same section, we propose a concep-
tualization of mechanisms in TBE. This conceptualization is the result of our interpretation of the
claims concerning mechanism in the TBE, but it is not a simple descriptive account of mechanistic
commitments among evaluation researchers. The next section concerns how integrative methods
have been discussed in TBE, and especially what rewards are connected to different forms of inte-
grations. In light of the discussions of mechanisms and integration in TBE, we move on to our anal-
ysis. This consists of two sections in which we discuss in what ways data integration and theory
integration can contribute to constructing and refining mechanistic program theories, that is, theories
that account for the mechanism through which a program generates its outcome. Our analysis is based
on a case of a research project evaluating a national-scale PD program for mathematics teachers in
Sweden. We conclude in the last section by summarizing our claims and discussing the contribution
of our discussion to the field of TBE.

Theory-Based Evaluation
The concept of theory-based evaluation has been around for at least three decades, as illustrated by
the historical perspectives on TBE in Weiss (1997) and Rogers and Weiss (2007). TBE is a term that
describes a family of approaches to the evaluation of interventions that have as a common denom-
inator the use of an explicit theory as an instrument for answering questions about whether a
program works, and also how and why it does (Chen, 2006; Coryn et al., 2011). In other words,
TBE describes all the approaches to evaluation that focus on the logic of an intervention, and that
use a theory to define this logic. Alongside this more common term is a longer list of labels, such as theory-driven evaluation, program-theory evaluation, theory-guided evaluation, theory-of-action, theory-of-change, program logic, or logical frameworks (Coryn et al., 2011), which, however, are not always used with the same distinct definitions (Rogers & Weiss, 2007).
Central to TBE is the use of a program theory which should function as an instrument that guides
both the design and the application of the evaluation (Coryn et al., 2011). The content and nature of
program theories vary depending on the approach to TBE. To name an example: Weiss (1997) dis-
tinguishes between implementation theories and programmatic theories. The former describes the
activities and materials that are involved in the program, together with its intermediate and final out-
comes; the latter describes the processes or mechanism through which the implementation of a
program generates its outcome (for instance, the cognitive processing of the information provided
during the program implementation). Weiss goes on to explain that Theory of Change approaches to TBE (also discussed in e.g., Funnell & Rogers, 2011; Rogers & Weiss, 2007) involve both programmatic and implementation theories. A theory of change describes, according to its proponents, the intervention as a chain of activities, processes, and intermediate outcomes. According to this approach, the intervention can be described as effective only if the activities that are described in
the theory of change were actually implemented, and the contribution of the activities described—for
instance, the cognitive processes resulting from the implementation—is accounted for in the theory
of change and is empirically supported.

Mechanisms in TBE
Back in 1989, during the earlier development of TBE, Chen, one of the main contributors to TBE in
this phase, discussed a conceptual framework for TBE (Chen, 1989). In this framework, he describes
the main conceptual dimensions of theory in TBE. Theories that are employed for evaluating inter-
ventions must belong to certain theoretical domains, Chen (1989) argues, and one of these domains is
what he calls the “intervening mechanism domain” (p. 393). According to Chen:

This domain investigates the causal processes which link the implemented treatment to outcomes (i.e., the
processes by which the treatment produces or fails to produce the desired outcome). Program treatment
usually affects program outcomes through some intervening process. An investigation of intervening
mechanisms will provide information about why a program works or does not work, and help to diagnose
the strengths and/or weakness of a program for possible improvement. (1989, p. 393)

Chen describes here a set of specific constraints on the theory that determine the possibility of
using a theory for evaluation purposes. The theories that TBE requires are causal, process-oriented,
and likely to describe the causal interaction between several processes. Another early contributor to
the mechanism-oriented discussion in TBE is Weiss (1997), who stressed the importance of knowing
the mechanism for extrapolating program outcomes to other contexts:

Knowing the mechanism that works is even more important for other sites that want to adopt the success-
ful program. It is impossible for them to replicate the entire set of materials, procedures, physical condi-
tions, and relationships that make up the program. They have to adapt it to their own participants. When
they know the essential levers, they can make the adaptation without fear of losing the key component that
makes the program effective. (Weiss, 1997, p. 511)

Although the concept of mechanism is present in the earlier contributions to TBE, it became central with the rise of realist evaluation, a specific approach to TBE, couched in the philosophy
of critical realism, first developed by Pawson and Tilley (1997). The main claim of Realist Evaluation
is that an outcome is the result of the interaction between a mechanism and a context. Pawson and
Tilley conceptualize program theories as mechanism-context-outcomes, to stress the importance of
context. The theory that realist evaluators seek to develop and test accounts for how a specific
context enables a mechanism to be effective. This form of theory is particularly fruitful for evaluation
purposes, as it specifies not only how a program intervention works, but also under which contextual
conditions—as Weiss (1997) stressed above—the effect of the program can be replicated.
We do not aim to provide an overview of the use of the mechanism concept in the TBE literature. Astbury and Leeuw (2010), Dalkin et al. (2015), and, more recently, Schmitt (2020) and Lemire et al. (2020) provide detailed overviews of this kind, showing the widespread use and the varieties of
conceptualizations of the mechanisms in TBE. These overviews identify a number of recurrent
themes concerning how mechanisms are conceptualized in the context of TBE. For instance,
Astbury and Leeuw (2010) identify the terms “hidden,” “sensitive to variations in context,” and “gen-
erate outcomes” as recurring ways of characterizing mechanisms. Schmitt (2020) provides a taxon-
omy of two types of mechanism concepts used in the evaluation literature: behavioral mechanisms
(describing changes in individual reasoning and decisions that mediate behavioral effects) and
process mechanisms (describing how different activities involved in an intervention or program
are linked together). In the same thematic volume on causal mechanisms in program evaluation,
Lemire et al. (2020) provide further types of mechanism concepts in evaluation research: program
components, psychological reactions, behavioral reactions, and contextual conditions.
These recent taxonomies are mainly descriptions of how the concept of mechanism is used in eval-
uation research and focus on what kind of components make up a mechanism.
What we observe in the literature on mechanisms in TBE is the absence of proposals for a norma-
tive standard about the nature of mechanisms. Arguably, the variety of uses of the term mechanism in
the literature about TBE depends on this lack of a shared standard. Our first contribution in this paper
is to fill this gap by proposing a normative definition of the concept of mechanism for TBE. Our pro-
posal is based on how some features of mechanisms appear in the discussion around TBE but is not
descriptive. Moreover, our proposal focuses on what defines a mechanism in TBE rather than on what constituents a mechanism should include. Our conceptualization consists of two claims:

(a) the causal pathway claim, and
(b) the entities-and-activities claim.

As we will elaborate later on, we argue that, together, these claims define how mechanisms should be
understood in terms of these features, even if some existing approaches to TBE might not cohere with
them. Our conceptualization does not settle the questions of what component types should make up a
mechanism (such as program activities, psychological or behavioral reactions, or other forms of
agency-related factors). We refer the reader to Dalkin et al. (2015) and to Lemire et al. (2020) for
a discussion of this issue.

Causal Pathway Claim


As we mentioned above, Chen (1989) conceptualizes a mechanism as a set of “causal processes,” that
is, “the processes by which the treatment produces or fails to produce the desired outcome” (p. 393).
Crucially, Chen characterizes mechanisms as “intervening,” a concept that will be clarified by Tilley
(2000) later on:

In the case of social programmes we are concerned with change. Social programmes are concerned with
effecting a change in a regularity. The initial regularity is deemed, for some reason, to be problematic. The
programme aims to alter it. A pattern may be problematic for a whole variety of reasons. There may be
crime problems, problems of pupils failing at school, health pattern problems, literacy difficulties, child
care weaknesses and so on. […] The aim of a programme is to alter these regularities. Thus, […] evalu-
ations of programmes are concerned with understanding how regularities are altered. (p. 5)

This passage clearly shows a central idea of how programs produce their effect on their target var-
iables: they effectively intervene to alter a regularity. This way of conceiving a causal mechanism
seems to fit well with the so-called interventionist theories of causation, according to which A
causes B if there is a potential intervention I on A that is capable of generating a change in B
(Woodward, 2005) when all other factors relevant to A and B are kept stable. A program theory describes just such a potential intervention I and its capacity to change the effect that A had on B prior to the intervention, disrupting or altering the regularity A → B (when I is not correlated to any other factors relevant to A and B). The crucial aspect that we want to highlight here is that pro-
grams are—in the light of the interventionist theory—counterfactually connected to their outcome: If
the program were not implemented, the regularity A → B would not have been altered. Woodward’s
(2005) interventionist theory of causation is originally intended to be a counterfactual theory. In ideal
situations, A causes B if there is a concrete intervention on A that leads to a ceteris paribus change in
B. However, as Runhardt (2020) discusses, causal claims should be defined in all situations in which
ideal interventions are not possible (situations that are common in program evaluation). In these
cases, a causal claim is associated with a counterfactual claim about what change would result if
an intervention were applied to A. Therefore, we should interpret the shared claim of Chen (1989)
and Tilley (2000) about an intervening mechanism as a counterfactual concept, as ideal interventions
will often not be available when evaluating programs.
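In schematic terms, this interventionist reading can be summarized as follows (a minimal formalization in our own notation, blending Woodward-style interventions with Pearl-style do-notation; an illustration, not a reproduction of either author's formalism):

```latex
% Schematic interventionist reading of a program (illustrative notation).
% A -> B is the problematic pre-program regularity; I is the program,
% modeled as an intervention variable.
\[
  I \text{ alters the regularity } A \rightarrow B
  \quad\Longleftrightarrow\quad
  P\bigl(B \mid \mathit{do}(I{=}1)\bigr) \;\neq\; P\bigl(B \mid \mathit{do}(I{=}0)\bigr),
\]
\[
  \text{with the counterfactual gloss: had } I \text{ not been implemented,}
  \text{ the regularity } A \rightarrow B \text{ would have persisted.}
\]
```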
A further element that provides an indication of the nature of mechanisms is in Pawson and
Tilley’s (1997) work and is the fact that a mechanism is not a further intervening variable, but
rather “an account of the make-up, behavior, and interrelationships of those processes which are
responsible for the regularity” (p. 68). Precisely because a program theory is a theory, it is not enough to describe the intervention as an intervening factor having a counterfactual effect on a regularity; there is a sense in which program theories should decompose the intervening factor into more basic components and their relationships. Pawson and Tilley (1997) group these components into
two basic families: resources and reasoning, where the former includes all material artifacts that a program involves, and the latter the reactions of the individuals involved. A school-based program against bullying might use information material for parents to improve their capacity to identify the signs that their child is suffering from bullying. The mechanism consists of (a) the content of
the information material, (b) the parents’ cognitive beliefs and desires resulting from the information
material, (c) the relation between the two (the parents’ capacity changes, because the beliefs and
desires are a result of reasoning processes made possible by the content of the information material).
This is the idea of mechanisms as constitutive of the causal relationship between programs and their
outcomes that Pawson and Tilley argue for in the same chapter. We will return to the issue of
“makeup” in the next section. Here, we want to focus on the issue of relations. Assuming that the
relationships between the parts of a program are a necessary feature of the mechanism, it seems
that these inherit the more general counterfactual nature of programs discussed by Tilley (2000)
above. What we mean is that the relationships between the constituents of a mechanism in TBE
are themselves counterfactuals and that the global counterfactual relation between programs and out-
comes is determined by the internal counterfactual relationships among the parts of a program. However, a list of the component counterfactual relations between parts is not enough to specify the mechanism
responsible for the outcome. Mechanisms are structured, and their structure determines their out-
comes just as much as their components.
Consider again the anti-bullying program. It is supposed to work by raising parents’ awareness,
which is itself the effect of reading the information material. Even if the main components are in
place (parents’ awareness and information material), the account of how the program works is incom-
plete, as the program theory does not clarify that the parents’ awareness would not have changed in the observed way if the information material had not provided the parents with the necessary resources.
A program mechanism is a network of counterfactuals, and the outcome of a program is brought
about by the parts of a program and their structural arrangements: two networks of the same program
parts arranged differently determine two different mechanisms. We call this the causal pathway claim: it is a claim about features of program mechanisms that are necessary (though perhaps not sufficient), and it sets a standard for our understanding of what a mechanism is, but not necessarily for how a mechanism should be empirically studied.
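The structural half of this claim can be made vivid with a toy sketch (hypothetical part names from the anti-bullying example; the networkx library is used purely for illustration):

```python
import networkx as nx

# Two arrangements of the same three program parts. The parts are
# identical; only the counterfactual structure differs.
chain = nx.DiGraph([
    ("information_material", "parent_awareness"),
    ("parent_awareness", "child_outcome"),
])

# Hypothetical variant: the material also acts on the outcome
# directly (say, via school staff reading it as well).
fork = nx.DiGraph(list(chain.edges) + [("information_material", "child_outcome")])

# Same nodes, different structure: on the causal pathway claim,
# these count as two different mechanisms.
print(set(chain.nodes) == set(fork.nodes))  # True
print(set(chain.edges) == set(fork.edges))  # False
```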
Claims like the causal pathway claim have been defended in the philosophical literature on mech-
anisms in science. According to Cartwright and Stegenga (2012), in order for a mechanism to answer
a how question, an evaluator must be able to “trace out the causal pathway from policy variable to
effect” (p. 35). In other words, a mechanistic model of an intervention consists of a network of
causal relations that has its main end-node in the expected effect. Causal pathways are essential in
the analysis of mechanisms, as they express the idea that not only does an intervention on some part of the mechanism correspond to an observable effect on its outcome, but also that the effect
is effectively produced by the different parts of the intervention. Marchionni and Reijula (2019)
argue for a similar claim. According to them, given any parts A and B of a mechanism M, in order to
qualify M as a causal mechanism, it must be the case that A is counterfactually related to B; that is, the contribution of A to B would not have been obtained if A had not occurred. Moreover, Marchionni and Reijula (2019) seem to claim that the causal pathway claim is both necessary and sufficient to define a mechanism. The component parts and their counterfactual relations are all there is
about mechanisms.
The concept of mechanism that emerges from the causal pathway claim is oftentimes represented
as a graph that includes all the relevant factors and in which internal counterfactual relations are rep-
resented as arrows. This is the approach to causal mechanisms advocated by Pearl (2000), which entails that the contribution of each factor in a program can be represented as a structural equation, describing each factor as a function of the other factors. The global mechanism that explains the outcome of a program can then be represented
as a system of structural equations. Factors that are only determined by contextual elements are exog-
enous, whereas factors that are determined by the parts of the program are endogenous. This is a way
of conceptualizing the mechanism-context-outcome configuration introduced by Pawson and Tilley
(1997).
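As a rough illustration of this idea, a system of structural equations can be simulated directly (a minimal sketch with invented coefficients for the anti-bullying example; it is not a model of any actual program):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Exogenous factor: contextual conditions (e.g., baseline parental engagement).
context = rng.normal(size=n)

# Endogenous factors: each is a function of its structural parents
# plus an independent disturbance term. All coefficients are illustrative.
material = 0.8 * context + rng.normal(0, 0.5, n)    # uptake of the material
awareness = 0.6 * material + rng.normal(0, 0.5, n)  # parents' awareness
outcome = 0.7 * awareness + rng.normal(0, 0.5, n)   # program outcome

# An ideal intervention do(material = 2) replaces the first endogenous
# equation while leaving the others intact; comparing the resulting
# outcome distributions is what gives the system its counterfactual content.
material_do = np.full(n, 2.0)
awareness_do = 0.6 * material_do + rng.normal(0, 0.5, n)
outcome_do = 0.7 * awareness_do + rng.normal(0, 0.5, n)
print(outcome.mean(), outcome_do.mean())
```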

Entities-and-Activities Claim
There are some interesting indications that the causal pathway claim might only tell a part of the story
about the nature of program mechanisms. The issue is whether counterfactual relations are sufficient
to build up a theory, that is, following Chen (1989), Weiss (1997), and Pawson and Tilley (1997), an
explanatory account of how a program brings about its outcome.
As we mentioned in the previous section, Pawson and Tilley (1997) argue that the specification of
intermediate variables is not enough to account for a mechanism; what is necessary is a theory spec-
ifying Pawson and Tilley’s “makeup” of and “relationships” among the parts of the program. As we
have argued in the previous section, the relationships between parts of a program must be counter-
factual, but there is nothing in that argument that justifies the claim that the specification of counter-
factual dependencies between the parts of a program sufficiently describes the program’s mechanism.
This is because counterfactual relations seem to lack the capacity of clarifying the “makeup” of mech-
anisms. Counterfactual relations cannot indicate anything more than a relationship between changes
(if X → Y, then there would not have been any change in Y if there had not been a change in X).
However, in our experience, counterfactual relations are grounded in concrete processes (a term used by Chen (1989)) that require specific descriptive accounts of their own. Therefore, we might
say that the counterfactual relation is in itself rather thin. Take, for instance, Astbury and Leeuw
(2010):

Mechanisms appear too frequently as unexplained ‘causal arrows’ that seem to flourish so well in the
present climate of enthusiasm with visual logic models. This does not seem to be what theory-driven eval-
uators had in mind when they introduced the concept of ‘mechanism’ to the evaluation community.
(p. 367)

Astbury and Leeuw go on to provide, as a means of exemplification, a citation from Weiss (1997):

[…] if counselling is associated with reduction in pregnancy, the cause of change might seem to be the
counselling. But the mechanism is not the counselling; that is the program activity, the program process.
The mechanism might be the knowledge that participants gain from the counselling. Or it might be that the
existence of the counselling program helps to overcome cultural taboos against family planning; it might
give women confidence and bolster their assertiveness in sexual relationships; it might trigger a shift in the
power relations between men and women. These or any of several other cognitive, affective, social
responses could be the mechanisms leading to desired outcomes. (p. 46)

Here, Weiss (1997) claims that establishing (counterfactual) dependencies does not per se specify
a mechanism. Instead, this specification requires the description of the actual processes or activities
that ground that dependency.
Schmitt and Beach (2015) argue for a similar claim: leaving causal interdependencies between
parts of a program unspecified entails that the resulting program theory cannot be considered as a
mechanistic one. In short, according to Schmitt and Beach (2015), a network of causal
arrows is not a mechanism. Instead, they argue that the crucial feature of mechanistic theories is
the specification of causal arrows by means of “activities” (pp. 431–434).
In our example of the anti-bullying program, activities would describe the specific cognitive pro-
cesses that lead parents to develop instruments for identifying signs that their child is a victim of bul-
lying, and how the information material could trigger those processes. For instance, the information
material could have described a simple tallying heuristic that supports parents’ decision-making, consisting of a small number of signs and the rule “if more than two signs are observed, consider the risk as high, otherwise low.” The information material could also have been structured in a way that produced a high sense of identification among parents in high-risk contexts. Here, the decision-making heuristic and the sense of identification are the activities that allow the effect of
the program to travel along its parts (information materials → parents → children).
Schmitt and Beach (2015) build their claim on Machamer, Darden, and Craver’s (MDC) foundational work on mechanisms (2000). Nowadays, the MDC account is considered one of the seminal works in the so-called new mechanist turn in the philosophy of science. The MDC definition of a
mechanism is: “Mechanisms are entities and activities organized such that they are productive of
regular changes from start or set-up to finish or termination conditions.” (Machamer et al., 2000,
p. 3). This account of mechanism has the main reward of introducing entities and activities as defin-
ing components, but it is not well-suited to TBE. The requirement of “regular changes” existing
“from start or set-up to finish or termination conditions” is simply too strong for many program eval-
uations. Many social interventions are unable to produce regular changes—consider, for instance,
educational interventions—so policymakers will often be content with irregular ones. At the same
time, although it might be quite reasonable to expect a program to have definite start conditions, it
is common for social interventions to have indefinite termination conditions. Interventions that act
on social norms have indefinite termination conditions, as they will let social collectives take over
the intervention and internalize it as their own process, thereby using it indefinitely.
More recently, Illari and Williamson (2012) have discussed a concept of mechanism across the sciences that modifies the MDC account to drop these conditions. According to them: “A mechanism
for a phenomenon consists of entities and activities organized in such a way that they are responsible
for the phenomenon” (p. 120). Illari and Williamson’s theory provides a looser conceptualization of
mechanisms as any productive arrangement of entities and activities, having more or less regular out-
comes, and more or less definite starting and ending points. We contend that this is a more suitable
conceptualization of the concept of mechanism for TBE.
We therefore have two claims that together account for the nature of mechanisms in TBE. These
state that, in the context of TBE, the term mechanism should be understood as:

• A set of parts of a program that are counterfactually related to one another in the production of
a targeted effect, where the structure of counterfactual relations is itself a contributing factor
(causal pathway claim), and

• A set of entities and activities involved in the program and organized in such a way that they
are responsible for the program outcome (entities-and-activities claim).

Beach (2016) has discussed two basic understandings of mechanisms (the counterfactual-based view
and the system view) that cohere with our two claims above. Beach discusses these as contrasting
views, especially from a methodological point of view. We propose that, from a conceptual point of view—that is, concerning how we understand the concept of a mechanism in TBE—the two views are complementary. As we mentioned above, our conceptualization provides a proposal for
a normative standard for what evaluators should mean when they use the term mechanism. It can
be the case that several approaches to TBE conceive mechanisms in a different way or only in
terms of one of the claims above.
Finally, our conceptual proposal does not, per se, entail any set of methodological guiding prin-
ciples. The problem of deriving such guidance from our concept of mechanism is discussed in the
next sections.

Integration in TBE
In this paper, we are interested in the rewards of integration, and especially of data and theory inte-
gration, for TBE. The concept of integration refers to the way in which different elements are com-
bined, merged, or conjoined in order to achieve some epistemic goal. Integration has been and still is
a main—but not the only—methodological principle of mixed methods (e.g., Bazeley, 2012, 2017;
Bazeley & Kemp, 2012; Cronin, Alexander, Fielding, & Moran-Ellis, 2008; Fetters, Curry, &
Creswell, 2013; Fetters & Freshwater, 2015; Fetters & Molina-Azorin, 2017; Moran-Ellis et al.,
2006; Moseholm & Fetters, 2017), and of multi-methods/process-tracing research (Beach, 2020;
Beach & Brun Pedersen, 2013; Goertz, 2017; Goertz & Mahoney, 2012; Humphreys & Jacobs,
2015; Rohlfing, 2012; Seawright, 2016; Weller & Barnes, 2014). In the vast literature on integrative
methodologies, the integrands can be data, theories, designs, and more. Later in the text, we focus
only on data and theory integration, for the sake of simplicity and space, and because these two
forms of integration are simple, intuitive, and commonly used in mixed and multi-method research.
In simple terms, data integration entails integrating data sets, whereas theory integration entails inte-
grating models of the target phenomena. In this section, we set the stage for the later discussion on
data and theory integration in TBE, by showcasing some examples of claims that relate TBE to inte-
gration (now more generally understood) and that may be in some way connected to the idea of a
mechanism. The point here is to look at how proponents of TBE have discussed integrative
methodologies.
The earliest contribution to this issue that we could trace is found in Chen and Rossi (1989). Here,
the authors discuss the appropriateness of randomized controlled trials (RCTs) for theory-based eval-
uation and claim that “Close attention to the modelling of program processes can improve the power
of randomized experiments” (p. 304). This claim does not mention the term integration explicitly, but
clearly suggests that approaches that are capable of constructing models of program processes—that
is, theory-generating methodologies, such as process-tracing or grounded theory—can be used
together with randomized controlled experiments in order to improve the validity of evaluation.
Hence, Chen and Rossi (1989) seem to suggest that TBE can benefit from the integration of more specified models with effect measurements. The connection to the issue of mechanism is also not
explicit here, unless the focus on program processes is understood as program activities as suggested
in the previous section. Even in this case, Chen and Rossi claim that a theory of processes
improves the conclusions of a randomized experiment, and not that integration helps in identifying
mechanisms. Mechanisms (if processes are mechanisms) are tools for more valid conclusions and not
the primary epistemic goal. The issue is avoiding the potential threats to validity that are typically
attributed to randomized trials, and mechanisms (or processes) might be a way of overcoming these
threats. Similar considerations are also found in Chen (1989) and Chen (2006). Chen and Rossi (1989) also focus on “mixed methods,” without clearly discussing any specific integration. Here, the rewards of using mixed methods in TBE are not directly connected with the idea of mechanisms but consist in the possibility of either collecting data about different parts of the program or of supporting different kinds of validity appraisals.
In a later contribution, Chen (1997) argues that, although mixed methods should not be considered a dominant methodological paradigm for theory-based evaluation, they entail some clear rewards,
especially in the case of complex programs. These rewards are the possibility of using a specific
method for each part/level of the program, and the capacity of establishing a trade-off between inter-
nal and external validity. Here, too, integration provides data about the different parts of a program
and supports validity appraisals. Similar to the examples above, the focus is not explicitly on mech-
anisms. The main idea here seems to be that mixed methods can be used for the integration of claims
(claims about parts of a program, or claims about the scope of an evaluation) and that evaluations
require making and being able to justify different claims.
White (2008) seems to be the first to make a direct connection between the issue of integration and
the mechanistic goal of TBE:

The current benchmark for valid impact estimates is that the study has a credible counterfactual, which
means that it addresses the issue of selection bias where this is likely to be an issue […]. [However,]
just knowing if an intervention worked or not is not usually sufficient, we also want some idea of
why, how, and at what cost. And knowing if it worked, without knowing the context within which it
did so, limits the scope for generalisation and lesson learning. Answering the ‘why’ question is where
qualitative methods come in. (pp. 98–99)

In contrast to the examples above, White connects integration to the epistemic aim of program
evaluation, and not to the validity of the evaluative conclusions. The claim he presents is that inte-
gration is necessary for achieving the epistemic goal of answering a “why” or a “how” question.
White discusses several examples of integration involving data, designs, methods, and theories.
According to him, the answers to “why” or “how” questions are program theories, that is, theories of mechanisms, and integration is necessary for developing the content of program theories. A further
element is worthy of remark in White’s discussion: he seems to connect quantitative methods to
the counterfactual element of program theories, while qualitative methods seem to be connected to
the theoretical content of these theories. This seems to suggest that the issue of integration is directly
connected with the two claims about the nature of mechanisms we presented above. We return to this
issue later in this section.
Killoran and Kelly (2010) have also argued that the combination of an account of a process (pro-
vided using qualitative methods) and an account of an effect (provided using quantitative methods) is
appropriate to the theoretical goals of TBE:

The [objective of realist evaluation] is, of course, to understand ‘why’ the programme works and this
involves close understanding of the programme mechanism and its precise utility and appeal in the dif-
ferent quarters of an intervention. […] Some broad [methodological] principles may be established, the
most fundamental of which is that theory-driven analysis demands a multi-method evidence base. […]
How does one trace [the basic and active ingredients of social change]? Well, data on process is generated,
broadly speaking, using qualitative methods; outputs and outcomes are measured via quantitative
approaches; contextual information requires comparative observation and measurement. Testing any pro-
gramme theory requires the conjunction or triangulation of all three. (pp. 51–52)

Killoran and Kelly are in line with White (2008) with their focus on the content of program the-
ories; they argue that answering a “why” question—that is, describing a mechanism—requires
knowledge of processes and of outputs/outcomes. As these forms of knowledge require different
methods, answering a “why” question requires the “conjunction,” or, in our words, integration, of
both. This short showcase of examples is not meant to be complete but provides an illustration of
how integration has been discussed in the literature on TBE and shows some instances in which inte-
gration has been directly connected to the concept of mechanism.
White’s (2008) and Killoran and Kelly’s (2010) suggestions provide us with a vantage point for
formulating a question, which will inform our discussion in the remainder of this paper. If, as it seems
to be suggested, integration is necessary to the TBE goal of accounting for mechanisms, are different
forms of integration (especially data and theory integration) differently connected to the two aspects
of the nature of the mechanism described by the causal pathway claim and the entities-and-activities
claim?
In the literature on TBE, we could not find a clear discussion about how different forms of integration—and particularly data and theory integration—contribute to theorizing and assessing claims
about program mechanisms.1 In simple terms, it is not clear what mechanistic rewards result from
theory and data integration. The discussion of this issue is our main contribution to the field of
TBE, and the focus of the remainder of this paper. The discussion will proceed in the next sections
in the following way. We consider data integration and theory integration separately. For each form
of integration, we first present a reconstruction of the general rationale, and then discuss the connec-
tion between integration and the two claims about the nature of mechanisms we argued for: the causal
pathway claim and the entities-and-activities claim.
In the following sections, we support our claims using the case of a research project involving
a national-scale PD program for mathematics teachers. For this reason, it can be helpful to
provide the reader with some context about our case before discussing the different integration
strategies.

The Case of a Research Project Studying Large-Scale Teacher Professional Development
The authors of this paper are currently involved in a research project conceptualizing and evalu-
ating a national-scale, state-coordinated PD program (Boost for Mathematics—BfM) for mathe-
matics teachers in Sweden. The overarching goal of BfM is to improve students’ mathematics
achievement by strengthening mathematics teaching. To facilitate this development, nearly
80% of all elementary school mathematics teachers in Sweden have participated in the year-long
PD program.2 One year of PD includes 16 rounds, consisting of four sessions each, in which
teachers: (A) individually study PD materials provided on a digital platform3; (B) meet with
their colleagues to plan for an activity (e.g., a lesson to conduct); (C) carry out the activity
with the class they normally teach; and (D) meet with their colleagues again to discuss experi-
ences gained from the conducted activity.
The aim of our research project is to characterize and examine relationships among different fea-
tures of this program, such as PD materials, collegial discussions, teachers’ beliefs and knowledge,
classroom instruction, and student achievement. Several data sets were collected, including the PD
materials (Session A), video recordings of collegial discussions (Sessions B and D) and classroom
lessons (Session C), interviews with the teachers (both before and after the lessons), and student
results on mathematics tests. The volume of data collected is quite extensive with about 160 video
recordings of collegial discussions, 180 video recordings of classroom lessons, and 200 recordings
of interviews. In this article, we make use of two of the project’s studies to illustrate our discussion
of data integration and theory integration with a practical case.

Data Integration
According to a popular characterization of data integration (Caracelli & Greene, 1993, p. 197): “One
means by which qualitative and quantitative data can be integrated during analysis is to transform one
data type into the other to allow for statistical or thematic analysis of both data types together.”
Let us spell out data integration in further detail. We consider the situation in which several dif-
ferent data sets have been collected (e.g., structured and unstructured data). Each data set is assumed
to represent a part of the program. For instance, in our case, the targeted program is BfM, and the
involved teachers’ knowledge of the subject matter of mathematics is a property of the teachers, a
part of the program which is assumed to be relevant for the effect of the intervention. Such assumptions are common in approaches to TBE that start out by formulating a crude program theory, describing
the essential contributing parts of the program, and collecting data with the purpose of refining the
crude theory into a specified program theory. In some cases, the crude program theory is a part of
the program, as stakeholders will plausibly organize program interventions to include specific parts because of a set of theoretical assumptions. This is the case of BfM, which is a complex inter-
vention with complex theoretical planning behind it.
In the literature on mixed methods (e.g., Bazeley, 2017; Hesse-Biber & Johnson, 2015), data inte-
gration is typically based on the categorization of unstructured data as a means of comparison between
structured and unstructured data. Unstructured data (interview transcripts, ethnographic notes, or video
recordings) are coded, and the codes are integrated into the structured data as further variables. In our
project, we employed several types of coding. We categorized the lessons using the UTeach
Observation Protocol (UTOP; Walkington & Marder, 2018), in which each lesson is categorized
using 28 indicators organized in four sections (Classroom Environment, Lesson Structure,
Implementation, and Mathematics Content) and thereafter translated by observers to scores on a five-
point Likert scale. The curriculum materials (CMs; the PD materials studied in Session A) were categorized according to one of the frameworks developed in
Lindvall et al. (2018), in which the materials are classified according to one of five categories regarding
their content focus (Teachers’ and students’ content knowledge, Didactics, Teacher actions, Lesson
design, and Reflections on own learning and practice). Further, we categorized the collegial discussions
using the framework used in Steenbrugge et al. (2018), in which each collegial discussion is described
in terms of the agency of the teachers participating in it and thereafter classified as low or high.
This allowed us to integrate the UTOP data with the CM codes and the collegial discussions codes
into an integrated data set in which every row is a lesson, and the columns represent the UTOP items,
the connected CM codes, and the preceding collegial discussions codes. To each row, we added
further contextual data, such as teacher ID, school characteristics, etc.
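In code, the construction of such an integrated data set amounts to little more than a sequence of merges (a hypothetical sketch; file names and column names are invented for illustration, not the project's actual data sets):

```python
import pandas as pd

# One row per lesson after integration. All files and columns are
# hypothetical stand-ins for the project's coded data sets.
utop = pd.read_csv("utop_scores.csv")             # lesson_id, utop_1 ... utop_28
cm_codes = pd.read_csv("cm_codes.csv")            # lesson_id, cm_focus
discussion = pd.read_csv("discussion_codes.csv")  # lesson_id, agency (low/high)
context = pd.read_csv("context.csv")              # lesson_id, teacher_id, school

integrated = (
    utop
    .merge(cm_codes, on="lesson_id")
    .merge(discussion, on="lesson_id")
    .merge(context, on="lesson_id")
)
```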
This integrated data set is a data model (Harris, 2003) of BfM, which is an artifact resulting from
the theoretical re-description of the collected raw data materials. In this data model of the program,
each part (corresponding to an element of the crude program theory) is represented as a set of codes or
numerical variables. We move on now to the discussion of the mechanistic rewards of creating this
integrated data model.

Data Integration and Mechanism


The first clear mechanistic reward of an integrated data model is access to information about the parts
of the program. As we saw, a main feature of mechanisms is that they consist of different parts.
A mechanistic program theory should—minimally—specify these parts, and an integrated data
model provides exactly this specification. In our project, the crude program theory entails that the
focus of CMs affects teachers’ agency and participation in collegial discussions, which in turn
affects the UTOP scoring for the lessons. Data about all these program parts is required to specify the content focus of the different CMs, the teachers’ agency and participation in the collegial discussions, and the UTOP scorings for the teachers’ lessons.
12 American Journal of Evaluation 0(0)

The second reward is the possibility of comparing the variables in the integrated data set, allowing
for the study of relationships among the parts of the program. Comparison between variables can
result in a set of coefficients B = {β1, …, βn} describing the targeted relations between the parts of the program. Examples of such coefficients could be estimated parameters of multivariate statistical models,
such as mediation models, path or hierarchical models. By applying estimation methods to the inte-
grated data set, the resulting parametrized multivariate model is itself a further data model, as it
describes relationships between the data. These relational data models have the further advantage
of enabling a graphical representation of the path that connects the different parts of the program
to the targeted outcome.
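For example, coefficients of this kind could be estimated with a simple two-equation mediation model over the integrated data set sketched earlier (a hedged illustration assuming numeric codings of the CM focus and agency variables; this is not the project's actual analysis):

```python
import statsmodels.formula.api as smf

# Assume 'integrated' from the earlier sketch, with a numeric cm_focus,
# a numeric agency coding, and an aggregate UTOP score per lesson.
integrated["utop_score"] = integrated[[f"utop_{i}" for i in range(1, 29)]].mean(axis=1)

# (CM content focus) -> (agency) -> (UTOP score), estimated as two OLS fits.
m1 = smf.ols("agency ~ cm_focus", data=integrated).fit()
m2 = smf.ols("utop_score ~ agency + cm_focus", data=integrated).fit()

indirect = m1.params["cm_focus"] * m2.params["agency"]  # mediated pathway
direct = m2.params["cm_focus"]                          # residual direct path
print(indirect, direct)
```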
Here comes a first complication. Even if integrated data sets are often used to estimate the param-
eters of a graphical model, they are very seldom used to estimate the structure of the graphical model
itself. When an integrated data set is constructed, it allows for all possible relationships between its
variables. If we have a preconception of how the parts of the program are causally arranged, then we
select a subset of all possible relations, and we can use an integrated data set to estimate the param-
eters of the model, but it is very uncommon to use the integrated data set to understand how the parts
of the program are causally arranged. The key issue here (one to which we will return on more occa-
sions in the paper) is that the causal structural arrangement of the parts in a program must be a part of
the background theory that motivates the different data collection and cannot be a product of statis-
tical analysis. In short, the path leading to connecting the intervention to its outcome is not a product
of the data integration, but rather a condition for it. In our crude theory, we assume participation and
agency to depend causally on the CM content focus. Our integrated data model would allow us to
study the converse causal arrow, but our background theory rules this out as irrelevant.
A further major issue concerns the interpretation of the parameters of the graphical model.
Following our proposed normative theory of mechanism, to interpret a graphical data model as
the observable consequence of an underlying mechanism, the relations depicted in the graphical
model must be counterfactual. This interpretation is however another issue that does not depend
on data integration, but is rather determined by theoretical and modeling assumptions. Several
approaches to statistical data analysis focus on counterfactuals, such as RCT, structural causal
models (SCMs), difference-in-differences (DiD), instrumental variables (IV), and propensity score
approaches. However, in order to interpret the data as supporting counterfactual claims, all of
these approaches require making theoretical assumptions, such as the randomized allocation into
test and control groups in RCTs, the assumption of a causal theory in SCMs, the ability to hypoth-
esize about confounders in IV, or the assumption of parallel trends in different groups in DiD. The
more assumptions we can satisfy, the stronger the support that a counterfactual theory gets from the
graphical data model. In simpler terms, graphical models of data do not always represent networks of
counterfactuals, but can, under specific conditions, support counterfactual theories. When data sets
are integrated, the capacity of constructing or refining a mechanistic theory depends on the successful
integration of the background theories for each original data set, and not on the integration of the data
itself.
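To make the dependence on assumptions concrete, consider a difference-in-differences estimate (a schematic sketch with hypothetical data; nothing in the code itself certifies the parallel-trends assumption on which the counterfactual reading rests):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical two-period panel: 'treated' marks participation in the
# program, 'post' marks the period after implementation.
panel = pd.read_csv("bfm_panel.csv")  # invented file name

# The interaction coefficient is the DiD estimate of the program effect.
# It supports a counterfactual claim only if treated and untreated
# groups would have followed parallel trends absent the program,
# an assumption the data set alone cannot establish.
did = smf.ols("outcome ~ treated * post", data=panel).fit()
print(did.params["treated:post"])
```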
A further problem arises when the background theory is “not enough.” Consider again the inte-
grated data set from our project. Our crude program theory states that:

(CM content focus) → (Agency and participation in discussions) → (UTOP dimensions)

However, this model is not specific enough to be tested against our data. In Figure 1, we depict all
possible causal arrows (C1–C10) connecting the parts of the PD program, once we assume our back-
ground theories.
Our background theory provided us with a detailed framework for the description of lessons, CMs,
and collegial discussions. However, neither this background theory nor our crude program theory is
sufficient to know which set of arrows should be included in our refined program theory. Our crude program theory entails that some aspects can be related; however, we do not have a detailed background theory specifying which relationships between the specific dimensions in Figure 1 are relevant, because our background theory concerns each part separately. Also, our crude program theory rests mainly on the temporal succession of the parts of the program. Without a background theory that selects a subset of arrows among C1–C10, we cannot exclude the possibility that some relationships between different parts of the PD program are only apparent.

Figure 1. A partial, hypothetical path model of BfM. The rectangles are characterizations of CMs, part of the framework discussed in Lindvall et al. (2018). The rhomboids are characterizations of collegial discussions, part of the framework discussed in Steenbrugge et al. (2018). The final square on the right side of the diagram is a dimension of the UTOP framework, used to characterize lessons.

These issues show the importance of theory in the study of mechanisms. Both the structural and
the counterfactual elements of mechanisms described in the causal pathway claim put important
requirements on background theory and cannot be derived from the simple act of data integration.
Therefore, as long as mechanisms are minimally understood according to the causal pathway
claim, the construction of a mechanistic program theory depends on a background theoretical frame-
work and is not a product of data integration. In simple terms: mechanistic analysis requires mech-
anistic theories.
Moreover, the issue of the missing background theory has consequences for our normative theory
of mechanisms, indicating that the causal pathway claim and the entities-and-activities claim are equally
important for understanding mechanisms. When background theory is not sufficient, the way of
expanding it and completing the theoretical gaps is to theorize about the nature of the connection
between the parts of the program, that is, to hypothesize the activities that connect the entities belong-
ing to the different program parts.

A similar suggestion is found in the literature about multi-method research. In this field, one of the
main claims is that the study of mechanisms requires integrating between-case and within-case
studies. Between-case studies can be used to fit integrated data sets into available causal theories,
and thereby estimate the contribution of each part of a program. However, whenever the initial
theory is insufficient, within-case studies can be employed to study the nature of connections, that
is, the activities, between the parts or entities of the target phenomenon that is studied.
Within-case studies generate emergent theories that are then used to fill the gaps in the background
theory. This is the case of our project, in which the lack of a global mechanistic program theory made us opt
for a more in-depth study of the relationships between the parts of the PD program. For instance,
Steenbrugge et al. (2018) investigate the relationship between meaning potentials in CMs and
meaning negotiations in collegial discussions. A further case study (Insulander et al., 2019) investi-
gates the relationship between how teacher agency is constructed in CMs and how the agency is con-
stituted in collegial discussions.
Therefore, although data integration can be used to construct and refine mechanistic program the-
ories, it relies heavily on background theory for the satisfaction of the two main characteristics of
mechanisms, namely, structure and counterfactual relations. Furthermore, when background
theory cannot specify all relevant relationships, the analysis of integrated data models can be insuf-
ficient. Finally, in the cases when background theory is not sufficient, it seems that the construction
and refinement of a mechanistic program theory requires an understanding of the entities and activ-
ities involved in the program, which indicates that (as we have argued) the causal pathway condition
does not sufficiently capture the nature of mechanisms.

Theory-Driven Approaches to Integrated Data Sets


The limitations discussed in the last section seem to indicate that theory-based evaluation, and espe-
cially the construction and refinement of mechanistic program theories, cannot be fully accomplished
by means of data-driven methods.4 This is plausibly the motivating idea of theory-based evaluation:
evaluation requires theory. In this section, we discuss whether theory-driven methods can be more
helpful in the construction and refinement of a mechanistic program theory.
Theory-driven approaches, such as qualitative methods and process-tracing methods, analyze data
by looking at the relationships between concepts—and how these concepts are realized in the
observed settings—rather than between observations. More specifically, researchers use theoretical
resources to build or test a theoretical model of the specific ways and forms in which observed
events seem to co-occur. This process rests on theory-informed inferences in which the connections between two events are analyzed either, in the case of theory-generating methods, by putting forward explanatory facts and successively evaluating the plausibility of those facts, or, in the case of theory-testing approaches, by evaluating a theoretical model of the connection between the two events via the derivation of its observable consequences.
Let us concretize our discussion with an example of qualitative analysis from our case:
Steenbrugge et al. (2018). This study focuses on how CMs of BfM can support teachers’ collective
learning. The data used for this study are CMs (Session A) from one round and the transcriptions of
two collegial discussions (Sessions B and D). The hypothesis is that the CMs might influence the
possibilities for teachers’ collective learning during the collegial discussion sessions.
CM texts and discussion transcriptions are categorized using coding manuals. The textual differ-
ences between the texts are conceptualized using a further theoretical framework, developed by
Kennedy (2016). This framework characterizes PD programs in terms of “approaches by which
the programs aim to facilitate the enactment of new ideas: through (1) prescription, (2) strategies,
(3) insight, or (4) presenting a body of knowledge” (Steenbrugge et al., 2018, p. 170). Therefore,
each CM text is categorized according to Kennedy’s classification by looking at its main approach,
and each categorization describes a type of text. The collegial discussions are analyzed in terms of
their meaning negotiations (Wenger, 2000; Wenger et al., 2002), and the authors examine the extent to which the participants in the collegial discussions were involved in activities that allowed them to negotiate the meaning of the concepts involved in the CMs (and thereby made collective learning possible). The meaning negotiations are described using two dichotomous cate-
gorizations: the teachers’ participation in the discussions (categorized as high or low) and the
reification of central ideas (categorized as present or scarce). The second categorization, reifica-
tion, has the function of describing the extent to which the group makes the subject of the CMs
part of the teachers’ own conceptual repertoire.
The authors observed an interesting pattern in the findings: the texts characterized by a “prescriptions and strategies” approach to the enactment of the central ideas led to high participation and reification, whereas the content of the CMs characterized by an “insights and body-of-knowledge” approach led to low participation and reification. The theoretical connection between these categories is, according to the authors, that enacting the central ideas of a CM text using a “prescriptions and strategies” approach facilitates teachers’ engagement, as it is often supplemented with concrete instructions on what to do. The “insights and body-of-knowledge” approach, by contrast, is not supplemented with concrete instructions in the same way, which makes teachers’ conceptualization work more difficult. In this example, the theory-driven approach identifies an operational relationship between differences in textual function (particularly a contrast between two strategies) and differences in the negotiation of meaning, specifies the nature of this difference, and puts forward a possible mechanism that connects the two events. The aim of theory-driven methods is ultimately that of building and/or refining models of entities and activities. The explanation provided by Steenbrugge et al. (2018) aims to identify the entities (meaning potentialities, meaning negotiations) and the activities connecting them (the facilitation of teachers’ engagement deriving from concrete application/instructions). Therefore, data integration followed by thematic analysis seems to facilitate satisfying the entities-and-activities claim about mechanisms.
The result of the theory-driven analysis can be used to construct a mechanistic theory if mecha-
nisms are understood as in the entities-and-activities claim. The analysis reveals the entities and activ-
ities that connect the parts of the program together in a way that (assuming that the negotiation of meaning can be traced in the same way to the UTOP scorings) explains the program outcome.
In this case, it is indeed the integration of parts that sets the stage for a mechanistic theory. By
connecting the parts of the program (CMs and collegial discussions) by means of theory-driven
approaches, we get a picture of the program mechanism. However, the mechanistic reward is pro-
vided by the integration of theoretical elements and not by data integration. In the example above,
the emerging theory is the result of the integration of theories of the curriculum materials and of
the collegial discussions with a theory of engagement facilitation (more applicable elements and concrete instructions are easier for teachers to manipulate and facilitate engagement). The theories are
integrated by means of the explanatory connection provided by the engagement facilitation, rather
than by the observable co-occurrence of certain events.
It is interesting, at this point, to wonder if the explanatory account that results from theory-driven
analysis describes a mechanism in the sense of the causal pathway claim. This is a question of the
status of emergent theories. According to a still common view among interpretivist researchers
(Lincoln & Guba, 1985), the theories that emerge from the application of interpretive qualitative
methods should not be interpreted counterfactually. Rios (2004) argues, for instance, that the theories
of interpretive sociology do not track mechanisms, because of their lack of counterfactual scope.
According to both Lincoln and Guba’s influential account and Rios, the resulting theories put events in a context, but only in descriptive terms. These theories do not include, in other words, any claim about what would have resulted if the context had been different. In contrast to
this standpoint, other methodological approaches to qualitative methods—for example, in the field of
process tracing—have provided compelling cases for the claim that theory-driven methods can
indeed support counterfactual theories.
A widely discussed example is Mahoney’s (2015) use of process tracing in historical research. In
this paper, Mahoney discusses a specific approach to process tracing which he calls counterfactual anal-
ysis. This is a method for theory construction and not specifically for theory testing, meaning that the
method can be used when the initial causal theory is not fully specified and questions about what mech-
anism causally connects X and Y still remain. In these cases, Mahoney suggests identifying the counter-
factual statements that connect the known factors in the phenomenon of interest (which in Mahoney’s example are two historical events, the assassination of Archduke Franz Ferdinand and the outbreak of the First World War, and which in TBE are the implementation of a program and its outcome).
The basis for this claim can be found in Woodward’s theory (2005) and has been discussed in
relation to process tracing by Runhardt (2015, 2020). The context considered by Mahoney (2015)
is any case in which, as in the case of historical research, it is impossible to introduce a concrete inter-
vention to assess a causal claim, and therefore it is necessary to consider ideal intervening factors in
the form of counterfactual claims that are assessed as proxies for the concrete interventions. Mahoney
discusses a methodology for this operation which consists of using background theories to look for
possible counterfactual connections whenever the theory is unspecified and then assessing if there is
evidence for this connection. The selection of a counterfactual should satisfy what Mahoney calls the “minimal rewrite rule,” meaning that the counterfactual connecting X and Y should be such that it describes the smallest intermediate event Z between X and Y sufficient to explain the causal connection X → Y. This clearly requires using analogical reasoning from similar cases, as this hypothetical counterfactual is, by assumption, not part of the theory of the target phenomenon. Once a counterfactual Z → Y or X → Z is selected via analogy and is judged to satisfy the minimal rewrite rule, the process tracer should assess the available evidence in favor of or against the counterfactual. This requires deriving observational consequences from the counterfactual claim that can be used to construct a test. These observational consequences should clarify how an idealized intervention on Z (or on X) could generate a change in Y (or in Z). The severity of the test resulting from the available evidence either in favor of or against the counterfactual determines the level of support that the data provide for it.5
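Schematically, Mahoney’s procedure can be rendered as follows; the notation is ours, in the interventionist spirit of Woodward (2005) and Pearl (2000), rather than Mahoney’s own:

\begin{align*}
&\text{Given:} && X \rightarrow Y \quad \text{(causal connection; mechanism unspecified)}\\
&\text{Step 1:} && \text{select, by analogy with similar cases, } Z \text{ such that } X \rightarrow Z \rightarrow Y\\
&\text{Minimal rewrite:} && Z \text{ is the smallest intermediate event sufficient to account for } X \rightarrow Y\\
&\text{Step 2:} && \text{read each arrow counterfactually: an ideal intervention setting } Z = z' \text{ would change } Y\\
&\text{Step 3:} && \text{derive observable consequences of } X \rightarrow Z \text{ and } Z \rightarrow Y \text{ and assess the evidence for each.}
\end{align*}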
In our case above, the authors hypothesize a causal relationship between “prescriptions and
strategies”-oriented materials (X) and participation in the collegial discussion (Y), mediated by the
ease of engagement that concrete instruction can entail (Z). The counterfactuals identified here
through analogical reasoning (similar educational contexts exhibit these counterfactuals) are X → Z
and Z → Y. These seem to satisfy the minimal rewrite rule, as they account for a possible change
in Y and Z without the need for any other change in the context. If an intervention on Z resulted in
hindering teachers’ engagement, with all else being the same, participation would decline. These coun-
terfactuals are supported in our study by comparing how different collegial discussions developed.
Collegial discussions with lower participation/reification are shown to depend on teachers’ difficulty
in conceptualizing curriculum materials that are too abstract (“body of knowledge”-oriented).6
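Note 6 identifies the resulting test as a hoop test. The asymmetry between passing and failing such a test can be made explicit in Bayesian terms, in the spirit of Humphreys and Jacobs (2015). The following minimal Python sketch applies Bayes’ rule with invented likelihoods, chosen only to illustrate the hoop-test pattern; the numbers are illustrative assumptions, not estimates from our data.

def update(prior, p_e_h, p_e_nh, passed):
    """Posterior probability of hypothesis H after a process-tracing test.

    p_e_h  = P(E | H):     probability of passing the test if H is true.
    p_e_nh = P(E | not H): probability of passing the test if H is false.
    """
    if passed:
        like_h, like_nh = p_e_h, p_e_nh
    else:
        like_h, like_nh = 1 - p_e_h, 1 - p_e_nh
    return prior * like_h / (prior * like_h + (1 - prior) * like_nh)

# Hoop test: if H (ease of engagement mediates participation) is true, passing
# is almost certain, but passing is also fairly likely without H. Passing thus
# supports H only weakly, while failing strongly disconfirms it.
prior = 0.5
print(update(prior, p_e_h=0.95, p_e_nh=0.60, passed=True))   # ~0.61: weak support
print(update(prior, p_e_h=0.95, p_e_nh=0.60, passed=False))  # ~0.11: strong disconfirmation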
Therefore, theory-driven approaches can be applied to integrated data sets to theorize about mechanisms, even when the term is intended counterfactually, as in the causal pathway claim. As above, the capacity of process tracing and qualitative methods to theorize about mechanisms (in the causal pathway sense) depends mainly on the use of theories for making inferences and much less on the integration of data. The possibility of constructing a counterfactual path X → Z → Y in our example depends on using background theories and analogical reasoning to assess the satisfaction of the minimal rewrite rule and to derive the observational consequences necessary to trace each step of the path. Integrating data is only a precondition, in this case a necessary one, but the leverage of the mechanistic theory requires integrating theoretical claims.
In both cases of data-driven and theory-driven analysis, the chances of theorizing about program
mechanisms do not depend on the way data is integrated, but rather on the possibility of constructing
a theory (or theories) that is sufficiently detailed to explain how the different parts together form a
mechanism. Therefore, the discussion in this section allows for a relevant conclusion: data integration
is not the main prima facie driver for the construction and refinement of mechanistic program theory.
In the next section, we will discuss methodologies that focus more specifically on theory
integration.

Theory Integration
The main aim behind theory integration is to combine two or more models of phenomena (as opposed
to data models, as in the case of data integration). Models of phenomena are commonly called theories.
We use the term “theory integration” to encompass various terms that are used in the literature about
mixed methods, such as integrative analysis or data analysis integration (Bazeley, 2017), interpreta-
tion integration (Fetters et al., 2013), and results integration (Schieber et al., 2017). We prefer to use
the term “theory” to convey the idea that what is integrated is an epistemic representation of the target
phenomenon. In simple terms, we put together what we know about the program. In what follows, we
provide a general reconstruction of theory integration and discuss its mechanistic rewards.

Theory Integration and Mechanism


One of the studies generated within our research group (Lindvall et al., 2018) is an instructive
example of how theory integration suits the aim of TBE. The aim of the study is to evaluate the
impact of two PD programs (one of which is BfM) on student achievement, whereby two sets of
data were collected and analyzed. Firstly, the CMs for the two PD programs were analyzed.
Secondly, student results were collected on a mathematics test taken annually in the municipality
where the study was conducted. The test results were collected for three groups of students: those
whose teachers participated in (a) BfM, (b) the other PD program, and (c) no PD program.
The study involves two theories. The first is a hypothesis about the difference in student performance
between the three groups. The main parameter is the difference in performance, which is expressed with
a coefficient of determination. The second theory is a model of the CMs used in the two interventions. This
model categorizes the CMs along two dimensions: content focus (the entities) and methods for facilitat-
ing enactment (the activities). As for the former, each CM text is categorized into one of five categories
describing the material’s main content focus (see above section Integration in TBE). As for the latter,
each material is categorized using one of Kennedy’s (2016) previously mentioned categories.
The main reason for using these two theories is that the two PD programs are, according to back-
ground theory (Desimone, 2009b), very alike concerning features other than those captured by the
two theories. Yet, the effect hypothesis is supported by the student data, which seems to indicate that
they differ in effect for some grades. The categorical model is employed to explain the differences in
effect size. According to the authors, the two programs differ in both content focus and methods for facil-
itating enactment, and these differences describe the mechanism underlying the group differences in
effect size. The two models are integrated via their overlapping parts, since the CMs are a constitutive
part of the PD program. This is therefore an example in which the effect size alone does provide evidence
that one of the PD programs has a greater effect, but at the same time cannot answer the question, “why
does one of the PD programs have a greater effect?” Instead, categorization is a way of answering this
question by describing the difference between the programs.
Shifting to our mechanism terminology, the effect-size hypothesis describes a causal relationship
between the two PD programs and student performance. Considerations concerning the background theory (the two programs are very similar, which excludes a set of relevant confounders) and the data collection make the counterfactual interpretation of the effect measured in this natural experiment quite plausible. However, in itself, the effect theory does not describe any mechanism. Once the two theories
are integrated, a mechanism is specified, and this mechanism seems to cohere with both normative mechanism claims. The integrated theory describes a pathway (CM with a certain content focus
and method for facilitating enactment → teacher knowledge → teaching → student results) and each
of these arrows can be described as an intervening factor with a differential effect (all the edges
except one are compared with contrasts). Moreover, the theory specifies which entities and activities
explain the outcome. Hence, the mechanism of this program theory is a mechanism according to both
the causal pathway and the entities-and-activities claim. Most importantly, this mechanism is the
product of theory integration. The example of theory integration we describe here coheres with what the literature on multi-method research describes as the integration of between-case analysis (the effect study) and within-case analysis (the theory of CMs). Our contribution consists in highlighting the pervasive role of theory in mechanism-oriented integration.
If we put this together with the considerations drawn in the previous section, we obtain our main
conclusion: our concept of mechanism entails that TBE is not only theory-oriented (i.e., a theory is
the main epistemic aim), but also theory-driven (theoretical approaches are the main ways of achiev-
ing the aims of TBE). Theory integration appears to be an important driver of theoretical elements in
the process of constructing a mechanistic theory. We conclude this section by discussing some pos-
sible complications of this process.
First of all, theory integration can be affected by problems regarding background theory that are
similar to those affecting data integration. In our case, the overlap between the two models given by
CMs is a sufficient point of contact for model combination. However, sometimes models will not
share any point of contact, which will require an ad hoc bridge (see Figure 2). This bridge might
sometimes be inherited from some background theory, but in many cases, such a theoretical resource will not be available. In such cases, the situation will be the same as that in data integration. It may be that the specific case will allow for the development of an ad hoc bridge theory through a within-case study (in the same way as we did in our project), but this is not necessarily always the case.

Figure 2. A diagrammatic representation of theory integration. Source: Models I and II are integrated into Model III. As background theory is insufficient for model coupling, an ad hoc bridge (the dashed arrow in the diagram) is necessary for integration.
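As a rough illustration of Figure 2, theories can be represented as small directed graphs whose nodes are theoretical constructs and whose edges are explanatory claims. The following Python sketch is ours; the node labels are simplified stand-ins for the constructs discussed above, and the bridge mechanism is a deliberate idealization.

def integrate(theory_a, theory_b, bridge=None):
    """Merge two theory graphs (sets of directed edges). If the theories
    share no construct, an ad hoc bridge edge (the dashed arrow in
    Figure 2) must be supplied explicitly.
    """
    nodes_a = {node for edge in theory_a for node in edge}
    nodes_b = {node for edge in theory_b for node in edge}
    if nodes_a & nodes_b:
        return theory_a | theory_b  # shared constructs: direct integration
    if bridge is None:
        raise ValueError("no shared construct: an ad hoc bridge is required")
    return theory_a | theory_b | {bridge}

# Model I: the effect theory (the PD program, via its CMs, affects student results).
model_i = {("CMs", "student_results")}
# Model II: the categorization of the CMs along two dimensions.
model_ii = {("CMs", "content_focus"), ("CMs", "enactment_approach")}

# The CMs are the shared constitutive part, so no ad hoc bridge is needed here.
model_iii = integrate(model_i, model_ii)
print(sorted(model_iii))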
The second problematic issue concerns scope. Usually, if data integration is possible, then there is
no problem with theories having different scopes (more or less general), since data integration
requires that the different data subsets cohere in some fashion at the unit level. In the case of theory integration, there is no such guarantee. With enough goodwill and patience, models that greatly differ in scope can be integrated with one another. However, if one of the models rests on an empirical base that is “too small,” it is at least questionable whether the two models have the same strength.
Such a problem might emerge, for instance, if we attempt to integrate the model described in
Lindvall et al. (2018) with the one in Steenbrugge et al. (2018). These two models are significantly
different in size, which might imply that the scope of their main claims would also differ. Therefore,
integrating theories entails integrating claims: if the claims are not easily integrated, the theories will
not be either.

Conclusion
We have used conceptual analysis in this paper to discuss how data and theory integration can con-
tribute to the construction and refinement of a mechanistic program theory. Our conclusions can be
summarized in this way:

1. Mechanisms should be understood in TBE as both networks of counterfactual relations and networks of entities and activities. Both views are necessary for the goals of TBE.
2. The mechanistic contribution of data integration is mainly to provide information about the
different parts of a program and to enable comparisons between these parts. However, inte-
gration of data does not in itself provide knowledge about mechanisms.
3. The analysis of integrated data sets can result in the construction or refinement of a mecha-
nistic program theory, but the main drivers of this theory are theoretical. Mechanisms are
described either by integrating theories or by applying further theoretical resources and
theory-driven approaches to fill the gaps and build bridges in the initial program theory.
4. Theory integration seems therefore to be one of the main drivers of mechanistic program the-
ories, along with the focus on specifying the entities and activities involved in contributing to
the program outcome. This focus on theoretical specification is pervasive in both data integra-
tion and theory integration.
5. It has been suggested that the causal pathway claim exhausts the concept of mechanism,
making the entities-and-activities claim redundant (Marchionni & Reijula, 2019). As a con-
sequence of our discussion in this paper, we conclude that the entities and activities are just as
fundamental, defining parts of all mechanisms.
6. The causal pathway claim has also been criticized as involving a host of methodological prob-
lems (Beach, 2016). Our discussion of theory integration indicates that the analysis of coun-
terfactuals is not an insurmountable obstacle for TBE, even when the data do not allow using
statistical tools.

Our proposed way of understanding mechanisms and our methodological discussion should not be
interpreted as entailing that any methodological approach that does not focus on both the counterfac-
tual and the theoretical dimension of the mechanism is flawed. Many methodological approaches to
TBE will focus on only one of these dimensions and can still be described as focusing on the mech-
anism. Our claim is rather that any conclusion based only on counterfactual analysis, or only on the specification of entities and activities, that claims to have identified a program’s underlying mechanism should be considered incomplete.
Our paper contributes to the literature on TBE and on evaluation research in general in several
ways. Firstly, we provide a detailed view of the complexity involved in the epistemic goal of con-
structing mechanistic program theories. Integration is a complex process that contributes to this
goal in various ways. In particular, the difference between data and theory integration, and the methodological importance of theory in mixed methods, have not received sufficient attention.
Secondly, it has been argued that concrete examples of how TBE can be conducted in practice
(e.g., reports of successes and failures, analytical techniques, evaluation effects) are “seriously
needed in the published literature” (Coryn et al., 2011). In this article, we have provided practical
descriptions of approaches used in a project striving to conduct a TBE of a specific educational inter-
vention: a large-scale teacher PD program.
Thirdly, and as illustrated in our examples and conceptual arguments, TBEs of educational
interventions involve many obstacles to overcome. For example, conducting a TBE using a mixed-
methods design requires expertise in multiple methodologies, and it has been argued that few
scholars have the expertise to implement several different methodologies to a high standard,
whereby it is recommended that “teams of researchers with expertise in different methods, all
working together” be involved when studying educational interventions (Desimone, 2009a,
p. 172). However, as demonstrated in this study, the challenges are related not only to the use of dif-
ferent methodologies but also to how these methodologies can or should be integrated. We stress the importance of attending to both data integration and theory integration, a distinction that seems to have been neglected in the literature on TBE. In other words, we argue that the proposed research teams need not only expertise in certain methods for data collection and analysis, but also expertise in strategies for integrating data as well as theories.

Declaration of Conflicting Interests


The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication
of this article.

Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Vetenskapsrådet (grant number 2014-2008).

ORCID iD
Jannika Lindvall https://orcid.org/0000-0003-2964-6297

Notes
1. Whereas this is what we have observed about the literature on TBE, it is important to remark that the field of
multi-method and process-tracing research has discussed the relation between integrative methods and mech-
anisms in detail (see the referred literature above). We choose not to include a separate section on this topic
here to avoid this paper becoming too long. Instead, we will return to the literature on multi-method and
process-tracing research in the following sections.
2. The BfM includes PD for teachers in preschool, primary through upper secondary school (Grades 1–12), and
adult education. In the research project, however, emphasis was put on the PD program provided to teachers
in Grades 1–9.
3. These materials are available at https://larportalen.skolverket.se/#/moduler/1-matematik/Grundskola/alla.
4. The field of data science has devoted attention to the issue of acquiring knowledge of causal relations by data-driven methods. This is the object of some approaches to machine learning, sometimes labelled causal learning algorithms (Glymour, Zhang, & Spirtes, 2019). Although the application of these algorithms is an object of discussion, we cannot, for reasons of space, discuss this issue in this paper.
5. Mahoney discusses a taxonomy of tests (straw-in-the-wind, hoop, smoking-gun, and doubly decisive) describing different levels of severity, which is widely used in process-tracing methodology (e.g., Beach & Brun Pedersen, 2013; Rohlfing, 2012). Some examples: a hoop test is such that passing the test, that is, observing events consistent with the derived observable consequences, only supports (but does not warrant) the hypothesis, whereas a failed test warrants the negation of the hypothesis. Passing a smoking-gun test entails that the claim is warranted, but failing it only supports (but does not warrant) the negation of the hypothesis.
6. Observing a high level of engagement in teachers’ discussions of abstract material would have warranted
rejecting Z as a mediating mechanism. However, passing the test only weakly supports our theory.
Therefore, the test constructed in this case is a hoop test. Here, it is also important to remark that, although in our specific case the possibility of testing the counterfactual connections relies on comparisons between different collegial discussions (in process-tracing terms, between-case analysis), this is not necessarily the case. As Mahoney (2015) points out, assessing claims by means of between-case comparisons might not always be possible in historical research, which instead requires deriving observational consequences that are relevant for the single case that is examined. In our case, we could have looked for observational consequences of the counterfactual within the same collegial discussion, for instance by looking at teachers’ utterances indicating that more concrete materials did in fact make it easier for them to engage in the discussion.

References
Astbury, B., & Leeuw, F. L. (2010). Unpacking black boxes: Mechanisms and theory building in evaluation.
American Journal of Evaluation, 31(3), 363–381. https://doi.org/10.1177/1098214010371972
Bazeley, P. (2012). Integrative analysis strategies for mixed data sources. American Behavioral Scientist, 56(6),
814–828. https://doi.org/10.1177/0002764211426330
Bazeley, P. (2017). Integrating analyses in mixed methods research (1st ed.). London: Sage Publications Ltd.
Bazeley, P., & Kemp, L. (2012). Mosaics, triangles, and DNA: Metaphors for integrated analysis in mixed methods
research. Journal of Mixed Methods Research, 6(1), 55–72. https://doi.org/10.1177/1558689811419514
Beach, D. (2016). What are we actually tracing? Process tracing and the benefits of conceptualizing causal mecha-
nisms as systems. Qualitative & Multi-Method Research, 14(1/2), 15–22. https://doi.org/10.5281/zenodo.823306
Beach, D. (2020). Multi-method research in the social sciences: A review of recent frameworks and a way
forward. Government and Opposition, 55(1), 163–182. https://doi.org/10.1017/gov.2018.53
Beach, D., & Brun Pedersen, R. (2013). Process-tracing methods: Foundations and guidelines. Ann Arbor:
University of Michigan Press.
Caracelli, V. J., & Greene, J. C. (1993). Data analysis strategies for mixed-method evaluation designs.
Educational Evaluation and Policy Analysis, 15(2), 195–207. https://doi.org/10.2307/1164421
Cartwright, N., & Stegenga, J. (2012). A theory of evidence for evidence-based policy. Oxford: Oxford
University Press/The British Academy.
Chen, H. (1997). Applying mixed methods under the framework of theory-driven evaluations. New Directions
for Evaluation, 1997(74), 61–72. https://doi.org/10.1002/ev.1072
Chen, H., & Rossi, P. H. (1989). Issues in the theory-driven perspective. Evaluation and Program Planning,
12(4), 299–306. https://doi.org/10.1016/0149-7189(89)90046-3
Chen, H. T. (1989). The conceptual framework of the theory-driven perspective. Evaluation and Program
Planning, 12(4), 391–396. https://doi.org/10.1016/0149-7189(89)90057-8
Chen, H. T. (2006). A theory-driven evaluation perspective on mixed methods research. Research in the Schools,
13(1), 75–83.
Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schröter, D. C. (2011). A systematic review of theory-driven
evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226. https://doi.org/10.
1177/1098214010389321
Cronin, A., Alexander, V. D., Fielding, J., & Moran-Ellis, J. (2008). The analytic integration of qualitative data
sources. In P. Alasuutari, L. Bickman, & J. Brannen (Eds.), The SAGE handbook of social research methods
(pp. 572–584). London: Sage Publications Ltd.
Dalkin, S. M., Greenhalgh, J., Jones, D., Cunningham, B., & Lhussier, M. (2015). What’s in a mechanism?
Development of a key concept in realist evaluation. Implementation Science, 10(1), 1–7. https://doi.org/
10.1186/s13012-015-0237-x
Desimone, L. M. (2009a). Complementary methods for policy research. In D. Plank, G. Sykes, & B. Schneider
(Eds.), Handbook of education policy research (pp. 163–175). Abingdon: Routledge.
Desimone, L. M. (2009b). Improving impact studies of teachers’ professional development: Toward better con-
ceptualizations and measures. Educational Researcher, 38(3), 181–199. https://doi.org/10.3102/
0013189X08331140
Fetters, M. D., Curry, L. A., & Creswell, J. W. (2013). Achieving integration in mixed methods
designs-principles and practices. Health Services Research, 48(6 Pt 2), 2134–2156. https://doi.org/10.
1111/1475-6773.12117
Fetters, M. D., & Freshwater, D. (2015). The 1 + 1 = 3 integration challenge. Journal of Mixed Methods
Research, 9(2), 115–117. https://doi.org/10.1177/1558689815581222
Fetters, M. D., & Molina-Azorin, J. F. (2017). The journal of mixed methods research starts a new decade: The
mixed methods research integration trilogy and its dimensions. Journal of Mixed Methods Research, 11(3),
291–307. https://doi.org/10.1177/1558689817714066
Funnell, S. C., & Rogers, P. J. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco: Jossey-Bass.
Glymour, C., Zhang, K., & Spirtes, P. (2019). Review of causal discovery methods based on graphical models.
Frontiers in Genetics, 10(524), 1–15. https://www.frontiersin.org/article/10.3389/fgene.2019.00524
Goertz, G. (2017). Multimethod research, causal mechanisms, and case studies. Princeton: Princeton University Press.
Goertz, G., & Mahoney, J. (2012). A tale of two cultures: Qualitative and quantitative research in the social
sciences. Princeton: Princeton University Press.
Harris, T. (2003). Data models and the acquisition and manipulation of data. Philosophy of Science, 70(5), 1508–
1517. https://doi.org/10.1086/377426
Hesse-Biber, S. N., & Johnson, R. B. (2015). The Oxford handbook of multimethod and mixed methods research
inquiry. Oxford: Oxford University Press.
Humphreys, M., & Jacobs, A. M. (2015). Mixing methods: A Bayesian approach. American Political Science
Review, 109(4), 653–673. https://doi.org/10.1017/S0003055415000453
Illari, P. M., & Williamson, J. (2012). What is a mechanism? Thinking about mechanisms across the sciences.
European Journal for Philosophy of Science, 2(1), 119–135. https://doi.org/10.1007/s13194-011-0038-2
Insulander, E., Brehmer, D., & Ryve, A. (2019). Teacher agency in professional development programmes – A
case study of professional development material and collegial discussion. Learning, Culture and Social
Interaction, 23, 1–9. https://doi.org/10.1016/j.lcsi.2019.100330
Kennedy, M. M. (2016). How does professional development improve teaching? Review of Educational
Research, 86(4), 945–980. https://doi.org/10.3102/0034654315626800
Killoran, A., & Kelly, M. P. (2010). Evidence-based public health: Effectiveness and efficiency. Oxford: Oxford
University Press.
Lemire, S., Kwako, A., Nielsen, S. B., Christie, C. A., Donaldson, S. I., & Leeuw, F. L. (2020). What is this thing
called a mechanism? Findings from a review of realist evaluations. New Directions for Evaluation,
2020(167), 73–86. https://doi.org/10.1002/ev.20428
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills: Sage Publications.
Lindvall, J., Helenius, O., & Wiberg, M. (2018). Critical features of professional development programs:
Comparing content focus and impact of two large-scale programs. Teaching and Teacher Education, 70,
121–131. https://doi.org/10.1016/j.tate.2017.11.013
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about mechanisms. Philosophy of Science, 67(1),
1–25. https://doi.org/10.1086/392759
Mahoney, J. (2015). Process tracing and historical explanation. Security Studies, 24(2), 200–218. https://doi.org/
10.1080/09636412.2015.1036610
Marchionni, C., & Reijula, S. (2019). What is mechanistic evidence, and why do we need it for evidence-based
policy? Studies in History and Philosophy of Science, 73, 54–63. https://doi.org/10.1016/j.shpsa.2018.08.003
Moran-Ellis, J., Alexander, V. D., Cronin, A., Dickinson, M., Fielding, J., Sleney, J., & Thomas, H. (2006).
Triangulation and integration: Processes, claims and implications. Qualitative Research, 6(1), 45–59.
https://doi.org/10.1177/1468794106058870
Moseholm, E., & Fetters, M. D. (2017). Conceptual models to guide integration during analysis in convergent
mixed methods studies. Methodological Innovations, 10(2), 1–11. https://doi.org/10.1177/2059799117703118
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London: Sage.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Rios, D. (2004). Mechanistic explanations in the social sciences. Current Sociology, 52(1), 75–89. https://doi.
org/10.1177/0011392104039315
Rogers, P. J., & Weiss, C. H. (2007). Theory-based evaluation: Reflections ten years on. New Directions for
Evaluation, 2007(114), 63–81. https://doi.org/10.1002/ev.225
Rohlfing, I. (2012). Case studies and causal inference: An integrative framework. Basingstoke: Palgrave
Macmillan.
Runhardt, R. W. (2015). Evidence for causal mechanisms in social science: Recommendations from
Woodward’s manipulability theory of causation. Philosophy of Science, 82(5), 1296–1307. https://doi.org/
10.1086/683679
Runhardt, R. W. (2020). Concrete counterfactual tests for process-tracing. Preprint retrieved 2022 June 15 from
https://doi.org/10.33774/apsa-2020-0vhbb.
Schieber, A.-C., Kelly-Irving, M., Génolini, J.-P., Membrado, M., Tanguy, L., Fabre, C., Marchand, P., & Lang,
T. (2017). Integrating multidisciplinary results to produce new knowledge about the physician–patient rela-
tionship: A methodology applied to the INTERMEDE project. Journal of Mixed Methods Research, 11(2),
174–201. https://doi.org/10.1177/1558689815588643
Schmitt, J. (2020). The causal mechanism claim in evaluation: Does the prophecy fulfill? New Directions for Evaluation, 2020(167), 11–26. https://doi.org/10.1002/ev.20421
Schmitt, J., & Beach, D. (2015). The contribution of process tracing to theory-based evaluations of complex aid
instruments. Evaluation, 21(4), 429–447. https://doi.org/10.1177/1356389015607739
Seawright, J. (2016). Multi-method social science: Combining qualitative and quantitative tools. Cambridge:
Cambridge University Press.
Steenbrugge, H. V., Larsson, M., Insulander, E., & Ryve, A. (2018). Curriculum support for teachers’ negoti-
ation of meaning: A collective perspective. In L. Fan, L. Trouche, C. Qi, S. Rezat, & J. Viesnovska (Eds.),
Research on mathematics textbooks and Teachers’ resources (pp. 167–191). Cham: Springer.
Tilley, N. (2000). Realistic evaluation: An overview. Retrieved 2022 June 15 from https://www.researchgate.net/
publication/252160435_Realistic_Evaluation_An_Overview.
Walkington, C., & Marder, M. (2018). Using the UTeach Observation Protocol (UTOP) to understand the
quality of mathematics instruction. ZDM, 50(3), 507–519. https://doi.org/10.1007/s11858-018-0923-7
Weiss, C. H. (1997). Theory-based evaluation: Past, present, and future. New Directions for Evaluation,
1997(76), 41–55. https://doi.org/10.1002/ev.1086
Weller, N., & Barnes, J. (2014). Finding pathways: Mixed-method research for studying causal mechanisms. Cambridge: Cambridge University Press.
Wenger, E. (2000). Communities of practice: Learning, meaning, and identity. Cambridge: Cambridge
University Press.
Wenger, E., McDermott, R. A., & Snyder, W. (2002). Cultivating communities of practice: A guide to managing
knowledge. Cambridge MA: Harvard Business Press.
White, H. (2008). Of probits and participation: The use of mixed methods in quantitative impact evaluation. IDS
Bulletin, 39(1), 98–109. https://doi.org/10.1111/j.1759-5436.2008.tb00436.x
Woodward, J. (2005). Making things happen: A theory of causal explanation. Oxford: Oxford University Press.
