Evaluation Review, Vol. 27 No. 3, June 2003 336-354
DOI: 10.1177/0193841X03252516
© 2003 Sage Publications
The online version of this article can be found at http://erx.sagepub.com/content/27/3/336

ETHICAL PRACTICE AND EVALUATION OF
INTERVENTIONS IN CRIME AND JUSTICE
The Moral Imperative for Randomized Trials

DAVID WEISBURD
The Hebrew University
The University of Maryland

In considering the ethical dilemmas associated with randomized experiments, scholars ordi-
narily focus on the ways in which randomization of treatments or interventions violates accepted
norms of conduct of social science research more generally or evaluation of crime and justice
questions more specifically. The weight of ethical judgment is thus put on experimental research
to justify meeting ethical standards. In this article, it is argued that just the opposite should be
true, and that in fact there is a moral imperative for the conduct of randomized experiments in
crime and justice. That imperative develops from our professional obligation to provide valid
answers to questions about the effectiveness of treatments, practices, and programs. It is sup-
ported by a statistical argument that makes randomized experiments the preferred method for
ruling out alternative causes of the outcomes observed. Common objections to experimentation
are reviewed and found overall to relate more to the failure to institutionalize experimentation
than to any inherent limitations in the experimental method and its application in crime and jus-
tice settings. It is argued that the failure of crime and justice practitioners, funders, and evalua-
tors to develop a comprehensive infrastructure for experimental evaluation represents a serious
violation of professional standards.

Keywords: experiments; randomized experiments; experimental criminology; criminal justice evaluation; ethical practice

There is a long history of debate regarding whether the conduct of ran-
domized experiments on crime and justice issues meets basic ethical require-
ments for conducting research (e.g., see Baunach 1980; Bergstrom 1985;
Boruch 1975; Boruch, Victor, and Cecil 2000; Clarke and Cornish 1972;
Esbensen 1991; Erez 1986; Geis 1967; Graebsch 2000; Lempert and Visher
1988; Sieber 1982; White and Pezzino 1986). Ordinarily, such discussions
revolve around whether the random allocation of sanctions, programs, or
treatments in criminal justice settings can be justified on the basis of the bene-
fits accrued to society. Or conversely, whether the potential costs of not
providing treatment to some offenders (either in terms of harm to them or to
the principles of equity and justice in the criminal justice system) is out-
weighed by those benefits. The ethical dilemmas in this case are focused on
the ways in which randomization of treatments or interventions violates
accepted norms of conduct of social science research more generally or eval-
uation of crime and justice questions more specifically. The weight of ethical
judgment is thus put on experimental research to justify meeting ethical stan-
dards. Although there is now a large and established literature illustrating that
such ethical barriers can be overcome and that randomized experiments are
appropriate in a very diverse group of circumstances and across much of the
decision making in the criminal justice system (e.g., see Boruch, Snyder, and
DeMoya 2000; Dennis 1988; Farrington 1983; Feder and Boruch 2000;
Petrosino 1997, 1998; Petrosino et al. 2001; Weisburd 2000; Weisburd,
Sherman, and Petrosino 1990), there nonetheless remains a presumption
among crime and justice evaluators that ethical issues weigh more heavily for
experimental than for nonexperimental research.
In this article, I will argue that just the opposite should be true, and that the
failure of crime and justice funders and evaluators to develop a comprehen-
sive infrastructure for experimental evaluation represents a serious violation
of professional standards. In criminological and criminal justice study, as
opposed to medical research, there is, as Jonathan Shepherd (2003 [this
issue]) notes, a comparative famine of randomized trials. In medicine, exper-
imental studies are the preferred mode for discovering whether treatments
have positive benefit, and there have been hundreds of thousands of experi-
ments conducted during the past half a century (Petrosino et al. 2001;
Shepherd 2003). In crime and justice, experimentation is still the exception
rather than the norm, and probably fewer than 1,000 experiments have been con-
ducted during the same period (Petrosino et al. 2001). Although there are
some studies that suggest that the number of experiments in the social sci-
ences has been increasing (Boruch, Snyder, and DeMoya 2000), as Joel Gar-
ner and Christy Visher (2003 [this issue]) illustrate, there has been a decline
in funding of experimental study in crime and justice in the United States dur-
ing the past decade, at least for the largest funder of criminal justice
research—the National Institute of Justice. And this trend does not seem to be
limited to the United States. As Christopher Nuttall (2003 [this issue])
reports, after a short period of interest in experimental research more than a
quarter century ago, the British Home Office stopped conducting random-
ized studies altogether.
This famine in randomized evaluations of crime and justice programs and
practices is all the more striking given the widespread acceptance that
experiments provide more valid answers to policy questions than do
nonexperimental studies (e.g., see Boruch, Snyder, and DeMoya 2000;
Campbell and Boruch 1975; Cook and Campbell 1979; Farrington 1983;
Feder, Jolin, and Feyerherm 2000; Flay and Best 1982; Shadish, Cook, and
Campbell 2002; Weisburd, Lum, and Petrosino 2001; Wilkinson and Task
Force on Statistical Inference 1999). The major problem that faces evaluation
researchers in crime and justice (and indeed, in the social sciences more gen-
erally) is that causes and effects are extremely difficult to isolate in the com-
plex social world in which treatments and programs are implemented. When
we find that some people, institutions, or places do better after treatment, we
are always confronted with the challenge that they did better not because of
treatment but because of some other confounding factor that was not mea-
sured. Sometimes that confounding factor derives from the nature of the
selection processes that lead some people to gain treatment. For example, if a
program relies on volunteers, it may recruit people who are likely to improve
in regard to drug use, crime, or other measures simply because they were
ready and motivated to improve when they volunteered. Sometimes the con-
founding is simply a matter of the natural course of events. For example,
when we choose sites for intervention, we often do so because they have very
serious problems. Accordingly, what appears to be a treatment effect may
simply be a natural regression of such problems to a more moderate level.
The task of isolating the effects of treatments or programs from other con-
founding aspects of selection or design is the researcher’s most significant
challenge in coming to a valid policy conclusion. Through randomization of
treatment and control or comparison conditions, the researcher can assume
that such threats to valid conclusions are distributed equally between the
treatment and control conditions (a point I will examine in detail in the next
section). Accordingly, in a randomized study the effect of treatment is disen-
tangled from the confounding effects of other factors. In nonrandomized
studies, natural or statistical controls must be identified for such threats to
valid conclusions. If such threats are unknown or unmeasured, they will
affect the observed outcomes in ways that are difficult to predict at the outset,
and thus the researcher can never be sure whether the observed effects on sub-
jects are the result of treatment or due to unmeasured or unknown
causes. Randomized studies are thus the most powerful tool that crime and
justice evaluators have for making valid conclusions about whether programs
or treatments are effective.
For this reason, Robert Boruch (1975) argues that the failure to use ran-
domized experiments represents a serious ethical violation for researchers:

A related line of argument here is that a failure to discover whether a program is effective
is unethical. That is, if one relies solely on nonrandomized assessments to make judg-
ments about the efficacy of a program, subsequent decisions may be entirely inappropri-
ate. Insofar as a failure to obtain unequivocal data on effects then leads to decisions
which are wrong and ultimately damaging, that failure may violate good standards of
both social and professional ethics. Even if the decisions are “correct” in the sense of
coinciding with those one might make based on randomized experiment data, ethical
problems persist. The right action taken for the wrong reason is not especially attractive
if we are to learn anything about how to effectively handle the child abuser, the chroni-
cally ill, the poorly trained, and so forth. (P. 135)

From an ethical standpoint, isolating the effects of treatment is one of the
evaluator’s most important obligations to society. Stating that a certain treat-
ment or protocol is effective when it is not will lead to significant societal
costs, economic and social. Moreover, failing to recognize the harms of treat-
ments or interventions can lead to much suffering on the part of the individu-
als receiving treatment, or of communities that expect benefits rather than
harm from them (McCord forthcoming; Oakley 2000).
In this context, there should be an ethical presumption in favor of random-
ized experiments in crime and justice and not one against them. The contin-
ued use of nonexperimental methods in situations where randomized experi-
mental studies are appropriate represents a serious violation of professional
norms for researchers. Clearly, it is also a violation of the responsibilities of
the criminal justice agencies that develop and call for the evaluation of pro-
grams, and the governmental and nongovernmental agencies that often sup-
port such evaluation research.
In the following pages I will examine and develop my argument regarding
the moral imperative for experimental research. I begin by reviewing the the-
oretical basis for the claim of an advantage of experimental methods over
nonexperimental methods. Although it is common to note that experiments
are better, it is also common to minimize the potential damage that is done
when using nonexperimental approaches. I will then turn to basic arguments
that have been used to challenge experimental research. I will argue that criti-
cism of experimentation in crime and justice is linked more to the failures of
scientists, practitioners, and funders to build an infrastructure for experimen-
tal research than to limitations in the experimental method itself. In conclud-
ing, I return to the ethical failures of current crime and justice evaluation, and
the moral imperative for building a solid base for an experimental study of
crime and justice problems.

THE MORAL IMPERATIVE FOR RANDOMIZED EXPERIMENTS

The moral imperative for experimental research is found in what scholars
define as the internal validity of a study. A research design in which the
effects of a treatment or intervention can be clearly distinguished from other
effects is defined as having high internal validity. A research design in which
the effects of treatment are confounded with other factors is one in which
there is low internal validity. For example, suppose a researcher seeks to
assess the effects of a specific drug treatment program on recidivism. If at the
end of the evaluation the researcher can present study results and confidently
assert that the effects of treatment have been isolated from other confounding
causes, the internal validity of the study is high. But if the researcher has been
unable to ensure that other factors such as the seriousness of prior records or
the social status of offenders have been disentangled from the influence of
treatment, he or she must note that the effects observed for treatment may be
due to such confounding causes. In this case internal validity is low.
In nonrandomized studies, two methods may be used for isolating treatment
or program effects. In the first, generally defined as quasi-experimentation,
researchers either intervene in the research environment to identify valid
comparisons or rely on naturally occurring comparisons for assessing treat-
ment or program impacts. For example, subjects, institutions, or areas that
have and have not received treatment may be matched based on data available
to the researcher. In the second approach, multivariate statistical techniques
are used to isolate the effects of an intervention or treatment from other con-
founding causes. In practice, these two methods are often combined in an
attempt to increase the level of a study’s internal validity.
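The matching logic of the quasi-experimental approach can be made concrete with a short sketch. This is a hypothetical illustration, not a procedure from the article: the function exact_match, the record fields, and the data are assumed names for exposition. Each treated unit is paired with a comparison unit identical on the measured matching variables, and only on those.

```python
from collections import defaultdict

def exact_match(treated, comparison, key):
    """Pair each treated unit with a comparison unit identical on the
    measured matching variables returned by key()."""
    pool = defaultdict(list)
    for unit in comparison:
        pool[key(unit)].append(unit)
    pairs = []
    for unit in treated:
        candidates = pool.get(key(unit))
        if candidates:              # treated units without a match are dropped
            pairs.append((unit, candidates.pop()))
    return pairs

# Hypothetical records: (case id, prior-record band, age band).
treated = [("T1", "0-2 priors", "18-25"), ("T2", "3+ priors", "26-40")]
comparison = [("C1", "0-2 priors", "18-25"), ("C2", "3+ priors", "26-40"),
              ("C3", "0-2 priors", "26-40")]
print(exact_match(treated, comparison, key=lambda r: (r[1], r[2])))
```

The key() argument makes the limitation discussed next visible: units can be equated only on the factors the researcher has thought to measure and collect.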
Overall, nonexperimental methods are thus based on a logic which relies
on what is known about the phenomenon under study to derive valid conclu-
sions. Prior knowledge about groups or individuals under study, for example,
must be used to create treatment and no-treatment comparison groups. Simi-
larly, knowledge about potential confounding causes must be used to identify
factors that may bias the estimate of a program or treatment effect. The prob-
lem with these approaches is that one can never be sure that the comparison
groups are alike on every relevant factor, or that all such factors have been
taken into account through statistical controls. In statistical terms, if we do
not identify all possible confounders in our effort to create equivalence, our
estimates cannot be seen as unbiased.
A short review of the statistical logic here will help to emphasize why this
problem looms so large in making decisions about treatments or programs.
Let us return to our earlier example of the impact of a drug treatment program
on recidivism. If we use a nonexperimental statistical control approach, we
must identify every cause of the outcome (in this case, recidivism) that is also
related to the variable of interest (in our case, drug treatment). We would then
include all of those confounding causes and treatment in a multivariate statis-
tical model. Similarly, if we were to create valid treatment and control com-
parison groups, they would have to be equivalent in regard to these other fac-
tors that influence recidivism.
What happens if we miss a variable that is related to treatment and the out-
come measure? In this case, we have failed to create equivalence for our com-
parison of treatment and no treatment conditions. For example, let us say that
we have completed a study of drug treatment and found that there is a moder-
ate (B = .25) and statistically significant standardized effect of treatment on
recidivism. However, let us also assume that there is a variable Xj which has
an effect on recidivism and is related to treatment but is not measured by the
researcher (either because it was not a known cause, or such data could not be
collected). If we assume that it has a large relationship with recidivism (B2 =
.50) and a large relationship with drug treatment (B = .50), then its inclusion
in the statistical model would lead to an estimated effect of treatment of zero
(see Figure 1). This is because the exclusion of Xj created a bias in the esti-
mate of treatment. This bias can be calculated using standardized coefficients
by simply multiplying the standardized coefficient for the relationship
between drug treatment and Xj, and that of Xj and recidivism. In this example,
absent knowledge of Xj, the researcher would have mistakenly recommended
that the treatment is a useful method for reducing future offending.
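This arithmetic can be verified with a small simulation, a minimal sketch of my own rather than anything reported in the article. It assumes standardized variables, a treatment score T correlated .50 with an unmeasured factor Xj, a true treatment effect of zero, and an effect of Xj on the outcome of .50:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: a standardized treatment score T
# correlated .50 with an unmeasured factor Xj; Xj affects the outcome
# (B2 = .50) while the true effect of treatment is zero.
xj = rng.standard_normal(n)
t = 0.5 * xj + np.sqrt(0.75) * rng.standard_normal(n)   # corr(T, Xj) = .50
y = 0.5 * xj + np.sqrt(0.75) * rng.standard_normal(n)   # true treatment effect = 0

def std_coefs(y, X):
    """Standardized (beta) OLS coefficients of y on the columns of X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

print(std_coefs(y, t[:, None]).round(2))                # ~[0.25]: Xj omitted
print(std_coefs(y, np.column_stack([t, xj])).round(2))  # ~[0.0, 0.5]: Xj included
```

Omitting Xj yields the spurious standardized treatment effect of .25 (.50 × .50), exactly the bias just described; including it returns the treatment estimate to zero.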
Accordingly, at the root of nonexperimental evaluation lies an assumption
about our ability to identify causes of the outcome examined that are related
to the treatment or program conditions. We must assume that we can identify
all such causes because any unknown or unmeasured cause could lead us to
underestimate or overestimate the effect of treatment. This is a very high bar
to expect in any evaluation effort. It is a particularly problematic assumption
in an area where our understanding of causal mechanisms is still in an early
stage of development. The variance not explained in traditional causal mod-
eling in criminology is often substantial and thus must loom as a major threat
to the validity of any nonexperimental approach to evaluation.
If we did not have another method for developing valid conclusions
regarding treatments and programs, the scientific model would be to try to
create incremental advances in our understanding of crime and justice
causes. And indeed, this is the approach that social scientists take in develop-
ing broad models of individual or organizational behavior. However, ran-
domized experimental methods do offer another method for gaining high
internal validity in crime and justice evaluation. In this context, one would
expect a presumption for use of experimental approaches. Indeed, following
our earlier logic, there would be a moral imperative for the use of experimen-
tal methods.

Figure 1: Example of the Bias in the Estimate of a Treatment Effect Caused by
the Exclusion of an Unknown or Unmeasured Factor (Xj). Panel A: with Xj
unmeasured and excluded from the model, the estimated effect of treatment on
the outcome is B1 = .25. Panel B: with Xj included, Xj is related to treatment
(B = .50) and to the outcome (B2 = .50), and the estimated effect of treatment
is B1 = .00.

In randomized experiments, valid conclusions about treatment and pro-
gram outcomes are not gained through knowledge of the varied causes of the
outcome examined. Indeed, the experimental approach is a naive approach. It
makes such causes unimportant to the evaluation question posed. When the
researcher randomizes individuals or other units to treatment and control or
comparison conditions, the relationship between treatment and other causes
of outcomes can be theoretically assumed to be zero.1 That is, if treatment is
applied to each subject or unit of analysis randomly, there is no reason to
expect that there is any systematic relationship between treatment and any
other potential cause of change in the outcome of interest. This being the
case, the bias of excluded measures that we illustrated earlier is not relevant
to randomized trials. When we multiply the theoretical effect of that excluded
measure on outcome by that of its relationship to treatment, we will in theory
gain a zero bias. In this sense, the treatment group and the nontreatment group
can be assumed to be equivalent in that the decision to place one in a treat-
ment or nontreatment group does not involve any systematic choice or bias.
However, once treatment is applied to those in the treatment group, there is a
systematic difference between treatment and nontreatment conditions. The
treatment group is subject to an intervention, whereas the control group is
not. Of course, this is precisely the difference that the researcher seeks to
examine in the first place.
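The same hypothetical simulation, rerun with coin-flip assignment (again my sketch, not the article's), illustrates the point: the simple treatment-control contrast recovers the true effect even though Xj is never measured.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Same unmeasured factor Xj, but treatment is now assigned by coin flip,
# so its relationship with Xj is zero in expectation.
xj = rng.standard_normal(n)
t = rng.integers(0, 2, size=n)                      # random assignment
y = 0.25 * t + 0.5 * xj + rng.standard_normal(n)    # assumed true effect = .25

# A simple treatment-control contrast, with Xj never measured:
print(round(y[t == 1].mean() - y[t == 0].mean(), 2))  # ~0.25, unbiased
print(round(float(np.corrcoef(t, xj)[0, 1]), 3))      # ~0.0
```

Because randomization breaks the link between treatment and Xj, the excluded-variable bias term is zero in expectation, as the near-zero correlation shows.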
If we knew all causes of the outcome examined that were related to treat-
ment, experimental and nonexperimental approaches would be equally valid
for coming to conclusions about outcomes. Indeed, in this scenario one
would likely choose the informed approach of nonexperimental studies
because they allow the researcher to examine the effects of treatment in the
context of the broad array of factors that influence outcomes (see Heckman
and Smith 1995). However, in practice there is no justification, given present
knowledge, for taking this approach in crime and justice. The likelihood of
unknown or unmeasured threats to the validity of the observations made by
crime and justice researchers is so high that there is a clear moral imperative
for the use of randomized experimental methods.

ETHICAL AND PRACTICAL BARRIERS TO EXPERIMENTATION
AND THE STRUCTURE OF CRIME AND JUSTICE RESEARCH

I have established so far the basis for the moral imperative for experi-
mental research, but I have not examined the main arguments against exper-
imental study. Although there is a large literature dealing with the barriers to
experimentation (e.g., see Bauman, Tsui, and Viadro 1994; Dunford 1990;
Feder, Jolin, and Feyerherm, 2000; Heckman and Smith 1995; Mitroff 1983;
Morrison 2001; Petersilia 1989; White and Pezzino 1986), I think it useful to
focus particular attention on concerns raised by Clarke and Cornish (1972)
more than a quarter century ago. Clarke and Cornish were to have a major
effect on the development of experimental research, and their arguments con-
tinue to be central in resistance to experimental methods in criminal justice
(e.g., see Pawson and Tilley 1997). Indeed, as documented by Farrington
(forthcoming) and Nuttall (2003), Clarke and Cornish’s 1972 article on “The
Controlled Trial in Institutional Research” played a central part in ending a
short period of experimental study in England in the late 1960s and early
1970s, and in preventing subsequent development of experimental studies in
England.
Clarke and Cornish’s (1972) critique of experimentation is based on their
experience in implementing a large-scale randomized trial evaluating a thera-
peutic community at the Kingswood Training School. Their status as experi-
menters of course added weight to the substance of their conclusion that the
experimental method represented a pitfall rather than a paradigm for penal
researchers. I will argue below that the concerns raised by Clarke and Cornish
can be seen as part of a failure to institutionalize experimental study in crimi-
nal justice rather than as an inherent limitation in the use of experimental
methods.

CONTRASTING BELIEF SYSTEMS
BETWEEN RESEARCHERS AND PRACTITIONERS

One of the major problems that Clarke and Cornish observed in the experi-
ment in Kingswood was that practitioners began to undermine the experi-
ment by limiting the number of boys who could be considered for random
allocation. This led to the research being extended for a much longer period
than expected, and finally to the research effort being stopped before it had
gained the desired number of cases for study. As Clarke and Cornish (1972)
note, an

important factor in the decrease in number of boys assigned to the school seems to have
been that the area classifying school became increasingly concerned about the advisabil-
ity of sending boys to Kingswood Training School, where, through the operation of ran-
dom allocation, they might not receive the treatment considered to be most suitable. (P. 8)

They concluded that “research workers have often failed to appreciate that
evaluation in the penal field poses particular ethical problems, of a complex-
ity not usually encountered in medical research” (1972, 8).
It is clear that there are special ethical questions posed by the random allo-
cation of criminal justice treatments or programs that may not apply in medi-
cine or other fields. These are perhaps most significant in the area of coercive
treatments where informed consent may not be desirable or possible. How-
ever, one might ask whether the specific ethical questions raised in the Kings-
wood experiment and more general concerns about ethical implementation
of experiments often have more to do with the contrasting belief systems of
researchers and practitioners than with the ethics of experimentation in crime
and justice.

One of the fundamental principles of experimental research is that experi-
ments should only be conducted if treatment effects are unknown or at least
uncertain (Boruch 1975). Accordingly, if practitioners had known what was
the best treatment for the boys who might be assigned to the Kingswood
experiment, it would indeed have been unethical to randomly allocate them to
the treatment and control conditions. What happened in Kingswood is that
practitioners believed that they knew which treatments worked best.
Researchers, however, thought that the effectiveness of treatment was not
clear. On the basis of their beliefs, researchers implemented a randomized
study. But on the basis of their beliefs, practitioners eventually undermined
the experimental evaluation. In this sense, this pitfall of the Kingswood
experiment is found in a fundamental disjuncture between the perceptions of
researchers and practitioners.
From where does this fundamental difference in beliefs develop? Clarke
and Cornish (1972, 5) note that researchers can “suspend judgment” on treat-
ment until scientific evidence is available, but that practitioners “cannot
afford to be so impartial.” They must make real decisions about people. They
have to use evidence that is currently available, and often rely on clinical
experience. Of course, one of the reasons why practitioners cannot be “so
impartial” is that they are not convinced of the utility of the scientific process
in developing evidence on what works. Similarly, I think it would be fair to
say that researchers often have little respect for the clinical experiences that
are the basis of many practitioner treatment decisions.
Clarke and Cornish’s concerns are thus very relevant to the successful
development of an experimental crime and justice. However, the problem
they raise may be better phrased as a lack of shared perceptions and values
about treatment and methods of learning about treatment than a specific
objective ethical barrier to experimental methods. The question, of course, is:
How can we overcome this disjuncture in the perceptions of researchers and
practitioners? Clarke and Cornish are perhaps correct that this will be diffi-
cult, given present institutional relations between researchers and practitio-
ners in crime and justice.
Shepherd (2003) argues in this light that the famine of criminal justice
research as contrasted with medical study can be traced in good part to the
structural differences between the two fields. He notes that the integrated
roles of university schools of health science where “clinical academics treat
patients, teach undergraduate clinical students and practitioners and carry out
research” (p. 291) facilitates the development of experimental practice in
medicine. Clinical work and academic research are integrated in theory and
practice, and perhaps just as important this model provides for shared
socialization of clinicians and researchers. The medical model does not, of
course, eliminate contrasting perceptions.
However, it helps to create shared norms about treatment and research, and
perhaps forces clinicians and researchers to recognize the value of clinical
and experimental knowledge.
Although it is not clear that the medical model described by Shepherd is
fully appropriate for criminal justice, it is certainly the case that the separa-
tion of academic criminology and criminal justice from crime and justice
practice has hindered the development of experimental research in this field.
In this context, it is perhaps not surprising that the most successful wide-scale
application of experimental methods in criminal justice developed when aca-
demic researchers were integrated within a criminal justice agency. This was
the case with the California Youth Authority in the 1960s and 1970s, when it
was responsible for fostering a series of randomized studies (Palmer and
Petrosino 2003 [this issue]). More generally, I think it fair to say that the
moral imperative of experimental research is challenged when there is not a
shared set of beliefs about treatment and evaluation methods among practi-
tioners and researchers. Indeed, the moral imperative for experimentation
relies on our belief that something important must be learned. When those
involved in experimentation believe that the answer is already known, the
justification for experimental study is lost.

THE SELF-SELECTION OF EXPERIMENTAL SITES

A second concern raised by Clarke and Cornish refers to the difficulty of
generalizing from experimental studies. They note that the special nature of
the specific institutional settings that were part of the Kingswood experiment
was difficult, if not impossible, to disentangle from the treatment itself. For
example, evaluation of the experimental process revealed “the very many
ways in which the houses differed in addition to preferred treatment tech-
niques” (1972, 15). Of course, this criticism can be raised with any evaluation
study, and indeed is a problem more generally for crime and justice study. As
Farrington (forthcoming) notes in a recent review of the Clarke and Cornish
criticisms:

The generalization problem applies to all kinds of evaluation research. Any single pro-
gram is unique in some respects but representative in other respects. However, it is gener-
ally true that randomized experiments maximize internal validity, and that external valid-
ity and factors that moderate the impact of interventions need to be addressed in
replications. It is important to accumulate the results of numerous evaluations in system-
atic reviews and meta-analyses, to see how far results can be generalized, and how far
they are influenced by specific features of the program or the evaluation.

But perhaps more important in this regard is the concern that institutions
that agree to experimentation are themselves a self-selected group, and thus
what we learn from experimental trials tells us little about the operations of
treatment and their outcomes in the real world. As Clarke and Cornish (1972)
write,

With deeper knowledge of the treatment situation it became obvious that there was no
good reason for believing that the houses being studied were representative either of the
class of all possible “therapeutic communities” in approved schools, in the one case, or of
all actual “adult directed” house regimes in the other. Each of the houses studied was in
some respects unique. (P. 14)

It might be argued that all evaluation studies suffer to some degree from
this problem, based on the fact that there are factors that lead some communi-
ties or organizations to participate in research and others not. Nonetheless, it
is clearly the case that it is more difficult to gain institutional consent for a
randomized study than for one that is not randomized, because randomiza-
tion involves much more intrusion into the operations of the institution
affected (see also Eck 2002).
In this sense, experimental studies are likely to have a lower level of what
is commonly termed external validity than are nonexperimental evaluations.
But again, the problem may not lie primarily in the experimental method but
rather in the institutional relationships that lead to the development of
research studies. The history of experimental study in the California Youth
Authority (CYA) sheds light on this question (see Palmer and Petrosino
2003). External funding rather than problems of researcher access to institu-
tional participants appeared to play the central role in defining the methodol-
ogies used in carrying out CYA research. The National Institute of Mental
Health (NIMH), with its strong connection to medical and public health insti-
tutions, recognized the value and importance of experimental studies.
Accordingly, during the period when the NIMH was a primary funder of
CYA studies, randomized experimental evaluations were common. How-
ever, when NIMH funding ended, and CYA researchers were forced to look
to the newly established Law Enforcement Assistance Administration and
state and local criminal justice agencies for research support, there was no
longer recognition of the importance of experimental methods. Indeed, such
agencies provided “little opportunity and incentive for researchers who
might have wished to utilize randomized trials” (Palmer and Petrosino 2003,
224).

A similar history of feast and famine is described by Garner and Visher
(2003), who identify the heyday of experimental research in criminal justice
during the tenure of James K. Stewart as director of the National Institute of Jus-
tice (NIJ). In a short period, a series of experimental studies were supported
in the area of policing, dealing, for example, with domestic violence (e.g., see
Dunford, Huizinga, and Elliott 1990; Hirschel et al. 1990; Pate, Hamilton,
and Annan 1991; Sherman 1992; Sherman and Berk 1984; Sherman and
Cohn 1989) and crime hot spots (e.g., see Sherman and Rogan 1995;
Sherman and Weisburd 1995; Weisburd and Green 1995). Participating sites
were a diverse group representing different regions of the country and a broad
array of social and organizational contexts. Accordingly, the support of the
federal government for large-scale experimentation led a broad array of
police agencies to decide to be involved in experimental evaluations.
In the examples of the CYA and the NIJ field studies, institutional support
for experimentation led to a broader involvement of institutions in experi-
mentation. Nonetheless, it might be argued that this group is still likely to be
self-selected and thus our ability to generalize from our findings is still lim-
ited. Of course, there is a point at which this argument must lose value if we
accept the moral imperative of experimentation. More generally, it seems
clear that encouragement rather than discouragement of experimental study
by funders in criminal justice would quickly lead to the development of a
more generalizable experimental criminal justice. Perhaps as well, the moral
imperative of experimentation should lead to a presumption of experimental
analysis on the part of funders and practitioners, much as is the case in medi-
cal outcome studies.

THE SIMPLICITY OF EXPERIMENTATION AND
THE COMPLEXITY OF CRIME AND JUSTICE CONTEXTS

Perhaps the most damning criticism of randomized trials raised by Clarke
and Cornish (1972) is that experimental studies are too rigid to address the
complexity of crime and justice contexts. The treatments in the Kingswood
experiment involved so many components that it was virtually impossible to
define clearly what in fact treatment was in the first place. Moreover, the
treatments were variable over time. Summarizing Clarke and Cornish’s con-
cerns, Nuttall (2003) notes that the “complexity of the treatments and the fact
that they were changed significantly during the experiment meant that what-
ever results were obtained could not be explained” (p. 277). Nuttall further
reports on a conversation with Clarke in which Clarke argued “that the
experiment might have been able to say ‘what’ happened—but it could not
answer ‘how’ or ‘why.’ ”
This concern with the failure of experimental methods to deal with the
complexity of criminal justice settings has been raised by a number of other
prominent critics of experimentation. Mitroff (1983), for example, doubts
that experiments can be widely applied in social research because of the
messiness of social systems. This concern is also at the core of Pawson and Tilley’s
(1997) influential attack on the experimental method. They argue that experi-
mentation tends to apply inflexible and broad categorical treatments. This, in
their view, often leads experimenters to miss precisely what is interesting about treat-
ment in criminal justice, that there is an important interaction between the
nature of treatment and the nature of the subjects examined:

What we are suggesting is that the engine for program success comes when the right
horse chooses the right course, and experimentalists have been remarkably poor at play-
ing this particular field. . . . Our argument is that the best way to get at the crucial causal
harmonies is to hypothesize and test within-program variation in the success rates of dif-
ferent subgroups of subjects. Alas . . . the apparatus of the OXO experiment, with all of its
other priorities, tends to suppress such an insight. (P. 43)

For each horse there is a different course in this context, and therefore the
apparent rigidity of experimental designs is a fundamental barrier to their
application in social settings.
Interestingly, the idea of interaction and the complexity of the relationship
between treatment and outcomes has long been recognized in experimental
study in medicine. Bradford Hill (1962, 11), for example, a pioneer in ran-
domized trials in England, argued that physicians must take into account the
fact that “one man’s meat is another man’s poison.” Moreover, it is well rec-
ognized in medicine that differences between institutions or populations
studied can have important effects on the outcomes of treatments (e.g., see
Fleiss 1982). This is one reason why in medicine there are often multicenter
trials in which there is careful coordination of protocols and cooperation in
analysis of outcomes (Borok et al. 1994; Friedman, Furberg, and DeMets
1985; Hill 1962; Stanley, Stjernsward, and Isley 1981). Such studies provide
an experimental basis for looking at the complexity of treatment/context
interactions.
Nonetheless, it is true that in evaluations of crime and justice interven-
tions, there has been little concern with interactions between treatment and
context (e.g., through the use of block randomization methods), and the idea
of multisite trials has only recently been raised (see Weisburd and Taxman
2000). Although there is growing evidence of the success of experimentation
in evaluating treatments and outcomes in crime and justice, concerns
regarding the complexity of treatment and the interaction of treatments and
contexts remain. The moral imperative for experimentation that I have
described demands not only the use of experimental approaches but also
the development of experimental methods that are capable of addressing the
complexity of crime and justice treatments, settings, and subjects. Such an
experimental crime and justice, however, cannot be developed without a
more concerted institutionalization of experimental methods. As Shepherd
(2003) argues in regard to medicine,

Mounting randomized clinical trials requires machinery for ethical, scientific, financial
and service management scrutiny and approval. In medical science, these processes have
been defined over many years and have facilitated randomized trials not only of medical
treatment with pharmaceutical products but also trials of surgical interventions, such as
endoscopic surgery. (p. 307)
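The block randomization methods mentioned earlier can be sketched briefly. This is a minimal, hypothetical illustration (the function block_randomize and the site labels are mine, not a protocol from the article): units are shuffled and split evenly between conditions within each block, here a site, so treatment and control remain balanced across contexts and treatment-by-context interactions can be examined.

```python
import random
from collections import defaultdict

def block_randomize(units, block_of, seed=42):
    """Randomly assign units to treatment/control within each block
    (e.g., site), keeping the two conditions balanced per block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, unit in enumerate(members):
            assignment[unit] = "treatment" if i < half else "control"
    return assignment

# Hypothetical multisite trial: twelve cases nested in three sites.
units = [f"case{i:02d}" for i in range(12)]
site = {u: ("North", "Central", "South")[i % 3] for i, u in enumerate(units)}
print(sorted(block_randomize(units, block_of=lambda u: site[u]).items()))
```

Splitting each shuffled block evenly guarantees equal-sized arms within every site, which is what makes within-site comparisons, and hence the study of treatment/context interactions, possible.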

CONCLUSIONS

I began my discussion by defining a moral imperative for randomized
experiments in crime and justice. That imperative develops from our profes-
sional obligation to provide valid answers to questions about the effective-
ness of treatments, practices, and programs. It is supported by a statistical
argument that makes randomized experiments the preferred method for rul-
ing out alternative causes of the outcomes observed. I also discussed core
concerns that have been raised about the development of randomized trials in
crime and justice. These were seen overall to relate more to the failure to
institutionalize experimentation than to any inherent limitations in the exper-
imental method and its application in crime and justice settings.
At the core of my argument is the idea of ethical practice. In some sense, I
turn traditional discussion of the ethics of experimentation on its head. Tradi-
tionally, it has been assumed that the burden has been on the experimenter to
explain why it is ethical to use experimental methods. My suggestion is that
we must begin rather with a case for why experiments should not be used. The
burden here is on the researcher to explain why a less valid method should be
the basis for coming to conclusions about treatment or practice. The ethical
problem is that when choosing nonexperimental methods we may be violat-
ing our basic professional obligation to provide the most valid answers we
can to the questions that we are asked to answer.
But the development of a comprehensive experimental approach to crime
and justice will demand the creation of a series of new institutional arrange-
ments. The present disjuncture between academic criminology/criminal
justice and clinical practice will have to be altered if we are to carry out suc-
cessful trials in crime and justice settings. The priorities of criminal justice
funders will also have to be reoriented to reflect the moral imperative for
experimental study. History shows that the development of experimentation
and the wide diffusion of these methods demand the support of funding
agencies. Finally, a more comprehensive infrastructure is needed for super-
vising the conduct of randomized experiments in crime and justice and for
dealing with methodological problems specific to this field. Such an infra-
structure is crucial if experiments in crime and justice are to be as varied and
complex as the social contexts in which they are
found.
My call for a new infrastructure for experimental crime and justice study
is certainly ambitious. But it is in some sense simply a fulfillment of our ethi-
cal and professional responsibilities to the wider community which is
affected by crime and justice problems. Not to develop a comprehensive
experimental crime and justice is to rely on less valid methods in answering
important public policy questions. To tolerate this situation strikes me as an
ethical breach, but one that is today shared by researchers, practitioners, and
institutions that are responsible for crime and justice evaluation.

NOTE

1. In practice, there may be relationships between other causes and treatment, although using
this logic we assume that such relationships are chance ones and likely to be balanced out in
terms of negative and positive bias.

REFERENCES

Bauman, K., A. Tsui, and C. Viadro. 1994. Use of true experimental designs for family planning
program evaluation: Merits, problems and solutions. International Family Planning Per-
spectives 20 (3): 111-13.
Baunach, P. J. 1980. Random assignment in criminal justice research—Some ethical and legal
issues. Criminology 17 (4): 435-44.
Bergstrom, K. R. 1985. Police experimentation with civilian subjects—Formalizing the infor-
mal. In Police leadership in America, edited by W. Geller, 444-48. Westport, CT: Praeger.
Borok, G., D. Reuben, L. Zendle, D. Ershoff, G. Wolde-Tsadik, L. Rubenstein, V. Ambrosini,
L. Fishman, and J. Beck. 1994. Rationale and design of a multi-center randomized trial of
comprehensive geriatric assessment consultation for hospitalized patients in an HMO. Jour-
nal of the American Geriatrics Society 42:536-44.

Boruch, R. 1975. On common contentions about randomized field experiments. In Experimental
testing of public policy: The Proceedings of the 1974 Social Science Research Council Con-
ference on Social Experimentation, edited by R. F. Boruch and H. W. Riecken, 107-42. Boul-
der, CO: Westview Press.
Boruch, R., B. Snyder, and D. DeMoya. 2000. The importance of randomized field trials. Crime &
Delinquency 46:156-80.
Boruch, R., T. Victor, and J. Cecil. 2000. Resolving ethical and legal problems in randomized
experiments. Crime & Delinquency 46 (3): 300-53.
Campbell, D. T., and R. Boruch. 1975. Making the case for randomized assignments to treat-
ments by considering the alternatives: Six ways in which quasi-experimental evaluations in
compensatory education tend to underestimate effects. In Central issues in social program
evaluation, edited by C. Bennett and A. Lumsdaine. New York: Academic Press.
Clarke, R., and D. Cornish. 1972. The controlled trial in institutional research: Paradigm or pit-
fall for penal evaluators? London: HMSO.
Cook, T. D., and D. T. Campbell. 1979. Quasi-experimentation: Design and analysis issues for
field settings. Chicago: Rand McNally.
Dennis, M. L. 1988. Implementing randomized field experiments: An analysis of criminal and
civil justice research. Unpublished dissertation, Northwestern University. Ann Arbor, MI:
University Microfilms.
Dunford, F. 1990. Random assignment: Practical considerations for field experiments. Evalua-
tion and Program Planning 13:125-32.
Dunford, F., D. Huizinga, and D. Elliott. 1990. The role of arrest in domestic assault: The Omaha
police experiment. Criminology 28:183-206.
Eck, J. 2002. Learning from experience in problem oriented policing and crime prevention: The
positive function of weak evaluations and the negative functions of strong ones. In Evalua-
tion for crime prevention: Crime prevention studies, Vol. 14, edited by N. Tilley, 93-117.
Monsey, NY: Criminal Justice Press.
Erez, E. 1986. Randomized experiments in correctional context: Legal, ethical, and practical
concerns. Journal of Criminal Justice 14:389-400.
Esbensen, F. 1991. Ethical considerations in criminal justice research. American Journal of
Police 10 (2): 87-104.
Farrington, D. Forthcoming. British randomized experiments on crime and justice. Annals of the
American Academy of Political and Social Science.
———. 1983. Randomized experiments in crime and justice. In Crime and justice: An annual
review of research, edited by N. Morris and M. Tonry, 257-308. Chicago: University of Chi-
cago Press.
Feder, L., and R. Boruch. 2000. The need for experiments in criminal justice settings. Crime &
Delinquency 46:291-94.
Feder, L., A. Jolin, and W. Feyerherm. 2000. Lessons from two randomized experiments in crim-
inal justice settings. Crime & Delinquency 46 (3): 380-400.
Flay, B. R., and J. A. Best. 1982. Overcoming design problems in evaluating health behavior pro-
grams. Evaluation and the Health Professions 5 (1): 43-69.
Fleiss, J. 1982. Multicentre clinical trials: Bradford Hill’s contributions and some subsequent
development. Statistics in Medicine 1:353-59.
Friedman, L., C. Furberg, and D. DeMets. 1985. Fundamentals of clinical trials. Littleton, MA:
PSG.
Garner, J. H., and C. A. Visher. 2003. The production of criminological experiments. Evaluation
Review 27 (3): 316-35.

Geis, G. 1967. Ethical and legal issues in experimentation with offender populations. In
Research in correctional rehabilitation. Washington, DC: Joint Commission on Correc-
tional Manpower and Training.
Graebsch, C. 2000. Legal issues of randomized experiments on sanctioning. Crime & Delin-
quency 46 (2): 271-82.
Heckman, J., and J. A. Smith. 1995. Assessing the case for social experiments. Journal of Eco-
nomic Perspectives 9 (2): 85-110.
Hill, B. 1962. Principles of medical statistics. New York: Oxford University Press.
Hirschel, D., I. Hutchison III, C. Dean, J. Kelley, and C. Pesackis. 1990. Charlotte Spouse
Assault Replication Project: Final report. Washington, DC: National Institute of Justice.
Lempert, R. O., and C. A. Visher. 1988. Randomized field experiments in criminal justice agen-
cies. Washington, DC: National Institute of Justice.
McCord, J. Forthcoming. Cures that harm: Unanticipated outcomes of crime prevention pro-
grams. Annals of the American Academy of Political and Social Science.
Mitroff, I. 1983. Beyond experimentation. In Handbook of social intervention, edited by
E. Seidman. Beverly Hills, CA: Sage.
Morrison, K. 2001. Randomized controlled trials for evidence-based education: Some problems
in judging “what works.” Evaluation and Research in Education 15 (2): 69-83.
Nuttall, C. 2003. The Home Office and random allocation experiments. Evaluation Review 27 (3):
267-89.
Oakley, A. 2000. Historical perspective on the use of randomized trials in social science settings.
Crime & Delinquency 46 (3): 315-29.
Palmer, T., and A. Petrosino. 2003. The “experimenting agency”: The California Youth Author-
ity Research Division. Evaluation Review 27 (3): 228-66.
Pate, A., E. Hamilton, and S. Annan. 1991. Metro-Dade Spouse Abuse Replication Project:
Draft final report. Washington, DC: Police Foundation.
Pawson, R., and N. Tilley. 1997. Realistic evaluation. London: Sage.
Petersilia, J. 1989. Implementing randomized experiments: Lessons from BJA’s Intensive
Supervision Project. Evaluation Review 13 (5): 435-58.
Petrosino, A. 1997. “What works?” revisited again: A meta-analysis of randomized field experi-
ments in individual level interventions. Unpublished dissertation. Ann Arbor, MI: Univer-
sity Microfilms.
———. 1998. A survey of 150 randomized experiments in crime reduction: Some preliminary
findings. Forum (Justice Research and Statistics Association) 16 (1): 7-8.
Petrosino, A., R. Boruch, H. Soydan, L. Duggan, and J. Sanchez-Meca. 2001. Meeting the chal-
lenges of evidence-based policy: The Campbell Collaboration. Annals of the American
Academy of Political and Social Science 578:14-34.
Petrosino, A., C. Turpin-Petrosino, and J. Finckenauer. 2000. Well-meaning programs can have
harmful effects! Lessons from experiments of programs such as Scared Straight. Crime &
Delinquency 46 (3): 354-79.
Shadish, W., T. Cook, and D. Campbell. 2002. Experimental and quasi-experimental designs for
generalized causal inference. Boston: Houghton Mifflin Company.
Shepherd, J. P. 2003. Explaining feast or famine in randomized field trials. Evaluation Review
27 (3): 290-315.
Sherman, L. 1992. Policing domestic violence: Experiments and dilemmas. New York: The Free
Press.
Sherman, L., and R. Berk. 1984. The specific deterrent effects of arrest for domestic assault.
American Sociological Review 49 (2): 261-72.

Sherman, L., and E. Cohn. 1989. The impact of research on legal policy: The Minneapolis
Domestic Violence Experiment. Law and Society Review 23 (1): 117-44.
Sherman, L., and D. Rogan. 1995. Deterrent effects of police raids on crack houses: A random-
ized, controlled experiment. Justice Quarterly 12:755-81.
Sherman, L., and D. Weisburd. 1995. General deterrent effects of police patrol in crime “hot
spots”: A randomized controlled trial. Justice Quarterly 12 (4): 625-48.
Sieber, J., ed. 1982. The ethics of social research. New York: Springer-Verlag.
Stanley, K., M. Stjernsward, and M. Isley. 1981. The conduct of a cooperative clinical trial. New
York: Springer-Verlag.
Weisburd, D. 2000. Randomized experiments in criminal justice policy: Prospects and prob-
lems. Crime & Delinquency 46 (2): 181-93.
Weisburd, D., and L. Green. 1995. Policing drug hot spots: The Jersey City Drug Market Analysis
Experiment. Justice Quarterly 12:711-35.
Weisburd, D., C. Lum, and A. Petrosino. 2001. Does research design affect study outcomes in
criminal justice? Annals of the American Academy of Political and Social Science 578:50-70.
Weisburd, D., L. Sherman, and A. Petrosino. 1990. Registry of randomized experiments in sanc-
tions. Washington, DC: National Institute of Justice Report.
Weisburd, D., and F. Taxman. 2000. Developing a multi-center randomized trial in criminology:
The case of HIDTA. Journal of Quantitative Criminology 16 (3): 315-40.
White, K., and J. Pezzino. 1986. Ethical, practical and scientific considerations of randomized
experiments in early childhood special education. Topics in Early Childhood Special Educa-
tion 6 (3): 100-16.
Wilkinson, L., and Task Force on Statistical Inference. 1999. Statistical methods in psychology
journals: Guidelines and explanations. American Psychologist 54:594-604.

David Weisburd holds a joint appointment at the Hebrew University and the University of
Maryland.
