Ethical Practice and Evaluation of Interventions in Crime and Justice: The Moral Imperative for Randomized Trials
David Weisburd
Evaluation Review 2003, 27 (3): 336
DOI: 10.1177/0193841X03027003007
DAVID WEISBURD
The Hebrew University
The University of Maryland
In considering the ethical dilemmas associated with randomized experiments, scholars ordi-
narily focus on the ways in which randomization of treatments or interventions violates accepted
norms of conduct of social science research more generally or evaluation of crime and justice
questions more specifically. The weight of ethical judgment is thus put on experimental research
to justify meeting ethical standards. In this article, it is argued that just the opposite should be
true, and that in fact there is a moral imperative for the conduct of randomized experiments in
crime and justice. That imperative develops from our professional obligation to provide valid
answers to questions about the effectiveness of treatments, practices, and programs. It is sup-
ported by a statistical argument that makes randomized experiments the preferred method for
ruling out alternative causes of the outcomes observed. Common objections to experimentation
are reviewed and found overall to relate more to the failure to institutionalize experimentation
than to any inherent limitations in the experimental method and its application in crime and jus-
tice settings. It is argued that the failure of crime and justice practitioners, funders, and evalua-
tors to develop a comprehensive infrastructure for experimental evaluation represents a serious
violation of professional standards.
A related line of argument here is that a failure to discover whether a program is effective
is unethical. That is, if one relies solely on nonrandomized assessments to make judg-
ments about the efficacy of a program, subsequent decisions may be entirely inappropri-
ate. Insofar as a failure to obtain unequivocal data on effects then leads to decisions
which are wrong and ultimately damaging, that failure may violate good standards of
both social and professional ethics. Even if the decisions are “correct” in the sense of
coinciding with those one might make based on randomized experiment data, ethical
problems persist. The right action taken for the wrong reason is not especially attractive
if we are to learn anything about how to effectively handle the child abuser, the chronically ill, the poorly trained, and so forth. (P. 135)
Figure: Path models of the treatment effect estimate B1. (A) When the factor Xj is unmeasured and excluded from the model, the estimate of B1 for the path from Treatment to Outcome is .25. (B) When Xj is included, the estimate of B1 is .00; Xj relates to Treatment with B = .50 and to Outcome with B2 = .50.
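To make the logic of the figure concrete, the following minimal simulation sketch (mine, not the article's) reproduces both panels in Python, assuming standardized variables, a true treatment effect of zero, and paths of .50 from Xj to both treatment and outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Unmeasured factor Xj drives both treatment and outcome (paths = .50);
# the true treatment effect is zero.
xj = rng.standard_normal(n)
treatment = 0.5 * xj + np.sqrt(0.75) * rng.standard_normal(n)  # Var ~= 1
outcome = 0.5 * xj + np.sqrt(0.75) * rng.standard_normal(n)    # no treatment effect

def ols_slopes(y, *predictors):
    """Ordinary least squares; returns the slopes, dropping the intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Panel A: Xj excluded -- the treatment "effect" appears to be about .25.
print("B1 (Xj excluded):", ols_slopes(outcome, treatment))
# Panel B: Xj included -- B1 falls to about .00 and B2 is about .50.
print("B1, B2 (Xj included):", ols_slopes(outcome, treatment, xj))
```

Randomizing treatment would sever the path from Xj to treatment, which is why an experiment recovers B1 = .00 even when Xj is never measured.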
I have established so far the basis for the moral imperative for experi-
mental research, but I have not examined the main arguments against exper-
imental study. Although there is a large literature dealing with the barriers to
experimentation (e.g., see Bauman, Tsui, and Viadro 1994; Dunford 1990;
Feder, Jolin, and Feyerherm 2000; Heckman and Smith 1995; Mitroff 1983;
Morrison 2001; Petersilia 1989; White and Pezzino 1986), I think it useful to
focus particular attention on concerns raised by Clarke and Cornish (1972)
more than a quarter century ago. Clarke and Cornish were to have a major
effect on the development of experimental research, and their arguments con-
tinue to be central in resistance to experimental methods in criminal justice
(e.g., see Pawson and Tilley 1997). Indeed, as documented by Farrington
(forthcoming) and Nuttall (2003), Clarke and Cornish’s 1972 article on “The
Controlled Trial in Institutional Research” played a central part in ending a
short period of experimental study in England in the late 1960s and early 1970s.
One of the major problems that Clarke and Cornish observed in the experi-
ment in Kingswood was that practitioners began to undermine the experi-
ment by limiting the number of boys who could be considered for random
allocation. This led to the research being extended for a much longer period
than expected, and finally to the research effort being stopped before it had
gained the desired number of cases for study. As Clarke and Cornish (1972)
note, an
important factor in the decrease in number of boys assigned to the school seems to have
been that the area classifying school became increasingly concerned about the advisabil-
ity of sending boys to Kingswood Training School, where, through the operation of ran-
dom allocation, they might not receive the treatment considered to be most suitable. (P. 8)
They concluded that “research workers have often failed to appreciate that
evaluation in the penal field poses particular ethical problems, of a complex-
ity not usually encountered in medical research” (1972, 8).
It is clear that there are special ethical questions posed by the random allo-
cation of criminal justice treatments or programs that may not apply in medi-
cine or other fields. These are perhaps most significant in the area of coercive
treatments where informed consent may not be desirable or possible. How-
ever, one might ask whether the specific ethical questions raised in the Kings-
wood experiment and more general concerns about ethical implementation
of experiments often have more to do with the contrasting belief systems of
researchers and practitioners than with the ethics of experimentation in crime
and justice.
The generalization problem applies to all kinds of evaluation research. Any single pro-
gram is unique in some respects but representative in other respects. However, it is gener-
ally true that randomized experiments maximize internal validity, and that external valid-
ity and factors that moderate the impact of interventions need to be addressed in
replications. It is important to accumulate the results of numerous evaluations in system-
atic reviews and meta-analyses, to see how far results can be generalized, and how far
they are influenced by specific features of the program or the evaluation.
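As a sketch of how such accumulation works in practice, the fixed-effect (inverse-variance) estimator below pools effect sizes across studies; the three effects and standard errors are hypothetical, not taken from any evaluation discussed here:

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted pooled effect, the standard
    fixed-effect meta-analysis estimator."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical standardized effects from three replications of a program.
effects = [0.30, 0.12, 0.22]
std_errors = [0.10, 0.08, 0.15]
estimate, se = fixed_effect_pool(effects, std_errors)
print(f"pooled effect = {estimate:.3f} (SE = {se:.3f})")
```

Heterogeneity across the pooled studies, examined through random-effects or moderator analyses, is what then speaks to how far the results generalize.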
But perhaps more important in this regard is the concern that institutions
that agree to experimentation are themselves a self-selected group, and thus
what we learn from experimental trials tells us little about the operations of
treatment and their outcomes in the real world. As Clarke and Cornish (1972)
write,
With deeper knowledge of the treatment situation it became obvious that there was no
good reason for believing that the houses being studied were representative either of the
class of all possible “therapeutic communities” in approved schools, in the one case, or of
all actual “adult directed” house regimes in the other. Each of the houses studied was in
some respects unique. (P. 14)
It might be argued that all evaluation studies suffer to some degree from
this problem, based on the fact that there are factors that lead some communi-
ties or organizations to participate in research and others not. Nonetheless, it
is clearly the case that it is more difficult to gain institutional consent for a
randomized study than for one that is not randomized, because randomiza-
tion involves much more intrusion into the operations of the institution
affected (see also Eck 2002).
In this sense, experimental studies are likely to have a lower level of what
is commonly termed external validity than are nonexperimental evaluations.
But again, the problem may not lie primarily in the experimental method but
rather in the institutional relationships that lead to the development of
research studies. The history of experimental study in the California Youth
Authority (CYA) sheds light on this question (see Palmer and Petrosino
2003). External funding rather than problems of researcher access to institu-
tional participants appeared to play the central role in defining the methodol-
ogies used in carrying out CYA research. The National Institute of Mental
Health (NIMH), with its strong connection to medical and public health insti-
tutions, recognized the value and importance of experimental studies.
Accordingly, during the period when the NIMH was a primary funder of
CYA studies, randomized experimental evaluations were common. How-
ever, when NIMH funding ended, and CYA researchers were forced to look
to the newly established Law Enforcement Assistance Administration and
state and local criminal justice agencies for research support, there was no
longer recognition of the importance of experimental methods. Indeed, such
agencies provided “little opportunity and incentive for researchers who
might have wished to utilize randomized trials” (Palmer and Petrosino 2003,
224).
experiment might have been able to say ‘what’ happened—but it could not
answer ‘how’ or ‘why.’ ”
This concern with the failure of experimental methods to deal with the
complexity of criminal justice settings has been raised by a number of other
prominent critics of experimentation. Mitroff (1983), for example, doubts
that experiments can be widely applied in social research because of the messiness of social systems. This concern is also at the core of Pawson and Tilley's (1997) influential attack on the experimental method. They argue that experimentation tends to apply inflexible and broad categorical treatments. This, in their view, often leads experiments to miss precisely what is interesting about treatment in criminal justice: that there is an important interaction between the nature of treatment and the nature of the subjects examined:
What we are suggesting is that the engine for program success comes when the right
horse chooses the right course, and experimentalists have been remarkably poor at play-
ing this particular field. . . . Our argument is that the best way to get at the crucial causal
harmonies is to hypothesize and test within-program variation in the success rates of dif-
ferent subgroups of subjects. Alas . . . the apparatus of the OXO experiment, with all of its
other priorities, tends to suppress such an insight. (P. 43)
For each horse there is a different course in this context, and therefore the
apparent rigidity of experimental designs is a fundamental barrier to their
application in social settings.
Interestingly, the idea of interaction and the complexity of the relationship
between treatment and outcomes has long been recognized in experimental
study in medicine. Bradford Hill (1962, 11), for example, a pioneer in ran-
domized trials in England, argued that physicians must take into account the
fact that “one man’s meat is another man’s poison.” Moreover, it is well rec-
ognized in medicine that differences between institutions or populations
studied can have important effects on the outcomes of treatments (e.g., see
Fleiss 1982). This is one reason why in medicine there are often multicenter
trials in which there is careful coordination of protocols and cooperation in
analysis of outcomes (Borok et al. 1994; Friedman, Furberg, and DeMets
1985; Hill 1962; Stanley, Stjernsward, and Isley 1981). Such studies provide
an experimental basis for looking at the complexity of treatment/context
interactions.
Nonetheless, it is true that in evaluations of crime and justice interven-
tions, there has been little concern with interactions between treatment and
context (e.g., through the use of block randomization methods), and the idea
of multisite trials has only recently been raised (see Weisburd and Taxman
2000). Although there is growing evidence of the success of experimentation
in evaluating treatments and outcomes in crime and justice, concerns remain about the machinery needed to mount such trials. As Shepherd (2003) observes:
Mounting randomized clinical trials requires machinery for ethical, scientific, financial
and service management scrutiny and approval. In medical science, these processes have
been defined over many years and have facilitated randomized trials not only of medical
treatment with pharmaceutical products but also trials of surgical interventions, such as
endoscopic surgery. (P. 307)
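As an illustration of the block randomization mentioned above, here is a minimal sketch; the site labels, case identifiers, and block size are hypothetical, and this is only one simple way of keeping arms balanced within each context:

```python
import random
from collections import defaultdict

def block_randomize(cases, block_size=4, seed=42):
    """Assign each case to treatment or control within its own site,
    shuffling inside fixed-size blocks so arms stay balanced per context."""
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for case_id, site in cases:
        by_site[site].append(case_id)
    assignments = {}
    for site, ids in by_site.items():
        for start in range(0, len(ids), block_size):
            block = ids[start:start + block_size]
            arms = (["treatment", "control"] * block_size)[:len(block)]
            rng.shuffle(arms)
            assignments.update(zip(block, arms))
    return assignments

# Hypothetical multisite trial: 24 cases nested in three sites.
cases = [(f"case{i:02d}", f"site{i % 3}") for i in range(24)]
for case_id, arm in sorted(block_randomize(cases).items()):
    print(case_id, arm)
```

Because assignment is balanced within each site, treatment-by-context interactions can be estimated rather than being confounded with differences between sites.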
CONCLUSIONS
The traditional relationships between crime and justice research and clinical practice will have to be altered if we are to carry out successful trials in crime and justice settings. The priorities of criminal justice
funders will also have to be reoriented to reflect the moral imperative for
experimental study. History shows that the development of experimentation
and the wide diffusion of these methods demand the support of funding
agencies. Finally, a more comprehensive infrastructure must be developed for supervising the conduct of randomized experiments in crime and justice and for dealing with methodological problems specific to this field. Such an infrastructure is crucial if experiments in crime and justice are to be as varied and complex as the social contexts in which they are found.
My call for a new infrastructure for experimental crime and justice study
is certainly ambitious. But it is in some sense simply a fulfillment of our ethical and professional responsibilities to the wider community that is affected by crime and justice problems. Not to develop a comprehensive infrastructure for experimental crime and justice research is to rely on less valid methods in answering
important public policy questions. To tolerate this situation strikes me as an
ethical breach, but one that is today shared by researchers, practitioners, and
institutions that are responsible for crime and justice evaluation.
NOTE
1. In practice, there may be relationships between other causes and treatment, although using
this logic we assume that such relationships are chance ones and likely to be balanced out in
terms of negative and positive bias.
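As an illustrative sketch of this logic (mine, not the article's): under random assignment, the correlation between treatment and any unmeasured cause is chance alone, so across replications the positive and negative imbalances center on zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2_000

# Correlation between a coin-flip treatment assignment and an unmeasured
# cause, across many replications of the same experiment.
corrs = []
for _ in range(reps):
    xj = rng.standard_normal(n)        # unmeasured cause of the outcome
    treatment = rng.integers(0, 2, n)  # random assignment
    corrs.append(np.corrcoef(treatment, xj)[0, 1])

# Chance imbalances occur in any single trial, but they are unbiased:
print("mean corr:", np.mean(corrs))  # ~0: negative and positive bias balance
print("sd corr:", np.std(corrs))
```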
REFERENCES
Bauman, K., A. Tsui, and C. Viadro. 1994. Use of true experimental designs for family planning
program evaluation: Merits, problems and solutions. International Family Planning Per-
spectives 20 (3): 111-13.
Baunach, P. J. 1980. Random assignment in criminal justice research—Some ethical and legal
issues. Criminology 17 (4): 435-44.
Bergstrom, K. R. 1985. Police experimentation with civilian subjects—Formalizing the infor-
mal. In Police leadership in America, edited by W. Geller, 444-48. Westport, CT: Praeger.
Borok, G., D. Reuben, L. Zendle, D. Ershoff, G. Wolde-Tsadik, L. Rubenstein, V. Ambrosini,
L. Fishman, and J. Beck. 1994. Rationale and design of a multi-center randomized trial of
comprehensive geriatric assessment consultation for hospitalized patients in a HMO. Jour-
nal of the American Geriatric Society 42:536-44.
Geis, G. 1967. Ethical and legal issues in experimentation with offender populations. In
Research in correctional rehabilitation. Washington, DC: Joint Commission on Correc-
tional Manpower and Training.
Graebsch, C. 2000. Legal issues of randomized experiments on sanctioning. Crime & Delinquency 46 (2): 271-82.
Heckman, J., and J. A. Smith. 1995. Assessing the case for social experiments. Journal of Eco-
nomic Perspectives 9 (2): 85-110.
Hill, B. 1962. Principles of medical statistics. New York: Oxford University Press.
Hirschel, D., I. Hutchison III, C. Dean, J. Kelley, and C. Pesackis. 1990. Charlotte Spouse
Assault Replication Project: Final report. Washington, DC: National Institute of Justice.
Lempert, R. O., and C. A. Visher. 1988. Randomized field experiments in criminal justice agen-
cies. Washington, DC: National Institute of Justice.
McCord, J. Forthcoming. Cures that harm: Unanticipated outcomes of crime prevention pro-
grams. Annals of the American Academy of Political and Social Science.
Mitroff, I. 1983. Beyond experimentation. In Handbook of social intervention, edited by
E. Seidman. Beverly Hills, CA: Sage.
Morrison, K. 2001. Randomized controlled trials for evidence-based education: Some problems
in judging “what works.” Evaluation and Research in Education 15 (2): 69-83.
Nuttall, C. 2003. The Home Office and random allocation experiments. Evaluation Review 27 (3):
267-89.
Oakley, A. 2000. Historical perspective on the use of randomized trials in social science settings.
Crime & Delinquency 46 (3): 315-29.
Palmer, T., and A. Petrosino. 2003. The “experimenting agency”: The California Youth Author-
ity Research Division. Evaluation Review 27 (3): 228-66.
Pate, A., E. Hamilton, and S. Annan. 1991. Metro-Dade Spouse Abuse Replication Project:
Draft final report. Washington, DC: Police Foundation.
Pawson, R., and N. Tilley. 1997. Realistic evaluation. London: Sage.
Petersilia, J. 1989. Implementing randomized experiments: Lessons from BJA’s Intensive
Supervision Project. Evaluation Review 13 (5): 435-58.
Petrosino, A. 1997. “What works?” revisited again: A meta-analysis of randomized field experi-
ments in individual level interventions. Unpublished dissertation. Ann Arbor, MI: Univer-
sity Microfilms.
———. 1998. A survey of 150 randomized experiments in crime reduction: Some preliminary
findings. Forum (Justice Research and Statistics Association) 16 (1): 7-8.
Petrosino, A., R. Boruch, H. Soydan, L. Duggan, and J. Sanchez-Meca. 2001. Meeting the chal-
lenges of evidence-based policy: The Campbell Collaboration. Annals of the American
Academy of Political and Social Science 578:14-34.
Petrosino, A., C. Turpin-Petrosino, and J. Finckenauer. 2000. Well-meaning programs can have
harmful effects! Lessons from experiments of programs such as Scared Straight. Crime &
Delinquency 46 (3): 354-79.
Shadish, W., T. Cook, and D. Campbell. 2002. Experimental and quasi-experimental designs for
generalized causal inference. Boston: Houghton Mifflin Company.
Shepherd, J. P. 2003. Explaining feast or famine in randomized field trials. Evaluation Review
27 (3): 290-315.
Sherman, L. 1992. Policing domestic violence: Experiments and dilemmas. New York: The Free
Press.
Sherman, L., and R. Berk. 1984. The specific deterrent effects of arrest for domestic assault.
American Sociological Review 49 (2): 261-72.
Sherman, L., and E. Cohn. 1989. The impact of research on legal policy: The Minneapolis
Domestic Violence Experiment. Law and Society Review 23 (1): 117-44.
Sherman, L., and D. Rogan. 1995. Deterrent effects of police raids on crack houses: A random-
ized, controlled experiment. Justice Quarterly 12:755-81.
Sherman, L., and D. Weisburd. 1995. General deterrent effects of police patrol in crime “hot
spots”: A randomized controlled trial. Justice Quarterly 12 (4): 625-48.
Sieber, J., ed. 1982. The ethics of social research. New York: Springer-Verlag.
Stanley, K., M. Stjernsward, and M. Isley. 1981. The conduct of a cooperative clinical trial. New
York: Springer-Verlag.
Weisburd, D. 2000. Randomized experiments in criminal justice policy: Prospects and prob-
lems. Crime & Delinquency 46 (2): 181-93.
Weisburd, D., and L. Green. 1995. Policing drug hot spots: The Jersey City Drug Market Analysis
Experiment. Justice Quarterly 12:711-35.
Weisburd, D., C. Lum, and A. Petrosino. 2001. Does research design affect study outcomes in
criminal justice? Annals of the American Academy of Political and Social Science 578:50-70.
Weisburd, D., L. Sherman, and A. Petrosino. 1990. Registry of randomized experiments in sanc-
tions. Washington, DC: National Institute of Justice Report.
Weisburd, D., and F. Taxman. 2000. Developing a multi-center randomized trial in criminology:
The case of HIDTA. Journal of Quantitative Criminology 16 (3): 315-40.
White, K., and J. Pezzino. 1986. Ethical, practical and scientific considerations of randomized
experiments in early childhood special education. Topics in Early Childhood Special Educa-
tion 6 (3): 100-16.
Wilkinson, L., and Task Force on Statistical Inference. 1999. Statistical methods in psychology
journals: Guidelines and explanations. American Psychologist 54:594-604.
David Weisburd holds a joint appointment at the Hebrew University and the University of
Maryland.