
DOI: 10.1002/cl2.1158

UPDATED SYSTEMATIC REVIEWS

Multisystemic Therapy® for social, emotional, and behavioural problems in youth age 10 to 17: An updated systematic review and meta‐analysis

Julia H. Littell¹ | Therese D. Pigott² | Karianne H. Nilsen³ | Stacy J. Green⁴ | Olga L. K. Montgomery⁵

¹ Graduate School of Social Work and Social Research, Bryn Mawr College, Bryn Mawr, Pennsylvania, USA
² School of Public Health, Georgia State University, Atlanta, Georgia, USA
³ Regional Centre for Child and Adolescent Mental Health, Eastern and Southern Norway (RBUP), Oslo, Norway
⁴ Counseling and Psychological Services, Swarthmore College, Swarthmore, Pennsylvania, USA
⁵ Richmond, Virginia, USA

Correspondence
Julia H. Littell, Graduate School of Social Work and Social Research, Bryn Mawr College, Bryn Mawr, PA 19010, USA.
Email: jhlittell@gmail.com and jlittell@brynmawr.edu

Abstract
Background: Multisystemic Therapy® (MST®) is an intensive, home‐based intervention for families of youth with social, emotional, and behavioural problems. MST therapists engage family members in identifying and changing individual, family, and environmental factors thought to contribute to problem behaviour. Intervention may include efforts to improve communication, parenting skills, peer relations, school performance, and social networks. MST is widely considered to be a well‐established, evidence‐based programme.
Objectives: We assessed (1) impacts of MST on out‐of‐home placements, crime and delinquency, and other behavioural and psychosocial outcomes for youth and families; (2) consistency of effects across studies; and (3) potential moderators of effects including study location, evaluator independence, and risks of bias.
Search Methods: Searches were performed in 2003, 2010, and March to April 2020.
We searched PsycINFO, MEDLINE, ERIC, NCJRS Abstracts, ProQuest and World-
CAT dissertations and theses, and 10 other databases, along with government and
professional websites. Reference lists of included articles and research reviews were
examined. Between April and August 2020 we contacted 22 experts in search of
missing data on 16 MST trials.
Selection Criteria: Eligible studies included youth (ages 10 to 17) with social,
emotional, and/or behavioural problems who were randomly assigned to licensed
MST programmes or other conditions. There were no restrictions on publication
status, language, or geographic location.
Data Collection and Analysis: Two reviewers independently screened 1802 titles and
abstracts, read all available study reports, assessed study eligibility, and extracted data
onto structured electronic forms. We assessed risks of bias (ROB) using modified ver-
sions of the Cochrane ROB tool and What Works Clearinghouse standards.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium,
provided the original work is properly cited.
© 2021 The Authors. Campbell Systematic Reviews published by John Wiley & Sons Ltd on behalf of The Campbell Collaboration.

Campbell Systematic Reviews. 2021;17:e1158. wileyonlinelibrary.com/journal/cl2 | 1 of 192


https://doi.org/10.1002/cl2.1158
18911803, 2021, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/cl2.1158 by Readcube (Labtiva Inc.), Wiley Online Library on [28/03/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 of 192 | LITTELL ET AL.

Where possible, we used random effects models with inverse variance weights to
pool results across studies. We used odds ratios for dichotomous outcomes and
standardised mean differences for continuous outcomes. We used Hedges' g to adjust for small sample sizes. We assessed the heterogeneity of effects with χ² and I².
Pairwise meta‐analyses are displayed in forest plots, with studies arranged in sub-
groups by location (USA or other country) and investigator independence. We
provide separate forest plots for conceptually distinct outcomes and endpoints. We
assessed differences between subgroups of studies with χ² tests.
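
The pooling steps described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between‐study variance (the review does not state which estimator was used), uses a common approximation to the Hedges' g correction factor, and runs on hypothetical effect sizes rather than data from the included trials. The function names `hedges_g` and `pool_random_effects` are illustrative only.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardised mean difference with a small-sample correction.

    Returns (g, var_g). Uses the approximation J = 1 - 3 / (4 * df - 1).
    """
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


def pool_random_effects(effects, variances):
    """Random-effects pooling with inverse-variance weights.

    Assumes the DerSimonian-Laird estimate of tau^2 (an assumption,
    not necessarily the estimator used in the review).
    Effects should be on an additive scale (e.g., SMD or log odds ratio).
    """
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    # I^2: share of total variation attributed to heterogeneity
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}


if __name__ == "__main__":
    # Hypothetical log odds ratios and their variances (not from the review)
    log_ors = [math.log(0.6), math.log(1.1), math.log(0.8)]
    variances = [0.05, 0.04, 0.09]
    result = pool_random_effects(log_ors, variances)
    print("Pooled OR:", round(math.exp(result["pooled"]), 2))
    print("I^2 (%):", round(result["i2"], 1))
```

Odds ratios are pooled on the log scale and exponentiated afterwards; standardised mean differences can be passed to `pool_random_effects` directly.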
We generated robust variance estimates, using correlated effects (CE) models
with small sample corrections to synthesise all available outcome measures within
each of nine outcome domains. Exploratory CE analyses assessed potential mod-
erators of effects within these domains.
We used GRADE guidelines to assess the certainty of evidence on seven primary
outcomes at one year after referral.
Main Results: Twenty‐three studies met our eligibility criteria; these studies in-
cluded a total of 3987 participating families. Between 1983 and 2020, 13 trials were
conducted in the USA by MST program developers and 10 studies were conducted
by independent teams (three in the USA, three in the UK, and one each in Canada,
the Netherlands, Norway, and Sweden).
These studies examined outcomes of MST for juvenile offenders, sex offenders,
offenders with substance abuse problems, youth with conduct or behaviour pro-
blems, those with serious mental health problems, autism spectrum disorder, and
cases of child maltreatment. We synthesised data from all eligible trials to test the
claim that MST is effective across clinical problems and populations.
Most trials compared MST to treatment as usual (TAU). In the USA, TAU con-
sisted of relatively little contact and few services for youth and families, compared
with more robust public health and social services available to youth in other high‐
income countries. One USA study provided “enhanced TAU” to families in the
control group, and two USA studies compared MST to individual therapy for youth.
The quality of available evidence for MST is mixed. We identified high risks of
bias due to: inadequate randomisation procedures (in 9% of studies); lack of com-
parability between groups at baseline (65%); systematic omission of cases (43%);
attrition (39%); confounding factors (e.g., between‐group differences in race, gender,
and attention; 43%); selective reporting of outcomes (52%); and conflicts of interest
(61%). Most trials (96%) have high risks of bias on at least one indicator.
GRADE ratings of the quality of evidence are low or moderate for seven primary
outcomes, with high‐quality evidence from non‐USA studies on out‐of‐home
placement.
Effects of MST are not consistent across studies, outcomes, or endpoints. At one
year post randomisation, available evidence shows that MST reduced out‐of‐home
placements in the USA (OR 0.52, 95% confidence interval [CI] 0.32 to 0.84; P < .01), but
not in other countries (OR 1.14, CI 0.84 to 1.55; P = .40). There is no overall evidence of
effects on other primary outcomes at one year. When we included all available out-
comes in CE models, we found that MST reduced placements and arrests in the USA,

but not in other countries. At 2.5 years, MST increased arrest rates in non‐USA
countries (OR 1.27, CI 1.01 to 1.60; P = .04) and increased substance use by youth in
the UK and Sweden (SMD 0.13, CI −0.00 to 0.27; P = .05). CE models show that MST
reduced self‐reported delinquency and improved parent and family outcomes, but
there is no overall evidence of effects on youth symptoms, substance abuse, peer
relations, or school outcomes. Prediction intervals indicate that future studies are likely
to find positive or negative effects of MST on all outcomes.
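
As a rough consistency check on the odds ratios reported above, a two‐sided P value can be approximated from an odds ratio and its 95% confidence interval. The sketch below assumes a normal approximation on the log odds ratio scale with a critical value of 1.96; the review may have used a different reference distribution, so small discrepancies would be expected. The function name `two_sided_p_from_ci` is hypothetical.

```python
import math


def two_sided_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate z and two-sided p from an OR and its confidence interval.

    Assumes the interval is symmetric on the log scale and based on a
    normal approximation (an assumption, not stated in the review).
    """
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Values as reported in the abstract
    reported = {
        "Placement, USA (1 year)": (0.52, 0.32, 0.84),
        "Placement, non-USA (1 year)": (1.14, 0.84, 1.55),
        "Arrest, non-USA (2.5 years)": (1.27, 1.01, 1.60),
    }
    for label, (or_value, low, high) in reported.items():
        z, p = two_sided_p_from_ci(or_value, low, high)
        print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the approximated values are consistent with the reported P < .01, P = .40, and P = .04.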
Potential moderators are confounded: USA studies led by MST developers had
higher risks of bias, and USA control groups received fewer services and had worse
outcomes than those in independent trials conducted in other high‐income coun-
tries. The USA/non‐USA contrast appears to be more closely related to effect sizes
than investigator independence or risks of bias.
Authors' Conclusions: The quality of evidence for MST is mixed and effects are
inconsistent across studies. Reductions in out‐of‐home placements and arrest/con-
viction were observed in the USA, but not in other high‐income countries. Studies
that compared MST to more active treatments showed fewer benefits, and there is
evidence that MST may have had some negative effects on youth outside of the USA.
Based on moderate to low quality evidence, MST may reduce self‐reported de-
linquency and improve parent and family outcomes, but there is no overall evidence
of effects on youth symptoms, substance abuse, peer relations, or school outcomes.

1 | PLAIN LANGUAGE SUMMARY

1.1 | Effects of Multisystemic Therapy® are inconsistent within and across studies

Twenty‐three randomised controlled trials provided evidence of effects of Multisystemic Therapy® (MST®) compared with treatment as usual (TAU) or other treatments for youth with social, emotional, and behavioural problems. The quality of this evidence is uneven. It shows that effects of MST vary across studies, settings, outcomes, and endpoints.

What is the aim of this review?

This Campbell updated systematic review and meta‐analysis synthesised data from all eligible trials to test the claim that Multisystemic Therapy® is effective across clinical problems and populations.

1.1.1 | What is this review about?

MST® is an intensive, home‐based intervention for families of youth with social, emotional, and behavioural problems. MST therapists engage family members in identifying and changing individual, family, and environmental factors thought to contribute to problem behaviour. Intervention may include efforts to improve communication, parenting skills, peer relations, school performance, and social networks. MST is widely considered to be a well‐established, evidence‐based programme.
We synthesised data from all eligible trials to test the claim that MST is effective across clinical problems and populations.

1.1.2 | What studies are included?

Included studies examined outcomes of MST for juvenile offenders, sex offenders, offenders with substance abuse problems, youth with conduct or behaviour problems, those with serious mental health problems, autism spectrum disorder, and cases of child maltreatment.
This review summarises findings from 23 randomised controlled trials of the effects of MST. These trials were conducted in the USA, UK, Canada, the Netherlands, Norway, and Sweden.
Most trials compared MST to TAU. In the USA, TAU consisted of relatively little contact and few services for youth and families, compared with more robust public health and social services available to youth in other high‐income countries. One USA study provided “enhanced TAU” to families in the control group, and two USA studies compared MST to individual therapy for youth.

1.1.3 | What are the main findings of this review?

Available evidence shows that MST reduced rates of out‐of‐home placement and arrest or conviction in the USA, but not in other countries. Moderate to low quality evidence shows that MST had positive effects on self‐reported delinquency and parent and family functioning, but we found no evidence of overall impacts on youth symptoms, substance abuse, peer relations, or school outcomes. Prediction intervals indicate that future studies are likely to find positive or negative effects of MST on all outcomes.

1.1.4 | What is the quality of the evidence?

The quality of evidence for MST is mixed. There was only one prospectively registered trial with complete reporting on all planned outcomes and endpoints. Nineteen trials (83%) had missing data on subgroups, outcomes, or endpoints.
We identified high risks of bias due to: inadequate randomisation procedures; lack of comparability between groups at baseline; systematic omission of cases; attrition; confounding factors, such as between‐group differences in race, gender, and attention; selective reporting of outcomes; and conflicts of interest.
Most MST trials (96%) had high risks of bias on at least one indicator. GRADE ratings of the quality of evidence for seven primary outcomes are low to moderate, with high quality evidence on out‐of‐home placements from non‐USA studies. USA studies led by MST developers had higher risks of bias, and USA control groups received fewer services and had worse outcomes (more out‐of‐home placements and arrests) than those in independent trials conducted in other high‐income countries. Although these moderators are confounded, the USA/non‐USA contrast appears to be more closely related to variations in effects across studies than investigator independence or risks of bias.

1.1.5 | What are the implications for research and policy?

Our results stand in stark contrast to many previous reports and reviews on MST. Although most MST trials produced a mixture of positive, negative, and null findings, many reports focused selectively on positive, statistically significant results instead of all results.
Careful appraisal of study methods and risks of bias was lacking in many published reports and reviews. Some investigators and many reviewers failed to consider alternative plausible explanations for results that appeared to favour MST (e.g., lack of comparability of groups at baseline; differential attrition; confounding influences of race, gender, and additional attention paid to MST cases; and selective reporting of results).

1.1.6 | How up‐to‐date is this review?

The review authors searched for studies that were reported through March 2020.

2 | BACKGROUND

2.1 | Description of the condition

Social, emotional, and behavioural problems affect young people's functioning in their homes, schools, peer groups, and other social and community settings. Beyond the internal and external struggles that often arise in adolescence (e.g., moodiness, angst, and interpersonal conflict), the social, emotional, and behavioural problems of interest here are mental health disorders, crime, and delinquency. These problems can have immediate, negative, and lasting consequences for youth and others, and may lead to long‐term difficulties and disabilities in adulthood (Fergusson 2007, Narusyte 2016, WHO 2020). They pose risks and have costs for individuals, families, and society. As such, these problems are of concern to professionals working in mental health, juvenile justice, school, child welfare, and community settings.
Mental health disorders in youth include: conduct disorder, oppositional defiant disorder, anxiety, depression, substance use disorders, attention‐deficit/hyperactivity disorder (ADHD), obsessive‐compulsive disorder, posttraumatic stress disorder, and pervasive developmental disorders, such as autism spectrum disorders (APA 2013). Many of these problems are classified in two broad spectrums: internalising (depressive, anxious, somatic) or externalising (impulsive, disruptive, aggressive, rule‐breaking) behaviours (Achenbach 2016). Externalising behaviours include crime, delinquency, and problematic sexual behaviour.
The prevalence of social, emotional, and behavioural problems among youth varies across countries and over time. The detection of these problems is affected by the methodologies used, and by cultural norms that affect their expression and societal responses. Across community surveys conducted in many countries, approximately one‐quarter of youth experienced at least one mental disorder in the past year, with one‐third of children and youth having experienced a mental disorder at some point in their lives (Merikangas 2009). Anxiety disorders are most common in youth, followed by behaviour disorders, mood disorders, and substance use disorders. The prevalence and expression of some mental health disorders vary by gender and age. Anxiety and mood disorders are more common in girls, while boys have higher rates of behaviour disorders, and substance use disorders are equally common in girls and boys around the world (Merikangas 2009; boys have higher rates of substance abuse in the USA, Merikangas 2010). ADHD and anxiety disorders may begin in childhood, while the onset of conduct disorder often occurs at early adolescence, and mood disorders tend to begin in late adolescence (Merikangas 2009). Half of all mental health conditions begin in childhood and adolescence, but most cases go undetected and untreated (WHO 2020).

There are substantial cross‐national variations in societal responses to juvenile crime and delinquency. Some countries have no minimum age of criminal responsibility, while others set minimums that range from 6 to 18 years of age (Hazel 2008). This means that some countries do not arrest, convict, or detain young people who violate the law. In contrast, the number of arrests of youth (under 18 years of age) in the USA peaked at almost 2.7 million in 1996 and dropped to 728,000 in 2018 (the lowest level in four decades, Puzzanchera 2020). Alternative approaches, including restorative justice, are becoming more common in many countries. The use of secure custody (detention and incarceration) for juvenile offenders is virtually nonexistent in some jurisdictions, while there are over 48,000 young people in state custody on any given day in the USA (down from almost 109,000 in 2000, Sawyer 2019).
Youth may receive services to address social, emotional, and behavioural problems from paediatricians, schools, community service organisations, and mental health specialists. Only about half of youth with current mental disorders receive specialist mental health treatment, and ethnic minority youth are unlikely to receive any mental health services (Merikangas 2009). In the USA, approximately one‐third (36%) of adolescents with mental health disorders receive services for these problems; service rates are higher for those with ADHD and behaviour disorders than for youth with other mental health problems; treatment is more likely in cases with severe or comorbid disorders; and Black and Hispanic youth are less likely than others to receive services for anxiety disorders, mood disorders, and ADHD even when those conditions result in severe impairment (Merikangas 2011).
Compared with their peers, youth with mental health problems have poorer school attendance, lower grades, and lower rates of high school completion. For most youth, however, the course of mental health distress is episodic, not permanent (youth.gov/youth-topics/youth-mental-health/how-mental-health-disorders-affect-youth).
According to the WHO, mental health conditions account for 16% of the global burden of disease and injury in people age 10 to 19 (WHO 2020). Mental health and substance use disorders account for one‐quarter of all years lived with disability (YLD; Erskine 2015). These problems are the leading cause of disability (measured in disability‐adjusted life years, DALYs) among children and youth in high‐income countries, but rank seventh in causes of disabilities (DALYs) in low‐ and middle‐income countries (after infectious diseases, nutritional deficiencies, injuries, and other causes; Erskine 2015).
The long‐term sequelae of these problems in adolescence are not well documented. A longitudinal study shows that internalising and externalising behaviours in youth increase the risks of work incapacity (sickness absence and disability pensions) among young adults in Sweden (Narusyte 2016). In New Zealand, the frequency of major depression in adolescence is associated with adverse mental health and economic outcomes, including welfare dependence and unemployment in early adulthood (Fergusson 2007). In the USA, psychological and behavioural problems in youth are associated with higher unemployment (Carter 2019), lower educational achievements, and lost income in adulthood (Smith 2010). Additional costs may be associated with increased service use.
Efforts to prevent and treat social, emotional, and behavioural problems in youth are vital, because “the consequences of not addressing adolescent mental health conditions extend to adulthood, impairing both physical and mental health and limiting opportunities to lead fulfilling lives as adults” (WHO 2020).

2.2 | Description of the intervention

MST® is a multifaceted, short‐term, home‐ and community‐based intervention for families of youth with severe psychosocial and behavioural problems. Based on social ecological and family systems theories, and on research on the causes and correlates of serious antisocial behaviour in youth (Henggeler 1998, Henggeler 2002a), MST was designed to address complex psychosocial problems and provide alternatives to out‐of‐home placement of children and youth.
The conceptual framework for MST was derived from reviews of research on juvenile delinquency and other psychosocial problems in childhood and adolescence that point to the influences of a variety of individual, family, school, peer, neighbourhood, and community characteristics (Fraser 1997a, Henggeler 1998). MST program developers argued that, if these problems are multidetermined, “it follows that effective interventions should be relatively complex, considering adolescent characteristics as well as aspects of the key systems in which adolescents are embedded” (Henggeler 1995, p. 116). They noted that this is consistent with social ecological theories of human development (e.g., Bronfenbrenner 1979), in which behaviour is viewed as a product of reciprocal interactions between individuals and their social environments, and with family systems theories, in which children's behaviours are thought to reflect more complex family interactions (Haley 1976, Minuchin 1974).
As described by its developers (Henggeler 1998, Henggeler 2002a, Henggeler 2009), MST uses a “family preservation service delivery model” that provides time‐limited services (4 to 6 months) to the entire family. Treatment teams consist of professional therapists and crisis caseworkers, who are supervised by clinical psychologists or psychiatrists. Therapists are mental health professionals with Master's or doctoral degrees; they have small caseloads and are available to program participants 24 hours a day, 7 days a week. Treatment is individualised to address specific needs of youth and families, and includes work with other social systems including schools and peer groups (hence, the term multisystemic). Treatment may focus on cognitive and/or behavioural change, communication skills, parenting skills, family relations, peer relations, school performance, and/or social networks.
Clinical features of MST include a comprehensive assessment of child development, family interactions, and family members' interactions in other social systems. Interviews with family members usually take place in the family's home. In consultation with family members, the therapist identifies a well‐defined set of treatment goals. Tasks required to accomplish these goals are identified, assigned to family members, and monitored in regular family sessions that occur at least once a week, sometimes daily, in the family's home.

MST programmes are licensed by MST Services, LLC (www.mstservices.com). MST Institute (MSTI.org) is a nonprofit organisation that provides web‐based information and quality assurance tools to programmes implementing MST. Considerable attention has been paid to the transportability and dissemination of MST, and to the fidelity of MST replications (e.g., Henggeler 2002b, Schoenwald 2000, Schoenwald 2001).
MST has most often been employed to address conduct disorder, delinquency, problem sexual behaviours, and serious mental health issues. In recent years, MST has developed specialised programmes to address needs of different clinical populations (MST Services 2019). These programmes focus on families with child abuse and neglect (MST‐CAN), those involved in juvenile drug court (MST‐JDC), youth with problem sexual behaviour (MST‐PSB), youth with psychiatric needs (MST‐psychiatric), youth with autism spectrum disorder and disruptive behaviours (MST‐ASD), and other populations.

2.3 | How the intervention might work

MST does not have a unique set of intervention techniques; instead, intervention strategies are integrated from other pragmatic, problem‐focused treatment models including strategic family therapy, structural family therapy, and cognitive behaviour therapy (Henggeler 1995, p. 121). According to its developers, “Multisystemic therapy is distinguished from other intervention approaches by its comprehensive conceptualisation of clinical problems and the multi‐faceted nature of its interventions” (Henggeler 1995, p. 121).
MST follows nine principles (paraphrased below):

1. Understand the “fit” between identified problems and the broader systemic context;
2. Emphasise the positive, using systemic strengths as levers for change;
3. Promote responsible behaviour and decrease irresponsible behaviour among family members;
4. A present‐focused, action‐oriented approach that targets specific, well‐defined problems;
5. Target behaviour sequences within and between multiple systems that maintain identified problems;
6. Use developmentally appropriate interventions that fit developmental needs of youth;
7. Require daily or weekly effort by family members;
8. Continuous evaluation of intervention from multiple perspectives, with providers assuming accountability for overcoming barriers to success;
9. Promote treatment generalisation and long‐term maintenance of change by empowering caregivers to address family needs across multiple systemic contexts (Henggeler 2002a, p. 20).

At the beginning of each case, MST therapists aim to develop clear and measurable goals in collaboration with family members and other community agencies. The “MST analytical process” (or “do loop”) illustrates iterative steps in assessment, goal setting, and intervention: Therapists and clients link the reasons for the referral to outcomes desired by family members and other key participants, in order to identify overarching goals. Emphasis is on understanding reasons for referral and the factors that contribute to or maintain those problems. This is MST's conceptualisation of “fit” and it is developed in an “environment of alignment and engagement of family and key participants” (Henggeler 2002a, p. 18). Therapists look for factors that might provide maximum leverage to achieve goals and for potential barriers to success. Immediate goals are prioritised, intervention is developed and implemented, progress and barriers are assessed, and the situation is re‐evaluated, leading back to reassessment of the “fit” (Henggeler 2002a, p. 18). Throughout this process therapists are encouraged to develop and test hypotheses about the causes and solutions to problems. “Random acts of intervention are therefore minimised, and the likelihood of rapid treatment progress and sustainability of treatment goals is increased” (Henggeler 2002a, p. 37).
Although well articulated, MST's principles and analytic process (the “do loop”) are not unique to MST. Some observers note that these are the hallmarks of good social work or social casework: a strengths orientation; involvement of clients in treatment planning; hypothesis development and testing; and an iterative process of goal setting, treatment planning, implementation, and evaluation. Furthermore, there is considerable overlap with other systemic interventions for youth with disruptive behaviours, as these interventions share many common elements (van der Pol 2019).
Markham noted that there is limited guidance in MST manuals about how clinicians are to decide which factors are most directly related to problem behaviours, and these choices clearly impact decisions about which treatments to use (Markham 2016). There is an underlying assumption that change can occur quickly, although many of the difficulties experienced by these families have persisted over many years (Markham 2016, p. 12).
Much attention has been paid to the issue of fidelity to MST principles and processes, as these are thought to be essential for success. The MST Treatment Adherence Measure (TAM) is routinely collected from MST clients, and several studies have shown that TAM scores are positively correlated with treatment outcomes. The problem here is that the TAM does not have face validity; it measures well known predictors of success across treatments, not adherence to MST per se. Items on this scale measure therapeutic alliance (Lange 2017, Lange 2018), client engagement (Tan 2017), and client satisfaction (sample items are: “My family and the therapist worked together effectively”, “Family members and the therapist agreed upon the goals of the session”, “The therapist recommended that family members do specific things to solve our problems”, “The therapist's recommendations should help family members to become more responsible”, and “The session was lively and energetic”; Schoenwald 2000, p. 88). Of course, better therapeutic alliance, client engagement, and client satisfaction predict more positive outcomes, but this is true in any intervention. To our knowledge, no study has compared TAM scores from MST clients with TAM scores from clients receiving another intervention, in order to demonstrate whether

the TAM detects adherence to MST. It is striking that this simple step in validating an adherence measure has not been conducted, and that TAM scores are only collected from MST cases in MST trials. Further, TAM scores are not comparable across countries (Lange 2016).

As discussed below, much attention has been paid to evidence of the effectiveness and cost‐effectiveness of MST. Based on analysis of data from ten studies, Aos and colleagues estimated that MST reduced crime outcomes by 10.5%; if accurate, this would translate into substantial benefits to crime victims and tax payers (Aos 2006). In contrast, Goorden and colleagues reviewed 11 controlled studies of the cost‐effectiveness of family‐based treatments for adolescent behaviour disorders, substance abuse, and delinquency, including eight studies of MST; they concluded that the quality of these economic evaluations was not sufficient to determine cost‐effectiveness (Goorden 2016, p. 237; also see NICE 2018).

2.4 | Why it is important to do this review

There is a need for effective treatments and support for youth with social, emotional, and behavioural problems and for their families. Hence, there is widespread interest in evidence for programmes in this area. For more than 20 years MST has been at or near the top of most lists of Evidence Based Practices (EBPs) for youth and families (Hoagwood 2001, Kazdin 1998). It has been characterised as a “well established” programme (van der Stouwe 2014) with “excellent evidence” (Kazdin 2015) and, as a result, MST has been widely disseminated.

According to MST Services LLC, there are more than 500 MST programmes operating in 15 countries and 34 USA states, and more than 200,000 families have received MST services (www.mstservices.com/). MST services are funded by national, state, and local governments (including Medicaid in the USA), along with funding from philanthropic and charitable organisations (www.mstservices.com/our‐community).

Widespread dissemination of MST is based on assurances that the program is “scientifically proven” (www.mstservices.com/). This claim deserves a closer look.

2.4.1 | Research base

In 2020, funding for research on MST exceeded $75 million USD (MST Services 2020b). According to MST Services LLC, 79 MST outcome studies have been published, involving 58,000 families across studies (because reviews of existing studies are included in this list, many families are counted more than once). Of these studies, 28 were randomised controlled trials (RCTs) conducted to assess impacts of MST for youth with a wide range of presenting problems (including studies of youth with medical problems, which were not included in our review).

Most MST trials were conducted in the USA by the developers of MST, many of whom were based at the Family Services Research Center (FSRC) at the Medical University of South Carolina (MUSC). Independent trials have been conducted in six countries.

Studies have assessed effects of MST on a wide array of outcomes in diverse samples of youth and families. Outcomes were measured after treatment and at several follow‐up points. Follow‐ups range from several months to 22 years after referral. Thus, there is ample evidence to assess the effectiveness of MST across problems, populations, outcomes, and endpoints.

Our previous review included eight trials conducted in the USA, Canada, and Norway. Since then, more than a dozen new trials have been completed in the Netherlands, Sweden, USA, and the UK, and additional follow‐up data are available on three of the eight studies included in our earlier review.

2.4.2 | Other reviews

We identified 417 published reviews of research on the effectiveness of MST (note that the number of reviews is five times greater than the number of published studies and 15 times greater than the number of RCTs identified by MST Services). However, most of these reviews are nonsystematic narratives that do not meet scientific standards for evidence synthesis (e.g., PRISMA, Moher 2009).

Results of MST outcome studies are summarised in nonsystematic reviews of effects of family preservation services (Fraser 1997b), interventions for child physical and sexual abuse (Swenson 2003), treatment for substance abuse (NIDA 1999), treatment for delinquency and disruptive behaviour in youth (Smith 1997), children's mental health services (Burns 2004, Burns 2000, Kazdin 1998, Kazdin 2015), and programmes to reduce crime (Aos 2001, US DHHS 2001) and prevent violence (Mihalic 2004). Several reviews suggested that MST is one of the most promising empirically based treatments for children and youth (Hoagwood 2001, Kazdin 1998). One nonsystematic review concluded that MST has positive effects that have been replicated “across problems, therapists, and settings. This shows that the treatment and methods of decision making can be extended and that treatment effects are reliable” (Kazdin 1998, pp. 27–28). These conclusions were often repeated. At least 20 published reviews relied primarily on other reviews of MST or did not cite any sources (Littell 2008).

MST trials are included in meta‐analytic reviews of effects of a wider array of interventions with juvenile offenders (Lipsey 1998), family treatment of youth delinquency (Latimer 2001), and family and parenting interventions for conduct disorder and delinquency (Woolfenden 2002, Woolfenden 2004). These reviews do not speak to the effectiveness of MST per se.

There are seven previous systematic reviews or meta‐analyses of research on effects of MST (not including the earlier version of our review). These reviews are described below and we provide a brief assessment of qualities of these reviews, using an adapted version of the AMSTAR tool (Shea 2007), in Table 1.

Farrington and Welsh reviewed results of six MST trials (five conducted in the USA and one in Canada) (Farrington 2003). Their search
strategy and meta‐analytic methods were not fully explained, there was no study quality assessment, and no heterogeneity tests were reported. Based on comparisons of mean effect sizes (ES) and confidence intervals (CIs), authors concluded that MST is the most effective family‐based crime prevention programme (Farrington 2003, p. 143).

TABLE 1 Assessment of prior systematic reviews and meta‐analyses of research on effectiveness of MST (using AMSTAR, adapted from Shea 2007)

AMSTAR item | Farrington 2003 | Curtis 2004 | Lofholm 2013 | van der Stouwe 2014 | Lux 2016 | Markham 2016 | Tan 2017
1. Was an “a priori” design provided? The research question and inclusion criteria should be established before the conduct of the review. | No | No | No | No | No | No | No
2. Was there duplicate study selection and data extraction? There should be at least two independent data extractors and a consensus procedure for disagreements should be in place. | Unclear | Unclear | Yes | Yes | Yes | Yes | Unclear
3. Was a comprehensive literature search performed? At least two electronic sources should be searched. The report must include years and databases used (e.g. Central, EMBASE and MEDLINE). Key words and/or MESH terms must be stated and where feasible the search strategy should be provided. All searches should be supplemented by consulting current contents, reviews, textbooks, specialized registers or experts in the particular field of study, and by reviewing references in studies found. | No | No | Yes | Unclear | Unclear | Yes | No
4. Was the status of publication (i.e., grey literature) avoided as an inclusion criterion? The authors should state that they searched for reports regardless of their publication type. The authors should state whether or not they excluded any reports (from the systematic review), based on their publication status, language and so forth. | Unclear | No | Yes | Yes | Yes | Yes | No
5. Was a list of studies (included and excluded) provided? A list of included and excluded studies should be provided. | No | No | No | No | No | No | No
6. Were the characteristics of the included studies provided? In an aggregated form such as a table, data from the original studies should be provided on the participants, interventions and outcomes. The ranges of characteristics in all the studies analyzed, for example, age, race, sex, relevant socioeconomic data, disease status, duration, severity, or other diseases should be reported. | No | No | Yes | Yes | No | Yes | Yes
7. Was the scientific quality of the included studies assessed and documented? A priori methods of assessment should be provided (e.g., for effectiveness studies if author(s) chose to include only randomised, double‐blind, placebo‐controlled studies, or allocation concealment as inclusion criteria); for other types of studies alternative items will be relevant. | No | No | No | Yes | No | Yes | No
8. Was the scientific quality of the included studies used appropriately in formulating conclusions? The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations. | No | No | No | No (a) | No | No | No (a)
9. Were the methods used to combine the findings of studies appropriate? For the pooled results, a test should be done to assess their homogeneity (i.e., χ² test for homogeneity, I²). | Unclear (b) | No (c) | N/A | Yes | Unclear | No (d) | No (d)
10. Was the likelihood of publication bias assessed? Assessment of publication bias should include a combination of graphical aids (e.g., funnel plot) and/or statistical tests (e.g., Egger regression test). | No | No | No | Yes | No | No | No
11. Was the conflict of interest stated? Potential sources of support should be clearly acknowledged in both the systematic review and the included studies. | No | No | No | No | No | No | No

Item 2: three of the “Yes” ratings are qualified as “partial”; the affected studies are not identified here.
(a) Use of a unidimensional quality scale.
(b) Unclear if fixed effect or random effects models were used; no heterogeneity tests are reported.
(c) Incorrect calculation of effect size and variance, inappropriate use of adjustments for small sample bias, inclusion of multiple dependent effect sizes from some samples, and failure to use appropriate weights (e.g., inverse variance methods) in meta‐analysis. “Meta‐analysis” is a simple arithmetic average of 11 mean effect sizes from seven studies (Littell 2008).
(d) Vote‐counting was used instead of meta‐analysis.

Curtis and colleagues (Curtis 2004) reported results of a meta‐analysis of seven published studies of effects of MST programmes conducted by MST program developers in the USA. Unpublished studies and those conducted by independent researchers were not included. This review included studies of abusing or neglectful parents, juvenile sexual offenders, violent and chronic juvenile offenders, substance abusing juvenile offenders, and psychiatrically disturbed adolescents. Effect sizes (d indexes) and their variance were estimated incorrectly, and some nonsignificant and negative effects were ignored (see Littell 2008). Corrections for small sample bias were applied to only one study. Curtis and colleagues reported an overall, unweighted effect size of d = 0.55 based on 11 summary effect sizes from seven studies. The effect sizes in this estimate are not independent (as they should be), because some samples are represented twice. Reviewers did not use inverse variance methods or other methods to adjust for differences in the precision of the estimates. Results appear to be affected by publication bias (cf. Rothstein 2005), allegiance effects (cf. Luborsky 1999), and estimation errors (Littell 2008).
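
To illustrate why the absence of precision weighting matters, the following minimal sketch contrasts a simple arithmetic average of effect sizes with a fixed‐effect, inverse‐variance weighted mean. The effect sizes and variances are invented for illustration only; they are not taken from Curtis 2004 or from any MST trial, and the function names are hypothetical.

```python
import math

def unweighted_mean(effects):
    """Simple arithmetic average: every estimate counts equally."""
    return sum(effects) / len(effects)

def inverse_variance_mean(effects, variances):
    """Fixed-effect pooled estimate: each effect is weighted by 1/variance,
    so more precise (usually larger) studies contribute more."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error

# Hypothetical data (assumption): two small, imprecise studies with large
# effects and one large, precise study with a small effect.
effects = [0.90, 0.70, 0.10]     # standardised mean differences (d)
variances = [0.20, 0.10, 0.01]   # sampling variances of each d

pooled, se = inverse_variance_mean(effects, variances)
print(f"unweighted mean d  = {unweighted_mean(effects):.2f}")   # about 0.57
print(f"inverse-variance d = {pooled:.2f} (SE {se:.2f})")        # about 0.19 (SE 0.09)
```

In this invented example the unweighted average is roughly three times the weighted estimate, because the two imprecise studies dominate the simple average. The sketch assumes independent effect sizes; it does not address the separate problem of counting the same sample more than once.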


Lofholm and colleagues reviewed results of 13 RCTs of MST to explore differences in TAU conditions across studies (Lofholm 2013). They found greater variability in recidivism rates between the TAU groups in these studies than between MST groups. Authors noted that these differences made it difficult to compare outcomes and treatment effects across studies.

Van der Stouwe and colleagues conducted a multilevel meta‐analysis of 22 studies of effects of MST for youth with antisocial behaviour, delinquency, and/or conduct disorders (van der Stouwe 2014). They included both randomised controlled trials and nonrandomised comparison group studies, and unpublished as well as published studies. Study quality was assessed with a uni‐dimensional scale (a practice abandoned by Cochrane and other reviewers, who view study quality as a multidimensional construct; Jüni 1999, Jüni 2001). Van der Stouwe and colleagues found small, but statistically significant effects of MST on delinquency, out‐of‐home placement, substance use, and peer relations; but these effects were not significant after adjustments were made for publication bias. Small but statistically significant effects on psychopathology and family factors were evident, even after adjustments for publication biases. There were no significant effects on skills or cognitions, and no evidence of publication bias in reports on those outcomes.

Lux conducted a meta‐analysis of 127 effect sizes from 35 unique MST studies (using 44 published and unpublished reports; Lux 2016). This review did not provide: a full description of the search strategy, methods for study selection, a list of excluded studies, study quality assessment, or discussion of methods used for moderator analysis. Both RCTs and quasi‐experimental designs
were included. Continuous and dichotomous study ES were converted to correlation coefficients for meta‐analysis. Both fixed and random effects meta‐analyses were performed (best practice is to select one of these models based on a priori assumptions about which model best fits the distribution of effect sizes; Borenstein 2010).

Markham provided a systematic review of 11 RCTs of MST conducted within and outside of the USA (Markham 2016, Markham 2018). Picking up where our previous review left off, Markham included studies published from 2006 to 2014. She noted that comparisons between these studies are challenging, due to inconsistencies in reporting on usual services and cultural differences in the cross‐national transportation of MST. She used narrative review methods, not meta‐analysis, and concluded that outcomes for MST “continue to be mixed across studies” (Markham 2018, p. 67).

Tan and Fajardo (Tan 2017) reported a systematic review of 12 RCTs on the efficacy of MST. This review was limited to published studies, hence it is vulnerable to publication bias. Authors assessed study quality on a uni‐dimensional scale (limitations of this approach are noted above). Tan and Fajardo presented results in narrative and tabular forms. Instead of conducting meta‐analysis, they used simple vote‐counting to summarise results across studies (e.g., “2 out of 3 studies showed positive outcomes of MST in reduction of antisocial behaviour”, Tan 2017, p. 97).

Problems with vote counting have long been recognised (Hedges 1980, Gurevitch 2018). The Cochrane Handbook states that, when based on statistical significance or subjective rules, vote counting is an “unacceptable synthesis method” (Higgins 2020). This approach often leads to the wrong conclusions.

As shown in Table 1, none of these reviews had protocols that were available a priori, none provided a list of excluded studies, none completed thorough study quality or risk of bias (ROB) assessments, none had adequate methods for taking study quality into account, and none provided conflict of interest statements. Our previous review, published in 2005, had most of these features; the present version has all of them.

In 2005, we thought it was premature to draw conclusions about the effectiveness of MST based on inconsistent results from eight trials that varied in quality and context (Littell 2005a, Littell 2005b). Others have cited more limited evidence with weaker review methods and more surety. Even after the publication of more than 400 reviews, questions about the benefits of MST remain: are effects of MST consistent across populations, problems, outcomes, and over time? Can variations in effects be explained by study qualities, sample characteristics, intervention characteristics, comparison conditions, or contexts? Methodological weaknesses in many previous reviews limit confidence in the answers they provide.

By updating our systematic review with evidence from new trials, additional follow‐up data on old trials, and newer meta‐analytic methods, we address unresolved issues and provide more robust estimates of effects of MST on outcomes for youth and families.

3 | OBJECTIVES

• Assess impacts of MST on out‐of‐home living arrangements, crime and delinquency, and other behavioural and psychosocial outcomes for youth and families.
• Assess the consistency (homogeneity) of effects across studies.
• Assess potential moderators of effects including characteristics of studies (e.g., location, independence, risks of bias) and outcome measures.

4 | METHODS

4.1 | Criteria for considering studies for this review

4.1.1 | Types of studies

This review was limited to experimental studies in which participants were randomly assigned to treatment and comparison groups. Outcome evaluation studies using other group designs were identified, but not included. There were no publication or language restrictions.

4.1.2 | Types of participants

Participants included children and youth (age 10 to 17) with social, emotional, and behavioural problems, and their family members. These youth may have been at risk of out‐of‐home placement. Participants included:

• Abused, neglected, and dependent children and youth at risk of foster care or other out‐of‐home placements in child welfare settings;
• Children and youth with mental health problems at risk of psychiatric hospitalisation; and
• Delinquent youth at risk of incarceration or placement in residential treatment settings.

Given these eligibility criteria, programmes for emerging adults (age 17 to 26) were excluded, as were programmes for youth whose presenting problems were medical in nature (e.g., diabetes, HIV, obesity, asthma).

4.1.3 | Types of interventions

MST (as defined above) was compared with any counterfactual condition, including (a) TAU, (b) an alternative treatment condition (e.g., individual therapy, group therapy), or (c) no treatment. To be included in this review, focal programmes had to be licensed MST programmes; other “multisystemic” treatments were not included.

In recent years, MST developers created specialised programmes to address needs of various clinical populations (MST Services 2019).
In addition to the original version of MST, specialised MST programmes included in our review focused on

• Child abuse and neglect (MST‐CAN),
• Youth involved in juvenile drug court (MST‐JDC),
• Youth with problem sexual behaviour (MST‐PSB),
• Youth with psychiatric needs (MST‐psych), and
• Youth with autism spectrum disorder and co‐occurring disruptive behaviours (MST‐ASD).

Consistent with our original eligibility criteria, studies were not included in our review if focal interventions: (a) served youth and families whose problems are primarily medical in nature, (b) targeted youth younger than 10 or older than 17 years of age, or (c) combined MST with other treatments. For example, MST has been combined with Contingency Management (CM) for substance abuse; CM is a distinct intervention with its own evidence base (Blonigen 2015). Thus, we excluded studies of the following programmes:

• MST plus contingency management (MST‐CM) for substance‐abusing youth;
• MST‐Building Stronger Families (MST‐BSF) which combines MST‐CAN with Reinforcement Based Therapy (RBT) for parental substance use;
• MST‐Family Integrated Transitions (MST‐FIT) which combines MST with Motivational Enhancement Therapy (MET), relapse prevention, and Dialectical Behaviour Therapy (DBT);
• BlueSky which includes MST, Functional Family Therapy, and Multidimensional Treatment Foster Care;
• MST plus Community Restitution Apprenticeship Focused Training (MST‐CRAFT);
• MST‐Health Care (MST‐HC) for juvenile diabetes which includes medical treatments;
• MST for HIV‐positive adolescents (MST‐HIV) which includes medical treatments; and
• MST‐Emerging Adults (MST‐EA) for 17‐ to 26‐year olds with criminal justice involvement and serious mental health problems.

4.1.4 | Types of outcome measures

We examined measures of behavioural, psychosocial, and family outcomes.

• Youth behavioural outcomes included antisocial behaviour (evidenced by arrest, conviction, or sentencing for criminal offences), drug use, and school attendance.
• Youth psychosocial outcomes included measures of youth psychiatric symptoms, self‐reported delinquency, peer relations, and academic performance.
• Parent psychosocial outcomes included parents' psychiatric symptoms, parenting behaviours, and social support.
• Family outcomes included out‐of‐home placements of children and youth (incarceration, hospitalisation, residential treatment, and foster care) and qualities of family functioning.

These outcomes were assessed in a variety of ways, including data extracted from official agency records, self‐reports on standardised instruments, observational measures, and biologic tests. Data on events such as arrest or conviction, out‐of‐home placement, and school attendance were often obtained from official agency records (law enforcement, hospital, school, and child welfare agency administrative records), although some studies relied on interviews with caregivers to ascertain children's living arrangements or grades in school. Psychosocial outcomes were often assessed on standardised instruments that were self‐administered or embedded in structured interviews. Observational measures were sometimes used to assess certain aspects of family functioning or relationships. A few studies used biologic measures of substance use; others used self‐reports. Many studies employed multiple data collection procedures, which had different potential risks of bias. We conducted separate risk‐of‐bias assessments for the following types of data:

• Data extracted from administrative records, and
• Self‐reports (from youth) and collateral reports (from caregivers or teachers) on structured instruments.

Outcome measures were obtained at varying points in time; some were anchored to the time that had elapsed since random assignment, others were anchored to the end of treatment. Some studies collected data during or immediately after treatment (4 to 8 months after random assignment). Because some cases were still receiving treatment at 8 months, we assessed outcomes in the following categories:

• 1 year follow‐up (9–18 months),
• 2.5 year follow‐up (19–40 months), and
• 4 year follow‐up (41–60 months).

Before beginning our update of this review, we identified the following primary and secondary outcomes.

Primary outcomes
Primary outcomes were:

• Out‐of‐home placements (e.g., incarceration, detention, hospitalisation, residential treatment, community foster care),
• Antisocial behaviour (arrest, conviction, self‐reported delinquency),
• Drug and alcohol use,
• Youth psychiatric symptoms (internalizing and externalizing behaviours),
• Qualities of parenting (discipline, supervision, communication), and
• Family functioning (adaptability, cohesion, conflict‐hostility).

We identified seven of the most important (and most often studied) primary outcomes for the Summary of Findings Table. These
outcomes were assessed at one year post random assignment (or with the nearest report available):

1. Out‐of‐home placements,
2. Criminal offences (arrests or convictions),
3. Self‐reported delinquency,
4. Externalizing behaviours,
5. Internalizing behaviours,
6. Family adaptability, and
7. Family cohesion.

Secondary outcomes
Secondary outcomes were: school attendance, school performance, peer relations, self‐esteem among young people, along with indicators of parents' mental health. These outcomes were usually reported by youth, parents/caregivers and/or teachers.

We excluded outcomes related to satisfaction with services, life events, civil lawsuits, and outcomes experienced by siblings of the focal young person.

4.2 | Search methods for identification of studies

Search strategies for the original version of this review were reported in Littell 2005a.

4.2.1 | Electronic searches

We searched for new studies in September 2010 and again in March to April 2020. In advance of these searches, we revised our original search strategies to reflect changes in databases and interfaces, and to increase the sensitivity of the research design terms. We used the Cochrane Highly Sensitive Search Strategy for identifying randomised trials for MEDLINE. Original date restrictions were lifted in order to find any relevant studies which the original search may have missed. Specific search strategies for each database are shown in Appendix A. No language restrictions were applied. We searched the following databases:

• ASSIA (ProQuest): 1987 to September 2010 (searched September 17, 2010), 2010–2020 (searched March 31, 2020)
• Cambridge University Press Journals Complete: all dates (searched 14 April 2010)
• CINAHL (EbscoHost): 1937 to September 2010 (searched September 16, 2010), 2010–2020 (searched March 29, 2020)
• EMBASE Classic+Embase: 1947 to March 27, 2020 (searched March 28, 2020)
• ERIC (OVID): 1965 to August 2019 (searched March 29, 2020)
• MEDLINE (OVID, R): 1946 to March 26, 2020 (searched March 28, 2020)
• National Criminal Justice Reference Service (NCJRS) Abstracts Database: 1974 to 24 February 2011 (searched February 24, 2011), 2010–2020 (searched March 29, 2020)
• ProQuest Dissertations & Theses Global (formerly Dissertation Abstracts International): all dates (searched April 14, 2020)
• PsycINFO (APA, OVID): 1806 to March Week 4 2020 (searched March 29, 2020)
• Science Direct (searched February 17, 2011, March 29, 2020)
• Social Care Online (searched September 17, 2010, March 29, 2020)
• Social Services Abstracts (ProQuest): 1979 to September 2010 (searched September 17, 2010), 2010–2020 (searched March 29, 2020)
• Social Science Citation Index (SSCI, Web of Science): 1900 to March 29, 2020 (searched March 29, 2020)
• Sociological Abstracts (ProQuest): 1952 to September 2010 (searched September 17, 2010), 2010–2020 (searched March 29, 2020)
• Trials (formerly the Cochrane Central Register of Controlled Trials or CENTRAL) part of The Cochrane Library, www.thecochranelibrary.com: 2020 Issue 3 (searched 28 March 2020)
• WorldCAT dissertations and theses: all dates (searched February 21, 2011, April 13, 2020)

Two databases were searched for the original review, but not included in this update: the C2 Spectr database is no longer maintained and InfoTrac is not available to us.

4.2.2 | Searching other resources

We searched the following websites on April 13, 2020, using search strings shown in Appendix A.

• MST Services (www.mstservices.com)
• U.S. Department of Health and Human Services
• U.S. National Institutes of Health, RePORTer database (formerly CRISP)
• U.S. Centers for Disease Control
• U.S. Government Printing Office (gpo.gov)
• UK Home Office

In September 2010 we conducted a Google Scholar search and examined the top 200 hits. On March 31, 2020, we updated this search, limiting the date range to 2010–2020 and using the following search string: (multisystemic OR multi‐systemic OR “multi systemic”) AND (therapy OR treatment). We examined the top 100 hits. (These searches were more specific than the Google searches we ran in January 2003.)

Personal contacts
We made personal contacts with MST developers and independent investigators to identify unpublished reports and ongoing studies, and to request additional information on MST trials. These contacts included Steve Aos, Robert Barnoski, Charles Borduin, Alison Cunningham, Scott Henggeler, Alan Leschied, Mark Lipsey, Marsha
Miller, Terge Ogden, Sonja Schoenwald, Knut Sundell, Jane Timmons‐Mitchell, and Bahr Weiss. Initial contacts were made in 2003. Experts were contacted again in September and October 2006.

From April to August 2020 we sought additional information on 16 MST trials from 22 experts: Jessica Asscher, Stephen Butler, Redonna Chandler, Phillippe Cunningham, Peter Fonagy, Charles Glisson, Scott Henggeler, Sarah Hurley, Danielle Jansen, Ava Rosenroth, Sylvia Rowlands, Valerie Russo, Cindy Schaeffer, Sonja Schoenwald, Kaitlin Sheerin, Ashli Sheidow, Keller Strother, Cynthia Cupit Swenson, Jane Timmons‐Mitchell, Karin Vermeulen, David Wagner, and Trisha Wiley.

Cross‐referencing of bibliographies
We retrieved full text reports for 353 reviews and harvested relevant references from 128 of the most recent reviews.

Citations and abstracts were stored in a group library in Zotero, as were full text reports.

4.3 | Data collection and analysis

For screening purposes, citations were imported from Zotero into Excel. Screening and data extraction codes were entered in Excel. Analyses were performed in RevMan and R.

4.3.1 | Selection of studies

Two reviewers independently screened titles and abstracts identified in the search, using the Level 1 coding scheme shown in Appendix B to indicate which reports were clearly ineligible (and why) and which documents should be retrieved. If an abstract was not available, we attempted to retrieve the full text. We made inclusive screening decisions at this first stage; that is, if either reviewer thought the document might be eligible for our review or if there was not enough information in the title and abstract to make this decision with confidence, we retrieved the full text.

Before formally applying our eligibility criteria to each study, we grouped all documents that belonged to that study. It was important to focus on the study as the main unit of analysis, instead of focusing on study reports. We define a study as a set of research procedures that involves a unique sample of participants, a sample which does

“first strike” rule: eligibility questions were answered in a predetermined order and, if a study failed to meet one criterion, that reason for exclusion was documented and the screening process was stopped. It is possible that studies failed to meet additional criteria that are not documented. Selection decisions were reviewed and disagreements were resolved by the review team.

We summarised results of searches, screening, and eligibility decisions using a PRISMA flowchart. A complete list of excluded studies is provided (see Section 5).

4.3.2 | Data extraction and management

Information on study design and implementation, sample characteristics, intervention characteristics, and outcomes was extracted from included studies and coded using a structured data extraction form (see Appendix B, Levels 3–5). Two reviewers independently read all reports associated with an included study and coded information on that study. Differences between raters were discussed in an attempt to resolve any discrepancies. When needed, a third rater was consulted.

When we encountered conflicting reports on the number of cases that had been randomly assigned to treatments within a study, we selected the largest credible count. We used this number as the denominator when calculating rates of attrition over time and in subsequent reports.

When we encountered conflicting reports on the presence or absence of between‐group differences on baseline characteristics, we relied on accounts that provided descriptive data on those characteristics at the group level. We used the What Works Clearinghouse criteria for group equivalence on baseline characteristics (between‐group differences d < 0.25, WWC baseline). We used David Wilson's ES calculator to compute the d statistic (using the probit method) to quantify the magnitude of differences between groups.

We extracted information on all outcome measures mentioned in the study protocol (if there was one) and in all subsequent reports, regardless of whether outcome data were ever reported. This provided us with the information we needed to assess selective reporting of outcomes and missing data.

To the extent possible, we extracted data on all primary and secondary outcomes at all endpoints, recording data on the timing of the measurement of each outcome. We extracted data on total scores and subscale scores, when both were provided. We recorded
not overlap with samples used in other investigations. This is to avoid data on composite events (e.g., all arrests) and their subtypes (e.g.,
confusion in the narrative, allow more in‐depth analysis of study arrests for violent crimes, arrests for nonviolent crimes) when these
characteristics and methods, and avoid double‐counting of partici- data were provided.
pants in meta‐analysis. Studies often generated multiple reports re- Some studies provided reports on the same outcome at the same
lated to different research questions, subgroups, types of analyses, or endpoint in multiple documents (e.g., preliminary and final reports).
end points; these are not treated as separate studies, if they are To avoid duplication and include the most complete report, we se-
based on the same sample or overlapping subsamples. lected the outcome data with the largest valid n.
Working independently, two reviewers read all of the documents A few studies provided results for observed data along with
that belonged to each study and applied the eligibility criteria to that analyses that used multiple imputation of missing data. When both
study, following the algorithm shown in Appendix B, Level 2. We types of results were available, we extracted results for observed
recorded one reason for exclusion for each excluded study, using a data, because this approach was more common across studies.
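The two selection rules just described (largest valid n among duplicate reports, and observed data in preference to imputed data) can be sketched as follows. This is a minimal illustration, not the review team's actual procedure: the `Report` class, its field names, and the order in which the two rules are applied (observed first, then largest valid n) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    """One report of a given outcome at a given endpoint (hypothetical structure)."""
    source: str      # e.g., "preliminary report", "final report"
    valid_n: int     # number of cases with valid data on the outcome
    imputed: bool    # True if results come from multiple imputation


def select_report(reports: List[Report]) -> Report:
    """Pick one report for extraction.

    Assumed order of rules: prefer results based on observed data when any
    exist; within the remaining pool, take the largest valid n.
    """
    if not reports:
        raise ValueError("no reports to choose from")
    observed = [r for r in reports if not r.imputed]
    pool = observed if observed else reports
    return max(pool, key=lambda r: r.valid_n)


if __name__ == "__main__":
    candidates = [
        Report("preliminary report", valid_n=80, imputed=False),
        Report("final report", valid_n=95, imputed=False),
        Report("final report, imputed", valid_n=120, imputed=True),
    ]
    print(select_report(candidates).source)
```

In this sketch the imputed analysis is set aside even though it has the largest n, because results based on observed data are available for the same outcome and endpoint.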
18911803, 2021, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/cl2.1158 by Readcube (Labtiva Inc.), Wiley Online Library on [28/03/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Multiple imputation made little difference in results of one of the largest studies (Fonagy 2018, 2018a, p. 17). When observed results were not available, we extracted data from analyses that used imputation, but only if the valid n was >50% of the total sample size.

When data on the full sample were available, we did not extract data on subgroups (e.g., we did not extract data from analyses limited to program completers or recidivists). We did not extract data on outcome measures collected during treatment (<4 months after referral).

We did not extract data on time‐to‐events or hazard rates, given our plans to use SMD and OR effect size metrics.

We found conflicting reports on some outcomes, despite the fact that the data came from identical samples (same valid n), measures, reporters, and endpoints. When this occurred, we selected reports that provided data needed to calculate effect sizes (e.g., valid ns, means, SDs for treatment and control groups) and/or more complete accounts of the details of measurement and analysis. Given their greater length and some evidence that dissertations exhibit stronger methodologies than published reports in this field (McLeod 2004), we sometimes used data from dissertations instead of published reports.

As indicated above, some studies anchored follow‐ups to the beginning of treatment and others anchored follow‐ups to the end of treatment. For follow‐up periods anchored to end of treatment, we added six months to the reported endpoint to make these observations comparable to those anchored to random assignment. For example, a one year follow‐up period that begins at the end of treatment is estimated to be 18 months after random assignment.

For outcomes related to events (e.g., out‐of‐home placement, arrest, or conviction), we included reports on events that occurred between random assignment and the follow‐up endpoint (whenever possible) in pairwise meta‐analysis. Some studies did not provide data on events that occurred during treatment, so that the observation period began when treatment ended. Others provided data on events that occurred within discrete intervals (e.g., 0 to 6, 6 to 12, and 12 to 18 months). We could not collapse dichotomous data on events into longer intervals (e.g., 0 to 12 or 0 to 18 months), because we did not know how many people experienced events within multiple time periods.

When studies provided two estimates of outcomes within the same interval used in our pairwise meta‐analysis (e.g., observations at 18 and 30 months post random‐assignment both fit our criteria for the 2.5 year observation period), we selected the estimate with the largest valid n or (if valid ns were identical) the longest observation.

For continuous data on events that occurred within specific time intervals, we were able to aggregate data across intervals when the valid ns for those intervals were identical. The cumulative mean is computed by adding group means for all relevant intervals (e.g., mean number of offences in 0 to 6 months + mean number of offences in > 6 to 12 months = mean for 0 to 12 months). The corresponding standard deviation is calculated by adding the variances for each interval and taking the square root of the sum of the variances.

Authors of studies with missing data were contacted and some additional data were obtained as a result.


4.3.3 | Assessment of risks of bias (ROB) in included studies

We updated our approach to the assessment of ROB, to incorporate more explicit methods that had been developed since the publication of the protocol for this review (Littell 2004). We adapted the first version of the Cochrane ROB tool (Higgins 2011) and What Works Clearinghouse standards for baseline equivalence and attrition (WWC attrition; WWC baseline) and applied these criteria to all studies.

Study‐level ROB assessments
Random assignment of participants to treatment and control/comparison conditions was an inclusion criterion for this review, given its importance in minimising selection bias in studies of intervention effects (Schulz 1995). We rated the adequacy of the random sequence generation and allocation concealment, using the following categories.

Adequate sequence generation: Investigators described a random component in the sequence of assignments, such as use of a computer random number generator, a table of random numbers, drawing lots or envelopes, coin tossing, shuffling cards, or throwing dice.

• Yes = Low risk of bias
• Unclear risk: insufficient information; random assignment was mentioned, but not described in detail
• No = High risk: investigators described a nonrandom component in the sequence of assignments, such as alternation or rotation, date of birth, date of admission or referral, case record number, clinical judgement, client preference, or service availability.

Adequate allocation concealment: Participants and investigators could not foresee assignment, because randomisation was performed at a central site remote from the trial location or investigators monitored use of assignments contained in sequentially numbered, sealed, opaque envelopes.

• Yes = Low risk
• Unclear risk: insufficient information (e.g., random assignment was mentioned, but not described in detail) or adequacy of concealment was unclear (e.g., use of coin toss, card shuffle, dice, envelopes with unspecified characteristics)
• No = High risk: allocation was not adequately concealed; for example, investigators used open random number lists, transparent or unsealed envelopes, or quasi‐randomisation methods such as alternation or rotation, date of birth, date of admission or referral, case record number, or service availability.

Random assignment does not always produce groups that are comparable on important characteristics at baseline. The law of large numbers suggests that the risk of baseline imbalance is greater in studies with small samples. Because we encountered trials (of various sizes) with large between‐group differences on important characteristics, such as race and referral source, we
used What Works Clearinghouse criteria to assess baseline equivalence (WWC baseline).

Baseline equivalence: Initial differences between groups were small or moderate (d < 0.25).

• Yes = Low risk
• Unclear risk: insufficient information (e.g., group‐level data on background characteristics were not provided, d cannot be computed)
• No = High risk: there were baseline differences between groups with d > 0.25.

As described below, included studies were also assessed on risks associated with performance bias, attrition bias, detection bias, deviation from intention‐to‐treat analyses, nonstandardised (variable) observation periods, unreliable outcome measures, selective reporting, and conflicts of interest.

Avoidance of performance bias (confounding): No systematic differences between groups in levels of care or attention, or in exposure to factors other than the interventions of interest (Higgins 2011, 8.4.2).

• Yes = Low risk
• Unclear risk: insufficient information
• No = High risk: one group received more attention, care, or surveillance than another; or factors likely to be related to outcomes (confounding factors) were unequally distributed between groups.

Avoidance of detection bias (blinding of assessors): Assessor was unaware of group assignment when collecting outcome data.

• Yes for all outcomes = Low risk
• Yes for some outcomes = Unclear risk
• Unclear risk: insufficient information
• No = High risk.

Avoidance of attrition bias: Losses to follow up were ≤ 25% overall and equally distributed (< 10% difference in response rates) across groups (adapted from WWC attrition). Group equivalence on baseline characteristics was retained after losses to follow‐up (d < 0.25, adapted from WWC baseline).

• Yes for all outcomes = Low risk
• Yes for some outcomes = Unclear risk overall
• Unclear risk: insufficient information
• No = High risk: loss of baseline equivalence (d > 0.25), losses to follow up > 25%, or losses were unequally distributed (> 10% difference) across groups.

Given substantial proportions of missing data in some long‐term follow‐ups, we considered raising the threshold for ROB assessments of overall attrition from 25% to 30%. However, this change would not have affected any study's ROB ratings for attrition, so we did not change the threshold.

Intention‐to‐treat analysis: Data were analysed according to participants’ initial group assignment, regardless of whether assigned services were received or completed.

• Yes for all outcomes = Low risk
• Yes for some outcomes = Unclear risk
• Unclear risk: insufficient information
• No = High risk.

Standardised observation periods: Follow‐up data were collected from each case at fixed points in time after random assignment, or analyses included controls for variable observation periods.

• Yes for all outcomes = Low risk
• Yes for some outcomes = Unclear risk
• Unclear risk: insufficient information
• No = High risk.

Validated outcome measures: Use of instruments with demonstrated reliability (e.g., Cronbach's α > .7, Nunnally 1994; Cohen's κ > .7, McHugh 2012) and validity in this sample or similar samples, or use of external administrative data on events (e.g., arrests, incarceration, hospitalisation).

• Yes for all outcomes = Low risk
• Yes for some outcomes = Unclear risk
• Unclear risk: insufficient information
• No = High risk.

Free of selective reporting: The study protocol was available and all prespecified outcomes were reported in the prespecified way; all expected outcomes were reported in full and for all cases (regardless of direction and significance of results).

• Yes = Low risk
• Unclear risk (e.g., protocol was not available)
• No = High risk: some outcomes were not reported or some outcomes were reported incompletely (e.g., for subgroups only, or without sufficient detail for meta‐analysis).

Free of conflicts of interest: Investigators would not benefit if results favoured MST or control/comparison groups. None of the study authors, data collection staff, or data analysts were paid to develop, supervise, or provide services to the MST or comparison group; none of these investigators were members of consulting firms linked to MST or comparison conditions.

• Yes = Low risk
• Unclear risk
• No = High risk.
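As an illustration of how one of the rubrics in this section might be applied to a single outcome, the attrition criteria (losses ≤ 25% overall, < 10% difference between groups, and d < 0.25 on baseline characteristics after losses) can be sketched as below. This is a minimal sketch rather than the review's coding procedure: the function name, its arguments, and the handling of values that fall exactly on a threshold are assumptions introduced for the example.

```python
from typing import Optional


def rate_attrition_risk(
    overall_attrition: Optional[float],
    differential_attrition: Optional[float],
    baseline_d_after_loss: Optional[float],
) -> str:
    """Rate attrition risk for one outcome as 'low', 'high', or 'unclear'.

    Attrition values are proportions (0.25 means 25%); d is a standardised
    between-group difference. None means the information was not reported.
    """
    d = None if baseline_d_after_loss is None else abs(baseline_d_after_loss)

    # Any single criterion above its threshold is enough for high risk.
    if overall_attrition is not None and overall_attrition > 0.25:
        return "high"
    if differential_attrition is not None and differential_attrition > 0.10:
        return "high"
    if d is not None and d > 0.25:
        return "high"

    # Without all three pieces of information, low risk cannot be confirmed.
    if overall_attrition is None or differential_attrition is None or d is None:
        return "unclear"

    if differential_attrition < 0.10 and d < 0.25:
        return "low"

    # Exactly a 10% difference or d = 0.25: the rubric does not say which way
    # this falls, so this sketch leaves it unclear (an assumption).
    return "unclear"


if __name__ == "__main__":
    print(rate_attrition_risk(0.20, 0.05, 0.10))
    print(rate_attrition_risk(0.30, 0.05, 0.10))
    print(rate_attrition_risk(0.20, None, 0.10))
```

Study-level ratings would then depend on how outcome-level ratings are combined (for example, "Yes for some outcomes = Unclear risk"), which this sketch does not attempt.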

We included ratings of conflicts of interest (COI) because several reviews showed that program developers' involvement in research was associated with the direction and significance of results (Petrosino 2005, Eisner 2009, Gorman 2018), while others did not (Welsh 2012).

Outcome‐level ROB assessments
Following the rubrics described above, we conducted separate assessments of risks of bias related to detection and attrition for two kinds of outcomes: those that relied on administrative data versus self‐reports. Thus, within studies, outcomes based on data extracted from official agency records could have different risks of detection bias or attrition bias than outcomes obtained from structured interviews with youth, caregivers, or others.


4.3.4 | Measures of treatment effect

Continuous data were analysed if means and standard deviations were available or there was some other way to calculate effect size (e.g., from t tests, F tests, or exact p values). When reports contained insufficient data, we sought additional information from the authors. Studies used diverse scales to measure the same clinical outcomes (e.g., psychiatric symptoms), so we used standardised mean differences (SMD) to facilitate comparisons across studies. The RevMan formula for SMD is Hedges' g, which is like Cohen's d but includes an adjustment for small sample bias.

Binary outcomes were analysed by calculating odds ratios (OR) with 95% CIs. Attempts were made to preserve information about base rates (in control groups) and between‐group differences in proportions, since this provided important contextual information.

After computing effect sizes (ORs and SMDs), we examined outliers and checked to make sure that our data accurately reflected study reports. We used log odds ratios (LORs) in meta‐analysis, and converted results back to ORs for presentation.


4.3.5 | Unit of analysis issues

MST trials randomly assigned youth and their families to treatments. In addition to a focal young person, some studies conducted analyses of outcomes for siblings or sibling groups. We did not include data on outcomes for siblings, because families were the main units of analysis in most studies, the focal youth and parent(s) were the main focus of intervention, and some focal youth did not have siblings.

When we encountered multi‐armed studies, we limited our comparisons to the two arms that best represented typical implementation of MST and a non‐MST control group. For example, if a three‐armed study compared MST plus another intervention to MST‐only and a usual services control group, we ignored the first group (which did not meet our inclusion criteria) and compared results for the last two groups. Similarly, for studies that used factorial designs to test MST, another intervention, and the interaction of these two treatments, we compared the arms that best represented MST and a similarly situated control group.


4.3.6 | Dealing with missing data

When we identified missing data (on studies, cases, outcomes, or effect sizes), we contacted investigators with requests for more information.

We are concerned here with possible reasons for missing data. Data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). MCAR and MAR data are not likely to affect results of meta‐analysis, but MNAR data will (Pigott 2019). When studies, cases, outcomes, or effect sizes are not fully reported for reasons that are related to their results, meta‐analysis of available data will be biased. For example, nonpublication or nonreporting of negative or null results will lead to inflated effect sizes in meta‐analysis, as will the systematic loss or omission of subgroups of participants or sites with more negative outcomes.

To assess issues related to missing data, we recorded data on attrition and differential attrition for each outcome and each endpoint. As discussed below, we tracked the reporting of outcomes and assessed evidence of reporting bias and publication bias.

When published analyses systematically excluded data on subsamples (sites or cases) with poor outcomes, we conducted best case/worst case (BC/WC) scenario analysis to calculate the range within which a reported effect size must lie. For dichotomous outcomes, this involves calculating a lower bound, which assumes that all missing MST cases had negative outcomes and all missing control cases had positive outcomes (worst case), and an upper bound, in which all missing MST cases had positive outcomes and all missing control cases had negative outcomes (best case).

When important details of analyses (e.g., valid ns, SDs) were not available from authors, we estimated missing data using methods described in Appendix C. We used Cochrane's Finding_SDs.xls to calculate missing standard deviations (training.cochrane.org/resource/revman‐calculator).


4.3.7 | Assessment of heterogeneity

Heterogeneity was evaluated with I², the χ² test of heterogeneity, and visual examination of overlap between CIs in forest plots.


4.3.8 | Assessment of reporting biases

We extracted data from all available study reports (including protocols, when available), and tracked the reporting and nonreporting of data on specific outcomes and endpoints. We identified full reporting, partial reporting, and missing data on specific outcomes in an outcome matrix (following Dwan 2010) and we documented missing data on endpoints in a separate table. When the number of studies (k)
in an analysis was > 10, we examined funnel plots for evidence of publication bias and small sample bias.


4.3.9 | Data synthesis

We used pairwise meta‐analysis to synthesise data from multiple studies on comparable outcome measures at similar points in time. We used correlated effects (CE) meta‐analysis models to synthesise data on all available outcomes within nine conceptually distinct outcome domains: out‐of‐home placements, arrest or conviction, self‐reported delinquency, substance use, peer relations, youth behaviour and symptoms, parent behaviour and symptoms, family functioning, and school outcomes.

Given substantial differences between studies in participant characteristics, treatment implementation, comparison conditions, and research methods, we did not expect all studies to produce estimates of the same population parameters. For this reason, we used random effects models whenever possible (i.e., in pairwise meta‐analysis and in CE models with more than five studies).

Pairwise meta‐analysis
In pairwise meta‐analysis, each study (or independent sample) contributed no more than one effect size, so that meta‐analysis was based upon a set of independent estimates. Each study‐level effect size was based on data from a unique pair: a treatment group and a control group.

We used RevMan Web, the latest version of the Cochrane Collaboration's meta‐analysis software, to conduct pairwise meta‐analysis. Separate meta‐analyses were conducted for continuous and dichotomous outcomes, using SMDs for continuous outcomes and ORs for dichotomous outcomes.

We conducted separate analyses for different endpoints, by collapsing endpoints into the following categories: 1 year (9–18 months post random assignment), 2.5 years (19–40 months), and 4 years (41–60 months). When a study provided data on the same outcome at multiple endpoints within one of these categories (e.g., at 24 and 36 months), we selected the endpoint with the largest valid n for inclusion in forest plots. If valid ns were identical at two or more endpoints within an interval, we selected the endpoint with the longest observation (e.g., 36 months rather than 24 months).

When a primary study provided multiple measures of the same outcome (e.g., parent and youth reports on family cohesion) at the same point in time, we selected the most direct source for pairwise meta‐analyses. In forest plots, we displayed youth reports on youth behaviours and parent reports on outcomes related to parent and family functioning.

Inverse variance methods were used to pool SMDs, so that each effect size was weighted by the inverse of its variance in an overall estimate of effect size. Mantel‐Haenszel methods were used to combine binary outcome data (odds ratios) across studies. CIs of 95% were used for individual study data and for pooled estimates. Results are displayed in forest plots.

Correlated effects models
Included studies reported multiple dependent outcomes, including multiple measures of the same construct, measures from different data sources, and repeated measures from the same participants over time. Several strategies have been used by others to include multiple dependent measures in meta‐analysis. As discussed by Pustejovsky and Tipton 2020, commonly used hierarchical or multilevel models (e.g., the models used by van der Stouwe 2014 and others) assume that effect sizes are independent within studies, an assumption that fails to hold up in our dataset, given that all outcomes are measured on the same participants within studies. In a correlated hierarchical effects (CHE) model, effect sizes are nested within studies and the model accounts for the assumption that these nested effect sizes are correlated. A large imbalance in the number of outcomes reported by different studies in our review precluded our use of the CHE model. Thus, we used the correlated effects (CE) model, described by Pustejovsky and Tipton 2020, which assumes that there are dependencies among effect sizes within studies, includes corrections for small sample bias, and produces robust variance estimates (RVE). This approach provides “valid point estimates, standard errors, and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown” (Fisher & Tipton 2015, p. 1; also see Hedges 2010, Tanner‐Smith 2014, Tanner‐Smith 2016).

Studies reported similar outcomes in different ways (e.g., some reported days of school attendance, others reported days absent from school), so before conducting CE analysis, we reverse‐scored outcomes so that

• Negative scores always represent beneficial outcomes of MST on (reductions in) out‐of‐home placements, arrests/convictions, delinquency, substance abuse, youth behaviour problems and symptoms, and parent behaviour and symptoms; and
• Positive scores always represent beneficial outcomes of MST on peer relations, family functioning, and school outcomes.

After eliminating duplicate reports, we used all available data on our primary and secondary outcomes in the CE models, including multiple measures of the same outcome at different points in time.

We assumed there was a correlation of 0.8 for effect sizes measured within the same study, but we tested this assumption with sensitivity analysis, assessing results for ρ = 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0. Results showed that different values of ρ produced consistent estimates of mean ES coefficients, standard errors, and τ² (all of these estimates were consistent within ± 0.005).

We estimated effect size models (both the mean effect size model and any moderator models) using the R programmes metafor and robumeta. The variance component for the random effect size model was estimated in robumeta using REML. When there were more than two and fewer than five studies reporting on an outcome in these analyses, we used a fixed effect model in metafor to compute the mean effect size.

We computed separate CE estimates for dichotomous and continuous variables. For dichotomous outcomes, our synthesis was
conducted using the log odds ratio (LOR), and we converted results back to ORs for ease of interpretation. Then, to increase statistical power, we converted odds ratios to SMDs and produced CE models with all available outcomes in the analysis.

When there are fewer than four degrees of freedom, results of CE models are unreliable.

Where possible, we provide 95% prediction intervals (PIs) as well as 95% CIs around point estimates of main effects. PIs (ES ± 1.96 × √τ²) show the range of values within which results of future studies are likely to fall.

The R code for our CE analysis is provided in Appendices D–G. Raw data are provided in an online supplementary file in the Supporting Information section. Study IDs are deciphered in Appendix H.


4.3.10 | Subgroup analysis and investigation of heterogeneity

We examined several characteristics of studies, samples, programmes, and outcomes as possible moderators of treatment effects. In forest plots, we grouped studies using two categorical moderators (described below), examined results within and across subgroups of studies, and used χ² tests for differences between subgroups.

We assessed eleven potential moderators (discussed below) using CE models. In our dataset, meta‐regression models with multiple moderator variables produced statistical tests with fewer than four degrees of freedom; these results were not reliable. Given this constraint, along with an imbalance in the number of effect sizes reported across studies, concerns about statistical power, and confounded moderators, we used single‐variable CE models to explore heterogeneity. That is, our exploratory analyses of potential effects of moderators use a separate CE model for each moderator and outcome.

Study characteristics
We conducted several analyses to see whether and how treatment effects related to geographic location (USA vs. other countries), investigator independence, and several risk of bias (ROB) categories. We expected to find larger effects in studies conducted in the USA (where usual services are relatively scant, compared with services provided in other high‐income countries), in studies conducted by MST program developers, and in studies with relatively higher ROB. As described below, these study characteristics are confounded.

Developer‐involved studies were defined as studies with reports that were co‐authored by one or more of the founders of MST (Scott Henggeler, Charles Borduin, Sonja Schoenwald, or Melissa Rowland). These co‐founders are current or former stakeholders in MST Services LLC or MST Associates LLC. Studies co‐authored by other investigators were classified as independent.

To deal with confounded moderators, we identified subgroups of studies, based on their country and independence. In forest plots, we show results for each subgroup and display all ROB ratings next to each study. In CE analysis, we tested for moderator effects of (a) USA versus non‐USA location, (b) developer‐involvement versus independent teams, and (c) ROB ratings.

For the CE moderator analysis, we selected the ROB categories in which there were substantial variations between studies: sequence generation, baseline equivalence, performance bias, intent‐to‐treat (ITT) analysis, and selective reporting. We assessed these moderators individually, rather than creating a composite ROB score (or a methodological quality score), following Jüni 2001.

Program and sample characteristics
We had planned to conduct subgroup and moderator analyses to estimate effects of specialised MST programmes (MST‐CAN, MST‐JDC, MST‐PSB, and MST‐psychiatric), but there were too few studies in each of these categories for meaningful analysis. We had also planned to conduct separate analyses of groups of studies, based on whether study participants were primarily involved with juvenile justice, mental health, or child welfare systems, but these distinctions did not hold up well (e.g., youth with problem sexual behaviour were often involved in multiple service systems). In some studies, referrals came from multiple service systems, so it was not possible to create discrete service system contrasts.

Outcomes
We planned to conduct separate assessments of effects on different types of out‐of‐home placements, because these outcomes could be defined differently for different populations (e.g., incarceration of juvenile offenders, hospitalisation of youth with psychiatric disorders, community placements for youth with disruptive behaviour). However, most studies used composite measures that included out‐of‐home placements of multiple types. Even studies of juvenile offenders included data on hospitalisation and residential treatment in their measures of out‐of‐home placements (e.g., Henggeler 1999a, Henggeler 2006). Thus, we did not explore effects on different types of out‐of‐home placements.

Relevant psychosocial outcomes (e.g., symptoms, behaviours, peer relations, family functioning) were defined in similar ways across studies and populations, although studies used a wide array of outcome measures. We pooled results across different measures of the same construct in pairwise meta‐analysis.

We conducted separate CE analyses for different outcome domains. Within these categories, we pooled results across all studies that provided data on relevant outcomes. This was done to test claims that positive effects of MST “have been replicated across youths with different types of problems” and “across problems, therapists, and settings” (Kazdin 1998, pp. 27–28; also see Kazdin 2015, p. 150) and is consistent with previous MST meta‐analyses that combined outcomes across populations and comparison conditions (e.g., Curtis 2004).

We assessed differences in outcomes that relied on data from different sources (e.g., administrative data versus parental or other reports on out‐of‐home placements; youth reports versus other sources of data on youth symptoms and behaviours; parent reports versus other sources of information on parent and family outcomes).

We expected results to vary based on the timing of outcome measurement, and thought that initial treatment effects might diminish over time. Thus, we assessed the timing of outcome
18911803, 2021, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/cl2.1158 by Readcube (Labtiva Inc.), Wiley Online Library on [28/03/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
LITTELL ET AL. | 19 of 192

measurement (defined as months elapsed since random assignment)


as a potential moderator of effects.
We assessed effects of overall attrition and differential attrition
on results, using outcome‐level measures of these two variables.
Attrition is defined as the proportion of missing data (1 − [valid n/
total N]) for a given outcome at a given endpoint. Differential attri-
tion is the difference between MST and control groups in the pro-
portion of missing data ([valid n1/total N1] − [valid n2/total N2]) for a
given outcome at a given endpoint.
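
The two attrition measures defined above can be illustrated with a short calculation. The sketch below is not the review's own code: the function names and the case counts are hypothetical, and overall attrition is assumed to pool valid and total cases across both groups. Note that the bracketed expression in the text is a difference in proportions of valid data, which has the same magnitude as the difference in proportions of missing data but the opposite sign.

```python
def attrition(valid_n, total_n):
    """Proportion of missing data for one outcome at one endpoint: 1 - (valid n / total N)."""
    if total_n <= 0 or not 0 <= valid_n <= total_n:
        raise ValueError("need 0 <= valid_n <= total_n and total_n > 0")
    return 1 - valid_n / total_n


def differential_attrition(valid_n1, total_n1, valid_n2, total_n2):
    """Bracketed expression from the text: [valid n1/total N1] - [valid n2/total N2].

    Group 1 is MST and group 2 is the control group.
    """
    return valid_n1 / total_n1 - valid_n2 / total_n2


# Hypothetical example: 100 cases assigned to each group,
# 80 MST cases and 70 control cases with valid outcome data.
overall = attrition(valid_n=80 + 70, total_n=100 + 100)
differential = differential_attrition(80, 100, 70, 100)
print(f"overall attrition = {overall:.2f}, differential attrition = {differential:.2f}")
```

In this hypothetical case a quarter of the data are missing overall, and the MST group has 10 percentage points more valid data than the control group.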

4.3.11 | Sensitivity analysis

As described above, we conducted best case/worst case scenario
analyses to estimate the range of effects that might have been ob-
tained in studies with data that were missing not at random (MNAR).
Also mentioned above, we assessed the sensitivity of CE models
to various assumptions about the size of the correlations between
effect sizes within studies.
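
For readers unfamiliar with this kind of scenario analysis, the following sketch shows one common way to bound a risk difference for a dichotomous, undesirable outcome (such as out‐of‐home placement) when some cases are missing. It is an illustration under stated assumptions, not the procedure used in this review: the function name and all counts are hypothetical, and the review's own rules are described in its Methods section.

```python
def risk_difference_bounds(events_t, valid_t, total_t, events_c, valid_c, total_c):
    """Bounds on the risk difference (treatment minus control) for an adverse event.

    Best case for treatment: no missing treatment case had the event and
    every missing control case did. Worst case: the reverse.
    """
    missing_t = total_t - valid_t
    missing_c = total_c - valid_c
    best = events_t / total_t - (events_c + missing_c) / total_c
    worst = (events_t + missing_t) / total_t - events_c / total_c
    return best, worst


# Hypothetical trial: 50 cases per arm, 40 with valid data in each arm,
# 10 events observed in the treatment arm and 20 in the control arm.
best, worst = risk_difference_bounds(10, 40, 50, 20, 40, 50)
print(f"risk difference ranges from {best:.2f} (best case) to {worst:.2f} (worst case)")
```

The width of the interval between the two scenarios grows with the amount of missing data, which is why such bounds are most informative when attrition is low.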

4.3.12 | Summary of findings and assessment of the certainty of the evidence

We used the GRADE guidelines (gdt.gradepro.org) to assess the
certainty of evidence regarding seven primary outcomes.

5 | RESULTS

5.1 | Description of studies

Studies were identified using the search methods described above.
Results of the search and characteristics of included and excluded
studies are detailed below.

5.1.1 | Results of the search

Electronic database searches produced a total of 3784 records (411
records were identified in 2003, 1103 in 2010, and 2270 in 2020).
Internet searches conducted in 2003 produced 4662 hits (including
results of a Google search); more sensitive internet searches (using
Google Scholar instead of Google) were conducted in 2010 and 2020
and these yielded 218 and 220 hits respectively. Reference har-
vesting yielded 350 citations. Personal contacts helped us identify 55
documents. Many bibliographic records appeared in multiple data-
bases and also in other sources. After duplicates were eliminated, we
had 1808 unique citations (see Figure 1). Six of these could not be
screened because we did not have access to abstracts or full‐text.

FIGURE 1 PRISMA flow diagram
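
The record counts reported in this section can be checked with simple arithmetic. The snippet below is only a bookkeeping sketch; the variable names are ours, and the numbers are taken from the text.

```python
# Electronic database records by search year, as reported in Section 5.1.1.
database_records = {2003: 411, 2010: 1103, 2020: 2270}
total_database_records = sum(database_records.values())

# Unique citations after de-duplication, and those that could not be screened.
unique_citations = 1808
not_screenable = 6
screened = unique_citations - not_screenable

print(f"database records: {total_database_records}")
print(f"titles and abstracts screened: {screened}")
```

Both results agree with the figures given in the text (3784 database records and 1,802 titles and abstracts screened).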
We screened 1,802 titles and abstracts, ruling out: unrelated topics and programmes (n = 739 records); studies of MST for medical conditions (n = 37); research reviews (n = 417); descriptive, correlational, single group, and case studies (n = 143); theory or position papers, editorials, and book reviews (n = 203); and practice guidelines and treatment manuals (n = 25).
We retrieved 723 full‐text reports, including 353 reviews. Forty‐two of these retrievals required inter‐library loans.
We identified 104 distinct MST outcome studies. Many studies had multiple reports, with an average of 2.77 reports per study.
We contacted 22 experts in an attempt to obtain missing data on 16 studies. Some experts were contacted in relation to multiple studies. Thirteen experts (59%) responded. From these contacts we learned that two studies did not meet our eligibility criteria (Hurley 2004, Sheidow 2003), results of one study were not available (Schoenwald 2004), and we received additional information on three studies (Asscher 2013, Butler 2011, Glisson 2010).

5.1.2 | Included studies

Twenty‐three studies met the inclusion criteria for this review. These studies involved 3987 participant families who were randomly assigned to eligible treatment and comparison conditions.
Included studies had at least one and up to 20 reports per study, with an average of 7.17 reports per study (SD = 3.01). Of the 165 reports associated with these 23 studies, 104 reports were published (e.g., journal articles, book chapters) and 61 were unpublished (e.g., conference presentations, dissertations, theses, government or foundation reports). As shown in Table 2, most studies produced both published and unpublished reports. One report was written in Swedish and the rest were written in English (we used Google translation software to read the Swedish report).
Characteristics of included studies are described below.

Study settings
Included studies were conducted between 1983 and 2020 in six countries. Sixteen studies were conducted in the USA, three in the UK, and one each in Canada, the Netherlands, Norway, and Sweden. By 1990, three studies had been launched in the USA by MST program developers; by 2000, there were four new trials, three led by independent teams, including one outside of the USA (in Canada); eight trials began between 2000 and 2009 (three were non‐USA, independent studies); and two studies began in 2010 or later.
Seven studies were conducted in multiple sites, including two sites in a South Carolina study (Henggeler 1997), two sites in the Netherlands (Asscher 2013), four in Ontario (Leschied 2002), four in Norway (Ogden 2004), six in Sweden (Sundell 2006), nine in the UK START trial (Fonagy 2018), and 14 sites (counties) in rural Tennessee (Glisson 2010). Site‐specific results were reported for some outcomes in the Leschied 2002 study. The Fonagy 2018 trial and Glisson 2010 study took the nested, multisite structure into account in their data analyses, but to our knowledge other multisite trials did not do this.
At least half of the studies covered a mixture of urban, rural, and suburban settings; two were conducted in rural settings; and four trials took place in urban settings.

Study methods
All studies used some form of random assignment to treatment and control groups, although the details of these procedures were not always clear. Ten studies used simple random assignment; five multisite studies randomised cases within sites; three trials used other stratifying variables (such as gender, age, presence or onset of conduct problems, and/or difference in age between juvenile offenders and victims), and three studies used yoked pairs.
In the “yoked” studies, cases that were randomly assigned to MST were paired with cases randomly assigned to usual services, based on timing of their entry into the study. For example, in one study, “eligible youths were referred…in yoked pairs, with one youth randomly selected…to receive MST and the other to receive the usual services” (Henggeler 1992a, p. 954). Since there was no treatment completion date for usual services cases, “post‐treatment” assessments for both cases were conducted after MST services ended in the MST case. If one member of the pair was lost to follow‐up, the other case was usually retained in the study.
Two studies used factorial designs to assess main effects and interaction effects of two different interventions: One multi‐armed study (Henggeler 2006) compared four interventions for juvenile offenders with substance abuse problems; these interventions were: (1) family court, (2) drug court, (3) drug court plus MST, and (4) drug court plus MST plus CM. For purposes of our review, we limited our analysis to the second and third arms, to assess the impact of MST for cases involved in drug court.
Another study (Glisson 2010) used a factorial design to investigate effects of two different interventions and their interactions. One of these interventions (termed ARC for Availability, Responsiveness, and Continuity) was implemented at the community level, and communities were randomly assigned to ARC or non‐ARC conditions. Within the communities in both conditions, families were randomly assigned to MST and control groups. For purposes of this review, we focused only on comparisons between MST and control cases within communities.

Sample characteristics
Of 13 studies of effects of MST for juvenile offenders, four focused on sex offenders, two included offenders with substance abuse problems, and seven focused on juvenile offenders in general. Two studies assessed effects of MST for youth with serious mental health problems (such as suicidal ideation), while six studies included youth with a wide range of behavioural and mental health issues, such as aggression, rule breaking, other antisocial behaviour, serious academic difficulty, or dysfunctional relationships. One study explored effects of MST for youth with autism spectrum disorder (ASD) and another examined effects in families with child abuse or neglect.
Fourteen studies included cases referred by juvenile justice authorities, one received referrals from mental health sources, two had
TABLE 2 Summary of characteristics of included studies

Variable / Value                              Number of      Percent of all (23)
                                              studies (k)    included studies (%)

Publication status
  Unpublished reports only                    1              4
  Published reports only                      5              22
  Both                                        17             74

Year enrolment began
  1983–1989                                   3              13
  1990–1999                                   4              17
  2000–2009                                   8              35
  2010–2014                                   2              9
  Missing                                     6              26

Country
  Canada                                      1              4
  Netherlands                                 1              4
  Norway                                      1              4
  Sweden                                      1              4
  UK                                          3              13
  USA                                         16             70

Location type
  Urban                                       4              17
  Suburban                                    0              0
  Rural                                       2              9
  Mixed                                       12             52
  Unclear                                     5              22

Random assignment method
  Simple/systematic                           10             43
  Blocked (by site)                           5              22
  Stratified (other variables)                3              13
  Yoked pairs                                 3              13
  Unclear                                     2              9

Sample type: presenting problems
  Juvenile offenders (general)                7              30
  Sex offenders                               4              17
  Substance abuse                             2              9
  Serious mental health (MH) problems         2              9
  Behaviour/MH problems                       6              26
  Autism Spectrum Disorder                    1              4
  Child maltreatment                          1              4

Service sector (referral source)
  Juvenile justice                            14             61
  Mental health                               1              4
  Child welfare                               2              9
  Multiple sectors                            6              26

Sample size (number of cases assigned to groups) (mean = 173, SD = 181, min = 15, max = 684)
  <100                                        9              39
  101–150                                     4              17
  151–200                                     5              22
  201–250                                     2              9
  251–500                                     1              4
  501–700                                     2              9

Mean age of focal youth, in years
  <14                                         5              22
  14 to <15                                   9              39
  15 to <16                                   7              30
  16                                          1              4
  Missing                                     1              4

Gender of focal youth: % male
  <50%                                        1              4
  50%–64%                                     4              17
  65%–79%                                     8              35
  80%–94%                                     7              30
  95%–100%                                    3              13

Racial composition: % White
  <35                                         7              30
  35–49                                       4              17
  50–64                                       3              13
  65–79                                       5              22
  80–94                                       1              4
  95–100                                      1              4
  Missing                                     2              9

Racial composition: % Black
  <35                                         8              35
  35–49                                       3              13
  50–64                                       4              17
  65–79                                       3              13
  80–94                                       1              4
  Missing                                     4              17

MST program type
  MST original                                14             61
  MST‐PSB                                     3              13
  MST‐CAN                                     1              4
  MST‐substance                               2              9
  MST‐psychiatric                             2              9
  MST‐ASD                                     1              4

MST: duration (mean number of days of service)
  <120 days                                   3              13
  120–139                                     4              17
  140–159                                     4              17
  160–179                                     3              13
  >200                                        3              13
  Missing                                     6              26

MST: amount (mean number of hours of direct service)
  21                                          1              4
  30–39                                       3              13
  40                                          1              4
  66                                          1              4
  88                                          1              4
  92                                          1              4
  Missing                                     15             65

Comparison conditions
  TAU                                         17             74
  TAU = placement/hospital                    2              9
  TAU = drug court                            1              4
  Individual therapy                          2              9
  Enhanced TAU                                1              4

Abbreviation: TAU, treatment as usual.
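
The percentage column in Table 2 expresses each count as a share of the 23 included studies, rounded to a whole number. A minimal sketch of that calculation, using the country counts from the table, is given below; the function and variable names are ours and are not part of the review.

```python
N_STUDIES = 23  # number of included studies


def percent_of_studies(k, n=N_STUDIES):
    """Percentage of included studies, rounded to the nearest whole number."""
    return round(100 * k / n)


# Number of studies (k) per country, as listed in Table 2.
country_counts = {"Canada": 1, "Netherlands": 1, "Norway": 1,
                  "Sweden": 1, "UK": 3, "USA": 16}
country_percents = {country: percent_of_studies(k)
                    for country, k in country_counts.items()}

for country, pct in country_percents.items():
    print(f"{country}: {country_counts[country]} of {N_STUDIES} studies ({pct}%)")
```

For example, 3 of 23 studies corresponds to about 13%, and 16 of 23 to about 70%. Because each entry is rounded separately, the percentages within a variable need not sum to exactly 100.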

cases involved in the child welfare sector, and six studies included families referred from multiple service sectors.

Sample size. Several studies had inconsistent reports on the size and demographic characteristics of their samples. It was not always clear how many cases were randomly assigned to groups. These problems, which were most pronounced in several early studies, are discussed below. Study‐level details are also provided in the Characteristics of included studies.
Families were enrolled into the Borduin 1995 study from 1983 to 1986 (Mann et al., 1990, p. 337). In 1990, four years after enrolment ended, investigators wrote, “A total of 210 families of juvenile offenders agreed to participate in the assessment and treatment components of the study. Following the initial assessment session, each family was randomly assigned to either multisystemic therapy or the alternative treatment group. Approximately 84% (n = 88) of the families in multisystemic therapy and 65% (n = 68) of the families assigned to alternative therapy completed treatment” (Borduin 1990a, p. 76). These treatment completion rates indicate that there were 105 families in each group (84% of 105 = 88, 65% of 105 = 68; total n = 210). Describing the same study a year later, investigators wrote, “Following a pretreatment assessment session, adolescent offenders were randomly assigned to either MST (n = 100) or IC (n = 100)… Twenty‐four (12%) of the families subsequently refused to enter treatment” (Henggeler 1991, p. 45). A third report on the same study indicated that 200 families were randomly assigned in this study, with 92 assigned to MST, 84 to individual therapy, and 24 refusing to participate in treatment (Henggeler 1996, p. 52). A fourth report states that, after 24 families refused to enter treatment, “the remaining 176 families were randomly assigned” (Borduin 1995a, p. 570). Subsequent reports describe this study as a randomised trial with n = 176 cases, with no mention of 34 missing cases (e.g., Schaeffer & Borduin 2005, Sawyer & Borduin 2011, MST Services 2020a).
An early report on the Henggeler 1992 study indicated that 96 juvenile offenders were referred to the project and 12 were excluded from the study for various reasons, including the fact that random assignment was violated by court orders in two cases, MST was never implemented in six cases (families moved or refused to participate), the youth did not have a felony arrest, or recidivism data were not available on the state's computerised system (Henggeler 1992a, p. 954). Thus, at least two and perhaps as many as twelve cases were excluded after random assignment. However, subsequent reports described this study as a randomised experiment with n = 84 cases, with no mention of excluded cases (Henggeler 1993, pp. 286–287; Henggeler 1996, p. 50; MST Services 2020a).
Initially, the Ogden 2004 study included 100 families who were randomly assigned to MST or usual services in four sites. One site was replaced with another site; four families dropped out of MST and were replaced with four new MST cases (it is not clear how new cases were selected).
Early reports on the Timmons‐Mitchell 2006 study indicated that 163 families were randomly assigned (82 to MST), and data were available on 106 families approximately 12 months after referral (Timmons‐Mitchell et al., 2003b). However, the published report on this study indicates that the initial sample included only 105 participating families (64% of the earlier number), with 93 who completed treatment, and 12 (11%) who dropped out (Timmons‐Mitchell et al., 2006).
On average, included studies had 173 cases assigned to groups (SD = 181, min = 15, max = 684). Nine studies had fewer than 100 cases and two had more than 500 (Table 2).

Age, gender, and ethnicity. The average age of focal youth ranged from 13.4 to 16 years. Study samples were predominantly male (44% to 100%); only one study (Swenson 2010) had fewer males than females (this study included child maltreatment cases). The racial and ethnic composition of the samples varied considerably, with White youth comprising 10% to 95% of the samples, and Black youth comprising 7% to 81% (Table 2).

Intervention characteristics
All studies included licensed MST programmes. One tenet of MST is that interventions are tailored to family needs; thus, the nature of
MST interventions varied both within and between studies. For example, in the Borduin 1995 study, 83% of MST cases received family therapy, 60% received school intervention, 57% had peer intervention, 28% received individual therapy, and 26% had marital therapy.
Fourteen studies assessed effects of the original MST program and nine assessed adaptations of MST for specific issues: problem sexual behaviour (PSB, three studies), child abuse and neglect (CAN, one study), substance abuse (two studies), psychiatric problems (two studies) and autism spectrum disorder (ASD, one study). We planned to assess these adaptations separately, but the number of studies was insufficient for this purpose.
Seventeen studies reported information on the duration of MST services. Sample means ranged from approximately 3 months (94 days) to 7.6 months (231 days). Seven studies reported information on variations in duration of MST services; in six studies, the difference between the longest and shortest cases was more than 200 days.
Only eight studies reported data on amounts of direct contact between family members and MST therapists. Reports ranged from an average of 21 hours per case (in the Borduin 1995 study of sex offenders) to an average of 92 hours per case (in the Henggeler 1999b study of youth with psychiatric emergencies).

Comparison conditions
Most (20) trials compared MST with treatment as usual (TAU); that is, services routinely available for these youth in their communities. Two studies of juvenile sex offenders compared MST with individual therapy (Borduin 1990, Borduin 1995). One study of cases of physical child abuse compared MST with “Enhanced TAU”, which consisted of usual services plus outpatient, day, and residential treatment for youth and a group training program (STEP‐TEEN) for parents (Swenson 2010).
Very little information was available on services provided to youth and families in control groups. Only five studies (four in the USA and one non‐USA study) reported information on the duration of services provided to families in the control group (averages ranged from 187 to 380 days) and only three studies (all in the USA) reported data on amounts of direct contact with family members (averages of 23 to 76 hours).
Even so, it was apparent that there was considerable variation across studies in terms of the nature of “usual services” (TAU) provided to youth and families in control groups. These variations reflected studies' reliance on different referral sources (e.g., referrals from mental health, juvenile justice, and/or child welfare sources), different participant characteristics, and variations in the nature of services available in different geographic locations. In one study, the TAU for youth with psychiatric emergencies was psychiatric hospitalisation (Henggeler 1999b). For serious juvenile offenders in Delaware, TAU was placement in a secure residential facility (Miller 1998). In a study of youth involved in drug court programmes, the TAU was drug court (Henggeler 2006). The Ogden 2004 study compared MST to usual services in the child welfare system (placement, in‐home supervision, etc.). UK studies that included participants from multiple referral sources (e.g., Fonagy 2018) had multicomponent TAU. Most of the studies conducted in the USA had control groups that received scant services. In some USA studies, usual services consisted of monthly visits with a probation officer. In others, “few substantive services were delivered because of the passive nature of traditional mental health services combined with family difficulties and resistance” (Henggeler 1992a, p. 955). More robust TAU conditions were provided to control groups in non‐USA countries.

Outcome measures
Out‐of‐home placements included incarceration (jail or detention in a secure facility) for juvenile offenders, hospitalisation for youth with serious psychiatric problems, residential treatment for mental health and substance abuse issues, and community foster care. Most studies provided composite measures of out‐of‐home placements, including placements of all types. Some provided measures of specific types of placements (e.g., incarceration, foster care), as well as composite measures.
Most studies used administrative archival data to assess out‐of‐home placements, but some gathered data on the types and duration of out‐of‐home placements from caregivers' reports (Henggeler 1999b, Ogden 2004). In one study, caregiver reports of youth hospitalisation were confirmed with hospital records (Henggeler 1999b).
Several studies reported data on placements that occurred within discrete time intervals (e.g., from 0 to 6 months, > 6 to 12 months, > 12 to 18 months, etc.). These counts are not comparable to data from other studies that recorded the cumulative number or percentage of cases that experienced placement by a certain endpoint (0 to 6, 0 to 12, or 0 to 18 months). As described in the Methods section, we computed longer, comparable intervals when possible.
Outcome measures included archival data (police and court records) on arrests and/or convictions for criminal offences in studies of juvenile offenders in the USA, UK, Canada, Sweden, and the Netherlands. These outcomes were not assessed in Norway, where youth under 15 are not arrested and those under 18 are rarely prosecuted (Ogden 2004).
Delinquency was usually assessed with youth reports on the Self‐Reported Delinquency scale (SRD; Elliott 1983).
Self‐reported frequency of substance use was assessed with measures such as the Personal Experience Inventory (PEI; Winters 1989), subscales from the Self‐Reported Delinquency scale (SRD; Elliott 1983), and items from the Adult Behaviour Checklist (ABC). Two studies used biologic measures of drug use (via urinalysis; Henggeler 1999a, Henggeler 2006). Another study obtained reports on youth substance use from youth and parents (Fonagy 2018).
Peer relations were assessed with the Missouri Peer Relations Inventory (MPRI; Borduin 1989), CBCL social competence and social problems scales, Social Competence with Peers Questionnaire (SCPQ), and similar measures.
Youth behaviour and symptoms were assessed via youth and parent reports (and sometimes teachers' or other observers' reports) on standardised measures. Psychiatric symptoms were assessed with measures such as the Global Severity Index (GSI) of the Brief Symptom Inventory (BSI; Derogatis 1993), the SCL‐90‐R (Derogatis 1983),
Revised Problem Behaviour Checklist (RPBC; Quay 1987) and the Child Behaviour Checklist (CBCL; Achenbach 1991). Some studies reported CBCL internalizing and externalizing scales separately, while others combined them and/or reported more specific subscales.
Measures of parenting behaviour and symptoms included some of the same measures used to assess youth symptoms (e.g., GSI) and the Adult Behaviour Checklist (ABC). Parental supervision was assessed with the parent version of the Monitoring Index (Patterson 1985), the Alabama Parenting Questionnaire (APQ), and similar scales. Other parenting measures included the Parental Authority Questionnaire (PAQ) and the Loeber Caregiver Survey.
Family functioning was assessed with the Family Adaptability and Cohesion Evaluation Scales (FACES‐II, Olson 1982; FACES‐III, Olson 1985; or FACES‐IV) and the Family Assessment Measure (FAM‐III, Skinner 1983).
School outcomes included grades, attendance, absenteeism, suspension, and exclusion. Studies collected data on school outcomes from youth, parents, and administrative records. In one study, caregiver reports on young people's school attendance were confirmed with school records (Henggeler 1999b).
A glossary of measures used in MST trials is provided in Appendix I. These measures were sometimes adapted to fit the sample (e.g., translated into Dutch, Norwegian, or Swedish).

Timing of outcome observations
As shown in Figure 2, the 23 studies in our review collected data at multiple endpoints. Most (21) studies collected data at baseline and immediately after intervention ended (4–8 months after random assignment). Sixteen studies collected data at approximately 1 year (9–17 months), 16 collected data at about 2.5 years (18–40 months), 6 collected

KEY
Full report: all outcomes, all cases
Missing data provided by authors
Partial report: insufficient data for ES calculations on some outcomes
Partial report: missing outcomes or subgroups
No public report, data not available from authors

Study                   Months post baseline (Pre, During, Post, Follow-up)
Asscher 2013            0, 6, 12, 24
Borduin 1990            0, 37
Borduin 1995            0, 6, 53, 170, 269
Borduin 2009            0, 7, 107
Butler 2010             0, 6, 12, 18, 24, 30, 42
Fonagy 2017             0, 8, 14, 20
Fonagy 2018             0, 6, 12, 18, 24, 36, 48, 60
Glisson 2010            0, 6, 12, 18
Henggeler 1992          0, 6, 14, 28
Henggeler 1997          0, 4, 12
Henggeler 1999a^a       0, 6, 12, 18, 48
Henggeler 1999b         0, 2, 4, 10, 16, 22, 30
Henggeler 2006^b        0, 4, 12, 18, 24, 36, 48, 60
Leschied 2002           0, 6, 12, 18, 30, 42
Letourneau 2009^c       0, 6, 12, 18, 24, 122
Miller 1998             12, 21
Ogden 2004              0, 6, 24
Rowland 2005            0, 6
Sundell 2006            0, 7, 24, 60
Swenson 2010^d          0, 2, 4, 10, 16
Timmons-Mitchell        0, 6, 12, 24
Wagner 2019             0, 6, 12
Weiss 2013              0, 3, 6, 18, 30

a Henggeler et al. 2002.
b Data reported for 70 siblings only; no information on main effects of treatment.
c Data for 18- and 24-month follow-ups were combined.
d Effect sizes provided for statistically significant results only.

FIGURE 2 Data collection time points: months post baseline
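
The Fisher exact test reported for Table 3 (investigator independence by location) can be reproduced from the four cell counts: 13 developer and 3 independent studies in the USA, and 0 developer and 7 independent studies elsewhere. The sketch below is our own illustration, using only the Python standard library, and is not the software used in the review; it assumes the usual two‐sided definition, which sums the probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: USA, non-USA. Columns: MST developers, other investigators.
p_value = fisher_exact_two_sided(13, 3, 0, 7)
print(f"Fisher exact p = {p_value:.4f}")
```

With these counts the observed table is the most extreme one possible given the margins, so the p value is simply its own probability, roughly .0005, in line with the value reported in the note to Table 3.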



TABLE 3 Investigator independence by location

Location     MST developers     Other investigators     Totals
USA          13                 3                       16
Non‐USA      0                  7                       7
Totals       13                 10                      23

Note: Fisher exact test p = .0005.

data at approximately 4 years (41–60 months), two at 9–10 years, one at 14 years, and one at 22 years following random assignment.
Several studies did not use standardised observation periods in their data analysis; that is, enrolment into the study occurred over a period of months or years, but outcome data (e.g., from administrative records) were obtained at a fixed point in time. Some studies used survival analysis to account for varying observation periods or used event history data to calculate outcomes within common intervals; but others did not, leaving unadjusted variations in observation periods (e.g., for arrest rates reported in Borduin 1990, the mean length of the observation period is 37 months, but the range is 21 to 49 months). We classified these observations using the average number of months that had elapsed since random assignment, and we captured information on the range (width) of the observation period in months.
Only five studies (22%) provided full reports on all outcomes at all endpoints (post‐treatment and follow‐up periods), and four studies (17%) provided no public data on main effects of treatment at one or more endpoints (see Figure 2). For example, Henggeler 2006 collected outcome data on 161 cases at six distinct follow‐up endpoints from one to five years after random assignment, but only reported main results for the first follow‐up at one year. In contrast, Fonagy 2018 collected data on 684 cases at seven points over a five‐year period, and provided full reports on all outcomes at all endpoints.

Number of outcomes reported
There was a large imbalance in the number of outcomes reported by MST trials. The number of effect sizes (ES) per trial ranged from 2 to 538. Seven studies (30%) reported fewer than 20 ES, 10 studies (43%) produced 21 to 40 ES, five (22%) had 41 to 60, and one had 538. The study with the largest number of ES (Fonagy 2018) is the only study that was prospectively registered and reported all pre‐determined outcomes at all endpoints.

Independence
Thirteen (57%) of the studies were conducted by MST program developers, and all of these studies were located in the USA. Ten studies were conducted by independent teams: one in Canada, one in the Netherlands, one in Norway, one in Sweden, three in the UK and three in the USA.

Confounded moderators
As shown in Table 3, investigator independence and study location were closely related (Fisher's exact p value = .0005). Further, USA/developer studies tended to be launched earlier and, as we shall see, had more serious risks of bias than other studies.

5.1.3 | Excluded studies

Of 104 studies identified, 80 were excluded because they did not meet eligibility criteria for this review. Specific reasons for exclusion
criteria shown in Appendix B, Level 2, we recorded one reason for exclusion per study, although there may have been multiple reasons. As shown in Figure 1, 10 studies were excluded because they were not focused on youth with social, emotional or behavioural problems (these studies involved families of youth with Type 1 diabetes, HIV/AIDS, obesity, and those with family members receiving methadone); 35 studies were excluded because they lacked comparison or control groups; 25 did not use random allocation to treatments; seven studies assessed “multisystemic” interventions that were not licensed MST programmes, or examined treatments that combined MST with other interventions; and three studies were excluded because they did not meet our age criterion (most participants were younger than 10 or older than 17 years of age).
One randomised controlled trial could not be assessed, because we did not have access to the study report (see Schoenwald 2004 in Characteristics of studies awaiting classification).
We had planned to include a randomised trial in Denmark, but this study was cancelled after all five MST teams declined to participate (Pontoppidan 2012).

5.2 | ROB in included studies

ROB assessments are documented in Characteristics of included studies. Results are summarised in Figures 3 and 4 and discussed below.

5.2.1 | Sequence generation

All studies indicated that a random component was used to allocate families to treatments. Some studies described this process in detail; others did not. Some approaches were more sophisticated and more fool‐proof than others (e.g., computer generated assignments at a remote location are more secure and pose lower risks of bias than coin tosses conducted on‐site).
It is not clear whether randomisation was applied to all cases in some studies. For example, in the Henggeler 1997 study, 146 cases were assigned to MST or usual services in 73 yoked pairs and nine cases were assigned to MST. The Ogden 2004 study assigned 62 families to MST and 38 to usual services, but later replaced four of the cases that were originally assigned to MST (T. Ogden, personal communication, 4 October 2003). It is not clear whether these additional cases were
are shown in Characteristics of excluded studies. Using the decision randomised.

FIGURE 3 Risk of bias graph

Two studies have high risks of bias on this indicator, 13 have unclear risks, and eight have low risks of bias (Figure 4).

5.2.2 | Allocation concealment

Methods of allocation concealment were not always fool‐proof. Coin tosses were used in some studies (Borduin 1995, Timmons‐Mitchell 2006), while sealed envelopes were used in others (Henggeler 1999b, Ogden 2004). Some studies noted when and where randomisation occurred, but did not describe the method of randomisation. These details raised concerns, for example, when random assignment took place in the family home with an MST therapist present (Leschied 2002).

Two studies have high risks of bias on this indicator, 15 have unclear risks, and six have low risks of bias.

5.2.3 | Baseline equivalence

As indicated above, we used the standardised mean difference (Cohen's d) to assess the magnitude of differences between groups at baseline, following WWC guidelines 2018. Baseline equivalence was rated as unclear for studies that reported insufficient data to compute Cohen's d. Note that our assessments of baseline equivalence differ from some assessments by investigators, who reported whether between‐group differences were statistically significant. Nonsignificant differences with d > 0.25 occurred in small samples and statistically significant differences with d < 0.25 appeared in very large samples; we treated the former as evidence of “real” differences (high ROB) and the latter as evidence of low ROB.

Available data on baseline equivalence often excluded cases that refused to participate after random assignment. Whether due to randomisation or early attrition, there was a lack of baseline equivalence in 15 studies (65%). In some studies, the MST group included substantially larger proportions of Whites and female youth (Borduin 1995), parents with higher levels of education, two‐parent families, and families with greater wealth or higher socio‐economic status (Asscher 2013) than their counterparts in the comparison group.

Unpublished reports on 176 families that remained in the Borduin 1995 study showed significant differences between the MST and control groups on race, gender and previous arrests. Schaeffer 2000 (p. 120) found a higher proportion of Whites (74% versus 69%, p < .05) and fewer males (62% versus 77%, p < .05) in the MST group (n = 92) compared with the control group (n = 84). The MST group had an average of 1.65 nonviolent arrests (SD = 1.09), compared with 0.98 (SD = 1.96) in the control group (p < .01). Further racial and gender differences appeared when comparing youth who completed MST (82% were White and 59% were male) and those who dropped out of MST (73% were male; Schaeffer 2000, p. 123). In a follow‐up study of caregivers of youth in this study, Johnides 2015 found that 84% of caregivers of MST youth were White, compared with 73% of caregivers in the control group (χ² = 3.94, df = 1, p = .047). In contrast, published reports in this study stated that there were no between‐group differences in demographic characteristics or criminal histories, but provided no data to support these statements (Henggeler et al., 1991, p. 45; Borduin et al., 1995, p. 570, 572; Klietz et al., 2010, p. 659; Sawyer & Borduin 2011, p. 644; Dopp et al., p. 314).

An early report on the Timmons‐Mitchell 2006 study indicated that there was a significant gender imbalance between groups, with 23 females in the MST group and 14 in the control group; authors noted that this was the result of random assignment, which does not

FIGURE 4 Risk of bias summary

always produce equivalent groups (Timmons‐Mitchell 2003b, p. 11). The final report on this study stated that there was no significant between‐group difference on gender, but did not provide data to confirm this (Timmons‐Mitchell 2006). Data on between‐group differences are presented in Table 4.

There were substantial differences in race and referral source at baseline in the Fonagy 2017 trial, but these differences were fully reported: There were more Black youth and fewer Whites in the MST group. Cases in the MST group were more likely to be referred from social care services and less likely to be referred by youth offender

services, compared with cases in the control group (Fonagy 2017, Table 5).

TABLE 5 Between‐group differences in the Fonagy 2017 study

                            MST (n)   MST (%)   TAU (n)   TAU (%)
Race
  White                        5        24        12        63
  Black                       13        62         4        21
  Other race                   3        14         3        16
Referral source
  Social care                 15        71         8        42
  Youth offender services      6        29        10        53
  Mental health (CAMHS)        0         0         1         5
Total (n)                     21                  19

Source: Fonagy 2017, pp. 21–22. For race, χ² = 7.57, p = .023; d = 0.53; for referral source, d = 0.65.

5.2.4 | Performance bias (confounding)

Ten studies had high risks of performance bias related to large imbalances in the amount of attention paid to participants and/or clinicians. In these studies, MST cases received substantially more contact and/or longer treatment, not just different treatment. MST therapists often received special attention, training, and supervision (sometimes by program evaluators) that was not available for clinicians who provided services to the control group. It can be argued that this extra or special treatment is part of the MST intervention, but then we cannot know whether any differences in outcomes are due to content (MST) or amounts of contact.

5.2.5 | Detection bias (blinding)

In MST trials, study participants and therapists could not have been blind to the type of treatment they received or provided. Our assessments focused on blinding of assessors.

As mentioned above, we examined risks of detection bias for two types of data: participant reports and administrative data. Participants' reports were gathered by researchers, who administered interviews or questionnaires, usually in families' homes. Administrative data were compiled by professionals in social service or government agencies. Hence, these types of data could have different risks related to blinding (and attrition) within studies.

Collection of archival data (e.g., from juvenile justice records) might be considered to be blind; however, law enforcement officials were not always blind to group assignment and their knowledge that a youth was receiving or had received MST could have affected key decisions about youth (e.g., arrests, convictions, and incarceration; Leschied 2002a).

In some studies, post‐treatment and follow‐up measures were collected by MST therapists or researchers who were not blind to participants' group allocation. As Letourneau and colleagues (Letourneau 2009, p. 91) noted, it was often difficult for research assistants to remain unaware of treatment conditions, as they may have participated in randomisation procedures or received clues during assessment interviews with family members regarding the nature of the services these families received. Asscher 2013 reported that their research assistants were blind to the study hypotheses, but may have known which treatments families received and might have guessed the study hypotheses. This is a common problem in field trials of complex psychosocial interventions.

In most studies the risks of detection bias were unclear. Only one study blinded assessors for all outcomes (Fonagy 2018).

5.2.6 | Attrition bias

Overall attrition ranged from 0% to 72% and differential attrition ranged from 0% to 30%.

Few studies conducted thorough analyses of differences between cases retained in the analysis and those lost to follow up, and few provided data on differential attrition. The Asscher 2013 study is an exception, as these authors conducted analyses to determine whether data were missing completely at random (MCAR).

In some studies, attrition was not random (MNAR). Some studies excluded certain participants after random assignment.

Eligibility decisions should have been made prior to random assignment, but some ineligible cases were discovered after the fact. When that occurred, “the intention to exclude such participants should be specified before the outcome data are seen” (Higgins 2020). This was not done in several studies (e.g., Ogden 2004).

Differential attrition changed the composition of groups in some studies, particularly in the Borduin 1995 study. Although reports vary, it appears that White families were more likely than nonwhite families to be retained in this study: 67% (Henggeler 1996a) or 70% (Henggeler 1991, Borduin 1995a) of the larger sample of 200 families were White, compared with 76.1% of the 176 families who remained in the study (Schaeffer & Borduin, 2005; Klietz et al., 2010; Sawyer & Borduin 2011). As shown in Table 6, if 70% of the larger sample was

TABLE 4 Between‐group differences in the Timmons‐Mitchell 2006 study

            MST (n)   MST (%)   TAU (n)   TAU (%)   Total
Females       23        28%       14        17%       37
Males         59        72%       67        83%
Total         82                  81                 163

Source: Emboldened data are from Timmons‐Mitchell 2003b, p. 6, 11. χ² = 2.69, p = .101; d = 0.34.
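
The summary statistics under Table 4 can be checked from the cell counts. The sketch below is illustrative and is not taken from the review: it assumes an uncorrected Pearson χ² test on the 2 × 2 table and converts the odds ratio to d with the logit method, d = ln(OR) × √3/π, which appears to reproduce the reported d = 0.34. The Cox index (ln(OR)/1.65) is shown for comparison only. All function and variable names are hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def d_logit(a, b, c, d):
    """Odds ratio converted to d with the logit method (an assumption here)."""
    return math.log((a * d) / (b * c)) * math.sqrt(3) / math.pi

def d_cox(a, b, c, d):
    """Cox index: log odds ratio divided by 1.65 (shown for comparison)."""
    return math.log((a * d) / (b * c)) / 1.65

# Table 4 cell counts: rows are MST and TAU, columns are females and males.
mst_f, mst_m, tau_f, tau_m = 23, 59, 14, 67

chi2 = pearson_chi2_2x2(mst_f, mst_m, tau_f, tau_m)
print(f"chi2 = {chi2:.2f}, p = {p_value_df1(chi2):.3f}")
print(f"d (logit) = {d_logit(mst_f, mst_m, tau_f, tau_m):.2f}")
print(f"d (Cox)   = {d_cox(mst_f, mst_m, tau_f, tau_m):.2f}")
```

Under either conversion the gender imbalance exceeds the d > 0.25 threshold described in Section 5.2.3, even though the χ² test is not statistically significant.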

White, then the difference between families retained in the study and those who refused to participate (76.1% vs. 25% White) is statistically significant (χ² with Yates correction = 23.92, df = 1, p < .00001). If 67% of the larger sample was White, then all 24 refusers were nonwhite. (We found no information on 10 cases that were identified in a 1990 report on this study (n = 210; Borduin 1990a) but not included in the 1991 report (n = 200, Henggeler 1991) or in subsequent reports.)

In the Henggeler 1992 study, dropouts were more likely to be White, male, and living with neither parent (Table 7).

Overall, there was less attrition in administrative data than in participant reports. Figure 4 shows that three studies have high risks of bias related to attrition of administrative data, while eight studies have high risks of bias related to attrition on outcomes based on participant reports.

5.2.7 | Intent‐to‐treat (ITT) analysis

To minimise biases introduced by attrition and differential attrition, ITT analysis includes all participants in the group to which they were randomly assigned, regardless of whether participants received the assigned treatment or provided data. Some studies failed to obtain or include data on participants who refused services, did not complete treatment, or did not complete treatment “successfully” (Schaeffer 2000, p. 36). These exclusions violated the principle of ITT analysis. When exclusions were made after outcomes were known, we “cannot exclude the possibility that the reason for exclusion was biased by [knowledge of] the results” (Chalmers 2007).

There was some confusion in study reports about what ITT analysis means. When studies reported the number of cases randomly assigned and reported a smaller number of cases “retained for intent‐to‐treat analysis” (sic), we used the first number as the basis for our assessment of ITT analysis (and as the denominator in calculating attrition rates).

Studies that systematically excluded cases after random assignment, on the grounds that families did not accept or did not complete treatment, were considered to have a high ROB in relation to ITT analysis, no matter how many cases were excluded. (We did not think it wise to set an arbitrary threshold for “acceptable” violations of ITT analysis.)

Assessment of studies' ability to support ITT analysis was complicated by conflicting reports on the number of cases randomly assigned in some studies. As mentioned above, an early report on the Borduin 1995 study indicated that 210 families were randomly assigned to groups (Borduin 1990a, p. 76), another report stated that 200 cases were randomly assigned (Henggeler 1991), and subsequent reports put that number at 176 (Borduin 1995a). Similarly, an early report indicated that 96 cases were randomly assigned in the Henggeler 1992 study, but subsequent reports described this as a study of 84 cases, with no mention of 12 missing or excluded cases (Henggeler 1993, Henggeler 1996).

As indicated above, four studies used yoked pairs of MST and comparison cases (to link the timing of the second assessment for comparison cases to the post‐intervention assessment for MST cases; Henggeler 1992, Henggeler 1997, Henggeler 1999a, Henggeler 1999b). However, if one of the cases dropped out of the study, its mate was retained in the analysis. Some methodological experts thought this undermined the yoked design and argued that the unyoked cases should have been dropped to retain the benefits of random assignment (Littell 2006). In any case, investigators could have used sensitivity analysis to determine whether inclusion of unyoked cases affected results; to our knowledge, this was not done.

The exclusion of MST drop‐outs was problematic, because these cases tended to have more negative outcomes (e.g., higher rates of arrest or conviction) than those that completed MST (Borduin 1995, Leschied 2002).

Full ITT analysis was provided in only 10 studies, and only for some outcomes. For example, Leschied 2002 provided full ITT

TABLE 6 Differential attrition in the Borduin 1995 study

Group                  % White    n White   n nonwhite   n of cases
Larger sample (a)      70.0 (b)     140         60          200
Analysis subgroups     76.1 (c)     134         42          176
Refusers subgroup      25.0 (d)       6         18           24

Note: Differences between analysis subgroups and refusers: χ² = 26.30, df = 1, p < .00001; d = 1.24.
(a) Does not include all 210 cases reported by Borduin & Henggeler 1990.
(b) Henggeler et al. (1991, p. 45) and Borduin et al. (1995, p. 570).
(c) Schaeffer & Borduin (2005, p. 446); Klietz et al. (2010, p. 658); and Sawyer & Borduin (2011, p. 644).
(d) Estimated from available data.
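
The two χ² values quoted for this comparison (26.30 in the note to Table 6, and 23.92 with Yates correction in the text) can be reproduced from the counts in Table 6. This is an illustrative sketch rather than the authors' code; the function name and the use of the standard closed‐form formula for a 2 × 2 table are assumptions.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 6: White and nonwhite families among retained cases and refusers.
retained_white, retained_nonwhite = 134, 42
refused_white, refused_nonwhite = 6, 18

uncorrected = chi2_2x2(retained_white, retained_nonwhite,
                       refused_white, refused_nonwhite)
corrected = chi2_2x2(retained_white, retained_nonwhite,
                     refused_white, refused_nonwhite, yates=True)

print(f"uncorrected chi2 = {uncorrected:.2f}")
print(f"Yates-corrected chi2 = {corrected:.2f}")
```

The refuser counts are themselves estimates (see note d to Table 6), so both statistics depend on the assumption that 70% of the larger sample was White.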

TABLE 7 Differential attrition in the Henggeler 1992 study

                              % male   % Black   % White   % living with neither parent
Larger sample (n = 84) (a)      77       56        42        26
Completers (n = 56) (b)         75       66        34        18
Dropouts (n = 28) (c)           82       36        57        39

Note: Differences between completers and dropouts on gender d = 0.24, race d = 0.69, living with neither parent d = 0.54.
(a) Source: 1992, p. 954. Does not include 12 cases lost.
(b) Source: 1992, p. 955.
(c) Computed.

analysis on outcomes derived from archival data, but response rates on psychosocial measures were below 60%. Asscher 2013 provided full ITT analysis on post‐treatment measures of psychosocial functioning, but not on longer‐term outcomes derived from administrative data.

5.2.8 | Selective reporting (reporting bias)

Only four studies had protocols in a national or international registry. All four registered studies were conducted by independent investigators in the UK or the Netherlands. Only one of these studies (the START trial, Fonagy 2018) was registered prospectively (before data collection began) and this is the only study that fully reported all planned outcomes at all planned endpoints.

Selective reporting was evident when authors limited analyses to some cases, sites, outcomes, or endpoints for reasons related to outcomes. This includes a focus on treatment completers or successful cases (Borduin 1995, Schaeffer 2000) and omission of the site with the “poorest” outcomes (Ogden 2004).

Selective reporting was also evident when authors reported effect sizes only for the outcomes that had demonstrated statistically significant results that favoured MST (e.g., Swenson 2010, p. 504). Some studies limited follow‐up assessments to outcomes that had previously favoured MST (e.g., Letourneau 2009), when a fuller assessment would have provided more complete data (including, perhaps, new findings in favour of MST). Some studies reported selected subscales and omitted reporting of other subscales in the same instrument (e.g., Wagner 2019 reported results for the MPRI bonding scale, but not the MPRI aggression or maturity scales).

As shown in Figure 2, outcome data were collected, but not reported at several endpoints in some studies. Butler 2011 did not report data collected at 30 and 42 months, but provided these data to us on request. Henggeler 1999a did not report main outcomes obtained at 18 months; Henggeler 1999b did not report outcomes at 30 months; Henggeler 2006 did not report outcomes at 18, 24, 36, 48 and 60 months; Letourneau 2009 did not report outcomes at 18 or 122 months (these authors did not respond to requests for missing data). There is reason to think that nonreporting of outcomes at these endpoints may be a biased decision, related to null results: For example, although the full report on 10‐year follow‐up on the Letourneau 2009 study is not available, the abstract stated that “Between‐groups analyses indicated that MST‐PSB was no more effective than TAU on most criminal and noncriminal outcomes” (Sheerin 2017).

To investigate the possibility of outcome reporting bias, we present an outcome matrix, following the work of Dwan 2010. Results show that partial reporting of outcomes was common in MST trials (Figure 5).

To investigate the possibility of publication bias, we produced a funnel plot, using data from the largest pairwise meta‐analysis in our review. Results, shown in Figure 6, are not easy to interpret. Eleven

FIGURE 5 Reporting status by study and outcome



FIGURE 6 Funnel plot: out‐of‐home placement, one year after random assignment (from Analysis 1.1, standard error by odds ratio)

studies provided 12 independent estimates of effects of MST on out‐of‐home placements of all types at one year after random assignment. Studies conducted by MST program developers in the USA tended to have effects favouring MST, while those conducted outside of the USA by independent teams had more negative effects. Given this pattern, it is difficult to tell whether publication bias was an issue.

5.2.9 | Standardised follow‐up periods

As mentioned above, some studies had unstandardised observation periods (i.e., cases observed for different lengths of time, with no adjustments for variations in the length of observations). In some studies, this range was quite substantial. For example, the length of the observation period in the Borduin 1990 study was an average of 37 months, but the range was 21 to 49 months. The Henggeler 1992 study had a mean observation period of 59 weeks, with a range of 16 to 97 weeks. Authors reported the percentage of successes/failures on several measures, including all available observations, regardless of the length of observation. For example, the percentage of recidivists among sex offenders in the Borduin 1990 study included one case observed for 21 months and one observed for 49 months; we do not know whether the 21‐month case recidivated within the next 28 months, hence its outcome is not comparable to cases observed over longer periods of time.

In the Henggeler 1997 study, archival data were collected at a fixed point in time (1.7 years after the end of the project) and then annualized to account for variations in the follow‐up observation period (e.g., by computing number of rearrests per year observed). Since recidivism rates tend to decline over time, cases with longer follow‐up observation periods are likely to have lower annualized rates than those with shorter observation periods.

We requested fixed‐interval data (1‐year follow‐ups) from authors, and received this for two studies (Butler 2011, Leschied 2002).

One study had somewhat different observation periods for MST and control cases (Timmons‐Mitchell 2006).

5.2.10 | Validated outcome measures

Most self‐report measures were based on standardised instruments and measures used in previous studies. Questions can be raised about the suitability of some instruments in certain samples (e.g., the self‐esteem scale used in the Henggeler 1999b study was developed for use with Mexican‐American youth (Simpson 1992), but this study's sample was only 1% Hispanic).

Some standardised instruments were adapted for the purposes of a particular study; thus, there were cross‐study variations in the content of some “standardised” measures. For example, in the Ogden 2004 study, back‐translation methods were used for some measures (e.g., the CBCL) and not others; authors' reports on the internal consistency of these modified scales indicated that this was a reasonable approach. Letourneau and colleagues modified some standardised instruments so that all outcome measures referred to participants' experiences in the past three months, instead of varying time frames (Letourneau 2009); the adequacy of this approach was bolstered by data on the internal consistency of the measures used in this study sample.

In earlier studies (those conducted before 2005), authors rarely reported information on the performance (e.g., internal consistency or inter‐rater agreement) of standardised instruments in their study samples. This reporting was more complete in later studies.

TABLE 8 ROB ratings (not including COI), by study location (USA/non‐USA) and developer involvement

                                  N (row %) of ROB ratings (not including COI)
Contrast                k     Low risk    Unclear     High risk    χ² (df)     p value
USA developer           13    33 (22%)    70 (46%)    48 (32%)     9.00 (4)    .09
USA independent          3     8 (24%)    14 (42%)    11 (33%)
Non‐USA independent     10    31 (38%)    34 (41%)    17 (21%)

Note: COI ratings were omitted because all USA developer studies had high risks of bias on this item. For differences between USA and non‐USA trials, χ² = 7.82, df = 2, p = .02. For differences between developer‐led and independent trials, χ² = 5.09, df = 2, p = .08.
Abbreviations: COI, conflicts of interest; ROB, risk of bias.
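
The two collapsed contrasts in the note to Table 8 can be checked from the rating counts in the table. The sketch below is illustrative rather than the authors' procedure: it assumes that “USA” pools the USA developer and USA independent rows, that “independent” pools the USA independent and non‐USA independent rows, and that an ordinary Pearson χ² test was applied to each resulting 2 × 3 table. Variable and function names are hypothetical.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a list of equal-length rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df2(chi2):
    """Upper-tail p value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-chi2 / 2)

def pool(*rows):
    """Add rows of counts element by element."""
    return [sum(cells) for cells in zip(*rows)]

# Counts of low, unclear, and high ROB ratings from Table 8.
usa_developer = [33, 70, 48]
usa_independent = [8, 14, 11]
non_usa_independent = [31, 34, 17]

by_location = pearson_chi2([pool(usa_developer, usa_independent),
                            non_usa_independent])
by_developer = pearson_chi2([usa_developer,
                             pool(usa_independent, non_usa_independent)])

print(f"USA vs non-USA: chi2 = {by_location:.2f}, "
      f"p = {p_value_df2(by_location):.2f}")
print(f"Developer vs independent: chi2 = {by_developer:.2f}, "
      f"p = {p_value_df2(by_developer):.2f}")
```

The closed‐form p value used here holds only for 2 degrees of freedom, which is the case for both 2 × 3 tables.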

5.2.11 | Conflicts of interest (COI)

We rated the likelihood of bias related to two types of COI: situations in which (1) investigators could benefit if results favoured MST and (2) investigators could benefit if results favoured a control/comparison group. We found conflicts of the first type, but not the second. More than half (13) of the studies in this review were conducted by MST developers, board members and shareholders, or former shareholders of MST Services, Inc. (or MST Services LLC), the for‐profit consulting firm that promotes, disseminates, and licences MST services and/or MST Associates LLC, the organisation that provides training for MST‐PSB. Some authors served as the clinical supervisor for cases in the MST arm of their study (e.g., Borduin 1990, Swenson 2010).

Ten studies were conducted by independent teams, but six of these teams did not provide COI statements. One independent study had a high risk of COI (Asscher 2013) and three studies (Butler 2011, Fonagy 2018, Sundell 2006) were assessed to have low risk of COI (details are provided in Characteristics of included studies).

5.2.12 | ROB and other study characteristics

Studies conducted in the USA had more serious (high) risks of bias than those conducted outside of the USA, even after we excluded ratings of COI (Table 8).

5.3 | Effects of interventions

5.3.1 | Out‐of‐home placements

Eleven studies provided dichotomous data on out‐of‐home placements at approximately one year (9 to 18 months) after random assignment. Of these studies, seven were conducted in the USA with MST program developers and four were conducted outside of the USA by independent teams. One of the USA studies (Glisson 2010) provided data on two nonoverlapping samples. Results, shown in Analysis 1.1, were sorted by study date (earliest to latest) within two subgroups, and effect sizes (odds ratios) were pooled within subgroups and overall.

Results of USA/developer studies indicated that odds of out‐of‐home placement for MST cases were about half of the odds for control cases (pooled OR 0.52, 95% CI 0.32 to 0.84), a statistically significant effect (P = .007). Note that there were substantial variations between the studies within this subgroup in terms of their effect sizes (heterogeneity χ² = 22.24, df = 7, P = 0.002; I² = 69%).

The non‐USA/independent studies in this analysis included two smaller trials with wide CIs and two larger studies with more precise estimates. Across these four studies, the odds of placement were slightly greater for MST cases than for controls at one year after referral (OR 1.14, CI 0.84 to 1.55), but this pooled effect was not significantly different from no effect (P = .40). Findings (no evidence of effects) were consistent within this subgroup of studies (χ² = 1.49, df = 3, P = 0.69; I² = 0%).

Overall effects of MST on out‐of‐home placements at one year were heterogeneous (χ² = 36.18, df = 11, P = 0.0002; I² = 70%), with significant differences in results between the two subgroups (USA/developer‐involved versus non‐USA/independent studies; χ² = 7.42, df = 1, P = 0.006; I² = 86.5%).

It is important to note that the “base rates” for placement differed in USA and non‐USA studies: In the absence of MST, 40% (248/619) of all control cases in USA/developer studies experienced out‐of‐home placement within the first year, compared with 17% (102/599) of control cases in non‐USA/independent studies (Analysis 1.1), a substantial difference (d(probit) = 0.70).

Analysis 1.2 shows results for placements at approximately 2.5 years (19–40 months) after random assignment. Analysis of data provided by two studies conducted in the USA by MST developers and four studies conducted outside of the USA by independent teams showed no evidence of effects of MST on placements within or across the two subgroups of studies (pooled OR 0.81, CI 0.54 to 1.41; P = .29). Overall results were homogeneous, and the test for differences between subgroups was not significant.

Only two studies provided data on placement rates at 4–5 years (41–60 months) after random assignment; both were non‐USA/independent studies and both found no evidence of effects of MST on out‐of‐home placement rates at this point in time (Analysis 1.3, pooled OR 0.91, CI 0.51 to 1.62; P = .75).

Ogden 2004 reported data on out‐of‐home placements for only three of four sites, and some MST cases were replaced with other

cases in this study. After outcome data were collected, authors declined to release results for the one site with the “poorest MST outcomes”, citing concerns about possible “misinterpretation” of results (Ogden 2004). To estimate the range of effects that could have been detected if data on all cases had been reported in this study, we conducted a best‐case/worst‐case analysis. Results, shown in Analysis 1.4, indicate that MST could have had a wide range of effects in this study, from significant reductions (OR 0.16) to substantial increases (OR 1.71) in out‐of‐home placements at two years after random assignment. These results, and concerns about high risks of bias in this study, suggest that results of this study should be viewed with caution.

Six USA/developer studies provided data on effects of MST on the duration of out‐of‐home placements at one year after random assignment, but effect sizes could be calculated for only three of these studies (others are missing SDs and/or CIs). As shown in Analysis 1.5, three USA/developer studies showed overall reductions in the length of placements at one year (pooled SMD −0.43, CI −0.66 to −0.21; P < .001), but these results were not replicated in two larger, non‐USA, independent studies (pooled SMD −0.03, CI −0.16 to 0.11; P = .71). Analysis of “base rates” showed that control cases experienced longer placements in the USA (weighted mean of 71 days, k = 6) than in non‐USA studies (weighted mean of 26 days, k = 2).

At about 2.5 years, one USA/developer study found no significant effects on length of placements, but an independent study showed that MST reduced the duration of placements in Sweden (Analysis 1.6).

At 4–5 years, two non‐USA/independent studies found no evidence of effects of MST on duration of placements (Analysis 1.7, pooled SMD 0.05, CI −0.10 to 0.19; P = .52).

Correlated effects (CE) models, shown in Table 9, included all available effect sizes for placement outcomes from all endpoints. Overall, MST reduced the odds that youth were placed outside of their home (OR 0.66, 95% CI 0.43 to 1.03; P = .06), but these effects were larger in USA/developer studies (OR 0.50, CI 0.24 to 1.05; P = .06) and in studies with high risks of bias related to ITT analysis (OR 0.44, CI 0.22 to 0.87; P = .02). Similarly, MST reduced the duration of out‐of‐home placements (SMD −0.24, CI −0.41 to −0.08; P < .01), with larger effects in USA/developer studies (SMD −0.31, CI −0.53 to −0.09; P = .02) and in studies with high risks of bias due to deviations from ITT analysis (SMD −0.23, CI −0.50 to 0.04; P = .09). When USA/developer or ITT moderators were included in CE models, the main effects of MST on placements (intercepts) were not statistically significant.

Our other moderators (time, attrition, other ROB variables) were not related to placement outcomes, or these relationships could not be reliably estimated (df < 4; Table 9).

Overall, MST effects on placement have wide PIs, indicating that future studies can expect to find substantial decreases or increases in placement outcomes (PI for OR 0.24 to 1.84, PI for SMD −0.60 to 0.12; Table 9).

The same patterns emerged when we converted ORs to SMDs and included all effect sizes in CE models (Table 10). In the combined CE analysis (with 56 ES from 17 studies), the overall effect of MST was −0.27 (CI −0.43 to −0.12, P < .01; PI −0.72 to 0.17). Moderator analysis showed that effects of MST on placement were significantly greater in USA/developer studies (ES −0.33, CI −0.57 to −0.10; P = .01), and in studies with high risks of bias related to deviations from ITT analysis (ES −0.33, CI −0.57 to −0.10; P < .01). Again, when these moderators were included in the analysis, overall effects of MST (intercepts) were not significant.

Henceforth, we focus on results of the combined CE models, because these analyses have more statistical power than separate CE models for ORs or SMDs.

5.3.2 | Arrest or conviction of a criminal offence

Rates of arrest or conviction approximately one year after random assignment are shown in Analysis 2.1. There are three subgroups of studies in this analysis: two USA/developer studies showed nonsignificant reductions in the odds of arrest (pooled OR 0.60, CI 0.34 to 1.07; P = .08), while one USA/independent study (OR 1.10, CI 0.37 to 3.26; P = .86) and four non‐USA/independent studies (pooled OR 0.89, CI 0.69 to 1.16; P = .39) found no evidence of effects of MST on arrest/conviction. There was little heterogeneity of effects within or between subgroups. Overall effects of MST and differences between subgroups were not statistically significant.

Analysis of “base rates” provides important contextual information: Within one year after random assignment, arrest/conviction occurred in 49% (61/125) of control cases in three USA studies, compared with 27% (161/588) of control cases in three non‐USA studies (Analysis 2.1; d(probit) = 0.57).

Analysis 2.2 shows results at 2.5 years. Three USA/developer studies and three USA/independent studies provided no consistent evidence of effects (pooled ORs in these subgroups were not statistically different from no effect). The pooled OR for five non‐USA/independent studies showed that MST increased rates of arrest or conviction at 2.5 years in these studies (OR 1.27, CI 1.01 to 1.60; P = .04). Overall results were not significantly different from no effect (OR 0.97, CI 0.71 to 1.31; P = .82).

Again, youth in the control groups were more likely to be arrested in the USA (63%, k = 6) than in other countries (28%, k = 5). Within the USA, arrest rates were similar for control cases in developer‐led (64%, k = 3) and independent studies (61%, k = 3) at about 2.5 years.

At 4 to 5 years, results were available for one USA/developer study and three non‐USA/independent studies (Analysis 2.3). The USA study (Borduin 1995, n = 176) reported dramatic reductions in arrests at four years for the MST group (OR 0.14, CI 0.07 to 0.27), while non‐USA/independent studies (total n = 954) showed no differences in arrest rates between MST and control groups (OR 1.14, CI 0.79 to 1.63; P = .49). Differences between the two subgroups were statistically significant (P < .001), and the overall effect of MST was not significant (OR 0.72, CI 0.26 to 2.01; P = .54).
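
The d(probit) value quoted for the one‐year arrest/conviction base rates can be reproduced from the counts given in the text. The following is a minimal sketch, assuming that d(probit) is the difference between the two control‐group proportions after each is mapped through the inverse standard normal CDF. The function name `probit_d` is illustrative and is not taken from the review.

```python
from statistics import NormalDist


def probit_d(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Difference between two proportions on the probit (standard normal) scale."""
    inv = NormalDist().inv_cdf
    return inv(events_a / n_a) - inv(events_b / n_b)


# Control-group arrest/conviction counts quoted in the text (Analysis 2.1).
usa = (61, 125)
non_usa = (161, 588)

d = probit_d(*usa, *non_usa)
print(f"USA base rate:     {usa[0] / usa[1]:.0%}")
print(f"non-USA base rate: {non_usa[0] / non_usa[1]:.0%}")
print(f"d(probit) = {d:.2f}")
```

Under that assumption, the two base rates (49% and 27%) give a probit difference of about 0.57, which agrees with the value reported in the text.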

TABLE 9 CE models: Robust variance estimates for dichotomous and continuous outcomes

Dichotomous Continuous

Outcome/moderatorᵃ OR SE t df LB UB Sig SMD SE t df LB UB Sig

Placement outcomes k = 13, nES = 28 k = 11, nES = 28

Overall Mean ES 0.66 1.22 −2.07 10.6 0.43 1.03 + −0.24 0.07 −3.35 8.8 −0.41 −0.08 **

PI 0.24 1.84 −0.60 0.12

Moderator

USA/developers Intercept 0.99 1.17 −0.07 3.6 0.63 1.56 −0.04 0.05 −0.84 1.8 −0.27 0.19

USA/developer 0.50 1.38 −2.14 8.3 0.24 1.05 + −0.31 0.09 −3.54 5.2 −0.53 −0.09 *

Time Intercept 0.67 1.21 −0.25 0.07

Time 1.01 1.01 1.6 <0.01 <0.01 1.5

Source Intercept 0.59 1.25 −2.37 3.4 0.30 1.15 + −0.15 0.01

Source A 1.21 1.44 0.52 7.2 0.51 2.89 −0.12 0.09 1.5

Attrition Intercept 0.88 1.17 −0.22 0.09 −2.36 5.9 −0.45 0.01 +

Attrition 0.01 8.08 2.9 −0.25 0.60 −0.41 5.4 −1.75 1.25

Differential attrition Intercept 0.66 1.22 −2.06 8.6 0.42 1.04 + −0.23 0.10

Diff attrition 0.33 >99 −0.20 4.1 0.00 >99 −0.68 2.33 3.8

Sequence generation Intercept 0.93 1.30 −0.28 3.1 0.41 2.13 −0.20 0.15

Unc/High ROB 0.61 1.44 −1.35 6.0 0.25 1.50 −0.07 0.17 3.3

Baseline equivalence Intercept 0.57 1.45 −1.53 3.9 0.20 1.61 −0.19 0.11 −1.66 3.7 −0.51 0.14

High ROB 1.33 1.53 0.67 9.3 0.51 3.47 −0.12 0.14 −0.86 8.1 −0.44 0.20

Performance bias Intercept 0.77 1.20 −1.39 5.8 0.49 1.22 −0.14 0.09 −1.62 3.7 −0.40 0.11

High ROB 0.65 1.64 −0.87 7.8 0.21 2.03 −0.19 0.11 −1.66 7.5 −0.46 0.08

ITT analysis Intercept 1.04 1.13 0.29 3.9 0.74 1.46 −0.12 0.08 −1.48 3.2 −0.36 0.13

Unc/High ROB 0.44 1.35 −2.74 8.6 0.22 0.87 * −0.23 0.12 −1.96 7.5 −0.50 0.04 +

Selective reporting Intercept 0.68 1.44 −1.07 3.9 0.24 1.89 −0.06 0.25

High ROB 0.95 1.53 −0.13 9.3 0.37 2.45 −0.08 0.10 3.3


Arrest/conviction outcomes k = 14, nES = 71 k = 16, nES = 82

Overall Mean ES 0.72 1.16 −2.21 11.5 0.53 1.00 * −0.15 0.06 −2.41 13.3 −0.29 −0.02 *

PI 0.31 1.69 −0.57 0.26


Moderatorᵇ

USA Intercept 1.01 1.12 0.05 3.6 0.73 1.38 0.01 0.05 0.15 3.8 −0.13 0.14

USA 0.58 1.26 −2.36 8.6 0.34 0.98 * −0.26 0.10 −2.73 9.4 −0.48 −0.05 *

Developers Intercept 0.91 1.14 −0.68 6.2 0.66 1.26 −0.08 0.10 −0.81 4.7 −0.36 0.19

Developer 0.56 1.34 −2.00 8.3 0.29 1.09 + −0.13 0.13 −1.01 11.5 −0.41 0.15

Time Intercept 0.69 1.15 −0.18 0.06

Time 1.00 1.00 1.4 <0.01 <0.01 1.5

Attrition Intercept 0.77 1.16 −0.08 0.07 −1.13 8.5 −0.25 0.08

Attrition 0.50 3.70 3.3 −0.53 0.51 −1.04 7.1 −1.74 0.68

Differential attrition Intercept 0.71 1.17 −0.17 0.09 −1.92 10.6 −0.36 0.03 +

Diff attrition 1.79 >99 3.0 0.55 1.35 0.41 4.1 −3.17 4.26

Sequence generation Intercept 0.73 1.30 −1.17 4.4 0.36 1.49 −0.19 0.13 −1.45 4.7 −0.53 0.15

Unc/High ROB 0.96 1.37 −0.13 9.9 0.47 1.95 0.05 0.15 0.34 10.4 −0.28 0.38

Baseline equivalence Intercept 0.77 1.25 −1.16 3.2 0.39 1.52 −0.09 0.09 −1.01 4.1 −0.34 0.16

High ROB 0.89 1.35 −0.49 6.8 0.44 1.80 −0.11 0.12 −0.85 9.6 −0.39 0.17

Performance bias Intercept 0.89 1.18 −0.73 5.3 0.58 1.35 −0.04 0.07 −0.65 5.9 −0.20 0.12

High ROB 0.67 1.32 −1.45 10.4 0.36 1.24 −0.23 0.11 −2.00 12.1 −0.47 0.02 +

ITT analysis Intercept 0.83 1.16 −1.24 6.6 0.59 1.18 −0.09 0.07 −1.25 6.7 −0.25 0.08

Unc/High ROB 0.68 1.36 −1.26 7.6 0.33 1.39 −0.15 0.13 −1.19 11.6 −0.43 0.13

Selective reporting Intercept 0.76 1.21 −1.49 5.9 0.48 1.20 −0.13 0.10 −1.27 4.9 −0.39 0.13

High ROB 0.89 1.36 −0.37 10.2 0.45 1.77 −0.05 0.13 −0.35 11.4 −0.33 0.24

Delinquency outcomes k = 2, nES = 3 k = 14, nES = 58

Overall Mean ES 0.60 1.31 0.35 1.01 −0.26 0.12 −2.06 12.8 −0.53 0.01 +

PI −1.28 0.77
Moderatorᶜ

USA Intercept −0.33 0.25 −1.31 4.9 −0.97 0.31

USA 0.12 0.28 0.44 10.9 −0.49 0.74

Developers Intercept −0.28 0.21 −1.30 5.9 −0.80 0.24

Developer 0.04 0.26 0.16 11.9 −0.52 0.60

Time Intercept −0.25 0.12

Time 0.01 0.01 3.7

Attrition Intercept −0.36 0.22 −1.61 8.0 −0.87 0.16

Attrition 0.59 0.74 0.80 7.4 −1.14 2.33

Differential attrition Intercept −0.25 0.17

Diff attrition −0.06 1.53 3.2

Sequence generation Intercept −0.43 0.27 −1.60 4.9 −1.13 0.27

Unc/High ROB 0.31 0.28 1.08 10.8 −0.32 0.93

Baseline equivalence Intercept −0.20 0.12 −1.61 3.0 −0.60 0.20

High ROB −0.08 0.21 −0.38 5.8 −0.61 0.45

Performance bias Intercept −0.34 0.22 −1.57 6.9 −0.85 0.17

High ROB 0.19 0.23 0.79 11.0 −0.33 0.70

ITT analysis Intercept −0.32 0.20 −1.57 7.0 −0.80 0.16

Unc/High ROB 0.15 0.23 0.66 10.6 −0.34 0.66

Selective reporting Intercept −0.14 0.42 −0.33 3.2 −1.42 1.14

High ROB −0.05 0.18 −0.26 4.3 −0.54 0.44


Substance abuse outcomes k = 3, nES = 5 k = 9, nES = 70

Overall Mean ES 0.53 1.20 0.37 0.77 ** −0.05 0.11 −0.48 8.0 −0.31 0.20

PI −1.29 1.18

Moderator

USA Intercept −0.04 0.06

USA −0.02 0.16 1.7

Developers Intercept −0.14 0.15 −0.88 3.0 −0.63 0.36

Developer 0.16 0.23 0.67 6.7 −0.40 0.71

Time Intercept −0.07 0.08

Time <0.01 0.01 3.3

Source Intercept −0.54 0.11

Source Y 0.57 0.15 1.4

Attrition Intercept −0.08 0.11 −0.70 4.6 −0.36 0.21

Attrition 0.13 1.12 0.12 4.4 −2.87 3.13

Differential attrition Intercept −0.06 0.10

Diff attrition 0.13 0.92 3.0

Sequence generation Intercept −0.22 0.19 −1.16 2.0 −1.03 0.59

Unc/High ROB 0.25 0.24 1.07 4.2 −0.39 0.90

Baseline equivalence Intercept −0.15 0.06

High ROB 0.13 0.16 1.7

Performance bias Intercept 0.09 0.19 0.49 3.0 −0.50 0.68

High ROB −0.25 0.23 −1.08 6.5 −0.82 0.31

ITT analysis Intercept −0.06 0.09 −0.68 4.0 −0.32 0.19

Unc/High ROB 0.02 0.27 0.08 6.5 −0.63 0.68

Selective reporting Intercept 0.04 0.26

High ROB −0.04 0.13 2.4



Peer relations outcomes k = 0, nES = 0 k = 13, nES = 69

Overall Mean ES 0.19 0.15 1.27 11.9 −0.14 0.51

PI −1.53 1.91

Moderator

USA/Developer Intercept −0.04 0.20 −0.21 5.0 −0.57 0.48

USA/developer 0.45 0.29 1.57 10.7 −0.18 1.08

Time Intercept 0.21 0.13

Time <0.01 <0.01 1.4

Source Intercept 0.29 0.09 3.35 6.6 0.08 0.50 *

Source Y −0.21 0.26 −0.80 11.4 −0.79 0.37

Attrition Intercept 0.41 0.17 2.48 7.6 0.03 0.80 *

Attrition −1.31 0.96 −1.37 4.3 −3.91 1.28

Differential attrition Intercept 0.11 0.12

Diff attrition 1.13 2.42 3.9

Sequence generation Intercept 0.37 0.19 1.99 4.0 −0.15 0.90

Unc/High ROB −0.31 0.28 −1.09 8.8 −0.94 0.33

Baseline equivalence Intercept 0.07 0.28 0.23 4.9 −0.67 0.80

High ROB 0.23 0.32 0.72 10.5 −0.48 0.93

Performance bias Intercept 0.25 0.22 1.17 7.9 −0.25 0.76

High ROB −0.20 0.22 −0.92 5.9 −0.75 0.34

ITT analysis Intercept 0.14 0.28 0.50 5.0 −0.58 0.86

Unc/High ROB 0.10 0.32 0.31 10.7 −0.61 0.80

Selective reporting Intercept 0.32 0.51 0.62 3.8 −1.13 1.76

High ROB −0.05 0.15 −0.34 4.7 −0.45 0.35


Youth behaviour and symptoms k = 3, nES = 18 k = 20, nES = 425

Overall Mean ES 0.59 1.10 0.49 0.71 ** −0.12 0.07 −1.66 18.8 −0.28 0.03

PI −1.48 1.24

Moderator

USA Intercept −0.07 0.05 −1.36 6.0 −0.19 0.05

USA −0.09 0.12 −0.69 12.6 −0.36 0.18

Developers Intercept −0.16 0.09 −1.81 8.0 −0.36 0.04

Developer 0.07 0.15 0.45 17.3 −0.25 0.38

Time Intercept −0.06 0.07

Time 0.01 <0.01 2.7

Source Intercept −0.45 0.23 −1.93 2.7 −1.24 0.34

Youth 0.32 0.24 1.30 4.1 −0.35 0.98

Parents 0.42 0.25 1.66 4.1 −0.28 1.11

Attrition Intercept −0.11 0.13 −0.84 10.7 −0.41 0.18

Attrition −0.06 0.47 −0.12 12.2 −1.07 0.96

Differential attrition Intercept −0.15 0.08 −1.92 11.5 −0.33 0.02 +

Diff attrition 0.53 0.44 1.22 6.1 −0.53 1.59

Sequence generation Intercept 0.17 0.18 0.94 7.0 −0.25 0.58

Unc/High ROB −0.07 0.18 −0.39 15.2 −0.46 0.31

Baseline equivalence Intercept −0.12 0.08 −1.45 4.9 −0.33 0.09

High ROB −0.01 0.13 −0.05 9.2 −0.30 0.29

Performance bias Intercept −0.17 0.08 −2.08 9.8 −0.36 0.01 +

High ROB 0.11 0.16 0.68 17.3 −0.22 0.43

ITT analysis Intercept −0.14 0.10 −1.40 8.0 −0.36 0.09

Unc/High ROB 0.03 0.15 0.17 17.3 −0.29 0.34

Selective reporting Intercept −0.31 0.26 −1.20 4.5 −1.01 0.38

High ROB 0.07 0.10 0.72 5.8 −0.18 0.33



Parent behaviour and symptoms k = 2, nES = 5 k = 16, nES = 129

Overall Mean ES 1.39 1.20 0.97 2.00 −0.16 0.06 −2.85 13.9 −0.29 −0.04 *

PI −0.79 0.46

Moderator

USA Intercept −0.17 0.07 −2.28 4.8 −0.37 0.02 +

USA 0.01 0.12 0.11 10.9 −0.24 0.27

Developers Intercept −0.13 0.07 −1.83 5.8 −0.31 0.05

Developer −0.06 0.12 −0.51 12.9 −0.31 0.19

Time Intercept −0.16 0.07

Time <0.01 0.01 2.4

Source Intercept −0.15 0.10 −1.55 6.5 −0.39 0.08

Source P −0.02 0.13 −0.12 9.6 −0.31 0.27

Attrition Intercept −0.17 0.08 −1.97 9.3 −0.35 0.02 +

Attrition 0.02 0.37 0.05 7.6 −0.84 0.88

Differential attrition Intercept −0.18 0.06 −2.90 7.5 −0.32 −0.03 *

Diff attrition 0.23 0.87 0.26 6.1 −1.89 2.34

Sequence generation Intercept −0.23 0.10 −2.25 5.7 −0.48 0.02 +

Unc/High ROB 0.11 0.12 0.93 12.4 −0.15 0.38

Baseline equivalence Intercept −0.14 0.09 −1.44 4.5 −0.39 0.12

High ROB −0.04 0.12 −0.36 9.6 −0.32 0.23

Performance bias Intercept −0.25 0.07 −3.56 8.0 −0.41 −0.09 **

High ROB 0.22 0.11 2.04 10.8 −0.02 0.45 +

ITT analysis Intercept −0.18 0.10 −1.78 5.9 −0.43 0.07

Unc/High ROB 0.03 0.12 0.26 13.0 −0.23 0.29

Selective reporting Intercept −0.07 0.22 −0.31 3.7 −0.71 0.58

High ROB −0.04 0.07 −0.53 4.8 −0.23 0.15



Family functioning outcomes k = 1, nES = 1 k = 15, nES = 95

Overall Mean ES 0.10 0.05 1.83 13.7 −0.02 0.21 +

PI −1.08 1.27

Moderator

USA Intercept 0.09 0.08 1.11 5.0 −0.12 0.31

USA 0.01 0.11 0.09 11.0 −0.24 0.26

Developers Intercept 0.08 0.07 1.17 6.0 −0.09 0.26

Developer 0.03 0.11 0.27 12.7 −0.21 0.27

Time Intercept 0.10 0.08

Time <0.01 0.01 1.6

Source Intercept 0.09 0.07 1.35 11.5 −0.06 0.25

Source P 0.01 0.08 0.12 12.2 −0.16 0.18

Attrition Intercept 0.04 0.09 0.51 8.2 −0.15 0.24

Attrition 0.29 0.26 1.12 7.3 −0.31 0.89

Differential attrition Intercept 0.01 0.06 0.12 8.7 −0.12 0.14

Diff attrition 1.36 0.55 2.45 4.4 −0.12 2.84 +

Sequence generation Intercept 0.14 0.11 1.37 5.0 −0.13 0.42

Unc/High ROB −0.08 0.12 −0.65 10.9 −0.34 0.19

Baseline equivalence Intercept 0.14 0.06 2.32 3.8 −0.03 0.32 +

High ROB −0.07 0.10 −0.69 7.7 −0.29 0.16

Performance bias Intercept 0.12 0.07 1.71 8.8 −0.04 0.27

High ROB −0.05 0.12 −0.41 8.3 −0.32 0.22

ITT analysis Intercept 0.13 0.06 2.08 5.0 −0.03 0.30 +

Unc/High ROB −0.06 0.10 −0.57 11.2 −0.29 0.17

Selective reporting Intercept 0.32 0.17 1.89 3.7 −0.17 0.81

High ROB −0.09 0.07 −1.23 4.8 −0.28 0.10



School outcomes k = 2, nES = 4 k = 7, nES = 21

Overall Mean ES 0.99 2.71 0.81 1.20 0.38 0.25 1.54 6.0 −0.23 0.97

PI −2.22 2.98
Moderatorᵈ

Developers Intercept 0.59 0.33 1.81 2.0 −0.82 2.01

Developer −0.37 0.50 −0.73 4.5 −1.71 0.97

Time Intercept 0.45 0.21

Time 0.01 0.01 1.5

Source Intercept 0.38 0.30 1.27 5.0 −0.38 1.14

Source Y 0.06 0.30 0.21 5.0 −0.70 0.82

Attrition Intercept 0.46 0.32 1.43 2.2 −0.83 1.75

Attrition −0.34 1.26 −0.27 4.2 −3.75 3.08

Differential attrition Intercept 0.39 0.32

Diff attrition −0.11 8.69 2.0

Sequence generation Intercept 0.96 0.27 3.62 2.0 −0.18 2.10 +

Unc/High ROB −1.00 0.33 −3.02 4.5 −1.89 −0.12 *

Performance bias Intercept 0.41 0.34 1.18 3.0 −0.69 1.50

High ROB −0.05 0.56 −0.08 4.5 −1.55 1.46

ITT analysis Intercept 0.59 0.32 1.82 2.0 −0.80 1.97

Unc/High ROB −0.35 0.50 −0.70 4.5 −1.70 0.99

Selective reporting Intercept 0.81 0.65

High ROB −0.18 0.29 2.5

Note: Analysis assumed 0.8 correlations among dependent ES. Corrections were made for small samples. Results are not reliable if df < 4, so these reports were truncated. Where 1 < k < 5, we used a fixed effect model to estimate mean effect size and moderator analysis was not possible. Separate (bivariate) models are provided for each moderator. Time = months since random assignment. Source A: administrative data = 1, other = 0; Source P: parent = 1, other = 0; Source Y: youth = 1, other = 0. Attrition = proportion of cases with missing data. Differential attrition = difference between groups in proportion of cases with missing data. For sequence generation and ITT analysis: 0 = low risk, 1 = unclear/high ROB; for baseline equivalence, performance bias, and selective reporting: 0 = low/unclear, 1 = high risk of bias. Sig codes: **<.01, *<.05, +<.10.
Abbreviations: CE, correlated effects; k, number of studies; nES, number of effect sizes; PI, prediction interval; ROB, risk of bias.
ᵃ ES were adjusted so that benefits of MST appear as negative effects (reductions) in: placement, arrest, delinquency, substance abuse, youth behaviour, and parent behaviour; and positive effects on peer relations, family functioning, and school outcomes.
ᵇ All continuous ES were derived from administrative data, so there is no variation on data source.
ᶜ All but 2 continuous ES were reported by youth, so we could not analyse differences between data sources.
ᵈ For continuous ES, only one study was conducted outside the USA, so we could not conduct moderator analysis for the USA/non‐USA contrast. Only one study had low/unclear ROB on baseline equivalence, so we could not assess baseline equivalence as a moderator.
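
One way to read the dichotomous columns of Table 9 is on the log scale. The sketch below assumes that the SE shown next to each OR is exponentiated, so that t is approximately ln(OR) / ln(SE). This is an inference from the reported values and is not stated in the table note. The sketch also shows one widely used OR‐to‐SMD conversion, the logit method d = ln(OR) × √3 / π. This section does not say which conversion the review used, so that part is purely illustrative. Both function names are hypothetical.

```python
import math


def log_scale_t(odds_ratio: float, se_exp: float) -> float:
    """t statistic for an OR, assuming se_exp is the exponentiated SE of ln(OR)."""
    return math.log(odds_ratio) / math.log(se_exp)


def or_to_smd_logit(odds_ratio: float) -> float:
    """Logit-method conversion of an odds ratio to a standardised mean difference."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# (label, OR, SE, reported t) for three placement rows of Table 9.
rows = [
    ("Overall mean ES", 0.66, 1.22, -2.07),
    ("USA/developer", 0.50, 1.38, -2.14),
    ("ITT analysis, Unc/High ROB", 0.44, 1.35, -2.74),
]

for label, odds_ratio, se_exp, reported_t in rows:
    t = log_scale_t(odds_ratio, se_exp)
    d = or_to_smd_logit(odds_ratio)
    print(f"{label}: t = {t:.2f} (reported {reported_t}), logit d = {d:.2f}")
```

Small gaps between the recomputed and reported t values are expected, because the table reports OR and SE to two decimals. Agreement at this level is consistent with the log‐scale reading but does not confirm it.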


TABLE 10 CE models: Robust variance estimates from all effect sizes

Outcomeᵃ ES SE t df LB UB Sig

Placement k = 17, nES = 56

Overall Mean ES −0.27 0.07 −3.72 13.5 −0.43 −0.12 **

PI −0.72 0.17

Moderator

USA/developers Intercept −0.04 0.06 −0.74 2.9 −0.24 0.15

USA/developer −0.33 0.10 −3.39 6.7 −0.57 −0.10 *

Time Intercept −0.28 0.08

Time <0.01 <0.01 1.4

Source Intercept −0.29 0.09 −3.13 3.2 −0.57 −0.01 *

Source A 0.02 0.13 0.12 5.9 −0.31 0.34

Attrition Intercept −0.21 0.09 −2.33 9.8 −0.41 −0.01 *

Attrition −0.61 0.54 −1.12 7.1 −1.89 0.67

Differential attrition Intercept −0.28 0.08 −3.69 8.7 −0.45 −0.11 **

Diff attrition 0.17 1.49 0.11 5.1 −3.66 3.99

Sequence generation Intercept −0.18 0.13 −1.39 3.6 −0.58 0.21

Unc/High ROB −0.13 0.16 −0.80 6.6 −0.51 0.26

Baseline equivalence Intercept −0.29 0.13 −2.25 4.8 −0.62 0.05 +

High ROB 0.02 0.16 0.13 11.8 −0.32 0.36

Performance bias Intercept −0.23 0.09 −2.46 6.9 −0.45 −0.01 *

High ROB −0.10 0.15 −0.68 11.8 −0.43 0.23

ITT analysis Intercept −0.07 0.07 −0.96 4.1 −0.26 0.13

Unc/High ROB −0.33 0.10 −3.21 9.8 −0.57 −0.10 **

Selective reporting Intercept −0.14 0.31 −0.46 3.1 −1.09 0.81

High ROB −0.06 0.12 −0.49 4.3 −0.37 0.26

Arrest/conviction k = 18, nES = 153

Overall Mean ES −0.15 0.06 −2.42 15.0 −0.27 −0.02 *

PI −0.55 0.26
Moderatorᵇ

USA Intercept <0.01 0.05 −0.02 3.6 −0.15 0.15

US −0.21 0.09 −2.30 7.6 −0.43 <0.01 +

Developers Intercept −0.07 0.08 −0.84 6.3 −0.27 0.13

Developer −0.15 0.12 −1.29 13.9 −0.40 0.10

Time Intercept −0.18 0.06

Time −0.01 0.01 1.4

Attrition Intercept −0.11 0.06 −1.72 10.7 −0.25 0.03

Attrition −0.29 0.47 −0.61 8.0 −1.38 0.80

Differential attrition Intercept −0.15 0.07 −2.17 13.1 −0.31 <0.01 *

Diff attrition 0.32 1.37 0.23 4.4 −3.38 4.02

Sequence generation Intercept −0.17 0.14 −1.23 4.4 −0.54 0.20

Unc/High ROB 0.04 0.15 0.24 9.6 −0.30 0.38

Baseline equivalence Intercept −0.12 0.09 −1.37 4.1 −0.37 0.12



High ROB −0.04 0.12 −0.32 8.7 −0.31 0.24

Performance bias Intercept −0.05 0.07 −0.73 6.7 −0.21 0.11

High ROB −0.19 0.11 −1.69 13.8 −0.42 0.05

ITT analysis Intercept −0.09 0.06 −1.38 7.6 −0.25 0.06

Unc/High ROB −0.12 0.12 −1.00 13.4 −0.38 0.14

Selective reporting Intercept −0.13 0.08 −1.58 6.6 −0.32 0.07

High ROB −0.04 0.12 −0.30 14.1 −0.30 0.23

Delinquency outcomes k = 14, nES = 60

Overall Mean ES −0.27 0.12 −2.18 12.8 −0.54 <0.01 *

PI −1.31 0.77
Moderatorᶜ

USA Intercept −0.33 0.25 −1.31 4.9 −0.97 0.31

US 0.10 0.28 0.36 10.9 −0.51 0.71

Developers Intercept −0.28 0.21 −1.30 5.9 −0.80 0.24

Developer 0.01 0.25 0.05 11.9 −0.54 0.57

Time Intercept −0.26 0.11

Time 0.01 0.01 3.2

Attrition Intercept −0.39 0.22 −1.72 8.0 −0.90 0.13

Attrition 0.66 0.75 0.88 7.4 −1.10 2.43

Differential attrition Intercept −0.27 0.17 −1.54 7.5 −0.67 0.14

Diff attrition −0.09 1.48 −0.06 3.4 −4.51 4.34

Sequence generation Intercept −0.43 0.27 −1.60 4.9 −1.13 0.27

Unc/High ROB 0.28 0.28 1.00 10.8 −0.34 0.91

Baseline equivalence Intercept −0.24 0.11 −2.14 3.0 −0.60 0.12

High ROB −0.04 0.21 −0.19 5.7 −0.56 0.48

Performance bias Intercept −0.36 0.21 −1.71 6.9 −0.87 0.14

High ROB 0.21 0.23 0.91 11.0 −0.30 0.72

ITT analysis Intercept −0.32 0.20 −1.57 7.0 −0.80 0.16

Unc/High ROB 0.12 0.23 0.53 10.5 −0.39 0.63

Selective reporting Intercept −0.11 0.41 −0.27 3.2 −1.36 1.14

High ROB −0.06 0.18 −0.36 4.3 −0.55 0.42

Substance abuse outcomes k = 9, nES = 75

Overall Mean ES −0.08 0.11 −0.68 8.0 −0.34 0.19

PI −1.35 1.20

Moderator

USA Intercept −0.05 0.04

US −0.03 0.16 1.7

Developers Intercept −0.14 0.15 −0.94 3.0 −0.63 0.34

Developer 0.13 0.24 0.53 6.7 −0.44 0.70

Time Intercept −0.10 0.08

Time <0.01 0.01 2.7


Source Intercept −0.49 0.13

Source Y 0.50 0.17 2.0

Attrition Intercept −0.10 0.12 −0.88 4.6 −0.41 0.21

Attrition 0.15 1.11 0.13 4.6 −2.80 3.08

Differential attrition Intercept −0.04 0.12

Diff attrition −1.17 1.28 3.4

Sequence generation Intercept −0.23 0.18 −1.25 2.0 −1.01 0.56

Unc/High ROB 0.23 0.24 0.98 4.2 −0.41 0.88

Baseline equivalence Intercept −0.20 0.11

High ROB 0.16 0.19 1.7

Performance bias Intercept 0.07 0.20 0.34 3.0 −0.57 0.71

High ROB −0.26 0.24 −1.06 6.5 −0.84 0.33

ITT analysis Intercept −0.09 0.09 −0.99 4.0 −0.33 0.16

Unc/High ROB 0.02 0.28 0.08 6.5 −0.65 0.69

Selective reporting Intercept 0.06 0.26

High ROB −0.05 0.13 2.4

Peer relations outcomes k = 13, nES = 69

Overall Mean ES 0.19 0.15 1.27 11.9 −0.14 0.51

PI −1.53 1.91

Moderator

USA/Developer Intercept −0.04 0.20 −0.21 5.0 −0.57 0.48

USA/developer 0.45 0.29 1.57 10.7 −0.18 1.08

Time Intercept 0.21 0.13

Time <0.01 <0.01 1.4

Source Intercept 0.29 0.09 3.35 6.6 0.08 0.50 *

Source Y −0.21 0.26 −0.80 11.4 −0.79 0.37

Attrition Intercept 0.41 0.17 2.48 7.6 0.03 0.80 *

Attrition −1.31 0.96 −1.37 4.3 −3.91 1.28

Differential attrition Intercept 0.11 0.12

Diff attrition 1.13 2.42 3.9

Sequence generation Intercept 0.37 0.19 1.99 4.0 −0.15 0.90

Unc/High ROB −0.31 0.28 −1.09 8.8 −0.94 0.33

Baseline equivalence Intercept 0.07 0.28 0.23 4.9 −0.67 0.80

High ROB 0.23 0.32 0.72 10.5 −0.48 0.93

Performance bias Intercept 0.25 0.22 1.17 7.9 −0.25 0.76

High ROB −0.20 0.22 −0.92 5.9 −0.75 0.34

ITT analysis Intercept 0.14 0.28 0.50 5.0 −0.58 0.86

Unc/High ROB 0.10 0.32 0.31 10.7 −0.61 0.80

Selective reporting Intercept 0.32 0.51 0.62 3.8 −1.13 1.76

High ROB −0.05 0.15 −0.34 4.7 −0.45 0.35



Youth behaviour and symptoms k = 20, nES = 443

Overall Mean ES −0.13 0.07 −1.72 18.8 −0.28 0.03

PI −1.52 1.26

Moderator

USA Intercept −0.07 0.05 −1.42 6.0 −0.19 0.05

US −0.09 0.12 −0.71 12.6 −0.36 0.18

Developers Intercept −0.16 0.09 −1.84 8.0 −0.36 0.04

Developer 0.06 0.15 0.42 17.3 −0.25 0.37

Time Intercept −0.07 0.07

Time 0.01 <0.01 2.7

Source Intercept −0.46 0.24

Youth 0.32 0.24 3.9

Parents 0.42 0.25 3.9

Attrition Intercept −0.12 0.13 −0.90 10.8 −0.41 0.17

Attrition −0.04 0.47 −0.09 12.2 −1.05 0.97

Differential attrition Intercept −0.16 0.08 −1.96 11.5 −0.34 0.02

Diff attrition 0.52 0.44 1.18 6.1 −0.55 1.59

Sequence generation Intercept −0.17 0.18 −0.96 7.0 −0.58 0.25

Unc/High ROB 0.07 0.18 0.38 15.3 −0.32 0.45

Baseline equivalence Intercept −0.12 0.08 −1.46 4.9 −0.34 0.09

High ROB −0.01 0.13 −0.07 9.2 −0.30 0.29

Performance bias Intercept −0.18 0.08 −2.15 9.8 −0.36 0.01 *

High ROB 0.11 0.16 0.71 17.3 −0.22 0.44

ITT analysis Intercept −0.14 0.10 −1.43 8.0 −0.37 0.09

Unc/High ROB 0.02 0.15 0.15 17.3 −0.30 0.34

Selective reporting Intercept −0.31 0.26 −1.20 4.5 −1.01 0.38

High ROB 0.07 0.10 0.70 5.8 −0.18 0.33

Parent behaviour and symptoms k = 16, nES = 134

Overall Mean ES −0.16 0.06 −2.76 13.7 −0.28 −0.04 *

PI −0.77 0.45

Moderator

USA Intercept −0.17 0.07 −2.28 4.8 −0.37 0.02 +

US 0.02 0.12 0.15 11.0 −0.24 0.27

Developers Intercept −0.13 0.07 −1.83 5.8 −0.31 0.05

Developer −0.06 0.12 −0.47 12.9 −0.32 0.20

Time Intercept −0.16 0.07

Time <0.01 0.01 2.2

Source Intercept −0.15 0.10 −1.51 7.2 −0.37 0.08

Source P −0.02 0.13 −0.15 10.0 −0.30 0.26

Attrition Intercept −0.16 0.09 −1.87 9.2 −0.36 0.03 +


Attrition <0.01 0.37 0.01 7.6 −0.86 0.87

Differential attrition Intercept −0.17 0.06 −2.84 7.3 −0.32 −0.03 *

Diff attrition 0.21 0.86 0.25 6.1 −1.89 2.31

Sequence generation Intercept −0.23 0.11 −2.14 5.6 −0.49 0.04 +

Unc/High ROB 0.12 0.13 0.92 12.1 −0.16 0.39

Baseline equivalence Intercept −0.13 0.09 −1.43 4.4 −0.38 0.12

High ROB −0.04 0.12 −0.36 9.7 −0.32 0.23

Performance bias Intercept −0.25 0.07 −3.56 8.0 −0.41 −0.09 **

High ROB 0.22 0.10 2.14 10.3 −0.01 0.45 +

ITT analysis Intercept −0.18 0.10 −1.79 5.8 −0.42 0.07

Unc/High ROB 0.04 0.12 0.30 12.9 −0.22 0.30

Selective reporting Intercept −0.08 0.22 −0.34 3.6 −0.71 0.56

High ROB −0.04 0.07 −0.49 4.7 −0.23 0.16

Family functioning outcomes k = 15, nES = 96

Overall Mean ES 0.10 0.05 1.77 13.7 −0.02 0.21 +

PI −1.08 1.27

Moderator

USA Intercept 0.09 0.08 1.11 5.0 −0.12 0.31

US 0.01 0.11 0.06 11.1 −0.24 0.26

Developers Intercept 0.08 0.07 1.17 6.0 −0.09 0.26

Developer 0.03 0.11 0.24 12.7 −0.22 0.27

Time Intercept 0.09 0.09

Time <0.01 0.01 1.6

Source Intercept 0.09 0.07 1.28 11.5 −0.07 0.25

Source P 0.01 0.08 0.19 12.1 −0.16 0.19

Attrition Intercept 0.04 0.09 0.46 8.2 −0.16 0.24

Attrition 0.29 0.26 1.13 7.3 −0.32 0.91

Differential attrition Intercept <0.01 0.06 0.08 8.7 −0.13 0.14

Diff attrition 1.36 0.56 2.45 4.4 −0.12 2.85 +

Sequence generation Intercept 0.14 0.11 1.30 5.0 −0.14 0.42

Unc/High ROB −0.07 0.12 −0.60 10.9 −0.35 0.20

Baseline equivalence Intercept 0.14 0.06 2.32 3.8 −0.03 0.32 +

High ROB −0.07 0.10 −0.71 7.7 −0.30 0.16

Performance bias Intercept 0.12 0.07 1.71 8.8 −0.04 0.27

High ROB −0.05 0.12 −0.44 8.3 −0.33 0.23

ITT analysis Intercept 0.13 0.06 2.08 5.0 −0.03 0.30 +

Unc/High ROB −0.06 0.11 −0.59 11.2 −0.29 0.17

Selective reporting Intercept 0.33 0.17 1.89 3.7 −0.17 0.82

High ROB −0.09 0.07 −1.24 4.8 −0.29 0.10



TABLE 10 (Continued)

Outcomea ES SE t df LB UB Sig

School outcomes k = 8, nES = 25

Overall Mean ES 0.31 0.22 1.39 7.0 −0.22 0.84

PI −1.92 2.55

Moderatord

USA Intercept 0.13 0.23

US 0.24 0.38 1.8

Developers Intercept 0.40 0.29 1.39 3.0 −0.52 1.32

Developer −0.18 0.48 −0.38 6.0 −1.35 0.99

Time Intercept 0.33 0.17

Time <0.01 0.01 1.7

Source Intercept 0.31 0.26 1.19 6.0 −0.33 0.94

Source Y 0.05 0.26 0.21 6.0 −0.58 0.69

Attrition Intercept 0.30 0.27 1.09 3.2 −0.54 1.13

Attrition 0.09 1.16 0.08 5.2 −2.86 3.04

Differential attrition Intercept 0.27 0.25

Diff attrition 1.14 8.70 1.9

Sequence generation Intercept 0.67 0.33 2.04 3.0 −0.38 1.72

Unc/High ROB −0.72 0.38 −1.87 6.0 −1.66 0.22

Performance bias Intercept 0.38 0.34 1.12 3.0 −0.71 1.48

High ROB −0.14 0.48 −0.29 6.0 −1.31 1.04

ITT analysis Intercept 0.39 0.28 1.40 3.0 −0.50 1.29

Unc/High ROB −0.16 0.48 −0.33 6.0 −1.33 1.02

Selective reporting Intercept 0.54 0.56

High ROB −0.09 0.27 3.3

Note: Analysis assumed 0.8 correlations among dependent ES. Corrections were made for small samples. Results are not reliable if df < 4, so these reports
were truncated. Where 1 < k < 5, we used a fixed effect model to estimate mean effect size and moderator analysis was not possible. Separate (bivariate)
models are provided for each moderator. Time = months since random assignment. Source A: administrative data = 1, other = 0; Source P: parent = 1,
other = 0; Source Y: youth = 1, other = 0. Attrition = proportion of cases with missing data. Differential attrition = difference between groups in proportion
of cases with missing data. For sequence generation and ITT analysis: 0 = low risk, 1 = unclear/high ROB; for baseline equivalence, performance bias, and
selective reporting: 0 = low/unclear, 1 = high ROB. Sig codes: **<.01, *<.05, +<.10.
Abbreviations: CE, correlated effects; k, number of studies; nES, number of effect sizes; ROB, risk of bias; PI, prediction interval.
a: ES includes SMDs and dichotomous outcomes converted to SMDs. ES were adjusted so that benefits of MST appear as negative effects (reductions) in:
placement, arrest, delinquency, substance abuse, youth behaviour, and parent behaviour and symptoms; and positive effects on peer relations, family
functioning, and school outcomes.
b: All ES were derived from administrative data, so there is no variation on data source.
c: All but 2 ES were reported by youth, so we could not analyse differences between data sources.
d: Only one study had low/unclear risk of bias on baseline equivalence, so we could not assess baseline equivalence as a moderator.
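
Footnote a to Table 10 states that dichotomous outcomes were converted to SMDs, but the conversion formula is not given in this part of the review. As a rough sketch only, two conversions that are commonly used in meta‐analysis are shown below: the logit method, d = ln(OR) × √3/π, and the probit method, which takes the difference between inverse‐normal transforms of two proportions. The function names are hypothetical, the proportions are made up for illustration, and which method the authors applied to each outcome is an assumption here.

```python
import math
from statistics import NormalDist


def logit_d(odds_ratio: float) -> float:
    """Logit method: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def probit_d(p_treatment: float, p_control: float) -> float:
    """Probit method: difference of inverse-normal transforms of two proportions."""
    z = NormalDist().inv_cdf
    return z(p_treatment) - z(p_control)


# OR 0.72 is the pooled odds ratio for arrest quoted in the arrests results.
d_from_or = logit_d(0.72)

# Hypothetical proportions (NOT taken from the review), used only to
# illustrate the probit method.
d_from_props = probit_d(0.30, 0.40)

print(round(d_from_or, 2), round(d_from_props, 2))
```

Under either method a negative d corresponds to fewer events in the treatment group, which matches the sign convention described in footnote a for placement and arrest outcomes.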

In the Borduin 1995 study, MST provided an average of 21 hours of treatment. Investigators attributed
between‐group differences in arrest rates (at 4, 14, and 22 year follow‐ups) to MST program participation,
but there are other plausible explanations. The groups were not equivalent at baseline, and differential
attrition appears to have exacerbated between‐group differences on race and gender. Nonwhite youth were
more likely to refuse treatment (Table 6). White youth comprised 74% of the MST group, 69% of the control
group, and 25% of refusers (Table 11). There were gender differences between subgroups as well, with males
making up 59% of MST completers, 73% of MST dropouts, and 77% of control cases (Table 12). Thus, the
control group had higher proportions of nonwhite youth and males than the MST group. Nonwhite youth and
males are more likely to be arrested in the USA than their White, female counterparts (Puzzanchera 2009).
In the Borduin 1995 study, arrest rates are related to the proportion of nonwhites and males in

each group (Tables 11 and 12). We found no published analyses of main effects of race or gender on
outcomes, or analyses that controlled for these potential influences on outcomes in published reports on
the Borduin 1995 trial. An unpublished analysis of 13.7‐year follow‐up data showed that race was a
significant predictor of arrest rates and duration of confinement, and gender was associated with arrest
severity and length of probation in this study (Schaeffer 2000, p. 141). We requested but did not receive
additional data from authors.

TABLE 11 Racial composition and arrests at 4‐year follow‐up for subgroups in the Borduin 1995 study

Subgroup | % White | % arrested | n of cases
MST cases | 74 | 26.1 | 92
Control cases | 69 | 71.4 | 84
Refusers | 25 | 87.5 | 24

Note: N = 200, originally 100 MST and 100 control cases (not including 10 cases dropped earlier, five in
each group). Data on racial composition of MST and control groups are from Schaeffer (2000, p. 120); racial
composition of refusers is derived from Table 6; arrest data are from Borduin et al. (1995a, p. 573).

TABLE 12 Gender and arrests at 4‐year follow‐up for subgroups in the Borduin 1995 study

Subgroup | % male | % arrested | n of cases
MST completers | 59 | 22.4 | 77
MST dropouts | 73 | 45.5 | 15
Control cases | 77 | 71.4 | 84

Sources: Data on gender are from Schaeffer (2000, pp. 120, 123); arrest data are from Borduin (1995a, p. 573).

Given concerns about differential attrition, confounding factors, and other sources of ROB in the
Borduin 1995 study, we conducted best case/worst case analysis for data on arrests at approximately 22.4
years after random assignment. Shown in Analysis 2.4, the range of possible long‐term outcomes for this
study includes significant positive results (reductions in arrests of OR 0.16) and nonsignificant
negative results (increased likelihood of arrest OR 1.34). For these reasons, results of this study should
be interpreted with caution.

Eight studies provided data on the number of arrests or convictions at one year after referral (see
Analysis 2.5). Of these, four USA/developer studies showed that MST reduced the mean number of arrests
(SMD −0.19, CI −0.38 to 0.00; P = .05). One USA/independent study showed substantial reductions in arrests
(SMD −0.89, CI −1.32 to −0.45; P < .001), and three non‐USA/independent studies showed that effects of MST
in Canada and the UK were not different from zero (SMD −0.05, CI −0.22 to 0.13; P = .61). Overall effects
were significant (P = .04), as were differences between the three subgroups of studies (P = .002).

Within one year, youth in the USA control groups were arrested more often than those in control groups
outside of the USA (based on data from Analysis 2.5: for USA control cases, the weighted average number of
arrests = 1.54, pooled SD = 2.07, k = 5; for non‐USA control cases, the weighted average = 1.03, pooled
SD = 1.75, k = 3; d (probit) = 0.27).

At 2.5 years after random assignment, two USA/developer studies provided very different estimates of MST
effects on numbers of arrests (SMDs −0.84 and −0.06), with pooled effects that were not significantly
different from zero (SMD −0.29, CI −0.99 to 0.41; P = .42; Analysis 2.6). One USA/independent study showed
that MST reduced the number of arrests (SMD −0.56, CI −0.98 to −0.15; P < .001). Four non‐USA/independent
studies provided data at this point in time, but only three included data sufficient to compute SMDs;
pooled results provided no evidence of effects of MST on number of arrests at this endpoint (SMD −0.07,
CI −0.20 to 0.05; P = .25).

Four years after random assignment, one USA/developer study and two non‐USA/independent studies provided
results showing no evidence of effects of MST on the number of arrests or convictions (SMD 0.00, CI −0.17
to 0.18; P = .97; Analysis 2.7).

CE models showed that MST reduced the odds of arrest (OR 0.72, CI 0.53 to 1.00; P = .05) and number of
arrests/convictions (SMD −0.15, CI −0.29 to −0.02; P = .03; Table 9), but these effects were greater in
USA studies (OR 0.58, CI 0.34 to 0.98; P = .04; SMD −0.26, CI −0.48 to −0.05; P = .02) and were not
significant once this moderator was taken into account.

In the combined CE analysis (with 153 ES from 18 studies, Table 10), MST reduced arrests by −0.15 standard
deviations (CI −0.27 to −0.02; P = .03) with a wide prediction interval (−0.55 to 0.26), indicating that
future studies can expect to find that MST increases or decreases arrests. Again, moderator analysis
showed that effects on arrests are greater in the USA than in other countries (ES −0.21, CI −0.42 to
< 0.01; P = .05; Table 10) and main effects of MST were not significant once this moderator was included
in the CE model.

Developer‐led studies produced larger effects on arrests, but these were not significantly different from
effects obtained by independent studies (ES −0.15, CI −0.40 to 0.10; P = .22; Table 10).

Analyses of relationships between the timing of measurement and effect sizes were unreliable. None of our
indicators of ROB were related to arrest outcomes (Table 10).

5.3.3 | Self‐reported delinquency

Five studies provided data on self‐reported delinquency (SRD) scales at one year. Two USA/developer
studies, one USA/independent study, and two non‐USA/independent studies all showed that effects of MST on
SRD were not significantly different from zero (Analysis 3.1). Results were consistent within and across
subgroups and the overall effect was not significant (SMD −0.05, CI −0.17 to 0.07; P = .44).

Similarly, at 2.5 years, one USA/developer study and three non‐USA/independent studies consistently
showed no evidence of MST effects on SRD (pooled SMD 0.01, CI −0.13 to 0.15; P = .93; Analysis 3.2).
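
As a consistency check on pairwise results of the kind reported in this section, a reader can recover the standard error implied by a confidence interval and recompute the P value. The sketch below assumes a symmetric, normal‐approximation 95% CI, which is an assumption about the pairwise analyses and would not hold for the CE models, where small‐sample corrections were applied. The function name is hypothetical, and because the reported limits are rounded to two decimals, recomputed P values will only approximately match the reported ones.

```python
from statistics import NormalDist


def p_from_ci(estimate: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Two-sided P value implied by a symmetric normal-approximation CI."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)      # standard error implied by CI width
    z = estimate / se
    return 2 * (1 - norm.cdf(abs(z)))


# Number of arrests at one year, USA/developer subgroup (Analysis 2.5):
# SMD -0.19, CI -0.38 to 0.00.
p_arrests = p_from_ci(-0.19, -0.38, 0.00)
print(round(p_arrests, 2))
```

An interval that just touches zero, as in this example, implies a P value at the conventional .05 threshold, which is why such results are described as borderline.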

At four years after random assignment, one USA/developer study found that MST reduced SRD scores (SMD
−0.40, CI −0.85 to 0.04; P = .07) and one non‐USA/independent trial showed that MST increased SRD (SMD
0.10, CI −0.05 to 0.25; P = .21; Analysis 3.3). Differences between results of these two studies were
statistically significant (P = .04), but their pooled effects were not (P = .66).

Our combined CE analysis showed that, over all measures (60 ES from 14 studies), MST reduced delinquency
(ES −0.27, CI −0.54 to <0.01; P = .05). This estimate was based on reports obtained from four to 48 months
after random assignment, and the CE analysis has more statistical power (ability to detect effects) than
pairwise meta‐analysis. Nevertheless, this effect has a wide prediction interval (PI −1.31 to 0.77),
indicating that future studies can expect MST to increase or decrease self‐reported delinquency
(Table 10).

None of our moderators were related to delinquency outcomes in the CE models (Table 10).

5.3.4 | Substance use

Five studies provided data on young people's use of substances other than marijuana or alcohol at
approximately one year after referral (Analysis 4.1). Results of two USA/developer studies were dissimilar
(with SMDs of 0.09 and −0.50), as were results of two USA/independent studies (SMDs −0.59 and 0.29).
Pooled results within and across subgroups showed that effects of MST were not significantly different
from zero (pooled SMD −0.08, CI −0.38 to 0.23; P = .62).

At 2.5 years, two non‐USA/independent studies showed nonsignificant results, favouring the control groups
(Analysis 4.2). These results approached statistical significance when pooled across the two studies (SMD
0.13, CI −0.00 to 0.27; P = .05). This suggests that MST could increase substance use.

At four years, MST had no significant impact on substance abuse in one USA/developer trial (SMD −0.03, CI
−0.47 to 0.41; P = .89) or in the Fonagy 2018 trial in the UK (SMD 0.10, CI −0.05 to 0.25; P = .21)
(Analysis 4.3).

Combined CE analysis (of 75 ES from nine studies) showed no overall evidence of effects of MST on
substance abuse outcomes (ES −0.08, CI −0.34 to 0.19; PI −1.35 to 1.20; Table 10).

Our moderators were not related to effects on substance use outcomes, or moderator effects could not be
reliably computed. With only nine studies in these analyses, there was little statistical power to detect
moderator effects.

5.3.5 | Peer relations

Eight studies reported data on peer relations at four to seven months after random assignment. Shown in
Analysis 5.1, two USA/developer studies provided results at one year, two non‐USA/independent studies
reported data at about 2.5 years, and one of the non‐USA/independent studies reported results at four
years after random assignment. Pooled estimates were not computed, because these studies used measures of
different constructs (bonding, conventional friends, social competence, etc.) at different endpoints.

Combined CE analysis (with 69 ES from 13 studies) showed that overall effects of MST on peer relations
were not significantly different from zero (ES 0.19, CI −0.14 to 0.51; PI −1.53 to 1.91; Table 10). None
of our moderators were associated with effects on peer relations.

5.3.6 | Youth behaviour and symptoms

Four USA/developer studies reported data on youth externalising behaviours at one year, but effect sizes
could be computed for only two of these studies, due to missing data in the other two. One USA/independent
study also provided data on this outcome. Pooled results provided no evidence of effects of MST (SMD
−0.09, CI −0.56 to 0.38; P = .70; Analysis 6.1).

Three non‐USA/independent studies provided data on externalising behaviours at 2.5 years. Although none
of these study‐level effects was statistically significant, their pooled effect was significantly
different from zero and favoured MST (SMD −0.13, CI −0.26 to −0.00; P = .04; Analysis 6.2).

At four years, results of one USA/developer study and one non‐USA/independent study both indicated that
effects of MST on externalising behaviours were not significantly different from zero (pooled SMD −0.04,
CI −0.18 to 0.10; P = .57; Analysis 6.3).

As above, four USA/developer studies provided data on youth reports of internalising behaviours at one
year, but effect sizes could be computed for only two of these studies. Results provided no evidence of
effects of MST on this outcome (SMD 0.06, CI −0.84 to 0.96; P = .89; Analysis 6.4).

At 2.5 years, available data on internalising behaviours came from three non‐USA/independent studies. The
Ogden 2004 study reported significant benefits for a subset of (66 of 104) MST cases in Norway. Results of
the Sundell 2006 study (in Sweden) and Fonagy 2018 (in the UK) showed nonsignificant differences between
groups. Pooled results were not significantly different from zero (SMD −0.27, CI −0.57 to 0.03; P = .07;
Analysis 6.5).

At four years, one USA/developer study and one non‐USA/independent study found no evidence of effects of
MST on internalising behaviours among youth (pooled SMD 0.02, CI −0.12 to 0.17; P = .74; Analysis 6.6).

Our combined CE model included 20 studies with 443 measures of youth behaviour and symptoms. The overall
effects of MST on these outcomes were not significantly different from zero (SMD −0.13, CI −0.28 to 0.03;
PI −1.52 to 1.26; Table 10).

Moderator analyses showed that these ES were not related to: USA/non‐USA location, developer/independent
researcher, or any of the ROB variables. Moderator effects related to the timing of outcome measurement
and data source (youth, parent, or other) could not be reliably computed.
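
The prediction intervals (PI) reported for each outcome domain are much wider than the confidence intervals because they add between‐study variability to the uncertainty of the mean. A minimal sketch of the usual approximation, mean ES ± t × √(τ² + SE²), is given below. The ES and SE come from the overall parent behaviour and symptoms row of Table 10; the between‐study standard deviation (tau) and the critical t value are illustrative assumptions that are not reported in this section, chosen so that the result lands close to the tabled PI of −0.77 to 0.45. The review's software may compute prediction intervals differently, and the function name is hypothetical.

```python
import math


def prediction_interval(mean_es: float, se: float, tau: float, t_crit: float):
    """Approximate prediction interval: mean_es +/- t_crit * sqrt(tau^2 + se^2)."""
    half_width = t_crit * math.sqrt(tau ** 2 + se ** 2)
    return mean_es - half_width, mean_es + half_width


# mean_es and se: overall row for parent behaviour and symptoms (Table 10).
# tau = 0.277 and t_crit = 2.15 are ASSUMED values for illustration only.
low, high = prediction_interval(mean_es=-0.16, se=0.06, tau=0.277, t_crit=2.15)

# A PI that spans zero means a new study could plausibly find an effect
# in either direction.
spans_zero = low < 0 < high
print(round(low, 2), round(high, 2), spans_zero)
```

With tau set to zero the interval collapses to roughly the confidence interval, which shows that the width of the PI is driven mainly by heterogeneity between studies rather than by imprecision in the mean.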

5.3.7 | Parent behaviour and symptoms

At one year, three studies provided self‐report data on parents' mental health problems, using the GSI‐BSI
or GHQ. Results for one USA/developer study could not be calculated, due to missing data; another included
so few cases that results were imprecise. Using the GHQ, the Fonagy 2018 trial showed that MST reduced
parental mental health problems at one year (SMD −0.20, CI −0.39 to −0.02; P = .03; Analysis 7.1).

Two non‐USA trials provided data on parents' mental health problems at 2.5 years. The Sundell 2006 trial
found no impact of MST, but the Fonagy 2018 study continued to show that MST reduced parental mental
health problems (Analysis 7.2). Pooled results were not significantly different from zero (SMD −0.11, CI
−0.32 to 0.10; P = .31). The Fonagy 2018 trial was the only study that reported parent outcomes at 4–5
years.

Measures of parental support at one year after random assignment were provided by two studies. One
USA/developer study provided insufficient data to compute an ES. The Fonagy 2018 trial showed positive,
significant impacts of MST on parental support at one year (SMD 0.19, CI 0.01 to 0.37; P = .04;
Analysis 7.3).

Studies provided data on a wide range of parenting behaviours, including parental control, supervision,
monitoring, styles (e.g., authoritarian, permissive), skills, discipline, communication, consistency, and
involvement. Pairwise meta‐analysis was not performed because studies measured different constructs.

Our combined CE model included all measures of parent functioning (mental health, support, and parenting
behaviours) at all points in time. With 134 effect sizes from 16 studies in the model, overall results
showed that MST reduced negative outcomes for parents (ES −0.16, CI −0.28 to −0.04; P = .02) with a wide
prediction interval (−0.77 to 0.45; Table 10).

Moderator analysis showed that high risks of performance bias were related to worse outcomes for parents
(ES 0.22, CI −0.01 to 0.45; P = .06). None of our other moderators were related to effects on parent
outcomes.

5.3.8 | Family functioning

Several studies used the Family Adaptability and Cohesion Evaluation Scales (FACES, version II, III, or IV)
to assess family functioning. Three studies provided data on family adaptability at one year: One
USA/developer study provided insufficient data to compute an ES; another had very few cases and imprecise
results. One USA/independent study (Weiss 2013) showed that effects of MST on family adaptability were not
significantly different from zero. Overall results were not significantly different from zero (SMD −0.04,
CI −0.35 to 0.27; P = .80; Analysis 8.1).

Four studies provided data from parent reports on family cohesion. As before, USA/developer studies had
missing data or imprecise estimates. Again, one USA/independent study (Weiss 2013) found no significant
differences between groups on this measure. The Fonagy 2018 trial also found no significant effects on
family cohesion. Overall results were not significantly different from zero (SMD 0.11, CI −0.04 to 0.27;
P = .14; Analysis 8.2).

Only one study (the Fonagy 2018 trial) provided data on family adaptability and cohesion beyond one year.

The combined CE model (96 ES from 15 studies) showed small, positive effects of MST on family functioning
which approached statistical significance (ES 0.10, CI −0.02 to 0.21; P = .10; Table 10). The prediction
interval ranged from −1.08 to 1.27, indicating that large positive and large negative effects on family
functioning are possible.

Greater differential attrition was related to more positive family functioning outcomes (ES 1.36, CI −0.12
to 2.85; P = .06; Table 10). When differential attrition was included in the CE model, main effects were
not significant. None of our other moderators were related to effects on family outcomes.

5.3.9 | School outcomes

Only seven studies provided data on school outcomes and these measures were too diverse for use in
pairwise meta‐analysis.

One USA study (Henggeler 1999a) provided data on within‐group changes in school attendance, but did not
provide data on between‐group comparisons (Brown 1999, pp. 88–89). Another USA study (Henggeler 1999b)
noted that between‐group differences in school attendance were not significant at one year, but authors
did not provide data on this outcome (Henggeler 2003). Weiss 2013 found no significant differences in
school attendance at one year (SMD 0.09, CI −0.21 to 0.40; P = .55; Analysis 9.1).

Only one study provided data on school grades at one year: Weiss 2013 reported between‐group results that
were not significantly different from zero (SMD −0.16, CI −0.46 to 0.15; P = .31; Analysis 9.2).

Combined CE analysis included eight studies with 25 effect sizes related to school outcomes. Overall,
effects of MST were not significantly different from zero (ES 0.31, CI −0.22 to 0.84; PI −1.92 to 2.55;
Table 10). Moderator analyses were under‐powered.

6 | DISCUSSION

Our review challenges often‐repeated claims about the strength of the evidence base for MST. Many MST
trials have high risks of bias, and these issues were not adequately addressed in numerous research
reports and reviews. Results of MST are not consistent within or across studies. Positive effects are
limited to some outcomes and are not reliably replicated outside of the USA. Independent replications show
that MST has fewer benefits and may have some harmful effects when compared with more robust usual
services outside of the USA.

Our pairwise meta‐analyses provide easily interpreted summaries of effects of MST on specific outcomes
within and across subgroups at particular endpoints. A limitation of this analysis is that few studies
measured the same outcomes at the same endpoints. In contrast, our correlated effects (CE) models provide
summaries of results across all measures in an outcome domain, including all

measured endpoints. We think both approaches are useful, because they provide different lenses on the
data. We would not expect these two approaches to produce identical results, but the fact that they often
converge and point to the same conclusions is a strength of this analysis.

6.1 | Summary of main results

6.1.1 | Intervention effects

At one year after referral, available evidence shows that MST reduced out‐of‐home placements in studies
conducted in the USA, but not in other countries; effects on other primary outcomes were not significant
at one year (Table 13). Analyses that include all available (correlated) effects show that MST reduced
placements and arrests/convictions, but only in the USA. Overall, MST had positive effects on
self‐reported delinquency, parenting behaviour, and family functioning, but not on youth behaviour and
symptoms, substance use, peer relations, or school outcomes. Moderator analyses showed that placement
outcomes may have been affected by departures from intention‐to‐treat analysis, and parent and family
outcomes may have been affected by performance bias and differential attrition.

A closer look shows that results were inconsistent within MST studies. That is, most trials collected data
on multiple outcomes and obtained a mixture of positive, negative, and null results. Thus, available
evidence does not support the hypothesis that MST is consistently more effective than usual services or
other interventions for youth with social, emotional, or behavioural problems. However, it is not
appropriate to conclude that MST has no effects. In sum, evidence about the effectiveness of MST is mixed.

6.1.2 | Heterogeneity and statistical power

Given their different populations, problems, settings, and methods, we expected MST trials to produce
heterogeneous results. We used random effects models to take this heterogeneity into account whenever
possible. There is statistical evidence of heterogeneity in results for some outcomes, indicating that
different studies point to different conclusions.

The statistical power of our pairwise meta‐analysis (ability to detect significant differences between MST
and other services) is limited; hence, confidence intervals for some pooled effects are fairly wide. We
used CE models to improve statistical power and produce robust variance estimates. However, with only
eight to 20 studies in the analysis, some of the CE models are also under‐powered.

Given low statistical power, it is possible that MST has effects that cannot be detected in this set of
studies. However, the wide prediction intervals in all outcome domains suggest that effects of MST are
uncertain and we cannot rule out the possibility that MST is not more effective than other services.

6.1.3 | Possible sources of heterogeneity

Included studies varied in terms of their geopolitical contexts, sample characteristics, comparison
conditions, and methodologies. As is often the case in meta‐analysis, some of these differences are
confounded. Study location, independence, comparison condition, and risks of bias are confounded, making
it impossible to separate the potential moderating effects of these factors. Studies conducted in the USA
by MST developers had weaker comparison conditions and higher risks of bias than studies conducted by
independent investigators in other countries. Studies conducted in the USA by MST developers tended to
suggest that MST had more favourable benefits than studies conducted outside of the USA by independent
investigators. Below we suggest some possible explanations for these variations.

It may be that developer involvement improves implementation, which in turn results in better outcomes
(the high‐fidelity hypothesis, Petrosino 2005). Or developers' conflicts of interest might increase
likelihood of selective reporting or other biases, which lead to overestimates of treatment effects (the
cynical view, Petrosino 2005). But we think that more plausible explanations lie in understanding
differences in the social service systems in which these studies were embedded.

Studies conducted in the USA provided more stark contrasts between MST and TAU, as youth in most of the
USA TAU control groups received relatively little attention and few services. TAU groups in Canada, the
Netherlands, Norway, Sweden, and the UK had access to universal healthcare and social services that were
not publicly available in the USA. Larger MST effects in the USA could be explained by wider between‐group
differences in attention and treatment. Under these conditions, it is unclear whether effects of MST are
due to sheer differences in amounts of treatment or to MST interventions per se.

Different juvenile justice policies and practices might also explain divergent findings in the USA and
non‐USA contexts. In the USA, young people can be arrested, convicted, and incarcerated for offences that
are not treated as criminal acts when committed by youth in Norway and other countries. The frequency and
duration of detention, incarceration, and arrest are higher for juveniles in the USA than in other
high‐income countries (Hazel 2008). When arrest and placement rates are relatively high, there is greater
opportunity for treatment to affect these outcomes. Conversely, there may be “floor effects” in settings
where these outcomes are relatively rare.

We found little evidence that study risks of bias were associated with effect sizes, with a few
exceptions: Studies that deviated from ITT analysis reported significantly greater effects on out‐of‐home
placement than studies with low risks of bias on this indicator. Attrition bias was associated with larger
effects on family functioning, and performance bias was associated with poorer outcomes for parents. The
paucity of associations between ROB variables and ES could be due to low statistical power in the CE
models.

We found no evidence that different data sources (e.g., administrative data, parent reports, youth
reports) produced systematically
TABLE 13 Summary of findings

| Outcomes (1 year post random assignment) | Anticipated absolute effects: risk without MST | Anticipated absolute effects: risk difference with MST [95% CI] | Relative effects [95% CI] | Number of participants (studies) | Quality of evidence (GRADE)a | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Out‐of‐home placements (all types) | 287 per 1000 | 89 fewer per 1000 [3 to 106 fewer] | RR 0.69 [0.63 to 0.99], OR 0.67 [0.45 to 0.99] | 2489 (11) | Moderate | Significant effects in USA/developer trials; no evidence of effects in non‐USA/independent trials (Analysis 1.1). Higher risks of bias in USA/developer trials compared with non‐USA/independent trials. |
| Out‐of‐home placements: USA, developer‐led trials | 401 per 1000 | 120 fewer per 1000 [36 to 188 fewer] | RR 0.70 [0.53 to 0.91], OR 0.52 [0.32 to 0.84] | 1267 (7) | Moderate | |
| Out‐of‐home placements: non‐USA, independent trials | 170 per 1000 | 15 more per 1000 [22 fewer to 63 more] | RR 1.09 [0.87 to 1.37], OR 1.14 [0.84 to 1.55] | 1222 (4) | High | |
| Criminal offences (arrests/convictions) | 311 per 1000 | 37 fewer per 1000 [78 fewer to 9 more] | RR 0.88 [0.75 to 1.03], OR 0.84 [0.67 to 1.06] | 1445 (6) | Moderate | Effect is not significantly different from zero (Analysis 2.1). |
| Self‐reported delinquency | n/a | SMD 0.05 lower [0.17 lower to 0.07 higher] | SMD −0.05 [−0.17 to 0.07] | 1048 (5) | Moderate | Effect is not significantly different from zero (Analysis 3.1). |
| Youth externalising behaviours | n/a | SMD 0.09 lower [0.56 lower to 0.38 higher] | SMD −0.09 [−0.56 to 0.38] | 280 (3) | Low | Effect is not significantly different from zero (Analysis 6.1). Few estimates. |
| Youth internalising behaviours | n/a | SMD 0.06 higher [0.84 lower to 0.96 higher] | SMD 0.06 [−0.84 to 0.96] | 127 (2) | Low | Effect is not significantly different from zero (Analysis 6.4). Few estimates. |
| Family adaptability | n/a | SMD 0.04 lower [0.35 lower to 0.27 higher] | SMD −0.04 [−0.35 to 0.27] | 164 (2) | Low | Effect is not significantly different from zero (Analysis 8.1). Few estimates. |
| Family cohesion | n/a | SMD 0.11 higher [0.04 lower to 0.27 higher] | SMD 0.11 [−0.04 to 0.27] | 676 (3) | Low | Effect is not significantly different from zero (Analysis 8.2). Few estimates. |

Abbreviations: CI, confidence interval; OR, odds ratio; RR, risk ratio; SMD, standardised mean difference.

a Ratings based on: number of studies, study design, ROB, inconsistency, indirectness, and imprecision (see gradepro.org/). High: Further research is very unlikely to change confidence in the estimate of effect. Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate. Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low: Any estimate of effect is very uncertain.
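
As a worked check on how the absolute and relative columns of Table 13 relate, the sketch below assumes the common summary‐of‐findings convention that the risk difference per 1000 equals the assumed risk without MST multiplied by (RR − 1). The function name and the dictionary of rows are illustrative and are not part of the review; only the point estimates are taken from the table.

```python
def absolute_effect_per_1000(baseline_per_1000: float, rr: float) -> float:
    """Risk difference per 1000, given an assumed control-group risk
    (per 1000) and a risk ratio. Negative values mean fewer events."""
    return baseline_per_1000 * (rr - 1.0)


# Point estimates copied from Table 13: (risk without MST per 1000, RR).
rows = {
    "Out-of-home placements, all trials": (287, 0.69),
    "Placements, USA developer-led trials": (401, 0.70),
    "Placements, non-USA independent trials": (170, 1.09),
    "Criminal offences": (311, 0.88),
}

for label, (baseline, rr) in rows.items():
    diff = absolute_effect_per_1000(baseline, rr)
    direction = "fewer" if diff < 0 else "more"
    print(f"{label}: {abs(diff):.0f} {direction} per 1000")
```

Under that assumption, the computed values (89 fewer, 120 fewer, 15 more, and 37 fewer per 1000) agree with the point estimates reported in the table.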

larger or smaller effect sizes. Again, this could be due to insufficient statistical power to detect differences.

Our analyses did not have enough statistical power to detect effects of trends over time (changes in effect sizes over multiple endpoints) in CE analyses. Thus, we could not test the hypothesis that effect sizes diminished over time.

It has been suggested that studies of the efficacy of MST produced larger effects than studies of its effectiveness. But, in practice, the distinction between these two types of trials is not clear (most MST trials can be thought of as “hybrids”; Schoenwald 2003, p. 224). Some observers classified early RCTs conducted by MST program developers as efficacy trials, but these studies also had more serious methodological problems than later studies. We did not classify MST trials in terms of efficacy or effectiveness.

It has been suggested that between‐study differences in effect sizes may be accounted for by variations in fidelity to MST (Henggeler 2004). In most studies, fidelity to MST is measured with the TAM. However, as mentioned above, the TAM taps constructs–such as engagement, treatment participation, therapeutic alliance, and client satisfaction–that are not unique to MST. (Sample TAM items are: “the sessions were lively and energetic”, “my family and the therapist worked together effectively”, and “the therapist recommended that family members do specific things to solve our problems”.) The TAM has not been shown to discriminate between MST and other interventions. Although the TAM has demonstrated predictive validity in some studies, it is not clear whether that is due to fidelity to MST or to engagement, treatment participation, alliance, or other constructs. At least two studies found no relationship between TAM scores and primary outcomes (Butler 2011, p. 1231; Leschied 2002). Given concerns about the face validity of the TAM, we did not assess fidelity to MST as a potential moderator.

Finally, although we cannot fully untangle effects of confounded moderators, our pairwise meta‐analysis and CE models provided multiple opportunities to explore associations between effect sizes and three sets of moderators: location (USA or other), developer involvement or investigator independence, and several risks of bias. Of these, location emerged as the most consistent moderator of effects. Of course, the USA/non‐USA location contrast represents several other confounds, including inter‐country differences in policies and practices related to the treatment, placement, arrest, and conviction of youth.

6.2 | Overall completeness and applicability of evidence

Unfortunately, complete reporting of the results of MST trials has not been the norm. Most trials (83%) were missing data on some subgroups, outcomes, or endpoints, and there was evidence of selective reporting in more than half (52%) of these trials. Thus, we are not confident that the effect sizes available for meta‐analysis (i.e., outcomes that were fully reported by study authors) are an unbiased sample of all the evidence obtained in MST trials.

MST trials were conducted in WEIRD countries: those that are Western, Educated, Industrialised, Rich and Democratic. Results may not be applicable to other countries. Within their WEIRD settings, there is little evidence about whether or how participants in MST trials differed from larger populations of youth with serious social, emotional, and behavioural problems. MST trials did not claim to have representative samples.

In all outcome domains, prediction intervals for main effects include a wide range of positive and negative values, indicating that predictions based on available evidence are uncertain. Future studies can expect MST to demonstrate a wide range of positive and negative results on all outcomes.

6.3 | Quality of the evidence

The quality of evidence from MST trials is remarkably uneven across studies. One trial (Fonagy 2018) has low risks of bias on 12 of our 13 indicators. Two studies have high risks of bias on all of these indicators (Figure 4). Half of the trials have high risks of bias related to baseline equivalence, selective reporting, and conflicts of interest (Figure 3).

Some authors did not adequately consider alternative explanations for differences between MST and control groups on outcome measures, attributing effects to MST when other causes were plausible. For example, between‐group differences on important background characteristics, such as race and gender, could have accounted for different outcomes observed in some treatment and control groups. Randomisation does not always create equivalent groups, and differential attrition can alter group composition. When the MST group included proportionately fewer Blacks and fewer males than the control group (as is the case in the Borduin 1995 study), these differences could account for observed differences in outcomes. Race and gender are related to criminal justice outcomes in the USA, where Blacks and males are arrested and incarcerated at higher rates than Whites and females (Puzzanchera 2020, Sawyer 2019).

As mentioned above, attention is another factor that is often confounded with MST treatment. This was particularly salient in USA studies, wherein MST cases received much more attention from clinicians than cases in TAU control groups, and MST clinicians received more training and supervision than clinicians who provided services to TAU cases. In these studies, it is not clear whether different outcomes should be attributed to the different amounts of attention provided to clients, the training and supervision of workers, or MST itself.

Selective reporting leads to over‐representation of positive and statistically significant results and under‐representation of negative and null results in study reports. Again, we found evidence of selective reporting in more than half of the MST studies. This means that available evidence may not be a fair representation of all

evidence obtained from MST trials. Of course, selective reporting can introduce bias in meta‐analysis and we cannot be sure that our results are free of this bias.

MST trials made use of a wide array of outcome measures. Questions about the reliability and validity of some of these measures were not fully addressed in MST trials. Some trials relied on results of reliability or validity assessments that had been conducted in other studies, which may or may not have had similar samples. Some MST investigators assessed the internal consistency of the measurement instruments they used in their own sample; this can enhance confidence in results, although it does not fully address questions about validity. Some MST trials provided multiple measures of key constructs (which can also enhance confidence in results) and some used the best available “standardised” measures of outcomes such as internalising and externalising behaviours (e.g., CBCL, ABC) and family functioning (FACES). Data produced by these instruments can provide compelling evidence of relevant outcomes. It is not easy to assess the validity of outcomes derived from administrative data, as the quality of these reporting systems varies, but these data (e.g., on placements, arrests) are assumed to have face validity.

Blinding of assessors was rare and our ROB ratings likely underestimate this problem (if blinding was not mentioned by investigators, we rated the ROB as unclear, not high). Outcome assessments would have been strengthened by blinding assessors to participants' group assignments.

In sum, the quality of the evidence for MST is mixed. GRADE ratings were low to moderate, with one exception: there was high‐quality evidence (but no evidence of effects) on placements in non‐USA studies at one year after random assignment. Even though random assignment was used in all of the included studies, these procedures were not always transparent or foolproof, and substantial differences between groups were apparent in more than half of the trials. Deviations from random assignment were not well documented in some studies. Lack of between‐group comparability, differential attrition, and differences in attention could account for between‐group differences in outcomes in some studies, yet these factors were rarely mentioned by investigators.

At the high end of the study‐quality spectrum, there is one large MST trial (Fonagy 2018) that can serve as a model of best practice in the conduct of field trials on complex, community‐ and home‐based psychosocial interventions: This trial was prospectively registered, it provided full reports on all planned outcomes and all endpoints over five years, and has a complete data sharing plan.

6.4 | Potential biases in the review process

Our attempts to obtain missing data from study investigators were not all successful; hence, we were not able to overcome systemic problems with selective reporting (of successful cases, positive outcomes, and favourable endpoints) and nonpublication of reports with null and negative results. We were aware of and unable to obtain a number of relevant research reports and data points.

We struggled with inconsistencies and lack of transparency in the reporting of some trials. To counteract our inclination to downgrade ROB ratings because of poor quality reporting, we refined our ROB standards to include observable metrics and indicators that could be applied fairly across all studies, and to minimise the number of inferences needed to make judgments about ROB. For example, when studies did not mention blinding of assessors, we did not make inferences about whether this had or had not been done, and rated the ROB as “unclear”. Inevitably, qualities of reporting likely affected our understandings of the conduct of included studies.

6.5 | Agreements and disagreements with other studies or reviews

Compared with our previous review, we found more evidence regarding effects of MST and are able to show more clearly how these effects vary across studies, outcomes, and contexts. Recent advancements in research synthesis methods helped us conduct more thorough assessments of study risks of bias and treatment effects. With a more detailed ROB tool, we found more compelling evidence of risks of bias. This leads to better understanding of the evidence base. With a larger number of studies in the analysis and better statistical tools (CE models), there is more statistical power to detect effects and assess potential moderators. Yet, as in the 2005 version of this review, we found that effects of MST are inconsistent within and across studies.

Our conclusions differ from those of many other previous reviews, which suggested that the quality of the evidence for MST was superb and the effectiveness of MST was well‐established. Below, we examine plausible explanations for discrepancies between our review and others.

Different review methods often produce different results. Most prior reviews of research on effects of MST relied on narrative summaries of convenience samples of published studies (Littell 2008). These nonsystematic reviews used methods that were not transparent and they omitted steps needed to reduce bias and error in the review process, by limiting searches to electronic databases, excluding unpublished reports, failing to establish reliability of decisions and data extraction, and failing to consider risks of bias. Instead of using meta‐analysis, many reviews used “vote counting” or focused selectively on results that favoured MST. The plain fact that most MST trials have mixed results was missed in many narrative reviews that mentioned only the statistically significant, positive effects from primary studies. In previous work, we have shown that this problem (confirmation bias) is more readily apparent in reviews than in reports on primary studies (Littell 2008).
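
To illustrate why “vote counting” and selective citation of significant results can mislead, the sketch below contrasts a count of statistically significant favourable results with an inverse‐variance weighted average. It is a deliberately simplified fixed‐effect calculation, not the correlated effects (CE) models used in this review, and the four effect sizes and variances are hypothetical values chosen for illustration (negative values favour the treatment).

```python
import math


def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def count_significant_favourable(effects, variances, z_crit=1.96):
    """Vote count: studies whose effect is significantly below zero."""
    return sum(1 for e, v in zip(effects, variances)
               if e / math.sqrt(v) < -z_crit)


# Hypothetical standardised mean differences and their variances:
# one small study with a large effect, three larger studies near zero.
effects = [-0.60, -0.05, 0.10, 0.02]
variances = [0.09, 0.01, 0.02, 0.01]

estimate, se = pooled_fixed_effect(effects, variances)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
votes = count_significant_favourable(effects, variances)

print(f"Significant favourable studies: {votes} of {len(effects)}")
print(f"Pooled estimate: {estimate:.3f} [{low:.3f} to {high:.3f}]")
```

In this hypothetical example, a narrative summary that mentions only the single significant study would suggest a sizeable benefit, whereas the weighted average is close to zero because the larger, more precise studies carry more weight.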

For example, from 1998 to 2015, Kazdin produced a series of narrative reviews of research on treatments for children and youth, including MST. He concluded that, “MST is superior in reducing delinquency, drug use, and emotional and behavioural problems and increasing school attendance and family functioning, in comparison to other procedures, including 'usual services' provided to such youths (e.g., probation, court‐ordered activities that are monitored such as school attendance), individual counselling, and community‐based eclectic treatment. Apart from the sheer number of controlled studies, the strength of this literature stems from the breadth of application across age groups and clinical problems… Follow‐up examinations have repeatedly supported the impact of [MST] treatment” (Kazdin 2015, p. 150). We find that the empirical evidence does not support these conclusions. Kazdin's 2015 review appeared to be based solely on published studies conducted in the USA by MST developers (no independent trials were cited). It included no critical appraisal of risks of bias in primary studies, so there was no mention of problems such as lack of baseline equivalence, differential attrition, selective reporting of outcomes, and other factors that could affect interpretation of results. Nor was there any systematic coding, analysis, or synthesis of results across studies.

The systematic exclusion of unpublished studies tends to introduce confirmation bias in reviews, because studies with null or negative findings are less likely to be published than those with positive results (the “file drawer” problem; cf. Rothstein 2005). In MST reviews, the inclusion or exclusion of unpublished data may account for some of the differences in reviewers' conclusions. For example, Curtis 2004 found positive results in a “meta‐analysis” limited to published studies conducted in the USA by MST developers, while our review of the same studies plus unpublished studies produced largely null results (Littell 2005a). However, even when we limit analyses to published USA developer studies, we do not find the pattern of “consistent positive effects” described in many reviews. As demonstrated in our forest plots, USA developer studies found null and negative effects on some outcomes.

Some published summaries of research on effects of MST explicitly limited their focus to “statistically significant findings favouring MST” (Rowland 2019, p. 196). A primary goal of some of this work is “to present evidence that MST works” (Swenson 2005, p. 88). In service of this goal, important scientific standards are sometimes ignored.

Some proponents of MST ignored the central contrasts between treatment and comparison conditions in randomised controlled trials (Littell 2006), even though these contrasts were the main results of those trials, and are needed to support the causal inferences that trials aimed to make (Shadish 2002). For example, a review by MST Services 2020a focuses on within‐group changes as evidence of “MST effects” even though the same changes (or more favourable changes) were observed in parallel control or comparison groups. This reflects a common misunderstanding or misrepresentation of empirical evidence in this literature: When youth and families come to the attention of service providers and qualify for an intensive intervention, such as MST, they are not likely to be functioning at their best. Improvement over time is expected, even in the absence of treatment, due to maturation and the tendency to revert to more typical levels of functioning. The central question is whether changes over time are due to MST or to other factors. RCTs use control groups to separate effects of maturation, motivation, and other influences from treatment effects, yet these basic features of research design were ignored in some MST studies and reviews. For example, regarding the Henggeler 1992a study, Brown 1999 showed that the MST group had lower school attendance at baseline and made greater gains over the next six months, catching up with control cases. It is not clear whether there were significant between‐group differences at six months, because no test for between‐group differences was reported. Authors interpreted within‐group changes as evidence “MST was more effective than [TAU] at promoting school involvement” (Brown 1999, p. 88). Of course, other explanations are possible (e.g., statistical regression could account for these observations). In any case, direct post‐treatment comparisons between MST and control cases are needed to draw conclusions about the relative effectiveness of MST, but authors did not report these comparisons.

Results of studies that found few benefits, no benefits, or even harmful effects of MST were misrepresented in some reviews. For example, MST Services 2020a characterised “treatment effects” from the Fonagy 2018 trial as follows: “At 6 months: extensive improvements in youth emotional and behavioural functioning as well as parental mental health and family functioning. At 12 months: continued improvement in youth emotional functioning, caregiver mental health, and family satisfaction. At 18 months: some continued youth and caregiver improvements but no decreases in arrests or placements”. Again, these statements appear to be based on within‐group comparisons for MST cases only, ignoring the central contrasts between the MST and TAU control groups. Recall that Fonagy 2018 found little evidence of the superiority of MST and some evidence of negative effects of MST compared with TAU.

Calculation errors led some reviewers to derive implausibly large, positive effect sizes from studies that reported mostly null findings and some negative results. Misuse of meta‐analytic procedures led some reviewers to claim evidence for large, positive effects based on double‐counting of some studies, inappropriate use of corrections for small sample bias, and failure to use appropriate weights (e.g., inverse variance weights) in meta‐analysis (for discussion of these problems in the Curtis 2004 review, see Littell 2008).

In one of the best reviews conducted to date, van der Stouwe 2014 included both published and unpublished, randomised and nonrandomised studies of MST. Given our concerns about the integrity of many MST trials, we think their inclusion of non‐RCTs is defensible; however, van der Stouwe et al. did not report detailed assessments of risks of bias. Their review did not include three studies that were included in our review (Fonagy 2018, Glisson 2010,

Henggeler 2006, total n = 1434). They used hierarchical models, which do not fully account for correlated effects within studies (Pustejovsky & Tipton 2020). Van der Stouwe and colleagues found that positive effects of MST on delinquency, substance use, and placement dissipated once publication bias was taken into account; positive effects on psychopathology and family outcomes withstood tests for publication bias. As in our review, van der Stouwe and colleagues found that MST had larger effects in the USA than in other countries. They found that studies led by developers reported larger effects than independent trials, but this difference did not hold up in multivariate analyses that included the USA/non‐USA contrast (in our dataset, these two moderators are confounded and multivariate analysis is unreliable). Differences between the van der Stouwe review and ours seem to be driven by inclusion of different studies and use of different analytic methods.

Few previous reviews conducted careful appraisals of study methods or potential risks of bias. We are not aware of any previous reviews that examined baseline equivalence, differential attrition, performance bias, or selective reporting in MST trials. It appears that many reviewers did not consider multiple reports from trials; thus, they may have been unaware of problems with selective reporting, attrition, and “post hoc sample refinement” (Gorman 2003). Indeed, many MST reviews reported sample sizes and outcome sets that are demonstrably incomplete (e.g., Aos 2001, Brosnan 2000, Curtis 2004, Farrington 2003, van der Stouwe 2014, Woolfenden 2004). Some authors noted that different MST reviews reached different conclusions, without considering potential reasons for these discrepancies.

The limitations of narrative reviews of multiple studies have been considered at length for several decades, as has the importance of transparency in systematic reviews and meta‐analysis. The purpose of a systematic review (as that term is used by Cochrane and the Campbell Collaboration) is to minimise the biases that are common in narrative reviews, while conducting research synthesis in a manner that is clear and open to critical assessment. Reviewers who take a close look at evaluations of complex interventions often uncover the kinds of methodological problems we describe in MST trials (e.g., Gorman 2003, Gorman 2017). As Gandhi and her colleagues wrote, regarding the evidence for school‐based drug abuse prevention programmes, “the devil is in the details” (Gandhi 2006).

7 | AUTHORS' CONCLUSIONS

7.1 | Implications for practice

Decisions can and should be informed by rigorous systematic reviews of relevant empirical evidence. Narrative, haphazard reviews are not up to the task of assessing a complex body of evidence, and nonsystematic reviews can lead to the wrong conclusions.

Our systematic review and meta‐analysis of the best available evidence suggests that benefits of MST are not well established, not consistent within or across studies, and have not been reliably replicated outside of the USA. Youth are removed from their homes and arrested at higher rates in the USA, yet they receive fewer services than youth with similar problems in other high‐income countries. TAU is more robust in other high‐income countries and cases in the TAU groups in Sweden and Norway made greater improvements over time compared with TAU cases in the USA MST trials (Sundell 2014). One obvious implication for practice is the need to improve usual services for youth with social, emotional, and behavioural problems in the USA.

Because USA studies compare MST to the scant services routinely available to these youth, positive results of MST in the USA might be attributed to the provision of additional attention and care, rather than specific effects of MST. Multisite trials in Canada, Sweden, and the UK show that MST is not more effective than TAU for adolescents with moderate‐to‐severe antisocial behaviour in those countries (Leschied 2002, Sundell 2006, Fonagy 2018).

Evidence of the effects of MST is only one element in the calculus that policy makers and practitioners must make about whether to adopt or continue to use MST. When there is no compelling evidence of the superiority or inferiority of different approaches, the choice between them must be based on other considerations. For example, consider the decisions made in Canada and Sweden after large, multisite trials found virtually no differences in outcomes between MST and TAU in each country: In Ontario, where the Canadian trial (Leschied 2002) was conducted, the decision was made to discontinue use of MST because it was more costly and no more effective than usual services. In Sweden, use of the MST program was retained, despite findings of no evidence of effects (Sundell 2006), on the grounds that MST provides useful structures for service delivery and supervision, and this intervention is compatible with client and staff preferences. The Swedish decision reflects interest in investing in service delivery structures, even if they do not result in direct improvements in outcomes. Both decisions are based on evidence. It is important for decision makers to weigh evidence on a variety of topics, including policy goals, values and preferences, costs and resources, as well as evidence of effectiveness.

Decisions based on the assumption that the effectiveness of MST is well established are ill‐founded. In fact, our data show that predictions based on available evidence are uncertain and future studies are likely to find both positive and negative effects of MST.

In April 2018, the UK NICE announced plans to update its clinical guidance for treatment of antisocial behaviour and conduct disorders in children and young people, focusing “on the role of multisystemic therapy (MST), as part of a multimodal intervention for the treatment of antisocial behaviour and conduct disorder in children and young people” (NICE 2018, p. 3). The NICE report observed that new evidence from the Fonagy 2018 trial “shows that MST could be detrimental for some populations (e.g., for younger people with an early onset of conduct disorder)” and “MST does not provide any long‐term benefits in terms of clinical and cost‐effectiveness compared to usual

care in children and young people with moderate‐to‐severe antisocial behaviour” (NICE 2018, p. 8).

Concerns have been expressed about the cultural sensitivity of MST and whether this approach was “oversold” to clinicians (Rosenblatt 2001). Other concerns involve perceived limitations of the short time frame of MST intervention and its single‐worker model (Fonagy 2017, p. 7). Decision makers must consider whether an intervention fits the local culture and preferences of local authorities, clinicians, and consumers.

Another issue is that MST services are costly. Costs of direct services, supervision, quality assurance, administration, and court activity are about $6416 per case in the USA (Barnoski 2009) and £7312 per case in the UK (NICE 2013). The cost of MST‐PSB per case is £10,000 to £12,000 in the UK (Fonagy 2017). Early projections in the USA and the UK indicated that MST could reduce costs related to public services, education, and crime (Aos 2006, Barnoski 2009, NICE 2013). However, if MST does not reduce incarceration, hospitalisation, recidivism, and problem behaviours, it will not be cost‐effective compared with less expensive alternatives. In Delaware, the average costs of MST services were $11,513 USD per case, compared with average costs of $25,850 USD for secure placements; however, almost one‐third of MST cases required secure placements following MST services; when those costs are included, the average cost was $17,388 per MST case (Miller 1998). The Fonagy 2018 trial showed that MST increased overall service costs by an average of £1623 per case over a period of 18 months (95% CI −£4,439 to £7,684), so there is little chance that MST is a cost‐effective option in the UK (NICE 2018, p. 6).

MST does have several distinct advantages over other services for troubled youth and families. It is a comprehensive intervention, based on current knowledge and theory about the problems and prospects of youth and families. MST has been documented and studied more than many services for youth and families. It has well‐developed protocols for training and supervision. There is no evidence that any known interventions are consistently more effective than MST across problems, populations, and settings. However, there are still gaps in knowledge about the widespread implementation of MST, its long‐term effects, and mechanisms of change.

Finally, it is important to recognise that there may be real limits to the kinds of outcomes that can be achieved with short‐term, individual‐ and family‐focused interventions for serious, persistent, and systemic problems, no matter how well‐designed and well‐intentioned these interventions may be. More robust, longer‐lasting interventions and/or more consistent economic, educational, medical, and therapeutic supports for youth and families may be needed to achieve lasting improvements in youth and family functioning.

7.2 | Implications for research

demonstrate successful implementation of large RCTs involving a complex, home‐ and community‐based intervention, and show that these trials can produce valuable information. At the same time, our review points to improvements that can and should be made in the conduct and reporting of RCTs. As noted above, most (96%) of the trials in this review have high risks of bias on one or more indicators.

Future RCTs should use more advanced methods of sequence generation and allocation concealment, which create centralised and permanent electronic records of case assignments to treatment and comparison groups. Computer‐generated assignments managed by researchers at a remote location are more credible and foolproof than on‐site coin tosses or opening of envelopes containing assignments.

To minimise bias in data collection, blinded assessments should be used whenever possible. Outcome data should be collected by research staff who are unaware of participants' group assignments. This is preferable to data collection by program staff or interviewers who are aware of group assignments.

Assessment of baseline equivalence should focus on the magnitude of differences between groups (e.g., as assessed with Cohen's d), and not on tests of statistical significance. This is particularly important in small studies, where clinically meaningful differences may not be statistically significant.

RCTs should be designed to support intent‐to‐treat analysis on at least some outcomes. In many countries, archival or administrative data can be used to support full ITT analysis, including data on participants who do not complete treatment or do not participate in follow‐up assessments.

Researchers should carefully document the flow of cases through a trial, documenting reasons for exclusion or attrition whenever possible. The CONSORT statement (www.consort‐statement.org) provides a useful template and guidance for this purpose. Analyses of potential for biases due to attrition should be conducted to determine whether missing data affects the comparability of groups.

There is ample room for improvement in the transparency of reporting on research methods and results. The omission of basic facts about studies (dates when participants were enrolled; when, where, how, and by whom data were collected; and sample, community and service characteristics) hampers interpretation and usefulness of study reports. Guidelines for reporting on clinical trials and other types of studies are available (see www.equator‐network.org). Full description of interventions (both treatment and control conditions) is important for interpretation and replication; and guidelines for describing treatments are available (e.g., see TIDieR, Hoffmann 2014). When article length limitations are an issue, researchers can provide full study details in online appendices.

Researchers must understand that under‐reporting and selective reporting of results are forms of scientific misconduct (Chalmers 1990, ori.hhs.gov/selective‐reporting‐results). These practices distort the evidence base, deprive decision makers and consumers of potentially
7.2.1 | Primary studies useful information, and waste valuable research resources (time, money,
and information). When all results cannot be included in journal articles,
The use of randomised controlled trials to test intervention effects is researchers should report additional data as supplemental appendices
one of the strengths of the MST research base. MST studies on the journal website or on other websites. Full reporting includes data
needed to calculate effect sizes (e.g., proportions, means, SDs, and ns for each treatment and comparison group) for every outcome measure, at every endpoint. Further, researchers should strive for accuracy when summarising results, avoiding spin (over‐interpretation or distortion of results; Chiu 2017, Boutron 2018).

Prospective registration of study protocols is an important step in improving the transparency and reporting of clinical trials. Important advances in this area have been made in relation to medical research, but the same principles apply in the social sciences: Early registration of study plans in a public registry ensures that the public is aware of ongoing trials; this facilitates recruitment and collaboration, minimises unnecessary duplication of effort, and prevents (or allows for the detection of) selective reporting of research outcomes (WHO trial registration 2021). The World Medical Association's Declaration of Helsinki states, “Every clinical trial must be registered in a publicly accessible database before recruitment of the first subject” (WMA 2013). Thus, as a condition of publication, top medical journals require registration in a public trials registry before the first participant is enrolled (ICMJE trial registration). Nevertheless, few controlled trials are prospectively registered (Chan 2017). Some protocols appear in public registries (such as clinicaltrials.gov or the International Clinical Trials Registry Platform, www.who.int/clinical‐trials‐registry‐platform) after results have been reported, which defeats the purposes of trial registration.

Researchers should develop data sharing plans at the inception of a study, and include plans for data sharing in prospectively registered protocols. Investigators have an ethical obligation to share anonymised participant‐level data, so that others can “verify the substantive claims through reanalysis” (www.apa.org/ethics/code). Data sharing also supports individual participant data (IPD) meta‐analysis and new investigations on questions not addressed in the original study. Data sharing can be done at some point (e.g., 18 months) after the study ends, to protect investigators' right to publish. Guidelines for data sharing and data anonymisation are available (Institute of Medicine 2015, Keerie 2018), as are sample data sharing plans (ICMJE data sharing).

Full disclosure of potential conflicts of interest (COI) is an important ethical responsibility. COI are not prima facie evidence of bias, but they do bear watching because financial and professional conflicts can affect the design, conduct, and reporting of research, leading to over‐interpretation or misinterpretation of results (Lundh 2020).

Research waste is an ongoing concern in this area (Chalmers 2009, Ioannidis 2014). Investments in high‐quality primary studies are badly needed, but these investments are not always well spent. Research resources are wasted when researchers conduct poorly designed studies, fail to take adequate steps to minimise biases, fail to describe interventions in sufficient detail, and fail to fully report all outcomes at all endpoints for all cases (Chalmers 2009). For example, the USA government spent approximately $4 million for a 5‐year study of 161 participants (Henggeler 2006), yet there are no public reports on this study after the first year. In comparison, the UK NIHR spent £1 million for a 2–5 year follow‐up of a prospectively registered trial with: 684 participants, high quality standards (including a research advisory board), full valid reporting of all measures and all endpoints, and a full data sharing agreement (Fonagy 2018). These investments yield different returns.

In summary, to enhance the transparency and credibility of primary studies and to avoid wasting research resources, investigators should:

• Complete detailed protocols at the inception of a study, including explicit plans for participant recruitment, eligibility criteria, data collection measures and methods, and analysis (follow established guidelines for study protocols).
• Register protocols in a public registry before the first participant is enrolled.
• Document interventions in sufficient detail to support analysis and replication.
• Use secure methods of sequence generation and allocation concealment.
• Take adequate steps to reduce bias, including blinding of assessors.
• Assess equivalence of groups at baseline, focusing on the magnitude of differences between groups, not statistical significance.
• Use ITT analysis, including all participants in the group to which they were originally assigned. Consider use of data from archival or administrative sources, to minimise missing data.
• Carefully document the flow of cases through the study and reasons for attrition.
• Assess the extent to which missing data affects the comparability of groups.
• Follow established guidelines for reporting on research methods.
• Fully disclose potential conflicts of interest.
• Report results in full, including group data sufficient to support effect size calculations for all planned outcomes at all endpoints.
• Develop and follow data sharing plans that protect participants' identities, preserve investigators' right to publish, fulfil researchers' ethical obligation to allow independent verification of results, and support responsible use of data in other studies (e.g., IPD meta‐analysis and new investigations).

7.2.2 | Reviews

To produce comprehensive, accurate, and useful reviews of research, reviewers must understand the content area, research methodology, and the dynamics of dissemination. The dissemination of research results is a biased process, because statistically significant and positive results tend to be over‐represented in research reports and publications (Song 2009, Song 2010, Dwan 2013). Research methods and procedures are not always clearly described. Hence, reviewers have to work hard to counteract dissemination biases and to fully understand essential qualities of primary studies. Systematic review methods provide useful procedures for minimising bias and error in reviews, but these guidelines are not always followed. Nonsystematic reviews tend to repeat prominent results and conclusions of others; thus, the value of their contributions to the scientific evidence base is unclear.

Reviewers should complete a detailed protocol for a review, and deposit it in a public registry before they begin the review process.

Prospective registration of protocols for reviews promotes public awareness and collaboration, reduces unnecessary duplication of effort, increases transparency, and prevents (or supports detection of) selective reporting of results (Stewart 2012). Guidelines for protocol development are available (e.g., Moher 2015), as are international registries devoted to protocols for systematic reviews (e.g., PROSPERO, www.crd.york.ac.uk/prospero). Unfortunately, many systematic reviews do not have protocols, including reviews published in high impact journals (Tsujimoto 2017).

To improve the rigour of reviews and transparency of reporting, reviewers should follow consensus standards for the conduct and reporting of systematic reviews and meta‐analysis (e.g., PRISMA, www.prisma‐statement.org; Cochrane MECIR, https://methods.cochrane.org/methodological‐expectations‐cochrane‐intervention‐reviews#accessMECIR; Campbell MECCIR, https://onlinelibrary.wiley.com/page/journal/18911803/homepage/author‐guidelines).

Reviewers should consider all available reports on included studies, given inconsistencies across reports on important study characteristics, such as sample sizes and outcome measures. Reviews that focus only on final reports are likely to miss problems with attrition and selective reporting. Given concerns about publication bias, reviews should never be limited to published studies. Concerns about study qualities should be taken into account when setting eligibility criteria (at the protocol stage) and unpublished studies that meet the eligibility criteria should always be included.

Reviewers should contact investigators to request missing data on studies, and document these contacts. We obtained useful, unpublished information from some investigators and this allowed us to include more data in our analyses.

It is important to conduct detailed assessment of risks of bias (ROB) in included studies. Available ROB tools can be modified to fit the needs of a review. Inclusion of non‐RCTs may be desirable, but reviewers should carefully assess risks of bias that may arise in different designs, and examine potential moderating effects of research design if studies with different designs are included. Assessments of baseline equivalence, confounding factors, and potential effects of attrition should focus on effect sizes (e.g., d) not statistical significance.

Reviewers should avoid vote‐counting and use meta‐analysis whenever possible. Larger reviews can use advanced methods that account for multiple, dependent effect sizes within studies.

Full disclosure of reviewers' potential conflicts of interest is an important ethical responsibility. As in primary studies, financial and nonfinancial conflicts (including author allegiance to a particular treatment) can affect the conduct and reporting of reviews (Lieb 2016).

Systematic reviews need to be updated periodically, as new studies and additional data become available.

In sum, to produce a comprehensive, accurate, and useful review of a body of research, reviewers should:

• Use systematic reviews and meta‐analysis to reduce bias and error.
• Develop and register a detailed protocol for a review and deposit it in a public registry in advance of data collection.
• Follow consensus standards for the conduct and reporting of systematic reviews and meta‐analysis.
• Search for and include unpublished studies that meet the review's inclusion criteria.
• Search for and include all published and unpublished reports on included studies.
• Contact investigators to request missing data on studies, and document these contacts.
• Conduct and report detailed assessments of risks of bias for each study.
• Focus on effect sizes, not statistical significance.
• Use meta‐analysis whenever possible, and avoid vote‐counting.
• Use methods that account for dependent effect sizes within studies, when possible.
• Fully disclose potential conflicts of interest.
• Update reviews periodically, as new studies and data become available.

Finally, reviewers need to be more concerned about research waste. The publication of more than 400 narrative reviews of research on MST is a waste of valuable research resources (as noted earlier, the number of published reviews of research on MST is 4 to 5 times greater than the number of primary outcome studies and 15 times greater than the number of RCTs). Systematic reviews and meta‐analysis are very labour intensive, because it takes time and effort to “study the studies” well. However, if the time and effort used to produce a few dozen narrative reviews could be reallocated to produce one good systematic review, this would be a far better investment of resources, with greater benefits for science and society.

ACKNOWLEDGEMENTS
This updated review was funded by the National Institute for Health Research (NIHR) Incentive Award Scheme 2019 Reference 130851. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. Thanks to Brandy Maynard and Audrey Portes for their indefatigable support and editorial assistance on behalf of the Campbell Collaboration's Social Welfare Coordinating Group. Melania Popa Mabe (MPM) and Burnee' Forsythe (BF) contributed to the coding and analysis of data for the initial version of this review. Margo Campbell (MC), Barbara Toews (BT), and Jessica Schaffner Wilen (JSW) contributed to coding and analysis of data for an unpublished update in 2010. Sammantha Dunnum (SD) contributed to coding in 2019. We are very grateful for their help in organising the data. Geraldine MacDonald, Jane Dennis, and others at the Cochrane Developmental Psychosocial and Learning Problems Group steered the development of the 2005 version of this review. Thanks to Jo Abbott, Margaret Anderson, Eileen Brunt, and Julie Millener for assistance in developing the initial search strategy and for executing searches in 2003 and 2010; and to Julie Millener for her work on the reference section. We are grateful for thoughtful suggestions from anonymous peer reviewers and methodological experts.

CONTRIBUTIONS OF AUTHORS
Julia Littell updated plans for the review, updated screening and data extraction forms, conducted pairwise meta‐analysis, and wrote the text of the review. Terri Pigott developed and conducted meta‐analysis of correlated effects, and contributed to writing of the text. Karianne Nilsen updated the search strategy and conducted electronic searches. Julia Littell, Stacy Green, and Olga Montgomery screened studies, participated in inclusion/exclusion decisions, extracted and coded data, and participated in the analysis of data. All authors have access to the report and raw data.

DIFFERENCES BETWEEN PROTOCOL AND REVIEW
Our protocol (Littell 2004) guided development of the first version of this review, published in 2005 (Littell 2005a, Littell 2005b). Changes in this update are due to (1) advances in the science and practice of systematic reviews and meta‐analysis, and (2) efforts to answer questions raised by our earlier review.

As described in the text and in Appendix A, we updated the search strategies to reflect changes in databases and interfaces, generate more sensitive and specific searches, and add new studies and additional data to our previous review.

Given previous findings that results of MST are not consistent across trials, we added two new objectives: (1) assess the consistency (heterogeneity) of results across studies and (2) assess potential moderators of effects. We focused on moderators identified in our protocol and previous review: investigators' independence, comparison conditions, and methodological quality (or risks of bias). A central contrast emerged in relation to these moderators: studies conducted in the USA differed from those conducted in other countries in multiple ways. MST developers only conducted studies in the USA, where comparison conditions were relatively weak, and studies were of lower methodological quality than studies conducted outside of the USA. Thus, we used study location (USA or other country) as a potential moderator of treatment effects.

Our 2004 protocol did not specify primary and secondary outcomes. Before updating this review, we identified primary and secondary outcomes and selected seven primary outcomes for a Summary of Findings Table.

Our protocol indicated that, when studies provided multiple measures of the same construct at different points in time, we would use the endpoint closest to one year post random assignment. We are now able to assess effects at multiple endpoints. We divided endpoints into several discrete intervals, and created explicit rules for handling studies with multiple measures of the same effect within one of these intervals. We did not use data collected during or immediately after treatment (4–8 months after intake) in pairwise meta‐analysis (because many cases were still receiving services at this time), but did include these data in CE models. We assessed the timing of outcome measurement as a potential moderator of effects.

We adopted newer and more explicit procedures for assessing risks of bias (ROB) in included studies. Our protocol incorporated ratings for allocation concealment (from the Cochrane Handbook version 4.2.1) and indicated that included studies would “also be assessed on: adequate implementation of random assignment, standardisation and blinding of assessments, attrition, and ITT analysis”. For the present review, these categories were further defined and additional categories were added, based on the Cochrane ROB tool (version 1, Higgins 2011) and What Works Clearinghouse standards for assessing baseline equivalence and attrition (WWC attrition; WWC baseline). Instead of ranking studies “in terms of their ability to support ITT analysis and use of standardised or objective outcome measures” (Littell 2004), we rated each study on 11 risk‐of‐bias variables and we documented reasons for each of these ratings.

When multiple reports on a single outcome were available (e.g., parent and youth reports on family cohesion), our protocol indicated that we would average results across sources and pool their standard errors. With the advent of newer statistical methods, we are able to include multiple dependent measures in the same meta‐analysis, using correlated effects (CE) models (described below). Since reports from different data sources do not always agree, we selected the most direct measure for use in pair‐wise meta‐analyses (i.e., youth reports on youth outcomes, parent reports on parent and family outcomes). Where possible, we assessed potential moderating effects of different data sources.

Our protocol did not anticipate studies' use of various measures for imputing missing data. Some of these approaches are very robust, others are not. We added methods for handling imputed data in primary studies, and for assessing effects of missing data.

Our protocol indicated that we would examine both fixed and random effects models. More recently, experts have argued that the choice between these models should be made a priori, based on conceptual considerations (Borenstein 2010). Given the differences between MST trials (in terms of their methods, sample characteristics, comparison conditions, etc.) we did not expect all studies to be estimating a common effect size. Thus, whenever possible we used random effects models, which provide a better fit for distributions of effect sizes that are affected by real‐world differences between studies.

In the protocol, we articulated plans for sensitivity analysis to assess potential effects of deviations from ITT analysis and issues related to the blinding of assessors. However, there is little variation in blinding in the set of studies under review. We assessed potential moderating effects of variations in ITT analysis, overall attrition, differential attrition, and four other types of ROB. We provide graphic displays of all risks of bias (including those related to blinding) in forest plots. We conducted analyses to assess the sensitivity of CE models to variations in assumptions about underlying correlations between effect sizes within studies.

We added a statement in the methods section on unit of analysis issues, describing our approach to working with studies with multiple arms and cluster‐randomised trials.

We used CE models with small sample corrections to produce robust variance estimates (RVE) of effects across multiple endpoints. This allowed us to analyse multiple, dependent effect sizes and assess potential moderators of treatment effects. We also conducted pair‐wise meta‐analysis, to maintain consistency with the original review and provide clear referents for understanding potential impacts of MST at specific endpoints. The use of these two analytic models in tandem provides a more robust assessment of effects than either approach would have alone.
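
To make the fixed versus random effects distinction discussed above concrete, the following is a minimal sketch of a random effects pooled estimate. It assumes the DerSimonian‐Laird moment estimator for the between‐study variance, which is one common choice and is not stated in this section to be the estimator used in the review. The function name and the input values in the usage lines are hypothetical illustrations, not data from any MST trial.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effect sizes under a random effects model.

    Assumption: between-study variance (tau^2) is estimated with the
    DerSimonian-Laird moment estimator. `effects` are study effect sizes
    (e.g., d) and `variances` are their sampling variances.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures observed dispersion around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Moment estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: share of total dispersion attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"fixed": fixed, "pooled": pooled, "se": se,
            "tau2": tau2, "q": q, "i2": i2}


if __name__ == "__main__":
    # Hypothetical effect sizes and variances, for illustration only.
    result = dersimonian_laird([-0.30, 0.05, 0.40], [0.04, 0.02, 0.09])
    print(result)
```

When the estimated between‐study variance is zero, the random effects weights reduce to the fixed effect weights and the two pooled estimates coincide; as heterogeneity grows, the weights become more equal across studies and the standard error of the pooled estimate widens.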

Published notes

CHARACTERISTICS OF STUDIES
Characteristics of included studies
Asscher 2013

Methods From 2006 to 2010, 256 chronic juvenile offenders were randomly assigned to MST (n = 147) versus treatment as usual (TAU,
n = 109) in Utrecht and Amsterdam (The Netherlands). Youth were referred from multiple community sources to the Child
Protection Council or the Bureau of Youth Care.
Research assistants collected data from youth and caregivers before treatment began, immediately after treatment (approximately 6
months), and 6 months after the end of treatment (approximately 12 months after intake).
Analysis of attrition patterns showed that data were missing completely at random (MCAR); authors conducted multiple imputation
using the expectation‐maximisation algorithm.

Participants Participants showed serious, violent, and chronic antisocial behaviour. Most (51%) were court‐ordered to treatment, 39% were
referred by primary health care providers or social workers, and 11% were self‐referred.
Their average age was 16.0 years (range 12 to 18); 73.4% were boys; 55% were Dutch, 45% were ethnic minorities (15% Moroccan,
14% Surinamese). Half lived in single‐parent homes, 56% lived below minimum income standards and 45% of the families
indicated that they experienced financial strains.
Most (71%) of the youth had been arrested at some time in the past and 64% reported contact with the police in the past year.
Exclusion criteria included: sexual offending, autism, psychosis, imminent risk of suicide, and engagement in ongoing treatment
elsewhere.

Interventions MST services were provided over an average of 5.7 months (SD 1.9) by six teams with 30 therapists from three agencies. Most (86%) of
the therapists had Master's degrees.
Youth in the TAU group were referred to an alternative treatment: 21% received individual treatment (counselling or supervision by
a probation officer or case manager), 53% received family‐based interventions (family therapy, parent counselling, parent groups,
or home‐based social services), 7% received a combination of care (e.g., individual treatment and family counselling), 4% were
placed in a juvenile detention facility, and 14% received no treatment (due to moves or nonattendance).
A study of a subsample of 116 youth (including 91 cases from the original trial plus 25 new cases; J. Asscher, personal
communication, 14 September 2020) reported that most of the young people in the TAU group received Functional Family
Therapy (Vermeulen et al., 2017).

Outcomes Primary outcomes were: recidivism (arrests), antisocial behaviour (aggression, delinquency, YSR, CBCL, SRD, DBD), internalizing
problems (CBCL, YSR), and substance use. Secondary outcomes were: adolescent competence and self esteem (CBCK, CATQ),
school attendance and grades, parent and family functioning (subscales and items adapted from the CII, PACS, PSI, PDI, PPQ,
IPPA, NPQ, NRI, RDT, PCSYR), and peer relations (IPPA, FFS, BPQ). Data were obtained by research assistants, using
questionnaires, interviews, and observations of participants' behaviour. Arrest data were obtained from police records.

Country The Netherlands

Funding The Netherlands Organisation for Health Research and Development (ZonMw), Netherlands Organisation for Scientific Research
(NWO), and the Ministry of Health, Welfare and Sport.

Notes Registered in 2008 in the Netherlands Trial Register (NTR1390, now Trial NL 1332). A more detailed proposal was submitted in
2007 to ZonMw. Some primary outcomes (internalizing problems, substance use) were not reported, some outcomes were
reported at post‐treatment but not follow‐up (see Risk of Bias table, Selective reporting).
Based on analysis of data on 83 cases, costs of MST were 13,430€ per adolescent and costs of TAU were 15,201€ per adolescent
(Vermeulen et al., 2017).

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence Low risk Research staff used a computer‐generated randomisation sequence (2010, p. 576) that was
generation executed separately for each site (2013, p. 172).
(selection bias)

Allocation concealment Low risk Referrals were received by the MST service provider (De Waag), who then contacted the principal
(selection bias) researcher to request random assignment. The researcher performed computer‐generated
random assignment off site and relayed results to the service provider by telephone, usually
within 5 minutes (Asscher et al., 2007, p. 125).

Baseline equivalence High risk The MST group included fewer males (71% vs. 77%, d = 0.18) and were slightly younger on average
than youth in the control group (mean ages of 15.9 vs. 16.2 years, d = 0.23). The MST group
included higher proportions of ethnic minorities (48% vs. 42%, d = 0.13), single parent families
(54% vs. 41%, d = 0.29), and families with financial problems (47% vs. 41%, d = 0.13) compared
with controls (Manders et al., 2013a, p. 112).

Performance bias Unclear risk No information on amounts of service provided to MST and TAU groups.
(confounding)

Detection bias (blinding) Unclear risk No discussion of blinding of assessors of administrative data.
Administrative data

Detection bias (blinding) Unclear risk "The majority of the research assistants who visited the families at home were not informed of the
Participant reports family's randomly assigned condition" (Asscher et al., 2013, p. 172).

Attrition bias Unclear risk Low risk at 6 months, with 21% attrition and 6% differential attrition. High risk of bias at 2 years,
Administrative data with 25% attrition and 14% differential attrition.

Attrition bias Unclear risk Attrition at 6 months was 13%, differential attrition was 3%. Analyses showed that data were
Participant reports missing completely at random; multiple imputation procedures were used to estimate missing
data for participant reports and observational measures. High risk of bias in the cost‐
effectiveness subsample, with 55% missing data overall, and 25% differential attrition.

Intention to treat analysis Low risk No systematic exclusions of drop‐outs or refusers.

Standardized observation Unclear risk Data were collected before treatment began, after treatment ended (mean of 5.7 months, SD 1.9
periods months after the first assessment), and 6 months after treatment ended (about 12 months
after referral).

Validated outcome measures Unclear risk Standardised instruments, selected subscales, selected items, and new composite scales, some
with αs < .7 (αs range from .61 to .94; Decovic et al., 2012, p. 577).

Selective reporting High risk A protocol for the study was registered in the Netherlands Trial Register (NTR1390) in 2008 (2
years after enrolment began). A more detailed plan for outcome measurement was included in
a 2007 proposal to ZonMw. Most outcomes were reported for the endpoint immediately after
treatment; fewer outcomes were reported at the 6 month follow‐up. We found no reports
on some primary and secondary outcomes (listed in Decovic et al., 2007, p. 20) including:
internalizing problems (depression, anxiety, withdrawal, somatic complaints measured by the
YSR), school functioning (attendance, grades), and substance use (alcohol, hard and soft drug
use). Outcomes reported at post‐treatment but not follow‐up included parenting (competence,
positive discipline, inept discipline), youth self‐esteem, youth cognitions (failure, hostility),
family functioning (quality of relationships), and peer relations (involvement with deviant
peers, prosocial peers).

Conflicts of interest High risk No conflict of interest statement. Sander van Arum (co‐author of articles published in 2007 and
2014) was head of the MST programme at De Waag, an agency that provided MST services to
study participants.
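
The d values for proportions in the Baseline equivalence row of the risk of bias table above can be approximated from the reported percentages alone. The sketch below assumes the logit conversion, d = ln(OR) × √3 / π; the table does not say which conversion the reviewers used, so this is an assumption, and the function name is hypothetical. Applied to the rounded percentages, it gives magnitudes of about 0.13, 0.29, and 0.13 for ethnic minorities, single parent families, and financial problems, in line with the reported values, and about 0.17 for the share of males (reported as 0.18, a gap that rounding of the percentages could explain).

```python
import math


def logit_d(p_treatment, p_control):
    """Convert two group proportions to a standardised difference.

    Assumption: the logit method, d = ln(odds ratio) * sqrt(3) / pi.
    Both proportions must lie strictly between 0 and 1.
    """
    for p in (p_treatment, p_control):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must be strictly between 0 and 1")
    odds_treatment = p_treatment / (1.0 - p_treatment)
    odds_control = p_control / (1.0 - p_control)
    return math.log(odds_treatment / odds_control) * math.sqrt(3.0) / math.pi


if __name__ == "__main__":
    # Percentages from the Baseline equivalence row (MST vs. control).
    rows = {
        "males": (0.71, 0.77),
        "ethnic minorities": (0.48, 0.42),
        "single parent families": (0.54, 0.41),
        "financial problems": (0.47, 0.41),
    }
    for label, (mst, control) in rows.items():
        # Report the magnitude only, as the table does.
        print(label, round(abs(logit_d(mst, control)), 2))
```

Because the conversion depends only on the two proportions, it can be applied whenever a report gives group percentages, which is one reason the review asks authors to report proportions for every group.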

Borduin 1990

Methods Simple random assignment to MST or individual therapy (IT) from 1983 to 1985. Variable observation periods, ranging from 21 to 49
months post assignment (mean = 37 months).

Participants 16 male adolescents previously arrested for sexual offences. Mean age of 14 years; 62.5% White and 37.5% Black.

Interventions MST provided by doctoral students in clinical psychology; average of 37 hours of service (ranging from 21 to 49 hours).
Individual therapy (blend of psychodynamic, humanistic, and behavioural approaches) provided by Master's level professionals at
local mental health agencies (average of 45 hours of treatment per case).

Outcomes Re‐arrests for sexual and nonsexual offences, based on court and police records.

Country USA

Funding Not disclosed

Notes

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Randomisation is mentioned, but methods are not described (1990, p. 108).
generation
(selection bias)

Allocation concealment Unclear risk Randomisation is mentioned, but methods are not described (1990, p. 108).
(selection bias)

Baseline equivalence Unclear risk No information provided

Performance bias High risk Between‐group differences in therapist training, treatment settings, and supervision. MST was
(confounding) provided by doctoral students in clinical psychology, IT was provided by Master's level
therapists in community agencies. Supervision of MST therapists was provided by the study's
first author and included discussion of videotaped family sessions; no information was
provided on supervision of IT therapists. MST group received an average of 37 hours of service
(range 21 to 49 hours; p. 108); IT group received an average of approximately 45 hours of
therapy (no range provided; p. 110).

Detection bias (blinding) Unclear risk No discussion of blinding. Arrest records were searched following referral to treatment.
Administrative data

Detection bias (blinding) Not applicable
Participant reports

Attrition bias Low risk No attrition. All 16 subjects were included in analysis.
Administrative data

Attrition bias Not applicable
Participant reports

Intention to treat analysis Low risk All 16 subjects were included in the analysis in the group to which they were initially assigned.

Standardized observation High risk The length of follow‐up ranged from 21 to 49 months (1990, p. 110); there were no controls for
periods variable lengths of observation.

Validated outcome measures Low risk Arrest records were obtained from juvenile court, adult court, and state police.

Selective reporting Unclear risk There is no public protocol for this study.

Conflicts of interest High risk Dr. Borduin and Dr. Henggeler were board members and shareholders of MST Services, Inc. Dr.
Borduin provided clinical supervision for the MST cases in this study (1990 p. 108).

Borduin 1995

Methods Random assignment to MST or individual therapy (and possibly to therapists within conditions; Johnides et al., 2017) from 1983‐
1986. Post‐treatment assessments of instrumental outcomes for treatment completers. Assessment of criminal outcomes from
court and police records at 4, 13.7 and 21.9 years after treatment ended.

Participants Conflicting reports on number of cases randomly assigned to MST or individual counseling/therapy (IC or IT; note that random
assignment ended in 1986, according to Schaeffer & Borduin, 2005, p. 446):
1) "A total of 210 families of juvenile offenders agreed to participate in the assessment and treatment components of the study.
Following the initial assessment session, each family was randomly assigned to either multisystemic therapy or the alternative
treatment group" (Borduin & Henggeler 1990, p. 76).
2) "Following a pretreatment assessment session, adolescent offenders were randomly assigned to either MST (n = 100) or IC
(n = 100)" (Henggeler et al., 1991, p. 45).
3) "Of the 200 families who completed pretreatment assessments, 24 (12%) subsequently refused to participate in treatment… The
remaining 176 families were randomly assigned…to MST (n = 92) or individual therapy (IT; n = 84)" (Borduin et al., 1995, p. 570).
Conflicting reports on the number of cases that completed treatment and conflicting definitions of treatment completion:
1) 156 cases: "Approximately 84% (n = 88) of the families in multisystemic therapy and 65% (n = 68) of the families assigned to
alternative therapy completed treatment" (Borduin & Henggeler 1990, p. 76).
2) "140 (79.5%) completed treatment…and 36 (21.5%) dropped out, defined as unilaterally terminating after the first session (with
the youth or family) and before the seventh" (Borduin et al., 1995, p. 570; also see Henggeler et al., 1991, p. 45).
3) Schaeffer (2000, p. 36) wrote that "youth had been classified an MST completer (n = 77) if he or she participated in MST and was
judged by the therapist and clinical supervisor (Charles Borduin) to have completed treatment successfully; successful
completion was defined as having participated in a minimum of 5 treatment sessions…and having met at least some treatment
goals. The second group, Usual Services (US) completers (n = 63), consisted of those youth who had been randomly assigned to
the individual therapy condition and who had been judged by their therapists to have completed treatment successfully."
Participants were families of youth age 12‐17 who had 2+ prior arrests and no evidence of psychosis or dementia. Youth were living
with at least one parent or parent figure in two rural counties in Missouri. Average age at baseline (for n = 200) was
approximately 14.5; 67% to 69.3% were male; 67% to 76% were White, 22% to 32% were Black (conflicting reports).

Interventions MST was provided by 2nd and 3rd year doctoral students in clinical psychology with an average of 20.7 hours of service.
Interventions varied in the MST group (83% received family therapy, 60% school intervention, 57% peer intervention, 28%
individual therapy, 26% marital therapy).
Individual therapy (IT) for youth provided by Master's level therapists at local social service agencies, mean of 22.5 hours, with brief
monthly contact with parents in 66% of cases, psychiatric evaluation and medication in 10% of IT cases.

Outcomes Post‐treatment assessments included self‐report and behaviour ratings obtained from parents, teens, and next youngest sibling on
psychiatric symptoms, behaviour problems, family functioning, and peer relationships. Observational measures were obtained via
video recordings of parents, adolescents, and siblings as they completed a family interaction task. Data on peer relations and
school grades were obtained from one of the teen's teachers. Criminal outcomes after the end of treatment (or after the end of
probation) included subsequent arrests, days incarcerated, and length of probation, based on official juvenile records, adult court,
and state police records.

Country USA

Funding Missouri Department of Social Services, University of Missouri‐Columbia Research Council, US National Institutes of Mental Health

Notes Participants who refused treatment (n = 24) were more likely to be nonwhite than those in the main analysis sample (see Table 6). In the
main analysis subsample (n = 176), there were more Whites (74% vs. 69%) and fewer males (62% vs. 77%) in the MST versus IT group
(Schaeffer 2000, p. 120). A follow‐up report showed a significantly higher proportion of White caregivers in the MST versus IT group
(83.8% vs. 73%, p = .047; Johnides 2015). Further, MST dropouts were disproportionately (73%) male, reducing the proportion of
males in the MST completers group to 59% (Schaeffer 2000, p. 123). MST cases had significantly more pretreatment arrests for
nonviolent offences than IT cases (means of 1.65 vs. 0.98; Schaeffer 2000, p. 120). Participants' race was associated with arrest rates
and duration of confinement; gender was associated with arrest severity and length of probation (Schaeffer 2000, p. 141).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Conflicting reports: (1) Families were assigned to treatments using "simple randomization with a
generation coin toss by a court administrator" (Schaeffer & Borduin, 2005, p. 446) versus (2) "families
(selection bias) were randomized to treatment conditions and to therapists within each condition" (Johnides
et al., 2017, p. 325).

Allocation concealment Unclear risk Families were randomly assigned using a coin toss (Borduin et al., 1995, p. 570).
(selection bias)

Baseline equivalence High risk Missing data on 10 cases excluded after the first (Mann et al., 1990) report. For remaining cases,
there were significant between‐group differences in adolescents' gender (d = 0.41), race
(d = 0.13), and prior nonviolent arrests (d = 0.45), with more Whites, females, and nonviolent
prior offences in the MST versus IT group (Schaeffer 2000, p. 120).

Performance bias High risk Between‐group differences in training and supervision of therapists: MST therapists were doctoral
(confounding) students in clinical psychology; IT therapists were Master's level professionals in community
agencies. MST therapists were supervised by the principal study author in 3‐hour weekly
group sessions, including review of video‐ or audio‐taped family sessions; IT therapists
attended weekly case reviews with a juvenile court treatment coordinator (Borduin
et al., 1995).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias High risk There were significant differences in racial composition between the initial sample (~70% White),
Administrative data the main analysis subgroups (~76% White), and refusers (25% White, d = 1.24, see Table 6).
With 176/210 cases in most analyses, overall attrition is 16% at 4 years (8% differential
attrition); 21% attrition (9% differential) at 13.7 years; 30% attrition (2% differential) at 21.9
years after treatment ended.

Attrition bias High risk 126/210 cases completed post‐treatment assessments (40% missing, 13% differential attrition).
Participant reports

Intention to treat analysis High risk 10 cases were excluded after the 1990 report. Families who refused treatment (n = 24) and those
who dropped out or did not complete treatment "successfully" (Schaeffer 2000, p. 36) were
excluded from some analyses.

Standardized observation High risk In some reports, follow‐up observation periods began when juvenile court supervision (probation)
periods ended, which was within 2 weeks of the end of treatment for 96% of treatment completers and an
average of 6 months after referral for dropouts (Borduin et al., 1995, p. 572). In other reports,
follow‐up began after treatment ended (Schaeffer 2000, p. 44). The length of observations in the 4
year follow‐up ranges from 2.0 to 5.4 years; the 13.7 year follow‐up ranges from 2.0 to 15.9 years
(Schaeffer 2000); the 21.9 year follow‐up ranges from 2.0 to 23.8 years (Sawyer 2008). On
average, control cases were observed 235 days longer than MST cases in the 21.9 year follow‐up
(Sawyer 2008). Statistical controls for varying observation periods were included in some analyses,
but not in overall recidivism rates (e.g., data on recidivism rates at the 13.7 year (2005) and 21.9
year (2008) follow‐ups include 11 cases that were lost to follow‐up within 2–3 years (6 had been
arrested, 5 had not; Borduin et al., 1995, p. 572)).

Validated outcome measures Unclear risk Use of administrative data, measures that had been validated in other samples, and observational
measures. Some observational measures had αs or inter‐rater reliability coefficients below 0.7.

Selective reporting High risk There is no public protocol for this study. Some outcomes are only reported for subgroups of
program completers. "At least some post‐treatment data were obtained from 65% of the
families who dropped out" (Schaeffer 2000, p. 39), but dropouts were not included in published
post‐treatment assessments (e.g., Borduin 1995).

Conflicts of interest High risk Two authors were board members and shareholders of MST Services, which licenses and
disseminates MST. Dr. Borduin is a board member of MST Associates. Dr. Borduin provided
3 hours of weekly group supervision to the MST therapists who participated in this study
(Henggeler et al., 1991, p. 46).
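
As an illustrative check (not part of the original review), the overall attrition percentages cited for Borduin 1995 can be reproduced from the case counts given in the risk of bias table above. This is a minimal sketch that assumes, as the table does, a denominator of 210 families; the function name `overall_attrition_pct` is hypothetical. Differential attrition is not computed here because it requires group-level counts, which the source reports inconsistently.

```python
def overall_attrition_pct(n_assigned: int, n_analysed: int) -> float:
    """Return the percentage of assigned cases missing from an analysis."""
    if n_assigned <= 0 or not 0 <= n_analysed <= n_assigned:
        raise ValueError("counts must satisfy 0 <= n_analysed <= n_assigned, n_assigned > 0")
    return 100 * (n_assigned - n_analysed) / n_assigned


# Counts taken from the table: 176/210 cases in most analyses of
# administrative data; 126/210 cases with post-treatment assessments.
admin_attrition = overall_attrition_pct(210, 176)
report_attrition = overall_attrition_pct(210, 126)

print(f"Administrative data: {admin_attrition:.0f}% attrition")
print(f"Participant reports: {report_attrition:.0f}% missing")
```

Rounded to whole percentages, these values correspond to the 16% and 40% figures stated in the table.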

Borduin 2009

Methods From 1990 to 1993, 48 adolescents were randomly assigned to MST‐PSB or usual juvenile justice services for sexual offenders in
two counties in Missouri (US). This midwestern region included both rural and urban areas. Research assistants collected data
from caregivers and youth on their individual and family functioning; these measures were collected before treatment began
and within one week after the termination of treatment. Data on post‐treatment arrests, convictions, and incarceration were
collected from juvenile and adult court records over a follow‐up period averaging 8.9 years.

Participants Participants were 48 juvenile sex offenders with an average of 4.33 previous arrests for sexual and nonsexual felonies. Most (96%)
were male; their average age at intake was 14; 73% were White and 27% Black; 2% identified as Hispanic; and 31% lived with a
single parent. All of the youths remained under court jurisdiction during the treatment phase of this study.

Interventions MST services were provided by graduate students in clinical psychology. Services lasted an average of 216 days (SD 86, range 100 to
446 days). Therapists received an initial orientation, 3 hours of weekly group supervision, plus individual supervision provided by
the principal study author. MST therapists provided an average of approximately 3 hours of service per family per week.
Youths in the Usual Services group received cognitive–behavioural group treatment for 90 minutes twice a week (in groups of 4‐6
youths) plus individual treatment for 60–90 minutes once a week through a treatment services branch of the local juvenile court.
Therapists were certified juvenile sexual offender counsellors, who held Master's degrees in psychology or social work, and had
approximately 6 years of clinical experience with youth. Mean length of service was 30.1 weeks (SD 18, range 17 to 90 weeks).
All therapists in both conditions were White.

Outcomes Research staff gathered data from youth and caregivers on psychiatric symptoms (GSI of the BSI), behaviour problems (RBPC),
delinquency (SRD), peer relations (MPRI), family relations (FACES‐II), and school grades (teacher and parent reports). Substantiated
arrests for sexual and nonsexual crimes (index offences) were identified in juvenile and adult criminal records at 7.31 to 10.64
years after treatment ended (mean 8.9 years, SD 1.02 years).
"Juvenile incarceration was measured as the number of days that a youth was placed by the Department of Youth Services in a
residential facility. Adult incarceration was measured as the number of days that a participant was sentenced to serve in an adult
correctional facility" (Borduin et al., 2009, p. 30).

Country USA

Funding Missouri Department of Social Services, University of Missouri Research Council

Notes Youth in the MST group had more caregiver‐reported behavioural problems (d = 0.70) and those in the usual services group had
more self‐reported property crimes (d = 0.23) at referral.

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk Families were randomised "using a random‐number table" (Borduin et al., 2009, p. 27).
generation
(selection bias)

Allocation concealment Unclear risk Assignments were contained in sealed envelopes. Unclear whether envelopes were opaque or
(selection bias) sequentially numbered.

Baseline equivalence High risk Youth in the MST group had more caregiver‐reported behavioural problems (d = 0.70) and those in
the usual services group had more self‐reported property crimes (d = 0.23) at referral.

Performance bias Unclear risk MST services ranged from 14 to 64 weeks in length (mean of 30.8 weeks), with about 3 hours of
(confounding) direct contact with family members per week. Usual services ranged from 17 to 90 weeks
(mean 30.1) with approximately 4 to 4.5 hours of individual and group therapy per week.

Detection bias (blinding) Low risk "Youths’ criminal arrest data were obtained yearly from juvenile office records by research
Administrative data assistants who were uninformed as to each participant's treatment condition. Adult criminal
arrest data were obtained from a computerized database by a state police employee (also
uninformed as to treatment condition) who conducted a search by participant name" (Borduin
et al., 2009, p. 30).

Detection bias (blinding) Unclear risk Research assistants were uninformed about participants' treatment group at the initial assessment
Participant reports (Borduin et al., 2009, p. 29). Unclear if blinding was maintained for the post‐treatment
assessment.

Attrition bias Low risk No attrition.
Administrative data

Attrition bias Low risk 4% attrition overall, 8% differential attrition.
Participant reports

Intention to treat analysis Low risk Archival data were available for all cases.

Standardized observation High risk Archival data were obtained at an average of 8.9 years after treatment was completed (SD 1.02,
periods range 7.31 to 10.64 years). The starting point for the follow‐up period (end of treatment)
ranged from 14 to 64 weeks post randomisation for MST cases and 17‐90 weeks for usual
services cases.

Validated outcome measures Low risk Use of standardised measures (with αs > .7) and archival data.

Selective reporting Unclear risk There is no public protocol for this study. Reporting of selected SRD subscales; missing valid ns
(e.g., for fathers' reports).

Conflicts of interest High risk Dr. Borduin provided supervision for MST therapists in this study. He was a board member and
shareholder of MST Services, Inc., and currently serves as a board member of MST Associates.

Butler 2011

Methods Consecutive referrals from two youth offending offices in North London were randomly assigned to MST (n = 56) or usual services
(n = 52) in 2003‐2009. Researchers assessed criminal offending (the primary outcome) at 6, 12, 18, 24, 30, 36 months post
random assignment; data on offences and custodial sentences were obtained from police records; pre‐ and post‐treatment
measures of individual and family functioning were obtained from youth and caregivers.

Participants Participants were 108 youth, ages 13‐17 (average 15.1 years), with delinquent, aggressive, or antisocial behaviour. With an average
of 2.5 prior offences, all of the youth were under a court order or supervision order. Most youths (82.4%) were male; 34% were
White, 32% were Black, 5% were Asian, and 24% were other or mixed race. Only one‐third were attending mainstream schools.
More than two‐thirds lived in single parent families. The study excluded youth who were involved in other ongoing care, those
with sexual offences, psychotic illness, problems limited to substance misuse, and those who posed a risk to trial personnel.

Interventions MST therapists held Master's degrees in counselling psychology or social work and had a minimum of two years of professional
experience with families. MST services lasted 11 to 30 weeks (mean of 20.4 weeks). Families in the MST group could also receive
usual services, often including contact with a social worker. Therapists received standard MST training, weekly on‐site
supervision, a weekly 1‐hour phone consultation with an MST expert, quarterly booster sessions, and biannual implementation
reviews.
Usual services, organised by Youth Offending Teams, included "a tailored range of interventions aimed at preventing reoffending"
(2011, p. 1224). These services focused on education, substance misuse, anger management, problem solving, victim awareness,
and reparations. Youths in the control group received an average of 21 professional appointments over 20 weeks; 67% of these
meetings were with social workers, 7% with reparations workers, 7% with parenting workers, 6% with group workers, and 6%
with substance abuse workers.
Youth in the control group had more appointments with professionals than those in the MST group. Key differences between MST
and services provided to the control group were that the latter "interventions are not normally organized to be delivered in a
family context by a single person. No overarching model governs the selection of treatments, and there is no set of principles
comparable to those of MST to organize the therapies offered; rather, interventions are offered on an 'as needed' basis by
specialist agencies to which the young person is referred" (Butler 2011, p. 1224).

Outcomes Data on criminal offences and custodial sentences were obtained from police computer records and the National Young Offender
Information System database at 6 month intervals. Secondary outcomes were: parent and youth reported symptoms of antisocial
behaviour, delinquency, and aggression (ABAS, APSD, SRYB, YSR, CBCL); age‐appropriate autonomy (SFIT); involvement with
delinquent peers (IDP); and parenting practices (Loeber). These outcomes were assessed at baseline and approximately 6 months
after randomisation for both groups.

Country UK

Funding Atlantic Philanthropies, the Tudor Trust, UK Department of Health

Notes The MST group included more Whites (49% vs. 26%, d = 0.45), fewer Blacks (27% vs. 39%, d = 0.30), and families with higher
socioeconomic status (mean 2.5, SD 1.6 vs. mean 2.0, SD 1.7, d = 0.30) than the usual services group (2011, p. 1223). Youth in the
MST group were more likely to have committed offences in the six month period prior to referral (82% vs. 67%, d = 0.45), but
there were no between‐group differences in the total number of prior offences (MST mean 2.5, SD 1.6; usual services mean 2.4,
SD 1.8; d = 0.06).
"TAM scores did not make a significant contribution to the primary outcome variable (all offenses) either as main effect… or in
interaction with the rate of change of offense frequency… More adherent treatments appeared no more likely to reduce the
likelihood of offenses" (Butler 2011, p. 1231). MST cost an average of £5,687 per case, compared with usual services of £4,619 (in
FY2008‐2009 values; Cary et al., 2013).
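
As an illustrative check (not part of the original review), the standardised mean differences reported in the Notes above for socioeconomic status and prior offences are consistent with a pooled-SD Cohen's d. This is a minimal sketch under two assumptions that the source does not confirm: that d was computed with the pooled standard deviation, and that the randomised group sizes (MST n = 56, usual services n = 52) apply. The function name `cohens_d` is hypothetical.

```python
from math import sqrt


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)


# Means and SDs from the Notes; group sizes assumed from the Methods row.
ses_d = cohens_d(2.5, 1.6, 56, 2.0, 1.7, 52)     # socioeconomic status
prior_d = cohens_d(2.5, 1.6, 56, 2.4, 1.8, 52)   # total prior offences

print(f"SES: d = {ses_d:.2f}")
print(f"Prior offences: d = {prior_d:.2f}")
```

Rounded to two decimals, these values correspond to the d = 0.30 and d = 0.06 figures reported above. The d values given for proportions (for example, race) depend on a conversion method that is not stated here, so they are not reproduced.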

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk "Treatment allocation was made offsite using a stochastic minimisation program (MINIM)
generation balancing for type of offending (violent vs. nonviolent), gender and ethnicity" (Butler 2011,
(selection bias) p. 1223).

Allocation concealment Low risk After random assignment was performed at a remote location "the MST supervisor informed
(selection bias) patients of their assignment" (Butler 2011, p. 1223).

Baseline equivalence High risk The MST group included more Whites (49% vs. 26%, d = 0.45), fewer Blacks (27% vs. 39%,
d = 0.30), and families with higher socioeconomic status (mean 2.5, SD 1.6 vs. mean 2.0, SD 1.7,
d = 0.30) than the usual services group (Butler 2011, p. 1223). Youth in the MST group were
more likely to have committed offences in the six month period prior to referral (82% vs. 67%,
d = 0.45), but there were no between‐group differences in the total number of prior offences
(MST mean 2.5, SD 1.6; usual services mean 2.4, SD 1.8; d = 0.06).

Performance bias Unclear risk Available data on amounts and types of service were not comparable across groups.
(confounding)

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias Low risk Overall attrition was 1% to 8%, differential attrition 2% to 3%.
Administrative data

Attrition bias Low risk 4% attrition overall, 3% differential attrition.
Participant reports

Intention to treat analysis Low risk No systematic exclusions.


Standardized observation Low risk Archival data were obtained in 6 month intervals, beginning 6 months before random assignment
periods and ending 36 months after random assignment. Assessments of individual and family
functioning (secondary outcomes) were conducted at baseline and 6 months after
randomisation for cases in both groups.

Validated outcome measures Unclear risk Use of standardized measures, some with αs < .7.

Selective reporting High risk Outcome data for the 30 and 36 month follow‐ups are not publicly available. The study was
registered in clinicaltrials.gov in 2012 (1 year after publication of initial results).

Conflicts of interest Low risk Authors declared that they had no financial interests or potential conflicts of interest.

Fonagy 2017

Methods STEPS‐B trial used random assignment of participants to treatment conditions, stratified by age, gender, conduct disorder, and age
difference between victim and perpetrator.

Participants 40 youth with problematic sexual behavior

Interventions MST‐PSB versus TAU

Outcomes Sexual and nonsexual offenses, out of home placement, mental health, peer relations, family functioning, costs incurred in 20 months
following randomisation

Country UK

Funding UK Dept of Health/Development for Children, Schools, and Families, and National Institute of Health Research

Notes Between‐group differences in race and referral source are shown in Table 4. The MST group included more Blacks and youth
referred from social care, while the MAU group included more Whites and adolescents referred from youth offender services
(2017, pp. 21–22).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk "A computer‐generated adaptive minimization algorithm that incorporates a random element is
generation used with the following stratification factors: gender, age (10–14, 15–17), conduct problems
(selection bias) presenting with problem sexual behaviour (yes or no) and age differential between the
perpetrator and victim (< 4 years or ≥ 4 years)" (2015, p. 9).

Allocation concealment Low risk "Eligible consenting participants are randomized to MST‐PSB or MAU on a 1:1 basis by the Trials
(selection bias) Unit at UCL through the use of a secure randomization service that ensures allocation
concealment" (2015, p. 9).

Baseline equivalence High risk Between group differences on race (d = 0.53) and referral source (d = 0.65) are shown in Table 4.

Performance bias Unclear risk There is little information on amounts and types of services received by participants.
(confounding)

Detection bias (blinding) Unclear risk Research assistants (RAs) were blind to treatment allocation at randomisation and there was no
Administrative data direct communication between RAs and therapists (2015, p. 9), but it is possible that RAs
learned about assignments in interviews with caregivers and youth.

Detection bias (blinding) Unclear risk Research assistants (RAs) were blind to treatment allocation at randomisation and there was no
Participant reports direct communication between RAs and therapists (2015, p. 9), but it is possible that RAs
learned about assignments in interviews with caregivers and youth.

Attrition bias Low risk Data on educational placements and police records on criminal offences were obtained "for the
Administrative data entire sample" (2017, p. 14).

Attrition bias High risk Differential attrition was 12% and reasons for attrition were somewhat different in the two
Participant reports groups. "Refusals from the MAU condition also often focused on the frustrations of receiving
little or no actual treatment, and therefore participants did not feel motivated to take part in
research" (2017, p. 14).

Intention to treat analysis Unclear risk Attempts were made to keep participants in their assigned groups, despite differential attrition
(2017, p. 14).

Standardized observation Low risk Assessments were conducted at baseline, 8, 14, 20 months. As of 2017, some participants had not
periods completed the 20 month assessment.

Validated outcome measures Unclear risk Use of standardized measures; no data provided on their consistency or reliability in this sample.

Selective reporting High risk Primary outcome was changed from offenses (in the 2015 protocol) to placements (in the 2017
research report). "Selected secondary outcomes" were reported in 2017 (p. 24).

Conflicts of interest Unclear risk Protocol authors declared no competing interests (2015), but two authors managed or supervised
the MST team. No statement of competing interests in the 2017 report.

Fonagy 2018

Methods Conducted in two phases (START I and START II). From 2010 to 2012, participants were randomly assigned to MST or management
as usual (MAU) in nine sites in the UK. Referrals came from multiple sources and were screened at three gateway points. Random
assignments were stratified by centre, gender, age, and age at onset of conduct problems. Research assistants collected
psychosocial data from youth and caregivers at baseline and 6, 12, 18 months (START I) and 24, 36, and 48 months (START II)
post random assignment. Teachers also provided some data on youth behaviour. Data on criminal offending and service use were
collected through 60 months after referral.

Participants Participants included 684 youth (11–17 years old) with antisocial behaviour and mental health problems (persistent aggressive
behaviour, risk of harm to self or others, prior criminal conviction, unsuccessful outpatient treatment, or permanent school
exclusion). Youth who were living away from home at baseline were excluded. Average age at referral was 13.8 years; 64% were
male; 78% White, 10% Black. Referral sources included social services (43%), youth offending teams (17%), child and adolescent
mental health services (16%), a family intervention project (6%), and police (2%) (2020a).

Interventions MST services lasted 3‐5 months, with 3 meetings per week with therapists, and 24/7 availability of therapists. MAU services included
multiple components, depending on available local resources and individual family needs; MAU was "no less resource‐intensive" than
MST (2013, p. 3). MST therapists had weekly group supervision. Interventions in both groups varied with family needs.

Outcomes Data on long‐term placements (3 months or longer in local authority care, incarceration, long‐term hospitalisation, or residential
schooling; the primary outcome in START I) were reported at 18 and 48 months post referral. Data on criminal convictions
(primary outcome in START II) were obtained from police computer records, with follow‐ups lasting 5 years. Data on secondary outcomes,
including youth antisocial behavior, attitudes, health and mental health symptoms (SDQ, ICUT, ABCL, SRD, ABAS, ASR, SMF,
ARQ, Conners, SF‐36, EQ‐5D‐3L), school participation, parent health and mental health (GHQ, EQ‐5D‐3L), parenting behavior
(APQ, Loeber), and family functioning (FACES‐IV, CTS2, LEE), were obtained at 0, 6, 12, 18, 24, 36, and 48 months post random
assignment. Additional data were used to analyse service and criminal justice costs and health economics. Reports include results
based on observed data, as well as multiple imputation of missing data.

Country UK

Funding START I: UK Department for Children, Schools and Families (now the Department for Education), UK Department of Health
START II: National Institute for Health Research (NIHR) Health Services and Delivery Research Program (2014‐2018, £1,080,728)

Notes Prospective registration of the study protocol in 2009. Plans for data sharing have been published (2020). Investigators planned to collect
data on school outcomes from a national database, but this information was not available to them. Economic analyses showed "higher
costs and poorer outcomes in the multisystemic therapy group than in the management as usual group" (2020, p. 427).

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence Low risk Randomisation occurred after consent and baseline assessment had been completed, using a secure
generation telephone randomisation service that ensured allocation concealment. A computer‐generated
(selection bias) adaptive minimisation algorithm was used to create random assignments with the following
stratification factors: centre, gender, age, and age at onset of conduct problems (2013, p. 9).

Allocation concealment Low risk Randomisation occurred after consent and baseline assessment had been completed, using a
(selection bias) secure telephone randomisation service that ensured allocation concealment (2013, p. 9).

Baseline equivalence Low risk Full reporting of group‐level data on >30 baseline characteristics (2018, p. 7); no baseline
differences greater than d = 0.19.

Performance bias Low risk MST and MAU groups received similar amounts of health, mental health, education, and other
(confounding) social services (2018a, p. 52).

Detection bias (blinding) Low risk Research assistants "are blind to treatment allocation" (2013, p. 9). "Investigators and objective
Administrative data assessors were strictly masked to treatment allocation… Masking was maintained through the
follow‐up period, with clinical and research staff located separately to avoid leakage of
information. All coding, data entry, and data cleaning were done by individuals masked to
treatment allocation" (2020, p. 422).

Detection bias (blinding) Low risk Research assistants "are blind to treatment allocation" (2013, p. 9). "Investigators and objective
Participant reports assessors were strictly masked to treatment allocation… Masking was maintained through the
follow‐up period, with clinical and research staff located separately to avoid leakage of
information. All coding, data entry, and data cleaning were done by individuals masked to
treatment allocation" (2020, p. 422).

Attrition bias Low risk 1% attrition at 6, 12, and 18 months post random assignment; 11% attrition at 60 months, with 3%
Administrative data differential attrition.

Attrition bias Unclear risk 15%‐20% attrition at 6 months, 24%–31% at 1 year, 22%–36% at 18 months, 30% at 2 years, 37%
Participant reports at 3 years, 49% at 4 years. Differential attrition < 10% at all points in time. At each point in
time, comparisons of responders and nonresponders show small to moderate differences
(d < 0.20). Multiple imputation was used to adjust results for missing data.

Intention to treat analysis Low risk There were no systematic exclusions. All analyses were performed with participants in the group
to which they were originally assigned.

Standardized observation Low risk Data were collected at baseline and 6, 12, 18, 24, 36, 48, and 60 months after random assignment.
periods

Validated outcome measures Unclear risk Use of standardised measures, some with αs < .7.

Selective reporting Low risk The study was prospectively registered and a detailed protocol is available. "All analyses, except where
noted, were pre‐specified" (2020, p. 423). All planned measures and endpoints were fully reported.

Conflicts of interest Low risk Authors declared no competing interests (2013, p. 16).

Glisson 2010

Methods Cluster‐randomised trial conducted 2003–2007 in 14 poor, rural Appalachian counties in Eastern Tennessee. Communities were
randomly assigned to an organisational and community intervention, called Availability, Responsiveness, and Continuity (ARC),
designed to support effective children's services. Within each community, court‐involved youth who had behavioural and
psychiatric problems were randomly assigned to MST versus “the usual array of intensive services” (2005, p. 249).

Participants 674 youth were randomly assigned to MST (n = 349) or usual services (n = 325); 59 cases were lost before baseline. n = 615 (2010).
Participants had been referred to the juvenile court for status offences or delinquent behaviour and met diagnostic criteria for
one or more mental disorders; 53% had two or more mental health diagnoses. Their average age was 14.9, 69% were male, 91%
were White (n = 615).

Interventions MST services lasted an average of 105 days (no difference between ARC and non‐ARC communities). MST therapists were employed
by a large, private mental‐health service organisation. Seven treatment teams served the 14 counties in the study. Therapists had
an average of 4 years of professional mental health services and 45% held Master's degrees. Therapists received a 5‐day
orientation, on‐site supervision, weekly consultation with a MST expert, and quarterly booster training sessions.
Usual services lasted an average of 187 days (with no difference between ARC and non‐ARC communities) and included inpatient
(24%), outpatient (90%), and family/parent‐focused (50%) mental health treatment provided by individual practitioners (43%),
mental health centers (41%), in‐home therapists (39%), and physicians (5%) (2010, p. 541).

Outcomes In interviews with research staff, caregivers were asked to rate youth behaviour problems on the CBCL at baseline and 6, 12, and 18
months after baseline. These interviews and monthly phone calls with caregivers were used to identify youth placements in out‐
of‐home state custody.

Country USA

Funding US National Institute of Mental Health, US National Institute on Drug Abuse, John D. & Catherine T. MacArthur Foundation

Notes

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence generation Unclear risk Methods of sequence generation are not entirely clear. "Assignment to MST or
(selection bias) usual services was determined by a predetermined, concealed randomisation
of sequence numbers based on the order of recruitment" (2010, p. 539).

Allocation concealment Low risk Following informed consent and initial assessment, "the mental health clinician
(selection bias) informed the research specialist who then contacted the data manager for
the random assignment to treatment condition and informed the families"
(2010, p. 539).

Baseline equivalence Unclear risk No group level data provided.

Performance bias (confounding) Unclear risk Usual services tended to last longer than MST, but there is no information on
amounts of service or contacts provided to the two groups.

Detection bias (blinding) Not applicable
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias Not applicable
Administrative data

Attrition bias Low risk 12% to 23% attrition overall, 2% to 5% differential attrition
Participant reports

Intention to treat analysis High risk Excluded 59 cases lost before the baseline assessment, cases that never
received treatment.

Standardized observation periods Low risk Observations at 6, 12, and 18 months, plus monthly phone contacts.

Validated outcome measures Low risk Use of CBCL scores (alphas .94 to .96 in this sample) and monthly telephone
surveys with caregivers to obtain data on children's living arrangements.

Selective reporting Unclear risk No public protocol is available. Reporting of results does not support meta‐
analysis (no means or SDs are available).

Conflicts of interest High risk Dr. Schoenwald is a board member and stakeholder of MST Services, the
organisation that licenses and disseminates MST.
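
The sample flow behind the intention-to-treat judgement for Glisson 2010 can be checked with simple arithmetic. The sketch below uses only the counts reported in the Participants row; the variable names are illustrative, and the percentage is our own calculation rather than a figure taken from the study reports.

```python
# Counts reported for Glisson 2010 in the Participants row.
ASSIGNED_MST = 349
ASSIGNED_USUAL = 325
LOST_BEFORE_BASELINE = 59

# Total randomised and the number left after pre-baseline losses.
randomised = ASSIGNED_MST + ASSIGNED_USUAL
analysed = randomised - LOST_BEFORE_BASELINE

# Share of randomised cases excluded from the analysis sample
# (our calculation; not reported in the table).
excluded_pct = 100.0 * LOST_BEFORE_BASELINE / randomised

print(f"Randomised: {randomised}")
print(f"Analysed after exclusions: {analysed}")
print(f"Excluded before baseline: {excluded_pct:.1f}%")
```

The totals agree with the 674 youth assigned and the n = 615 reported in 2010; the excluded share is a little under 9% of those randomised.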

Henggeler 1992

Methods The Family and Neighborhood Study (FANS) was conducted in Simpsonville, SC beginning in 1989. Families were referred by the
Department of Youth Services (DYS) in yoked pairs, with one youth "randomly selected" by the Department of Mental Health to
receive MST and the other to receive usual services (1992, p. 954). Research assistants collected data on psychosocial measures
at baseline and shortly after MST treatment ended. Archival data on arrests and incarceration were obtained at an average of 59
weeks and 120 weeks after referral.

Participants Conflicting reports on sample size. The 1992 report indicates that 96 youth were referred by DYS in (48) yoked pairs; 12 cases were
excluded because MST was never implemented, random assignment was violated, or archival data were not available (1992, p.
954). Subsequent reports refer only to 84 of the cases randomly assigned to treatments (1993, p. 287; 1996, p. 50). At referral,
participants were judged to be at imminent risk of out‐of‐home placement for recent, serious offences. The mean age at referral
was 15.2; 77% of participants were male; 56% were Black, 42% White, 2% Hispanic; 26% lived with neither biological parent.

Interventions MST was delivered by 3 Master's level therapists, who received 3 days of training plus 1‐hour weekly phone consultations with Dr.
Henggeler, in addition to weekly on‐site supervision, and one‐day booster training sessions every two months. Therapists had
caseloads of 4 families and were available 24/7. Sessions were frequent (sometimes daily), usually occurred in the family home,
and lasted 15‐90 minutes. Average duration of MST services was 13.4 weeks (range 5 to 23) including an average of 33 hours of
direct contact (SD 29).
Usual services (US) in juvenile justice included court orders (e.g., curfew, school attendance), monitoring and monthly meetings with
probation officers (POs), and passive referrals for other services. Meetings with POs emphasised compliance with court orders.
"Few substantive services were delivered because of the passive nature of traditional mental health services combined with
family difficulties and resistance" (1992, p. 955).

Outcomes Data on subsequent arrests and incarceration were available for 84 cases at an average of 59.6 weeks after referral (SD 25.4, range
16 to 97 weeks) and at approximately 120 weeks. Research assistants administered pre‐ and post‐treatment assessments in
family homes. Parents and youth completed standardised measures of psychiatric symptoms, delinquency, family functioning,
and peer relations. Post‐treatment assessments were conducted for both cases in the yoked pair shortly after MST services
ended for the MST case (unless the US case was incarcerated). A subsample of program completers (n = 56) completed post‐
treatment assessments.

Country USA

Funding US National Institute of Mental Health, US National Institute on Drug Abuse

Notes Yoked design was not retained, 12 cases were excluded, and another 28 cases were not included in post‐treatment assessments.
Variable observation period (16 to 97 weeks, mean = 59.6, SD = 25.4) after referral.

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence Unclear risk No information on sequence generation methods. Conflicting reports on random assignment.
generation From the 1992 report: "Eligible youth were referred by the DYS in yoked pairs, with one youth
(selection bias) randomly selected by the Department of Mental Health to receive MST and the other to
receive the usual services" (p. 954).

Allocation concealment Unclear risk No information on allocation concealment.
(selection bias)

Baseline equivalence Unclear risk Group‐level data on background characteristics were not provided.

Performance bias High risk MST therapists had Master's degrees, 1.5 years post‐Master's experience, 3 days of training,
(confounding) weekly phone consultations with Dr. Henggeler, weekly on‐site supervision, 1‐day booster
trainings every 2 months, and periodic feedback on case notes by the program director,
supervisor, and Dr. Henggeler (1992, p. 955). US cases received court‐ordered supervision,
monthly meetings with probation officer, and passive referrals for social services. "Few
substantive services were delivered" to the usual services group (1992, p. 955).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of research assistants, who administered informed consent and collected
Participant reports data in family homes.

Attrition bias Low risk 13% attrition, 4% differential attrition.
Administrative data

Attrition bias High risk 42% attrition with 21% differential response. Compared with completers (n = 56), noncompleters
Participant reports (n = 28) were more likely to be male (82% vs. 75%, d = 0.24), White (57% vs. 34%, d = .69), and
live in households with neither parent present (39% vs. 18%, d = .54) (see Table 6).

Intention to treat analysis High risk 12 youth were excluded after random assignment because they did not have a felony arrest (n = 2),
never received MST (n = 6), random assignment was violated (n = 2 cases randomly assigned to
usual services that were court‐ordered to receive MST), or recidivism data was not available
(n = 2) (1992, p. 954). Another 28 cases were not included in post‐treatment assessments.

Standardized observation High risk The first follow‐up occurred at an average of 59 weeks post referral, but included cases observed
periods as few as 16 weeks and those observed for 97 weeks (1992). There were no controls for
variable lengths of observation.

Validated outcome measures Unclear risk Outcome measures were developed and tested in other samples. No information was provided on
reliability of outcome measures in the study sample.

Selective reporting Unclear risk There is no public protocol for this study. Measures of self‐reported substance use were reported
in 1991, but not mentioned in 1992.

Conflicts of interest High risk Dr. Henggeler, Dr. Borduin, and Dr. Schoenwald were board members and shareholders of MST
Services, the organisation that licenses and disseminates MST. MST therapists who
participated in this study received training, weekly phone consultation, and feedback on their
case notes on study participants from Dr. Henggeler (1992, p. 955).
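
The overall attrition figures for Henggeler 1992 can be reproduced from the counts in the Participants and Outcomes rows. This is a minimal sketch that assumes the denominator is the 96 youth originally referred (our reading; the table does not state the denominator). The helper name `attrition_pct` is hypothetical.

```python
def attrition_pct(n_start: int, n_observed: int) -> float:
    """Overall attrition as a percentage of the starting sample."""
    if n_start <= 0 or not 0 <= n_observed <= n_start:
        raise ValueError("need n_start > 0 and 0 <= n_observed <= n_start")
    return 100.0 * (n_start - n_observed) / n_start


REFERRED = 96        # youth referred in 48 yoked pairs (1992 report)
WITH_ARCHIVAL = 84   # cases with arrest and incarceration data
COMPLETERS = 56      # cases with post-treatment assessments

# Assumed denominator: all referred youth (not stated in the table).
admin = attrition_pct(REFERRED, WITH_ARCHIVAL)
reports = attrition_pct(REFERRED, COMPLETERS)

print(f"Administrative data: {admin:.1f}% attrition")
print(f"Participant reports: {reports:.1f}% attrition")
```

Under that assumption the results round to the 13% and 42% shown in the risk of bias table. Differential attrition cannot be recomputed here because group-level counts are not given.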

Henggeler 1997

Methods Study conducted in two sites in South Carolina. Each site included three counties. One site was both rural and urban, and
predominantly White; the other was rural and predominantly Black. Referrals were from the state Department of Juvenile
Justice. Research assistants (RAs) were state employees, who randomly selected/assigned cases from lists of eligible participants
to MST or usual services (US). Most families were assigned to treatment conditions in yoked pairs, to match the timing of
posttests for both cases to the end of MST services in the MST case. RAs conducted pretest and posttest interviews with youth
and parents, and gathered data from schools and courts.

Participants 155 youth, ages 11 to 18 (mean age 15.2), who had committed a violent criminal offence or had 3 prior arrests. The sample included
73 yoked pairs plus 9 MST cases. 81.9% of youth were male; 80.6% were Black and 19.4% were White.

Interventions MST was provided by state‐employed, Master's level mental health professionals with backgrounds in social work or pastoral
counselling (1–15 years prior therapy experience). MST lasted an average of 4 months (2‐7 months in one site, 2‐9 months in the
other). MST therapists received a 6‐day training in MST, weekly individual supervision, weekly staff meetings, and quarterly
booster training sessions. Sessions were audiotaped and MST therapists kept detailed logs of daily activities.
Usual juvenile justice services included a minimum of 6 months probation. Probation officers visited youth at least once a month,
monitored their school attendance, and made referrals for other social services.

Outcomes Post‐treatment measures of youth and parent psychiatric symptoms, problem behaviours, self‐reported delinquency, parental
monitoring, family functioning, and peer relations, gathered in interviews with parents and youth shortly after MST services
ended. Data on criminal activity and incarceration were collected from a Department of Juvenile Justice (DJJ) database
approximately 1.7 years after the project ended. Offences were coded using a severity index.

Country USA

Funding US National Institute of Mental Health; Center for Mental Health Services, Substance Abuse and Mental Health Services
Administration

Notes Yoked design was not retained, 9 MST cases were not paired with US cases, 13 (MST?) cases were dropped due to violations of the
(MST?) treatment protocol. Outcome data were aggregated across sites. MST costs estimated at $4,000 USD per case.

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence High risk No information on the method of sequence generation. Conflicting information on the timing of
generation random selection/assignment, and whether this occurred before or after court approval and
(selection bias) informed consent.
From the 1994 report: "A list of all juveniles meeting the selection criteria was obtained from the
DJJ intake personnel in each county. Youth were randomly selected from this list and assigned
to receive either [MST] services or the usual DJJ services. Following random assignment…
family members…were asked to participate in the study. If they agreed, a member of the
project appeared in court with them and the DJJ probation officer to ask that the judge allow
the youth to be placed in the project." Informed consent, intake, and baseline assessments
were completed several days later, in a home visit (1994, p. 202).

From the 1997 report: "Youths were randomly assigned to a treatment condition only after the
court's consent" (1997, p. 822). "After we received written informed consent, the family was
randomly assigned to a treatment condition" (1997, p. 823).

Allocation concealment High risk No information on methods of allocation concealment. Discussion of recruitment of comparison cases:
(selection bias) "To further control for historical and related threats to validity, we temporarily yoked the families
assigned to receive MST services with families assigned to receive the usual services… Because…
rates of referral were very low at certain times… four of the families in the MST condition could
not be yoked to a comparison family. An additional five MST families were not yoked, either
because of a family's acceptance at the end of the project, when there remained insufficient time
to recruit a comparison family, or because of participant attrition" (1997, p. 822).

Baseline equivalence Unclear risk No group‐level data on demographic characteristics.

Performance bias High risk MST therapists were Master's level professionals with 1–15 years of therapy
(confounding) experience. They received an intensive 6‐day training program, quarterly booster
sessions, weekly individual supervision, and weekly staff meetings. MST therapists carried
small caseloads. Control cases were seen at least once a month by probation officers, who had
high caseloads.

Detection bias (blinding) High risk Research assistants were involved in treatment assignments and they collected data from courts,
Administrative data schools, and participants.

Detection bias (blinding) High risk Research assistants were involved in treatment assignments and they conducted interviews with
Participant reports parents and youth.

Attrition bias Low risk 0%–3% overall attrition, 0%–2% differential attrition.
Administrative data

Attrition bias High risk 11% attrition, 3% differential attrition. Dropouts were not included in post‐treatment
Participant reports assessments. Mothers of dropouts were better educated than mothers of completers
(F(1, 153) = 6.81, p < .02; 1997, p. 825; d = 0.71).

Intention to treat analysis High risk 13 MST cases were excluded because their therapist did not follow the treatment protocol (1994,
p. 203); these exclusions were not mentioned in subsequent reports. It is not clear why 9 MST
cases remained in the sample in addition to the yoked pairs. The study's yoked design was not
maintained when one member of the pair dropped out. Dropouts were not included in post‐
treatment interviews.

Standardized observation Unclear risk Administrative data were collected 1.7 years "after the project ended". No information on the
periods average length or range in lengths of observation periods for participants. Outcome rates were
annualized, but methods were not described.

Validated outcome measures Unclear risk Use of standardised measures, some with αs < .7. No information on reliability and validity of some
outcome measures.

Selective reporting Unclear risk There is no public protocol for this study. Missing data on severity of re‐arrests and valid ns.

Conflicts of interest High risk Dr. Henggeler is a board member and shareholder of MST Services, the organisation that licenses
and disseminates MST.

Henggeler 1999a

Methods Random assignment to MST or usual services for juvenile offenders with substance abuse problems in Charleston County, South
Carolina. Data collection at baseline, post treatment, 6 months, 1 year, and 4 years post treatment, including parent and youth
reports, biologic tests for substance use, and administrative records on arrests and placements.

Participants Participants were 118 juvenile offenders, who were on probation at the time of referral and met diagnostic criteria for substance
abuse (54%) or dependence (46%). Most youth (72%) also met diagnostic criteria for another mental disorder. Youth were ages
12–17 (mean 15.7 years) with an average of 2.9 prior arrests; 79% were male; 50% were Black, 47% White. Four‐year follow‐up
data were available for 80 cases (68%).

Interventions MST was delivered by Master's and Bachelor's level mental health counsellors. Medication management was provided by a team
child psychiatrist. Families received an average of 40 hours of direct contact (SD 28, range 12 to 187) over a period of 130 days
(SD 32, range 61 to 252). Contacts took place in the families' homes (64%), on the phone (19%), in the office (3%), in the youths'
schools (3%), in cars (3%), and in other community locations (8%). In addition, therapists made an average of 26 indirect contacts
per case, lasting an average of 15 minutes each (for a total of approximately 6 hours of indirect contact per case). Therapists
received 40 hours of initial training, weekly 1.5 hour group supervision sessions with the child psychiatrist, and periodic review of
cases and therapist interventions by the principal investigator.

Usual services included referral by a probation officer to outpatient substance abuse services. However, 78% (47) of the families in
the control group received no substance abuse treatment or mental health services in the five months after referral; 7% received
mental health services only, 10% received substance abuse services only, and 5% received both mental health and substance
abuse services.

Outcomes Outcome data were obtained immediately after the end of treatment and approximately 6, 12, and 48 months later. Data on
substance use came from youth self reports (PEI) and biologic tests (urine and hair samples). Standardised measures were used
to gather information on self‐reported delinquency, youth's internalizing and externalizing behaviour problems (CBCL), parents'
symptoms (GSI), parental monitoring, and family functioning (FACES‐III). Data on arrests were obtained from computerised
records maintained by the Department of Juvenile Justice (DJJ). Information on school attendance, service use, and out‐of‐home
placements was gathered in monthly telephone interviews with caregivers; correctional placements were confirmed with the
DJJ. Placements included "detention centers, jails, psychiatric or substance abuse hospitals, and residential treatment centers"
(Henggeler et al., 1999, p. 174).

Country USA

Funding US National Institute on Drug Abuse

Notes Results for CBCL scales, measures of parent and family functioning, and outcomes at 1 year follow‐up were not reported. Total
service utilisation costs per case (including costs of out‐of‐home placements) were estimated at $6,027 USD for MST and $5,150
for control cases (Schoenwald et al., 1996).

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence Unclear risk Random assignment was mentioned but methods were not described.
generation
(selection bias)

Allocation concealment Unclear risk Random assignment was mentioned but methods were not described.
(selection bias)

Baseline equivalence High risk Based on self‐reports at baseline, youth in the MST group had significantly higher rates of alcohol/
marijuana use (d = 0.40) and other drug use (d = 0.55) than control cases. In the 4‐year follow‐
up sample, MST cases were older (d = 0.44) and reported more marijuana use (d = 0.49) than
control cases.

Performance bias High risk Youth and families in the usual services group "received few substance abuse or mental health
(confounding) services during the first 5 months following recruitment into the project, a period of time
comparable to the treatment period of their MST counterparts" (Henggeler et al., 1999, p. 175).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias Unclear risk Low risk of bias in 1st year after referral; administrative data were available for all cases. High risk
Administrative data of bias at 4‐year follow‐up, with 32% attrition overall and 12% differential attrition.

Attrition bias Unclear risk Low risk in the 1st year after referral with 8% attrition and 3% differential attrition. High risk of
Participant reports bias at 4‐year follow‐up, with 32% attrition overall and 12% differential attrition.

Intention to treat analysis Low risk No systematic exclusion of dropouts or refusers.

Standardized observation High risk Post‐treatment assessments occurred after MST services ended, ranging from 61 to 252 days
periods after referral. Follow‐ups were timed to occur at 6 months, 1 year, and 4 years after MST
services ended. Not clear whether/how observation periods differed for MST versus control
cases.

Validated outcome measures Unclear risk Use of standardised and unstandardized measures (e.g., self‐reported drug use at 4 years).

Selective reporting High risk There is no public protocol for this study. Data collected but not reported for the post‐treatment
and 6 month follow‐up include: youth social competence (CBCL youth and parent reports;
Brown et al., 1999), parent reports of problem behaviour (RPBC), caregiver symptoms (SCL‐
90‐R GSI), Dyadic Adjustment Scale, FACES‐III adaptability and cohesion (parent and youth
reports) and parental monitoring (Cunningham et al., 1999, Huey et al., 2000). One year follow‐
up (T4) data (Henggeler et al., 2002) were not reported.

Conflicts of interest High risk Dr. Henggeler is a board member and shareholder of MST Services, the organisation that licenses
and disseminates MST. During this study, Dr. Henggeler conducted periodic reviews of MST
cases and MST therapists' interventions.

Henggeler 1999b

Methods Random assignment of pairs of cases, one to MST and one to hospitalisation, following referral for emergency psychiatric care and
informed consent in Charleston County, SC, 1995‐1999. Allocation decisions were in sealed envelopes opened by crisis
caseworkers in hospital. Psychosocial assessments were conducted at baseline (T1), after the hospitalised youth was released
(about 2 weeks, T2), at completion of MST (about 4 months, T3), at 10 months (T4), 16 months (T5), 22 months (T6) and 30
months (T7) post recruitment (Rowland 2004, Huey et al., 2004).

Participants 160 youth ages 10‐17 (average 12.9 years) with psychiatric illness severe enough to warrant hospitalisation. Primary presenting
problems included suicidal ideation (38%), homicidal ideation (17%), psychosis (8%), and threat of harm to self or others (37%).
Participants were residents of Charleston County, SC, in noninstitutional placements. They were referred by schools, emergency
rooms, shelters, the justice department, social services, and mental health service providers. 65% were male; 65% Black, 33%
White; 70% of families were receiving public assistance; and 50% were single parent households. Early reports included the first
116 youth enrolled in this study.

Interventions MST with additional clinical staff (psychiatrist, crisis caseworker) and pharmacological interventions. Therapists received daily
supervision at the beginning of the project, and thrice weekly supervision later. Caseloads were 3 families per therapist. Average
duration was 127 days (SD 32) with an average of 92 hours of clinical service. 49% of youth were hospitalised at some point
during MST services (2003), with an average length of stay of 3.8 days. MST staff maintained clinical responsibility for youth
during these hospital stays and isolated them from inpatient group and recreational activities (1999).
Psychiatric hospitalisation was provided in the Youth Division Inpatient Psychiatric Unit at the Medical University of South Carolina.
The unit had a milieu therapy program for behavioural modification. Youth were served by multidisciplinary teams including
psychiatrists, a social worker, a special education teacher, and nursing staff. These teams met 5 days a week to integrate
treatment plans. After discharge, youth were matched with community mental health providers.
Similar psychotropic medication use (type and frequency) in the two groups.

Outcomes Research staff administered instruments to youth and caregivers in separate sessions in the home or hospital or by phone. Outcomes
included measures of adolescent psychiatric symptoms (GSI of BSI), internalizing and externalizing behaviour problems (CBCL),
self‐reported substance use, self‐esteem, social functioning (FFS), parent symptoms (BSI), parental control, and family functioning
(FACES III). Archival data were obtained to assess subsequent arrests. Data on school attendance, service use, and out‐of‐home
placements were gathered in monthly phone calls or home visits with caregivers.
"Out‐of‐home placement included foster care, therapeutic foster care, shelters, orphanages, group homes, residential treatment
centers, psychiatric or substance abuse hospitals, detention centers, boot camps, reception and evaluation centers, jails, and
prisons" (Schoenwald et al., 2000, p. 7).

Country USA

Funding US National Institute of Mental Health, National Institute on Drug Abuse, Agency for Healthcare Research and Quality

Notes Two control cases dropped out immediately after random assignment; one MST case did not complete baseline assessment. These
cases were excluded from analysis and are not mentioned in some reports (e.g., Huey et al., 2004).
There were significant between‐group differences at baseline on youth symptoms, externalizing behaviours, out‐of‐home placement,
and regular school placement (Henggeler et al., 2003, p. 546).
No data were reported for outcomes at 10 months (T4); incomplete data (no means or SDs) at 16 and 22 month (T5 and T6).
Program costs: MST $5,954 USD per youth, hospitalisation $6,174 per youth; including incremental costs (other placements), total
costs: $8,017 for MST and $7,878 for the hospitalisation group.
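
The cost figures in the Notes above can be decomposed with simple arithmetic. The following is a minimal sketch, assuming that "incremental costs" equal total costs minus programme costs in each arm; the variable names are illustrative and do not come from the study reports.

```python
# Figures quoted in the Notes for Henggeler 1999b (USD per youth).
program_cost = {"MST": 5954, "hospitalisation": 6174}
total_cost = {"MST": 8017, "hospitalisation": 7878}

# Assumption: incremental cost (other placements) = total cost - programme cost.
incremental_cost = {arm: total_cost[arm] - program_cost[arm] for arm in program_cost}

for arm in program_cost:
    print(f"{arm}: programme ${program_cost[arm]:,}, "
          f"incremental ${incremental_cost[arm]:,}, total ${total_cost[arm]:,}")

# Positive value means MST cost more per youth once other placements are counted.
total_difference = total_cost["MST"] - total_cost["hospitalisation"]
print(f"Total cost difference (MST minus hospitalisation): ${total_difference:,}")
```

Under this assumption, the lower programme cost of MST is offset by higher incremental placement costs, which is why the total for MST exceeds the total for hospitalisation.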

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Random assignment was performed using sealed envelopes and yoked pairs, but it is not clear
generation how the random sequence was generated (see Henggeler et al., 1999, p. 1332).
(selection bias)

Allocation concealment Unclear risk If eligibility criteria were met and families consented, "the crisis caseworker opened a sealed
(selection bias) envelope that informed the family of condition to which they were assigned" (1999, p. 1332).
Unclear if envelopes were sequentially numbered or opaque. "[P]airs of cases (one MST, one
hospitalisation) were yoked regarding the timing of assessments" (1999, p. 1332). Unclear
whether workers knew assignment of one case before its pair was assigned, and if the second
assignment could be foreseen.

Baseline equivalence High risk At baseline, youth in the MST group had fewer self‐reported symptoms on the GSI (d = 0.30), more
caregiver‐reported externalising behaviour problems (d ~ 0.23), more out‐of‐home placements
(d = 0.64), and more regular school placements (d = 0.32) (2003, p. 546).

Performance bias Unclear risk Both groups received substantial amounts of treatment, but there were no comparable measures
(confounding) of amounts of treatment or contacts.

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias Low risk 3% attrition at 4 and 16 months, with 3% differential attrition; 6% attrition at 22 months with 2%
Administrative data differential attrition.

Attrition bias Low risk 3% attrition at 4 and 16 months, with 3% differential attrition; 6% attrition at 22 months with 2%
Participant reports differential attrition.

Intention to treat analysis High risk Dropouts were excluded; refusers were excluded from analyses of administrative data.

Standardized observation Low risk Observations at baseline, 2 weeks, ~4 months (end of MST), 10, 16, 22, and 30 months after
periods referral. Timing of post‐treatment assessments was matched in yoked pairs.

Validated outcome measures Unclear risk Use of standardised scales, selected items or subscales (2004), some with αs < .7.

Selective reporting High risk Full reporting (group means and standard deviations) provided for T1, T2, T3 (some significant
differences), but not for T4, T5, or T6 (no significant treatment effects at T5 or T6). No report
was available for T7.

Conflicts of interest High risk Dr. Henggeler and Dr. Rowland are board members and shareholders of MST Services, which
licenses and disseminates MST.

Henggeler 2006

Methods Random assignment to 4 treatment conditions: family court, drug court, drug court plus MST, and drug court plus MST plus
contingency management. Only 2 treatment groups are relevant for this review: drug court (DC) alone versus DC plus MST
(DC + MST). Participants were enrolled in the study between 2000 and 2003 in Charleston County, SC. Data were collected from
youth and caregivers at baseline, 4, 12, 18, 24, 36, 48, and 60 months post baseline. Data on arrests and out‐of‐home placements
were obtained from criminal justice records. Available reports on treatment effects are limited to outcomes within 12 months of
referral (results for months 18 through 60 are not available).

Participants In this study of 161 youth, 38 youth were assigned to DC and 38 to DC + MST. Youth who met DSM‐IV diagnostic criteria for alcohol
or drug abuse or dependence were recruited from the Department of Juvenile Justice (DJJ) in Charleston County, SC.
Participants were 12–17 years old (mean age of 15.2); 84% were male; 67% were Black, 31% White, and 2% were biracial.

Interventions Participants in both groups appeared before a drug court judge once a week for monitoring of drug use with urine screens.
Participants in the DC + MST group received an average of 66 hours (SD 32) of direct or indirect contact over 4 months followed by
approximately 2 hours of contact with family members per month for the next 8 months (or for the remainder of the time the
young person was in drug court). MST therapists held Master's degrees and had an average of 5 years of postgraduate treatment
experience.
Participants in the drug court (DC) group received outpatient substance abuse services from the local center of the state's substance
abuse commission. Therapists employed by this center had an average of 10 years of service experience and most held Master's
degrees. "Youths in the DC condition received fewer hours of service than did counterparts in the MST conditions" (2006, p. 48).
Youth in both conditions were supervised by probation or parole officers, involving a minimum of 2 hours of contact per month for
1 year.

Outcomes Outcome measures included youths' self‐reported drug and alcohol use (Form 90), delinquency (SRD), urine drug screens,
externalizing and internalizing problems (CBCL YSR), measures of family functioning, peer relations, and school attendance
(Henggeler & Brady 2008). Administrative data on arrests (from juvenile and adult court records) and days in out‐of‐home
placements were supplemented by information gathered from caregivers in monthly telephone interviews. Composite measures
of out‐of‐home placements included foster care, group homes, residential treatment centers, juvenile justice facilities, and mental
health or substance abuse inpatient facilities (2006, p. 25).

Country USA

Funding US National Institute on Alcohol Abuse and Alcoholism, US Substance Abuse and Mental Health Services Administration/Center for
Substance Abuse Treatment, US National Institute on Drug Abuse ($3,986,904, 1999‐2008), and US Agency for Healthcare
Research & Quality

Notes Outcome data are available at 4 and 12 months only, and only for drug/alcohol use, caregiver CBCL total scores, arrests, and
placements. Data were collected but not reported on family functioning, peer relations, and school attendance outcomes; and on
all primary outcomes at 18, 24, 36, 48, and 60 months postbaseline. Later reports examined self‐reported drug use and
delinquency among siblings of focal youth at 18 months after referral (Rowland et al., 2008), predictors of nonresponse to
juvenile drug court interventions (Halliday‐Boykins et al., 2010), predictive validity of an observer‐rated adherence protocol
(Gillespie et al., 2017), and ethnic differences in resistance in MST (Sayegh et al., 2019).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Sequence generation methods are not mentioned.
generation
(selection bias)

Allocation concealment Unclear risk After obtaining informed consent, a researcher opened a sealed envelope and informed the family
(selection bias) of their treatment assignment (2006, p. 43). Unclear whether envelopes were numbered or
opaque.

Baseline equivalence High risk At baseline participants in the MST + DC group reported more status offences (d = 0.38), more
crimes against persons (d = 0.36), more marijuana use (d = 0.25) and somewhat more polydrug
use (d = 0.12), but they had fewer prior arrests (3.2 vs. 4.1, d = 0.34), were less likely to have
received mental health treatment in the past (34% vs. 42%, d = 0.18), and their families had
lower median incomes ($15,921 versus $19,474) compared with the DC group at baseline
(2006, 2009).

Performance bias High risk Investigators noted the unreliability of records of DC services, but recorded impressions that
(confounding) "youths in the DC condition received fewer hours of service than did counterparts in the MST
conditions" (2006, p. 48).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Participant reports

Attrition bias Low risk No attrition; archival data were available on all cases.
Administrative data

Attrition bias Low risk Attrition ranged from 21% to 25%, differential attrition was 3% to 5%.
Participant reports

Intention to treat analysis Low risk "In no cases were available data excluded from the analyses…because a participant dropped out of
treatment, failed to complete a specified number of sessions, or did not otherwise collaborate
with the requirements of the treatment condition to which he or she was assigned" (2006,
p. 46).

Standardized observation Low risk Data were collected at baseline and at 4, 12, 18, 24, 36, 48, and 60 months after baseline.
periods

Validated outcome measures Low risk Use of standardized measures, biologic tests, and archival data.

Selective reporting High risk Funded proposals included plans for follow‐ups at 18 months, 2, 3, 4, and 5 years post recruitment
(Henggeler & Brady 2008). Outcome data are available for 4 and 12 months post recruitment.
Data on a subsample of participants' siblings are available at 18 months (Rowland et al., 2008).
No public reports on main effects at 18 months, 2, 3, 4, or 5 years post recruitment.

Conflicts of interest High risk Dr. Henggeler is a board member and shareholder of MST Services, which licenses and
disseminates MST.

Leschied 2002

Methods Random assignment to MST or usual services in 4 sites in Ontario from 1997 to 2001. Data on criminal justice outcomes were
obtained at 6 months, 1, 2, and 3 years after the end of treatment; data on psychosocial outcomes were collected before and
after treatment by MST therapists using structured instruments. Site‐level data on study participants and post‐treatment
convictions are available.

Participants 409 juvenile offenders, ages 10 to 18 (average age 14.6 years). Two sites received referrals only from probation; 2 sites received
referrals from probation and community agencies. Overall sample was 74% male and 13% Aboriginal.

Interventions MST services included an average of 34 sessions over 4.9 months in Simcoe County; 53 sessions over 5.4 months in Ottawa. Use of the
Therapist Adherence Measure (TAM) suggested that "fidelity to the model was generally achieved" (2002, pp. 17, 123). Usual services
included case management plans developed by a probation officer and interventions with therapeutic components.

Outcomes Data on prosecutions, convictions, and incarceration were obtained from the Canadian Police Information System (a national
database) at 6 months, 1, 2, and 3 years post discharge. Pre‐ and post‐treatment data on youth attitudes, behaviours, and social
skills; parental supervision and symptoms; and family functioning were obtained in structured interviews conducted by MST
therapists (for both MST and TAU cases).

Country Canada

Funding Canadian Department of Justice, National Crime Prevention Centre

Notes Adherence to MST (TAM scores) was not associated with outcomes in the sample as a whole (2002, p. 125). Estimated costs of MST
were $6,000 to $7,000 (CDN) per case under nonresearch conditions. The government of Ontario paid MST Services Inc
$206,000 CDN for training and supervision over the first 2 years, followed by licensing fees of $6,000 USD per site ($24,000
USD) per year, plus costs of booster training sessions (2002, p. 125).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Random assignment was mentioned but methods were not described in detail.
generation
(selection bias)

Allocation concealment Unclear risk Random assignment was mentioned but methods were not described in detail.
(selection bias)

Baseline equivalence Unclear risk Insufficient information on group characteristics at baseline to compute Cohen's d.

Performance bias Unclear risk No information on amounts of treatment or contact provided to participants in the two groups.
(confounding)

Detection bias (blinding) Unclear risk Police records were accessed by researchers. No discussion of blinding.
Administrative data

Detection bias (blinding) High risk MST therapists were present when random assignment occurred. Psychosocial data were
Participant reports collected by MST therapists from participants in both groups.

Attrition bias Unclear risk No missing data on main outcomes at 1 year follow‐up. High attrition in 2 and 3 year follow‐ups
Administrative data (42% and 72% missing) with < 2% differential attrition.

Attrition bias High risk Post‐treatment response rates for psychosocial data were higher for MST (71%) versus TAU (52%)
Participant reports (2002, p. 167; d = 0.45).

Intention to treat analysis Low risk No systematic exclusion of dropouts or refusers.

Standardized observation Unclear risk Psychosocial data were obtained at discharge (unclear if there were differences in the timing of
periods assessments for MST and control cases). Police records were assessed at 6 months, 1, 2, and 3
years post‐treatment.

Validated outcome measures Unclear risk Use of administrative data, some standardised scales (FACES II), and some measures with
unknown reliability/validity.

Selective reporting Unclear risk There is no public protocol for this study. No evidence of selective reporting.

Conflicts of interest Unclear risk There is no conflict of interest statement. Authors were not associated with MST or comparison
treatments.
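
The attrition judgement above reports response rates of 71% (MST) versus 52% (TAU) together with d = 0.45. This section does not state how d was derived from two percentages. The sketch below assumes the logit transformation (log odds ratio multiplied by sqrt(3)/pi), which happens to reproduce the reported value; the function name is illustrative, and the choice of method is an assumption rather than the reviewers' documented procedure.

```python
import math


def d_from_proportions(p1: float, p2: float) -> float:
    """Standardised difference between two proportions via the logit method.

    Assumption: d = ln(odds ratio) * sqrt(3) / pi. This is one common
    conversion; it is not confirmed as the method used in the review.
    """
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Post-treatment response rates reported for Leschied 2002.
mst_response, tau_response = 0.71, 0.52

# Differential response (equivalently, differential attrition) in proportion units.
differential = abs(mst_response - tau_response)
d = d_from_proportions(mst_response, tau_response)

print(f"Differential response: {differential * 100:.0f} percentage points")
print(f"Standardised difference d: {d:.2f}")
```

A 19 percentage point gap in response rates corresponds to a standardised difference of roughly 0.45 under this conversion, which matches the value cited in support of the high-risk judgement.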

Letourneau 2009

Methods Random assignment occurred from 2004 to 2006, and was stratified based on age of victim (child or peer/older youth). Assessments
were conducted at 0, 6, 12, 18, and 24 months post recruitment, in addition to use of administrative data on arrests, and monthly
calls with caregivers to assess service use and out‐of‐home placements.

Participants As part of their probation requirements or a diversion agreement, 131 youth were referred for sex offender treatment in Cook
County, IL, an urban setting. Mean age = 14.6 years; 97.6% male; 44% Caucasian, 54% Black, 31% Hispanic. Four families were
excluded after randomisation: 2 because they were not assigned to the intervention they wanted (MST), 2 because youth had
degenerative brain disorders.

Interventions MST adapted for problem sexual behaviour (PSB) lasted an average of 7.1 months. MST therapists had caseloads of 4‐6 families. PSB
adaptations addressed youth and caregiver denial (and shame) about the sexual offence, the safety of potential victims, and
promotion of age‐appropriate and normative social experiences with peers.
Treatment as usual for juvenile sex offenders (TAU‐JSO) lasted an average of 12.5 months and included weekly hour‐long group
sessions of approximately 8‐10 youth, and referrals for other services. TAU cases had the option of paying for private therapy
instead of participating in TAU‐JSO groups, and 5 families chose to do this.

Outcomes Subsequent arrests, adolescent mental health (internalizing and externalizing behaviors, sexual interests), self‐reported delinquency,
substance use, parenting (discipline, supervision, communication), peer relations, and out‐of‐home placements (including
incarceration, residential treatment, and foster care).

Country USA

Funding US National Institute of Mental Health (NIMH)

Notes 2 year follow‐up results were reported. 10 year follow‐up data are not available (Sheerin 2017).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence generation Unclear risk Randomisation was mentioned, but the sequence generation process was not described.
(selection bias)

Allocation concealment Unclear risk Use of sealed envelopes and separate "randomisation lists" for different blocks of cases; not
(selection bias) clear whether envelopes were opaque, whether they were numbered, whether they were
used sequentially, or how they were used in relation to the randomisation lists (2009, p. 91).

Baseline equivalence Unclear risk No group‐level data were provided on demographic characteristics or prior history (insufficient
information for calculation of d).

Performance bias Unclear risk TAU‐JSO lasted longer than MST, but there was no comparable information on amounts of
(confounding) treatment or contacts provided to the two groups.

Detection bias (blinding) High risk "Data collection was not blind" (2013, p. 3).
Administrative data

Detection bias (blinding) High risk "Data collection was not blind" (2013, p. 3).
Participant reports

Attrition bias Low risk Overall attrition was 5%, with between‐group differences in response rates of 5%.
Administrative data

Attrition bias Low risk Overall attrition was 5% to 11%, with between‐group differences in response rates of 4%
Participant reports to 12%.

Intention to treat analysis High risk Described as an intent‐to‐treat analysis, but the study excluded two families who were randomly
assigned to the control group and withdrew "upon learning they were not randomized to
their desired intervention" (Letourneau et al., 2009, p. 91).

Standardized observation Low risk Data collected at 0, 6, 12, 18, and 24 months post recruitment.
periods

Validated outcome measures Unclear risk Cronbach's α was below .6 for some measures, above .7 for most measures.

Selective reporting High risk There is no public protocol for this study. Outcomes at 18 to 24 months were reported only for
outcomes that favored MST at 12 months (Letourneau et al., 2013). Outcomes for 10.2 year
follow‐up are not available and the abstract states that "MST‐PSB was no more effective
than TAU on most criminal and noncriminal outcomes" (Sheerin 2017).

Conflicts of interest High risk "Scott W. Henggeler is a board member and stockholder of MST Services LLC, the Medical
University of South Carolina‐licensed organisation that provides training in MST. Charles M.
Borduin is a board member of MST Associates, LLC, the organisation that provides training in
MST for youth with problem sexual behaviors" (2013, p. 10).

Miller 1998

Methods Random assignment of juvenile offenders in the State of Delaware to MST or secure (Level IV) out‐of‐home placements, 1995 to
1997. Criminal justice outcomes were extracted from the state criminal justice information system 18 months after the last
referral.

Participants Serious juvenile offenders (n = 54) eligible for secure out‐of‐home placements. 83% were male; with 61% Black, 32% White, and 7%
Hispanic. Although differences are not statistically significant, there were more Black youth in the MST group (67% vs.
54%, d = 0.22).

Interventions MST services lasted an average of 5.7 months. In order to receive MST instead of secure residential placements, youth in the MST
group had their sentences reduced to Level III (probationary) supervision.
Control cases were sent to secure (Level IV) facilities; by the end of the study all but one youth in the control group had been
released from custody.

Outcomes Data on recidivism (misdemeanor or felony charges or convictions) were obtained from the state's criminal justice information
system; data on service and placement costs were obtained from state contracts.

Country USA

Funding State of Delaware Department of Services for Children, Youth and Their Families, Division of Youth Rehabilitative Services

Notes Two weeks of intensive training and weekly phone consultation were provided by staff at the MUSC (1998, p. 2). With minimal
treatment fidelity and 100% clinical staff turnover in first 2 years, quality assurance efforts were "redoubled" (Henggeler 2002a,
pp. 210‐215). Average costs of MST services were $11,513 USD per case, compared with average costs of $25,850 USD for
secure placements; however, almost 1/3 (9) of MST cases required secure placements following MST services; when those costs
are included, the average cost is $17,388 per MST case.

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk Random assignment was conducted with sealed envelopes. Sequence generation methods were
generation not described.
(selection bias)

Allocation concealment Unclear risk Random assignment with sealed envelopes; opacity and sequential numbering of envelopes were
(selection bias) not mentioned.

Baseline equivalence High risk The MST group had relatively more Blacks (67% vs. 54%, d = 0.29), fewer Whites (27% vs. 38%,
d = 0.28), and fewer males (80% vs. 88%, d = 0.31) than the control group.

Performance bias Unclear risk No information on amounts of treatment or contacts provided to participants in the two groups.
(confounding)

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.
Administrative data

Detection bias (blinding) Not applicable
Participant reports

Attrition bias Low risk 0% to 2% overall attrition with 0% to 3% differential attrition.
Administrative data

Attrition bias Not applicable
Participant reports

Intention to treat analysis Unclear risk Analysis includes both program completers and dropouts. Unclear whether analysis includes 3
cases whose assignments to MST were overturned by juvenile officers.

Standardized observation Unclear risk Standardised observations for 1‐year follow‐up; variable observations for 18 month follow‐up.
periods

Validated outcome measures Low risk Reliance on computerised criminal justice records.

Selective reporting Unclear risk No public protocol is available. No evidence of selective reporting.

Conflicts of interest Unclear risk No conflict of interest statement. No known conflicts.

Ogden 2004

Methods Initially, 100 families of youth with behavioural and mental problems were randomly assigned to MST or usual child welfare services
in 4 sites in Norway. One site was replaced with another site; 4 families dropped out of MST and were replaced with 4 new MST
cases (2004). Data were collected from youth, caregivers, and teachers at intake, 6 months, and 2 years.
One of the remaining four sites was dropped from the study after analysis of data showed that this site had the "poorest results" at 6
and 24 months (2005, p. 27). This omission was not mentioned in the follow‐up report, where the study was described as a
randomised experiment with 3 sites and 75 cases (2006, p. 143).
The 2nd year follow‐up was based on 69 (66%) of the original 104 cases. Teacher reports were obtained for 29 cases (28%) at the 2‐
year follow‐up.

Participants 104 families of youth ages 12‐17 (average age of 15 years) with multiple behavioural and mental health problems, including
emotional disturbance (64%), substance abuse (50%), criminal offences (37%), status offences (53%), harm to self or other (36%),
victim or perpetrator in domestic violence (29%), school expulsion (6%), after care from a residential treatment centre or
incarceration (6%), and abuse or neglect (4%). Participants were referred from municipal Child Welfare services to specialist
services for serious behaviour problems; 15% of youth were not living at home at referral (no information on what proportion
were in MST vs. usual services), 63% of youth were male, and 95% were of Norwegian descent.

Interventions MST services were provided by therapists who had the equivalent of a bachelor's or Master's degree. Caseloads were 3‐6 families
per therapist. Services lasted an average of 24.3 weeks (range of 7 to 38 weeks) in 3 of the 4 sites.
Usual child welfare services included institutional placements (in 37% of the cases), crisis placements (13%), in‐home supervision by
a social worker (16%), or other home‐based treatment (18%). No information was provided on amounts or duration of services
provided to the cases in the control group. Services were refused by 16% of these cases.

Outcomes Outcomes included young people's internalizing and externalizing behaviour problems (CBCL), self reported delinquency (SRD),
social competence (SCPQ), social skills (SSRS), family functioning (FACES‐III), and out‐of‐home placement (including placements
in foster care, institutions, or hospitals).
Data on out‐of‐home placements were based on parent responses to two questions: where the youth was living 1) at the time of the
assessment and 2) during most of the previous 6 months. Data on placements at 6 months excluded cases that were placed at
intake.
Archival data on arrests are not available, because Norway does not arrest youth who are under 15 years of age and does not
prosecute youth under the age of 18.

Country Norway

Funding Norwegian Ministry of Child and Family Affairs

Notes Of the 100 families randomly assigned, 4 MST cases withdrew and were "replaced" by 4 new MST families (the first 4 cases were
omitted from denominators in reported retention and response rates); another 4 cases dropped out before the 6 month
assessment (1 MST, 3 usual services cases). One site and its 25 cases were omitted from the 2006 report after analysis of data
showed that it had the "poorest MST outcomes" (2005; 2008, p. 24). The principal author declined to share aggregated data on
the 4th site, on the grounds that results might be "misinterpreted" (T. Ogden, personal communication, 24 January 2007).
Family functioning, social competence, and social skills outcomes were not reported in the 2nd year follow‐up. Self‐reported
delinquency outcomes were not reported at 6 months.
A 2007 report included only 2 of the 4 original sites plus 55 new, nonrandomised cases.
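
The retention and missing-data percentages quoted for this study follow directly from the case counts given in the Methods. The sketch below recomputes them, assuming that all percentages use the 104 original cases as the denominator; the helper name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Counts reported for Ogden 2004.
original_cases = 104       # 100 randomised families plus 4 replacement MST cases
followed_up_2yr = 69       # cases in the 2nd year follow-up
teacher_reports_2yr = 29   # cases with teacher reports at 2 years

retention_2yr = percent(followed_up_2yr, original_cases)
attrition_2yr = 100 - retention_2yr
teacher_response = percent(teacher_reports_2yr, original_cases)
teacher_missing = 100 - teacher_response

print(f"2-year retention: {retention_2yr:.0f}% (attrition {attrition_2yr:.0f}%)")
print(f"Teacher reports: {teacher_response:.0f}% available, {teacher_missing:.0f}% missing")
```

These values agree with the 66% retention, 28% teacher response, and 72% missing teacher data reported for the 2-year follow-up.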

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence High risk Random assignment was mentioned, but methods of sequence generation were not described.
generation Four MST cases were 'replaced' with new MST cases.
(selection bias)

Allocation concealment High risk Allocation concealment methods were not described. Unconcealed allocation of 'replaced' cases.
(selection bias)

Baseline equivalence High risk MST caregivers were less likely to be divorced and more likely to be married to someone other
than the child's biological parent (p = .01; 2004, p. 81).

Performance bias Unclear risk No information provided on amounts or duration of services provided to cases in the control
(confounding) group.

Detection bias (blinding) Not applicable


Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Participant reports

Attrition bias Not applicable


Administrative data

Attrition bias High risk 28%‐37% attrition, with 2%‐7% differential attrition in reports from youth and caregivers at the 2‐
Participant reports year follow‐up. Teacher reports were available for only 29 of 104 cases (72% missing data).

Intention to treat analysis High risk Exclusion of dropouts and refusers. Replacement of some cases. Exclusion of one site and its 25 cases
after analysis of data showed that this site had the "poorest MST outcomes" (2008, p. 24).

Standardized observation Unclear risk Outcomes were assessed at approximately 6 and 24 months after referral.
periods

Validated outcome measures Unclear risk Use of standardised measures, some with αs < .7 in the study sample. Researchers combined data
from the CBCL social competence scale, the SCPQ, and 10 items from the SSRS scale, but
provided no information on the internal consistency of this composite measure.

Selective reporting High risk No public protocol is available for this study. One site was dropped from the study after outcomes
were known. Six‐month placement data excluded cases with placements at intake. Data on
family functioning (FACES‐III) and youth social competence (3 scales) were reported at 6
months, but not reported at 24 months. Self‐reported delinquency was reported at 24 months,
but not at 6 months.

Conflicts of interest Unclear risk No conflict of interest statement was provided by the authors. Coauthor, Dr. Halliday‐Boykins, is
an associate of MST program developers at the Family Services Research Center at the
Medical University of South Carolina.

Rowland 2005

Methods From 2000 to 2001 youth age 9‐17 with serious mental health problems were randomly assigned to MST (n = 26) or usual services
(n = 29) on the island of Oahu, Hawaii. This project was originally conceived as part of a larger experiment on embedding MST
principles in a broader continuum of care. The project sparked controversy (Rosenblatt et al., 2001) and was terminated early
(Rowland et al., 2005). Outcome data are available on 31 cases (56%) at 6 months post referral. Data included measures of youth
and family functioning obtained from youth and caregivers, and archival data on arrests, out‐of‐home placements, and school
attendance.

Participants Participants were attending public schools and qualified to receive mental health services with a structured Individual Education
Plan. They were living with family members (71% with single parents) and were thought to be at risk of out‐of‐home placement.
In the subsample of 31 youth with outcome data, the average age at intake was 14.5 years; 58% were male; 84% were identified
as multiracial, 10% were White, and 7% were Asian American or Pacific Islander. Most youth (94%) had a DSM‐IV diagnosis, with
an average of 1.8 diagnoses per youth (39% with conduct disorder, 32% bipolar disorder, 23% attention‐deficit, 16% dysthymia,
13% major depression, and 10% PTSD). They had a lifetime average of 2 psychiatric or substance abuse hospitalisations and 7.5
prior arrests. Exclusion criteria included autism, severe developmental disabilities, and sexual offending. Most (71%) of the
families fell below median household income for the state and 42% received public assistance (2005, p. 15).

Interventions MST was adapted for youth with serious emotional disturbances (SED). Clinicians received the standard MST training (initial 5‐day
orientation, ongoing training and supervision, quarterly booster sessions, and weekly phone consultations with MST consultants)
plus training in crisis intervention and clinical treatment of substance abuse, internalizing problems, and borderline personality
traits. Data on all MST cases in Hawaii showed that the average length of MST services was 135 days (4.4 months). Youth
received an average of 12.1 hours of direct contact with MST therapists per month (SD 4.6, range 3.1 to 19.5).
Usual services included case management, care coordination, and an array of services including individual and family therapy,
intensive in‐home services, medication management, therapeutic foster care, group home care, day treatment, therapeutic aides,
and hospital and residential care. Youth in the usual services group spent 40% of their time in out‐of‐home placements and
received an average of 4 hours of community services per month (SD 4.7, range 0 to 16.8).

Outcomes Research assistants collected data from youth and caregivers on youth mental health symptoms (CBCL, YSR, YRBS), substance use
(PEI), delinquency (SRD), family functioning (FACES‐III), and caregiver social support (SSQ). Archival records were obtained from
state and juvenile justice authorities to identify criminal behaviour. Data on school placements and school attendance were
collected from school records. State agencies provided data on out‐of‐home placements in inpatient, residential, foster care, and
group homes; and juvenile justice authorities provided data on detention and incarceration.

Country USA

Funding Hawaii Department of Health, Child and Adolescent Mental Health Division; the Annie E. Casey Foundation; and the U.S. National
Institute on Drug Abuse

Notes At baseline, young people in the MST group reported more internalizing and externalizing problems, drug use, and delinquency,
compared with youth in the control group.
In 2000‐2001 the average annual cost of MST per youth in Hawaii was $12,162 USD (2001, p. 26).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk "Youths were randomly assigned to treatment conditions" (2005, p. 14). No description of the
generation sequence generation process.
(selection bias)

Allocation concealment Unclear risk "Youths were randomly assigned to treatment conditions" (2005, p. 14). No description of
(selection bias) allocation concealment.

Baseline equivalence High risk Young people in the MST group had more externalizing problems (for caregiver reports, d = 0.35;
youth reports d = 0.29), more internalizing problems (caregiver d = 0.20, youth d = 0.40), self‐
reported drug use (d = .51), self‐reported minor delinquency (d = 1.08), and index offences
(d = 0.79) compared with youth in the control group.

Performance bias Unclear risk Data on amounts of service provided to the two groups are not strictly comparable.
(confounding)

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Administrative data

Detection bias (blinding) Unclear risk "Attempts were made to keep research assistants unaware of the participants' treatment
Participant reports conditions (i.e., they were not involved in randomisation and were not informed by the
recruiter as to which condition the family was assigned). Families, however, could have
disclosed information during the assessments that made research assistants aware of
treatment assignment" (2005, p. 15).

Attrition bias High risk 44% attrition overall, 3% differential attrition.


Administrative data

Attrition bias High risk 44% attrition overall, 3% differential attrition.


Participant reports

Intention to treat analysis High risk Exclusion of dropouts.

Standardized observation Low risk Assessments were made at 6 months after referral.
periods

Validated outcome measures Unclear risk Use of standardized scales, some with αs < .7 in this sample, and use of archival data.

Selective reporting High risk There is no public protocol for this study. Cohen's ds were reported only for "analyses with a
significant or marginally significant treatment effect" (2005, p. 20).

Conflicts of interest High risk Dr. Rowland and Dr. Henggeler are board members and shareholders of MST Services, which
licenses and disseminates MST.
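
The overall attrition figure reported for Rowland 2005 can be checked from the counts given in the Methods row (26 MST plus 29 usual services cases randomised, outcome data on 31 cases). The sketch below is illustrative only: the function names are hypothetical, and the per-group counts needed to reproduce the 3% differential attrition are not reported in this table, so the second function is shown without study data.

```python
def attrition_rate(n_randomised: int, n_with_outcomes: int) -> float:
    """Proportion of randomised cases that have no outcome data."""
    if n_randomised <= 0 or not 0 <= n_with_outcomes <= n_randomised:
        raise ValueError("counts must satisfy 0 <= n_with_outcomes <= n_randomised")
    return 1 - n_with_outcomes / n_randomised


def differential_attrition(n_rand_a: int, n_out_a: int,
                           n_rand_b: int, n_out_b: int) -> float:
    """Absolute difference between the attrition rates of two groups."""
    return abs(attrition_rate(n_rand_a, n_out_a) - attrition_rate(n_rand_b, n_out_b))


# Rowland 2005: 26 + 29 = 55 cases randomised, 31 with outcome data at 6 months.
overall = attrition_rate(26 + 29, 31)
print(f"Overall attrition: {overall:.0%}")
```

With these counts the overall rate rounds to the 44% shown in the risk of bias table.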

Sundell 2006

Methods Random assignment of youth with conduct disorders to MST (n = 79) or treatment as usual (TAU, n = 77) in Sweden's three largest
cities (Stockholm, Göteborg, and Malmö) and one town (Halmstad) in 2004‐2005. Random assignment was conducted in separate
blocks by site; there were six sites, each served by one MST team. Outcomes were assessed at 7 months, 2 years, and 5 years
after randomisation.

Participants All 156 participants met DSM‐IV‐TR diagnostic criteria for conduct disorder. Most (67%) had been arrested at least once and 32%
had been placed outside of the home during the six‐month period before the study began. Referrals were made by 27 local
authorities within the 4 cities. Exclusion criteria included: sexual offending, autism, psychosis, and suicide risk. Youths were
12–17 years old (average age was 15), 61% were male, 13% had at least one parent who was not born in Sweden, 67% lived in
single‐parent families, and 61% were receiving social welfare grants.

Interventions MST was provided by 6 teams (2 in Stockholm, 2 in Göteborg, 1 each in Malmö and Halmstad). MST therapists had the equivalent of a
Bachelor's or Master's degree in social work, psychology, or education. MST services lasted an average of 145.8 days (SD 51.6).
TAU services lasted an average of 116.3 days (SD 67.8). Individual counselling was provided by a case manager or a private
counsellor in 26% of the TAU cases, family therapy was provided in 21%; other services included mentorship (16%), out‐of‐home
care (10%), Aggression Replacement Training (5%), addiction treatment (3%), and special education (3%). Thirteen youths (17%)
received no services.

Outcomes Research assistants administered assessments of youth symptoms (CBCL, YSR), sense of coherence (SOC), self‐reported delinquency
(SRD), alcohol and drug use (AUDIT, DUDIT), peer relations (PYS, SCPQ, SSRS), parenting skills, and mother's mental health (SCL‐
90) at 7 months and 2 years after intake. Information on school attendance was obtained from school authorities, data on arrests
were derived from police records, information on service use (including out‐of‐home placements) was obtained from social
service records. Out‐of‐home placements included foster care, nursing homes, residential treatment, and institutions.

Country Sweden

Funding Swedish Institute for Evidence‐Based Social Work Practice, National Board of Health and Welfare; Ministry of Health and Social
Affairs; cities of Stockholm, Göteborg, Halmstad, and Malmö.

Notes Data were determined to be missing at random; multiple imputation was performed with the Markov chain Monte Carlo method.
The average cost of MST was $8,847 per youth. Total costs of resource use within 6 months after randomisation were $13,298 per
MST case and $8,260 per TAU case (in 2005 USD; Olsson 2010). There were no significant differences in outcomes between the
MST and TAU services. MST resulted in net costs of $5,038 per case within 6 months of random assignment (Olsson 2010),
44,500 SEK ($6,500) per case over 2 years (Olsson 2010a), and 112,000 SEK ($15,500) per case over 5 years (2014).
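
The 6-month net cost in the Notes above follows directly from the two per-case totals reported there. A minimal arithmetic check, with hypothetical variable names:

```python
# Total cost of resource use within 6 months of randomisation (2005 USD),
# as reported in the Notes for Sundell 2006.
COST_PER_MST_CASE = 13_298
COST_PER_TAU_CASE = 8_260

# Net cost of MST relative to treatment as usual, per case.
net_cost_per_case = COST_PER_MST_CASE - COST_PER_TAU_CASE
print(f"Net cost of MST per case: ${net_cost_per_case:,}")
```

The difference matches the $5,038 per case attributed to Olsson 2010.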

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk Computer‐generated sequence was created at a remote location by the principal investigator.
generation Randomisation was performed within blocks, with each of the sites serving as a block.
(selection bias)

Allocation concealment Low risk "After research staff received completed instruments from both the youths and parents, research
(selection bias) staff opened a sealed and numbered envelope that contained the results of the computer
generated randomisation for that specific youth. In a central location separate from the data
collection locations, the contents of the sealed envelopes were determined before the referral
process began. The principal investigator was the only member of the research team to have
access to the randomisation sequence" (2008, p. 552).

Baseline equivalence High risk Parents in the MST group had more mental health symptoms at baseline than controls (mean 0.98
vs. 0.75, d = .32). According to their parents, 64% of youth in the MST group had early onset of
behavioural problems (before age 13) compared with 42% of those in the TAU group (d = 0.49).

Performance bias High risk MST cases received more services (mean number of services = 1.46, SD = .73 vs. TAU mean = 1.23,
(confounding) SD = .94; Sundell et al., 2006, d = 0.27) over a longer period of time (mean = 145.7 days, SD =
42.9 vs. TAU mean = 116.3 SD= 67.8; d = 0.52).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Participant reports

Attrition bias Low risk 4% to 5% missing data with 0% to 1% differential attrition. Data were judged to be missing at
Administrative data random; multiple imputation was performed with the Markov chain Monte Carlo method
(2008, p. 555).

Attrition bias Low risk 4% to 5% missing data with 0% to 1% differential attrition. Data were judged to be missing at
Participant reports random; multiple imputation was performed with the Markov chain Monte Carlo method
(2008, pp. 554–555).

Intention to treat analysis Low risk No systematic exclusions of dropouts or refusers

Standardized observation Low risk Outcomes were collected at 7 months (mean of 212 days, SD 29.2), 2 years (mean 741 days, SD
periods 18), and 5 years after random assignment. There was no significant difference between groups
in the time between the first, second, and third assessments (2008, 2010).

Validated outcome measures Low risk Use of standardised measures with αs > .7 in the study sample.

Selective reporting Unclear risk There is no public protocol for this study. Outcome assessments conducted at baseline and 7
months were repeated at 2 years, with full reporting at both points in time. The 5 year follow‐
up relied solely on registry data. No evidence of selective reporting.

Conflicts of interest Low risk "None of the authors have any financial interest in multisystemic therapy" (2008, p. 550).
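
The standardised mean differences in the Performance bias row for Sundell 2006 can be approximated from the reported means and SDs. The sketch below assumes that d is the difference in means divided by the pooled standard deviation, and that group sizes equal the full randomised samples (79 MST, 77 TAU). Both are assumptions for illustration; the review does not state in this table how these values were computed.

```python
import math


def cohens_d(mean_1: float, sd_1: float, n_1: int,
             mean_2: float, sd_2: float, n_2: int) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Number of services received: MST mean 1.46 (SD .73) vs. TAU mean 1.23 (SD .94).
d_services = cohens_d(1.46, 0.73, 79, 1.23, 0.94, 77)

# Duration of services in days: MST mean 145.7 (SD 42.9) vs. TAU mean 116.3 (SD 67.8).
d_duration = cohens_d(145.7, 42.9, 79, 116.3, 67.8, 77)

print(f"services: d = {d_services:.2f}, duration: d = {d_duration:.2f}")
```

Under these assumptions the results round to the reported d = 0.27 and d = 0.52.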

Swenson 2010

Methods Random assignment to MST‐CAN or Enhanced Outpatient Treatment (EOT) from 2000 to 2003 in a public mental health centre in
Charleston County, South Carolina. Parent and youth reports on individual and family functioning were obtained on structured
measures at baseline and 2, 4, 10, and 16 months post baseline. Data on subsequent reports of child maltreatment and out‐of‐
home placements were obtained from Child Protective Services (CPS) records.

Participants Families of 90 youth, ages 10‐17 (average age = 13.9), with a substantiated case of physical child abuse within the past 90 days; 23%
had a prior CPS report. 56% of youth were female, 22% White, 69% Black. Most participating parents (who were subjects of the
abuse report) were female (65%) and single parents (58%).

Interventions MST‐CAN is an adaptation for families of maltreated children. It includes a safety plan for children, flexible duration of services, and
a psychiatrist on the team who can prescribe medication and address psychiatric emergencies (28% of youth received medication
for ADHD, 7% of caregivers received medication for anxiety or depression). MST‐CAN included cognitive‐behavioural therapy
(CBT) for deficits in anger management (provided to 63% of parents and 28% of children) and problem solving skills (provided to
95% of families); and prolonged exposure therapy for PTSD symptoms (provided to 7% of parents). MST staff work closely with
CPS rather than juvenile justice systems. Therapists received 5‐day orientation to MST, additional training for MST‐CAN
adaptations, 4 hours of weekly group supervision, and individual supervision as needed. Families received an average of 88 hours
of treatment (range = 3 to 388 hours) over 2 to 12 months (average 7.6 months). 96% (43/45) completed MST.
EOT includes usual services plus enhanced engagement and parent training interventions and additions. Youth participated in
outpatient (41%), day (12%), and residential (17%) treatment for mental health problems and substance abuse outpatient (7%),
day (2%), and residential (5%) treatment (2010, pp. 500‐501). Therapists made additional efforts to remind parents of upcoming
appointments and reschedule missed appointments, made home visits if families did not have phones, and provided vouchers for
transportation. A structured, 7‐lesson group parent training program (STEP‐TEEN) was provided for all EOT cases. Therapists
received a one‐day training on STEP‐TEEN and weekly 1.5 hour consultation sessions with a supervisor. Families received an average
of 76 hours of treatment (range = 3 to 897 hours) over 1 to 12 months (average 4 months). 78% (35/45) completed STEP‐TEEN.
Therapists for both groups were employees of a public mental health centre.

Outcomes Structured measures of child functioning (CBCL, CBCL‐PTSD, TSCC, SSRS, CDI), parent functioning (GSI, PST), parenting behaviour
(CTS), family functioning (FACES), and social support for parents (ISEL) were obtained at baseline and 2, 4, 10, and 16 months
post baseline. Data on subsequent reports of child maltreatment and out‐of‐home placements were obtained from CPS records.
Data on mental health and substance abuse service utilisation were gathered in monthly phone interviews with parents.

Country USA

Funding US National Institutes of Mental Health

Notes Conflicting information on initial out‐of‐home placements: the 2009 report states that "26% of youth were in out‐of‐home placements at
the time of referral and 28% were in state protective custody" (p. 5). The 2010 report states that 2.3% of MST cases and 9.5%
of EOT cases were placed at research enrolment (p. 503; d = 0.83).
Estimated costs of MST‐CAN were $15,961 USD, costs of EOT were $4,414 USD (2018).

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk "Randomisation was based on a computer‐generated table of random numbers" (2010, p. 499).
generation
(selection bias)

Allocation concealment Unclear risk "After consent, the research assistant opened a sealed envelope and informed the family of the
(selection bias) assigned treatment condition" (2010, p. 499). Unclear whether envelopes were sequentially
numbered or opaque.

Baseline equivalence High risk MST cases include more parents who completed high school (75.0% vs. 64.3%, d = 0.28), fewer
single‐parent families (52.3% vs. 64.3%, d = 0.27), and fewer youth in out‐of‐home placements
at the time of referral (2.3% vs. 9.5%, d = 0.83).

Performance bias High risk Although usual services were enhanced, cases in the MST group received more treatment (88 vs.
(confounding) 76 hours on average) over a longer period of time (average of 7.6 vs. 4 months).

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Participant reports

Attrition bias Low risk 4% attrition overall, 4% differential attrition.


Administrative data

Attrition bias Low risk 8% attrition overall, 7% differential attrition.


Participant reports

Intention to treat analysis High risk Exclusion of 2 families who refused treatment.

Standardized observation Low risk Observations at 2, 4, 10, and 16 months post baseline.
periods

Validated outcome measures Unclear risk Standardised measures, some with αs < .7 in this sample (2010, pp. 501–502).

Selective reporting High risk Effect sizes (d) were reported for statistically significant results only (2010, p. 504). Child
Depression Inventory and FACES‐III data were collected but not reported (2005, nd). There is
no public protocol for this study

Conflicts of interest High risk Dr. Swenson provided clinical supervision for MST therapists in this study. Dr. Swenson is a
consultant in development of MST‐CAN programmes through MST Services, LLC. Dr.
Henggeler is a board member and stockholder of MST Services, LLC, which licenses and
disseminates MST
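
The Baseline equivalence row for Swenson 2010 expresses differences between two percentages as d. One common conversion that is consistent with those figures is the logit method, d = ln(OR) × √3 / π. The sketch below applies it to two of the reported comparisons; treating this as the conversion used by the reviewers is an assumption, and the function name is hypothetical. Inputs are the rounded percentages from the table, so results computed from raw counts could differ in the second decimal place.

```python
import math


def d_from_proportions(p_1: float, p_2: float) -> float:
    """Convert two proportions to d via the log odds ratio (logit method)."""
    if not (0 < p_1 < 1 and 0 < p_2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    odds_ratio = (p_1 / (1 - p_1)) / (p_2 / (1 - p_2))
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Parents who completed high school: 75.0% (MST) vs. 64.3% (EOT).
d_education = d_from_proportions(0.750, 0.643)

# Single-parent families: 52.3% (MST) vs. 64.3% (EOT); the sign is negative
# because the MST percentage is lower, and the table reports the magnitude.
d_single_parent = d_from_proportions(0.523, 0.643)

print(f"education: d = {abs(d_education):.2f}, single parent: d = {abs(d_single_parent):.2f}")
```

Under this assumption the magnitudes round to the reported d = 0.28 and d = 0.27.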

Timmons‐Mitchell 2006

Methods Random assignment of juvenile offenders to MST or treatment as usual (TAU) from 1998 to 2001 in a midwestern USA state.
Outcomes include re‐arrest and measures of youth functioning, assessed using data contained in court records.

Participants Conflicting reports on sample size: 2003b report indicates that 163 youth were randomly assigned, with 82 in MST; 2006 report
states that 105 youth were randomly assigned and 93 completed treatment. Participants were youth with prior felony
convictions and suspended sentences. Average age at referral was 15.1 years, 78% were male, 78% White, 16% Black, and 4%
Hispanic.

Interventions MST services were provided by Master's level therapists with caseloads of 4‐6 families per therapist and 24/7 availability to families.
MST services lasted an average of 145 days (SD = 60, range = 43 to 438 days).
"Less is known about the services [received by youth] who were randomized into the TAU condition" (2006, p. 230). Probation
officers supervised youth in the TAU group and made referrals for mental health services. "Connections between families and
TAU services were sporadic. Many subsequent court hearings centered on the failure of families to seek or attend services to
which they had been referred by the court" (2006, p. 230).

Outcomes New offences (formal arraignments) were assessed following discharge from treatment for the MST group or at 6 months
postrecruitment for the TAU group. Follow‐up data on criminal charges were collected at 12 and 24 months post random
assignment.
Measures of youth functioning in 8 domains (CAFAS scales: school/work, home, community, behaviour to others, moods/emotions,
substance use, self‐harm behaviour, and thinking) were completed by research assistants, based on data in court records, at baseline,
after treatment, and 6 months postdischarge for MST cases, and at 6 and 12 months post random assignment for TAU cases.

Country USA

Funding Ohio Office of Criminal Justice Services

Notes Higher proportion of females in MST versus control group (n = 163, 28% vs. 17%, d = 0.34).
Author did not respond to requests for information about discrepancies in reports on sample size.

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Low risk "Randomisation was accomplished by having the court administrator flip a coin" (2006, p. 229).
generation
(selection bias)

Allocation concealment Unclear risk "Randomisation was accomplished by having the court administrator flip a coin" (2006, p. 229).
(selection bias)

Baseline equivalence High risk There was a higher proportion of females in the MST group versus the control group (28% vs. 17%,
2003b, pp. 6, 11, d = 0.34, see Table 3).

Performance bias High risk Compared with MST services, TAU services were "sporadic" (2006, p. 230).
(confounding)

Detection bias (blinding) High risk Outcomes data were extracted from court records. "Different raters rated the MST and Control
Administrative data groups at referral and discharge but the same raters rated all at follow‐up. There is good
reason to think that the MST raters may have been underreporting functional impairment"
(2003c).

Detection bias (blinding) Not applicable


Participant reports

Attrition bias High risk 35%‐45% overall attrition, 0%‐6% differential attrition.
Administrative data

Attrition bias Not applicable


Participant reports

Intention to treat analysis High risk Drop‐outs are not included in published data analyses (n = 48 MST, 45 TAU cases; 2006), but their
data do appear in an unpublished report (n = 54 MST, 52 TAU; 2003c).

Standardized observation High risk "Measures were collected at somewhat different time points for the MST and TAU groups" (2006,
periods p. 230).

Validated outcome measures Unclear risk Research assistants used court records to complete CAFAS ratings. The reliability criterion and
results were not reported for this sample (2006, p. 229).

Selective reporting High risk There is no public protocol for this study. Unpublished report shows nonsignificant differences on
2 of 8 CAFAS subscales at Time 3 (2003b); these results are not included in the published
report (2006).

Conflicts of interest Unclear risk There was no conflict of interest statement.



Wagner 2019

Methods Fifteen youth with autistic spectrum disorder (ASD) and disruptive behaviours were randomly assigned to MST‐ASD (n = 8) or usual
services (n = 7). Caregivers and youth completed assessments at baseline, 6 months, and 12 months post random assignment.

Participants Participants were between the ages of 10 and 17 (average 13.8); 87% were male; 73% were White, 7% Black, 13% Asian American, and
7% Hispanic. Families were recruited through an academic medical center specialising in diagnosis and treatment of youth with ASD.

Interventions MST, adapted for use with ASD, was provided by graduate students who were enrolled in clinical, counselling, or school psychology
doctoral programmes. Therapists had caseloads of two youths.
"The number and characteristics of therapists who delivered services to youths in the [usual services] condition are unknown" (2019,
p. 46). These families were given information about community services, which could include: medication, speech and language
therapy, occupational therapy, individual counselling, behavioural therapy, social skills training, and school‐based interventions.
Families were offered a free psychiatric consultation, but "the number of families who used this service is not known" (p. 46).

Outcomes Caregivers and youth completed assessments at intake, 6 months, and 12 months after random assignment. Analysis was based on
caregiver reports on youth symptoms (BASC‐2), aggression (selected subscale and items from the BPI and AIM), caregiver
psychiatric symptoms (GSI‐BSI), family functioning (FACES‐II), caregiver stress (PSI‐SF), and youth peer relations (MPRI).

Country USA

Funding University of Missouri Research Board Grant

Notes Missing data were imputed using LOCF for cases missing data on all measures at the 12 month assessment, and single (sample mean)
imputation for cases missing data on both the 6 and 12 month assessments.

Risk of bias table

Authors'
Bias judgement Support for judgement

Random sequence Unclear risk No information on sequence generation.
generation
(selection bias)

Allocation concealment Unclear risk Use of sealed envelopes. Unclear whether envelopes were numbered or opaque
(selection bias)

Baseline equivalence Low risk For demographic differences (2019, p. 45), ds range from d = 0.08 (gender) to d = 0.22 (family
incomes over $40,000).

Performance bias Unclear risk No information on duration of service or amount of contact in the MST or usual services groups.
(confounding)

Detection bias (blinding) Not applicable


Administrative data

Detection bias (blinding) Unclear risk No discussion of blinding of assessors.


Participant reports

Attrition bias Not applicable


Administrative data

Attrition bias High risk At the 6 month assessment, overall attrition was 13%, with 29% differential attrition; at the 12
Participant reports month assessment these figures were 27% and 30% respectively.

Intention to treat analysis Unclear risk Exclusion of dropouts in main analysis. Use of last observation carried forward (LOCF) and
imputation of mean values to estimate missing data.

Standardized observation Low risk Assessments at baseline, 6 months, and 12 months.
periods

Validated outcome measures Unclear risk Use of standardised measures, selected subscales, and individual items, some with unreported αs
or test‐retest reliability < 0.7.

Selective reporting Unclear risk There is no public protocol for this study. "Although youths were also asked to complete several
measures, very few youths had the cognitive ability to complete these measures; thus, youth
report data were not analyzed in the present study" (2019, p. 47). Reports are available for
selected subscales (e.g., 1 of 3 MPRI subscales)

Conflicts of interest High risk Throughout the study, supervision of the MST therapists was provided by Dr. Wagner. Dr. Borduin
is a former board member and shareholder of MST Services, and current board member of
MST Associates, the organisation that provides training in MST for youth with problem sexual
behaviours

Weiss 2013

Methods Random assignment of youth to MST or usual services in a large, southeastern USA city. Participants were recruited from public special
education classrooms (Moderate Intervention Program; MIP) for 7th‐11th grade youth with conduct problems. Repeated measures of
youth and family functioning were obtained in interviews with youth and parents and from teacher reports. Court records were used
to identify subsequent arrests. Schools provided data on grades, attendance, suspensions, and within‐school placement.

Participants Conflicting reports on sample size: 2010 report indicates that 169 youth were randomly assigned and 5 withdrew (2 control, 3
treatment cases; p. 859); subsequent reports state that 164 were randomised. Participants' average age was 14.6 years, 83%
were male, 60% were Black and 40% White. Two‐thirds (68%) of participants were involved with the juvenile justice system
(2013); most (86%) showed symptoms of internalizing problems (anxiety, depression, etc.) as well as externalizing
problems (2015).

Interventions Both groups received MIP classroom services. Control cases received no additional services from the project.
Over a 4 year period, MST services were provided by 8 clinicians, 7 of whom held Master's degrees. MST consultants at the Family
Services Research Center (FSRC) of the MUSC led an initial 5 day training workshop, provided quarterly 1.5 day booster training
sessions, and provided weekly telephone supervision. FSRC approved all MST hires (2013, p. 1029). On‐site supervision was
provided by a licensed MSW with 15 years' experience. MST services lasted an average of 5.2 months. 99% of MST
families received family therapy, 82% received parent training, 95% received individual parent training, 95% received individual
adolescent sessions, and 94% received school‐based interventions from the project that included individualized behaviour
management plans (2013, p. 1029).
Most families (72% of control cases and 77% of MST cases) received some mental health services outside of the project; 64% of
control cases and 74% of MST cases received services from mental health professionals outside of the project.

Outcomes Standardised measures of externalizing behaviour problems (reported by youth, parents, and teachers), self‐reported delinquency
and drug use, family adaptability and cohesion, parenting styles, and parents' mental health symptoms were administered at 3, 6,
and 18 months postbaseline. School information on grades, attendance, and suspensions was obtained at the same points in
time. Data on arrests were obtained from juvenile court and adult court records at 2.5 years postbaseline.

Country USA

Funding US National Institute of Mental Health (NIMH)

Notes Between group differences in race (n = 164): Black youth comprised 56% of MST cases versus 64% of the control group (d = 0.18).
Annual household incomes below $5 K: 4% of MST versus 15% of control (n = 153, d = 0.82). Incomes of $60 K or more: 13% of
MST versus 7% of controls (d = 0.38; 2010). Fewer single parent families (67% vs. 75%, d = 0.22) and more parents with high
school degrees (79% vs. 63%; d = 0.43) in MST versus the control group (2013).
Moderators of treatment effects include youth age, race, nonmarital partner in the home, and parenting factors (Tran et al., 2010;
Weiss et al., 2015).
Valid ns are not reported for specific outcomes; we assumed that valid n = 164 for all court and school data, and valid n = 153 for all
youth, parent, and teacher reports in follow‐ups (2013, p. 1030).
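
The Notes above express differences between percentages as d values without restating the conversion. As a minimal sketch, assuming the common logit conversion d = ln(OR) × √3/π (an assumption for illustration, not necessarily the procedure the review authors used), the following gives values close to several of those reported. The function name `d_from_proportions` is hypothetical.

```python
import math

def d_from_proportions(p_treat, p_control):
    """Logit-based d: natural log of the odds ratio times sqrt(3)/pi."""
    odds_ratio = (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Percentages taken from the Notes row (MST first, control second).
comparisons = [
    ("Black youth", 0.56, 0.64),
    ("incomes of $60 K or more", 0.13, 0.07),
    ("single parent families", 0.67, 0.75),
]
for label, p_mst, p_control in comparisons:
    d = d_from_proportions(p_mst, p_control)
    print(f"{label}: |d| = {abs(d):.2f}")
```

Because the inputs are rounded percentages, results from this sketch can differ from reported values in the second decimal place.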

Risk of bias table

Bias Authors' judgement Support for judgement

Random sequence generation (selection bias) Unclear risk "After the assessment was completed, the interviewer opened a sealed
envelope that contained the family's random assignment to the treatment or control condition" (2013, p. 1028).
Sequence generation methods were not described.

Allocation concealment (selection bias) Unclear risk Treatment assignments were revealed by research staff by opening sealed
envelopes (2013, p. 1028). Unclear whether envelopes were sequentially numbered or opaque.

Baseline equivalence High risk The MST group included proportionally fewer Black youth (56% vs. 64%, d = 0.18), fewer households with
annual incomes below $5,000 (4% vs. 15%, d = 0.82), more households with incomes of $60 K
or more (13% vs. 7%, d = 0.38), and more parents who had graduated from high school (79% vs.
63%, d = 0.43).

Performance bias (confounding) High risk MST services were provided in addition to services received by the control group. 99% of MST
families received family therapy, 82% received parent training, 95% received individual parent
training, 95% received individual adolescent sessions, and 94% received school‐based
interventions from the project that included individualized behaviour management plans
(2013, p. 1029). 64% of control cases and 74% of MST cases received services from mental
health professionals outside of the project (2013; d = 0.26).

Detection bias (blinding) Administrative data Unclear risk No discussion of blinding of assessors.

Detection bias (blinding) Participant reports Unclear risk No discussion of blinding of assessors.

Attrition bias Administrative data Low risk No attrition.

Attrition bias Participant reports Low risk 7% attrition, 6% differential attrition.

Intention to treat analysis Low risk No systematic exclusions.

Standardized observation periods Low risk Administrative data on arrests obtained at 2.5 years after baseline. Data on school grades, days
absent, and days suspended were gathered at 6 and 18 months postbaseline.

Validated outcome measures Low risk Standardised measures were embedded in separate interviews with parents and youth, and
teachers' self‐administered questionnaires.

Selective reporting Unclear risk There is no public protocol for this study. Data collected and not reported: CBCL internalizing
scale (2013), main effects on CRPBI (2015).

Conflicts of interest Unclear risk No statement on conflicts of interest.

Footnotes

Characteristics of excluded studies

Study ID Reason for exclusion


Aultman‐Bettridge 2007 Non‐random allocation to treatment

Baglivio 2014 Non‐random allocation to treatment (propensity score matching)

Barnoski 2004 Non‐random allocation to treatment (nonequivalent (waitlist) comparison group)

Barth 2007 Non‐random allocation to treatment (propensity score matching)

Bernstein 2005 No comparison or control group (single group, before‐and‐after study)

Blankestein 2019a No comparison or control group

Blankestein 2019b Non‐random allocation to treatment (MST‐ID compared with MST)

Boonstra 2009 No comparison or control group

Boonstra 2018 No comparison or control group

Boxer 2011 No comparison or control group

Boxer 2017 Non‐random allocation to treatment

Brunk 1987 Participants do not meet age criterion (mean ages of 9.8 and 6.8 years reported for subgroups).

Brunk 2014 No comparison or control group

Bytyci 2017 Not a licensed MST program

Cartwright 2009 Non‐random allocation to treatment

Connell 2016 No comparison or control group

Cunningham & Henggeler 2001 Non‐random allocation to treatment. Nonequivalent comparison groups (2 schools). Intervention consists of MST
combined with two other programs (Bullying Prevention and Project ALERT).

Cunningham 2009 No comparison or control group (single group studies of correlates and predictors)

Curtis 2009 No comparison or control group



Davis 2015 No comparison or control group, older youth (ages 17‐20), ineligible treatment (MST‐EA)

Davis 2016 Older youth (ages 18‐21), ineligible treatment (MST‐EA)

Dawe 2001 Not a licensed MST program

DeKraai 2004 No comparison or control group

Dirks‐Linhorst 2004 Not a licensed MST program (combined MST with targeted case management)

Dopp 2018 Non‐random allocation to treatment

Dousi 2005 Not an evaluation of MST (study of effects of family play for families receiving MST or other intensive family treatments)

Drew 2019 No comparison or control group

Dæhlen 2016 Non‐random allocation to treatment (propensity score matching)

Eeren 2018 Non‐random allocation to treatment (propensity score matching)

Ellis 2003 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Ellis 2004 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Ellis 2005 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Ellis 2006 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Ellis 2010 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Ellis 2012 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Fain 2014 Non‐random allocation to treatment

Franks 2006 No comparison or control group

Gervan 2012 No comparison or control group

Giles 2004 Non‐random allocation to treatment (analysis of gender differences in responses to MST versus day treatment for youth
with conduct disorders)

Grimbos 2009 No control or comparison group

Hebert 2014 No comparison or control group (qualitative interviews with child protection staff)

Hefti 2020 No comparison or control group

Henggeler 1986 Non‐random allocation to treatment. Nonequivalent comparison groups (compares inner‐city delinquent youth who
received MST with delinquent youth in alternative treatment and nondelinquent youth)

Henggeler 2002 No comparison or control group

Holth 2012 Non‐random allocation to treatment

Hurley 2004 Not a licensed MST program

Lange 2017 No comparison or control group

Lee 2013 Non‐random allocation to treatment

Letourneau 2013 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Little 2004 Not a licensed MST program

Loftholm 2014 No comparison or control group

Mayfield 2011 Non‐random allocation to treatment. Comparison group created from administrative data.

Mitchell‐Herzfeld 2008 Non‐random allocation to treatment (quasi‐experimental study of MST as aftercare following residential placement for
serious juvenile offenders in New York)

Naar‐King 2009 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Naar‐King 2014 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Nelson 2009 Non‐random allocation to treatment. Longitudinal cohort study.

Ogden 2012 No comparison or control group

Painter 2007 Non‐random allocation to treatment. Nonequivalent comparison groups (mean age of 13.62 for MST group versus 10.15
for TAU group).

Pendley 2002 Not focused on youth with social, emotional, or behavioral problems (MST‐Health Care)

Porter 2016 No comparison or control group



Randall 1999 Non‐random allocation to treatment (nonequivalent comparison groups comprised of residents of two neighborhoods)

Rosenblatt 2001 Non‐random allocation to treatment. Nonexperimental study of youth with behavioral disorders and other problems in
Hawaii. (Portions of this report discuss Rowland 2005)

Rovers 2019 No comparison or control group (MST as aftercare)

Rowland 2017 No comparison or control group

Schaeffer 2013 Non‐random allocation to treatment. Ineligible treatment (MST‐BSF combines MST‐CAN and Reinforcement Based
Therapy (RBT) for parental substance abuse)

Schoenwald 2003 Non‐random allocation to treatment (quasi‐experimental study of therapist adherence to MST and family outcomes)

Sheidow 2003 Not a licensed MST program (clinic‐based OPTION‐A services)

Sheidow 2017 Older youth (ages 17‐21), ineligible treatment (MST‐EA)

Smith Toles 2004 Non‐random allocation to treatment (quasi‐experimental study of modified MST vs traditional therapy in residential
treatment)

Smith‐Boydston 2014 Non‐random allocation to groups (quasi‐experimental study of MST with and without oversight)

Stambaugh 2007 Non‐random allocation to treatment (quasi‐experimental study of MST versus wrap‐around services)

Stout 2013 No comparison or control group

Sutphen 1993 No comparison or control group (single group, before‐and‐after study)

Swenson 2012 Ineligible treatment (MST‐BSF combines MST‐CAN for child maltreatment and RBT for adult substance abuse)

ter Beek 2018 Non‐random allocation to treatment

Thomas 2002 No comparison or control group

Timmons‐Mitchell 2005 No comparison or control group (single group, before‐and‐after study)

Tolman 2008 No comparison or control group

Trupin 2011 Non‐random allocation to treatment, ineligible treatment (MST‐FIT)

Vidal 2017 Non‐random allocation to treatment (propensity score matching)

Westin 2014 Non‐random allocation to treatment (comparisons of program participants, refusers, completers, and dropouts)

Footnotes

Characteristics of studies awaiting classification


Schoenwald 2004

Methods Randomised controlled trial

Participants Serious juvenile offenders with serious emotional disturbance, at risk of out‐of‐home placement. 63 youth reached 6‐month follow‐up,
44 reached 12‐month follow‐up.

Interventions MST‐based continuum of care vs. usual community services

Outcomes Mental health (symptoms), drug/alcohol use, criminal activity, family functioning, school functioning, service utilisation, community‐
based placements, residential placements, costs

Funding Annie E. Casey Foundation

Notes Final report to funder (2004) not available from author. No published/public reports.

Footnotes

DATA AND ANALYSES

1. Out‐of‐home placement

Outcome or subgroup Studies Participants Statistical method Effect estimate

1.1 Out‐of‐home placement, 1 year 11 2489 Odds Ratio (IV, Random, 95% CI) 0.67 [0.45, 0.99]

1.1.1 USA, Developer‐involved 7 1267 Odds Ratio (IV, Random, 95% CI) 0.52 [0.32, 0.84]

1.1.2 Non‐USA, Independent 4 1222 Odds Ratio (IV, Random, 95% CI) 1.14 [0.84, 1.55]

1.2 Out‐of‐home placement, 2.5 years 6 648 Odds Ratio (IV, Random, 95% CI) 0.81 [0.55, 1.20]

1.2.1 USA, Developer‐involved 2 280 Odds Ratio (IV, Random, 95% CI) 0.70 [0.28, 1.74]

1.2.2 Non‐USA, Independent 4 368 Odds Ratio (IV, Random, 95% CI) 0.87 [0.54, 1.41]

1.3 Out‐of‐home placement, 4‐5 years 2 Odds Ratio (M‐H, Random, 95% CI) Subtotals only

1.3.1 Non‐USA, Independent 2 263 Odds Ratio (M‐H, Random, 95% CI) 0.91 [0.51, 1.62]

1.4 Best Case/Worst Case analysis, 2 years 1 Odds Ratio (M‐H, Random, 95% CI) No totals

1.4.1 Observed 1 Odds Ratio (M‐H, Random, 95% CI) No totals

1.4.2 Best case 1 Odds Ratio (M‐H, Random, 95% CI) No totals

1.4.3 Worst Case 1 Odds Ratio (M‐H, Random, 95% CI) No totals

1.5 Days in out‐of‐home placement, 1 year 8 1510 Std. Mean Difference (IV, Random, 95% CI) −0.22 [−0.43, 0.00]

1.5.1 USA, Developer‐involved 6 666 Std. Mean Difference (IV, Random, 95% CI) −0.43 [−0.66, −0.21]

1.5.2 Non‐USA, Independent 2 844 Std. Mean Difference (IV, Random, 95% CI) −0.03 [−0.16, 0.11]

1.6 Days in out‐of‐home placement, 2.5 years 2 280 Std. Mean Difference (IV, Random, 95% CI) −0.35 [−0.72, 0.01]

1.6.1 USA, Developer‐involved 1 124 Std. Mean Difference (IV, Random, 95% CI) −0.16 [−0.51, 0.19]

1.6.2 Non‐USA, Independent 1 156 Std. Mean Difference (IV, Random, 95% CI) −0.53 [−0.85, −0.21]

1.7 Days in out‐of‐home placements, 4‐5 years 2 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

1.7.1 Non‐USA, Independent 2 762 Std. Mean Difference (IV, Random, 95% CI) 0.05 [−0.10, 0.19]
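
The subgroup rows in the table above report each odds ratio with a 95% CI. A rough way to gauge whether two subgroup estimates differ is to recover each standard error from its CI on the log scale and compare the log odds ratios with a z test. The sketch below does this for rows 1.1.1 and 1.1.2. It is an approximation built from rounded published values, the function names are hypothetical, and it is not a reproduction of the subgroup tests run in the review software.

```python
import math

Z_95 = 1.959964  # normal critical value for a 95% CI

def se_from_ci(lower, upper):
    """Standard error of a log odds ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

def subgroup_z(or_a, ci_a, or_b, ci_b):
    """z statistic and two-sided p value for the difference of two log odds ratios."""
    diff = math.log(or_a) - math.log(or_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under a standard normal
    return z, p

# Row 1.1.1 (USA, Developer-involved) versus row 1.1.2 (Non-USA, Independent)
z, p = subgroup_z(0.52, (0.32, 0.84), 1.14, (0.84, 1.55))
print(f"z = {z:.2f}, p = {p:.3f}")
```

With only two subgroups, the square of this z statistic corresponds to a one degree of freedom chi-square test for subgroup differences, which is why a simple z test is used in the sketch.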

2. Arrest or conviction

Outcome or subgroup Studies Participants Statistical method Effect estimate

2.1 Arrest or conviction, 1 year 6 1445 Odds Ratio (IV, Random, 95% CI) 0.84 [0.67, 1.06]

2.1.1 USA, Developer‐involved 2 202 Odds Ratio (IV, Random, 95% CI) 0.60 [0.34, 1.07]

2.1.2 USA, Independent 1 53 Odds Ratio (IV, Random, 95% CI) 1.10 [0.37, 3.26]

2.1.3 Non‐USA, Independent 3 1190 Odds Ratio (IV, Random, 95% CI) 0.89 [0.69, 1.16]

2.2 Arrest or conviction, 2.5 years 11 2070 Odds Ratio (IV, Random, 95% CI) 0.97 [0.71, 1.31]

2.2.1 USA, Developer‐involved 3 222 Odds Ratio (IV, Random, 95% CI) 0.43 [0.12, 1.53]

2.2.2 USA, Independent 3 311 Odds Ratio (IV, Random, 95% CI) 0.72 [0.35, 1.50]

2.2.3 Non‐USA, Independent 5 1537 Odds Ratio (IV, Random, 95% CI) 1.27 [1.01, 1.60]

2.3 Arrest or conviction, 4 years 4 1130 Odds Ratio (M‐H, Random, 95% CI) 0.72 [0.26, 2.01]

2.3.1 USA, Developer‐involved 1 176 Odds Ratio (M‐H, Random, 95% CI) 0.14 [0.07, 0.27]

2.3.2 Non‐USA, Independent 3 954 Odds Ratio (M‐H, Random, 95% CI) 1.14 [0.79, 1.63]

2.4 Best/Worst Case Analysis, 22.4 years 1 Odds Ratio (M‐H, Random, 95% CI) No totals

2.4.1 Observed 1 Odds Ratio (M‐H, Random, 95% CI) No totals

2.4.2 Best case 1 Odds Ratio (M‐H, Random, 95% CI) No totals

2.4.3 Worst case 1 Odds Ratio (M‐H, Random, 95% CI) No totals

2.5 Number of arrests or convictions, 1 year 8 1715 Std. Mean Difference (IV, Random, 95% CI) −0.19 [−0.37, −0.01]

2.5.1 USA, Developer‐involved 4 433 Std. Mean Difference (IV, Random, 95% CI) −0.19 [−0.38, −0.00]

2.5.2 USA, Independent 1 90 Std. Mean Difference (IV, Random, 95% CI) −0.89 [−1.32, −0.45]

2.5.3 Non‐USA, Independent 3 1192 Std. Mean Difference (IV, Random, 95% CI) −0.05 [−0.22, 0.13]

2.6 Number of arrests or convictions, 2.5 years 7 1245 Std. Mean Difference (IV, Random, 95% CI) −0.17 [−0.34, −0.00]

2.6.1 USA, Developer‐involved 2 140 Std. Mean Difference (IV, Random, 95% CI) −0.29 [−0.99, 0.41]

2.6.2 USA, Independent 1 93 Std. Mean Difference (IV, Random, 95% CI) −0.56 [−0.98, −0.15]

2.6.3 Non‐USA, Independent 4 1012 Std. Mean Difference (IV, Random, 95% CI) −0.07 [−0.20, 0.05]

2.7 Number of arrests or convictions, 4 years 3 847 Std. Mean Difference (IV, Random, 95% CI) 0.00 [−0.17, 0.18]

2.7.1 USA, Developer‐involved 1 80 Std. Mean Difference (IV, Random, 95% CI) −0.31 [−0.75, 0.13]

2.7.2 Non‐USA, Independent 2 767 Std. Mean Difference (IV, Random, 95% CI) 0.06 [−0.08, 0.20]

3. Self‐reported delinquency

Outcome or subgroup Studies Participants Statistical method Effect estimate

3.1 Self‐reported delinquency, 1 year 5 1048 Std. Mean Difference (IV, Random, 95% CI) −0.05 [−0.17, 0.07]

3.1.1 USA, Developer‐involved 2 166 Std. Mean Difference (IV, Random, 95% CI) 0.11 [−0.19, 0.41]

3.1.2 USA, Independent 1 153 Std. Mean Difference (IV, Random, 95% CI) −0.05 [−0.37, 0.27]

3.1.3 Non‐USA, Independent 2 729 Std. Mean Difference (IV, Random, 95% CI) −0.10 [−0.30, 0.10]

3.2 Self‐reported delinquency, 2.5 years 4 793 Std. Mean Difference (IV, Random, 95% CI) 0.01 [−0.13, 0.15]

3.2.1 USA, Developer‐involved 1 124 Std. Mean Difference (IV, Random, 95% CI) −0.03 [−0.38, 0.32]

3.2.2 Non‐USA, Independent 3 669 Std. Mean Difference (IV, Random, 95% CI) 0.01 [−0.15, 0.17]

3.3 Self‐reported delinquency, 4 years 2 764 Std. Mean Difference (IV, Random, 95% CI) −0.11 [−0.59, 0.37]

3.3.1 USA, Developer‐involved 1 80 Std. Mean Difference (IV, Random, 95% CI) −0.40 [−0.85, 0.04]

3.3.2 Non‐USA, Independent 1 684 Std. Mean Difference (IV, Random, 95% CI) 0.10 [−0.05, 0.25]

4. Substance use

Outcome or subgroup Studies Participants Statistical method Effect estimate

4.1 Drug use other than marijuana/alcohol (PEI, CAFAS, polydrug use scales), 1 year 5 887 Std. Mean Difference (IV, Random, 95% CI) −0.08 [−0.38, 0.23]

4.1.1 USA, Developer‐involved 2 168 Std. Mean Difference (IV, Random, 95% CI) −0.18 [−0.76, 0.40]

4.1.2 USA, Independent 2 246 Std. Mean Difference (IV, Random, 95% CI) −0.14 [−1.00, 0.72]

4.1.3 Non‐USA, Independent 1 473 Std. Mean Difference (IV, Random, 95% CI) 0.12 [−0.06, 0.30]

4.2 Drug use other than marijuana/alcohol (PEI, SRD, polydrug use scales), 2.5 years 2 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

4.2.1 Non‐USA, Independent 2 840 Std. Mean Difference (IV, Random, 95% CI) 0.13 [−0.00, 0.27]

4.3 Drug use other than marijuana/alcohol (cocaine, polydrug use scales), 4 years 2 764 Std. Mean Difference (IV, Random, 95% CI) 0.08 [−0.06, 0.22]

4.3.1 USA, Developer‐involved 1 80 Std. Mean Difference (IV, Random, 95% CI) −0.03 [−0.47, 0.41]

4.3.2 Non‐USA, Independent 1 684 Std. Mean Difference (IV, Random, 95% CI) 0.10 [−0.05, 0.25]

5. Peer relations

Outcome or subgroup Studies Participants Statistical method Effect estimate

5.1 Peer relations 4 Std. Mean Difference (IV, Random, 95% CI) No totals

5.1.1 USA, Developer‐involved, 1 year 2 Std. Mean Difference (IV, Random, 95% CI) No totals

5.1.2 Non‐USA, Independent, 2.5 years 2 Std. Mean Difference (IV, Random, 95% CI) No totals

5.1.3 Non‐USA, Independent, 4 years 1 Std. Mean Difference (IV, Random, 95% CI) No totals

6. Youth behaviour and symptoms

Outcome or subgroup Studies Participants Statistical method Effect estimate

6.1 Externalising behaviour, youth reports, 1 year 5 522 Std. Mean Difference (IV, Random, 95% CI) −0.09 [−0.56, 0.38]

6.1.1 USA, Developer‐involved 4 369 Std. Mean Difference (IV, Random, 95% CI) 0.17 [−1.24, 1.58]

6.1.2 USA, Independent 1 153 Std. Mean Difference (IV, Random, 95% CI) −0.04 [−0.36, 0.27]

6.2 Externalising behaviour, youth reports, 2.5 years 3 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

6.2.1 Independent 3 906 Std. Mean Difference (IV, Random, 95% CI) −0.13 [−0.26, −0.00]

6.3 Externalising behaviour, youth reports, 4 years 2 764 Std. Mean Difference (IV, Random, 95% CI) −0.04 [−0.18, 0.10]

6.3.1 USA, Developer‐involved 1 80 Std. Mean Difference (IV, Random, 95% CI) 0.16 [−0.28, 0.60]

6.3.2 Non‐USA, Independent 1 684 Std. Mean Difference (IV, Random, 95% CI) −0.06 [−0.21, 0.09]

6.4 Internalising behaviour, youth reports, 1 year 4 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

6.4.1 USA, Developer‐involved 4 369 Std. Mean Difference (IV, Random, 95% CI) 0.06 [−0.84, 0.96]

6.5 Internalising behaviour, youth reports, 2.5 years 3 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

6.5.1 Non‐USA, Independent 3 906 Std. Mean Difference (IV, Random, 95% CI) −0.27 [−0.57, 0.03]

6.6 Internalising behaviour, youth reports, 4 years 2 764 Std. Mean Difference (IV, Random, 95% CI) 0.02 [−0.12, 0.17]

6.6.1 USA, Developer‐involved 1 80 Std. Mean Difference (IV, Random, 95% CI) 0.12 [−0.32, 0.56]

6.6.2 Non‐USA, Independent 1 684 Std. Mean Difference (IV, Random, 95% CI) 0.01 [−0.14, 0.16]

7. Parent behavior and symptoms

Outcome or subgroup Studies Participants Statistical method Effect estimate

7.1 Parental mental health problems, 1 year 3 548 Std. Mean Difference (IV, Random, 95% CI) −0.20 [−0.38, −0.02]

7.1.2 USA, Developer‐involved 2 97 Std. Mean Difference (IV, Random, 95% CI) 0.07 [−1.16, 1.30]

7.1.3 Non‐USA, Independent 1 451 Std. Mean Difference (IV, Random, 95% CI) −0.20 [−0.39, −0.02]

7.2 Parental mental health problems, 2.5 years 2 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

7.2.1 Non‐USA, Independent 2 840 Std. Mean Difference (IV, Random, 95% CI) −0.11 [−0.32, 0.10]

7.3 Parental social support, 1 year 2 559 Std. Mean Difference (IV, Random, 95% CI) 0.19 [0.01, 0.37]

7.3.1 USA, Developer‐involved 1 86 Std. Mean Difference (IV, Random, 95% CI) Not estimable

7.3.2 Non‐USA, Independent 1 473 Std. Mean Difference (IV, Random, 95% CI) 0.19 [0.01, 0.37]

8. Family functioning

Outcome or subgroup Studies Participants Statistical method Effect estimate

8.1 FACES Adaptability, parent reports, 1 year 3 320 Std. Mean Difference (IV, Random, 95% CI) −0.04 [−0.35, 0.27]

8.1.1 USA, Developer‐involved 2 167 Std. Mean Difference (IV, Random, 95% CI) 0.29 [−0.95, 1.53]

8.1.2 USA, Independent 1 153 Std. Mean Difference (IV, Random, 95% CI) −0.06 [−0.38, 0.26]

8.2 FACES Cohesion, parent reports, 1 year 4 832 Std. Mean Difference (IV, Random, 95% CI) 0.11 [−0.04, 0.27]

8.2.1 USA, Developer‐involved 2 167 Std. Mean Difference (IV, Random, 95% CI) 0.82 [−0.48, 2.12]

8.2.2 USA, Independent 1 153 Std. Mean Difference (IV, Random, 95% CI) −0.01 [−0.33, 0.31]

8.2.3 Non‐USA, Independent 1 512 Std. Mean Difference (IV, Random, 95% CI) 0.14 [−0.03, 0.31]

9. School outcomes

Outcome or subgroup Studies Participants Statistical method Effect estimate

9.1 School attendance, 1 year 3 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

9.1.1 USA, Developer‐involved 2 274 Std. Mean Difference (IV, Random, 95% CI) Not estimable

9.1.2 USA, Independent 1 164 Std. Mean Difference (IV, Random, 95% CI) 0.09 [−0.21, 0.40]

9.2 School grades, 1 year 1 Std. Mean Difference (IV, Random, 95% CI) Subtotals only

9.2.1 USA, Independent 1 164 Std. Mean Difference (IV, Random, 95% CI) −0.16 [−0.46, 0.15]

SOURCES OF SUPPORT

Internal sources

• Bryn Mawr College, USA. Salary support for Professor Littell

External sources

• Smith Richardson Foundation, USA. Grant to Bryn Mawr College to support production of the original version of this review in 2003
• Institute for Evidence‐Based Social Work Practice (IMS), Sweden. Grant to Bryn Mawr College to support production of the original version of this review in 2003
• Danish National Institute of Social Research, Denmark. Grants to Bryn Mawr College to support review production, 2004‐2006
• National Institute for Health Research, UK. Grant to the Campbell Collaboration to support research production, 2019‐2020

References to studies

Included studies

*Denotes the primary report on an included study.

Asscher 2013
Asscher, J. Personal communication, 14 September 2020.
*Asscher, J. J., Deković, M., Manders, W. A., Van der Laan, H., Prins, P. J. M., & Dutch MST Cost‐Effectiveness Study Group4. (2013). A randomized controlled trial of the effectiveness of multisystemic therapy in The Netherlands: Post‐treatment changes and moderator effects. Journal of Experimental Criminology, 9, 169–187. https://doi.org/10.1007/s11292‐012‐9165‐9
Asscher, J. J., Dekovic, M., Manders, W., van der Laan, P., Prins, P., & van Arum, S. (2014). Sustainability of the effects of multisystemic therapy for juvenile delinquents in The Netherlands: effects on delinquency and recidivism. Journal of Experimental Criminology, 10(2), 227–243. https://doi.org/10.1007/s11292‐013‐9198‐8
Asscher, J. J., Deković, M., van der Laan, P. H., Prins, P. J. M., & van Arum, S. (2007). Implementing randomized experiments in criminal justice settings: An evaluation of multi‐systemic therapy in the Netherlands. Journal of Experimental Criminology, 3(2), 113–129. https://doi.org/10.1007/s11292‐007‐9028‐y
Asscher, J. J., Deković, M., van den Akker, A. L., Manders, W. A., Prins, P. J. M., van der Laan, P. H., & Prinzie, P. (2016). Do personality traits affect responsiveness of juvenile delinquents to treatment? Journal of Research in Personality, 63, 44–50. https://doi.org/10.1016/j.jrp.2016.05.004
Deković, M. (2005) Multisystemic Therapy in the Netherlands. Powerpoint presentation.
Dekovic, M., Asscher, J. J., Manders, W. A., Prins, P. J., & van der Laan, P. (2012). Within‐intervention change: Mediators of intervention effects during multisystemic therapy. Journal of Consulting and Clinical Psychology, 80(4), 574–587. https://doi.org/10.1037/a0028482
Deković, M., Bleeker, H., Manders, W., Asscher, J., Van der Laan, P., & Prins, P. (2007). Multisystemic Therapy in The Netherlands: Implementation and Effectiveness. Proposal, Dossier number: 80‐81700‐97‐003. The Hague: ZonMw.
Jansen, D. E. M. C., Vermeulen, K. M., Schuurman‐Luinge, A. H., Knorth, E. J., Buskens, E., & Reijneveld, S. A. (2013). Cost‐effectiveness of Multisystemic Therapy for adolescents with antisocial behaviour: Study protocol of a randomized controlled trial. BMC Public Health, 13(100968562), 369. https://doi.org/10.1186/1471‐2458‐13‐369
Manders, W. A. (2008). Multisystemic Therapy in The Netherlands: Implementation and Effectiveness. https://www.trialregister.nl/trial/1332
Manders, W. A., Deković, M., Asscher, J. J., van der Laan, P. H., & Prins, P. J. M. (2013a). Psychopathy as Predictor and Moderator of Multisystemic Therapy Outcomes among Adolescents Treated for Antisocial Behavior. Journal of Abnormal Child Psychology, 41(7), 1121–1132. https://doi.org/10.1007/s10802‐013‐9749‐5
Vermeulen, K. M., Jansen, D. E. M. C., Knorth, E. J., Buskens, E. & Reijneveld, S. A. (2017). Cost‐effectiveness of multisystemic therapy versus usual treatment for young people with antisocial problems. Criminal Behaviour and Mental Health, 27(1), 89–102. https://doi.org/10.1002/cbm.1988

Borduin 1990
*Borduin, C. M., Henggeler, S. W., Blaske, D. M., & Stein, R. (1990). Multisystemic treatment of adolescent sexual offenders. International Journal of Offender Therapy and Comparative Criminology, 35, 105–114. https://doi.org/10.1037/a0013971

Borduin 1995
Borduin, C. M., & Henggeler, S. W. (1990). A multisystemic approach to the treatment of serious delinquent behavior. In R. J. McMahon & R. D. Peters (Eds.), Behavior disorders of adolescence: Research, intervention, and policy in clinical and school settings (pp. 63–80). New York: Plenum Press.
*Borduin, C. M., Mann, B. J., Cone, L. T., Henggeler, S. W., Fucci, B. R., Blaske, D. M., & Williams, R. A. (1995). Multisystemic treatment of serious juvenile offenders: long‐term prevention of criminality and violence. Journal of Consulting and Clinical Psychology, 63, 569–578. https://doi.org/10.1037/0022‐006X.63.4.569
Henggeler, S. W., Borduin, C. M., Melton, G. B., Mann, B. J., Smith, L. A., Hall, J. A., Cone, L., & Fucci, B. R. (1991). Effects of multisystemic therapy on drug use and abuse in serious juvenile offenders: A progress report from two outcome studies. Family Dynamics of Addiction Quarterly, 1(3), 40–51. https://doi.org/10.1023/A:1022373813261
Henggeler, S. W., Cunningham, P. B., Pickrel, S. G., Schoenwald, S. K., & Brondino, M. J. (1996). Multisystemic therapy: An effective violence prevention approach for serious juvenile offenders. Journal of Adolescence, 19, 47–61.
Johnides, B. D. (2015). Long‐term effects of Multisystemic Therapy on caregivers of serious and violent juvenile offenders. Master's thesis. University of Missouri‐Columbia. Retrieved from https://pdfs.semanticscholar.org/75c1/4aafacb6e6c7263652180acf02f0d3e814d6.pdf
Mann, B. J., Borduin, C. M., Henggeler, S. W., & Blaske, D. M. (1990). An investigation of systemic conceptualisations of parent‐child coalitions and symptom change. Journal of Consulting and Clinical Psychology, 58, 336–344. https://doi.org/10.1037/0022‐006X.58.3.336
Sawyer, A. M. (2008). Multisystemic therapy across the lifespan: A 21.9‐year follow‐up to a randomized clinical trial with serious and violent juvenile offenders. Master's thesis. University of Missouri‐Columbia.
Sawyer, A. M., & Borduin, C. M. (2011). Effects of Multisystemic Therapy through midlife: A 21.9‐year follow‐up to a randomized clinical trial with serious and violent juvenile offenders. Journal of Consulting and Clinical Psychology, 79(5), 643–652. https://doi.org/10.1037/a0024862
Schaeffer, C. M. (2000). Moderators and mediators of therapeutic change in multisystemic treatment of serious juvenile offenders. Doctoral dissertation. University of Missouri-Columbia.

Borduin 2009
Borduin, C. M., & Dopp, A. R. (2015). Economic impact of multisystemic therapy with juvenile sexual offenders. Journal of Family Psychology, 29(5), 687–696. https://doi.org/10.1037/fam0000113
Borduin, C. M., & Schaeffer, C. M. (2002). Multisystemic treatment of juvenile sexual offenders: A progress report. Journal of Psychology and Human Sexuality, 13(3-4), 25–42. https://doi.org/10.1300/J056v13n03_03
Borduin, C. M., Schaeffer, C. M., & Heiblum, N. (2000). Multisystemic treatment of juvenile sexual offenders: Instrumental and ultimate outcomes. Unpublished manuscript.
*Borduin, C. M., Schaeffer, C. M., & Heiblum, N. (2009). A randomized clinical trial of Multisystemic Therapy with juvenile sexual offenders: Effects on youth social ecology and criminal activity. Journal of Consulting and Clinical Psychology, 77(1), 26–37. https://doi.org/10.1037/a0013035

Butler 2011
Butler, S. (2010). UK evaluations of MST: Brandon Centre and START trials. Retrieved from http://www.nmhdu.org.uk/news/multi-systemic-therapy-new-therapy-brings-results-for-troubled-young-people/
Butler, S. Unpublished data. Personal communication, 31 August 2020 and 15 September 2020.
*Butler, S., Baruch, G., Hickey, N., & Fonagy, P. (2011). A randomized controlled trial of multisystemic therapy and a statutory therapeutic intervention for young offenders. Journal of the American Academy of Child and Adolescent Psychiatry, 50(12), 1220–1235.e2. https://doi.org/10.1016/j.jaac.2011.09.017
Cary, M., Butler, S., Baruch, G., Hickey, N., & Byford, S. (2013). Economic evaluation of multisystemic therapy for young people at risk for continuing criminal activity in the UK. PLoS One, 8(4), e61070. https://doi.org/10.1371/journal.pone.0061070
The Brandon Centre, London. (2012). A Trial of Multisystemic Therapy in UK a Statutory Therapeutic Intervention for Young Offenders. https://Clinicaltrials.Gov/Show/NCT01713088
Tighe, A., Pistrang, N., Casdagli, L., Baruch, G., & Butler, S. (2012). Multisystemic therapy for young offenders: families’ experiences of therapeutic processes and outcomes. Journal of Family Psychology, 26(2), 187–197. https://doi.org/10.1037/a0027120

Fonagy 2017
*Fonagy, P., Butler, S., Baly, A., Seto, M., Anokhina, A., Kaminsak, K., & Ellison, R. (2017). Evaluation of Multisystemic Therapy for adolescent problematic sexual behavior. Children's Social Care Innovation Programme Evaluation Report. https://www.gov.uk/government/publications/multisystemic-therapy-for-adolescent-problematic-sexual-behaviour
Fonagy, P., Butler, S., Baruch, G., Byford, S., & Seto, M. C. (2015). Evaluation of multisystemic therapy pilot services in Services for Teens Engaging in Problem Sexual Behaviour (STEPS-B): Study protocol for a randomized controlled trial. Trials, 16, 492. https://doi.org/10.1186/s13063-015-1017-2

Fonagy 2018
Fonagy, P. (2009). START (Systemic Therapy for At Risk Teens): a national randomised controlled trial to evaluate multisystemic therapy in the UK context. https://www.isrctn.com/ISRCTN77132214. [https://doi.org/10.1186/ISRCTN77132214]
Fonagy, P., Joffe, H., Fuggle, P., Butler, S., Cottrell, D., Losel, F., Goodyer, I., White, I., Wason, J., Smith, J., Byford, S., Pilling, S., & Scott, S. (2013). Research protocol: START (Systemic Therapy for At Risk Teens): A national randomised controlled trial to evaluate multi-systemic therapy. https://fundingawards.nihr.ac.uk/award/12/136/70
*Fonagy, P., Butler, S., Cottrell, D., Scott, S., Pilling, S., Eisler, I., Fuggle, P., Kraam, A., Byford, S., Wason, J., Ellison, R., Simes, E., Ganguli, P., Allison, E., & Goodyer, I. M. (2018). Multisystemic therapy versus management as usual in the treatment of adolescent antisocial behaviour (START): a pragmatic, randomised controlled, superiority trial. The Lancet Psychiatry, 5(2), 119–133. https://doi.org/10.1016/S2215-0366(18)30001-4
Fonagy, P., Butler, S., Cottrell, D., Scott, S., Pilling, S., Eisler, I., Fuggle, P., Kraam, A., Byford, S., Wason, J., Ellison, R., Simes, E., Ganguli, P., Allison, E., & Goodyer, I. M. (2018a). Multisystemic therapy versus management as usual in the treatment of adolescent antisocial behaviour (START): a pragmatic, randomised controlled, superiority trial: Supplementary appendix. The Lancet Psychiatry, 5(Suppl), 119–133. https://doi.org/10.1016/S2215-0366(18)30001-4
Fonagy, P., Butler, S., Cottrell, D., Scott, S., Pilling, S., Eisler, I., Fuggle, P., Kraam, A., Byford, S., Wason, J., Smith, J. A., Anokhina, A., Ellison, R., Simes, E., Ganguli, P., Allison, E., & Goodyer, I. M. (2020). Multisystemic therapy versus management as usual in the treatment of adolescent antisocial behaviour (START): 5-year follow-up of a pragmatic, randomised, controlled, superiority trial. The Lancet Psychiatry, 7(5), 420–430. https://doi.org/10.1016/S2215-0366(20)30131-0
Fonagy, P., Butler, S., Cottrell, D., Scott, S., Pilling, S., Eisler, I., Fuggle, P., Kraam, A., Byford, S., Wason, J., Smith, J. A., Anokhina, A., Ellison, R., Simes, E., Ganguli, P., Allison, E., & Goodyer, I. M. (2020a). Multisystemic therapy versus management as usual in the treatment of adolescent antisocial behaviour (START): 5-year follow-up of a pragmatic, randomised, controlled, superiority trial: Supplementary appendix. The Lancet Psychiatry, 7(Suppl), 420–430. https://doi.org/10.1016/S2215-0366(20)30131-0
Fonagy, P., Butler, S., Goodyer, I., Cottrell, D., Scott, S., Pilling, S., Eisler, I., Fuggle, P., Kraam, A., Byford, S., Wason, J., & Haley, R. (2013). Evaluation of multisystemic therapy pilot services in the Systemic Therapy for At Risk Teens (START) trial: study protocol for a randomized controlled trial. Trials, 14(1), 265. https://doi.org/10.1186/1745-6215-14-265

Glisson 2010
Glisson, C. Personal communication, 8 July 2020 and 17 August 2020.
Glisson, C., & Schoenwald, S. K. (2005). The ARC organizational and community intervention strategy for implementing evidence-based children's mental health treatments. Mental Health Services Research, 7(4), 243–259. https://doi.org/10.1007/s11020-005-7456-1
*Glisson, C., Schoenwald, S. K., Hemmelgarn, A., Green, P., Dukes, D., Armstrong, K. S., & Chapman, J. E. (2010). Randomized trial of MST and ARC in a two-level evidence-based treatment implementation strategy. Journal of Consulting and Clinical Psychology, 78(4), 537–550. https://doi.org/10.1037/a0019160

Henggeler 1992
Henggeler, S. W., Borduin, C. M., Melton, G. B., Mann, B. J., Smith, L. A., Hall, J. A., Cone, L., & Fucci, B. R. (1991). Effects of multisystemic therapy on drug use and abuse in serious juvenile offenders: A progress report from two outcome studies. Family Dynamics of Addiction Quarterly, 1(3), 40–51.
Henggeler, S. W., Cunningham, P. B., Pickrel, S. G., Schoenwald, S. K., & Brondino, M. J. (1996). Multisystemic therapy: An effective violence prevention approach for serious juvenile offenders. Journal of Adolescence, 19, 47–61. https://doi.org/10.1006/jado.1996.0005
*Henggeler, S. W., Melton, G. B., & Smith, L. A. (1992). Family preservation using multisystemic therapy: An effective alternative to incarcerating serious juvenile offenders. Journal of Consulting and Clinical Psychology, 60(6), 953–961. https://doi.org/10.1037/0022-006X.60.6.953
Henggeler, S. W., Melton, G. B., Smith, L. A., Schoenwald, S. K., & Hanley, J. H. (1993). Family preservation using multisystemic treatment: Long-term follow-up to a clinical trial with serious juvenile offenders. Journal of Child and Family Studies, 2(4), 283–293. https://doi.org/10.1007/BF01321226

Henggeler 1997
*Henggeler, S. W., Melton, G. B., Brondino, M. J., Scherer, D. G., & Hanley, J. H. (1997). Multisystemic therapy with violent and chronic juvenile offenders and their families: The role of treatment fidelity in successful dissemination. Journal of Consulting and Clinical Psychology, 65, 821–833. https://doi.org/10.1037/0022-006X.65.5.821
Huey, S. J., Henggeler, S. W., & Brondino, M. J. (2000). Mechanisms of change in multisystemic therapy: reducing delinquent behavior through therapist adherence and improved family and peer functioning. Journal of Consulting and Clinical Psychology, 68(3), 451–467. https://doi.org/10.1037/0022-006X.68.3.451
Scherer, D. G., Brondino, M. J., Henggeler, S. W., Melton, G. B., & Hanley, J. H. (1994). Multisystemic family preservation therapy: Preliminary findings from a study of rural and minority serious adolescent offenders. Journal of Emotional and Behavioral Disorders, 2, 198–206. https://doi.org/10.1177/106342669400200402

Henggeler 1999a
Brown, T. L., Henggeler, S. W., Schoenwald, S. K., Brondino, M. J., & Pickrel, S. G. (1999). Multisystemic treatment of substance abusing and dependent juvenile delinquents: effects on school attendance at posttreatment and 6-month follow-up. Children's Services: Social Policy, Research, and Practice, 2(2), 81–93. https://doi.org/10.1207/s15326918cs0202_2
Cunningham, P. B., Henggeler, S. W., Brondino, M. J., & Pickrel, G. G. (1999). Testing underlying assumptions of the family empowerment perspective. Journal of Child and Family Studies, 8(4), 437–449. https://doi.org/10.1023/A:1021951720298
Henggeler, S. W., Clingempeel, W. G., Brondino, M. J., & Pickrel, S. G. (2002). Four-year follow-up of multisystemic therapy with substance-abusing and substance-dependent juvenile offenders. Journal of the American Academy of Child and Adolescent Psychiatry, 41(7), 868–874. https://doi.org/10.1097/00004583-200207000-00021
*Henggeler, S. W., Pickrel, S. G., & Brondino, M. J. (1999a). Multisystemic Treatment of Substance-Abusing and -Dependent Delinquents: Outcomes, Treatment Fidelity, and Transportability. Mental Health Services Research, 1(3), 171–184. https://doi.org/10.1023/A:1022373813261
Henggeler, S. W., Pickrel, S. G., Brondino, M. J., & Crouch, J. L. (1996). Eliminating (almost) treatment dropout of substance abusing or dependent delinquents through home-based multisystemic therapy. American Journal of Psychiatry, 153, 427–428. https://doi.org/10.1176/ajp.153.3.427
Huey, S. J., Henggeler, S. W., & Brondino, M. J. (2000). Mechanisms of change in multisystemic therapy: reducing delinquent behavior through therapist adherence and improved family and peer functioning. Journal of Consulting and Clinical Psychology, 68(3), 451–467. https://doi.org/10.1037/0022-006X.68.3.451
Schoenwald, S. K., Ward, D. M., & Henggeler, S. W. (1996). Multisystemic therapy treatment of substance abusing or dependent adolescent offenders: costs of reducing incarceration, inpatient, and residential placement. Journal of Child and Family Studies, 5, 431–444. https://doi.org/10.1007/BF02233864

Henggeler 1999b
*Henggeler, S. W., Rowland, M. D., Halliday-Boykins, C., Sheidow, A. J., Ward, D. M., Randall, J., Pickrel, S. C., Cunningham, P. B., & Edwards, J. (2003). One-year follow-up of multisystemic therapy as an alternative to the hospitalization of youths in psychiatric crisis. Journal of the American Academy of Child and Adolescent Psychiatry, 42(5), 543–551. https://doi.org/10.1097/01.CHI.0000046834.09750.5F
Henggeler, S. W., Rowland, M. D., Pickrel, S. G., Miller, S. L., Cunningham, P. B., Santos, A. B., Schoenwald, S. K., Randall, J., & Edwards, J. E. (1997). Investigating family-based alternatives to institution-based mental health services for youth: Lessons learned from the pilot study of a randomized field trial. Journal of Clinical Child Psychology, 26(3), 226–233. https://doi.org/10.1207/s15374424jccp2603_1
Henggeler, S. W., Rowland, M. D., & Randall, J. (1999b). Home-based multisystemic therapy as an alternative to the hospitalization of youths in psychiatric crisis: clinical outcomes. Journal of the American Academy of Child and Adolescent Psychiatry, 38(11), 1331–1339. https://doi.org/10.1097/00004583-199911000-00006
Huey, S. J., Henggeler, S. W., Rowland, M. D., Halliday-Boykins, C. A., Cunningham, P. B., Pickrel, S. G., & Edwards, J. (2004). Multisystemic therapy effects on attempted suicide by youths presenting psychiatric emergencies. Journal of the American Academy of Child and Adolescent Psychiatry, 43(2), 183–190. https://doi.org/10.1097/01.chi.0000101700.15837.f3
Rowland, M. D. (2004, March 1). Follow-up of Multisystemic Therapy (MST) as an alternative to hospitalization. Presented at the 17th Annual RTC Conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/17/Session%2019/Rowland.pdf
Schoenwald, S. K., Ward, D. M., Henggeler, S. W., & Rowland, M. D. (2000). Multisystemic Therapy Versus Hospitalization for Crisis Stabilization of Youth: Placement Outcomes 4 Months Postreferral. Mental Health Services Research, 2(1), 3–12. https://doi.org/10.1023/a:1010187706952

Henggeler 2006
Gillespie, M. L., Huey, S. J., & Cunningham, P. B. (2017). Predictive validity of an observer-rated adherence protocol for multisystemic therapy with juvenile drug offenders. Journal of Substance Abuse Treatment, 76, 1–10. https://doi.org/10.1016/j.jsat.2017.01.001
Halliday-Boykins, C. A., Schaeffer, C. M., Henggeler, S. W., Chapman, J. E., Cunningham, P. B., Randall, J., & Shapiro, S. B. (2010). Predicting nonresponse to juvenile drug court interventions. Journal of Substance Abuse Treatment, 39(4), 318–328. https://doi.org/10.1016/j.jsat.2010.07.011
Henggeler, S. W., & Brady, T. M. (2008). Substance abusing delinquents: 5-year outcomes of RCT. NIH RePORTER project information. https://projectreporter.nih.gov/project_info_description.cfm?aid=7491569&icde=49864713
*Henggeler, S. W., Halliday-Boykins, C. A., Cunningham, P. B., Randall, J., Shapiro, S. B., & Chapman, J. E. (2006). Juvenile drug court: Enhancing outcomes by integrating evidence-based treatments. Journal of Consulting and Clinical Psychology, 74(1), 42–54. https://doi.org/10.1037/0022-006X.74.1.42
McCollister, K. E., French, M. T., Sheidow, A. J., Henggeler, S. W., & Halliday-Boykins, C. A. (2009). Estimating the differential costs of criminal activity for juvenile drug court participants: challenges and recommendations. Journal of Behavioral Health Services & Research, 36(1), 111–126. https://doi.org/10.1007/s11414-007-9094-y
Rowland, M. D., Chapman, J. E., & Henggeler, S. W. (2008). Sibling Outcomes from a Randomized Trial of Evidence-Based Treatments with Substance Abusing Juvenile Offenders. Journal of Child & Adolescent Substance Abuse, 17(3), 11–26. https://doi.org/10.1080/15470650802071622
Sayegh, C. S., Hall-Clark, B. N., McDaniel, D. D., Halliday-Boykins, C. A., Cunningham, P. B., & Huey, S. J. J. (2019). A Preliminary Investigation of Ethnic Differences in Resistance in Multisystemic Therapy. Journal of Clinical Child and Adolescent Psychology, 48(sup1), S13–S23. https://doi.org/10.1080/15374416.2016.1157754
Sheidow, A. J., Jayawardhana, J., Bradford, D. W., Henggeler, S. W., & Shapiro, S. B. (2012). Money Matters: Cost-Effectiveness of Juvenile Drug Court with and without Evidence-Based Treatments. Journal of Child & Adolescent Substance Abuse, 21(1), 69–90. https://doi.org/10.1080/1067828X.2012.636701

Leschied 2002
Cunningham, A. (2002). One step forward: Lessons learned from a randomized study of Multisystemic Therapy in Canada. PRAXIS: Research from the Centre for Children & Families in the Justice System, 1–32.
Cunningham, A. Personal communication, 12 March 2004.
Leschied, A. W., & Cunningham, A. (1999). Clinical trials of multisystemic therapy in Ontario: Rationale and current status of a community-based alternative for high-risk young offenders. Forum on Corrections Research, 11, 25–29.
Leschied, A. W., & Cunningham, A. (2001, April). Clinical trials of multisystemic therapy 1997 to 2001: Evaluation update report. Centre for Children & Families in the Justice System, London Family Court.
*Leschied, A. W., & Cunningham, A. (2002). Seeking effective interventions for young offenders: Interim results of a four-year randomized study of Multisystemic Therapy in Ontario, Canada. Centre for Children & Families in the Justice System, London Family Court, London, Ontario, Canada.
Oakley, T. L. (2000). Multisystemic Therapy for high-risk young offenders: An exploration of school outcomes. Master of Science thesis. The University of Guelph. http://www.collectionscanada.gc.ca/obj/s4/f2/dsk3/ftp04/MQ56357.pdf

Letourneau 2009
Henggeler, S. W., Letourneau, E. J., Chapman, J. E., Borduin, C. M., Schewe, P. A., & McCart, M. R. (2009). Mediators of change for Multisystemic Therapy with juvenile sexual offenders. Journal of Consulting and Clinical Psychology, 77(3), 451–462. https://doi.org/10.1037/a0013971
Letourneau, E. J., Henggeler, S. W., Borduin, C. M., Schewe, P. A., McCart, M. R., Chapman, J. E., & Saldana, L. (2008). Effectiveness of MST with juvenile sex offenders: 1-year outcomes. Presented at the 21st Annual RTC Conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/21/Session%2047/letourneau.pdf
*Letourneau, E. J., Henggeler, S. W., Borduin, C. M., Schewe, P. A., McCart, M. R., Chapman, J. E., & Saldana, L. (2009). Multisystemic Therapy for juvenile sexual offenders: 1-year results from a randomized effectiveness trial. Journal of Family Psychology, 23(1), 89–102. https://doi.org/10.1037/a0014352
Letourneau, E. J., Henggeler, S. W., McCart, M. R., Borduin, C. M., Schewe, P. A., & Armstrong, K. S. (2013). Two-year follow-up of a randomized effectiveness trial evaluating MST for juveniles who sexually offend. Journal of Family Psychology, 27(6), 978–985. https://doi.org/10.1037/a0034710
Sheerin, K. M. (2017). Multisystemic therapy with juvenile sexual offenders: A 10.2-year follow-up to a randomized effectiveness trial. Unpublished dissertation. University of Missouri-Columbia [full text not available].

Miller 1998
*Miller, M. L. (1998). The Multisystemic Therapy Pilot Program: Final evaluation. Wilmington, Delaware: Evaluation Research and Planning.

Ogden 2004
Ogden, T. Personal communication, 24 January 2007.
Ogden, T., Amlund-Hagen, K., & Sørlie, M.-A. (2008). Implementing and evaluating evidence-based programs targeting conduct problems in Norwegian children and youth. 21st Annual Research Conference, A System of Care for Children's Mental Health: Expanding the Research Base, Tampa, FL.
Ogden, T., & Hagen, K. A. (2006). Multisystemic therapy of serious behavior problems in youth: Sustainability of therapy effectiveness two years after intake. Child and Adolescent Mental Health, 11(3), 142–149. https://doi.org/10.1111/j.1475-3588.2006.00396.x
Ogden, T., Hagen, K. A., & Anderson, O. (2007). Sustainability of the effectiveness of a programme of multisystemic treatment (MST) across participant groups in the second year of operation. Journal of Children's Services, 2(3), 4–14. https://doi.org/10.1108/17466660200700022
*Ogden, T., & Halliday-Boykins, C. (2004). Multisystemic treatment of antisocial adolescents in Norway: Replication of clinical outcomes outside of the US. Journal of Child and Adolescent Mental Health, 9, 77–83. https://doi.org/10.1111/j.1475-3588.2004.00085.x
Sørlie, M.-A., & Ogden, T. (2005). Multisystemic intervention. Excellence in Special Education: Time to Move On. Västerås, Sweden: Mälardalen University.

Rowland 2005
Rosenblatt, A., Deuel, L.-L., Mak, W., Thornton, P., Baize, H., Morea, J., & Smucker, S. (2001). Evaluation of two therapeutic programs for children with serious mental health problems and their families: Home-based multisystemic therapy (MST) and the MST continuum of care. San Francisco, CA: University of California San Francisco, Child Services Research Group.
Rowland, M. D., Halliday-Boykins, C. A., Henggeler, S. W., Cunningham, P. B., Lee, T. G., Kruesi, M. J. P., & Shapiro, S. B. (2005). A randomized trial of Multisystemic Therapy with Hawaii's Felix Class Youths. Journal of Emotional and Behavioral Disorders, 13(1), 13–23. https://doi.org/10.1177/10634266050130010201

Sundell 2006
Löfholm, C. A., Olsson, T., Sundell, K., & Hansson, K. (2009). Multisystemic Therapy with conduct disordered young people: Stability of treatment outcomes two years after intake. Evidence & Policy, 5(4), 373–397.
Löfholm, C. A., Sundell, K., Hansson, K., & Olsson, T. (2009). The transportability of MST to Sweden: A two-year follow-up of a randomized controlled trial of conduct disordered youth. Campbell Collaboration Colloquium presentation, Oslo, Norway.
Olsson, T. (2008). Crossing the quality chasm? The short-term effectiveness and efficiency of MST in Sweden: An example of evidence-based practice applied to social work. Unpublished dissertation. Lunds University, Sweden.
Olsson, T. M. (2010). Intervening in youth problem behavior in Sweden: a pragmatic cost analysis of MST from a randomized trial with conduct disordered youth. International Journal of Social Welfare, 10(2), 194–205. https://doi.org/10.1111/j.1468-2397.2009.00653.x
Olsson, T. M. (2010a). MST with conduct disordered youth in Sweden: Costs and benefits after 2 years. Research on Social Work Practice, 20(6), 561–571. https://doi.org/10.1177/1049731509339028
Socialstyrelsen. (2014). Evaluation of Multisystemic therapy for young people with serious behavior problems: Results after five years [Utvärdering av Multisystemisk terapi för ungdomar med allvarliga beteendeproblem: Resultat efter fem år]. Socialstyrelsen. https://www.socialstyrelsen.se/globalassets/sharepoint-dokument/artikelkatalog/statistik/2014-11-20.pdf
Sundell, K., Hansson, K., Löfholm, C. A., Olsson, T., Gustle, L.-H., & Kadesjö, C. (2006). The transportability of MST to Sweden: Short-term results from a randomized trial with conduct disorder youth. Unpublished report.
*Sundell, K., Hansson, K., Löfholm, C. A., Olsson, T., Gustle, L.-H., & Kadesjö, C. (2008). The transportability of MST to Sweden: Short-term results from a randomized trial with conduct disorder youth. Journal of Family Psychology, 22(3), 550–560. https://doi.org/10.1037/a0012790

Swenson 2010
Dopp, A. R., Schaeffer, C. M., Swenson, C. C., & Powell, J. S. (2018). Economic Impact of Multisystemic Therapy for Child Abuse and Neglect. Administration and Policy in Mental Health and Mental Health Services Research, 45(6), 876–887. https://doi.org/10.1007/s10488-018-0870-1
Swenson, C. C. (2005). MST for child abuse and neglect: What do we know and where are we headed. Powerpoint presentation. Retrieved from http://www.bvs.is/?m=13&ser=261
Swenson, C. C. (2009). MST for child abuse and neglect (MST-CAN): An update. Unpublished paper.
Swenson, C. C., Saldana, L., Joyner, C. D., & Henggeler, S. W. (nd). Ecological treatment for parent-to-child violence. Unpublished paper. Retrieved from www.bvs.is/files/file278.pdf
*Swenson, C. C., Schaeffer, C. M., Henggeler, S. W., Faldowski, R., Saldana, L., & Mayhew, A. (2010). Multisystemic Therapy for child abuse and neglect: a randomized controlled effectiveness trial. Journal of Family Psychology, 24, 497–507. https://doi.org/10.1037/a0020324

Timmons-Mitchell 2006
Timmons-Mitchell, J. Personal communication, 20 October 2003c.
Timmons-Mitchell, J., Bender, M., Kishna, M., & Mitchell, C. (2003a). CAFAS Scores measure MST treatment success. Unpublished PowerPoint presentation. Ohio Office of Criminal Justice.
*Timmons-Mitchell, J., Bender, M. B., Kishna, M. A., & Mitchell, C. C. (2006). An independent effectiveness trial of Multisystemic Therapy with juvenile justice youth. Journal of Clinical Child & Adolescent Psychology, 35(2), 227–236. https://doi.org/10.1207/s15374424jccp3502_6
Timmons-Mitchell, J., Kishna, M., Bender, M., & Mitchell, C. (2003b). Assessment and treatment of mental health needs of girls and boys in the juvenile justice system: The effectiveness of multisystemic therapy. Unpublished report, 12 July 2003; 26 pages.

Wagner 2019
Wagner, D. V. (2013). Adapting multisystemic therapy for disruptive behavior problems in youth with autism spectrum disorder: Conceptual and empirical development. Unpublished dissertation. University of Missouri-Columbia [full text not available].
Wagner, D. V., Borduin, C. M., Kanne, S. M., Mazurek, M. O., Farmer, J. E., & Brown, R. M. A. (2014). Multisystemic therapy for disruptive behavior problems in youths with autism spectrum disorders: A progress report. Journal of Marital and Family Therapy, 40(3), 319–331. https://doi.org/10.1111/jmft.12012
*Wagner, D. V., Borduin, C. M., Mazurek, M. O., Kanne, S. M., & Dopp, A. R. (2019). Multisystemic Therapy for Disruptive Behavior Problems in Youths with Autism Spectrum Disorder: Results from a Small Randomized Clinical Trial. Evidence-Based Practice in Child and Adolescent Mental Health, 4(1), 42–54. https://doi.org/10.1080/23794925.2018.1560237

Weiss 2013
Ellis, M., Weiss, B., Han, S., & Gallop, R. (2010). The influence of parental factors on therapist adherence in multi-systemic therapy. Journal of Abnormal Child Psychology, 38(6), 857–868. https://doi.org/10.1007/s10802-010-9407-0
Tran, N. T., Weiss, B., Han, S. S., Harris, V. S., Catron, T., Ngo, V. K., & Caron, A. (2010). Moderators of the effectiveness of Multisystemic Therapy outcome for adolescents with severe conduct problems. Unpublished paper.
Weiss, B., Catron, T., Han, S., Harris, V., Caron, A., & Ngo, V. (2005). An independent evaluation of Multisystemic Therapy (MST). Retrieved 5 October 2006, from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/18/Session/Catron.pdf
Weiss, B., Han, S., Harris, V., Catron, T., Ngo, V. K., & Caron, A. (2010). An independent evaluation of the MST treatment program. Unpublished paper.
*Weiss, B., Han, S., Harris, V., Catron, T., Ngo, V. K., Caron, A., Gallop, R., & Guth, C. (2013). An independent randomized clinical trial of multisystemic therapy with non-court-referred adolescents with serious conduct problems. Journal of Consulting and Clinical Psychology, 81(6), 1027–1039. https://doi.org/10.1037/a0033928
Weiss, B., Han, S. S., Tran, N. T., Gallop, R., & Ngo, V. K. (2015). Test of “Facilitation” vs. “Proximal Process” Moderator Models for the Effects of Multisystemic Therapy on Adolescents with Severe Conduct Problem. Journal of Abnormal Child Psychology, 43(5), 971–983. https://doi.org/10.1007/s10802-014-9952-z

Excluded studies

Aultman-Bettridge 2007
Aultman-Bettridge, T. (2007). A gender-specific analysis of community-based juvenile justice reform: The effectiveness of family therapy programs for delinquent girls (Washington). Dissertation Abstracts International, A: The Humanities and Social Sciences, 68(05), 2179.

Baglivio 2014
Baglivio, M. T., Jackowski, K., Greenwald, M. A., & Wolff, K. T. (2014). Comparison of Multisystemic Therapy and Functional Family Therapy Effectiveness: a multiyear statewide propensity score matching analysis of juvenile offenders. Criminal Justice and Behavior, 41, 1033–1056. https://doi.org/10.1177/0093854814543272

Barnoski 2004
Barnoski, R. (2004). Outcome evaluation of Washington State's research-based programs for juvenile offenders: Appendices. Olympia, WA: Washington State Institute for Public Policy.
Barnoski, R., & Aos, S. (2004). Outcome evaluation of Washington State's research-based programs for juvenile offenders. Olympia, WA: Washington State Institute for Public Policy.

Barth 2007
Barth, R. P., Greeson, J. K. P., Guo, S., Green, R. L., Hurley, S., & Sisson, J. (2007). Outcomes for youth receiving intensive in-home therapy or residential care: A comparison using propensity scores. American Journal of Orthopsychiatry, 77, 497–505. https://doi.org/10.1037/0002-9432.77.4.497

Bernstein 2005
Bernstein, D., Coen, A. S., & Brunk, M. (2005, February). Developing and piloting a statewide outcomes database for MST teams in Colorado. In: Presented at the 18th Annual RTC conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/18/Session%201000/Bernstein.pdf

Blankestein 2019a
Blankestein, A., Lange, A., van der Rijken, R., Scholte, R., Moonen, X., & Didden, R. (2019a). Brief report: Follow-up outcomes of multisystemic therapy for adolescents with an intellectual disability and the relation
with parental intellectual disability. Journal of Applied Research in Intellectual Disabilities, 33, 618–624. https://doi.org/10.1111/jar.12691

Blankestein 2019b
Blankestein, A., van der Rijken, R., Eeren, H. V., Lange, A., Scholte, R., Moonen, X., De Vuyst, K., Leunissen, J., & Didden, R. (2019b). Evaluating the effects of multisystemic therapy for adolescents with intellectual disabilities and antisocial or delinquent behaviour and their parents. Journal of Applied Research in Intellectual Disabilities, 32(3), 575–590. https://doi.org/10.1111/jar.12551

Boonstra 2009
Boonstra, C., Jonkman, C., Speteman, D., & Van Busschbach, J. (2009). Multisystemic therapy for seriously antisocial and delinquent juveniles: Two-year follow-up study. Systemic Therapy, 21, 94–104.

Boonstra 2018
Boonstra, C., Doelman, E., Lange, A. M., & van der Rijken, R. (2018). A description of the treatment results for Multisystemic Therapy for adolescents with serious sexual behavioural problems (MST-PSB). Kind en Adolescent, 39(4), 282–296. https://doi.org/10.1007/s12453-018-0186-7

Cartwright 2009
Cartwright, W. S., Kitsantas, P., & Rose, S. R. (2009). A demographic-economic model for adolescent abuse and crime prevention. Journal of Comparative Social Welfare, 25, 157–172. https://doi.org/10.1080/17486830902789780

Connell 2016
Connell, C. M., Steeger, C. M., Schroeder, J. A., Franks, R. P., & Tebes, J. K. (2016). Child and case influences on recidivism in a statewide dissemination of Multisystemic Therapy for juvenile offenders. Criminal Justice and Behavior, 43(10), 1330–1346. https://doi.org/10.1177/0093854816641715

Cunningham 2001
Cunningham, P. B., & Henggeler, S. W. (2001). Implementation of an empirically based drug and violence prevention and intervention program in public school settings. Journal of Clinical Child Psychology, 30(2), 221–232. https://doi.org/10.1207/S15374424JCCP3002_9

Cunningham 2009
Glebova, T., Foster, S. L., Cunningham, P. B., Brennan, P. A., & Whitmore, E. (2012). Examining therapist comfort in delivering family therapy in
Boxer 20 11 home and community settings: development and evaluation of the
Boxer, P. (2011). Negative peer involvement in multisystemic therapy for Therapist Comfort Scale. Psychotherapy: Theory, Research, Practice,
the treatment of youth problem behavior: exploring outcome and Training, 49(1), 52–61. https://doi.org/10.1037/a0025910
process variables in “real‐world” practice. Journal of Clinical Child & Robinson, B. A., Winiarski, D. A., Brennan, P. A., Foster, S. L.,
Adolescent Psychology, 40(6), 848–854. https://doi.org/10.1080/ Cunningham, P. B., & Whitmore, E. A. (2015). Social context, parental
15374416.2011.614583 monitoring, and multisystemic therapy outcomes. Psychotherapy,
51(1), 103–110. https://doi.org/10.1037/a0037948
Ryan, S. R., Cunningham, P. B., Foster, S. L., Brennan, P. A., Brock, R. L., &
Boxer 20 17
Whitmore, E. (2013). Predictors of Therapist Adherence and Emo-
Boxer, P., Docherty, M., Ostermann, M., Kubik, J., & Veysey, B. (2017). tional Bond in Multisystemic Therapy: Testing Ethnicity as a Mod-
Effectiveness of Multisystemic Therapy for gang‐involved youth of- erator. Journal of Child and Family Studies, 22(1), 122–136. https://doi.
fenders: One year follow‐up analysis of recidivism outcomes. Children org/10.1007/s10826‐012‐9638‐5
and Youth Services Review, 73, 107–112. https://doi.org/10.1016/j. Tiernan, K., Foster, S. L., Cunningham, P. B., Brennan, P., & Whitmore, E.
childyouth.2016.12.008 (2015). Predicting early positive change in multisystemic therapy with
Boxer, P., Kubik, J., Ostermann, M., & Veysey, B. (2015). Gang involvement youth exhibiting antisocial behaviors. Psychotherapy, 52, 93–102.
moderates the effectiveness of evidence‐based intervention for https://doi.org/10.1037/a0035975
justice‐involved youth. Children and Youth Services Review, 52, Tiernan, K. N. (2011). Predicting early positive change in multisystemic
26–33. https://doi.org/10.1016/j.childyouth.2015.02.012 therapy with youth exhibiting antisocial behaviors. Unpublished dis-
sertation. California School of Professional Psychology, Alliant Inter-
Br u n k 1 98 7 national University.
Brunk, M., Henggeler, S. W., & Whelan, J. P. (1987). A comparison of
multisystemic therapy and parent training in the brief treatment of C u r ti s 20 0 9
child abuse and neglect. Journal of Consulting and Clinical Psychology, Curtis, N. M., Ronan, K. R., Heilblum, N., & Crellin, K. (2009). Dissemina-
55(2), 171–178. https://doi.org/10.1037/0022‐006X.55.2.171 tion and effectiveness of multisystemic treatment in New Zealand: A
Brunk Tribble, M. A. (1985). A comparison of Multisystemic Therapy and benchmarking study. Journal of Family Psychology, 23(2), 119–129.
parent training in the brief treatment of child abuse and neglect. https://doi.org/10.1037/a0014974
Unpublished dissertation. Memphis State University.

D æ h l e n 2 0 16
Br u n k 2 01 4
Dæhlen, M., & Madsen, C. (2016). School enrolment following multi-
Brunk, M. A., Chapman, J. E., & Schoenwald, S. K. (2014). Defining and systemic treatment: A register‐based examination among youth with
evaluating fidelity at the program level in psychosocial treatments: A severe behavioural problems. Children & Youth Services Review, 67,
preliminary investigation. Zeitschrift Fur Psychologie, 222(1), 22–29. 76–83. https://doi.org/10.1016/j.childyouth.2016.05.016
https://doi.org/10.1027/2151‐2604/a000162

D a v i s 2 0 15
By t y c i 2 01 7
Davis, M., Sheidow, A. J., & McCart, M. R. (2015). Reducing recidivism and
Bytyci, D. G. (2017). The effectiveness of multi‐systemic family therapy in symptoms in emerging adults with serious mental health conditions and
bullying behavior of adolescents. European Psychiatry, 41, S262–S262. justice system involvement. The Journal of Behavioral Health Services &
https://doi.org/10.1016/j.eurpsy.2017.02.073 Research, 42(2), 172–190. https://doi.org/10.1007/s11414‐014‐9425‐8
Davis, M., Sheidow, A. J., McCart, M. R., & Perrault, R. T. (2018). Vocational coaches for justice‐involved emerging adults. Psychiatric Rehabilitation Journal, 41(4), 266–276. https://doi.org/10.1037/prj0000323

Davis 2016
Davis, M. (2016). Multisystemic therapy: Emerging adults trial. https://Clinicaltrials.Gov/Show/NCT02922335

Dawe 2001
Dawe, S., & Harnett, P. (2007). Reducing potential for child abuse among methadone‐maintained parents: Results from a randomized controlled trial. Substance Abuse Treatment, 32, 381–390. https://doi.org/10.1016/j.jsat.2006.10.003
Dawe, S., Harnett, P., & Staiger, P. (2001). Multisystemic family therapy in methadone maintained families: Preliminary results from a randomized controlled trial. Drug and Alcohol Dependence. 63(Suppl 1), 37–38.

DeKraai 2004
DeKraai, M., Hoffman, S. J., Dillion, Y. A., Handley, T. J., Baxter, B., & Tvrdik, A. (2004). The impact of Multisystemic Therapy on children within a system of care. Presentation at the 17th Annual RTC Conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/17/Session%2044/DeKraai.pdf

Dirks‐Linhorst 2004
Dirks‐Linhorst, P. A. (2004). An evaluation of a family court diversion program for delinquent youth with chronic mental health needs. Unpublished dissertation. University of Missouri: St Louis.

Dopp 2018
Dopp, A. R., Coen, A. S., Smith, A. B., Reno, J., Bernstein, D. H., Kerns, S. E. U., & Altschul, D. (2018). Economic impact of the statewide implementation of an evidence‐based treatment: Multisystemic Therapy in New Mexico. The Intersection of Implementation Science and Behavioral Health, 49(4), 551–566. https://doi.org/10.1016/j.beth.2017.12.003

Dousi 2005
Dousi, P. J. (2005). The impact of prescribed family play on families in conflict. Unpublished dissertation. Capella University.

Drew 2019
Drew, H., Holmes, L., Dunn, V., & Harrison, N. (2019). Evaluation of the Multisystemic Therapy Service in Essex: Report of the Findings. Rees Centre, University of Oxford. https://doi.org/10.13140/RG.2.2.11024.97282

Eeren 2018
Eeren, H. V., Goossens, L. M. A., Scholte, R. H. J., Busschbach, J. J. V., & van der Rijken, R. E. A. (2018). Multisystemic Therapy and Functional Family Therapy compared on their effectiveness using the propensity score method. Journal of Abnormal Child Psychology, 46(5), 1037–1050. https://doi.org/10.1007/s10802‐017‐0392‐4

Ellis 2003
Ellis, D. A., Naar‐King, S., Frey, M., Rowland, M., & Greger, N. (2003). Case Study: Feasibility of multisystemic therapy as a treatment for urban adolescents with poorly controlled type 1 diabetes. Journal of Pediatric Psychology, 28(4), 287–293. https://doi.org/10.1093/jpepsy/jsg017

Ellis 2004
Ellis, D. A., Naar‐King, S., Frey, M., Templin, T., Rowland, M., & Greger, N. (2005). Use of multisystemic therapy to improve regimen adherence among adolescents with type 1 diabetes in poor metabolic control: A pilot investigation. Journal of clinical psychology in medical settings, 11(4), 315–324. https://doi.org/10.1023/B:JOCS.0000045351.98563.4d

Ellis 2005
Ellis, D., Naar‐King, S., Templin, T., Frey, M., Cunningham, P., Sheidow, A., Cakan, N., & Idalski, A. (2008). Multisystemic therapy for adolescents with poorly controlled type 1 diabetes: reduced diabetic ketoacidosis admissions and related costs over 24 months. Diabetes Care, 31(9), 1746–1747. https://doi.org/10.2337/dc07‐2094
Ellis, D. A., Frey, M. A., Naar‐King, S., Templin, T., Cunningham, P., & Cakan, N. (2005). Use of multisystemic therapy to improve regimen adherence among adolescents with type 1 diabetes in chronic poor metabolic control: a randomized controlled trial. Diabetes Care, 28(7), 1604–1610. https://doi.org/10.2337/diacare.28.7.1604
Ellis, D. A., Frey, M. A., Naar‐King, S., Templin, T., Cunningham, P. B., & Cakan, N. (2005). The effects of multisystemic therapy on diabetes stress among adolescents with chronically poorly controlled type 1 diabetes: findings from a randomized, controlled trial. Pediatrics, 116(6), e826–e832. https://doi.org/10.1542/peds.2005‐0638
Ellis, D. A., Naar‐King, S., Templin, T., Frey, M. A., & Cunningham, P. B. (2007). Improving health outcomes among youth with poorly controlled type I diabetes: the role of treatment fidelity in a randomized clinical trial of multisystemic therapy. Journal of Family Psychology, 21(3), 363–371. https://doi.org/10.1037/0893‐3200.21.3.363
Ellis, D. A., Templin, T., Naar‐King, S., Frey, M. A., Cunningham, P. B., Podolski, C., & Cakan, N. (2007). Multisystemic therapy for adolescents with poorly controlled type I diabetes: Stability of treatment effects in a randomized controlled trial. Journal of Consulting & Clinical Psychology, 75(1), 168–174. https://doi.org/10.1037/0022‐006X.75.1.168

Ellis 2006
Ellis, D. A., Naar‐King, S., Cunningham, P. B., & Secord, E. (2006). Use of multisystemic therapy to improve antiretroviral adherence and health outcomes in HIV‐infected pediatric patients: evaluation of a pilot program. AIDS Patient Care & Stds, 20(2), 112–121. https://doi.org/10.1089/apc.2006.20.112

Ellis 2010
Ellis, D. A., Janisse, H., Naar‐King, S., Kolmodin, K., Jen, K. L., Cunningham, P. B., & Marshall, S. (2010). The effects of multisystemic therapy on family support for weight loss among obese African‐American adolescents: findings from a randomized controlled trial. Journal of Developmental Pediatrics, 31, 461–468. https://doi.org/10.1097/DBP.0b013e3181e35337

Ellis 2012
Ellis, D. A., Naar‐King, S., Chen, X., Moltz, K., Cunningham, P. B., & Idalski‐Carcone, A. (2012). Multisystemic therapy compared to telephone support for youth with poorly controlled diabetes: Findings from a randomized controlled trial. Annals of Behavioral Medicine, 44(2), 207–215. https://doi.org/10.1007/s12160‐012‐9378‐1

Fain 2014
Fain, T., Greathouse, S. M., Turner, S., & Weinberg, H. D. (2004). Is Multisystemic Therapy (MST) effective for hispanic youth? An evaluation of outcomes for juvenile offenders in Los Angeles County
(No. RB‐9791). Santa Monica, CA: RAND Corporation. Retrieved from https://www.rand.org/pubs/research_briefs/RB9791.html
Fain, T., Greathouse, S. M., Turner, S. F., & Weinberg, H. D. (2014). Effectiveness of Multisystemic Therapy for minority youth: outcomes over 8 years in Los Angeles County. OJJDP Journal of Juvenile Justice, 3(2), 24–37.

Franks 2006
Franks, R. P., & Adnopoz, J. (2006). Implementing evidence‐based practices at the state level: Challenges, successes, and lessons learned. Presented at the 19th Annual RTC conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/19/Session%2032/franks.pdf
Franks, R. P., Schroeder, J. A., & Connell, C. M. (2009). Evaluation of Multisystemic Therapy (MST) in Connecticut: Examining the adoption, implementation, and outcomes of a statewide evidence based practice initiative. Presented at the 22nd Annual RTC Conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/default.cfm?appid=22177
Franks, R. P., Schroeder, J. A., & Fixsen, D. L. (2008). Evaluating the statewide implementation and outcomes of evidence‐based practice: Results of Connecticut's MST progress report. Presented at the 21st Annual RTC Conference, Tampa, FL. Retrieved from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/21/Session%2049/franks.pdf

Gervan 2012
Gervan, S., Granic, I., Solomon, T., Blokland, K., & Ferguson, B. (2012). Paternal involvement in Multisystemic Therapy: effects on adolescent outcomes and maternal depression. Journal of Adolescence, 35(3), 743–751. https://doi.org/10.1016/j.adolescence.2011.10.009

Giles 2004
Giles, M. J. (2004). Gender differences among adolescents with conduct disorder in response to day treatment or multisystemic therapy. Unpublished dissertation. Walden University.

Grimbos 2009
Grimbos, T., & Granic, I. (2009). Changes in maternal depression are associated with MST outcomes for adolescents with co‐occurring externalizing and internalizing problems. Journal of Adolescence, 32(6), 1415–1423. https://doi.org/10.1016/j.adolescence.2009.05.004

Hebert 2014
Hebert, S., Bor, W., Swenson, C. C., & Boyle, C. (2014). Improving collaboration: A qualitative assessment of inter‐agency collaboration between a pilot Multisystemic Therapy Child Abuse and Neglect (MST‐CAN) program and a child protection team. Australasian Psychiatry, 22(4), 370–373. https://doi.org/10.1177/1039856214539572

Hefti 2020
Hefti, S., Pérez, T., Fürstenau, U., Rhiner, B., Swenson, C. C., & Schmid, M. (2020). Multisystemic Therapy for child abuse and neglect: do parents show improvement in parental mental health problems and parental stress? Journal of Marital and Family Therapy, 46(1), 95–109. https://doi.org/10.1111/jmft.12367

Henggeler 1986
Borduin, C. M., & Henggeler, S. W. (1990). A multisystemic approach to the treatment of serious delinquent behavior. In R. J. McMahon & R. D. Peters (Eds.), Behavior disorders of adolescence: Research, intervention, and policy in clinical and school settings (pp. 63–80). New York: Plenum Press.
Henggeler, S. W., Rodick, J. D., Borduin, C. M., Hanson, C. L., Watson, S. M., & Urey, J. R. (1986). Multisystemic treatment of juvenile offenders: Effects on adolescent behavior and family interactions. Developmental Psychology, 22, 132–141. https://doi.org/10.1037/0012‐1649.22.1.132

Henggeler 2002
Henggeler, S. W., Schoenwald, S. K., Liao, J. G., Letourneau, E. J., & Edwards, D. L. (2002). Transporting efficacious treatments to field settings: the link between supervisory practices and therapist fidelity in MST programs. Journal of Clinical Child & Adolescent Psychology, 31(2), 155–167. https://doi.org/10.1207/s15374424jccp3102_02
Schoenwald, S. K., Halliday‐Boykins, C. A., & Henggeler, S. W. (2003). Client‐level predictors of adherence to MST in community service settings. Family Process, 42(3), 345–359. https://doi.org/10.1111/j.1545‐5300.2003.00345.x

Holth 2012
Holth, P., Torsheim, T., Sheidow, A., Ogden, T., & Henggeler, S. (2011). Intensive quality assurance of therapist adherence to behavioral interventions for adolescent substance use problems. Journal of child & adolescent substance abuse, 20, 289–313. https://doi.org/10.1080/1067828X.2011.581974

Hurley 2004
Hurley, S. Personal communication, 13 May 2020.
Hurley, S., Goldsmith, T., Vander Weg, M. W., Sell, M., Mittleman, D., Relyea, G., & Sisson, J. (2006). Intensive In‐home Therapy as Early Intervention: Results from a Clinical Trial. In: 18th Annual Conference on A System of Care for Children's Mental Health: Expanding the Research Base, Chapter 11: Creating Integrated Service Systems, 425–428. http://rtckids.fmhi.usf.edu/rtcconference/proceedings/18thproceedings/18thChapter11.pdf
Hurley, S., Vander Weg, M., & Goldsmith, T. (2004). Staying the course: Correlates and effects of therapist adherence to the Multisystemic Therapy model. Presented at the 17th Annual RTC Conference, Tampa, FL. Retrieved 11 October 2009 from http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/17/Session%2044/Hurley.pdf
Hurley, S., Vander Weg, M. W., & Goldsmith, T. (2005). Staying the Course: Correlates and Effects of Therapist Adherence to the Multi‐Systemic Therapy Model. In: 17th Annual Conference on A System of Care for Children's Mental Health: Expanding the Research Base, Chapter Two: Evidence‐Based Practices and Processes in Systems of Care. Tampa, FL: Research and Training Center for Children's Mental Health, 2005:111–114. http://rtckids.fmhi.usf.edu/rtcconference/proceedings/17thproceedings/chapter02.pdf

Lange 2017
Lange, A. M. C., van der Rijken, R. E. A., Busschbach, J. J. V., Delsing, M. J. M. H., & Scholte, R. H. J. (2017). It's not just the Therapist: Therapist and Country‐Wide Experience Predict Therapist Adherence and Adolescent Outcome. Child & Youth Care Forum, 46(4), 455–471. https://doi.org/10.1007/s10566‐016‐9388‐4
Lange, A. M. C., van der Rijken, R. E. A., Delsing, M. J. M. H., Busschbach, J. J. V., & Scholte, R. H. J. (2019). Development of Therapist Adherence in Relation to Treatment Outcomes of Adolescents with Behavioral Problems. Journal of Clinical Child and Adolescent Psychology: The Official Journal for the Society of Clinical Child and Adolescent Psychology, American Psychological Association, Division, 48(sup1), S337–S346. https://doi.org/10.1080/15374416.2018.1477049

Lee 2013
Lee, M. Y., Greene, G. J., Fraser, J. S., Edwards, S. G., Grove, D., Solovey, A. D., & Scott, P. (2013). Common and Specific Factors Approaches to Home‐Based Treatment: I‐FAST and MST. Research on Social Work Practice, 23(4), 407–418. https://doi.org/10.1177/1049731513483181

Letourneau 2013
Letourneau, E. J., Ellis, D. A., Naar‐King, S., Chapman, J. E., Cunningham, P. B., & Fowler, S. (2013). Multisystemic therapy for poorly adherent youth with HIV: Results from a pilot randomized controlled trial. AIDS Care, 25(4), 507–514. https://doi.org/10.1080/09540121.2012.715134

Little 2004
Little, M., Kogan, J., Bullock, R., & Laan, P. V. D. (2004). ISSP: An experiment in multi‐systemic responses to persistent young offenders known to children's services. British Journal of Criminology, 44(2), 225–240. https://doi.org/10.1093/bjc/44.2.225

Lofholm 2014
Lofholm, C. A., Eichas, K., & Sundell, K. (2014). The Swedish implementation of Multisystemic Therapy for adolescents: Does treatment experience predict treatment adherence? Journal of Clinical Child & Adolescent Psychology, 43(4), 643–655. https://doi.org/10.1080/15374416.2014.883926

Mayfield 2011
Mayfield, J. (2011). Multisystemic Therapy Outcomes in an Evidence‐Based Practice Pilot. Document No. 11‐04‐3901. Olympia, WA: Washington State Institute for Public Policy. https://www.wsipp.wa.gov/ReportFile/1084/Wsipp_Multisystemic‐Therapy‐Outcomes‐in‐an‐Evidence‐Based‐Practice‐Pilot_Full‐Report.pdf

Mitchell‐Herzfeld 2008
Mitchell‐Herzfeld, S., Shady, T. A., Mayo, J., Kim, D. H., Marsh, K., Dorabawila, V., & Rees, F. (2008). Effects of Multisystemic Therapy (MST) on recidivism among juvenile delinquents in New York State. New York State Office of Children and Family Services.
Satin, R. (2000). A test of the efficacy of multisystemic therapy for reducing recidivism and decreasing the length of residential treatment. Paper presented at the First Annual International MST Conference, Savannah, GA.

Naar‐King 2009
Naar‐King, S., Ellis, D., Kolmodin, K., Cunningham, P., Jen, K. L. C., Saelens, B., & Brogan, K. (2009). A Randomized Pilot Study of Multisystemic Therapy Targeting Obesity in African‐American Adolescents. Journal of Adolescent Health, 45(4), 417–419. https://doi.org/10.1016/j.jadohealth.2009.03.022

Naar‐King 2014
Naar‐King, S., Ellis, D., King, P. S., Lam, P., Cunningham, P., Secord, E., Bruzzese, J. M., & Templin, T. (2014). Multisystemic Therapy for high‐risk African American adolescents with asthma: A randomized clinical trial. Journal of Consulting and Clinical Psychology, 82(3), 536–645. https://doi.org/10.1037/a0036092

Nelson 2009
Nelson, J. R., Hurley, K. D., Synhorst, L., Epstein, M. H., Stage, S., & Buckley, J. (2009). The Child Outcomes of a Behavior Model. Exceptional Children, 76(1), 7–30. https://doi.org/10.1177/001440290907600101

Ogden 2012
Ogden, T., Bjornebekk, G., Kjobli, J., Patras, J., Christiansen, T., Taraldsen, K., & Tollefsen, N. (2012). Measurement of implementation components ten years after a nationwide introduction of empirically supported programs‐‐a pilot study. Implementation Science, 7, 49. https://doi.org/10.1186/1748‐5908‐7‐49

Painter 2007
Painter, K. (2009). Multisystemic Therapy as community‐based treatment for youth with severe emotional disturbance. Research on Social Work Practice, 19(3), 314–324. https://doi.org/10.1177/1049731508318772
Painter, K. R. (2007). A quasi‐experimental design: Multisystemic Therapy as an alternative community‐based treatment for youth with severe emotional disturbance. Unpublished dissertation. University of Texas at Arlington.

Pendley 2002
Pendley, J. S., Kasmen, L. J., Miller, D. L., Donze, J., Swenson, C., & Reeves, G. (2002). Peer and family support in children and adolescents with Type 1 diabetes. Journal of Pediatric Psychology, 27(5), 429–438. https://doi.org/10.1093/jpepsy/27.5.429

Porter 2016
Porter, M., & Nuntavisit, L. (2016). An evaluation of Multisystemic Therapy with Australian families. Australian and New Zealand Journal of Family Therapy, 37, 443–462. https://doi.org/10.1002/anzf.1182

Randall 1999
Randall, J., Swenson, C. C., & Henggeler, S. W. (1999). Neighborhood solutions for neighborhood problems: An empirically‐based violence prevention collaboration. Health, Education, and Behavior, 26, 806–820. https://doi.org/10.1177/109019819902600605

Rosenblatt 2001
Rosenblatt, A., Deuel, L.‐L., Mak, W., Thornton, P., Baize, H., Morea, J., & Smucker, S. (2001). Evaluation of two therapeutic programs for children with serious mental health problems and their families: Home‐based multisystemic therapy (MST) and the MST continuum of care. San Francisco, CA: University of California San Francisco, Child Services Research Group.

Rovers 2019
Rovers, A., Blankestein, A., van der Rijken, R., Scholte, R., & Lange, A. (2019). Treatment Outcomes of a Shortened Secure Residential Stay Combined With Multisystemic Therapy: A Pilot Study. International Journal of Offender Therapy & Comparative Criminology, 63(15/16), 2654–2671. https://doi.org/10.1177/0306624X19856521

Rowland 2017
Rowland, M. (2019). Multisystemic therapy psychiatric adaptation program. The New York Foundling's final report to the Robin Hood Foundation 2017. [cited in: Rowland MD. A psychiatric adaptation of Multisystemic Therapy for suicidal youth. In M. Berk (Ed.), Evidence‐based treatment approaches for suicidal adolescents: Translating science into practice (pp. 191–228). American Psychiatric Association.]

Schaeffer 2013
Schaeffer, C. M., Swenson, C. C., Tuerk, E. H., & Henggeler, S. W. (2013). Comprehensive treatment for co‐occurring child maltreatment and parental substance abuse: Outcomes from a 24‐month pilot study of the MST‐Building Stronger Families Program. Child Abuse & Neglect, 37, 596–607. https://doi.org/10.1016/j.chiabu.2013.04.004
Swenson, C. C., Schaeffer, C. M., Tuerk, E. H., Henggeler, S. W., Tuten, M., Panzarella, P., Lau, C., Remele, L., Foley, T., Cannata, E., & Guillorn, A. (2009). Adapting Multisystemic Therapy for co‐occurring child maltreatment and parental substance abuse: The Building Stronger Families Project. Report on Emotional & Behavioral Disorders in Youth, 3–8. Winter.

Schoenwald 2003
Halliday‐Boykins, C. A., Schoenwald, S. K., & Letourneau, E. J. (2005). Caregiver‐therapist ethnic similarity predicts youth outcomes from an empirically‐based treatment. Journal of Consulting and Clinical Psychology, 73(5), 808–818. https://doi.org/10.1037/0022‐006X.73.5.808
Schoenwald, S. K. (2005). MST transportability project: Annual report for 2004. Unpublished manuscript.
Schoenwald, S. K., Letourneau, E. J., & Halliday‐Boykins, C. (2005). Predicting therapist adherence to a transported family‐based treatment for youth. Journal of Clinical Child and Adolescent Psychology, 34(4), 658–670. https://doi.org/10.1207/s15374424jccp3404_8
Schoenwald, S. K., Sheidow, A. J., Letourneau, E. J., & Liao, J. G. (2003). Transportability of multisystemic therapy: Evidence for multilevel influences. Mental Health Services Research, 5(4), 223–239. https://doi.org/10.1023/A:1026229102151

Sheidow 2003
Sheidow, A. Personal communication, 17 May 2020.
Sheidow, A. J. (2003). Development of outpatient MST for dually diagnosed youth. https://clinicaltrials.gov/ct2/show/study/NCT00438685. https://reporter.nih.gov/project‐details/6704951

Sheidow 2017
Sheidow, A. (2017). Multisystemic Therapy‐‐Emerging Adults (MST‐EA) for Substance Abuse. https://clinicaltrials.gov/ct2/show/NCT03035877

Smith‐Boydston 2014
Smith‐Boydston, J. M., Holtzman, R. J., & Roberts, M. C. (2014). Transportability of Multisystemic Therapy to Community Settings: Can a Program Sustain Outcomes Without MST Services Oversight? Child & Youth Care Forum, 43(5), 593–605. https://doi.org/10.1007/s10566‐014‐9255‐0

Smith Toles 2004
Smith Toles, M. D. (2004). Mental health and related behavioral problems in children and adolescents: Modified multisystemic versus traditional therapy in residential treatment. Unpublished dissertation. Walden University.

Stambaugh 2007
Stambaugh, L. F., Mustillo, S. A., Burns, B. J., Stephens, R. L., Baxter, B., Edwards, D., & DeKraai, M. (2007). Outcomes from Wraparound and Multisystemic Therapy in a center for mental health services system‐of‐care demonstration site. Journal of Emotional and Behavioral Disorders, 15(3), 143–155. https://doi.org/10.1177/10634266070150030201

Stout 2013
Stout, B., & Holleran, D. (2013). The Impact of Evidence‐Based Practices on Requests for Out‐of‐Home Placements in the Context of System Reform. Journal of Child & Family Studies, 22(3), 311–321. https://doi.org/10.1007/s10826‐012‐9580‐6

Sutphen 1993
Sutphen, R. D. (1993). The Evaluation of a Multisystemic Treatment Program for High‐Risk Juvenile Delinquents. Unpublished dissertation. University of Georgia.
Sutphen, R. D., Thyer, B. A., & Kurtz, P. D. (1995). Multisystemic treatment of high‐risk juvenile offenders. International Journal of Offender Therapy and Comparative Criminology, 39, 327–334. https://doi.org/10.1177/0306624X9503900405

Swenson 2012
Swenson, C. C. (2012). Family‐Based Treatment for Parental Substance Abuse and Child Maltreatment. https://clinicaltrials.gov/ct2/show/NCT01656837

ter Beek 2018
ter Beek, E., van der Rijken, R. E. A., Kuiper, C. H. Z., Hendriks, J., & Stams, G. J. J. M. (2018). The allocation of sexually transgressive juveniles to intensive specialized treatment: An assessment of the application of RNR principles. International Journal of Offender Therapy and Comparative Criminology, 62(5), 1179–1200. https://doi.org/10.1177/0306624X16674684

Thomas 2002
Thomas, C. R., Holzer, C. E., & Wall, J. (2002). The Island Youth Programs: Community interventions for reducing youth violence and delinquency. Adolescent Psychiatry: The Annals of the American Society for Adolescent Psychiatry, 26, 125–143.

Timmons‐Mitchell 2005
Timmons‐Mitchell, J., Hussey, D. L., Buckeye, L. A., Usaj, K., & Mitchell, C. C. (2005). The CAFAS, MST, and Safe Schools Healthy Students: Resilience in Action. 18th Annual RTC conference, http://rtckids.fmhi.usf.edu/rtcconference/handouts/pdf/18/Session%2054/Mitchell.pdf

Tolman 2008
Tolman, R. T., Mueller, C. W., Daleiden, E. L., Stumpf, R. E., & Pestle, S. L. (2008). Outcomes from multisystemic therapy in a statewide system of care. Journal of Child & Family Studies, 17(6), 894–908. https://doi.org/10.1007/s10826‐008‐9197‐y

Trupin 2011
Trupin, E. J., Kerns, S. E. U., Walker, S. C., DeRobertis, M. T., & Stewart, D. G. (2011). Family Integrated Transitions: A Promising Program for Juvenile Offenders with Co‐Occurring Disorders. Journal of child & adolescent substance abuse, 20(5), 421–436. https://doi.org/10.1080/1067828x.2011.614889

Vidal 2017
Vidal, S., Steeger, C. M., Caron, C., Lasher, L., & Connell, C. M. (2017). Placement and delinquency outcomes among system‐involved youth referred to Multisystemic Therapy: propensity score matching analysis. Administration and Policy in Mental Health and Mental Health Services Research, 44(6), 853–866. https://doi.org/10.1007/s10488‐017‐0797‐y
Westin 2014
Westin, A. M. L., Barksdale, C. L., & Stephan, S. H. (2014). The effect of waiting time on youth engagement to evidence based treatments. Community Mental Health Journal, 50(2), 221–228. https://doi.org/10.1007/s10597‐012‐9585‐z

Studies awaiting classification

Schoenwald 2004
Schoenwald, S. K. (2019). MST‐based Continuum of Care in Philadelphia. Final report submitted to the Annie E. Casey Foundation 2004. [cited in: Rowland MD. A psychiatric adaptation of Multisystemic Therapy for suicidal youth. In M. Berk (Ed.), Evidence‐based treatment approaches for suicidal adolescents: Translating science into practice (pp. 191–228). American Psychiatric Association.]

Ongoing studies

Additional references

Achenbach 1991
Achenbach, T. M. (1991). Manual for the Child Behavior Checklist and 1991 Profile. Burlington: University of Vermont, Department of Psychiatry.

Achenbach 2016
Achenbach, T. M., Ivanova, M. Y., Rescorla, L. A., Turner, L. V., & Althoff, R. R. (2016). Internalizing/Externalizing Problems: Review and Recommendations for Clinical and Research Applications. Journal of the American Academy of Child & Adolescent Psychiatry, 55(8), 647–656. https://doi.org/10.1016/j.jaac.2016.05.012

Aos 2001
Aos, S., Phipps, P., Barnoski, R., & Lieb, R. (2001). The comparative costs and benefits of programs to reduce crime (Version 4.0). Document Number 01‐05‐1201. Washington State Institute for Public Policy. http://www.wsipp.wa.gov/rptfiles/costbenefit.pdf (accessed February 2004).

Aos 2006
Aos, S., Miller, M., & Drake, E. (2006). Evidence‐Based public policy options to reduce future prison construction, criminal justice costs, and crime rates. Olympia, WA: Washington State Institute for Public Policy.

APA 2013
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th edition.). Arlington, VA: American Psychiatric Association.

Barnoski 2009
Barnoski, R. (2009). Providing Evidence‐Based Programs With Fidelity in Washington State Juvenile Courts: Cost Analysis. Report No.: 09‐12–1201. Olympia, WA: Washington State Institute for Public Policy. https://www.wsipp.wa.gov/ReportFile/1058/Wsipp_Providing‐Evidence‐Based‐Programs‐With‐Fidelity‐in‐Washington‐State‐Juvenile‐Courts‐Cost‐Analysis_Full‐Report.pdf

Blonigen 2015
Blonigen, D. M., Finney, J. W., Wilbourne, P. L., & Moos, R. H. (2015). Psychosocial treatments for substance use disorders. In P. E. Nathan & J. M. Gorman (Eds.), A guide to treatments that work (4th edition, pp. 731–761). Oxford University Press.

Borduin 1989
Borduin, C. M., Blaske, D. M., Cone, L., Mann, B. J., & Hazelrigg, M. D. (1989). Development and validation of a measure of peer relations: The Missouri Peer Relations Inventory. Unpublished manuscript. Department of Psychology, University of Missouri, Columbia, MO.

Borduin 1990a
Borduin, C. M., & Henggeler, S. W. (1990a). A multisystemic approach to the treatment of serious delinquent behavior. In R. J. McMahon & R. D. Peters (Eds.), Behavior disorders of adolescence: Research, intervention, and policy in clinical and school settings. (pp. 63–80). New York: Plenum Press.

Borduin 1995a
Borduin, C. M., Mann, B. J., Cone, L. T., Henggeler, S. W., Fucci, B. R., Blaske, D. M., & Williams, R. A. (1995a). Multisystemic treatment of serious juvenile offenders: long‐term prevention of criminality and violence. Journal of Consulting and Clinical Psychology, 63, 569–578. https://doi.org/10.1037/0022‐006X.63.4.569

Borenstein 2010
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2010). A basic introduction to fixed‐effect and random‐effects models for meta‐analysis. Research Synthesis Methods, 1(2), 97–111. https://doi.org/10.1002/jrsm.12

Boutron 2018
Boutron, I., & Ravaud, P. (2018). Misrepresentation and distortion of research in biomedical literature. Proceedings of the National Academy of Sciences, 115(11), 2613. https://doi.org/10.1073/pnas.1710755115

Bronfenbrenner 1979
Bronfenbrenner, U. (1979). The ecology of human development: Experiments by nature and design. Cambridge, MA: Harvard University Press.

Brosnan 2000
Brosnan, R., & Carr, A. (2000). Adolescent conduct problems. In A. Carr (Ed.), What works with children and adolescents? A critical review of psychological interventions with children, adolescents and their families (pp. 131–154). Florence, KY: Taylor & Francis/Routledge.

Brown 1999
Brown, T. L., Henggeler, S. W., Schoenwald, S. K., Brondino, M. J., & Pickrel, S. G. (1999). Multisystemic treatment of substance abusing and dependent juvenile delinquents: Effects on school attendance at posttreatment and 6‐month follow‐up. Children's Services: Social Policy, Research, and Practice, 2(2), 81–93. https://doi.org/10.1207/s15326918cs0202_2

Burns 2000
Burns, B. J., Schoenwald, S. K., Burchard, J. D., Faw, L., & Santos, A. B. (2000). Comprehensive community‐based interventions for youth with severe emotional disorders: multisystemic therapy and the wrap‐around process. Journal of Child and Family Studies, 9(3), 283–314. https://doi.org/10.1023/A:1026440406435

Burns 2004
Burns, B. J., Compton, S. N., Egger, H. L., & Farmer, E. M. Z. (2004). An annotated review of the evidence base for psychosocial and psychopharmacological interventions for children with attention‐deficit/hyperactivity disorder, major depressive disorder, disruptive
behavior disorders, anxiety disorders, and posttraumatic stress disorder. NIDA Commissioned Paper. http://www.drugabuse.gov/Meetings/Childhood/Commissioned/Burns_Compton_Egger_Farmer/Burns1.html (accessed February 2004).

Carter 2019
Carter, A. (2019). The consequences of adolescent delinquent behavior for adult employment outcomes. Journal of Youth and Adolescence, 48(1), 17–29. https://doi.org/10.1007/s10964‐018‐0934‐2

Chalmers 1990
Chalmers, I. (1990). Underreporting research is scientific misconduct. Journal of the American Medical Association, 263(19), 1405–1408. https://doi.org/10.1001/jama.1990.03440100121018

Chalmers 2007
Chalmers, I. Personal communication, 1 March 2007.

Chalmers 2009
Chalmers, I., & Glasziou, P. (2009). Avoidable waste in the production and reporting of research evidence. The Lancet, 374(9683), 86–89. https://doi.org/10.1016/S0140‐6736(09)60329‐9

Chan 2017
Chan, A.‐W., Pello, A., Kitchen, J., Axentiev, A., Virtanen, J. I., Liu, A., & Hemminki, E. (2017). Association of Trial Registration With Reporting of Primary Outcomes in Protocols and Publications. Journal of the American Medical Association, 318(17), 1709–1711. https://doi.org/10.1001/jama.2017.13001

Chiu 2017
Chiu, K., Grundy, Q., & Bero, L. (2017). “Spin” in published biomedical literature: A methodological systematic review. PLoS Biology, 15(9), e2002173. https://doi.org/10.1371/journal.pbio.2002173

Curtis 2004
Curtis, N. M., Ronan, K. R., & Borduin, C. M. (2004). Multisystemic Treatment: A meta‐analysis of outcome studies. Journal of Family Psychology, 18(3), 411–419. https://doi.org/10.1037/0893‐3200.18.3.411

Derogatis 1983
Derogatis, L. R. (1983). SCL‐90‐R: Manual‐II. Towson, MD: Clinical Psychometric Research.

Derogatis 1993
Derogatis, L. R. (1993). Brief symptom inventory: Administration, scoring, and procedural manual. Minneapolis: National Computer Systems.

Dwan 2010
Dwan, K., Gamble, C., Kolamunnage‐Dona, R., Mohammed, S., Powell, C., & Williamson, P. (2010). Assessing the potential for outcome reporting bias in a review: a tutorial. Trials, 11(1), 52. https://doi.org/10.1186/1745‐6215‐11‐52

Dwan 2013
Dwan, K., Gamble, C., Williamson, P. R., Kirkham, J. J., & the Reporting Bias Group (2013). Systematic review of the empirical evidence of study publication bias and outcome reporting bias ‐ an updated review. PLoS One, 8(7), e66844. https://doi.org/10.1371/journal.pone.0066844

Eisner 2009
Eisner, M. (2009). No effects in independent prevention trials: Can we reject the cynical view? Journal of Experimental Criminology, 5(2), 163–183. https://doi.org/10.1007/s11292‐009‐9071‐y

Elliott 1983
Elliott, D. S., Ageton, S. S., Huizinga, D., Knowles, B. A., & Canter, R. J. (1983). The prevalence and incidence of delinquent behavior: 1976‐1980 (National Youth Survey Project Report No. 26). Boulder, CO: Behavioral Research Institute.

Erskine 2015
Erskine, H. E., Moffitt, T. E., Copeland, W. E., Costello, E. J., Ferrari, A. J., Patton, G., Degenhardt, L., Vos, T., Whiteford, H. A., & Scott, J. G. (2015). A heavy burden on young minds: the global burden of mental and substance use disorders in children and youth. Psychological Medicine, 45(7), 1551–1561. https://doi.org/10.1017/S0033291714002888

Farrington 2003
Farrington, D. P., & Welsh, B. C. (2003). Family‐based prevention of offending: A meta‐analysis. The Australian and New Zealand Journal of Criminology, 36(2), 127–151. https://doi.org/10.1375/acri.36.2.127

Fergusson 2007
Fergusson, D. M., Boden, J. M., & Horwood, L. J. (2007). Recurrence of major depression in adolescence and early adulthood, and later mental health, educational and economic outcomes. British Journal of Psychiatry, 191(4), 335–342. https://doi.org/10.1192/bjp.bp.107.036079

Fisher & Tipton 2015
Fisher, Z., & Tipton, E. (2015). robumeta: An R‐package for robust variance estimation in meta‐analysis. https://cran.r‐project.org/web/packages/robumeta/vignettes/robumetaVignette.pdf

Fraser 1997a
Fraser, M. W. (1997a). Risk and resilience in childhood: An ecological perspective. Washington DC: NASW Press.

Fraser 1997b
Fraser, M. W., Nelson, K. E., & Rivard, J. C. (1997). Effectiveness of family preservation services. Social Work Research, 21(3), 138–153. https://doi.org/10.1093/swr/21.3.138

Gandhi 2006
Gandhi, A. G., Murphy‐Graham, E., Petrosino, A., Chrismer, S. S., & Weiss, C. H. (2006). The devil is in the details: Examining the evidence for “proven” school‐based drug abuse prevention programs. Evaluation Review, 31(1), 43–74. https://doi.org/10.1177/0193841X06287188

Goorden 2016
Goorden, M., Schawo, S. J., Bouwmans‐Frijters, C. A. M., van der Schee, E., Hendriks, V. M., & Hakkaart‐van Roijen, L. (2016). The cost‐effectiveness of family/family‐based therapy for treatment of externalizing disorders, substance use disorders and delinquency: A systematic review. BMC Psychiatry, 16, 237. https://doi.org/10.1186/s12888‐016‐0949‐8

Gorman 2003
Gorman, D. M. (2003). The best of practices, the worst of practices: The making of science‐based primary prevention programs. Psychiatric Services, 54(8), 1087–1089. https://doi.org/10.1176/appi.ps.54.8.1087

Gorman 2017
Gorman, D. M. (2017). The decline effect in evaluations of the impact of the Strengthening Families Program for Youth 10‐14 (SFP 10‐14) on adolescent substance use. Children and Youth Services Review, 81, 29–39. https://doi.org/10.1016/j.childyouth.2017.07.009

Gorman 2018
Gorman, D. M. (2018). Can We Trust Positive Findings of Intervention Research? The Role of Conflict of Interest. Prevention Science, 19(3), 295–305. https://doi.org/10.1007/s11121‐016‐0648‐1

Gurevitch 2018
Gurevitch, J., Koricheva, J., Nakagawa, S., & Stewart, G. (2018). Meta‐analysis and the science of research synthesis. Nature, 555(7695), 175–182.

Haley 1976
Haley, J. (1976). Problem solving therapy. San Francisco: Jossey‐Bass.

Hazel 2008
Hazel, N. (2008). Cross‐national comparison of youth justice. http://usir.salford.ac.uk/id/eprint/50528/1/Cross_national_final.pdf

Hedges 1980
Hedges, L. V., & Olkin, I. (1980). Vote‐counting methods in research synthesis. Psychological Bulletin, 88(2), 359–369. https://doi.org/10.1037/0033‐2909.88.2.359

Hedges 2010
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta‐regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65. https://doi.org/10.1002/jrsm.5

Henggeler 1991
Henggeler, S. W., Borduin, C. M., Melton, G. B., Mann, B. J., Smith, L. A., Hall, J. A., Cone, L., & Fucci, B. R. (1991). Effects of multisystemic therapy on drug use and abuse in serious juvenile offenders: A progress report from two outcome studies. Family Dynamics of Addiction Quarterly, 1(3), 40–51.

Henggeler 1992a
Henggeler, S. W., Melton, G. B., & Smith, L. A. (1992a). Family preservation using multisystemic therapy: An effective alternative to incarcerating serious juvenile offenders. Journal of Consulting and Clinical Psychology, 60(6), 953–961. https://doi.org/10.1037/0022‐006X.60.6.953

Henggeler 1993
Henggeler, S. W., Melton, G. B., Smith, L. A., Schoenwald, S. K., & Hanley, J. H. (1993). Family preservation using multisystemic treatment: Long‐term follow‐up to a clinical trial with serious juvenile offenders. Journal of Child and Family Studies, 2(4), 283–293. https://doi.org/10.1007/BF01321226

Henggeler 1995
Henggeler, S. W., & Borduin, C. M. (1995). Multisystemic treatment of serious juvenile offenders and their families. In I. M. Schwartz & P. AuClaire (Eds.), Home‐based services for troubled children. (pp. 113–130). Lincoln, Nebraska: University of Nebraska Press.

Henggeler 1996
Henggeler, S. W., Cunningham, P. B., Pickrel, S. G., Schoenwald, S. K., & Brondino, M. J. (1996). Multisystemic therapy: An effective violence prevention approach for serious juvenile offenders. Journal of Adolescence, 19, 47–61. https://doi.org/10.1006/jado.1996.0005

Henggeler 1998
Henggeler, S. W., Schoenwald, S. K., Borduin, C. M., Rowland, M. D., & Cunningham, P. B. (1998). Multisystemic treatment of antisocial behavior in children and adolescents. New York: Guilford Press.

Henggeler 2002a
Henggeler, S. W., Schoenwald, S. K., Rowland, M. D., & Cunningham, P. B. (2002a). Serious emotional disturbances in children and adolescents: Multisystemic therapy. New York: Guilford Press.

Henggeler 2002b
Henggeler, S. W., Schoenwald, S. K., Liao, J. G., Letourneau, E. J., & Edwards, D. L. (2002b). Transporting efficacious treatments to field settings: The link between supervisory practices and therapist fidelity in MST programs. Journal of Clinical Child and Adolescent Psychology, 31(2), 155–167. https://doi.org/10.1207/S15374424JCCP3102_02

Henggeler 2003
Henggeler, S. W., Rowland, M. D., Halliday‐Boykins, C., Sheidow, A. J., Ward, D. M., Randall, J., Pickrel, S. G., Cunningham, P. B., & Edwards, J. (2003). One‐year follow‐up of multisystemic therapy as an alternative to the hospitalization of youths in psychiatric crisis. Journal of the American Academy of Child and Adolescent Psychiatry, 42(5), 543–551. https://doi.org/10.1097/01.CHI.0000046834.09750.5F

Henggeler 2004
Henggeler, S. W. (2004). Decreasing effect sizes for effectiveness studies ‐ Implications for the transport of evidence‐based treatments: Comment on Curtis, Ronan, and Borduin (2004). Journal of Family Psychology, 18(3), 420–423. https://doi.org/10.1037/0893‐3200.18.3.420

Henggeler 2009
Henggeler, S. W., Schoenwald, S. K., Borduin, C. M., Rowland, M. D., & Cunningham, P. B. (2009). Multisystemic treatment of antisocial behavior in children and adolescents (2nd edition.). New York: Guilford Press.

Higgins 2011
Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane Handbook for Systematic Reviews of Interventions (5.1 edition). Chichester, UK: John Wiley & Sons, Ltd.

Higgins 2020
Higgins, J. P. T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M. J., & Welch, V. A. (Eds.). (2020). Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane. Available from www.training.cochrane.org/handbook

Hoagwood 2001
Hoagwood, K., Burns, B. J., Kiser, L., Ringeisen, H., & Schoenwald, S. K. (2001). Evidence‐based practice in child and adolescent mental health
services. Psychiatric Services, 52(9), 1179–1189. https://doi.org/10.1176/appi.ps.52.9.1179

Hoffmann 2014
Hoffmann, T. C., Glasziou, P. P., Boutron, I., Milne, R., Perera, R., Moher, D., Altman, D. G., Barbour, V., Macdonald, H., Johnston, M., Lamb, S. E., Dixon‐Woods, M., McCulloch, P., Wyatt, J. C., Chan, A. W., & Michie, S. (2014). Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. British Medical Journal, 348, g1687. https://doi.org/10.1136/bmj.g1687

ICMJE data sharing
International Committee of Medical Journal Editors. Data sharing. http://www.icmje.org/recommendations/browse/publishing‐and‐editorial‐issues/clinical‐trial‐registration.html#two

ICMJE trial registration
International Committee of Medical Journal Editors. Trial registration. http://www.icmje.org/recommendations/browse/publishing‐and‐editorial‐issues/clinical‐trial‐registration.html#one

Institute of Medicine 2015
Institute of Medicine. (2015). Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk. Washington, DC: The National Academies Press. https://doi.org/10.17226/18998

Ioannidis 2014
Ioannidis, J. P. A., Greenland, S., Hlatky, M. A., Khoury, M. J., Macleod, M. R., Moher, D., Schulz, K. F., & Tibshirani, R. (2014). Increasing value and reducing waste in research design, conduct, and analysis. The Lancet, 383(9912), 166–175. https://doi.org/10.1016/S0140‐6736(13)62227‐8

Jüni 1999
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta‐analysis. Journal of the American Medical Association, 282, 1054–1060. https://doi.org/10.1001/jama.282.11.1054

Jüni 2001
Jüni, P., Altman, D. G., & Egger, M. (2001). Assessing the quality of controlled clinical trials. British Medical Journal, 323(7303), 42–46. https://doi.org/10.1136/bmj.323.7303.42

Kazdin 1998
Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments. Journal of Consulting and Clinical Psychology, 66(1), 19–36. https://doi.org/10.1037/0022‐006X.66.1.19

Kazdin 2015
Kazdin, A. E. (2015). Psychosocial treatments for conduct disorder in children and adolescents. In P. E. Nathan & J. M. Gorman (Eds.), A guide to treatments that work (4th edition, pp. 141–173). Oxford University Press.

Keerie 2018
Keerie, C., Tuck, C., Milne, G., Eldridge, S., Wright, N., & Lewis, S. C. (2018). Data sharing in clinical trials – practical guidance on anonymising trial datasets. Trials, 19(1), 25. https://doi.org/10.1186/s13063‐017‐2382‐9

Lange 2016
Lange, A. M., Scholte, R. H., van Geffen, W., Timman, R., Busschbach, J. J., & van der Rijken, R. E. (2016). The lack of cross‐national equivalence of a Therapist Adherence Measure (TAM‐R) in multisystemic therapy (MST). European Journal of Psychological Assessment, 32(4), 312–325. https://doi.org/10.1027/1015‐5759/a000262

Lange 2017
Lange, A. M., van der Rijken, R. E., Delsing, M. J., Busschbach, J. J., van Horn, J. E., & Scholte, R. H. (2017). Alliance and adherence in a systemic therapy. Child and Adolescent Mental Health, 22(3), 148–154. https://doi.org/10.1111/camh.12172

Lange 2018
Lange, A. M., Delsing, M. J., Scholte, R. H., & van der Rijken, R. E. (2018). Factorial structure of the Therapist Adherence Measure‐Revised (TAM‐R) within multisystemic therapy. European Journal of Psychological Assessment, 36, 427–431. https://doi.org/10.1027/1015‐5759/a000505

Latimer 2001
Latimer, J. (2001). A meta‐analytic examination of youth delinquency, family treatment, and recidivism. Canadian Journal of Criminology and Criminal Justice, 43(2), 237–253. https://doi.org/10.3138/cjcrim.43.2.237

Leschied 2002a
Leschied, A. W., & Cunningham, A. (2002a). Seeking effective interventions for young offenders: Interim results of a four‐year randomized study of multisystemic therapy in Ontario, Canada. London, Ontario: Centre for Children and Families in the Justice System.

Lieb 2016
Lieb, K., von der Osten‐Sacken, J., Stoffers‐Winterling, J., Reiss, N., & Barth, J. (2016). Conflicts of interest and spin in reviews of psychological therapies: a systematic review. BMJ Open, 6(4), e010606. https://doi.org/10.1136/bmjopen‐2015‐010606

Lipsey 1998
Lipsey, M. W., & Wilson, D. B. (1998). Effective intervention for serious juvenile offenders. In R. Loeber & D. P. Farrington (Eds.), Serious and violent juvenile offenders: Risk factors and successful interventions. (pp. 313–345). Thousand Oaks, CA: Sage Publications.

Littell 2004
Littell, J. H., Popa, M., & Forsythe, B. (2004). Multisystemic therapy for social, emotional, and behavioral problems in youth aged 10‐17: Protocol. Campbell Systematic Reviews, 1(1), 1–15. https://doi.org/10.1002/CL2.8

Littell 2006
Littell, J. H. (2006). The case for Multisystemic Therapy: Evidence or orthodoxy? Children and Youth Services Review, 28, 458–472. https://doi.org/10.1016/j.childyouth.2005.07.002

Littell 2008
Littell, J. H. (2008). Evidence‐based or biased? The quality of published reviews of evidence‐based practices. Children and Youth Services Review, 30, 1299–1317. https://doi.org/10.1016/j.childyouth.2008.04.001

Lofholm 2013
Lofholm, C. A., Brannstrom, L., Olsson, M., & Hansson, K. (2013). Treatment‐as‐usual in effectiveness studies: What is it and does it matter. International Journal of Social Welfare, 22(1), 25–34. https://doi.org/10.1111/j.1468‐2397.2012.00870.x

Luborsky 1999
Luborsky, L., Diguer, L., Seligman, D. A., Rosenthal, R., Krause, E. D., Johnson, S., Halperin, G., Bishop, M., Berman, J. S., Schweizer, E., & Luborsky, L. (1999). The researcher's own therapy allegiances: A 'wild card' in comparisons of treatment efficacy. Clinical Psychology: Science and Practice, 6, 95–106. https://doi.org/10.1093/clipsy.6.1.95

Lundh 2020
Lundh, A., Boutron, I., Stewart, L., & Hróbjartsson, A. (2020). What to do with a clinical trial with conflicts of interest. BMJ Evidence‐Based Medicine, 25(5), 157–158. https://doi.org/10.1136/bmjebm‐2019‐111230

Lux 2016
Lux, J. L. (2016). Assessing the effectiveness of Multisystemic Therapy: A meta‐analysis. Ph.D. Dissertation, University of Cincinnati.

Markham 2016
Markham, A. C. C. (2016). Multisystemic therapy: therapist experience of programme delivery, processes and outcomes. University of Birmingham. Available from http://etheses.bham.ac.uk/6831/

Markham 2018
Markham, A. (2018). A review following systemic principles of multisystemic therapy for antisocial behavior in adolescents aged 10‐17 years. Adolescent Research Review, 3(1), 67–93. https://doi.org/10.1007/s40894‐017‐0072‐1

McHugh 2012
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.

McLeod 2004
McLeod, B. D., & Weisz, J. R. (2004). Using dissertations to examine potential bias in child and adolescent clinical trials. Journal of Consulting and Clinical Psychology, 72(2), 235–251. https://doi.org/10.1037/0022‐006X.72.2.235

Merikangas 2009
Merikangas, K. R., Nakamura, E. F., & Kessler, R. C. (2009). Epidemiology of mental disorders in children and adolescents. Dialogues in Clinical Neuroscience, 11(1), 7–20. https://doi.org/10.31887/DCNS.2009.11.1/krmerikangas

Merikangas 2010
Merikangas, K. R., He, J., Burstein, M., Swanson, S. A., Avenevoli, S., Cui, L., Benjet, C., Georgiades, K., & Swendsen, J. (2010). Lifetime prevalence of mental disorders in U.S. adolescents: results from the National Comorbidity Survey Replication‐‐Adolescent Supplement (NCS‐A). Journal of the American Academy of Child and Adolescent Psychiatry, 49(10), 980–989. https://doi.org/10.1016/j.jaac.2010.05.017

Merikangas 2011
Merikangas, K. R., He, J. P., Burstein, M., Swendsen, J., Avenevoli, S., Case, B., Georgiades, K., Heaton, L., Swanson, S., & Olfson, M. (2011). Service utilization for lifetime mental disorders in U.S. adolescents: Results of the National Comorbidity Survey‐Adolescent Supplement (NCS‐A). Journal of the American Academy of Child and Adolescent Psychiatry, 50(1), 32–45. https://doi.org/10.1016/j.jaac.2010.10.006

Mihalic 2004
Mihalic, S., Fagan, A., Irwin, K., Ballard, D., & Elliott, D. (2004). Blueprints for Violence Prevention. US Department of Justice Office of Juvenile Justice and Delinquency Prevention; 2004. Report No.: NCJ 204274. https://www.ojp.gov/pdffiles1/ojjdp/204274.pdf

Minuchin 1974
Minuchin, S. (1974). Families and family therapy. Cambridge, MA: Harvard University Press.

Moher 2009
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G., The PRISMA Group. (2009). Preferred Reporting Items for Systematic Reviews and Meta‐Analyses: The PRISMA Statement. PLoS Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed.1000097

Moher 2015
Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. A. (2015). Preferred reporting items for systematic review and meta‐analysis protocols (PRISMA‐P) 2015 statement. Systematic Reviews, 4(1), 1. https://doi.org/10.1186/2046‐4053‐4‐1

MST Services 2019
MST Services, Inc. (2019). Multisystemic Therapy Adaptations: Pilot Studies to Large‐Scale Dissemination. https://www.mstservices.com

MST Services 2020a
MST Services, Inc. (2020a). Multisystemic Therapy (MST) research at a glance: Published MST outcome, implementation and benchmarking studies. https://www.mstservices.com/mst-whitepapers

MST Services 2020b
MST Services, Inc. (2020b). Proven results. https://www.mstservices.com/proven‐results (accessed 11 April 2020).

Narusyte 2016
Narusyte, J., Ropponen, A., Alexanderson, K., & Svedberg, P. (2017). Internalizing and externalizing problems in childhood and adolescence as predictors of work incapacity in young adulthood. Social Psychiatry and Psychiatric Epidemiology, 52(9), 1159–1168. https://doi.org/10.1007/s00127‐017‐1409‐6

NICE 2013
National Institute for Health and Care Excellence. (2013). Antisocial behavior and conduct disorders in children and young people: The NICE guideline on recognition, intervention, and management. National Clinical Guideline Number 158. London: National Collaborating Centre for Mental Health and Social Care Institute for Excellence. https://www.nice.org.uk/guidance/cg158/evidence/full‐guideline‐pdf‐189848413

NICE 2018
National Institute for Health and Care Excellence. (2018). Surveillance report (exceptional review) 2018 – Antisocial behaviour and conduct disorders in children and young people: Recognition and management
(2013) NICE guideline CG158 (p. 11). 2 https://www.nice.org.uk/guidance/cg158/resources/surveillance‐report‐exceptional‐review‐2018‐antisocial‐behaviour‐and‐conduct‐disorders‐in‐children‐and‐young‐people‐recognition‐and‐management‐2013‐nice‐guideline‐cg158‐4842384733/chapter/Surveillance‐decision?tab=evidence

NIDA 1999
National Institute on Drug Abuse (NIDA). (1999). Principles of drug addiction treatment: A research‐based guide (http://www.nida.nih.gov/PODAT/PODATindex.html). Rockville, MD: National Institute on Drug Abuse.

Nunnally 1994
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd edition.). New York: McGraw‐Hill.

Olson 1982
Olson, D. H., Portner, H., & Bell, R. (1982). FACES‐II. In D. H. Olson, H. I. McCubbin, H. L. Barnes, A. Larsen, M. Muxen & M. Wilson (Eds.), Family inventories (pp. 5–24). St. Paul: University of Minnesota, Department of Family Social Science.

Olson 1985
Olson, D. H., Portner, J., & Lavee, Y. (1985). FACES‐III. St Paul: University of Minnesota, Department of Family Social Science.

Patterson 1985
Patterson, G. R., & Dishion, T. J. (1985). Contributions of family and peers to delinquency. Criminology, 23, 63–79. https://doi.org/10.1111/j.1745‐9125.1985.tb00326.x

Petrosino 2005
Petrosino, A., & Soydan, H. (2005). The impact of program developers as evaluators on criminal recidivism: Results from meta‐analyses of experimental and quasi‐experimental research. Journal of Experimental Criminology, 1(4), 435–450. https://doi.org/10.1007/s11292‐005‐3540‐8

Pigott 2019
Pigott, T. D. (2019). Missing data in meta‐analysis. In H. Cooper, L. V. Hedges & J. C. Valentine (Eds.), The handbook of research synthesis and meta‐analysis (3rd edition, pp. 367–381). New York: Russell Sage Foundation.

Pontoppidan 2012
Pontoppidan, M. (2012). The long road to start up RCTs within the social services interventions in Denmark. Clinical Trials, 9(4), 455–554. https://doi.org/10.1177/1740774512453224

Pustejovsky & Tipton 2020
Pustejovsky, J. E., & Tipton, E. (2020). Meta‐Analysis with Robust Variance Estimation: Expanding the Range of Working Models. MetaArXiv, https://doi.org/10.31222/osf.io/vyfcj

Puzzanchera 2009
Puzzanchera, C. (2009). Juvenile arrests 2007. U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention. https://files.eric.ed.gov/fulltext/ED505593.pdf

Puzzanchera 2020
Puzzanchera, C. (2020). Juvenile Arrests, 2018. U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention. https://ojjdp.ojp.gov/sites/g/files/xyckuh176/files/media/document/254499.pdf

Quay 1987
Quay, H. C., & Peterson, D. R. (1987). Manual for the Revised Problem Behavior Checklist. Coral Gables, FL: University of Miami.

Rothstein 2005
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta‐analysis: Prevention, assessment and adjustments. Chichester, UK: John Wiley & Sons, Ltd.

Rowland 2019
Rowland, M. D. (2019). A psychiatric adaptation of Multisystemic Therapy for suicidal youth. In M. Berk (Ed.), Evidence‐based treatment approaches for suicidal adolescents: Translating science into practice (pp. 191–228). American Psychiatric Association.

Sawyer 2019
Sawyer, W. (2019). Youth Confinement: The Whole Pie. https://www.prisonpolicy.org/reports/youth2019.html

Schaeffer 2000
Schaeffer, C. M. (2000). Moderators and mediators of therapeutic change in multisystemic treatment of serious juvenile offenders. Ph.D. dissertation, University of Missouri‐Columbia.

Schoenwald 2000
Schoenwald, S. K., Henggeler, S. W., Brondino, M. J., & Rowland, M. D. (2000). Multisystemic therapy: Monitoring treatment fidelity. Family Process, 39(1), 83–103. https://doi.org/10.1111/j.1545‐5300.2000.39109.x

Schoenwald 2001
Schoenwald, S. K., & Hoagwood, K. (2001). Effectiveness, transportability, and dissemination of interventions: What matters when? Psychiatric Services, 52(9), 1190–1197. https://doi.org/10.1176/appi.ps.52.9.1190

Schulz 1995
Schulz, K. F., Chalmers, I., Hayes, R. J., & Altman, D. G. (1995). Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA, 273(5), 408–412. https://doi.org/10.1001/jama.1995.03520290060030

Shadish 2002
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi‐experimental designs for general causal inference. Boston: Houghton Mifflin.

Shea 2007
Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., Andersson, N., Hamel, C., Porter, A. C., Tugwell, P., Moher, D., & Bouter, L. M. (2007). Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 7(10), 10. https://doi.org/10.1186/1471‐2288‐7‐10

Sheerin 2017
Sheerin, K. M. (2017). Multisystemic therapy with juvenile sexual offenders: A 10.2‐year follow‐up to a randomized effectiveness trial. Unpublished dissertation. University of Missouri‐‐Columbia [full text not available].

Simpson 1992
Simpson, D. D., & McBride, A. A. (1992). Family, friends, and self (FFS) assessment scales for Mexican American youth. Hispanic Journal of Behavioral Sciences, 14(3), 327–340. https://doi.org/10.1177/07399863920143003

Skinner 1983
Skinner, H. A., Steinhauer, P. D., & Santa‐Barbara, J. (1983). The Family Assessment Measure. Canadian Journal of Community Mental Health, 2, 91–105. https://doi.org/10.7870/cjcmh‐1983‐0018

Smith 1997
Smith, C. A., & Stern, S. B. (1997). Delinquency and antisocial behavior: A review of family processes and intervention research. Social Service Review, 71(3), 382–420. https://doi.org/10.1086/604263

Smith 2010
Smith, J. P., & Smith, G. C. (2010). Long‐term economic costs of psychological problems during childhood. Social Science & Medicine, 71(1), 110–115. https://doi.org/10.1016/j.socscimed.2010.02.046

Song 2009
Song, F., Parekh‐Bhurke, S., Hooper, L., Loke, Y. K., Ryder, J. J., Sutton, A. J., Hing, C. B., & Harvey, I. (2009). Extent of publication bias in different categories of research cohorts: a meta‐analysis of empirical studies. BMC Medical Research Methodology, 9(1), 79. https://doi.org/10.1186/1471‐2288‐9‐79

Song 2010
Song, F., Parekh, S., Hooper, L., Loke, Y., Ryder, J., Sutton, A., Hing, C., Kwok, C., Pang, C., & Harvey, I. (2010). Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment, 14(8), 1–220. https://doi.org/10.3310/hta14080

Stewart 2012
Stewart, L., Moher, D., & Shekelle, P. (2012). Why prospective registration of systematic reviews makes sense. Systematic Reviews, 1, 7. https://doi.org/10.1186/2046‐4053‐1‐7

Sundell 2014
Sundell, K., Ferrer‐Wreder, L., & Fraser, M. W. (2014). Going Global: A Model for Evaluating Empirically Supported Family‐Based Interventions in New Contexts. Evaluation & the Health Professions, 37(2), 203–230. https://doi.org/10.1177/0163278712469813

Swenson 2003
Swenson, C. C., & Henggeler, S. W. (2003). Multisystemic therapy (MST) for maltreated children and their families. In B. E. Saunders, L. Berliner & R. F. Hanson (Eds.), Child Physical and Sexual Abuse: Guidelines for treatment (Final report: January 15, 2003) (pp. 75–78). Charleston, SC: National Crime Victims Research and Treatment Center.

Swenson 2005
Swenson, C. C., Henggeler, S. W., Taylor, I. S., & Addison, O. W. (2005). Multisystemic therapy and neighborhood partnerships: Reducing adolescent violence and substance abuse. Guilford Press.

Tan 2017
Tan, J. X., & Fajardo, M. (2017). Efficacy of multisystemic therapy in youths aged 10‐17 with severe antisocial behaviour and emotional disorders: systematic review. London Journal of Primary Care, 9(6), 95–103. https://doi.org/10.1080/17571472.2017.1362713

Tanner‐Smith 2014
Tanner‐Smith, E. E., & Tipton, E. (2014). Robust variance estimation with dependent effect sizes: Practical considerations including a software tutorial in Stata and SPSS. Research Synthesis Methods, 5(1), 13–30. https://doi.org/10.1002/jrsm.1091

Tanner‐Smith 2016
Tanner‐Smith, E. E., Tipton, E., & Polanin, J. R. (2016). Handling Complex Meta‐analytic Data Structures Using Robust Variance Estimates: a Tutorial in R. Journal of Developmental and Life‐Course Criminology, 2(1), 85–112. https://doi.org/10.1007/s40865‐016‐0026‐5

Tsujimoto 2017
Tsujimoto, Y., Tsujimoto, H., Kataoka, Y., Kimachi, M., Shimizu, S., Ikenoue, T., Fukuma, S., Yamamoto, Y., & Fukuhara, S. (2017). Majority of systematic reviews published in high‐impact journals neglected to register the protocols: a meta‐epidemiological study. Journal of Clinical Epidemiology, 84, 54–60. https://doi.org/10.1016/j.jclinepi.2017.02.008

US DHHS 2001
United States Department of Health and Human Services, Office of the Surgeon General. (2001). Youth violence: A report of the Surgeon General. Washington, DC: US DHHS. https://pubmed.ncbi.nlm.nih.gov/20669522/

van der Pol 2019
van der Pol, T. M., van Domburgh, L., van Widenfelt, B. M., Hurlburt, M. S., Garland, A. F., & Vermeiren, R. (2019). Common elements of evidence‐based systemic treatments for adolescents with disruptive behaviour problems. Lancet Psychiatry, 6(10), 862–868. https://doi.org/10.1016/s2215‐0366(19)30089‐9

van der Stouwe 2014
van der Stouwe, T., Asscher, J. J., Stams, G. J. J. M., Deković, M., & van der Laan, P. H. (2014). The effectiveness of Multisystemic Therapy (MST): A meta‐analysis. Clinical Psychology Review, 34(6), 468–481. https://doi.org/10.1016/j.cpr.2014.06.006

Welsh 2012
Welsh, B. C., Braga, A. A., & Hollis‐Peel, M. E. (2012). Can “disciplined passion” overcome the cynical view? An empirical inquiry of evaluator influence on police crime prevention program outcomes. Journal of Experimental Criminology, 8(4), 415–431. https://doi.org/10.1007/s11292‐012‐9153‐0

WHO 2020
World Health Organization. (2020). Adolescent mental health. https://www.who.int/news‐room/fact‐sheets/detail/adolescent‐mental‐health

WHO trial registration
World Health Organization. (2021). Trial registration. https://www.who.int/clinical‐trials‐registry‐platform/network/trial‐registration (accessed 3 September 2021).

Winters 1989
Winters, K. C., & Henley, G. (1989). The Personal Experiences Inventory. Los Angeles: Western Psychological Services.

WMA 2013
World Medical Association. (2013). Declaration of Helsinki: Ethical principles for medical research involving human subjects. https://www.wma.net/policies‐post/wma‐declaration‐of‐helsinki‐ethical‐principles‐for‐medical‐research‐involving‐human‐subjects

Woolfenden 2002
Woolfenden, S. R., Williams, K., & Peat, J. K. (2002). Family and parenting interventions for conduct disorder and delinquency: A meta‐analysis of randomized controlled trials. Archives of Disease in Childhood, 86, 251–256.

Woolfenden 2004
Woolfenden, S., Williams, K., & Peat, J. (2004). Family and parenting interventions in children and adolescents with conduct disorder and delinquency aged 10‐17 (Cochrane Review). Cochrane Database of Systematic Reviews (Issue 1). https://doi.org/10.1002/14651858.CD003015. Art. No.: CD003015.

WWC attrition
What Works Clearinghouse. (2018). WWC standards brief: Attrition standard. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_brief_attrition_080715.pdf

WWC baseline
What Works Clearinghouse. (2018). WWC standards brief: Baseline equivalence. US Institute of Education Sciences. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_brief_baseline_080715.pdf

Other published versions of this review

Littell 2005a
Littell, J. H., Popa, M., & Forsythe, B. (2005a). Multisystemic Therapy for social, emotional, and behavioral problems in youth aged 10‐17. Campbell Systematic Reviews, 1(1), 1–63. https://doi.org/10.4073/csr.2005.1

Littell 2005b
Littell, J. H., Popa, M., & Forsythe, B. (2005b). Multisystemic Therapy for social, emotional, and behavioral problems in youth aged 10‐17. Cochrane Database of Systematic Reviews (Issue 4). https://doi.org/10.1002/14651858.CD004797.pub4. Art. No.: CD004797.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.

How to cite this article: Littell J. H., Pigott T. D., Nilsen K. H., Green S. J., & Montgomery O. L. K. Multisystemic Therapy® for social, emotional, and behavioural problems in youth age 10 to 17: An updated systematic review and meta‐analysis. Campbell Systematic Reviews, 2021;17:e1158. https://doi.org/10.1002/cl2.1158

APPENDICES

A. Detailed search strategies
Database searches were performed in 2010 and again in 2020, as described in the text. Some changes in search strings and syntax were made in 2020, due to changes in databases or interfaces, or to the imposition of new date restrictions to avoid duplication of early searches. All such changes are noted below.

ASSIA
2010:
(DE = "multisystemic therapy") or (KW = (multisystemic therap* or multi‐systemic therap*)) or (KW = (multisystemic treatment* or multi‐systemic treatment*))
2020: as above with limits applied (Date range 2010‐2020)

Cambridge University Press Journals Complete
“multisystemic therapy”, “multi‐systemic therapy”, "multisystemic treatment", “multi‐systemic treatment”

CINAHL
S25 S23 or S24
S24 S21 and S22
S23 S14 and S21
S22 (compar* or research* or evaluat* or outcome* or intervent* or effectiv*)
S21 S15 or S16 or S17 or S18 or S19 or S20
S20 multi systemic treatment*
S19 multi‐systemic treatment*
S18 multisystemic treatment*
S17 multi systemic therap*
S16 multi‐systemic therap*
S15 multisystemic therap*
S14 S1 or S2 or S3 or S4 or S5 or S6 or S7 or S8 or S9 or S10 or S11 or S12 or S13
S13 "cross over*"
S12 crossover*
S11 (MH "Crossover Design")
S10 (tripl* N3 mask*) or (tripl* N3 blind*)
S9 (trebl* N3 mask*) or (trebl* N3 blind*)
S8 (doubl* N3 mask*) or (doubl* N3 blind*)
S7 (singl* N3 mask*) or (singl* N3 blind*)
S6 (clinic* N3 trial*) or (control* N3 trial*)
S5 (random* N3 allocat*) or (random* N3 assign*)
S4 randomis* or randomiz*
S3 (MH "Meta Analysis")
S2 (MH "Clinical Trials + ")
S1 MH random assignment
2020: top line altered as follows
S25 S23 or S24 (Limiters ‐ Published Date: 20100101‐20201231)

EMBASE Classic + Embase
1 multisystemic therap$.mp.
2 multi‐systemic therap$.mp.
3 multisystemic treatment$.mp.

4 multi‐systemic treatment$.mp.
5 Clinical trial/
6 Randomized controlled trial/
7 Randomization/
8 Single blind procedure/
9 Double blind procedure/
10 Crossover procedure/
11 Placebo/
12 Randomi#ed.tw.
13 RCT.tw.
14 (random$ adj3 (allocat$ or assign$)).tw. (80945)
15 randomly.ab.
16 groups.ab.
17 trial.ab.
18 ((singl$ or doubl$ or trebl$ or tripl$) adj3 (blind$ or mask$)).tw.
19 Placebo$.tw.
20 Prospective study/
21 (crossover or cross‐over).tw.
22 prospective.tw.
23 or/5‐22
24 or/1‐4 (123)
25 23 and 24
26 (compar$ or research$ or evaluat$ or outcome$ or intervent$ or effectiv$).mp.
27 24 and 26
28 25 or 27
2020: added
29 (201* or 202*).dc,dp,yr.
30 28 and 29

ERIC
2010:
((multisystemic ADJ therap$) OR (multi‐systemic ADJ therap$) OR (multi ADJ systemic ADJ therap$) OR (multisystemic ADJ treatment$) OR (multi‐systemic ADJ treatment$) OR (multi ADJ systemic ADJ treatment$)) AND ((Intervention.DE.) OR (Outcomes‐of‐Treatment.DE.) OR (EXPERIMENTAL‐GROUPS.DE. OR CONTROL‐GROUPS.DE. OR PROGRAM‐EFFECTIVENESS.DE. OR COMPARATIVE‐ANALYSIS.DE. OR EVALUATION.W. DE. OR EVALUATION‐METHODS.DE.) OR (random$ OR compar$ OR research$ OR evaluat$ OR outcome$ OR intervent$ OR effectiv$))
2020:
1 ((multisystemic adj therap$) or (multi‐systemic adj therap$) or (multi adj systemic adj therap$) or (multisystemic adj treatment$) or (multi‐systemic adj treatment$) or (multi adj systemic adj treatment$)).mp. [mp=abstract, title, heading word, identifiers]
2 intervention.de.
3 Outcomes‐of‐Treatment.de.
4 EXPERIMENTAL‐GROUPS.de.
5 CONTROL‐GROUPS.de.
6 PROGRAM‐EFFECTIVENESS.de.
7 COMPARATIVE‐ANALYSIS.de.
8 EVALUATION.de.
9 EVALUATION‐METHODS.de.
10 (random$ or compar$ or research$ or evaluat$ or outcome$ or intervent$ or effectiv$).mp. [mp=abstract, title, heading word, identifiers]
11 or/2‐10
12 1 and 11

MEDLINE
1 multisystemic therap$.mp.
2 multi‐systemic therap$.mp.
3 multisystemic treatment$.mp.
4 multi‐systemic treatment$.mp.
5 or/1‐4 (97)
6 (research or evaluat$ or intervention$ or effectiv$ or outcome$).mp.
7 randomized controlled trial.pt.
8 controlled clinical trial.pt.
9 randomi#ed.ab.
10 placebo$.ab.
11 drug therapy.fs.
12 randomly.ab.
13 trial.ab.
14 groups.ab.
15 or/7‐14
16 exp animals/ not humans.sh.
17 15 not 16
18 6 or 17
19 5 and 18
2020: added
20 (201* or 202*).dp,dt,ed,ep,yr.
21 19 and 20

ProQuest Dissertations & Theses Global
diskw(multisystemic OR multi‐systemic) AND (therapy OR treatment)

PsycINFO
2010:
S25 S23 or S24
S24 S21 and S22
S23 S14 and S21
S22 (compar* or research* or evaluat* or outcome* or intervent* or effectiv*)
S21 S15 or S16 or S17 or S18 or S19 or S20
S20 multi systemic treatment*
S19 multi‐systemic treatment*
S18 multisystemic treatment*
S17 multi systemic therap*
S16 multi‐systemic therap*
S15 multisystemic therap*
S14 S1 or S2 or S3 or S4 or S5 or S6 or S7 or S8 or S9 or S10 or S11 or S12 or S13
S13 "cross over*"
S12 crossover*

S11 (MH "Crossover Design")
S10 (tripl* N3 mask*) or (tripl* N3 blind*)
S9 (trebl* N3 mask*) or (trebl* N3 blind*)
S8 (doubl* N3 mask*) or (doubl* N3 blind*)
S7 (singl* N3 mask*) or (singl* N3 blind*)
S6 (clinic* N3 trial*) or (control* N3 trial*)
S5 (random* N3 allocat*) or (random* N3 assign*)
S4 randomis* or randomiz*
S3 (MH "Meta Analysis")
S2 (MH "Clinical Trials + ")
S1 MH random assignment
2020: Search adapted to OVID format. Search filter adapted to recommended Cochrane filter, but with the previous cross‐over terms.
1 Treatment Effectiveness Evaluation/
2 exp Treatment Outcomes/
3 Placebo/
4 Followup Studies/
5 (placebo* or random* or "comparative stud*" or (clinical adj3 trial*) or (research adj3 design) or (evaluat* adj3 stud*) or (prospectiv* adj3 stud*) or ((singl* or doubl* or trebl* or tripl*) adj3 (blind* or mask*))).mp.
6 (crossover* or cross‐over*).mp.
7 1 or 2 or 3 or 4 or 5 or 6
8 multisystemic therap*.mp.
9 multi‐systemic therap*.mp.
10 multisystemic treatment*.mp.
11 multi‐systemic treatment.mp.
12 8 or 9 or 10 or 11
13 (compar* or research* or evaluat* or outcome* or intervent* or effectiv*).mp.
14 12 and 13
15 7 and 12
16 14 or 15
17 limit 16 to yr = "2010 ‐Current"

Science Direct
2010:
(multisystemic therapy AND random*)
2020:
“multisystemic therapy” AND random
“multi‐systemic therapy” AND random

Social Care Online
2010:
(freetext = "multisystemic " or freetext = "multi‐systemic "or freetext = "multi systemic ") and (freetext = "therap* " or freetext = " treatment* ")
2020:
All fields: multisystemic or "multi‐systemic "or "multi systemic"
AND All fields: therap* OR treatment*
AND Publication year: 2010‐2020

Social Sciences Citation Index (SSCI)
2010:
#1 TS = (multisystemic therap* or multisystemic treatment*)
#2 TS = (multi‐systemic therap* or multi‐systemic treatment*)
#3 TS = (multi systemic therap* or multi systemic treatment*)
#4 #2 OR #1
#5 TS = (compar* or research* or intervent* or evaluat* or effectiv* or outcome* or trial* or random* or group* or control*)
#6 #5 AND #4
2020:
# 3 #2 AND #1
Indexes=SSCI Timespan=2010‐2020
# 2 TOPIC: ((compar* or research* or intervent* or evaluat* or effectiv* or outcome* or trial* or random* or group* or control*))
Indexes=SSCI Timespan=All years
# 1 TS = (((multisystemic NEAR/1 therap*) or (multisystemic NEAR/1 treatment*))) OR TS = (((multi‐systemic NEAR/1 therap*) or (multi‐systemic NEAR/1 treatment*))) OR TS = (((multi NEAR/1 systemic NEAR/1 therap*) or (multi NEAR/1 systemic NEAR/1 treatment*)))
Indexes=SSCI Timespan=All years

Social Services Abstracts
2010:
(KW = (multisystemic therap* or multi‐systemic therap*)) or(KW = (multisystemic treatment* or multi‐systemic treatment*))
2020:
multisystemic therap* OR multi‐systemic therap* OR multisystemic treatment* OR multi‐systemic treatment*
Limits applied (Date range 2010‐2020)

Sociological Abstracts
2010:
(KW = (multisystemic therap* or multi‐systemic therap*)) or(KW = (multisystemic treatment* or multi‐systemic treatment*))
2020:
multisystemic therap* OR multi‐systemic therap* OR multisystemic treatment* OR multi‐systemic treatment*
Limits applied (Date range 2010‐2020)

Trials (formerly CENTRAL, the Cochrane Central Register of Controlled Trials)
2010:
#1 (multisystemic therap*) or (multi‐systemic therap*) or (multisystemic treatment) or (multi‐systemic treatment)
2020: Added proximity operator and All Text command, due to change in database interface.
#1 (multisystemic NEXT therap*) or (multi‐systemic NEXT therap*) or (multisystemic NEXT treatment) or (multi‐systemic NEXT treatment) in All Text

WorldCat dissertations and theses
2010: (kw: multi‐systemic OR kw: multisystemic) and (kw: treatment OR kw: therapy)
2020: (kw: multi‐systemic OR kw: multisystemic) and (kw: treatment OR kw: therapy) and yr: 2010‐2020

Websites
NCJRS 2010: (multisystemic OR multi‐systemic)
NCJRS 2020: Multisystemic, Multi‐systemic (PHRASE)
U.S. Department of Health and Human Services: “multisystemic therapy”, “multi‐systemic therapy”, “multisystemic treatment”, "multi systemic treatment"

U.S. National Institutes of Health, RePORTer database (formerly CRISP) Text Search: "multisystemic therapy" OR "multi systemic therapy" OR "multi‐systemic therapy" OR "multisystemic treatment" OR "multi systemic treatment" OR "multi‐systemic treatment" (Advanced), Search in: Projects, Publications and publication year limited between 2019 and 2020, Admin IC: All, Fiscal Year: Active Projects, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010
U.S. Centers for Disease Control: "multi‐systemic therapy", "multi systemic therapy", "multi systemic treatment", "multi‐systemic treatment", "multisystemic therapy".
U.S. Government Printing Office (gpo.gov, 13): "multisystemic", "multi‐systemic", "multi systemic".
UK Home Office: "Multisystemic", “Multi systemic”

Search engine
We conducted a Google Scholar search and examined the top 200 hits sorted by relevance in September 2010. On 31 March 2020, we updated this search, limiting the date range to 2010‐2020 and using the following search string: "(multisystemic OR multi‐systemic OR "multi systemic") AND (therapy OR treatment)". We examined the top 100 hits, sorted by relevance.

B. Screening and data extraction forms

Instructions: Steps in the screening and data extraction process

Level 1: Independent screening of documents based on titles and abstracts
• If document is not excluded, retrieve full text and go to Level 2
• Link multiple documents/reports to studies, assign study ID and report IDs

Level 2: Independent coding of study eligibility criteria based on all documents associated with the study
• Compare results, resolve differences, and determine eligibility

Level 3: Independent coding of study characteristics based on all documents associated with an eligible study
• Compare results and resolve differences

Level 4: Independent coding of data on outcome measures (4a) and outcome data (4b)
• 4a. Code data on data collection measurement (questionnaires, interviews) and sources, compare results, and resolve differences
• 4b. Enter outcome data, compare results, and resolve differences

Level 5: Independent coding of data on study‐level risk of bias
• Compare results and resolve differences

Enter relevant data into RevMan

Level 1: Initial Screening (document level; from titles and abstracts)

1.1. Is this document about MST (perhaps in addition to other topics)?
• 1 Yes
• 0 No [STOP, code as unrelated]
• 9 Can't tell [RETRIEVE FULL TEXT]

1.2. What is this? [select no more than ONE answer per document]
• 1 Study of MST for medical conditions [STOP]
• 2 Review of MST outcome studies [RETRIEVE FULL TEXT, HARVEST REFERENCES, STOP]
• 3 Descriptive, correlational, single‐group, or case study [STOP]
• 4 Theoretical or position paper, editorial, or book review [STOP]
• 5 Practice guidelines or treatment manual [STOP]
• 9 Can't tell [RETRIEVE FULL TEXT]
• 0 None of the above [ASSIGN THE DOCUMENT A STUDY ID AND REPORT ID]

Level 2: Eligibility Decision (study level; based on full text, multiple reports if available)
Study ID _____ Coder's initials _____________ Date ____________
Reports associated with this study:

Report ID | First 3 Authors | Date
1 | |
2 | |
3 | |
[ADD INFORMATION ON ADDITIONAL REPORTS AS NEEDED]

2.1. Is the primary presenting problem a non‐medical condition (e.g., a social, emotional, or behavioral problem)?
• 1 Yes
• 0 No [STOP]
• 9 Can't tell

2.2. Does this study include two or more parallel cohorts (groups that received different treatments and were assessed at the same points in time)?
• 1 Yes
• 0 No [STOP, excluded]
• 9 Can't tell

2.3. Is it a randomized experiment?
• 1 Yes
• 0 No [STOP, excluded]

• 9 Can't tell

2.4. Does this study include an eligible MST program (MST not combined with other interventions)?
• 1 Yes [include MST, MST‐CAN, MST‐JDC, MST‐PSB, MST‐psychiatric]
• 0 No [STOP; exclude MST‐CM, MST‐BSF, MST‐FIT, BlueSky, MST‐CRAFT]
• 9 Can't tell

2.5. Does it focus on youth ages 10‐17?
• 1 Yes
• 0 No [STOP]
• 9 Can't tell

Level 3: Data Extraction (study level; based on full text, multiple reports if available)
PLEASE IDENTIFY REPORT NUMBER/DATE AND PAGE NUMBERS WHERE INFORMATION WAS FOUND (IN RIGHT MARGIN OR IN NOTES ON EXCEL SHEET)

Research methods

3.1. How were comparison/control groups formed?
• 1 Random assignment
• 0 Other [STOP]

3.2. Specify RCT design
• 1 Simple/systematic (individuals/families)
• 2 Stratified/blocked (IDENTIFY STRATIFYING/BLOCKING VARIABLES)
• 3 Yoked pairs (created by timing of enrolment into the study)
• 4 Matched pairs (IDENTIFY MATCHING VARIABLES)
• 5 Cluster (group) randomized [CHECK UNIT OF ANALYSIS]
• 6 Other (SPECIFY)
• 9 Unclear

3.3. Who performed group assignment?
• 1 Research staff
• 2 Program staff
• 3 Other (SPECIFY)
• 9 Unclear

3.4. How was random assignment performed?
• 1 Computer generated
• 2 Random numbers table
• 3 Coins or dice
• 4 Envelopes
• 5 Other (DESCRIBE)
• 9 Unclear

Dates
3.5 Start and end dates of enrolment in the study

Funding
3.6 Funding sources (list all) and contract/grant/project numbers if available

Settings
3.7. How many separate treatment sites were included in the study? __________

3.8. Location of interventions
• 1 Urban
• 2 Suburban
• 3 Rural
• 4 Mixed
• 9 Unclear

3.9. Location details (city, state, country)

3.10. Primary service sector
• 1 Juvenile justice
• 2 Mental Health
• 3 Child Welfare
• 4 Multiple or other (describe)
• 9 Unclear

Sample Characteristics
3.11. Describe sample inclusion criteria
3.11a. Code primary presenting problems of youth participants
• 1 sexual offences
• 2 other criminal offences
• 3 status offences or delinquent behaviour
• 4 victim of physical abuse or neglect
• 5 conduct disorder
• 6 other psychiatric/mental disorders
• 7 other (describe)
• 9 unclear

3.12. Sample size

Number of cases | MST | Control | Total | Pg# & Notes
Referred to study | NA | NA | |
Consented | NA | NA | |
Randomly assigned | | | |
Completed treatment | | | |

3.13. Are outcome data available for program (treatment) completers only?
• 1 Yes

• 0 No
• 9 Unclear

3.14. Demographic characteristics

Characteristic | Measure | Total sample | Pg# & Notes
Gender | % Male | |
Youth ages | Mean (sd) | |
Youth ages | Min and Max | |
Race/ethnicity | % White | |
Race/ethnicity | % Black | |
Race/ethnicity | % Hispanic/Latinx | |
Race/ethnicity | % Asian/Pacific | |
Race/ethnicity | % Other | |
Socioeconomic status (parents) | % college educ | |
Socioeconomic status (parents) | % unemployed | |
Socioeconomic status (parents) | % receive public aid | |
Family composition | % single parent household | |
Family composition | Primary caregiver relation to youth | |
Family composition | Mean number of children | |
Other information | Other sample characteristics | |

3.15. Were there any significant differences between program and control groups at baseline (p‐value < .05 and/or d > 0.25)?
• 1 Yes (describe differences)
• 0 No (how do we know?)
• 9 Unclear

3.16. Were there any significant differences between MST program completers and drop‐outs (p < .05 and/or d > 0.25)?
• 1 Yes (describe differences)
• 0 No (how do we know?)
• 9 Unclear

3.17. Were there any significant differences between completers and drop‐outs in the control group(s) (p < .05 and/or d > 0.25)?
• 1 Yes (describe differences)
• 0 No (how do we know?)
• 9 Unclear

MST Service Characteristics

Item | Min | Max | Mean | SD | Pg# & Notes
3.18. Duration of services (days) | | | | |
3.19. Total hours of contact | | | | |

3.20. Other characteristics of MST services
3.20.a Code MST program type
• 1 MST original
• 2 MST‐PSB (problem sexual behaviour)
• 3 MST‐CAN (child abuse and neglect)
• 4 MST‐JDC (juvenile drug court)
• 5 MST‐psychiatric
• 6 Other (describe)
• 9 Unclear

3.21. Characteristics of MST staff (education, demographics, etc.)

3.22. Describe methods used to ensure quality of MST services (supervision, training, consultation)

3.23. Is there any information on program adherence (fidelity) to MST?
• 1 Yes (describe)
• 0 No
• 9 Unclear

3.24. Were TAM scores collected?
• 1 Yes
• 0 No
• 9 Unclear

3.25. Were TAM scores reported?
• 1 Yes
• 0 No

• 9 Unclear

3.26. Were there any implementation differences between sites? (TAM scores OR any qualitative/quantitative differences)
• 1 Yes (describe differences)
• 0 No (how do we know?)
• 9 Unclear

3.27. Any information on MST program costs?
• 1 Yes (describe costs per case and/or total costs)
• 0 No

Services provided to control cases

3.28. Type of control group
• 1 Usual services (treatment as usual, management as usual): able to access any available service other than MST
• 2 Defined alternative service: individual treatment
• 3 Defined alternative service: group treatment
• 4 No service
• 9 Unclear

3.29. Describe services provided to control group

3.30. Duration of services provided to control group (describe)

3.31. Mean hours of contact (total)

3.32. Characteristics of staff who provided services to control cases (education, demographics, etc.)

Outcome measures (study level)

3.33. Were interviews conducted?
• 1 Yes
• 0 No [SKIP to 3.38]
• 9 Unclear

3.34. Who conducted interviews?
• 1 Research staff
• 2 Program staff
• 3 Other (describe)
• 4 Unclear

3.35. Who participated in interviews (check all that apply)?
• Youth
• Parent(s)/Caregiver(s)
• Teachers
• Clinicians
• Other (SPECIFY)

3.36. When were interview data collected? Identify all data collection points, including:
• Baseline
• Immediately following treatment
• Follow‐ups (identify by number of months after random assignment)
• Other (describe)

3.37. Were interview data collected in the same manner for MST and control groups?
• 1 Yes
• 0 No (what were the differences?)
• 9 Can't tell

3.38. Were administrative records collected?
• 1 Yes
• 0 No
• 9 Can't tell

3.39. Describe types of records

3.40. Describe timing of collection of admin records

Level 4a: Outcome measures (each row represents an instrument (scale or subscale) or source used to obtain outcome data)
Instructions: If multiple reports are available, enter information from reports in chronological order. Enter outcome measures in the order in which they are described in measurement and results sections. Enter each conceptually distinct measure (e.g., distinct subscales within instruments, plus overall scales) regardless of whether data were collected (at the time of the report) or reported. Note that a single outcome measure can be completed by multiple sources, at multiple points in time (data from specific sources and multiple time‐points will be entered later).
Column headings and response options (one row per measure):

Measure: Conceptual domain code; Description; Instrument or Definition (• Total or composite • Scale • Subscale [a])
Timing of data collection (all planned): Baseline; Post‐tx; Follow‐ups (identify all)
Reliability and validity: Info from (check all): • Other samples • This sample • Unclear • None; Info provided:
Format: • Dichotomy (e.g., event) • Continuous (e.g., scale)
Direction: High score or event is • Positive • Negative • Unclear
Sources (identify all): • Youth • Parent • Teacher • Clinician • Admin data • Other • Unclear; Multiple sources reported? • Separate • Averaged • Selected (which?)
Mode of Admin: • Self (direct to paper/computer) • Interview • Other
Blind?: • Yes • No • Unclear
Pg# & notes
Conceptual domain codes:

1. Placement (jail, hospital, residential treatment, foster care)
2. Arrest/conviction
3. Substance use
4. Delinquency
5. Peer relations (include social skills)
6. Youth behaviour and symptoms
7. Parent behaviour and symptoms
8. Family functioning (include parental supervision, communication, discipline)
9. School (grades, attendance)

[a] Total/composite scores include multiple scales and scales may include subscales.
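
For readers who transfer this form to a spreadsheet or database, the nine conceptual domain codes can be held as a small enumeration so that mistyped codes are caught at data entry. The following is a minimal sketch only: the names `Domain` and `parse_domain` are hypothetical and are not part of the review's own materials.

```python
from enum import IntEnum


class Domain(IntEnum):
    """Conceptual domain codes 1-9 as listed on the Level 4a form."""
    PLACEMENT = 1            # jail, hospital, residential treatment, foster care
    ARREST_CONVICTION = 2
    SUBSTANCE_USE = 3
    DELINQUENCY = 4
    PEER_RELATIONS = 5       # includes social skills
    YOUTH_BEHAVIOUR = 6      # youth behaviour and symptoms
    PARENT_BEHAVIOUR = 7     # parent behaviour and symptoms
    FAMILY_FUNCTIONING = 8   # supervision, communication, discipline
    SCHOOL = 9               # grades, attendance


def parse_domain(code):
    """Convert a coder's entry (int or numeric string) to a Domain.

    Raises ValueError for anything outside 1-9 (hypothetical helper).
    """
    return Domain(int(code))
```

A call such as `parse_domain("3")` returns `Domain.SUBSTANCE_USE`, while an entry of 0 or 10 raises `ValueError` and can be flagged for the coder to check.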

Level 4b: Outcome data (each row is a single effect size)


Outcome # refers to the conceptual domain codes described above.
Dichotomous outcome data
Enter data only if it is provided (do not perform calculations). OR = odds ratio. Enter exact p‐value if available. If covariates (control variables) are used in
the analysis, please identify these variables under Statistics (cov).

Columns (one row per effect size):

• Outc #
• Timing: Baseline; Post tx; Follow‐up (when?); Other. Subsample? (describe)
• Source: Youth; Parent; Teacher; Clinician; Admin data; Other; Unclear
• Valid Ns: MST; Control
• n w/event: MST; Control
• % w event: MST; Control
• Statistics: OR; 95% CI (LB UB); χ2; df; p value; Other; Model; Covariates
• Pg# and notes

*Repeated as often as needed.
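
The fields recorded in the dichotomous form (valid Ns and number with the event in each group) are enough to derive an odds ratio later, at the analysis stage, even though coders are asked not to calculate anything during extraction. The following is a minimal sketch of that later step using the standard log odds ratio and its large-sample variance. The function name `log_or_from_counts` and the counts are hypothetical and are not taken from the review data.

```r
# Illustrative sketch only; function name and counts are hypothetical.
# Log odds ratio and its large-sample variance from event counts and valid Ns.
log_or_from_counts <- function(events_mst, n_mst, events_ctl, n_ctl) {
  non_mst <- n_mst - events_mst          # MST cases without the event
  non_ctl <- n_ctl - events_ctl          # control cases without the event
  log_or <- log((events_mst * non_ctl) / (non_mst * events_ctl))
  v <- 1 / events_mst + 1 / non_mst + 1 / events_ctl + 1 / non_ctl
  ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * sqrt(v))
  list(or = exp(log_or), log_or = log_or, var = v, ci = ci)
}

# Hypothetical example: 20 of 100 MST cases and 30 of 100 controls had the event
res <- log_or_from_counts(20, 100, 30, 100)
print(round(res$or, 2))   # 0.58
```

An odds ratio below 1 here means the event was less common in the MST group; whether that favours MST depends on the direction coded for the outcome.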

Continuous outcome data


If change/gain scores are provided, enter under “other data”. If covariates (control variables) are used in the analysis, please identify these
variables under Statistics (cov).

Columns (one row per effect size):

• Outc #
• Timing: Baseline; Post tx; Follow‐up (when?); Other. Subsample? (describe)
• Source: Youth; Parent; Teacher; Clinician; Admin data; Other; Unclear
• Valid Ns: MST; Control
• Means: MST; Control
• SDs: MST; Control
• Statistics: p; t; F; df; ES; Other; Model; Covariates
• Pg# and notes

*Repeated as often as needed.

Level 5: Risk of bias (study level; based on full text, multiple reports if available)
IDENTIFY REPORT # AND PAGE NUMBERS WHERE INFORMATION WAS FOUND

5.1. Adequate (random) sequence generation (selection bias): investigators describe a random component in the sequence of assignments (e.g., computer‐generated random numbers, table of random numbers, drawing lots or envelopes, coin tossing, shuffling cards, or throwing dice).

• 1 Yes = Low risk of bias
• 2 Unclear: insufficient information (e.g., random assignment is mentioned, but not described in detail)
• 3 No = High risk: investigators describe a non‐random component in the sequence of assignments (e.g., alternation or rotation, date of birth, date of admission or referral, case record number, clinical judgement, client preference, or service availability; nonrandom addition, replacement, or removal of cases)

5.2. Adequate allocation concealment (selection bias): participants and investigators could not foresee assignment, because randomisation was performed at central site remote from the trial location or investigators monitored use of assignments contained in sequentially numbered, sealed, opaque envelopes.

• 1 Yes = Low risk
• 2 Unclear: insufficient information (e.g., random assignment is mentioned, but not described in detail) or adequacy of concealment is unclear (e.g., use of coin toss, card shuffle, dice, envelopes with unspecified characteristics)
• 3 No = High risk: allocation was not adequately concealed; for example, investigators used open random number lists, transparent or unsealed envelopes, or quasi‐randomisation methods such as alternation or rotation, date of birth, date of admission or referral, case record number, or service availability.

5.3. Baseline equivalence: initial differences between groups were small or moderate (d < 0.25).

• 1 Yes = Low risk
• 2 Unclear risk: insufficient information (e.g., group‐level data on background characteristics were not provided, d cannot be computed)
• 3 No = High risk: there were baseline differences between groups with d > 0.25.

5.4. Avoidance of performance bias (confounding): no systematic differences between groups in levels of care or attention, or in exposure to factors other than the interventions of interest.

• 1 Yes = Low risk
• 2 Unclear (insufficient information)
• 3 No = High risk: one group received more attention, care, or surveillance than another; or factors likely to be related to outcomes (confounding factors) were unequally distributed between groups.

5.5. Avoidance of detection bias (blinding): assessor is unaware of group assignment when collecting outcome data.

• 1 Yes for all outcomes = Low risk
• 2 Yes for some outcomes = Unclear
• 2 Unclear (insufficient information)
• 3 No = High risk

5.6. Avoidance of attrition bias: Losses to follow up were < 25% and equally distributed (< 10% difference in response rates) across groups. Group equivalence on important baseline characteristics was retained after losses to follow‐up (d < 0.25).

• 1 Yes for all outcomes = Low risk
• 2 Yes for some outcomes = Unclear
• 2 Unclear (insufficient information)
• 3 No = High risk: loss of baseline equivalence (d > 0.25), losses to follow up > 25% overall, or losses were unequally distributed (>10% difference) across groups.

5.7. Intention‐to‐treat: data were analysed according to participants’ initial group assignment, regardless of whether assigned services were received or completed.

• 1 Yes for all outcomes = Low risk
• 2 Yes for some outcomes = Unclear
• 2 Unclear (insufficient information)
• 3 No = High risk

5.8. Standardised observation periods: follow‐up data were collected from each case at a fixed point in time after random assignment, or analyses included controls for variable observation periods.

• 1 Yes for all outcomes = Low risk
• 2 Yes for some outcomes = Unclear
• 2 Unclear (insufficient information)
• 3 No = High risk

5.9. Validated outcome measures: use of instruments with demonstrated reliability (e.g., α/κ > .7) or validity in this sample or similar samples, or use of external administrative data on events (e.g., arrests, incarceration, hospitalisation).

• 1 Yes for all outcomes = Low risk
• 2 Yes for some outcomes = Unclear
• 2 Unclear (insufficient information)
• 3 No = High risk

5.10. Free of selective reporting: the study protocol is available and all pre‐specified outcomes are reported in the pre‐specified way; all expected outcomes are reported in full and for all cases (regardless of direction and significance of results).

• 1 Yes = Low risk
• 2 Unclear (e.g., protocol is not available)
• 3 No = High risk: some outcomes are not reported or are reported incompletely (e.g., for subgroups only, or without sufficient detail for meta‐analysis).

5.11. Free of conflicts of interest: investigators state that they have no conflicts of interest. Investigators would not benefit if results favoured MST or control/comparison groups. None of the study authors, data collection staff, or data analysts were paid to develop, supervise, or provide services to the MST or to the comparison group; none of these investigators are members of consulting firms linked to MST or comparison conditions.

• 1 Yes = Low risk
• 2 Unclear (insufficient information, no conflict of interest statement)
• 3 No = High risk

C. Computation and estimation methods

Study 8 Henggeler 1999a
Composite scores: SRD, annualized crime. Researchers reported means and SDs for SRD subscales on self‐reported aggressive crimes and property crimes. Annualized convictions (means and SDs) were reported separately for aggressive crimes and property crimes. We created composite measures of self‐reported crime from SRD subscales and annualized convictions for each treatment condition. Each composite mean is the sum of the means of the two subscales for that group. The composite SD is the square root of the sum of the variances (SD²) for each of the two subscales.

Study 10 Leschied 2002
Group SDs were estimated from values of t tests, df, and exact p values (2002, pp. 62–64), using Cochrane's Finding_SDs.xls calculator.

Study 14 Timmons‐Mitchell 2006
SMDs were estimated from reported values of t tests and df (2003, pp. 21–22), using David Wilson's effect size calculator.
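
The composite rule described for Study 8 (sum the subscale means, then take the square root of the summed variances) can be sketched in a few lines of R. This is an illustration only: the function names `composite_mean_sd` and `smd_from_t` and all input values are hypothetical, and summing variances implicitly treats the subscales as uncorrelated. The second helper shows the textbook conversion of an independent‐groups t statistic to a standardised mean difference, d = t × sqrt(1/n1 + 1/n2); it is not necessarily the exact routine used inside the calculators named above.

```r
# Illustrative sketch only; names and numbers are hypothetical.

# Composite of subscales: the mean is the sum of the subscale means and the
# SD is the square root of the summed variances (assumes uncorrelated subscales).
composite_mean_sd <- function(means, sds) {
  stopifnot(length(means) == length(sds), all(sds >= 0))
  c(mean = sum(means), sd = sqrt(sum(sds^2)))
}

# Standardised mean difference from an independent-groups t statistic.
smd_from_t <- function(t, n1, n2) {
  t * sqrt(1 / n1 + 1 / n2)
}

# Hypothetical subscale summaries for one treatment condition
comp <- composite_mean_sd(means = c(2.0, 3.5), sds = c(3.0, 4.0))
print(comp)          # mean 5.5, sd 5

# Hypothetical t test: t = 2.0 with 50 cases per group
d <- smd_from_t(t = 2.0, n1 = 50, n2 = 50)
print(round(d, 2))   # 0.4
```

If the subscales are positively correlated, the true SD of the composite is larger than this rule gives, so the resulting effect sizes should be read with that assumption in mind.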

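The estimates for Study 32 (Glisson 2010), described next, rest on two small calculations: applying the baseline share of ARC counties to a group count, and converting an adjusted log‐odds value to a probability. The sketch below reproduces the 46.2% share and the estimate of 161 MST cases in ARC counties from the reported counts. The object names are hypothetical, and the final step pairs a reported probability with that estimated group size purely to show the arithmetic; it does not claim to reproduce the review's exact placement counts.

```r
# Illustrative sketch only; object names are hypothetical.

# Reported counts: 615 cases with baseline data, 284 of them in ARC counties;
# 349 cases were randomly assigned to MST.
arc_share <- 284 / 615
mst_arc_n <- round(349 * arc_share)
print(round(100 * arc_share, 1))   # 46.2
print(mst_arc_n)                   # 161

# Inverse logit: probability implied by an adjusted log-odds value b
inv_logit <- function(b) exp(b) / (1 + exp(b))

# Log-odds implied by the reported control-group probability of .341
b_control <- log(0.341 / (1 - 0.341))
print(round(inv_logit(b_control), 3))   # 0.341

# Estimated number of events = probability x estimated number of cases
# (using .129, the reported probability for ARC + MST, with the size above)
est_events <- round(0.129 * mst_arc_n)
print(est_events)                  # 21
```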
Study 32 Glisson 2010
Valid ns for subgroups. Data provided: Of 674 randomised, 349 were assigned to MST and 325 to the usual services control group. Of 615 with baseline data, there are 316 MST cases, 299 controls; 284 in ARC counties, 331 in non‐ARC counties. Other case counts are provided for MST and control groups only. Using baseline data, we applied marginal percentages for ARC/non‐ARC counties to these counts to estimate valid ns for each of the four subgroups. We assumed that attrition does not vary between ARC and non‐ARC counties. For example, ARC counties included 284/615 = 46.2% of cases at baseline. Of 349 cases randomly assigned to MST, we estimate that 161 (46.2%) were in ARC counties and 150 in non‐ARC counties.
Number of placements. Investigators used HLM to model data clustered in 14 counties. For placements at 18 months, a 2‐level HLM used the Bernoulli distribution with a logit link function. Investigators calculated the probability of placement in the control group from the exponent of the adjusted log‐odds intercept, p = exp(b)/(1 + exp(b)). Exponents of adjusted log‐odds for other groups were used to compute probabilities for other groups relative to the control group. The probability of placement is .341 for controls, .161 for MST only, .190 for ARC only, and .129 for ARC + MST.
We estimate numbers of placements in each group by applying the probability of placement to the estimated number of cases within each group.
We compared placement rates in the MST‐only group with rates for controls to estimate the impact of MST in non‐ARC counties. We compared MST + ARC to ARC‐only to estimate impact of MST in ARC counties.

Study 43 Asscher 2013
Standard deviations. We estimated missing standard deviations using reported UB and LB for SMDs and Cochrane's Finding_SMDs.xls.

Study 48 Fonagy 2018
Cumulative numbers of events (means and SDs). Means and SDs were provided for number of offences in three time periods: 0–6, 6–12 and 12–18 months post referral. We created composite measures of the number of offences that occurred in the 0–12 and 0–18 month periods for each treatment condition. Each composite mean is the sum of the means of the component time periods for that group. The composite SD is the square root of the sum of the variances (SD²) for each of those means. Similar procedures were used to combine data on total number of days in placements (at 18 and 48 months) and convictions, violent offences, and nonviolent offences over 5 years.
Total SRD scores computed from subscales. Means and SDs were provided for SRD subscales. We computed total SRD mean scores for each treatment condition by adding the subscale means. SDs for total scores are computed as described above.
Standard deviations. For some measures, group means were reported without standard deviations. We estimated missing standard deviations using reported UB and LB for SMDs and Cochrane's Finding_SMDs.xls.

D. R code: Computing effect sizes

library(metafor)
library(clubSandwich)
library(tidyverse)
# read in data
allMST <- read.csv("MST review data updated 6jan2021.csv")
# create new variable for direction of the effect size
# -1 indicates that smaller values favor MST, 1 indicates that larger values favor MST
allMST$effdir <- allMST$posdir
allMST$effdir <- recode(allMST$effdir, "0" = -1, "1" = 1)
table(allMST$Continuous)
# subset data to smd(continuous) and lor(discrete)
smddata <- subset(allMST, allMST$Continuous == 1)
lordata <- subset(allMST, allMST$Continuous == 0)
##########################################################
# Computations for computed smd
##########################################################
# make smd variable a numeric value
smddata$SMD <- as.numeric(as.character(smddata$SMD))
# create data set with SMD computed
smdcomputed <- subset(smddata, is.na(smddata$SMD) != T)
# create data set that need SMD
smdneeded <- subset(smddata, is.na(smddata$SMD) == T)
# compute the effect sizes for the smd computed cases
# start with computing variance for smd
smdcomputed$var1 <- ((smdcomputed$MSTN + smdcomputed$CntlN)/(smdcomputed$MSTN*smdcomputed$CntlN)) +
    ((smdcomputed$SMD^2)/(2*(smdcomputed$MSTN + smdcomputed$CntlN)))
# adjust smd and var for small sample bias
smdcomputed$jcorr <- 1 - (3/(4*(smdcomputed$MSTN + smdcomputed$CntlN-2)-1))
smdcomputed$smdprelim <- smdcomputed$jcorr * smdcomputed$SMD
smdcomputed$varsmd <- smdcomputed$jcorr * smdcomputed$var1
# adjust the sign of the effect size
# commented out for now - need to adjust depending on outcome
# smdcomputed$smd1 <- smdcomputed$effdir * smdcomputed$smdprelim
# create data file to combine with other dataset
smdcomputed1 <- smdcomputed %>% select(-jcorr, -var1)
##########################################################
# Compute effect sizes with escalc for remaining smds
##########################################################

# use escalc to get smd from means and sds
smdneeded <- escalc(measure = "SMD",
    m1i = smdneeded$MSTmean, sd1i = smdneeded$MSTsd, n1i = smdneeded$MSTN,
    m2i = smdneeded$Cntlmean, sd2i = smdneeded$Cntlsd, n2i = smdneeded$CntlN,
    data = smdneeded, var.names = c("smdprelim", "varsmd"), replace = F)
# change sign of smdprelim
# commented out to change sign of smd depending on outcome
# smdneeded$smd1 <- smdneeded$smdprelim * smdneeded$effdir
summary(smdneeded$smdprelim)
##########################################################
# Put smd data together
##########################################################
MSTsmd <- rbind.data.frame(smdneeded, smdcomputed1)
# write out smd data set
write.csv(MSTsmd, file = "MSTsmd.csv")
##########################################################
# Examine lor data
##########################################################
summary(lordata$MSTevent)
########################
# compute LOR
########################
# get lor for all cases
# direction for effect size will be corrected by outcome
lordata <- escalc(measure = "OR",
    ai = MSTevent, n1i = MSTN,
    ci = Cntlevent, n2i = CntlN,
    data = lordata, var.names = c("lor1", "varlor"), replace = F)
summary(lordata$lor1)
write.csv(lordata, file = "lordata.csv")

E. R code: CE models for SMDs (only)

library(metafor)
library(clubSandwich)
library(tidyverse)
library(robumeta)
# read in data created in the effect size code
allsmd <- read.csv("MSTsmd.csv")
table(allsmd$outcometype)
# create a data set for each smd outcome for the analysis
placement <- subset(allsmd, allsmd$outcometype == 1)
arrest <- subset(allsmd, allsmd$outcometype == 2)
substance <- subset(allsmd, allsmd$outcometype == 3)
delinq <- subset(allsmd, allsmd$outcometype == 4)
peerrel <- subset(allsmd, allsmd$outcometype == 5)
youthbeh <- subset(allsmd, allsmd$outcometype == 6)
parentbeh <- subset(allsmd, allsmd$outcometype == 7)
family <- subset(allsmd, allsmd$outcometype == 8)
school <- subset(allsmd, allsmd$outcometype == 9)
##########################################################
# Overall analysis for placement
##########################################################
# explore the number of effect sizes per study
by_study1 <- group_by(placement, studyid)
effcount1 <- summarize(by_study1, count1 = n())
table(effcount1$count1)
table(placement$effdir)
# create smd for analysis
placement$smd1 <- placement$smdprelim
summary(placement$smd1)
# CE model for mean effect size, correlation assumed 0.8
cemodel1 <- robu(formula = smd1 ~ 1, data = placement,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel1
sensitivity(cemodel1)
# exploring moderators
# time since referral
hist(placement$timemnths)
summary(placement$timemnths)
placement$timemean <- placement$timemnths - 39.97
# time since referral
cemodel11 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel11
# Developer-involved and US-based
# Checking to see if the moderators are confounded
table(placement$developinv)
table(placement$US)
# Conducting the moderator analysis with US-Based only
# US study
cemodel12 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,

    data = placement,
    rho = 0.8)
cemodel12
table(placement$datasource)
# recode datasource to admin data vs all other
placement$newsource <- placement$datasource
placement$newsource <- recode(placement$newsource, '2' = 0, '4' = 1, '5' = 0)
table(placement$newsource)
# source of data - reporter
cemodel14 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel14
# explore risk of bias variables
# Overall attrition and differential attrition
summary(placement$overatt)
hist(placement$overatt)
summary(placement$diffatt)
hist(placement$diffatt)
# CE MR with overall attrition
cemodel15 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel15
# CE MR with differential attrition
cemodel16 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel16
# ITT analysis
table(placement$itt)
# recode to low risk = 0 and high risk = 1
placement$newitt <- placement$itt
placement$newitt <- recode(placement$newitt, '1' = 0, '3' = 1)
table(placement$newitt)
# CE MR with ROB ITT
cemodel17 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel17
# ROB baseline equivalence
table(placement$baseline)
table(placement$studyid, placement$baseline)
# recode to low + unclear risk = 0 and high risk = 1
placement$newbase <- placement$baseline
placement$newbase <- recode(placement$newbase, '1' = 0, '2' = 0, '3' = 1)
table(placement$newbase)
# CE MR with ROB Baseline Equivalence
cemodel18 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel18
# ROB performance bias
table(placement$perfbias)
table(placement$studyid, placement$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
placement$newperf <- placement$perfbias
placement$newperf <- recode(placement$newperf, '1' = 0, '2' = 0, '3' = 1)
table(placement$newperf)
# CE MR with ROB Performance Bias
cemodel19 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel19
# ROB selective reporting
table(placement$selreport)
table(placement$studyid, placement$selreport)
# recode to low + unclear risk = 0 and high risk = 1
placement$newselreport <- placement$selreport
placement$newselreport <- recode(placement$newselreport, '1' = 0, '2' = 0, '3' = 1)
table(placement$newselreport)
# CE MR with ROB Selective Reporting
cemodel110 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel110
# ROB Sequence Generation
table(placement$seqgen)
# recode sequence generation to low vs high+unclear

placement$newseqgen <- placement$seqgen
placement$newseqgen <- recode(placement$newseqgen, '1' = 0, '2' = 1, '3' = 1)
table(placement$newseqgen)
# Sequence generation
cemodel13 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = placement,
    rho = 0.8)
cemodel13
##########################################################
# Overall analysis for arrest
##########################################################
# explore the number of effect sizes per study
by_study2 <- group_by(arrest, studyid)
effcount2 <- summarize(by_study2, count2 = n())
table(effcount2$count2)
# explore direction of effect size
table(arrest$effdir)
# all effect sizes are negative
# create effect size for analysis
arrest$smd1 <- arrest$smdprelim
summary(arrest$smd1)
# Overall analysis
# CE model
cemodel2 <- robu(formula = smd1 ~ 1, data = arrest,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel2
sensitivity(cemodel2)
# explore moderators
# first center the timemnths variable
summary(arrest$timemnths)
hist(arrest$timemnths)
arrest$timemean = arrest$timemnths - 60.96
# time since referral
cemodel21 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel21
# Developer-involved
table(arrest$developinv)
# developer involved
cemodel22 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel22
# US-based
table(arrest$US)
# US based
cemodel23 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel23
# data source
# all are admin data
table(arrest$datasource)
# explore risk of bias variables
# Overall attrition and differential attrition
summary(arrest$overatt)
hist(arrest$overatt)
summary(arrest$diffatt)
hist(arrest$diffatt)
# CE MR with overall attrition
cemodel24 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel24
# CE MR with differential attrition
cemodel25 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel25
# ITT analysis
table(arrest$itt)
# recode to low risk = 0 and high risk = 1
arrest$newitt <- arrest$itt
arrest$newitt <- recode(arrest$newitt, '1' = 0, '2' = 1, '3' = 1)
table(arrest$newitt)
# CE MR with ROB ITT
cemodel26 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,

    rho = 0.8)
cemodel26
# ROB baseline equivalence
table(arrest$baseline)
# recode to low + unclear risk = 0 and high risk = 1
arrest$newbase <- arrest$baseline
arrest$newbase <- recode(arrest$newbase, '1' = 0, '2' = 0, '3' = 1)
table(arrest$newbase)
# CE MR with ROB Baseline Equivalence
cemodel27 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel27
# ROB performance bias
table(arrest$perfbias)
table(arrest$studyid, arrest$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
arrest$newperf <- arrest$perfbias
arrest$newperf <- recode(arrest$newperf, '1' = 0, '2' = 0, '3' = 1)
table(arrest$newperf)
# CE MR with ROB Performance Bias
cemodel18 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel18
# selective reporting
table(arrest$studyid, arrest$selreport)
table(arrest$selreport)
# recode to low + unclear risk = 0 and high risk = 1
arrest$newselreport <- arrest$selreport
arrest$newselreport <- recode(arrest$newselreport, '1' = 0, '2' = 0, '3' = 1)
table(arrest$newselreport)
# CE MR with ROB selective reporting
cemodel19 <- robu(smd1 ~ newselreport,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel19
# sequence generation
table(arrest$seqgen)
table(arrest$studyid, arrest$seqgen)
# recode seqgen
arrest$newseqgen <- arrest$seqgen
arrest$newseqgen <- recode(arrest$newseqgen, '1' = 0, '2' = 1, '3' = 1)
table(arrest$newseqgen)
# new sequence generation
cemodel27 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = arrest,
    rho = 0.8)
cemodel27
##########################################################
# Overall analysis for substance abuse
##########################################################
# explore the number of effect sizes per study
by_study3 <- group_by(substance, studyid)
effcount3 <- summarize(by_study3, count3 = n())
table(effcount3$count3)
table(substance$effdir)
# all effect sizes are negative
# create effect size for analysis
substance$smd1 <- substance$smdprelim
summary(substance$smd1)
# CE model
cemodel3 <- robu(formula = smd1 ~ 1, data = substance,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel3
sensitivity(cemodel3)
# explore moderators
# time since referral
hist(substance$timemnths)
summary(substance$timemnths)
substance$timemean <- substance$timemnths - 22.24
# time since referral
cemodel31 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel31
# explore the US and developer moderators
table(substance$developinv)
table(substance$US)
# these two variables are not completely overlapping so both will be used in separate analyses
# developer involved
cemodel33 <- robu(smd1 ~ developinv,

    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel33
# US-based
cemodel32 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel32
# data source
table(substance$datasource)
substance$newsource <- substance$datasource
substance$newsource <- recode(substance$newsource, '1' = 1, '2' = 0, '5' = 0)
table(substance$newsource)
# data source
cemodel34 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel34
# explore risk of bias variables
# Overall attrition and differential attrition
summary(substance$overatt)
hist(substance$overatt)
summary(substance$diffatt)
hist(substance$diffatt)
# CE MR with overall attrition
cemodel35 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel35
# CE MR with differential attrition
cemodel36 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel36
# rob itt
table(substance$studyid, substance$itt)
table(substance$itt)
# recode to low risk = 0 and high risk = 1
substance$newitt <- substance$itt
substance$newitt <- recode(substance$newitt, '1' = 0, '3' = 1)
table(substance$newitt)
# CE MR with ROB ITT
cemodel37 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel37
# baseline equivalence
table(substance$studyid, substance$baseline)
table(substance$baseline)
# recode to low + unclear risk = 0 and high risk = 1
substance$newbase <- substance$baseline
substance$newbase <- recode(substance$newbase, '1' = 0, '2' = 0, '3' = 1)
table(substance$newbase)
# CE MR with ROB Baseline Equivalence
cemodel38 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel38
# ROB performance bias
table(substance$perfbias)
table(substance$studyid, substance$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
substance$newperf <- substance$perfbias
substance$newperf <- recode(substance$newperf, '1' = 0, '2' = 0, '3' = 1)
table(substance$newperf)
# CE MR with ROB Performance Bias
cemodel39 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel39
# ROB selective reporting
table(substance$selreport)
table(substance$studyid, substance$selreport)
# recode to low + unclear risk = 0 and high risk = 1
substance$newselreport <- substance$selreport
substance$newselreport <- recode(substance$newselreport, '1' = 0, '2' = 0, '3' = 1)
table(substance$newselreport)
# CE MR with ROB Selective Reporting
cemodel310 <- robu(smd1 ~ selreport,

    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel310
# sequence generation
table(substance$seqgen)
table(substance$studyid, substance$seqgen)
substance$newseqgen <- substance$seqgen
substance$newseqgen <- recode(substance$newseqgen, '1' = 0, '2' = 1)
table(substance$newseqgen)
# sequence generation
cemodel311 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = substance,
    rho = 0.8)
cemodel311
##########################################################
# Overall analysis for delinquency
##########################################################
# explore the number of effect sizes per study
by_study4 <- group_by(delinq, studyid)
effcount4 <- summarize(by_study4, count4 = n())
table(effcount4$count4)
table(delinq$effdir)
# all effect sizes are negative
# create effect size for analysis
delinq$smd1 <- delinq$smdprelim
summary(delinq$smd1)
# CE model
cemodel4 <- robu(formula = smd1 ~ 1, data = delinq,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel4
sensitivity(cemodel4)
# explore moderators
# time from referral
hist(delinq$timemnths)
summary(delinq$timemnths)
delinq$timemean <- delinq$timemnths - 14.44
# CE model
cemodel41 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel41
# developer involved and US-based
table(delinq$developinv)
table(delinq$US)
# developer-involved
cemodel42 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel42
# US-based
cemodel43 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel43
# data source
table(delinq$datasource)
# explore risk of bias variables
# Overall attrition and differential attrition
summary(delinq$overatt)
hist(delinq$overatt)
summary(delinq$diffatt)
hist(delinq$diffatt)
# CE MR with overall attrition
cemodel44 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel44
# CE MR with differential attrition
cemodel45 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel45
# ITT analysis
table(delinq$itt)
# recode to low risk = 0 and unclear + high risk = 1
delinq$newitt <- delinq$itt
delinq$newitt <- recode(delinq$newitt, '1' = 0, '2' = 1, '3' = 1)
table(delinq$newitt)
# CE MR with ROB ITT
cemodel46 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel46

# ROB baseline equivalence
table(delinq$baseline)
table(delinq$studyid, delinq$baseline)
# recode to low + unclear risk = 0 and high risk = 1
delinq$newbase <- delinq$baseline
delinq$newbase <- recode(delinq$newbase, '1' = 0, '2' = 0,
    '3' = 1)
table(delinq$newbase)
# CE MR with ROB Baseline Equivalence
cemodel47 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel47

# ROB performance bias
table(delinq$perfbias)
table(delinq$studyid, delinq$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
delinq$newperf <- delinq$perfbias
delinq$newperf <- recode(delinq$newperf, '1' = 0, '2' = 0, '3' = 1)
table(delinq$newperf)
# CE MR with ROB Performance Bias
cemodel48 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel48

# ROB selective reporting
table(delinq$selreport)
table(delinq$studyid, delinq$selreport)
# recode to low + unclear risk = 0 and high risk = 1
delinq$newselreport <- delinq$selreport
delinq$newselreport <- recode(delinq$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(delinq$newselreport)
# CE MR with ROB Selective Reporting
cemodel49 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel49

# ROB Sequence Generation
table(delinq$seqgen)
table(delinq$studyid, delinq$seqgen)
# recode sequence generation to low vs high+unclear
delinq$newseqgen <- delinq$seqgen
delinq$newseqgen <- recode(delinq$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
table(delinq$newseqgen)
# Sequence generation
cemodel410 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = delinq,
    rho = 0.8)
cemodel410

###############################################################
# Overall analysis for PEER RELATIONS
###############################################################
# explore the number of effect sizes per study
by_study5 <- group_by(peerrel, studyid)
effcount5 <- summarize(by_study5, count5 = n())
table(effcount5$count5)
table(peerrel$effdir)
# some ES are coded in negative direction
# Negative ES are recoded to positive direction
# create effect size for analysis
peerrel$smd1 <- peerrel$effdir * peerrel$smdprelim
summary(peerrel$smd1)
# CE model
cemodel5 <- robu(formula = smd1 ~ 1, data = peerrel,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel5
sensitivity(cemodel5)

# explore moderators
# time since referral
summary(peerrel$timemnths)
hist(peerrel$timemnths)
peerrel$timemean <- peerrel$timemnths - 17.49
# time from referral
cemodel51 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel51
# US-based and developer-involved
table(peerrel$US)
table(peerrel$developinv)
# US-based
cemodel52 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel52

# data source
table(peerrel$datasource)
peerrel$newsource <- peerrel$datasource
peerrel$newsource <- recode(peerrel$newsource, '1' = 0, '2' = 1,
    '5' = 1)
table(peerrel$newsource)
# data source
cemodel53 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel53

# explore risk of bias variables
# Overall attrition and differential attrition
summary(peerrel$overatt)
hist(peerrel$overatt)
summary(peerrel$diffatt)
hist(peerrel$diffatt)
# CE MR with overall attrition
cemodel54 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel54
# CE MR with differential attrition
cemodel55 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel55

# ITT analysis
table(peerrel$itt)
# recode to low risk = 0 and unclear and high risk = 1
peerrel$newitt <- peerrel$itt
peerrel$newitt <- recode(peerrel$newitt, '1' = 0, '2' = 1, '3' = 1)
table(peerrel$newitt)
# CE MR with ROB ITT
cemodel56 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel56

# ROB baseline equivalence
table(peerrel$baseline)
table(peerrel$studyid, peerrel$baseline)
# recode to low + unclear risk = 0 and high risk = 1
peerrel$newbase <- peerrel$baseline
peerrel$newbase <- recode(peerrel$newbase, '1' = 0, '2' = 0,
    '3' = 1)
table(peerrel$newbase)
# CE MR with ROB Baseline Equivalence
cemodel57 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel57

# ROB performance bias
table(peerrel$perfbias)
table(peerrel$studyid, peerrel$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
peerrel$newperf <- peerrel$perfbias
peerrel$newperf <- recode(peerrel$newperf, '1' = 0, '2' = 0,
    '3' = 1)
table(peerrel$newperf)
# CE MR with ROB Performance Bias
cemodel58 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel58

# ROB selective reporting
table(peerrel$selreport)
table(peerrel$studyid, peerrel$selreport)
# recode to low + unclear risk = 0 and high risk = 1
peerrel$newselreport <- peerrel$selreport
peerrel$newselreport <- recode(peerrel$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(peerrel$newselreport)
# CE MR with ROB Selective Reporting
cemodel59 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel59

# ROB Sequence Generation
table(peerrel$seqgen)
table(peerrel$studyid, peerrel$seqgen)
# recode sequence generation to low vs high+unclear
peerrel$newseqgen <- peerrel$seqgen
peerrel$newseqgen <- recode(peerrel$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
table(peerrel$newseqgen)
# Sequence generation
cemodel510 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = peerrel,
    rho = 0.8)
cemodel510

###############################################################
# Overall analysis for YOUTH BEHAVIOR SYMPTOMS
###############################################################
# explore the number of effect sizes per study
by_study6 <- group_by(youthbeh, studyid)
effcount6 <- summarize(by_study6, count6 = n())
table(effcount6$count6)
table(youthbeh$effdir)
# negative values indicate better outcomes for youths
# switching the positive values to negative for youth behavior
youthbeh$smd1 <- -1 * youthbeh$effdir * youthbeh$smdprelim
hist(youthbeh$smdprelim)
summary(youthbeh$smdprelim)
hist(youthbeh$smd1)
summary(youthbeh$smd1)
# CE model
cemodel6 <- robu(formula = smd1 ~ 1, data = youthbeh,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel6
sensitivity(cemodel6)

# exploring moderators
# time since referral
summary(youthbeh$timemnths)
hist(youthbeh$timemnths)
youthbeh$timemean <- youthbeh$timemnths - 23.55
# time from referral
cemodel61 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel61

# US-based and developer-involved
table(youthbeh$developinv)
table(youthbeh$US)
# developer-involved
cemodel62 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel62
# US-based
cemodel63 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel63

# data source
table(youthbeh$datasource)
table(youthbeh$studyid, youthbeh$datasource)
youthbeh$youths <- youthbeh$datasource
youthbeh$parents <- youthbeh$datasource
youthbeh$youths <- recode(youthbeh$youths, '1' = 1, '2' = 0,
    '3' = 0, '5' = 0)
youthbeh$parents <- recode(youthbeh$parents, '1' = 0, '2' = 1,
    '3' = 0, '5' = 0)
table(youthbeh$youths)
table(youthbeh$parents)
# data source
cemodel64 <- robu(smd1 ~ youths + parents,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel64

# explore risk of bias variables
# Overall attrition and differential attrition
summary(youthbeh$overatt)
hist(youthbeh$overatt)
summary(youthbeh$diffatt)
hist(youthbeh$diffatt)
# CE MR with overall attrition
cemodel65 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel65
# CE MR with differential attrition
cemodel66 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel66

# ITT analysis
table(youthbeh$itt)
table(youthbeh$studyid, youthbeh$itt)
# recode to low risk = 0 and unclear + high risk = 1
youthbeh$newitt <- youthbeh$itt
youthbeh$newitt <- recode(youthbeh$newitt, '1' = 0, '2' = 1,
    '3' = 1)
table(youthbeh$newitt)
# CE MR with ROB ITT
cemodel67 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel67

# ROB baseline equivalence
table(youthbeh$baseline)
table(youthbeh$studyid, youthbeh$baseline)
# recode to low + unclear risk = 0 and high risk = 1
youthbeh$newbase <- youthbeh$baseline
youthbeh$newbase <- recode(youthbeh$newbase, '1' = 0, '2' = 0,
    '3' = 1)
table(youthbeh$newbase)
# CE MR with ROB Baseline Equivalence
cemodel68 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel68

# ROB performance bias
table(youthbeh$perfbias)
table(youthbeh$studyid, youthbeh$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
youthbeh$newperf <- youthbeh$perfbias
youthbeh$newperf <- recode(youthbeh$newperf, '1' = 0, '2' = 0,
    '3' = 1)
table(youthbeh$newperf)
# CE MR with ROB Performance Bias
cemodel69 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel69

# ROB selective reporting
table(youthbeh$selreport)
table(youthbeh$studyid, youthbeh$selreport)
# recode to low + unclear risk = 0 and high risk = 1
youthbeh$newselreport <- youthbeh$selreport
youthbeh$newselreport <- recode(youthbeh$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(youthbeh$newselreport)
# CE MR with ROB Selective Reporting
cemodel610 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel610

# ROB Sequence Generation
table(youthbeh$seqgen)
table(youthbeh$studyid, youthbeh$seqgen)
# recode sequence generation to low vs high+unclear
youthbeh$newseqgen <- youthbeh$seqgen
youthbeh$newseqgen <- recode(youthbeh$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
table(youthbeh$newseqgen)
# Sequence generation
cemodel611 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = youthbeh,
    rho = 0.8)
cemodel611

###############################################################
# Overall analysis for PARENT BEHAVIOR SYMPTOMS
###############################################################
# explore the number of effect sizes per study
by_study7 <- group_by(parentbeh, studyid)
effcount7 <- summarize(by_study7, count7 = n())
table(effcount7$count7)
table(parentbeh$effdir)
# Both negative and positive directions occur for parent behavior symptoms
# as in youth behavior symptoms, will change to negative means better outcomes
parentbeh$smd1 <- -1 * parentbeh$effdir * parentbeh$smdprelim
hist(parentbeh$smdprelim)
summary(parentbeh$smdprelim)
hist(parentbeh$smd1)
summary(parentbeh$smd1)
# CE model
cemodel7 <- robu(formula = smd1 ~ 1, data = parentbeh,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel7
sensitivity(cemodel7)

# exploring moderators
# time from referral
hist(parentbeh$timemnths)
summary(parentbeh$timemnths)
parentbeh$timemean <- parentbeh$timemnths - 17.64
# CE meta-regression
# time from referral
cemodel71 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel71

# US-based and developer-involved
table(parentbeh$US)
table(parentbeh$developinv)
# US-based
cemodel73 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel73
# developer-involved
cemodel72 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel72

# data source
table(parentbeh$datasource)
parentbeh$newsource <- parentbeh$datasource
parentbeh$newsource <- recode(parentbeh$newsource, '1' = 0, '2' = 1,
    '5' = 0, '6' = 0)
table(parentbeh$newsource)
# data source
cemodel74 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel74

# explore risk of bias variables
# Overall attrition and differential attrition
summary(parentbeh$overatt)
hist(parentbeh$overatt)
summary(parentbeh$diffatt)
hist(parentbeh$diffatt)
# CE MR with overall attrition
cemodel75 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel75
# CE MR with differential attrition
cemodel76 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel76

# ITT analysis
table(parentbeh$itt)
# recode to low risk = 0 and unclear + high risk = 1
parentbeh$newitt <- parentbeh$itt
parentbeh$newitt <- recode(parentbeh$newitt, '1' = 0, '2' = 1,
    '3' = 1)
table(parentbeh$newitt)
# CE MR with ROB ITT
cemodel77 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel77

# ROB baseline equivalence
table(parentbeh$baseline)
table(parentbeh$studyid, parentbeh$baseline)
# recode to low + unclear risk = 0 and high risk = 1
parentbeh$newbase <- parentbeh$baseline
parentbeh$newbase <- recode(parentbeh$newbase, '1' = 0,
    '2' = 0, '3' = 1)
table(parentbeh$newbase)
# CE MR with ROB Baseline Equivalence
cemodel78 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel78

# ROB performance bias
table(parentbeh$perfbias)
table(parentbeh$studyid, parentbeh$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
parentbeh$newperf <- parentbeh$perfbias
parentbeh$newperf <- recode(parentbeh$newperf, '1' = 0, '2' = 0,
    '3' = 1)
table(parentbeh$newperf)
# CE MR with ROB Performance Bias
cemodel79 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel79

# ROB selective reporting
table(parentbeh$selreport)
table(parentbeh$studyid, parentbeh$selreport)
# recode to low + unclear risk = 0 and high risk = 1
parentbeh$newselreport <- parentbeh$selreport
parentbeh$newselreport <- recode(parentbeh$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(parentbeh$newselreport)
# CE MR with ROB Selective Reporting
cemodel710 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel710

# sequence generation
table(parentbeh$seqgen)
table(parentbeh$studyid, parentbeh$seqgen)
parentbeh$newseqgen <- parentbeh$seqgen
parentbeh$newseqgen <- recode(parentbeh$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
table(parentbeh$newseqgen)
# sequence generation
cemodel711 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = parentbeh,
    rho = 0.8)
cemodel711

###############################################################
# Overall analysis for FAMILY FUNCTION
###############################################################
# explore the number of effect sizes per study
by_study8 <- group_by(family, studyid)
effcount8 <- summarize(by_study8, count8 = n())
table(effcount8$count8)
table(family$effdir)
table(family$effdir, family$outcometype2)
# negative effect sizes are family function general and family conflict
# switching the direction to positive for the negative effect sizes
# positive effect indicates better outcomes
family$smd1 <- family$effdir * family$smdprelim
hist(family$smdprelim)
summary(family$smdprelim)
hist(family$smd1)
summary(family$smd1)
# CE model
cemodel8 <- robu(formula = smd1 ~ 1, data = family,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel8
sensitivity(cemodel8)

# exploring moderators
hist(family$timemnths)
summary(family$timemnths)
family$timemean <- family$timemnths - 15.92
# time since referral
cemodel81 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel81

# developer-involved and US-based
table(family$US)
table(family$developinv)
# US-based
cemodel82 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel82
# developer-involved
cemodel83 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel83

# data source
table(family$datasource)
family$newsource <- family$datasource
family$newsource <- recode(family$newsource, '1' = 0, '2' = 1,
    '5' = 0, '6' = 0)
table(family$newsource)
# data source
cemodel84 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel84

# explore risk of bias variables
# Overall attrition and differential attrition
summary(family$overatt)
hist(family$overatt)
summary(family$diffatt)
hist(family$diffatt)
# CE MR with overall attrition
cemodel85 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel85
# CE MR with differential attrition
cemodel86 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel86

# ITT analysis
table(family$itt)
table(family$studyid, family$itt)
# recode to low risk = 0 and unclear and high risk = 1
family$newitt <- family$itt
family$newitt <- recode(family$newitt, '1' = 0, '2' = 1, '3' = 1)
table(family$newitt)
# CE MR with ROB ITT
cemodel87 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel87

# ROB baseline equivalence
table(family$baseline)
table(family$studyid, family$baseline)
# recode to low + unclear risk = 0 and high risk = 1
family$newbase <- family$baseline
family$newbase <- recode(family$newbase, '1' = 0, '2' = 0, '3' = 1)
table(family$newbase)
# CE MR with ROB Baseline Equivalence
cemodel88 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel88

# ROB performance bias
table(family$perfbias)
table(family$studyid, family$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
family$newperf <- family$perfbias
family$newperf <- recode(family$newperf, '1' = 0, '2' = 0, '3' = 1)
table(family$newperf)
# CE MR with ROB Performance Bias
cemodel89 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel89

# ROB selective reporting
table(family$selreport)
table(family$studyid, family$selreport)
# recode to low + unclear risk = 0 and high risk = 1
family$newselreport <- family$selreport
family$newselreport <- recode(family$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(family$newselreport)
# CE MR with ROB Selective Reporting
cemodel810 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel810
# sequence generation
table(family$seqgen)
family$newseqgen <- family$seqgen
family$newseqgen <- recode(family$newseqgen, '1' = 0, '2' = 1, '3' = 1)
table(family$newseqgen)
# sequence generation
cemodel811 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = family,
    rho = 0.8)
cemodel811

###############################################################
# Overall analysis for school
###############################################################
# explore the number of effect sizes per study
by_study9 <- group_by(school, studyid)
effcount9 <- summarize(by_study9, count9 = n())
table(effcount9$count9)
# table of effect direction
table(school$effdir)
table(school$effdir, school$outcometype2)
# switching the direction to positive for the negative effect sizes
# positive effect indicates better outcomes
school$smd1 <- school$effdir * school$smdprelim
hist(school$smdprelim)
summary(school$smdprelim)
hist(school$smd1)
summary(school$smd1)
# CE model
cemodel9 <- robu(formula = smd1 ~ 1, data = school,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel9
sensitivity(cemodel9)

# time from referral
hist(school$timemnths)
summary(school$timemnths)
school$timemean <- school$timemnths - 22.23
# time from referral
cemodel91 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel91

# US and developer-involved
table(school$US)
table(school$studyid, school$US)
table(school$developinv)
# developer-involved
cemodel93 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel93

# data source
table(school$datasource)
school$newsource <- school$datasource
school$newsource <- recode(school$newsource, '1' = 1, '2' = 0,
    '4' = 0, '5' = 0)
table(school$newsource)
# data source
cemodel94 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel94

# explore risk of bias variables
# Overall attrition and differential attrition
summary(school$overatt)
hist(school$overatt)
summary(school$diffatt)
hist(school$diffatt)
# CE MR with overall attrition
cemodel95 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel95
# CE MR with differential attrition
cemodel96 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel96

# ITT analysis
table(school$itt)
# recode to low risk = 0 and high risk = 1
school$newitt <- school$itt
school$newitt <- recode(school$newitt, '1' = 0, '3' = 1)
table(school$newitt)
# CE MR with ROB ITT
cemodel97 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel97

# ROB baseline equivalence
table(school$baseline)
table(school$studyid, school$baseline)
# all low risk effect sizes are from a single study
# recode to low + unclear risk = 0 and high risk = 1
#school$newbase <- school$baseline
#school$newbase <- recode(school$newbase, '1' = 0, '2' = 0,
#    '3' = 1)
#table(school$newbase)
# CE MR with ROB Baseline Equivalence
#cemodel98 <- robu(smd1 ~ newbase,
#    var.eff.size = varsmd, studynum = studyid,
#    data = school,
#    rho = 0.8)
#cemodel98

# ROB performance bias
table(school$perfbias)
table(school$studyid, school$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
school$newperf <- school$perfbias
school$newperf <- recode(school$newperf, '1' = 0, '2' = 0, '3' = 1)
table(school$newperf)
# CE MR with ROB Performance Bias
cemodel99 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel99

# ROB selective reporting
table(school$selreport)
table(school$studyid, school$selreport)
# recode to low + unclear risk = 0 and high risk = 1
school$newselreport <- school$selreport
school$newselreport <- recode(school$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(school$newselreport)
# CE MR with ROB Selective Reporting
cemodel910 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel910

table(school$seqgen)
school$newseqgen <- school$seqgen
school$newseqgen <- recode(school$newseqgen, '1' = 0, '2' = 1)
table(school$newseqgen)
# sequence generation
cemodel911 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = school,
    rho = 0.8)
cemodel911

###############################################################
# save all of the individual data sets for overall analysis
###############################################################
write.csv(placement, "placement.csv")
write.csv(arrest, "arrest.csv")
write.csv(substance, "substance.csv")
write.csv(delinq, "delinquency.csv")
write.csv(peerrel, "peerel.csv")
write.csv(youthbeh, "youthbeh.csv")
write.csv(parentbeh, "parenbeh.csv")
write.csv(family, "family.csv")
write.csv(school, "school.csv")

F. R code: CE models for LORs (only)

library(metafor)
library(clubSandwich)
library(tidyverse)
library(robumeta)
# read in the lor data saved in the MST Effect Size code
alllor <- read.csv("lordata.csv")
table(alllor$outcometype)
# create an effect size for each outcome for the analysis
lorplacement <- subset(alllor, alllor$outcometype == 1)
lorarrest <- subset(alllor, alllor$outcometype == 2)
lorsubstance <- subset(alllor, alllor$outcometype == 3)
lordelinq <- subset(alllor, alllor$outcometype == 4)
lorpeerrel <- subset(alllor, alllor$outcometype == 5)
loryouthbeh <- subset(alllor, alllor$outcometype == 6)
lorparentbeh <- subset(alllor, alllor$outcometype == 7)
lorfamily <- subset(alllor, alllor$outcometype == 8)
lorschool <- subset(alllor, alllor$outcometype == 9)

###############################################################
# Overall analysis for placement
###############################################################
# explore the number of effect sizes per study
lorby_study1 <- group_by(lorplacement, studyid)
leffcount1 <- summarize(lorby_study1, count1 = n())
table(leffcount1$count1)
table(lorplacement$effdir)
# all effect sizes are negative so we keep the direction of the effect sizes
summary(lorplacement$lor1)
# CE model
cemodelL1 <- robu(formula = lor1 ~ 1, data = lorplacement,
    studynum = studyid, var.eff.size = varlor,
    rho = 0.8, small = TRUE)
cemodelL1
sensitivity(cemodelL1)
# transform back to OR
# mean
exp(cemodelL1$reg_table[2])
# CI.lower
exp(cemodelL1$reg_table[7])
# CI.upper
exp(cemodelL1$reg_table[8])

# exploring moderators
# time since referral
hist(lorplacement$timemnths)
summary(lorplacement$timemnths)
lorplacement$timemean <- lorplacement$timemnths - 19.8
# time since referral
cemodell11 <- robu(lor1 ~ timemean,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell11

# developer involved & US-based
table(lorplacement$developinv)
table(lorplacement$US)
# US-based
cemodell12 <- robu(lor1 ~ US,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell12
# transform to odds ratio
# coefficients: intercept and US
exp(cemodell12$reg_table[,2])
# CI.lower: intercept and US
exp(cemodell12$reg_table[,7])
# CI.upper: intercept and US
exp(cemodell12$reg_table[,8])

# data source
table(lorplacement$datasource)
# recode datasource to admin data vs all other
lorplacement$newsource <- lorplacement$datasource
lorplacement$newsource <- recode(lorplacement$newsource, '2' = 0,
    '4' = 1, '5' = 0)
table(lorplacement$newsource)
# data source
cemodell13 <- robu(lor1 ~ newsource,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell13
# transform to odds ratio
# coefficients: intercept and newsource
exp(cemodell13$reg_table[,2])
# CI.lower: intercept and newsource
exp(cemodell13$reg_table[,7])
# CI.upper: intercept and newsource
exp(cemodell13$reg_table[,8])

# explore risk of bias variables
# Overall attrition and differential attrition
summary(lorplacement$overatt)
hist(lorplacement$overatt)
summary(lorplacement$diffatt)
hist(lorplacement$diffatt)
# CE MR with overall attrition
cemodell14 <- robu(lor1 ~ overatt,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell14
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell14$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell14$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell14$reg_table[,8])
# CE MR with differential attrition
cemodell15 <- robu(lor1 ~ diffatt,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell15
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell15$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell15$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell15$reg_table[,8])
# ITT analysis
table(lorplacement$itt)
table(lorplacement$studyid, lorplacement$itt)
# recode to low risk = 0 and unclear + high risk = 1
lorplacement$newitt <- lorplacement$itt
lorplacement$newitt <- recode(lorplacement$newitt, '1' = 0,
    '2' = 1, '3' = 1)
table(lorplacement$newitt)
# CE MR with ROB ITT
cemodell17 <- robu(lor1 ~ newitt,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell17
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell17$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell17$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell17$reg_table[,8])
# ROB baseline equivalence
table(lorplacement$baseline)
table(lorplacement$studyid, lorplacement$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorplacement$newbase <- lorplacement$baseline
lorplacement$newbase <- recode(lorplacement$newbase, '1' = 0,
    '2' = 0, '3' = 1)
table(lorplacement$newbase)
# CE MR with ROB Baseline Equivalence
cemodell18 <- robu(lor1 ~ newbase,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell18
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell18$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell18$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell18$reg_table[,8])
# ROB performance bias
table(lorplacement$perfbias)
table(lorplacement$studyid, lorplacement$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorplacement$newperf <- lorplacement$perfbias
lorplacement$newperf <- recode(lorplacement$newperf, '1' = 0,
    '2' = 0, '3' = 1)
table(lorplacement$newperf)
# CE MR with ROB Performance Bias
cemodell19 <- robu(lor1 ~ newperf,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell19
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell19$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell19$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell19$reg_table[,8])
# ROB selective reporting
table(lorplacement$selreport)
table(lorplacement$studyid, lorplacement$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorplacement$newselreport <- lorplacement$selreport
lorplacement$newselreport <- recode(lorplacement$newselreport,
    '1' = 0, '2' = 0, '3' = 1)
table(lorplacement$newselreport)
# CE MR with ROB Selective Reporting
cemodell110 <- robu(lor1 ~ selreport,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell110
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell110$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell110$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell110$reg_table[,8])
# sequence generation
table(lorplacement$seqgen)
# recode seqgen to 1 or 2
lorplacement$newseqgen <- lorplacement$seqgen
lorplacement$newseqgen <- recode(lorplacement$newseqgen,
    '1' = 0, '2' = 1, '3' = 1)
table(lorplacement$newseqgen)
# sequence generation

cemodell111 <- robu(lor1 ~ newseqgen,
    var.eff.size = varlor, studynum = studyid,
    data = lorplacement,
    rho = 0.8)
cemodell111
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell111$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell111$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell111$reg_table[,8])
############################################################
# Overall analysis for arrests
############################################################
# explore the number of effect sizes per study
lorby_study2 <- group_by(lorarrest, studyid)
lorby_study2 <- summarize(lorby_study2, count2 = n())
table(lorby_study2$count2)
# explore direction of effect sizes
table(lorarrest$effdir)
# all are negative and we keep the direction as lower arrests are a
# positive outcome
summary(lorarrest$lor1)
# CE model
cemodell2 <- robu(formula = lor1 ~ 1, data = lorarrest,
    studynum = studyid, var.eff.size = varlor,
    rho = 0.8, small = TRUE)
cemodell2
sensitivity(cemodell2)
# transform to odds ratio
# mean
exp(cemodell2$reg_table[2])
# CI.lower
exp(cemodell2$reg_table[7])
# CI.upper
exp(cemodell2$reg_table[8])
# exploring moderators
# time since referral
hist(lorarrest$timemnths)
summary(lorarrest$timemnths)
lorarrest$timemean <- lorarrest$timemnths - 55.23
# time since referral
cemodell21 <- robu(lor1 ~ timemean,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell21
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell21$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell21$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell21$reg_table[,8])
# developer involved & US-based
table(lorarrest$developinv)
table(lorarrest$US)
# developer-involved
cemodell22 <- robu(lor1 ~ developinv,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell22
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell22$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell22$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell22$reg_table[,8])
# US-based
cemodell23 <- robu(lor1 ~ US,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell23
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell23$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell23$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell23$reg_table[,8])
# data source
table(lorarrest$datasource)
# explore risk of bias variables
# Overall attrition and differential attrition
summary(lorarrest$overatt)
hist(lorarrest$overatt)
summary(lorarrest$diffatt)
hist(lorarrest$diffatt)
# CE MR with overall attrition

cemodell24 <- robu(lor1 ~ overatt,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell24
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell24$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell24$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell24$reg_table[,8])
# CE MR with differential attrition
cemodell25 <- robu(lor1 ~ diffatt,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell25
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell25$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell25$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell25$reg_table[,8])
# ITT analysis
table(lorarrest$itt)
table(lorarrest$studyid, lorarrest$itt)
# recode to low risk = 0 and high risk = 1
lorarrest$newitt <- lorarrest$itt
lorarrest$newitt <- recode(lorarrest$newitt, '1' = 0, '2' = 1,
    '3' = 1)
table(lorarrest$newitt)
# CE MR with ROB ITT
cemodell26 <- robu(lor1 ~ newitt,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell26
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell26$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell26$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell26$reg_table[,8])
# ROB baseline equivalence
table(lorarrest$baseline)
table(lorarrest$studyid, lorarrest$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorarrest$newbase <- lorarrest$baseline
lorarrest$newbase <- recode(lorarrest$newbase, '1' = 0, '2' = 0,
    '3' = 1)
table(lorarrest$newbase)
# CE MR with ROB Baseline Equivalence
cemodell27 <- robu(lor1 ~ newbase,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell27
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell27$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell27$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell27$reg_table[,8])
# ROB performance bias
table(lorarrest$perfbias)
table(lorarrest$studyid, lorarrest$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorarrest$newperf <- lorarrest$perfbias
lorarrest$newperf <- recode(lorarrest$newperf, '1' = 0, '2' = 0,
    '3' = 1)
table(lorarrest$newperf)
# CE MR with ROB Performance Bias
cemodell28 <- robu(lor1 ~ newperf,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell28
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell28$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell28$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell28$reg_table[,8])
# ROB selective reporting
table(lorarrest$selreport)
table(lorarrest$studyid, lorarrest$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorarrest$newselreport <- lorarrest$selreport
lorarrest$newselreport <- recode(lorarrest$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
table(lorarrest$newselreport)
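The risk-of-bias recodes in this program collapse a three-level rating into a binary moderator in one of two ways. The sketch below illustrates both schemes in base R. It assumes the ratings are coded 1 = low, 2 = unclear, 3 = high, which is what the recode comments suggest; the vector `rob` and its values are hypothetical and are not data from the review.

```r
# Hypothetical ratings (assumed coding: 1 = low, 2 = unclear, 3 = high)
rob <- c(1, 2, 3, 3, 1)

# Scheme 1: low + unclear risk = 0, high risk = 1
# (used for baseline equivalence, performance bias, selective reporting)
high_only <- ifelse(rob == 3, 1, 0)

# Scheme 2: low risk = 0, unclear + high risk = 1
# (used for ITT and sequence generation)
not_low <- ifelse(rob == 1, 0, 1)

# Cross-tabulate original and recoded values to check the mapping
table(rob, high_only)
table(rob, not_low)
```

Cross-tabulating the original against the recoded variable, as in the last two lines, is a quick way to confirm that every level landed in the intended category before it enters a meta-regression.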

# CE MR with ROB Selective Reporting
cemodell29 <- robu(lor1 ~ newselreport,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell29
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell29$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell29$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell29$reg_table[,8])
# ROB Sequence Generation
table(lorarrest$seqgen)
# recode sequence generation to low vs high+unclear
lorarrest$newseqgen <- lorarrest$seqgen
lorarrest$newseqgen <- recode(lorarrest$newseqgen, '1' = 0, '2' = 1)
table(lorarrest$newseqgen)
# Sequence generation
cemodell210 <- robu(lor1 ~ newseqgen,
    var.eff.size = varlor, studynum = studyid,
    data = lorarrest,
    rho = 0.8)
cemodell210
# transform to odds ratio
# coefficients: intercept and coeff
exp(cemodell210$reg_table[,2])
# CI.lower: intercept and coeff
exp(cemodell210$reg_table[,7])
# CI.upper: intercept and coeff
exp(cemodell210$reg_table[,8])
############################################################
# Overall analysis for substance abuse
############################################################
# explore the number of effect sizes per study
lorby_study3 <- group_by(lorsubstance, studyid)
leffcount3 <- summarize(lorby_study3, count2 = n())
table(leffcount3$count2)
# effect size direction
table(lorsubstance$effdir)
# two effect sizes are in the positive direction
# change direction of these effect sizes
lorsubstance$lor2 <- -1 * lorsubstance$effdir * lorsubstance$lor1
summary(lorsubstance$lor1)
hist(lorsubstance$lor1)
summary(lorsubstance$lor2)
hist(lorsubstance$lor2)
# only 3 studies so use fixed effects
# Use metafor since only 5 effect sizes
substancemean <- rma(yi = lor2,
    vi = varlor,
    data = lorsubstance,
    method = "FE")
substancemean
# transform to odds ratio
# mean
exp(substancemean$b)
# CI.lower
exp(substancemean$ci.lb)
# CI.upper: intercept and coeff
exp(substancemean$ci.ub)
############################################################
# Overall analysis for delinquency outcomes
############################################################
# explore the number of effect sizes per study
lorby_study4 <- group_by(lordelinq, studyid)
leffcount4 <- summarize(lorby_study4, count2 = n())
table(leffcount4$count2)
# check the direction of the effect sizes
table(lordelinq$effdir)
summary(lordelinq$lor1)
# Use metafor since only 3 effect sizes
delinqmean <- rma(yi = lor1,
    vi = varlor,
    data = lordelinq,
    method = "FE")
delinqmean
# transform to odds ratio
# mean
exp(delinqmean$b)
# CI.lower
exp(delinqmean$ci.lb)
# CI.upper: intercept and coeff
exp(delinqmean$ci.ub)
############################################################

# Overall analysis for youth behavior outcomes
############################################################
# explore the number of effect sizes per study
lorby_study6 <- group_by(loryouthbeh, studyid)
leffcount6 <- summarize(lorby_study6, count2 = n())
table(leffcount6$count2)
table(loryouthbeh$effdir)
# three effect sizes in the positive direction so we change them to
# negative
loryouthbeh$lor2 <- -1 * loryouthbeh$effdir * loryouthbeh$lor1
summary(loryouthbeh$lor1)
summary(loryouthbeh$lor2)
# Use metafor since only 3 studies
youthbehmean <- rma(yi = lor2,
    vi = varlor,
    data = loryouthbeh,
    method = "FE")
youthbehmean
# transform to odds ratio
# mean
exp(youthbehmean$b)
# CI.lower
exp(youthbehmean$ci.lb)
# CI.upper: intercept and coeff
exp(youthbehmean$ci.ub)
############################################################
# Overall analysis for parent behavior outcomes
############################################################
# explore the number of effect sizes per study
lorby_study7 <- group_by(lorparentbeh, studyid)
leffcount7 <- summarize(lorby_study7, count2 = n())
table(leffcount7$count2)
table(lorparentbeh$effdir)
# four effect sizes in the positive direction and are transformed to
# the negative
lorparentbeh$lor2 <- -1 * lorparentbeh$effdir * lorparentbeh$lor1
summary(lorparentbeh$lor1)
summary(lorparentbeh$lor2)
# Use metafor since only 5 effect sizes within 2 studies
parentbehmean <- rma(yi = lor1,
    vi = varlor,
    data = lorparentbeh,
    method = "FE")
parentbehmean
# transform to odds ratio
# mean
exp(parentbehmean$b)
# CI.lower
exp(parentbehmean$ci.lb)
# CI.upper: intercept and coeff
exp(parentbehmean$ci.ub)
############################################################
# Overall analysis for school outcomes
############################################################
# explore the number of effect sizes per study
lorby_study9 <- group_by(lorschool, studyid)
leffcount9 <- summarize(lorby_study9, count2 = n())
table(leffcount9$count2)
table(lorschool$effdir)
# three effect sizes are in the negative direction but they are all missing
# we keep effect sizes in this direction
# Use metafor since only 4 effect sizes in 2 studies
schoolmean <- rma(yi = lor1,
    vi = varlor,
    data = lorschool,
    method = "FE")
schoolmean
# transform to odds ratio
# mean
exp(schoolmean$b)
# CI.lower
exp(schoolmean$ci.lb)
# CI.upper: intercept and coeff
exp(schoolmean$ci.ub)
############################################################
# Need to save the lor data sets for combination with smds
write.csv(lorplacement, "lorplacement.csv")
write.csv(lordelinq, "lordelinq.csv")
write.csv(lorarrest, "lorarrest.csv")
write.csv(lorfamily, "lorfamily.csv")
write.csv(lorparentbeh, "lorparentbeh.csv")
write.csv(lorpeerrel, "lorpeerel.csv")
write.csv(lorschool, "lorschool.csv")
write.csv(lorsubstance, "lorsubstance.csv")
write.csv(loryouthbeh, "loryouthbeh.csv")
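For outcomes with only a handful of effect sizes, this program fits fixed-effect models with `rma(..., method = "FE")` and exponentiates the results to odds ratios. The arithmetic behind that step can be sketched in base R as inverse-variance pooling followed by back-transformation. The names `lor` and `varlor` mirror the columns used in the listing, but the values are made up for illustration and are not data from the review; the sketch does not call metafor.

```r
# Hypothetical log odds ratios and sampling variances (illustration only)
lor    <- c(-0.40, -0.10, -0.25)
varlor <- c(0.04, 0.09, 0.06)

# Fixed-effect (inverse-variance) pooled log odds ratio and standard error
w      <- 1 / varlor
pooled <- sum(w * lor) / sum(w)
se     <- sqrt(1 / sum(w))

# 95% confidence limits on the log scale (normal approximation)
ci_log <- pooled + c(-1, 1) * qnorm(0.975) * se

# Back-transform the mean and limits to the odds ratio scale
exp(c(OR = pooled, CI.lower = ci_log[1], CI.upper = ci_log[2]))
```

Pooling and interval construction happen on the log scale, and only the final estimates are exponentiated, which is why the resulting odds ratio interval is asymmetric around the point estimate.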

G. R code: Combined CE models (all ES converted to SMDs)

library(metafor)
library(clubSandwich)
library(tidyverse)
library(robumeta)
# This code uses the data sets created in the SMD Analysis and LOR
# Analysis programs
############################################################
# Combined analysis for placement
############################################################
# get the two data sets
placement <- read.csv("placement.csv")
lorplacement <- read.csv("lorplacement.csv")
# transform lors in placement and arrest to smds
lorplacement$smd1 <- lorplacement$lor1 * (sqrt(3)/pi)
lorplacement$varsmd <- lorplacement$varlor * (3/(pi^2))
# take out variables not needed
lorplacement1 <- lorplacement %>% select(-lor1, -varlor)
placement1 <- placement %>% select(-smdprelim)
# combine all placement outcomes
allplacement <- rbind.data.frame(lorplacement1, placement1)
# explore the number of effect sizes per study
by_study1 <- group_by(allplacement, studyid)
effcount1 <- summarize(by_study1, count1 = n())
table(effcount1$count1)
summary(allplacement$smd1)
# CE model for mean effect size, correlation assumed 0.8
cemodel1 <- robu(formula = smd1 ~ 1, data = allplacement,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel1
sensitivity(cemodel1)
# prediction interval
# lower pi
tau21 <- as.numeric(cemodel1$mod_info[3])
cemodel1$b.r - 1.96 * sqrt(tau21)
# upper pi
cemodel1$b.r + 1.96 * sqrt(tau21)
# exploring moderators
# time since referral
hist(allplacement$timemnths)
summary(allplacement$timemnths)
allplacement$timemean <- allplacement$timemnths - 31.07
# time since referral
cemodel11 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel11
# Developer-involved and US-based
# Checking to see if the moderators are confounded
table(allplacement$developinv)
table(allplacement$US)
# Conducting the moderator analysis with US-Based only
# US study
cemodel12 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel12
table(allplacement$datasource)
table(allplacement$newsource)
# source of data - reporter
cemodel14 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel14
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allplacement$overatt)
hist(allplacement$overatt)
summary(allplacement$diffatt)
hist(allplacement$diffatt)
# CE MR with overall attrition
cemodel15 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel15
# CE MR with differential attrition
cemodel16 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel16
# ITT analysis
table(allplacement$itt)
table(allplacement$newitt)
# CE MR with ROB ITT

cemodel17 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel17
# ROB baseline equivalence
table(allplacement$baseline)
table(allplacement$newbase)
# CE MR with ROB Baseline Equivalence
cemodel18 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel18
# ROB performance bias
table(allplacement$perfbias)
table(allplacement$newperf)
# CE MR with ROB Performance Bias
cemodel19 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel19
# ROB selective reporting
table(allplacement$selreport)
table(allplacement$newselreport)
# CE MR with ROB Selective Reporting
cemodel110 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel110
# ROB Sequence Generation
table(allplacement$seqgen)
table(allplacement$newseqgen)
# Sequence generation
cemodel13 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allplacement,
    rho = 0.8)
cemodel13
############################################################
# Overall analysis for arrest combined smd and lor
############################################################
# get the two data sets
arrest <- read.csv("arrest.csv")
lorarrest <- read.csv("lorarrest.csv")
# transform lors in placement and arrest to smds
lorarrest$smd1 <- lorarrest$lor1 * (sqrt(3)/pi)
lorarrest$varsmd <- lorarrest$varlor * (3/(pi^2))
# take out variables not needed
lorarrest1 <- lorarrest %>% select(-lor1, -varlor)
arrest1 <- arrest %>% select(-smdprelim)
# combine all arrest outcomes
allarrest <- rbind.data.frame(lorarrest1, arrest1)
# explore the number of effect sizes per study
by_study2 <- group_by(allarrest, studyid)
effcount2 <- summarize(by_study2, count2 = n())
table(effcount2$count2)
summary(allarrest$smd1)
# CE model
cemodelall2 <- robu(formula = smd1 ~ 1, data = allarrest,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodelall2
sensitivity(cemodelall2)
# prediction interval
# lower pi
tau22 <- as.numeric(cemodelall2$mod_info[3])
cemodelall2$b.r - 1.96 * sqrt(tau22)
# upper pi
cemodelall2$b.r + 1.96 * sqrt(tau22)
# explore moderators
# first center the timemnths variable
summary(allarrest$timemnths)
hist(allarrest$timemnths)
allarrest$timemean <- allarrest$timemnths - 58.35
# time since referral
cemodel21 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel21
# Developer-involved

table(allarrest$developinv)
# developer involved
cemodel22 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel22
# US-based
table(allarrest$US)
# US based
cemodel23 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel23
# data source
# all are admin data
table(allarrest$datasource)
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allarrest$overatt)
hist(allarrest$overatt)
summary(allarrest$diffatt)
hist(allarrest$diffatt)
# CE MR with overall attrition
cemodel24 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel24
# CE MR with differential attrition
cemodel25 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel25
# ITT analysis
table(allarrest$itt)
table(allarrest$newitt)
# CE MR with ROB ITT
cemodel26 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel26
# ROB baseline equivalence
table(allarrest$baseline)
table(allarrest$newbase)
# CE MR with ROB Baseline Equivalence
cemodel27 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel27
# ROB performance bias
table(allarrest$perfbias)
table(allarrest$newperf)
# CE MR with ROB Performance Bias
cemodel18 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel18
# selective reporting
table(allarrest$selreport)
table(allarrest$newselreport)
# CE MR with ROB selective reporting
cemodel19 <- robu(smd1 ~ newselreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel19
# sequence generation
table(allarrest$seqgen)
table(allarrest$newseqgen)
# new sequence generation
cemodel27 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allarrest,
    rho = 0.8)
cemodel27
############################################################
# Overall analysis for substance abuse
############################################################
# get the two data sets
substance <- read.csv("substance.csv")
lorsubstance <- read.csv("lorsubstance.csv")
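Each combined analysis in this program places log odds ratios on the SMD scale by multiplying them by sqrt(3)/pi and multiplying their variances by 3/pi^2. A minimal sketch of that conversion as a reusable function follows. The helper name `lor_to_smd` and the input values are hypothetical and are introduced here only for illustration; the listing itself applies the arithmetic inline to each data set.

```r
# Hypothetical helper: convert log odds ratios and their variances
# to the SMD scale using the same constants as the listing
lor_to_smd <- function(lor, varlor) {
  data.frame(
    smd1   = lor * (sqrt(3) / pi),     # effect size on the SMD scale
    varsmd = varlor * (3 / (pi^2))     # variance scales by the squared factor
  )
}

# Made-up example values (not data from the review)
converted <- lor_to_smd(lor = c(-0.5, 0.2), varlor = c(0.10, 0.05))
converted
```

The variance multiplier is the square of the effect-size multiplier, so the ratio of an estimate to its standard error is unchanged by the conversion.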

# transform lors in placement and arrest to smds
lorsubstance$smd1 <- lorsubstance$lor2 * (sqrt(3)/pi)
lorsubstance$varsmd <- lorsubstance$varlor * (3/(pi^2))
## Recode the moderator and ROB variables in the lorsubstance data set for
## combined analysis
# data source - compare youth to other
table(lorsubstance$datasource)
lorsubstance$newsource <- lorsubstance$datasource
lorsubstance$newsource <- recode(lorsubstance$newsource, '1' = 1,
    '4' = 0)
# rob itt
table(lorsubstance$itt)
# recode to low risk = 0 and high risk = 1
lorsubstance$newitt <- lorsubstance$itt
lorsubstance$newitt <- recode(lorsubstance$newitt, '1' = 0, '3' = 1)
# baseline equivalence
table(lorsubstance$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorsubstance$newbase <- lorsubstance$baseline
lorsubstance$newbase <- recode(lorsubstance$newbase, '1' = 0,
    '2' = 0, '3' = 1)
# ROB performance bias
table(lorsubstance$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorsubstance$newperf <- lorsubstance$perfbias
lorsubstance$newperf <- recode(lorsubstance$newperf, '1' = 0,
    '2' = 0, '3' = 1)
# ROB selective reporting
table(lorsubstance$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorsubstance$newselreport <- lorsubstance$selreport
lorsubstance$newselreport <- recode(lorsubstance$newselreport,
    '1' = 0,
    '2' = 0, '3' = 1)
# sequence generation
table(lorsubstance$seqgen)
# recode to low = 0 and unclear + high risk = 1
lorsubstance$newseqgen <- lorsubstance$seqgen
lorsubstance$newseqgen <- recode(lorsubstance$newseqgen,
    '1' = 0, '2' = 1)
# take out variables not needed
lorsubstance1 <- lorsubstance %>% select(-lor1, -varlor, -lor2)
substance1 <- substance %>% select(-smdprelim, -timemean)
# combine all substance outcomes
allsubstance <- rbind.data.frame(lorsubstance1, substance1)
# explore the number of effect sizes per study
by_study2 <- group_by(allsubstance, studyid)
effcount2 <- summarize(by_study2, count2 = n())
table(effcount2$count2)
summary(allsubstance$smd1)
# CE model
cemodelall3 <- robu(formula = smd1 ~ 1, data = allsubstance,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodelall3
sensitivity(cemodelall3)
# prediction interval
# lower pi
tau23 <- as.numeric(cemodelall3$mod_info[3])
cemodelall3$b.r - 1.96 * sqrt(tau23)
# upper pi
cemodelall3$b.r + 1.96 * sqrt(tau23)
# explore moderators
# time since referral
hist(allsubstance$timemnths)
summary(allsubstance$timemnths)
allsubstance$timemean <- allsubstance$timemnths - 22.41
# time since referral
cemodel31 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel31
# explore the US and developer moderators
table(allsubstance$developinv)
table(allsubstance$US)
# these two variables are not completely overlapping so both will be
# used in separate analyses
# developer involved
cemodel33 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel33
# US-based
cemodel32 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel32
# data source
table(allsubstance$datasource)
table(allsubstance$newsource)

cemodel34 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel34
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allsubstance$overatt)
hist(allsubstance$overatt)
summary(allsubstance$diffatt)
hist(allsubstance$diffatt)
# CE MR with overall attrition
cemodel35 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel35
# CE MR with differential attrition
cemodel36 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel36
# rob itt
table(allsubstance$itt)
table(allsubstance$newitt)
# CE MR with ROB ITT
cemodel37 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel37
# baseline equivalence
table(allsubstance$baseline)
table(allsubstance$newbase)
# CE MR with ROB Baseline Equivalence
cemodel38 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel38
# ROB performance bias
table(allsubstance$perfbias)
table(allsubstance$newperf)
# CE MR with ROB Performance Bias
cemodel39 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel39
# ROB selective reporting
table(allsubstance$selreport)
table(allsubstance$newselreport)
# CE MR with ROB Selective Reporting
cemodel310 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel310
# sequence generation
table(allsubstance$seqgen)
table(allsubstance$newseqgen)
# sequence generation
cemodel311 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allsubstance,
    rho = 0.8)
cemodel311
############################################################
# Overall analysis for delinquency combined smd and lor
############################################################
# get the two data sets
delinquency <- read.csv("delinquency.csv")
lordelinquency <- read.csv("lordelinq.csv")
# transform lors in placement and arrest to smds
lordelinquency$smd1 <- lordelinquency$lor1 * (sqrt(3)/pi)
lordelinquency$varsmd <- lordelinquency$varlor * (3/(pi^2))
## Recode the moderator and ROB variables in the lorsubstance data set for
## combined analysis
# data source - all but 2 effect sizes are from youth
table(lordelinquency$datasource)
table(delinquency$datasource)
# rob itt
table(lordelinquency$itt)
# recode to low risk = 0 and high risk = 1

lordelinquency$newitt <- lordelinquency$itt
lordelinquency$newitt <- recode(lordelinquency$newitt, '1' = 0,
    '3' = 1)
# baseline equivalence
table(lordelinquency$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lordelinquency$newbase <- lordelinquency$baseline
lordelinquency$newbase <- recode(lordelinquency$newbase, '1' = 0,
    '2' = 0, '3' = 1)
# ROB performance bias
table(lordelinquency$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lordelinquency$newperf <- lordelinquency$perfbias
lordelinquency$newperf <- recode(lordelinquency$newperf, '1' = 0,
    '2' = 0, '3' = 1)
# ROB selective reporting
table(lordelinquency$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lordelinquency$newselreport <- lordelinquency$selreport
lordelinquency$newselreport <- recode(lordelinquency$newselreport,
    '1' = 0, '2' = 0, '3' = 1)
# sequence generation
table(lordelinquency$seqgen)
# recode to low = 0 and unclear + high risk = 1
lordelinquency$newseqgen <- lordelinquency$seqgen
lordelinquency$newseqgen <- recode(lordelinquency$newseqgen,
    '1' = 0, '2' = 1)
# take out variables not needed
lordelinquency1 <- lordelinquency %>% select(-lor1, -varlor)
delinquency1 <- delinquency %>% select(-smdprelim, -timemean)
alldelinq <- rbind.data.frame(lordelinquency1, delinquency1)
# explore the number of effect sizes per study
by_study4 <- group_by(alldelinq, studyid)
effcount4 <- summarize(by_study4, count4 = n())
table(effcount4$count4)
summary(alldelinq$smd1)
# CE model
cemodel4 <- robu(formula = smd1 ~ 1, data = alldelinq,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel4
sensitivity(cemodel4)
# prediction interval
# lower pi
tau24 <- as.numeric(cemodel4$mod_info[3])
cemodel4$b.r - 1.96 * sqrt(tau24)
# upper pi
cemodel4$b.r + 1.96 * sqrt(tau24)
# explore moderators
# time from referral
hist(alldelinq$timemnths)
summary(alldelinq$timemnths)
alldelinq$timemean <- alldelinq$timemnths - 14.42
# CE model
cemodel41 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel41
# developer involved and US-based
table(alldelinq$developinv)
table(alldelinq$US)
# developer-involved
cemodel42 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel42
# US-based
cemodel43 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel43
# data source
table(alldelinq$datasource)
# explore risk of bias variables
# Overall attrition and differential attrition
summary(alldelinq$overatt)
hist(alldelinq$overatt)
summary(alldelinq$diffatt)
hist(alldelinq$diffatt)
# CE MR with overall attrition
cemodel44 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel44
# CE MR with differential attrition
cemodel45 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel45
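Two small calculations recur throughout this script: converting log odds ratios to standardized mean differences, and forming an approximate prediction interval. They can be sketched in base R as follows. This is a minimal illustration rather than the authors' code: the helper names `lor_to_smd` and `prediction_interval` and all input values are hypothetical. Like the script, the interval uses only the between-study variance and ignores the standard error of the pooled estimate.

```r
# Hypothetical helpers; base R only, no meta-analysis package required.

# Logit method: d = LOR * sqrt(3) / pi and Var(d) = Var(LOR) * 3 / pi^2
lor_to_smd <- function(lor, varlor) {
  data.frame(smd1 = lor * sqrt(3) / pi,
             varsmd = varlor * 3 / pi^2)
}

# Approximate 95% prediction interval: estimate +/- z * sqrt(tau^2)
prediction_interval <- function(estimate, tau2, z = 1.96) {
  half_width <- z * sqrt(tau2)
  c(lower = estimate - half_width, upper = estimate + half_width)
}

# Made-up illustrative inputs, not values from the review
converted <- lor_to_smd(lor = c(-0.50, 0.20), varlor = c(0.04, 0.09))
pi_bounds <- prediction_interval(estimate = 0.10, tau2 = 0.04)
print(converted)
print(pi_bounds)
```

In the script, the corresponding inputs are the `lor1` (or `lor2`) and `varlor` columns, the pooled estimate `b.r`, and the variance component read from `mod_info`.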
# ITT analysis
table(alldelinq$itt)
table(alldelinq$newitt)
# CE MR with ROB ITT
cemodel46 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel46
# ROB baseline equivalence
table(alldelinq$baseline)
table(alldelinq$newbase)
# CE MR with ROB Baseline Equivalence
cemodel47 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel47
# ROB performance bias
table(alldelinq$perfbias)
table(alldelinq$newperf)
# CE MR with ROB Performance Bias
cemodel48 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel48
# ROB selective reporting
table(alldelinq$selreport)
table(alldelinq$newselreport)
# CE MR with ROB Selective Reporting
cemodel49 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel49
# ROB Sequence Generation
table(alldelinq$seqgen)
table(alldelinq$newseqgen)
# Sequence generation
cemodel410 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = alldelinq,
    rho = 0.8)
cemodel410
############################
###################################
# Overall analysis for PEER RELATIONS
#############################
##################################
# there are no lor outcomes for peer relations
# this analysis is the same as the smd analysis
################################
###############################
# Overall analysis for YOUTH BEHAVIOR SYMPTOMS
################################
###############################
# get the two data sets
youthbeh <- read.csv("youthbeh.csv")
loryouthbeh <- read.csv("loryouthbeh.csv")
# transform lors in placement and arrest to smds
loryouthbeh$smd1 <- loryouthbeh$lor2 * (sqrt(3)/pi)
loryouthbeh$varsmd <- loryouthbeh$varlor * (3/(pi^2))
# recode the moderators for the lor analysis
# data source
table(loryouthbeh$datasource)
loryouthbeh$youths <- loryouthbeh$datasource
loryouthbeh$parents <- loryouthbeh$datasource
loryouthbeh$youths <- recode(loryouthbeh$youths, '1' = 1, '2' = 0,
    '3' = 0, '5' = 0)
loryouthbeh$parents <- recode(loryouthbeh$parents, '1' = 0, '2' = 1,
    '3' = 0, '5' = 0)
# ITT analysis
table(loryouthbeh$itt)
# recode to low risk = 0 and unclear + high risk = 1
loryouthbeh$newitt <- loryouthbeh$itt
loryouthbeh$newitt <- recode(loryouthbeh$newitt, '1' = 0, '2' = 1,
    '3' = 1)
# ROB baseline equivalence
table(loryouthbeh$baseline)
# recode to low + unclear risk = 0 and high risk = 1
loryouthbeh$newbase <- loryouthbeh$baseline
loryouthbeh$newbase <- recode(loryouthbeh$newbase, '1' = 0,
    '2' = 0, '3' = 1)
# ROB performance bias
table(loryouthbeh$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
loryouthbeh$newperf <- loryouthbeh$perfbias
loryouthbeh$newperf <- recode(loryouthbeh$newperf, '1' = 0,
    '2' = 0, '3' = 1)
# ROB selective reporting
table(loryouthbeh$selreport)
# recode to low + unclear risk = 0 and high risk = 1
loryouthbeh$newselreport <- loryouthbeh$selreport
loryouthbeh$newselreport <- recode(loryouthbeh$newselreport,
    '1' = 0, '2' = 0, '3' = 1)
# ROB Sequence Generation
table(loryouthbeh$seqgen)
# recode sequence generation to low vs high+unclear
loryouthbeh$newseqgen <- loryouthbeh$seqgen
loryouthbeh$newseqgen <- recode(loryouthbeh$newseqgen, '1' = 0,
    '2' = 1, '3' = 1)
# take out variables not needed
loryouthbeh1 <- loryouthbeh %>% select(-lor1, -varlor, -lor2)
youthbeh1 <- youthbeh %>% select(-smdprelim, -timemean)
allyouthbeh <- rbind.data.frame(loryouthbeh1, youthbeh1)
# explore the number of effect sizes per study
by_study6 <- group_by(allyouthbeh, studyid)
effcount6 <- summarize(by_study6, count6 = n())
table(effcount6$count6)
summary(allyouthbeh$smd1)
hist(allyouthbeh$smd1)
# CE model
cemodel6 <- robu(formula = smd1 ~ 1, data = allyouthbeh,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel6
sensitivity(cemodel6)
# prediction interval
# lower pi
tau26 <- as.numeric(cemodel6$mod_info[3])
cemodel6$b.r - 1.96 * sqrt(tau26)
# upper pi
cemodel6$b.r + 1.96 * sqrt(tau26)
# exploring moderators
# time since referral
summary(allyouthbeh$timemnths)
hist(allyouthbeh$timemnths)
allyouthbeh$timemean <- allyouthbeh$timemnths - 22.95
# time from referral
cemodel61 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel61
# US-based and developer-involved
table(allyouthbeh$developinv)
table(allyouthbeh$US)
# developer-involved
cemodel62 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel62
# US-based
cemodel63 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel63
# datasource
table(allyouthbeh$datasource)
table(allyouthbeh$youths)
table(allyouthbeh$parents)
# data source
cemodel64 <- robu(smd1 ~ youths + parents,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel64
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allyouthbeh$overatt)
hist(allyouthbeh$overatt)
summary(allyouthbeh$diffatt)
hist(allyouthbeh$diffatt)
# CE MR with overall attrition
cemodel65 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel65
# CE MR with differential attrition
cemodel66 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel66
# CE MR with ROB ITT
table(allyouthbeh$newitt)
cemodel67 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel67
# CE MR with ROB Baseline Equivalence
table(allyouthbeh$newbase)
cemodel68 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel68
# CE MR with ROB Performance Bias
table(allyouthbeh$newperf)
cemodel69 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel69
# CE MR with ROB Selective Reporting
table(allyouthbeh$newselreport)
cemodel610 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel610
# Sequence generation
table(allyouthbeh$newseqgen)
cemodel611 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allyouthbeh,
    rho = 0.8)
cemodel611
#############################
##################################
# Overall analysis for PARENT BEHAVIOR SYMPTOMS
#################################
##############################
# get the two data sets
parentbeh <- read.csv("parenbeh.csv")
lorparentbeh <- read.csv("lorparentbeh.csv")
# transform lors in placement and arrest to smds
lorparentbeh$smd1 <- lorparentbeh$lor2 * (sqrt(3)/pi)
lorparentbeh$varsmd <- lorparentbeh$varlor * (3/(pi^2))
# recode all the lor moderator
# data source
table(lorparentbeh$datasource)
lorparentbeh$newsource <- lorparentbeh$datasource
lorparentbeh$newsource <- recode(lorparentbeh$newsource, '4' = 0,
    '2' = 1)
# ITT analysis
table(lorparentbeh$itt)
# recode to low risk = 0 and unclear + high risk = 1
lorparentbeh$newitt <- lorparentbeh$itt
lorparentbeh$newitt <- recode(lorparentbeh$newitt, '1' = 0,
    '2' = 1, '3' = 1)
# ROB baseline equivalence
table(lorparentbeh$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorparentbeh$newbase <- lorparentbeh$baseline
lorparentbeh$newbase <- recode(lorparentbeh$newbase, '1' = 0,
    '2' = 0, '3' = 1)
# ROB performance bias
table(lorparentbeh$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorparentbeh$newperf <- lorparentbeh$perfbias
lorparentbeh$newperf <- recode(lorparentbeh$newperf, '1' = 0,
    '2' = 0, '3' = 1)
# ROB selective reporting
table(lorparentbeh$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorparentbeh$newselreport <- lorparentbeh$selreport
lorparentbeh$newselreport <- recode(lorparentbeh$newselreport,
    '1' = 0, '2' = 0, '3' = 1)
# sequence generation
table(lorparentbeh$seqgen)
lorparentbeh$newseqgen <- lorparentbeh$seqgen
lorparentbeh$newseqgen <- recode(lorparentbeh$newseqgen,
    '1' = 0, '2' = 1, '3' = 1)
# take out variables not needed
lorparentbeh1 <- lorparentbeh %>% select(-lor1, -varlor, -lor2)
parentbeh1 <- parentbeh %>% select(-smdprelim, -timemean)
allparentbeh <- rbind.data.frame(lorparentbeh1, parentbeh1)
# explore the number of effect sizes per study
by_study7 <- group_by(allparentbeh, studyid)
effcount7 <- summarize(by_study7, count7 = n())
table(effcount7$count7)
summary(allparentbeh$smd1)
# CE model
cemodel7 <- robu(formula = smd1 ~ 1, data = allparentbeh,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel7
sensitivity(cemodel7)
# prediction interval
# lower pi
tau27 <- as.numeric(cemodel7$mod_info[3])
cemodel7$b.r - 1.96 * sqrt(tau27)
# upper pi
cemodel7$b.r + 1.96 * sqrt(tau27)
# exploring moderators
# time from referral
hist(allparentbeh$timemnths)
summary(allparentbeh$timemnths)
allparentbeh$timemean <- allparentbeh$timemnths - 17.38
# CE meta-regression
# time from referral
cemodel71 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel71
# US-based and developer-involved
table(allparentbeh$US)
table(allparentbeh$developinv)
# US-based
cemodel73 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel73
# developer-involved
cemodel72 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel72
# data source
table(allparentbeh$datasource)
table(allparentbeh$newsource)
# data source
cemodel74 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel74
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allparentbeh$overatt)
hist(allparentbeh$overatt)
summary(allparentbeh$diffatt)
hist(allparentbeh$diffatt)
# CE MR with overall attrition
cemodel75 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel75
# CE MR with differential attrition
cemodel76 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel76
# ITT analysis
table(allparentbeh$itt)
table(allparentbeh$newitt)
# CE MR with ROB ITT
cemodel77 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel77
# ROB baseline equivalence
table(allparentbeh$baseline)
table(allparentbeh$newbase)
# CE MR with ROB Baseline Equivalence
cemodel78 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel78
# ROB performance bias
table(allparentbeh$perfbias)
table(allparentbeh$newperf)
# CE MR with ROB Performance Bias
cemodel79 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel79
# ROB selective reporting
table(allparentbeh$selreport)
table(allparentbeh$newselreport)
# CE MR with ROB Selective Reporting
cemodel710 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel710
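The risk-of-bias recodes and the centring of the time moderator follow one pattern across outcomes, which the sketch below restates in base R. It assumes, from the script's comments, that ratings are coded 1 = low, 2 = unclear, 3 = high. The helper names `dichotomize_rob` and `center_at` are hypothetical, the data are made up, and `%in%` is swapped in for the `recode()` calls used in the script. The script subtracts fixed constants (for example 17.38) that are presumably the sample means reported by `summary()`; the sketch computes the mean directly.

```r
# Hypothetical helpers; base R only.

# Collapse a 1/2/3 risk-of-bias rating into 0/1.
# high_codes lists the ratings that should map to 1; NA stays NA.
dichotomize_rob <- function(rating, high_codes = 3) {
  ifelse(is.na(rating), NA_real_, as.numeric(rating %in% high_codes))
}

# Centre a moderator, by default at its own mean
center_at <- function(x, center = mean(x, na.rm = TRUE)) {
  x - center
}

rating <- c(1, 2, 3, 3, 1)                                # made-up ratings
newbase <- dichotomize_rob(rating)                        # high vs low + unclear
newitt <- dichotomize_rob(rating, high_codes = c(2, 3))   # unclear + high vs low

timemnths <- c(6, 12, 18, 24)                             # made-up follow-up months
timemean <- center_at(timemnths)
print(newbase)
print(newitt)
print(timemean)
```

Centring the moderator makes the meta-regression intercept the expected effect at the average follow-up time rather than at zero months.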
# sequence generation
table(allparentbeh$seqgen)
table(allparentbeh$newseqgen)
# sequence generation
cemodel711 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allparentbeh,
    rho = 0.8)
cemodel711
###############################
################################
# Overall analysis for FAMILY FUNCTION
#####################################
##########################
# get the two data sets
family <- read.csv("family.csv")
lorfamily <- read.csv("lorfamily.csv")
# transform lors in placement and arrest to smds
lorfamily$smd1 <- lorfamily$lor1 * (sqrt(3)/pi)
lorfamily$varsmd <- lorfamily$varlor * (3/(pi^2))
# recode all the lor moderator
# data source
table(lorfamily$datasource)
lorfamily$newsource <- lorfamily$datasource
lorfamily$newsource <- recode(lorfamily$newsource, '4' = 0, '2' = 1)
# ITT analysis
table(lorfamily$itt)
# recode to low risk = 0 and unclear + high risk = 1
lorfamily$newitt <- lorfamily$itt
lorfamily$newitt <- recode(lorfamily$newitt, '1' = 0, '2' = 1,
    '3' = 1)
# ROB baseline equivalence
table(lorfamily$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorfamily$newbase <- lorfamily$baseline
lorfamily$newbase <- recode(lorfamily$newbase, '1' = 0, '2' = 0,
    '3' = 1)
# ROB performance bias
table(lorfamily$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorfamily$newperf <- lorfamily$perfbias
lorfamily$newperf <- recode(lorfamily$newperf, '1' = 0, '2' = 0,
    '3' = 1)
# ROB selective reporting
table(lorfamily$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorfamily$newselreport <- lorfamily$selreport
lorfamily$newselreport <- recode(lorfamily$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
# sequence generation
table(lorfamily$seqgen)
lorfamily$newseqgen <- lorfamily$seqgen
lorfamily$newseqgen <- recode(lorfamily$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
# take out variables not needed
lorfamily1 <- lorfamily %>% select(-lor1, -varlor)
family1 <- family %>% select(-smdprelim, -timemean)
allfamily <- rbind.data.frame(lorfamily1, family1)
# explore the number of effect sizes per study
by_study8 <- group_by(allfamily, studyid)
effcount8 <- summarize(by_study8, count8 = n())
table(effcount8$count8)
summary(allfamily$smd1)
# CE model
cemodel8 <- robu(formula = smd1 ~ 1, data = allfamily,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel8
sensitivity(cemodel8)
# prediction interval
# lower pi
tau28 <- as.numeric(cemodel8$mod_info[3])
cemodel8$b.r - 1.96 * sqrt(tau28)
# upper pi
cemodel8$b.r + 1.96 * sqrt(tau28)
# exploring moderators
hist(allfamily$timemnths)
summary(allfamily$timemnths)
allfamily$timemean <- allfamily$timemnths - 15.92
# time since referral
cemodel81 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel81
# developer-involved and US-based
table(allfamily$US)
table(allfamily$developinv)
# US-based
cemodel82 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel82
# developer-involved
cemodel83 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel83
# data source
table(allfamily$datasource)
table(allfamily$newsource)
# data source
cemodel84 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel84
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allfamily$overatt)
hist(allfamily$overatt)
summary(allfamily$diffatt)
hist(allfamily$diffatt)
# CE MR with overall attrition
cemodel85 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel85
# CE MR with differential attrition
cemodel86 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel86
# ITT analysis
table(allfamily$itt)
table(allfamily$newitt)
# CE MR with ROB ITT
cemodel87 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel87
# ROB baseline equivalence
table(allfamily$baseline)
table(allfamily$newbase)
# CE MR with ROB Baseline Equivalence
cemodel88 <- robu(smd1 ~ newbase,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel88
# ROB performance bias
table(allfamily$perfbias)
table(allfamily$newperf)
# CE MR with ROB Performance Bias
cemodel89 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel89
# ROB selective reporting
table(allfamily$selreport)
table(allfamily$newselreport)
# CE MR with ROB Selective Reporting
cemodel810 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel810
# sequence generation
table(allfamily$seqgen)
table(allfamily$newseqgen)
# sequence-generation
cemodel811 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allfamily,
    rho = 0.8)
cemodel811
###############################
################################
# Overall analysis for school
#############################
##################################
# get the two data sets
school <- read.csv("school.csv")
lorschool <- read.csv("lorschool.csv")
# transform lors in placement and arrest to smds
lorschool$smd1 <- lorschool$lor1 * (sqrt(3)/pi)
lorschool$varsmd <- lorschool$varlor * (3/(pi^2))
# recode all the lor moderator
# data source
table(lorschool$datasource)
lorschool$newsource <- lorschool$datasource
lorschool$newsource <- recode(lorschool$newsource, '1' = 1,
    '4' = 0)
# ITT analysis
table(lorschool$itt)
# recode to low risk = 0 and unclear + high risk = 1
lorschool$newitt <- lorschool$itt
lorschool$newitt <- recode(lorschool$newitt, '1' = 0, '2' = 1,
    '3' = 1)
# ROB baseline equivalence
table(lorschool$baseline)
# recode to low + unclear risk = 0 and high risk = 1
lorschool$newbase <- lorschool$baseline
lorschool$newbase <- recode(lorschool$newbase, '1' = 0, '2' = 0,
    '3' = 1)
# do this for smds though all low risk effect sizes may be from one study
table(school$baseline)
school$newbase <- school$baseline
school$newbase <- recode(school$newbase, '1' = 0, '2' = 0,
    '3' = 1)
# ROB performance bias
table(lorschool$perfbias)
# recode to low + unclear risk = 0 and high risk = 1
lorschool$newperf <- lorschool$perfbias
lorschool$newperf <- recode(lorschool$newperf, '1' = 0, '2' = 0,
    '3' = 1)
# ROB selective reporting
table(lorschool$selreport)
# recode to low + unclear risk = 0 and high risk = 1
lorschool$newselreport <- lorschool$selreport
lorschool$newselreport <- recode(lorschool$newselreport, '1' = 0,
    '2' = 0, '3' = 1)
# sequence generation
table(lorschool$seqgen)
lorschool$newseqgen <- lorschool$seqgen
lorschool$newseqgen <- recode(lorschool$newseqgen, '1' = 0, '2' = 1,
    '3' = 1)
# take out variables not needed
lorschool1 <- lorschool %>% select(-lor1, -varlor)
school1 <- school %>% select(-smdprelim, -timemean)
allschool <- rbind.data.frame(lorschool1, school1)
# explore the number of effect sizes per study
by_study9 <- group_by(allschool, studyid)
effcount9 <- summarize(by_study9, count9 = n())
table(effcount9$count9)
summary(allschool$smd1)
# CE model
cemodel9 <- robu(formula = smd1 ~ 1, data = allschool,
    studynum = studyid, var.eff.size = varsmd,
    rho = 0.8, small = TRUE)
cemodel9
sensitivity(cemodel9)
# prediction interval
# lower pi
tau29 <- as.numeric(cemodel9$mod_info[3])
cemodel9$b.r - 1.96 * sqrt(tau29)
# upper pi
cemodel9$b.r + 1.96 * sqrt(tau29)
# time from referral
hist(allschool$timemnths)
summary(allschool$timemnths)
allschool$timemean <- allschool$timemnths - 21.81
# time from referral
cemodel91 <- robu(smd1 ~ timemean,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel91
# US and developer-involved
table(allschool$US)
table(allschool$developinv)
# developer-involved
cemodel93 <- robu(smd1 ~ developinv,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel93
# US-based
cemodel932 <- robu(smd1 ~ US,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel932
# data source
table(allschool$datasource)
table(allschool$newsource)
# data source
cemodel94 <- robu(smd1 ~ newsource,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel94
# explore risk of bias variables
# Overall attrition and differential attrition
summary(allschool$overatt)
hist(allschool$overatt)
summary(allschool$diffatt)
hist(allschool$diffatt)
# CE MR with overall attrition
cemodel95 <- robu(smd1 ~ overatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel95
# CE MR with differential attrition
cemodel96 <- robu(smd1 ~ diffatt,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel96
# ITT analysis
table(allschool$itt)
table(allschool$newitt)
# CE MR with ROB ITT
cemodel97 <- robu(smd1 ~ newitt,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel97
# ROB baseline equivalence
table(allschool$baseline)
table(allschool$studyid, allschool$baseline)
# ROB performance bias
table(allschool$perfbias)
table(allschool$newperf)
# CE MR with ROB Performance Bias
cemodel99 <- robu(smd1 ~ newperf,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel99
# ROB selective reporting
table(allschool$selreport)
table(allschool$newselreport)
# CE MR with ROB Selective Reporting
cemodel910 <- robu(smd1 ~ selreport,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel910
table(allschool$seqgen)
table(allschool$newseqgen)
# sequence generation
cemodel911 <- robu(smd1 ~ newseqgen,
    var.eff.size = varsmd, studynum = studyid,
    data = allschool,
    rho = 0.8)
cemodel911

H. Study ID coding key

Sorted by study name            Sorted by study ID
ID  Study name                  ID  Study name
43  Asscher 2013                3   Borduin 1990
3   Borduin 1990                4   Borduin 1995
4   Borduin 1995                5   Henggeler 1992
30  Borduin 2009                6   Henggeler 1997
51  Butler 2011                 8   Henggeler 1999a
65  Fonagy 2017                 9   Henggeler 1999b
48  Fonagy 2018                 10  Lescheid 2002
32  Glisson 2010                11  Weiss 2013
5   Henggeler 1992              12  Ogden 2004
6   Henggeler 1997              13  Sundell 2006
8   Henggeler 1999a             14  Timmons-Mitchell 2006
9   Henggeler 1999b             16  Miller 1998
23  Henggeler 2006              19  Rowland 2005
10  Lescheid 2002               23  Henggeler 2006
34  Letourneau 2009             30  Borduin 2009
16  Miller 1998                 31  Swenson 2010
12  Ogden 2004                  32  Glisson 2010
19  Rowland 2005                34  Letourneau 2009
13  Sundell 2006                43  Asscher 2013
31  Swenson 2010                48  Fonagy 2018
14  Timmons-Mitchell 2006       51  Butler 2011
70  Wagner 2019                 65  Fonagy 2017
11  Weiss 2013                  70  Wagner 2019



I. Glossary: Outcome measures used in MST trials

ABAS         Antisocial Beliefs and Attitudes Scale
ABC          Adult Behavior Checklist
AIM          Autism Impact Measure
APQ          Alabama Parenting Questionnaire
APSD         Antisocial Process Screening Device
ARQ          Adolescence Resilience Questionnaire
ASBI         Adolescent Sexual Behaviour Inventory
ASR          Adult Self Report (young adult version of YSR)
AUDIT        Alcohol Use Disorder Identification Test
BPI          Behavior Problems Inventory
BPQ          Basic Peer Questionnaire
BSI          Brief Symptom Inventory (short version of the SCL-90)
CAFAS        Child and Adolescent Functional Assessment Scale
CAI          Child Attachment Interview
CATQ         Children's Automatic Thought Questionnaire (personal failure, hostility scales)
CBCL         Child Behavior Checklist
CBCL-PTSD    Child Behavior Checklist – Post Traumatic Stress Disorder scale
Connors      20-item assessment of hyperactivity and impulsivity
CII          Coder Impressions Inventory
CRPBI        Children's Report of Parental Behavior Inventory
CTS2         Conflict Tactics Scale version 2
DAS          Dyadic Adjustment Scale
DBD          Disruptive Behaviors Disorder rating scales (ODD and CD subscales)
DUDIT        Drug Use Disorder Identification Test
EQ-5D        EuroQol 5 dimensions (health related quality of life)
FACES        Family Adaptability and Cohesion Evaluation Scales
FAM          Family Assessment Measure
FFS          Family, Friends, and Self scale (parental control)
GHQ          General Health Questionnaire (psychopathology)
GSI          Global Severity Index (of the SCL-90-R or the BSI)
HSC          Hopelessness Scale for Children
ICU or ICUT  Inventory of Callous and Unemotional traits
IPPA         Inventory of Parent and Peer Attachment
ISEL         Interpersonal Support Evaluation List
LEE          Levels of Expressed Emotion
Loeber       Loeber Caregiver Questionnaire (parenting style and relationship)
MFQ          Mood and Feelings Questionnaire (depression)
MIDSA        Multidimensional Inventory of Development, Sex, and Aggression
MPRI         Missouri Peer Relations Inventory
NPQ          Nijmegen Parenting Questionnaire
NRI          Network of the Relationship Inventory
PACS         Parent-Adolescent Communication Scale
PAI          Personality Assessment Inventory (externalizing, internalizing scales)
PAQ          Parental Authority Questionnaire (authoritarian, authoritative, permissive)
PCSYR        Psychological Control Scale, Youth Report
PDI          Parenting Dimensions Inventory
PEI          Personal Experience Inventory (drug use)
PPCI         Parent Peer Conformity Inventory
PPQ          Parenting Practices Questionnaire
PSI          Parenting Stress Index
PST          Positive Symptom Total of the BSI
PYS          Pittsburgh Youth Study (Bad Friends scale)
RBPC         Revised Behavior Problem Checklist
RDT          Revealed Differences Task
SCL-90       Symptom CheckList 90, R version is revised
SCPQ         Social Competence with Peers Questionnaire
SDQ          Strengths and Difficulties Questionnaire
SMFQ         Short Mood and Feelings Questionnaire (depression)
SOC          Sense of Coherence scale
SRD          Self-Reported Delinquency
SSQ          Social Support Questionnaire
SSRS         Social Skills Rating System
TRF          Teacher Report form of the CBCL
TSCC         Trauma Symptom Checklist for Children
YRBS         Youth Risk Behavior Survey
YSR          Youth Self Report of the CBCL

Analysis 1.1.

Analysis 1.2.

Analysis 1.3.

Analysis 1.4.

Analysis 1.5.

Analysis 1.6.

Analysis 1.7.

Analysis 2.1.

Analysis 2.2.

Analysis 2.3.

Analysis 2.4.

Analysis 2.5.

Analysis 2.6.

Analysis 2.7.

Analysis 3.1.

Analysis 3.2.

Analysis 3.3.

Analysis 4.1.

Analysis 4.2.

Analysis 4.3.

Analysis 5.1.

Analysis 6.1.

Analysis 6.2.

Analysis 6.3.

Analysis 6.4.

Analysis 6.5.

Analysis 6.6.

Analysis 7.1.

Analysis 7.2.

Analysis 7.3.

Analysis 8.1.

Analysis 8.2.

Analysis 9.1.

Analysis 9.2.