Journal of Clinical Child & Adolescent Psychology

ISSN: 1537-4416 (Print) 1537-4424 (Online) Journal homepage: http://www.tandfonline.com/loi/hcap20

Developing and Validating Short Forms of the Parent General Behavior Inventory Mania and Depression Scales for Rating Youth Mood Symptoms

Eric A. Youngstrom, Anna Van Meter, Thomas W. Frazier, Jennifer Kogos Youngstrom & Robert L. Findling

To cite this article: Eric A. Youngstrom, Anna Van Meter, Thomas W. Frazier, Jennifer Kogos
Youngstrom & Robert L. Findling (2018): Developing and Validating Short Forms of the Parent
General Behavior Inventory Mania and Depression Scales for Rating Youth Mood Symptoms,
Journal of Clinical Child & Adolescent Psychology, DOI: 10.1080/15374416.2018.1491006

To link to this article: https://doi.org/10.1080/15374416.2018.1491006

Published online: 24 Jul 2018.

Journal of Clinical Child & Adolescent Psychology, 00(00), 1–16, 2018
Copyright © Society of Clinical Child & Adolescent Psychology
ISSN: 1537-4416 print/1537-4424 online
DOI: https://doi.org/10.1080/15374416.2018.1491006

Developing and Validating Short Forms of the Parent General Behavior Inventory Mania and Depression Scales for Rating Youth Mood Symptoms
Eric A. Youngstrom
Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill

Anna Van Meter


Department of Psychiatry Research, Zucker Hillside Hospital

Thomas W. Frazier
Science Department, Autism Speaks

Jennifer Kogos Youngstrom


Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill

Robert L. Findling
Department of Psychiatry, Johns Hopkins University

To develop short forms of parent-rated mania and depression scales, evaluating their reliability,
content coverage, criterion validity, and diagnostic accuracy. Caregivers completed the Parent
General Behavior Inventory about their youth 5–18 years of age seeking outpatient mental health
services at either an academic medical clinic (n = 617) or urban community mental health center
(n = 530), along with other rating scales. Families also completed a semistructured Kiddie Schedule
for Affective Disorders and Schizophrenia interview, with the rating scales masked during diagnosis.
Ten-item short forms and projections of their psychometrics (vs. the full-length 46-item Depression
and 28-item Hypomanic/Biphasic scales) were built in the academic sample and then externally
cross-validated in the community sample. The mania and two depression short forms maintained
high reliability (αs > .87 across both samples); high correlations with the full-length scales (rs > .93);
excellent convergent and discriminant validity with mood, behavior, and demographic criteria; and
diagnostic accuracy undiminished relative to the full-length scales. Present analyses developed
and externally cross-validated 10-item short forms that maintain high reliability and content
coverage and show strong criterion validity and diagnostic accuracy—even when used in an
independent sample with markedly different demographics and referral patterns. The short forms
appear useful in clinical applications, including screening and initial evaluation, as well as in
research settings, where they offer an inexpensive quantitative score. Future work should further
evaluate sensitivity to treatment effects. The short forms are available in more than a dozen
translations.

Correspondence should be addressed to Eric A. Youngstrom, Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, CB #3270, Davie Hall, Chapel Hill, NC 27599-3270. E-mail: eay@unc.edu

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/hcap.

Assessment of mood disorder remains an area of great unmet need in healthcare. Depression is a common comorbidity and is often undiagnosed in pediatric settings, and bipolar disorder accounts for a significant portion of mood disorders and a disproportionate share of the risk for
substance misuse, risky behavior, and self-harm (Birmaher, 2013; Stewart et al., 2012). The General Behavior Inventory (GBI; Depue et al., 1981) was originally developed as a self-report measure of depressive and hypomanic symptoms for use with young adults. It has accrued substantial evidence of validity. This evidence includes exceptional reliability estimates (Depue et al., 1981) and criterion validity, including associations with family history (Klein, Depue, & Slater, 1986) and with cortisol and dopamine metabolites (Depue, Kleiman, Davis, Hutchinson, & Krauss, 1985; Depue, Luciana, Arbisi, Collins, & Leon, 1994). It also includes diagnostic discriminative validity (Depue et al., 1981; Youngstrom et al., 2018), longitudinal prediction of progression to a mood disorder (Alloy et al., 2011; Klein & Depue, 1984), and sensitivity to treatment effects (Findling et al., 2003).

The Parent General Behavior Inventory (PGBI; Youngstrom, Findling, Danielson, & Calabrese, 2001) is an adaptation of the GBI in which the primary caregiver completes the GBI as a description of the child’s mood and behavior. It has also shown considerable evidence of validity as a measure of depressive and hypomanic, biphasic, or mixed-mood symptoms (Youngstrom et al., 2001). The PGBI yields two scales: Depression (46 items, α > .95 in multiple samples) and Hypomanic/Biphasic (28 items, α > .94 in multiple samples). When parents have used the PGBI to describe the mood symptoms of youths 5–18 years of age, the scales have been reliable. They have done an excellent job of discriminating cases with mood disorders from those with other psychiatric conditions, and they appear to be highly sensitive to treatment effects (Findling et al., 2003; Youngstrom et al., 2001; Youngstrom et al., 2005). In a recent meta-analysis comparing the discriminative validity of all published measures of manic symptoms in youth, the PGBI performed in the top echelon of measures, significantly outperforming several other widely used measures (Youngstrom, Genzlinger, Egerton, & Van Meter, 2015).

However, the PGBI is long: 73 core items (46 on the Depression scale and 28 on the Hypomanic/Biphasic scale, with one item included on both in the official scoring). It has an 11th-grade reading level and complicated content, and it spans 10 pages in 11-point font. This makes it cumbersome to use as a screener or as an outcome measure. The high alphas suggest it should be possible to develop a short form that remains reliable and correlates highly with the full-length scale. Such a form should also preserve other desirable characteristics, such as high discriminative validity.

We developed a 10-item hypomanic/biphasic scale (PGBI-10M) that maintained high internal consistency and actually exceeded the full-length version in discriminative validity. This held not just in the training sample (Youngstrom, Frazier, Findling, & Calabrese, 2008) but also in subsequent independent applications (see meta-analysis; Youngstrom, Genzlinger, Egerton, & Van Meter, 2015). The PGBI-10M is technically not a short form of the PGBI, because it did not include a depression scale and thus did not preserve the original factor structure (Smith, McCarthy, & Anderson, 2000). The items on the 10M were selected primarily to maximize discriminative validity, with maximizing content coverage as a secondary goal.

Evidence-based medicine (Straus, Glasziou, Richardson, & Haynes, 2011) and psychological evidence-based assessment (Youngstrom, Van Meter, et al., 2017) both emphasize interpreting test results as updated probabilities rather than as diagnoses in themselves. The underlying concept, Bayes’ theorem, is centuries old. In older discussions of diagnostic accuracy, the Bayesian posterior probabilities went by various names. In psychology, their most common aliases were positive and negative predictive power, or predictive values (Youngstrom, Van Meter, et al., 2017). They are heavily influenced by the prior probability of the condition, often measured as the base rate in a given setting.

Consider a high-risk score that conveys a sevenfold increase in the odds of a diagnosis. It would move a 50% prior probability up to a revised 87.5% probability (prior odds of 1.0 × 7 = 7, and 7/(1 + 7) = .875). The same result would connote only a 26.9% probability in a setting where the base rate was 5% (prior odds of .05/.95 ≈ .053, times 7 ≈ .368, and .368/1.368 ≈ .269). Even if the test itself shows good generalizability of psychometric properties across samples, the interpretation of an individual score needs to be locally contextualized, taking base rates and referral patterns into consideration. This has been the stumbling block to applying assessment results effectively. A variety of solutions are now available, including probability nomograms, online risk calculators, and smartphone apps that can combine prior probabilities with assessment findings (Youngstrom, Van Meter, et al., 2017).

The preferred effect size for incorporating assessment results into a Bayesian interpretation is the diagnostic likelihood ratio (DiLR; Straus et al., 2011). The DiLR is the ratio of two proportions: the proportion of people with the target condition scoring in the reference range, divided by the proportion of people without the target condition also scoring in that range. When an assessment is interpreted in a binary way, the DiLR for a positive test is the sensitivity divided by the false alarm rate (1 − specificity). The DiLR for a negative test is the false negative rate (1 − sensitivity) divided by the specificity. Multiplying the prior odds of a diagnosis by the DiLR yields the updated (posterior) odds.

Clinicians can interpret DiLRs with a probability nomogram (Figure 1), which avoids the need for any computation. Alternatively, they can enter the DiLR into probability calculators available through evidence-based medicine websites and apps (e.g., search for “probability calculator”). A DiLR smaller than 1 decreases the predicted probability, 1.0 is a neutral result (not changing the probability at all), and a DiLR greater than 1 increases the predicted probability. Values greater than 2 (or < .5) can be useful in combination with other information. Values greater than 5 (or < .2) are helpful, and values greater than 10 (or < .1) are often clinically decisive.

The PGBI-10M was designed to provide maximally informative DiLRs, and it was still able to maintain high internal consistency and strong content coverage. These
psychometric properties have been demonstrated in an extracted format in an independent sample, where the 10-item form was administered on its own rather than embedded in the full-length version (Freeman et al., 2012). Internal consistency and breadth of coverage are competing aims. The PGBI-10M probably achieved both by dint of its complex item structure: most items include multiple symptoms, increasing intercorrelation and coverage at the same time. For examples, see later in this article; to retrieve the scales, visit https://trello.com/b/dYUKlNRP/translated-measures-dashboard.

[Figure 1: probability nomogram with three vertical scales, Pretest Probability (%), Likelihood Ratio, and Posttest Probability (%).]
FIGURE 1 Probability nomogram for combining diagnostic likelihood ratios with other information about cases to revise probability estimates. Note: The nomogram combines the baseline probability of a diagnosis (usually based on clinical prevalence) with one or more DiLRs to arrive at a posterior probability. For a clinical example, see Van Meter et al. (2014).

The key feature for discriminative validity is high separation in score distributions between groups with and without mood disorder. This emphasis also has advantages for the Jacobson benchmarks for clinically significant change, increasing the clinical utility of the same short form as an outcome measure (Jacobson & Truax, 1991). Essentially, the items were picked to maximize the difference between sick and well (or at least not moody) states. Indeed, the 10M showed sensitivity to treatment effects similar to or better than that achieved with the industry-standard structured interviews (Findling et al., 2009; Youngstrom et al., 2013).

The accumulated evidence of validity led the PhenX Toolkit (https://www.phenxtoolkit.org/) to endorse the PGBI-10M as the preferred measure of manic symptoms in youth. It was the basis for enrollment in the multisite Longitudinal Assessment of Manic Symptoms project (Horwitz et al., 2010). It has also served as an outcome measure in multiple clinical trials and as a key predictor in brain imaging studies (Bebko et al., 2014). In addition, advocacy groups have adopted it as a risk assessment tool (Depression and Bipolar Support Alliance, n.d.).

What about depression? The PGBI includes even more items with depression content (46 on the scale, plus many items with “mixed” content). We performed similar analyses on the depression items and found that it was possible to develop two 10-item forms with balanced content and psychometrics (Youngstrom, Frazier, Findling, & Calabrese, 2004, May). Having two forms offers a way to reduce retest fatigue or bias. Short Form A was used in some treatment studies on the basis of unpublished analyses, and it performed well as an outcome measure (Youngstrom et al., 2013).

The goal of this article is to report the psychometrics of the two depression forms. We simultaneously evaluate the 10M paired with each 10-item depression form as a 20-item short form that preserves the original scale structure of the GBI. We do this using two independent samples: an academic sample accrued at a university medical center (Findling et al., 2005) and a community sample drawn from an urban community mental health center (Youngstrom, Meyers, et al., 2005). We have used this combination of samples to evaluate the generalizability of discriminative validity for the Achenbach Child Behavior Checklist scales (Van Meter et al., 2017; Van Meter et al., 2014). We have also used it to examine the stability of different assessment algorithms, including some machine learning models (Youngstrom, Halverson, Youngstrom, Lindhiem, & Findling, 2017). Because the community and academic samples differ substantially in demographic and clinical characteristics, they will provide a strong test of the psychometrics of the short forms. We follow the steps outlined in Smith et al. (2000) for best practices in short form development. The academic sample establishes the psychometrics of
the original long form PGBI scales and empirical estimates of the short form parameters in the academic (training) sample. These are followed by projections for internal consistency and for the correlation between the short and full-length forms in a new sample. Then we report the empirical estimates from the external cross-validation in the community sample. We describe the short form factor structure in both samples and test the discriminative validity of the short forms. The depression scales are used to predict the presence of any mood disorder, and the 10M scale to predict the presence of any bipolar spectrum disorder.

METHOD

Participants

The academic sample consists of families seeking services at Case Western Reserve/University Hospitals of Cleveland. The community sample consists of families presenting to Applewood Centers, a large, urban community mental health center. Participants were families seeking outpatient mental health services for youth between 5 and 18 years of age. Exclusion criteria were not being conversant with spoken English and having a pervasive developmental or cognitive disability. Inclusion criteria were deliberately broad to maximize the generalizability of results. Families were paid for completing the full-day interview. The projects were supported by grants from the Stanley Medical Research Institute (PI: RLF) and NIH R01 MH066647 (PI: EAY). The R01 supported the interview team that completed all of the interviews in the community sample, as well as a large portion of the academic sample.

Table 1 presents the descriptive statistics for participants in both samples, as well as effect sizes for differences between the two. The academic sample was heavily enriched for bipolar disorder, both through recruitment for clinical trials and through referrals of offspring of parents with bipolar disorder from an adult mood disorders program. The community sample had somewhat lower rates of mood disorders and fewer bipolar spectrum disorders. It also had high rates of externalizing disorders and attention deficit/hyperactivity disorder (ADHD). The community sample reflected an urban catchment area, with lower socioeconomic status and with most families being African American.

Diagnoses

All diagnoses were determined by a semistructured interview involving the parent and then the youth sequentially. The Stanley-funded protocol used the Kiddie Schedule for Affective Disorders and Schizophrenia–Present and Lifetime version (KSADS-PL; Kaufman et al., 1997). The R01 protocol combined the PL with the depression and mania modules from the Washington University KSADS (Geller et al., 2001) to build crosswalks to other protocols. KSADS raters were highly trained (item-level κ > .85). A clinician reviewed all KSADS findings in a Longitudinal Expert evaluating All Data (Spitzer, 1983) consensus meeting. The meeting integrated family psychiatric history and prior treatment history but excluded all rating scales, to prevent criterion contamination when evaluating discriminative validity.

Measures

Parent General Behavior Inventory

The PGBI adapted the GBI so that a caregiver can complete it to describe the mood symptoms of their youth over the past year. The PGBI contains 73 items using a 4-point scale from 0 (never or hardly ever) to 3 (very often; almost constantly). The two main scales are Depression, containing 46 items, and Hypomanic/Biphasic, with 28 items (Depue’s GBI scoring counts one item on both scales). The Hypomanic/Biphasic scale includes a mix of “purely” hypomanic items along with “mixed” items that address mood swings and rapid changes or extremes of mood. The mixed items cross-load in factor analyses, and they induce a high correlation between the Depression and Hypomanic/Biphasic scales. They also describe a hallmark feature of bipolar disorder, and they include some of the most statistically discriminating items between diagnostic groups (Youngstrom, Frazier, Findling, & Calabrese, 2008). These items might seem problematic according to some conventions for factor analysis (e.g., “avoid double-barreled items”; DeVellis, 1991). Nonetheless, they are important from conceptual and clinical perspectives, and they also show strong validity in other respects.

Mood Severity Ratings

The interviewers also completed two widely used mood severity rating scales. The Young Mania Rating Scale (YMRS; Young, Biggs, Ziegler, & Meyer, 1978) is an 11-item semistructured interview, with items rated on a 0-to-4 or 0-to-8 scale. It integrates caregiver and youth descriptions with direct observation of mental status and clinical impressions. The Child Depression Rating Scale–Revised (CDRS-R; Poznanski, Miller, Salguero, & Kelsh, 1984) is a 17-item semistructured interview, with items rated 1 to 5 or 1 to 7. It likewise integrates both informants’ responses with direct observation. Raters were trained to rate a symptom as present only if it was attributable to a mood disorder and not to some other psychiatric condition (cf. Yee et al., 2015). The YMRS and CDRS-R total scores provide a widely used criterion for evaluating the convergent validity of the PGBI short forms.

Achenbach Child Behavior Checklist (CBCL)

Caregivers completed the CBCL about their child (Achenbach, 1991; Achenbach & Rescorla, 2001). There
TABLE 1
Demographics and Clinical Characteristics by Clinic Setting

                                          Academic Clinic(a)   Community Clinic(b)   Effect Size(c)
Youth Demographics
  Female                                  39%                  41%                   .02
  Age, M (SD)                             11.5 (3.3)           10.7 (3.4)            .24***
  White                                   79%                  6%                    −.73***
  Family Income(d)                        $36,700              $18,400               .75***
Clinical Characteristics (M & SD, or %)
  YMRS                                    11.61 (12.06)        6.14 (8.35)           .52***
  CDRS-R                                  35.37 (16.35)        29.86 (12.98)         .37***
  CBCL Externalizing T Score              66.70 (11.72)        70.35 (9.60)          −.34***
  CBCL Internalizing T Score              64.93 (11.41)        63.10 (10.48)         .17**
  PGBI – Hypo/Biphasic Raw                25.11 (16.72)        19.94 (14.41)         .33***
  PGBI – Depression Raw                   36.66 (25.51)        25.10 (22.00)         .48***
  10M Raw                                 10.28 (7.82)         7.59 (6.37)           .38***
  10Da Raw                                9.13 (7.03)          5.71 (5.69)           .53***
  10Db Raw                                9.08 (7.19)          5.59 (5.69)           .53***
  PGBI – Hypo/Biphasic POMP               30 (20)              24 (17)               .33***
  10M POMP                                34 (26)              25 (21)               .38***
  PGBI – Depression POMP                  27 (18)              18 (16)               .48***
  10Da POMP                               30 (23)              19 (19)               .53***
  10Db POMP                               30 (24)              19 (19)               .53***
  Number Axis I KSADS Diagnoses           2.2 (1.3)            2.7 (1.4)             −.39***
  Any Mood Disorder Diagnosis             68%                  42%                   −.26***
  Unipolar Depressive Disorder            24%                  29%                   .05
  Bipolar Spectrum Diagnosis              44%                  13%                   −.34***
  Any ADHD                                55%                  66%                   .11***
  Any Oppositional Defiant Disorder       31%                  38%                   .08*
  Any Conduct Disorder                    8%                   13%                   .08**
  Any Anxiety Disorder                    14%                  27%                   .16***
  Any Posttraumatic Stress Disorder       2%                   10%                   .17***

Note: YMRS = Young Mania Rating Scale; CDRS-R = Child Depression Rating Scale–Revised; CBCL = Child Behavior Checklist; PGBI = Parent General Behavior Inventory; 10M = 10-item Mania short form; 10Da = 10-item Depression Short Form A; 10Db = 10-item Depression Short Form B; POMP = Percentage of Maximum Possible; KSADS = Kiddie Schedule for Affective Disorders and Schizophrenia; ADHD = attention deficit/hyperactivity disorder.
(a) N = 617.
(b) N = 530.
(c) A φ for categorical variables (sex, race, diagnostic group), Cohen’s d for continuous variables (age, number of diagnoses, rating scales). A positive coefficient means the effect was larger in the academic sample, and a negative coefficient means that the effect was larger in the community; the academic parameter would underestimate the corresponding value in the community.
(d) Income assessed via ranked bands.
*p < .05, **p < .005, ***p < .0005, two-tailed.

are 118 items rated from 0 (not true [as far as you know]) to 2 (very true or often true). The CBCL produces several scales, including the Externalizing and Internalizing scales, which broadly describe the degree of symptomatology that the youth experiences in each of these domains. The Externalizing and Internalizing scores provided established benchmarks for comparison with the PGBI short forms.

Statistical Analyses

Analyses used the academic sample as the “training” set to develop estimates used to project the performance of the short forms upon replication in the independent community sample. The sequence emulates the typical flow of research from a university setting to application in the community (e.g., Youngstrom et al., 2001; Youngstrom et al., 2005). After checking missing data patterns and setting up propensity scores, we estimated descriptive statistics and effect sizes to quantify differences between samples: Cohen’s d for continuous variables and φ for categorical variables.

We started from the internal consistency estimates of the full-length scales in the academic sample and a target length of 10 items, the most that would still fit on one single-sided page at an 11-point font. With these inputs, we used the formulae from Smith et al. (2000) to project the internal consistency, coverage, criterion attenuation, and time savings entailed. We used both rational content mapping and factor analysis to investigate the content coverage. Item response theory (IRT) analyses used a graded response model to estimate item characteristics and reliability across levels of the mood trait.
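The projected alphas reported in Table 2 can be reproduced with the Spearman-Brown (standardized alpha) formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean interitem correlation. The minimal sketch below assumes that this is the projection behind the Smith et al. (2000) formulae. The function names are illustrative, and the inputs are the full-length mean interitem correlations from Table 2.

```python
def projected_alpha(n_items: int, mean_r: float) -> float:
    """Standardized alpha implied by n_items with mean interitem correlation
    mean_r (the Spearman-Brown form of coefficient alpha)."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def length_savings(short_items: int, full_items: int) -> float:
    """Proportion of items dropped when moving from the full to the short form."""
    return 1 - short_items / full_items


# Full-length scale inputs from the academic sample (Table 2).
SCALES = {
    "Hypomanic/Biphasic": {"full_items": 28, "mean_r": 0.359},
    "Depression": {"full_items": 46, "mean_r": 0.355},
}
TARGET_ITEMS = 10  # target short-form length

for name, s in SCALES.items():
    full_alpha = projected_alpha(s["full_items"], s["mean_r"])
    short_alpha = projected_alpha(TARGET_ITEMS, s["mean_r"])
    savings = length_savings(TARGET_ITEMS, s["full_items"])
    print(f"{name}: full-length alpha {full_alpha:.3f}, "
          f"{TARGET_ITEMS}-item alpha {short_alpha:.3f}, savings {savings:.0%}")
```

Under this assumption, the sketch gives full-length alphas of .940 and .962, projected 10-item alphas of about .848 and .846, and length savings of 64% and 78%, in line with Table 2. The observed short-form alphas exceeded these projections. One plausible reason is that the retained items intercorrelate more strongly than the average item in the full scale.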

Criterion validity looked at correlations between the short forms and the various established severity ratings, diagnoses, and demographic variables. We included an array of variables expected to show convergent or discriminant validity. We used Steiger’s test of dependent correlations to see whether criterion validity changed significantly between the short and long forms.

For the mania scale, the interview ratings of manic severity (YMRS score) provide a convergent validity criterion, as would the point-biserial correlation with bipolar diagnoses. The correlations with interview ratings of depression (CDRS-R) and with diagnoses of any mood disorder would also be expected to be at least medium-sized, because depressive and mixed-mood episodes are core aspects of bipolar disorder (although not all youth who experience mania will have had depressed or mixed states). The correlation with unipolar mood diagnoses would include an artifact: placing the bipolar cases (with high scores on the PGBI) in the “not unipolar depression” group mixes the subset with the highest scores into the comparison group. The mania score should also show a high correlation with Externalizing, for two reasons. First, a robust literature associates bipolar disorder with elevated Externalizing scores (Youngstrom et al., 2015). Second, the same informant completes the CBCL and the PGBI, producing shared method variance (Podsakoff, MacKenzie, & Podsakoff, 2012). The mania scores would be expected to show smaller but significant elevations in the presence of ADHD or disruptive behavior disorders. This pattern would be consistent with these conditions being more challenging to distinguish clinically (Kim & Miklowitz, 2002) and contributing to false positive results in screening. Manic symptom levels also tend to decrease somewhat with age, possibly due to decreases in comorbid ADHD or disruptive behavior (Demeter et al., 2013).

For the depression scales, convergent correlations were expected to be highest with three criteria. The first is the CDRS-R Total. The second is the CBCL Internalizing Score, which also shares source/method variance with the PGBI because the same caregiver completed both questionnaires. The third is diagnoses of any mood disorder, because depressive symptoms are often elevated in bipolar spectrum disorders as well as in unipolar depression. We expected moderate correlations with Externalizing and YMRS scores due to comorbidity as well as mixed-mood presentations. We anticipated smaller positive correlations with age and female status, as depressive symptoms often increase around adolescence (Cyranowski, Frank, Young, & Shear, 2000).

Finally, we used Receiver Operating Characteristic (ROC) analyses to quantify discriminative validity. For the mania scale, the task was identifying bipolar spectrum disorder versus all other diagnoses; for the depression scales, it was separating cases with any mood disorder from all others presenting to the clinic. DeLong’s tests compared whether there was significant shrinkage in the areas under the curve (AUCs) when the scales were used in the community sample. DeLong’s test for dependent AUCs benchmarked the short forms against the corresponding full-length and CBCL scales (DeLong, DeLong, & Clarke-Pearson, 1988). Multilevel DiLRs were calculated for score ranges defined by optimal cutpoints. This method provides clinically useful information regarding the likelihood of a diagnosis across a range of possible scores. DiLRs of less than 1.0 are associated with lower risk, and DiLRs of 1.0 are associated with average risk. DiLRs between 2 and 5 represent a small increase in risk, DiLRs between 5.0 and 10.0 are a moderate increase, and DiLRs larger than 10 can be diagnostic (Straus et al., 2011).

Procedure

All procedures were reviewed and approved by the university, hospital, and community mental health center Institutional Review Boards. After caregiver consent and youth assent, the same interviewer met with the caregiver and youth sequentially, using clinical judgment and reinterviewing to resolve discrepancies. The primary caregiver completed the full-length 73-item PGBI while the youth completed the KSADS interview. PGBI and CBCL scores were masked during the diagnostic consensus process, and we followed the 25-item STARD design and reporting guidelines (Bossuyt et al., 2003).

RESULTS

Missing Data Analyses

The largest source of missing data was the inclusion of the CBCL scales as a criterion measure. These scales were not initially gathered at the academic center and had to be copied from the intake packet at the community mental health center. Even listwise deletion across all variables in the study retained 80% of participants, and the variable-level completion rate exceeded 92%. Using clinical and demographic variables, we constructed propensity scores (Guo & Fraser, 2010), that is, probabilities summarizing how likely each person was to have incomplete data. We then used the propensity score in sensitivity analyses for criterion and discriminative validity. The only variable that made a significant incremental contribution to the propensity model was race: non-White participants were slightly more likely to have missing data in the academic sample (p = .038).

Demographics and Descriptives

Table 1 presents descriptive statistics for demographic and clinical variables, including effect sizes for comparisons between the academic (training) and community (external validation) samples. The academic sample was much more White and affluent, with youth averaging nearly a year older. The academic sample also had much higher rates of bipolar disorder (reflecting study recruitment priorities) and correspondingly higher scores on the mania rating scales.
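The DiLR logic from the introduction and the ROC and multilevel DiLR analyses described above fit in a few lines. This is an illustrative sketch, not the study’s analysis code, and it rests on the following assumptions:

- The scores and the score band are hypothetical.
- AUC uses the pairwise (Mann-Whitney) definition, and DeLong’s significance tests are not implemented.
- The posterior update applies Bayes’ theorem in odds form.

```python
import math
from typing import Sequence


def auc(cases: Sequence[float], noncases: Sequence[float]) -> float:
    """Area under the ROC curve: probability that a randomly chosen case scores
    higher than a randomly chosen noncase (ties count as one half)."""
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


def band_dilr(cases: Sequence[float], noncases: Sequence[float],
              low: float, high: float) -> float:
    """Multilevel DiLR for scores in [low, high]: share of cases in the band
    divided by share of noncases in the same band."""
    p_case = sum(low <= s <= high for s in cases) / len(cases)
    p_non = sum(low <= s <= high for s in noncases) / len(noncases)
    if p_non == 0:
        return math.inf if p_case > 0 else math.nan
    return p_case / p_non


def positive_dilr(sensitivity: float, specificity: float) -> float:
    """DiLR for a positive result on a binary test: sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_dilr(sensitivity: float, specificity: float) -> float:
    """DiLR for a negative result on a binary test: (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


def update_probability(prior: float, dilr: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds * DiLR."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * dilr
    return posterior_odds / (1 + posterior_odds)


# Hypothetical short-form scores, for illustration only.
cases = [12, 15, 18, 22, 9]    # youths with the target diagnosis
noncases = [3, 5, 9, 11, 6]    # youths with other diagnoses

print(f"AUC = {auc(cases, noncases):.2f}")
print(f"DiLR for scores 10-30 = {band_dilr(cases, noncases, 10, 30):.1f}")
print(update_probability(0.50, 7))  # 0.875
print(update_probability(0.05, 7))  # about 0.269
```

The last two calls reproduce the worked example from the introduction: a sevenfold DiLR moves a 50% prior to 87.5% and a 5% prior to about 26.9%.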

TABLE 2
Projected and Empirical Estimates of Internal Consistency Reliability, Correlation with Full-Length Scale, and Length Reduction for Short Forms

                                    Hypomanic/Biphasic   Depression
Full Length
  Items                             28                   46
  Alpha (Academic)                  .940                 .962
  M Interitem Correlation           .359                 .355

Short Form                          Mania (10M)          Depression-A (10Da)   Depression-B (10Db)
  Items                             10                   10                    10
  Projected Alpha                   .848                 .846                  .846
  Observed Alpha, Academic          .914                 .901                  .912
  Observed Alpha, Community         .881                 .875                  .879
  Projected Correlation With Full   .797                 .814                  .814
  Observed Correlation, Academic*   .949                 .954                  .941
  Observed Correlation, Community*  .944                 .948                  .932
  Savings in Length                 64%                  78%                   78%
  Projected Validity Reduction      20%                  18%                   18%
  Standard Error of the Measure     2.24                 2.11                  2.06
  Standard Error of Difference      3.17                 2.98                  2.91
  90% Critical Change               5.24                 4.93                  4.80
  95% Critical Change               6.22                 5.85                  5.70

Note: 10M = 10-item Mania short form; 10Da = 10-item Depression Short Form A; 10Db = 10-item Depression Short Form B.
*Observed correlations are based on embedded item administration.
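The critical-change rows in Table 2 follow from the standard error of the measure (SEM) under the Jacobson and Truax (1991) logic. The standard error of the difference between two administrations is √2 × SEM. A change is unlikely to reflect measurement error alone when it exceeds z times that value. The minimal sketch below takes the tabled SEMs as given rather than recomputing them. The `is_reliable_change` helper and the example scores are hypothetical.

```python
import math

# Standard errors of the measure (SEM) for each short form, from Table 2.
SEM = {"10M": 2.24, "10Da": 2.11, "10Db": 2.06}

# Two-tailed z values for the 90% and 95% confidence levels.
Z = {"90%": 1.645, "95%": 1.96}


def se_difference(sem: float) -> float:
    """Standard error of the difference between two scores: sqrt(2) * SEM."""
    return math.sqrt(2) * sem


def critical_change(sem: float, z: float) -> float:
    """Smallest raw-score change that exceeds measurement error at level z."""
    return z * se_difference(sem)


def is_reliable_change(pre: float, post: float, sem: float, z: float = 1.96) -> bool:
    """Hypothetical helper: True when |post - pre| exceeds the critical change."""
    return abs(post - pre) > critical_change(sem, z)


for form, sem in SEM.items():
    cuts = ", ".join(f"{level} {critical_change(sem, z):.2f}" for level, z in Z.items())
    print(f"{form}: SEdiff {se_difference(sem):.2f}; critical change {cuts}")

# Hypothetical example: a 10M raw score dropping from 14 to 6.
print(is_reliable_change(14, 6, SEM["10M"]))
```

The SEdiff values match Table 2 to two decimals. The critical-change values can differ in the second decimal; for example, the sketch gives about 6.21 rather than 6.22 for the 10M at 95%. This is consistent with the table having been computed from unrounded SEMs, and possibly with a z of 1.65 at the 90% level. That explanation is our inference; the article does not state it.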

Although rates of unipolar depressive disorders were similar, the percentage with any mood disorder was higher, and thus the average depression scale scores also were higher in the academic sample. Conversely, rates of ADHD, oppositional defiant disorder, conduct disorder, and posttraumatic stress disorder all were higher in the community sample, with the average CBCL Externalizing score also being higher. The effect sizes ran the gamut from small (differences in rates of oppositional defiant disorder) to large (socioeconomic status and race). Based on all these demographic and clinical differences, the community sample provides a rigorous test of external validity.

Factor Structure and Content Coverage of Short Forms

The 73 items of the full-length GBI and PGBI are usually scored as two scales, Depression and Hypomanic/Biphasic (Depue et al., 1981). Factor analyses of the GBI find two large factors corresponding to these scales, although there are other, smaller factors, the most common being a “mixed” factor (which Depue called “biphasic”).
Prior work (Danielson, Youngstrom, Findling, & Calabrese, 2003; Youngstrom et al., 2001) grouped the 73 items into 20 parcels of three or four items each based on content coverage, and our prior investigations of factor structure used the parcels instead of directly analyzing items. Eight of the parcels contain items from the Hypomanic/Biphasic scale on the original PGBI, and the other 12 consist of items from the Depression scale. The parcels are all statistically homogeneous. To prioritize content coverage, we considered the parcels when selecting the more items. The 10 items evaluated for the mania short form drew from six of the eight parcels with hypomanic/biphasic content. The content from “extreme mood and energy” (Parcel 1) was adequately represented by “mood never in the middle” items (Parcel 2), and the selected items included both “pure” hypomanic and mixed items. The 10Dep A form includes items from nine of the 12 potential parcels (obviously it is not possible to cover 12 parcels with 10 items), and 10Dep B represents seven parcels, including two of those not directly represented on 10Dep A. Because the GBI contains only one item asking about suicidal ideation (#73), it was omitted from both short forms, as there was no way to maintain parallel content; asking about suicide also raises liability issues that could create barriers to use in some screening contexts.
We ran exploratory factor analyses on the mania and depression scales using the three most accurate decision rules to determine the number of factors: the scree plot, Velicer’s minimum average partials, and Glorfeld’s extension of parallel analysis (with 1,000 simulations, using the 95th percentile as the threshold). Analyses examined the 10 mania items and the 10 items of each depression form both jointly (30 items) and again with each subset of depression items separately (to see if the factor structure of the full PGBI was preserved with the reduced item set, recognizing that users would typically administer only one of the two depression forms). This resulted in six analyses (three item sets × two samples) and 18 decision rules; 15 indicated that the hypothesized two-factor solution fit best. Including 30 items (both depression sets, as well as the 10 mania items) resulted in Glorfeld’s extension of parallel analysis and

minimum average partials preferring solutions with more than two factors in the academic (but not the community) sample. When it emerged, the third factor contained “mixed” items, as anticipated. We chose to retain the mixed items on conceptual grounds (mood and energy shifting from high to low being a hallmark of bipolar disorders) and statistical grounds (these items were among the most discriminating between cases with and without mood disorder).
We evaluated the interpretability of the two-factor solution, allowing the factors to correlate. All of the solutions produced excellent simple structure, with all items showing adequate to strong loadings on the hypothesized factor and modest cross-loadings (the largest cross-loading was .319, with a primary loading of .464; only four of 140 cross-loadings were above .3). The median correlation between factors was .55. Full details, including syntax and data to rerun the analyses, are available upon request.

Reliability and Precision of the Short Forms

Based on the average interitem correlation in the academic sample for the full-length forms, the projected reliability (using Formula 1 in Smith et al., 2000) of a 10-item short form approached an alpha of .85 for both the hypomanic and depressive item sets (see Table 2). Using the factor loadings and content coverage from the parcels, we constructed 10-item forms with alphas exceeding the projected values in the academic sample. Impressively, the alphas remained higher than projected in the community sample as well, with α = .88 for the 10M and both 10D forms (Table 2). Corrected item-total correlations for all items were .47 or higher in the community sample, also demonstrating good internal consistency. IRT analysis showed that all of the short forms maintained good conditional reliability through the bulk of the latent trait (see Figure 2), with all three showing reliability above .80 from 1 SD below to 3 SDs above the average trait level for each mood score. Table 2 also reports the standard error of the measure and an estimate of the standard error of the difference score for two administrations of the form. The standard error of the measure values ranged from 2.06 to 2.24, and the standard error of the difference score values ranged from 2.91 to 3.17, so changes of roughly 6 points (5.70 to 6.22, depending on the form; Table 2) are large enough to be 95% confident that the patient is showing reliable change (Jacobson & Truax, 1991).

Content Coverage of the Short Forms

There is a trade-off between high internal consistency and breadth of coverage, and it is particularly challenging to balance these with a small item set. The projected correlation between the short and full-length scores (Formula 2 from Smith et al., 2000) was r = .80 for mania and .81 for the depression scales. The observed correlations were all r > .93 in the community sample. These are inflated by being based on an embedded administration format. The 10M was administered in an extracted format to a subset of cases, but in a retest format with 1–2 weeks between administrations, which confounds stability with coverage (Freeman et al., 2012). Only one of the 10 items showed statistically significant differences in thresholds between the embedded and extracted administration formats, and the effect on total scores was negligible (Freeman et al., 2012). The eight-day retest stability for the 10M was r = .64. As just described in the item selection, all short forms selected items to provide good content coverage, pulling from different item parcels. The one possible conceptual omission from content for the 10M was grandiosity or inflated self-esteem (cf. Van Meter, Burke, Kowatch, Findling, & Youngstrom, 2016).

Criterion Validity of the Short Forms: Convergent and Discriminant Correlations

Validity of the 10M

Table 3 presents the criterion correlations for both the 28-item and the 10-item mania scales, sorted in descending order of expected criterion correlation. The pattern of findings is consistent with a priori expectations for the full-length scale, and even more so for the 10M. The a priori ranking of expected correlation magnitude correlated with the observed coefficients at r = .96 in the academic and r = .93 in the community sample. Using Steiger’s test of dependent correlations, eight of 16 correlations showed small but statistically significant differences; strikingly, the difference always favored the criterion validity of the short form. The Steiger test had exceptional statistical power because of the combination of the sample size and the r = .94 correlation between the 28-item and 10-item forms, so differences in criterion correlations as small as .03 achieved p < .01 significance. As hypothesized, the 10M showed significantly higher correlations with the YMRS than with the CDRS-R (r = .60 vs. r = .11), t(614) = 10.60, p < .000005. Similarly, 10M scores correlated more strongly with presence of a bipolar diagnosis than with a unipolar depressive disorder (r = .59 vs. −.18), t = 13.65, p < .000005. The 10M scores correlated more highly with the Externalizing than with the Internalizing scores (r = .62 vs. .42), t = 6.19, p < .000005.
The more rigorous test is the external replication in the community sample. As expected, the convergent criterion correlations tended to be smaller in the community sample, reflecting the milder presentation of bipolar cases and the high rate of other disorders that could elevate scores on a manic symptom scale. Even so, the community data strongly replicated the patterns seen in the academic sample. The criterion correlations were

significantly different in four instances, three of them favoring the validity of the short form; the exception was that the 10M score showed a higher correlation with the CDRS-R than did the 28-item scale (r = .30 vs. .27), t(522) = 2.03, p < .05, indicating modestly lower discriminant validity. The validity coefficients strongly aligned with the a priori ranking (r = .93), and the 10M scores correlated significantly more with the YMRS than with the CDRS-R (r = .41 vs. .30), t = 2.48, p = .013, and with bipolar diagnoses than with unipolar depressive diagnoses (r = .32 vs. .08), t = 3.64, p < .0005. The criterion correlations with Externalizing and Internalizing CBCL scores were both r = .39, not significantly different.

FIGURE 2 Information and reliability estimates from IRT analysis of short forms. Panels: PGBI - Mania Short Form; PGBI - Depression Short Form A; PGBI - Depression Form B. Each panel plots test information and reliability against theta (θ) from −3 to 3. Note: The arrow indicates the theta level where marginal reliability is .80.

Validity of the Depression Forms

Table 4 presents the criterion correlations for the full-length PGBI Depression scale and the two 10-item forms, sorted in descending order of expected magnitude of correlation. The expected pattern held across all three scales and both samples (r = .88–.91 between expected rank and observed correlations). Again, the test of differences between the criterion correlations had very high statistical power, and differences as small as .03 were statistically significant. Only one correlation behaved significantly differently than expected: The short forms correlated less with Internalizing than did the full-length scores, largest difference = .05 (r = .69 for full length vs. r = .64 for 10Dep form B in the academic sample). However, these decrements are not substantively meaningful, as the correlations between the Internalizing scale and both the long and short forms would be considered “large” effect sizes.
The convergent correlations with CDRS-R, Internalizing, and presence of any mood diagnosis all were significantly larger than the corresponding discriminant validity coefficients for YMRS, Externalizing, or bipolar spectrum diagnoses, smallest t(527) = 6.20, p < .000000005.

Discriminative Validity of the Short Forms

ROC analyses quantified the discriminative accuracy of the full-length and short forms. We compared the 10M against the full-length Hypomanic/Biphasic scale and the Externalizing scale for discriminating cases with bipolar disorder from all other cases, using DeLong’s tests to see if there was significant change in performance between the academic and community samples (independent AUCs), and DeLong’s test for paired AUCs to compare the performance of the short forms versus the full-length forms or the CBCL scales. Based on the accuracy of the criterion diagnoses, the upper bound of the observed AUC performance would be about .925, not the theoretical limit of 1.00 (Kraemer, 1992). The 10M delivered an AUC of .85 in the academic and .78 in the community sample (see Table 5). The shrinkage in moving to the community sample was significant (p < .05) but still left the AUC in the upper half of the performance range for caregiver report on a mania scale in a clinically generalizable sample based on the results of a meta-analysis, 95% confidence interval [.70, .81] (Youngstrom et al., 2015). The 10M performed better than the full-length form in the academic sample (p < .005) and slightly, but not significantly, better in the community sample; it far exceeded the performance of the Externalizing score at identifying cases with bipolar disorder in both samples (ps < .005).
Both forms of the 10D performed quite similarly in both samples, earning AUCs of .84 for identifying cases with any mood disorder in the academic sample, shrinking slightly to .79 for Form A (DeLong p = .046) and .80 for Form B

TABLE 3
Criterion Correlations for 10-Item Mania Scale and Full-Length Hypomanic/Biphasic Scale

Expected Rank | Criterion Variable | Academic(a): Full Length, 10M, t | Community(b): Full Length, 10M, t
1 | YMRS total (interview) | .57 .60 −3.40** | .40 .41 −0.96
2 | Bipolar spectrum diagnosis | .55 .59 −3.40** | .31 .32 −1.53
3 | CBCL Externalizing T score | .65 .62 3.58*** | .43 .39 3.41**
4 | CBCL Internalizing T score | .47 .42 4.71**** | .41 .39 1.06
5 | Any mood disorder diagnosis | .42 .42 −0.23 | .28 .30 −1.50
6 | CDRS-R total (interview) | .14 .11 2.14* | .27 .30 −2.03(c)*
7 | Count of comorbid diagnoses | .36 .35 1.05 | .33 .33 −0.14
9 | ADHD diagnosis | .21 .20 1.19 | .25 .22 2.33*
9 | ODD diagnosis | .18 .19 −0.22 | .14 .14 −0.10
9 | Conduct disorder diagnosis | .15 .11 2.64* | .18 .15 2.19*
12 | PTSD diagnosis | .05 .05 0.11 | .06 .07 −1.13
12 | Any anxiety diagnosis | .00 .00 −0.15 | .08 .09 −0.26
12 | Female youth | .00 .02 −1.07 | −.02 −.01 −0.78
15 | Youth age (years) | −.09 −.11 1.19 | −.08 −.09 0.96
15 | White youth | −.10 −.07 −2.22* | .02 .01 0.46
15 | Any unipolar depression | −.18 −.22 3.05** | .08 .08 −0.49

Note: Coefficients are point-biserial correlations for dummy-coded categorical variables, and Pearson correlations for continuous variables. Steiger’s test of dependent correlations tested differences between full-length and short-form criterion correlations. Differences, where significant, favored short-form validity unless noted otherwise. YMRS = Young Mania Rating Scale; CDRS-R = Child Depression Rating Scale–Revised; CBCL = Child Behavior Checklist; 10M = 10-item Mania short form; ADHD = attention deficit/hyperactivity disorder; ODD = oppositional defiant disorder; PTSD = posttraumatic stress disorder.
*p < .05, **p < .005, ***p < .0005, ****p < .00005, two-tailed.
(a) N = 617.
(b) N = 530.
(c) Discriminant validity of 10M was slightly worse than full-length scale in community sample.

(p = .114) in the community sample. The short forms’ performance was quite similar to that of the full-length 46-item Depression scale, which had AUC = .86 in the academic and .79 in the community sample; the differences were not significant in three of four tests before any post hoc correction. The short forms substantially exceeded the discriminative validity of the Internalizing score in both samples, least significant p = .0013. Overall, the short forms showed negligible change in accuracy compared to the full-length versions, marked superiority to the corresponding CBCL scale, and robust performance in a new, independent sample that compares favorably with meta-analytic benchmarks.

Diagnostic Likelihood Ratios

Often it is possible to split assessment scores into multiple segments, preserving more information. We estimated DiLRs by first segmenting scores into quintiles based on the academic sample and then adjusting boundaries to avoid degenerate score distributions in either sample (where the DiLRs would not progress monotonically due to sampling error within a particular segment). Table 6 gives the DiLRs for using the 10M to predict the probability of a bipolar spectrum disorder (bipolar I, bipolar II, cyclothymic disorder, or other specified bipolar and related disorder), as well as the DiLRs for both depression scales predicting any mood disorder (including depression, dysthymic/persistent depressive disorder, and other specified depressive disorder, as well as bipolar disorders).

Sensitivity Analyses to Effects of Propensity Scores and Outliers

Although the percentage of missing data was small and the differences between missing and complete cases tended to be small, we ran sensitivity analyses controlling for the propensity score in logistic regression models. In all six analyses, the short form continued to show highly significant prediction of the target diagnosis, with changes in the regression weight ranging from −.009 to .097 (Mdn = −.0035), indicating negligible change in performance after adjusting for potential dropout bias.
In addition, regression analyses checked for multivariate outliers, using the short-form score as the dependent variable and the diagnoses (including comorbidities), age, and sex as predictors. We used Mahalanobis distance to identify outliers with unusual constellations of scores on the predictors, studentized deleted residuals to flag cases with short-form scores far from predicted values, and Cook’s distance to identify cases with discrepant combinations of

TABLE 4
Criterion Correlations for 10-Item Depression Forms A and B and Full-Length Depression Scale

Expected Rank | Criterion Variable | Academic(a): Full Length, 10Da, t | Academic(a): Full Length, 10Db, t | Community(b): Full Length, 10Da, t | Community(b): Full Length, 10Db, t
1 | CDRS-R total (interview) | .58 .62 −3.83*** | .58 .63 −4.30**** | .54 .57 −2.79* | .54 .56 −2.00*
2 | CBCL Internalizing T(c) | .69 .65 4.50**** | .69 .64 5.23**** | .55 .51 4.23**** | .55 .52 2.80*
3 | Any unipolar depression | .25 .32 −5.43**** | .25 .31 −3.92*** | .32 .35 −2.19* | .32 .40 −5.94****
4 | Any mood disorder | .53 .52 0.58 | .53 .51 1.72 | .46 .48 −1.07 | .46 .50 −2.74*
5 | Bipolar spectrum | .27 .22 5.12**** | .27 .21 4.82**** | .25 .23 1.45 | .25 .19 3.94****
6 | CBCL Externalizing T | .43 .36 6.65**** | .43 .33 8.04**** | .27 .22 3.76*** | .27 .19 5.60****
7 | YMRS total (interview) | .27 .20 5.94**** | .27 .19 5.59**** | .28 .27 0.85 | .28 .21 5.92****
8 | Youth age (years) | .23 .30 −6.40**** | .23 .30 −5.58**** | .16 .19 −2.62* | .16 .25 −6.71****
9 | Comorbid | .22 .17 4.60**** | .22 .15 5.46**** | .33 .30 3.23** | .33 .30 2.55*
10 | Female youth | .16 .19 −2.46* | .16 .19 −2.09* | .12 .13 −0.89 | .12 .16 −2.72*
11 | Any anxiety disorder | .09 .07 1.93 | .09 .09 −0.19 | .24 .22 1.61 | .24 .23 0.24
14 | White youth | −.05 −.03 −1.84 | −.05 −.02 −1.92 | −.02 −.03 0.98 | −.02 −.02 0.27
14 | Any ADHD diagnosis | −.09 −.15 5.01**** | −.09 −.17 6.31**** | .00 −.05 4.18**** | .00 −.07 5.37****
14 | Any ODD diagnosis | −.01 −.05 3.36** | −.01 −.04 2.09* | .05 .04 0.89 | .05 .01 2.45*
14 | Any conduct disorder | .04 .03 0.92 | .04 −.01 3.59*** | .09 .07 1.17 | .09 .10 −0.55
14 | Any PTSD | .12 .09 2.21* | .12 .11 0.73 | .16 .18 −1.45 | .16 .16 0.56

Note: Coefficients are point-biserial correlations for dummy-coded categorical variables, and Pearson correlations for continuous variables. Steiger’s test of dependent correlations tested differences between full-length and short-form criterion correlations. Differences, where significant, favored short-form validity unless noted otherwise. YMRS = Young Mania Rating Scale; CDRS-R = Child Depression Rating Scale–Revised; CBCL = Child Behavior Checklist; 10Da and 10Db = 10-item Depression Short Forms A and B; ADHD = attention deficit/hyperactivity disorder; ODD = oppositional defiant disorder; PTSD = posttraumatic stress disorder. Italicized values were expected to be near-zero coefficients, not significant ones.
*p < .05, **p < .005, ***p < .0005, ****p < .00005, two-tailed.
(a) N = 617.
(b) N = 530.
(c) Convergent validity of both short forms with Internalizing was significantly lower than for full-length depression in both academic and community samples, largest difference r = .05.

short-form score versus predictors. Outlier diagnostics identified a few marginal cases; dropping them and rerunning analyses did not change the associations between the target diagnosis and the respective short form.

DISCUSSION

The goal of this article was to develop and rigorously evaluate 10-item short forms assessing mood symptoms, pulling from the 73 items on the parent-rated GBI. The GBI has an extensive program of research supporting it, with a wealth of data about multiple facets of validity, and the parent-rated version has shown top-tier discriminative validity in a recent meta-analysis (Youngstrom et al., 2015), as well as excellent sensitivity to treatment effects. However, the GBI’s length makes it cumbersome to use in many clinical settings. A 10-item form was previously carved from the 28-item Hypomanic/Biphasic scale (Youngstrom et al., 2008), and it showed excellent psychometric properties, including high internal consistency, good content coverage, and excellent discriminative validity. These features led to its widespread adoption, and it has continued to show strong reliability and validity across a range of samples and

TABLE 5
ROC Analysis of the Discriminative Validity of the PGBI-10M, Hypomanic/Biphasic Scale (Full Length), and the CBCL Externalizing Score for Discriminating Cases with Bipolar Spectrum Disorder from All Other Cases at Clinic

Predictor | Academic: AUC [95% CI] | Community: AUC [95% CI] | DeLong Test
Diagnosis ROC | .925 | .925 | —
Hypo/Biphasic Full Length | .83 [.80, .86] | .76 [.71, .82] | 1.94
10-Item Mania | .84 [.81, .88] | .78 [.72, .83] | 2.29*
Externalizing | .78 [.75, .82] | .67 [.61, .74] | 2.91**
Diagnosis ROC | .925 | .925 | —
Depression Full Length | .85 [.82, .89] | .79 [.75, .82] | 2.65*
10-Item Depression - Form A | .84 [.81, .87] | .79 [.75, .83] | 2.04*
10-Item Depression - Form B | .84 [.81, .88] | .80 [.77, .84] | 1.58
Internalizing | .76 [.71, .80] | .71 [.67, .76] | 1.31

Note: The diagnosis receiver operating characteristic is dictated by the interrater reliability of the Longitudinal Expert evaluating All Data diagnoses (κ = .85; Kraemer, 1992) and sets an upper bound for the area under the curve (AUC) that could be empirically observed. The 10M mania short form performed significantly better than both the full-length Hypomanic/Biphasic scale and the CBCL Externalizing score in both the academic and the community samples (largest p < .05). The 10Da and 10Db depression short forms performed better than the CBCL Internalizing score in both samples (largest, most conservative p = .00005). The 10Da and 10Db performed no differently than the full-length PGBI Depression scale (smallest p > .10). ROC = receiver operating characteristic; PGBI = Parent General Behavior Inventory; 10M = 10-item Mania short form; CBCL = Child Behavior Checklist; CI = confidence interval.
*p < .05, **p < .005, two-tailed.

TABLE 6
Multilevel DiLR for Short Forms, Using 10M to Predict Bipolar Spectrum Disorders, and 10Da and 10Db to Predict Any Mood Disorder

Risk Change Label | 10M for Bipolar: Score Range, DiLR | 10Da for Any Mood: Score Range, DiLR | 10Db for Any Mood: Score Range, DiLR
Very Low | 0–2.59, .07 | 0–1.99, .25 | 0–1.99, .22
Low | 2.6–6.99, .41 | 2–5.99, .71 | 2–5.99, .71
Neutral | 7–10.99, 1.44 | 6–9.99, 1.85 | 6–10.99, 2.69
High | 11–17.99, 2.39 | 10–14.99, 4.52 | 11–14.99, 5.64
Very High | 18+, 5.38 | 15+, 8.80 | 15+, 8.09

Note. Segments defined by quintiles in the academic sample and then adjusted to avoid degenerate distributions in either sample (Pepe, 2003). DiLR = diagnostic likelihood ratio; 10M = 10-item Mania short form; 10Da and 10Db = 10-item Depression Forms A and B.

applications (e.g., Bebko et al., 2014; Findling et al., 2013; Findling, Youngstrom, McNamara, et al., 2012; Findling, Youngstrom, Zhao, et al., 2012; Jo et al., 2016; Portugal et al., 2016). The major contributions of this article are to (a) develop depression short forms, taking advantage of the large item pool to explore the possibility of parallel forms; (b) examine the combination of depression and mania short forms to confirm that they preserve the content coverage and structure of the full-length GBI; (c) compare the observed psychometrics with projections based on recommended formulae (Smith et al., 2000) and use IRT to examine reliability across a range of trait mood levels; (d) extend the criterion validity nomothetic network for all the short forms, examining convergent validity with interview ratings and CBCL scores, discriminant validity with contrasting constructs, and criterion validity with demographic and clinical characteristics; and (e) test the diagnostic discriminative validity of the scales, again comparing to the performance of the full-length version, the CBCL scales, and meta-analytic benchmarks. External cross-validation in a sample that has quite different demographics, referral patterns, and clinical issues is a major strength (Konig et al., 2007).
Starting with the 73-item full-length version, we constructed a 10-item Mania scale and two parallel 10-item Depression scales, Forms A and B. Combining either of the depression forms with the 10M reproduces the two-factor structure usually used to score the full-length GBI. The short forms include “mixed-mood” items, and factor analyses of the 30 items produce a “mixed” or biphasic factor, should that be desired. The conceptual content coverage of the short forms is good, representing the bulk of the item parcels that make up the full item set.
We followed recommendations to project the reliability and the coverage correlation between the short and full-length forms (Smith et al., 2000). Ten-item formats offer a

64% savings in terms of scale length for the mania and 78% for the depression scales. The correlation attenuation formula projects that a well-designed short form should show roughly a 20% decrement in validity coefficients (Smith et al., 2000). The observed internal consistency estimates in the community sample exceeded projections based on the psychometrics in the academic sample for all short forms. So did the coverage estimates. The coverage correlations are inflated by the fact that only an embedded item administration format was available for primary analyses, so the results are somewhat optimistic. However, the criterion correlations are impressive, with almost none showing shrinkage compared to the full-length form (even in the independent community sample) and with many demonstrating stronger criterion validity estimates than the full-length form. This unusual combination of strengths is likely due to the construction of the items: Rather than focusing on single symptoms, the PGBI items routinely juxtapose and combine symptoms (irritability when sleeping less; mood shifting from happy to sad rapidly) and emphasize the aspects of episodicity and change that are hallmarks distinguishing mood disorder from other, more chronic conditions (Goldstein et al., 2017; Youngstrom, 2009). These features create some technical challenges in evaluating factor structure: For example, is “mood shifts rapidly from happy to sad” a hypomanic item or a depressed item? It will show local dependence with other items asking about happiness or about sadness. However, it is precisely the emphasis on change and constellations of symptoms that enables the GBI to discriminate better between mood disorders and ADHD, anxiety, and other common disorders with similar presentations (Youngstrom et al., 2015). IRT analyses showed that the scales measure well (i.e., high information and reliability values) across a wide swath of the depression and mania traits. Their precision is high in score ranges likely to be useful for screening and diagnostic purposes as well as for grading symptom level in community or population samples; they also would likely be useful for quantitative trait studies and for correlating with quantitative biological measurements.
The comparisons between the PGBI scales and the CBCL Externalizing and Internalizing scales are revealing in this regard. The 10M correlated r ~ .6 with both the YMRS and the Externalizing score in the academic sample, and r ~ .4 with both in the community sample. The YMRS correlations are mono-trait but hetero-method correlations, albeit with some overlap (the interviewer made YMRS ratings based on interview and observation of the youth as well as interviewing the caregiver). The correlations with Externalizing are mono-method and hetero-trait, although, again, mania and externalizing would be expected to be correlated constructs. The ROC analyses reveal the greater specificity of the PGBI item content, with the AUC substantially outperforming the Externalizing score, and the gap widening in the community sample, where the bipolar presentations are milder and the cognate conditions that are likely to lead to elevated Externalizing behaviors for other reasons are more common. The Externalizing scale combines the items in the Aggressive Behavior and the Rule Breaking Behavior scales; the CBCL does not have a mania scale per se and omits many symptoms that are more specific to mania. The pattern of findings in the present samples aligns with the pattern shown in the meta-analysis of all available rating scales for identifying bipolar disorder in youths: Instruments that contain more diagnostically specific items outperform other scales in both youths (Youngstrom et al., 2015) and adults (cf. performance of Altman or Behavioral Activation Scale vs. GBI or Hypomania Checklist; Youngstrom et al., 2018).
The performance of the Depression short forms followed a similar pattern, with one added nuance. In both samples, the short forms showed significantly higher correlations than the full length did with the CDRS-R and with diagnoses of unipolar depressive disorders. The nuance is that the short forms showed lower correlations with Internalizing than did the full-length scale. The difference was small but significant, reflecting the greater focus on depression built into the short-form item selection process. The Internalizing scale was not built to focus narrowly on depression. It combines three subscales (Anxious/Depressed, Withdrawn, and Somatic Complaints), and Internalizing scores are highly elevated in anxiety disorders, not just depressive disorders (Ferdinand, 2008; Van Meter et al., 2014). The short forms’ greater focus is shown by their better discriminative validity, relative to the Internalizing score, for identifying cases with mood disorder. The short forms also include items related to low positive affect, low energy, and anhedonia, the “low positive affect” dimension that is specific to depression in the tripartite model of depression and anxiety (Clark & Watson, 1991; Gaylord-Harden, Elmore, Campbell, & Wethington, 2011). This contributes to the excellent discrimination of depressed versus nondepressed cases, even when a large number of cases with bipolar disorder were included in the analysis. It is likely that the short forms would be useful for detecting and measuring depression in bipolar cases as well.
We calculated diagnostic likelihood ratios to help interpret short-form scores in terms of predicting the probability of bipolar or any mood disorder (Straus et al., 2011). Low scores on the 10M are decisive in ruling bipolar disorder out, and very high scores are clinically helpful. The corresponding DiLRs could move a 10% base rate down to less than 1% with a very low score and up to 37% with a very high score. The two depression forms are useful with low-risk scores, moving a 30% prior probability down to less than 10%, and very high-risk scores would move 30% up to about 78%. These results can be combined with other information about the case, such as family history of mood disorder, further refining the probability (Algorta et al., 2013;

Youngstrom, Van Meter, et al., 2017). Combining diagnostic likelihood ratios makes a simplifying assumption that the predictors have negligible correlations. In practice, the results still tend to be accurate and generalize well across samples (Baumer, Kaplan, & Horton, 2017; Youngstrom, Halverson, et al., 2017).
There are important advantages to having two alternate forms of a test that are highly correlated, produce the same scores on average, have similar variances, and yet have different item content. The nearly identical performance of the two short forms makes it possible to use one version for recruitment purposes and the other to measure depression levels at study baseline for eligible cases. Similarly, the short forms could be alternated as measures of treatment response or outcome, reducing the repetitiveness (and corresponding burden and practice effects) for participants. In computer adaptive testing implementations, the items could be pooled and selected based on participant responses to construct other dynamic short forms.

Clinical Implications

The new forms are short and less burdensome: They are less than one third of the length of the original, can be used to track change, will improve diagnostic accuracy, and are free. They outperformed the well-established CBCL scales for the specific purpose of identifying cases with mood disorder. Their psychometric properties generalized well from an academic sample to a community sample with large differences in demography and clinical referral pattern, providing a stringent test of external validation. The diagnostic likelihood ratios provide a case-oriented effect size that allows clinicians to combine the test result with the base rate of mood disorders at their setting and other risk factors to estimate the probability of mood disorder, guiding clinical decisions about more intensive assessment and treatment selection. The precision of the scales, with standard errors of roughly 2 points, suggests that they may also be helpful for measuring treatment effects, although more work can be done to enhance their utility in a clinical significance framework. Because the PGBI has already been translated into a variety of languages, the short forms also are available in Spanish and more than a dozen other languages (available upon request from the authors), further enhancing utility.

FUNDING

This research was supported in part by National Institute of Mental Health R01 MH066647 (PI: E. Youngstrom) and a grant from the Stanley Medical Research Institute (PI: R.L. Findling).

Limitations and Future Directions

It would be ideal to have more information about the psychometrics of the depression scales when they are used in a standalone, extracted format. Whereas technical psychometric information has been published for the 10M (Freeman et al., 2012), we did not gather the depression short forms in an extracted format in these protocols. The psychometrics are likely to prove robust based on all indications in the present data; in addition, the items on the depression forms are constructed similarly to the mania items, so the features contributing to the high alpha and
coverage are likely similar. Of note, 10D Form A has
been used in an extracted format in multiple clinical trials
now, and it has shown excellent sensitivity to treatment
REFERENCES
effects (Findling et al., 2012). Even though those studies
did not report alpha, factor structure, or corrected item-
Achenbach, T. M. (1991). Manual for the child behavior Checklist/4-18
total correlations, the downstream evidence of validity and 1991 profile. Burlington, VT: University of Vermont, Department of
provides additional circumstantial evidence that reliability Psychiatry.
is likely strong. Similarly, performance should be evalu- Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA
ated in independent samples with different demographic school-age forms & profiles. Burlington, VT: University of Vermont.
Algorta, G. P., Youngstrom, E. A., Phelps, J., Jenkins, M. M., Youngstrom,
and clinical characteristics, particularly now that the
J. K., & Findling, R. L. (2013). An inexpensive family index of risk for
PGBI has been translated and is available in multiple mood issues improves identification of pediatric bipolar disorder.
languages (available at https://trello.com/b/dYUKlNRP/ Psychological Assessment, 25, 12–22. doi:10.1037/a0029225
translated-measures-dashboard). With both depression Alloy, L. B., Urosevic, S., Abramson, L. Y., Jager-Hyman, S., Nusslock, R.,
and mania scales available, it is possible that more com- Whitehouse, W. G., & Hogan, M. (2011). Progression along the bipolar
spectrum: A longitudinal study of predictors of conversion from bipolar
plicated scoring algorithms could improve clinical deci-
spectrum conditions to bipolar I and II disorders. Journal of Abnormal
sion making (Kraemer, 1992); future work should see Psychology. doi:10.1037/a0023973
how the scales combine with information about other Baumer, B. S., Kaplan, D. T., & Horton, N. R. (2017). Modern data science
risk factors and clinical characteristics (Youngstrom, with R. Boca Raton, FL: Taylor & Francis.
Halverson, et al., 2017). More needs to be done to Bebko, G., Bertocci, M. A., Fournier, J. C., Hinze, A. K., Bonar, L.,
Almeida, J. R., … Phillips, M. L. (2014). Parsing dimensional vs diag-
describe use of these scales as progress and outcome
nostic category-related patterns of reward circuitry function in behavio-
measures. rally and emotionally dysregulated youth in the Longitudinal Assessment
SHORT FORMS FOR PARENT GBI 15

of Manic Symptoms study. JAMA Psychiatry, 71, 71–80. doi:10.1001/ Findling, R. L., Youngstrom, E. A., McNamara, N. K., Stansbrey, R. J.,
jamapsychiatry.2013.2870 Wynbrandt, J. L., Adegbite, C., … Calabrese, J. R. (2012). Double-blind,
Birmaher, B. (2013). Bipolar disorder in children and adolescents. Child randomized, placebo-controlled long-term maintenance study of aripipra-
Adolesc Ment Health, 18, 140–148. doi:10.1111/camh.12021 zole in children with bipolar disorder. Journal of Clinical Psychiatry, 73,
Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. 57–63. doi:10.4088/JCP.11m07104
P., Irwig, L. M., … De Vet, H. C. W. (2003). Towards complete and Findling, R. L., Youngstrom, E. A., Zhao, J., Marcus, R., Andersson, C.,
accurate reporting of studies of diagnostic accuracy: The STARD initia- McQuade, R., & Mankoski, R. (2012). Respondent and item level
tive. British Medical Journal, 326, 41–44. doi:10.1136/bmj.326.7379.41 patterns of response of aripiprazole in the acute treatment of pediatric
Clark, L. A., & Watson, D. (1991). Tripartite model of anxiety and depres- bipolar I disorder. Journal of Affective Disorders, 143, 231–235.
sion: Psychometric evidence and taxonomic implications. Journal of doi:10.1016/j.jad.2012.04.033
Abnormal Psychology, 100, 316–336. Freeman, A. J., Youngstrom, E. A., Frazier, T. W., Youngstrom, J. K.,
Cyranowski, J. M., Frank, E., Young, E., & Shear, K. (2000). Adolescent Demeter, C., & Findling, R. L. (2012). Portability of a screener for
onset of the gender difference in lifetime rates of major depression. pediatric bipolar disorder to a diverse setting. Psychological
Archives of General Psychiatry, 57, 21–27. Assessment, 24, 341–351. doi:10.1037/a0025617
Danielson, C. K., Youngstrom, E. A., Findling, R. L., & Calabrese, J. R. Gaylord-Harden, N. K., Elmore, C. A., Campbell, C. L., & Wethington, A.
(2003). Discriminative validity of the General Behavior Inventory using (2011). An examination of the tripartite model of depressive and anxiety
youth report. Journal of Abnormal Child Psychology, 31, 29–39. symptoms in African American youth: Stressors and coping strategies as
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). common and specific correlates. Journal of Clinical Child and
Comparing the areas under two or more correlated receiver operating Adolescent Psychology, 40, 360–374. doi:10.1080/
characteristic curves: A nonparametric approach. Biometrics, 44, 15374416.2011.563467
837–845. Geller, B., Zimerman, B., Williams, M., Bolhofner, K., Craney, J. L.,
Demeter, C. A., Youngstrom, E. A., Carlson, G. A., Frazier, T. W., Rowles, DelBello, M. P., & Soutullo, C. (2001). Reliability of the Washington
B. M., Lingler, J., … Findling, R. L. (2013). Age differences in the University in St. Louis Kiddie Schedule for Affective Disorders and
phenomenology of pediatric bipolar disorder. Journal of Affective Schizophrenia (WASH-U-KSADS) mania and rapid cycling sections.
Disorders, 147, 295–303. doi:10.1016/j.jad.2012.11.021 Journal of the American Academy of Child & Adolescent Psychiatry,
Depression and Bipolar Support Alliance. (n.d.). Mental health screening 40, 450–455. doi:10.1097/00004583-200104000-00014
center. Retrieved from http://www.dbsalliance.org/site/PageServer?pagen Goldstein, B. I., Birmaher, B., Carlson, G. A., DelBello, M. P., Findling, R.
ame=education_screeningcenter L., Fristad, M., … Youngstrom, E. A. (2017). The International Society
Depue, R. A., Kleiman, R. M., Davis, P., Hutchinson, M., & Krauss, S. P. for Bipolar Disorders Task Force report on pediatric bipolar disorder:
(1985). The behavioral high-risk paradigm and bipolar affective disorder, Knowledge to date and directions for future research. Bipolar Disorders,
VIII: Serum free cortisol in nonpatient cyclothymic subjects selected by 19, 524–543. doi:10.1111/bdi.12556
the General Behavior Inventory. American Journal of Psychiatry, 142, Guo, S., & Fraser, M. W. (2010). Propensity score analysis: Statistical
175–181. doi:10.1176/ajp.142.2.175 methods and applications. Los Angeles, CA: Sage.
Depue, R. A., Luciana, M., Arbisi, P., Collins, P., & Leon, A. (1994). Horwitz, S. M., Demeter, C. A., Pagano, M. E., Youngstrom, E. A., Fristad,
Dopamine and the structure of personality: Relation of agonist-induced M. A., Arnold, L. E., … Findling, R. L. (2010). Longitudinal Assessment
dopamine activity to positive emotionality. Journal of Personality and of Manic Symptoms (LAMS) study: Background, design, and initial
Social Psychology, 67, 485–498. screening results. Journal of Clinical Psychiatry, 71, 1511–1517.
Depue, R. A., Slater, J. F., Wolfstetter-Kausch, H., Klein, D. N., Goplerud, doi:10.4088/JCP.09m05835yel
E., & Farr, D. A. (1981). A behavioral paradigm for identifying persons Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical
at risk for bipolar depressive disorder: A conceptual framework and five approach to defining meaningful change in psychotherapy research.
validation studies. Journal of Abnormal Psychology, 90, 381–437. Journal of Consulting and Clinical Psychology, 59, 12–19.
doi:10.1037/0021-843X.90.5.381 doi:10.1037/0022-006X.59.1.12
DeVellis, R. F. (1991). Scale Development: Theory and applications. Jo, B., Findling, R. L., Hastie, T. J., Youngstrom, E. A., Wang, C. P.,
Newbury Park, CA: Sage. Arnold, L. E., … Horwitz, S. M. (2016). Construction of longitudinal
Ferdinand, R. F. (2008). Validity of the CBCL/YSR DSM-IV scales anxiety prediction targets using semisupervised learning. Statistical Methods
problems and affective problems. Journal of Anxiety Disorders, 22, 126– in Medical Research, 962280216684163. doi:10.1177/
134. doi:10.1016/j.janxdis.2007.01.008 0962280216684163
Findling, R. L., Jo, B., Frazier, T. W., Youngstrom, E. A., Demeter, C. A., Fristad, Kaufman, J., Birmaher, B., Brent, D., Rao, U., Flynn, C., Moreci, P., … Ryan, N.
M. A., … Horwitz, S. M. (2013). The 24-month course of manic symptoms in (1997). Schedule for Affective Disorders and Schizophrenia for School-Age
children. Bipolar Disorders, 15, 669–679. doi:10.1111/bdi.12100 Children-Present and Lifetime version (K-SADS-PL): Initial reliability and
Findling, R. L., McNamara, N. K., Gracious, B. L., Youngstrom, E. A., validity data. Journal of the American Academy of Child & Adolescent
Stansbrey, R. J., Reed, M. D., … Calabrese, J. R. (2003). Combination Psychiatry, 36, 980–988. doi:10.1097/00004583-199707000-00021
lithium and divalproex sodium in pediatric bipolarity. Journal of Kim, E. Y., & Miklowitz, D. J. (2002). Childhood mania, attention deficit
American Academy of Child and Adolescent Psychiatry, 42, 895–901. hyperactivity disorder and conduct disorder: A critical review of diag-
doi:10.1097/01.CHI.0000046893.27264.53 nostic dilemmas. Bipolar Disorders, 4, 215–225.
Findling, R. L., Nyilas, M., Forbes, R. A., McQuade, R. D., Jin, N., Klein, D. N., & Depue, R. A. (1984). Continued impairment in persons at
Iwamoto, T., … Chang, K. (2009). Acute treatment of pediatric bipolar risk for bipolar affective disorder: Results of a 19-month follow-up study.
I disorder, manic or mixed episode, with aripiprazole: A randomized, Journal of Abnormal Psychology, 93, 345–347.
double-blind, placebo-controlled study. Journal of Clinical Psychiatry, Klein, D. N., Depue, R. A., & Slater, J. F. (1986). Inventory identification
70, 1441–1451. doi:10.4088/JCP.09m05164yel of cyclothymia. IX. Validation in offspring of bipolar I patients. Archives
Findling, R. L., Youngstrom, E. A., McNamara, N. K., Stansbrey, R. J., of General Psychiatry, 43, 441–445.
Demeter, C. A., Bedoya, D., … Calabrese, J. R. (2005). Early symptoms Konig, I. R., Malley, J. D., Weimar, C., Diener, H. C., Ziegler, A., & German
of mania and the role of parental risk. Bipolar Disorders, 7, 623–634. Stroke, S. C. (2007). Practical experiences on the necessity of external
doi:10.1111/j.1399-5618.2005.00260.x validation. Statistics in Medicine, 26, 5499–5511. doi:10.1002/sim.3069
Kraemer, H. C. (1992). Evaluating medical tests: Objective and quantitative guidelines. Newbury Park, CA: Sage.
Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. New York, NY: Wiley.
Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63, 539–569. doi:10.1146/annurev-psych-120710-100452
Portugal, L. C., Rosa, M. J., Rao, A., Bebko, G., Bertocci, M. A., Hinze, A. K., … Mourao-Miranda, J. (2016). Can emotional and behavioral dysregulation in youth be decoded from functional neuroimaging? PLoS One, 11, e0117603. doi:10.1371/journal.pone.0117603
Poznanski, E. O., Miller, E., Salguero, C., & Kelsh, R. C. (1984). Preliminary studies of the reliability and validity of the Children’s Depression Rating Scale. Journal of the American Academy of Child Psychiatry, 23, 191–197.
Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short-form development. Psychological Assessment, 12, 102–111.
Spitzer, R. L. (1983). Psychiatric diagnosis: Are clinicians still necessary? Comprehensive Psychiatry, 24, 399–411.
Stewart, A. J., Theodore-Oklota, C., Hadley, W., Brown, L. K., Donenberg, G., Diclemente, R., & Project Style Study Group. (2012). Mania symptoms and HIV-risk behavior among adolescents in mental health treatment. Journal of Clinical Child & Adolescent Psychology, 41, 803–810. doi:10.1080/15374416.2012.675569
Straus, S. E., Glasziou, P., Richardson, W. S., & Haynes, R. B. (2011). Evidence-based medicine: How to practice and teach EBM (4th ed.). New York, NY: Churchill Livingstone.
Van Meter, A. R., Algorta, G. P., Youngstrom, E. A., Lechtman, Y., Youngstrom, J. K., Feeny, N. C., & Findling, R. L. (2017). Assessing for suicidal behavior in youth using the Achenbach System of Empirically Based Assessment. European Child & Adolescent Psychiatry. doi:10.1007/s00787-017-1030-y
Van Meter, A. R., Burke, C., Kowatch, R. A., Findling, R. L., & Youngstrom, E. A. (2016). Ten-year updated meta-analysis of the clinical characteristics of pediatric mania and hypomania. Bipolar Disorders, 18, 19–32. doi:10.1111/bdi.12358
Van Meter, A., Youngstrom, E., Youngstrom, J. K., Ollendick, T., Demeter, C., & Findling, R. L. (2014). Clinical decision making about child and adolescent anxiety disorders using the Achenbach System of Empirically Based Assessment. Journal of Clinical Child & Adolescent Psychology, 43, 552–565. doi:10.1080/15374416.2014.883930
Yee, A. M., Algorta, G. P., Youngstrom, E. A., Findling, R. L., Birmaher, B., Fristad, M. A., & LAMS Group. (2015). Unfiltered administration of the YMRS and CDRS-R in a clinical sample of children. Journal of Clinical Child & Adolescent Psychology, 44, 992–1007. doi:10.1080/15374416.2014.915548
Young, R. C., Biggs, J. T., Ziegler, V. E., & Meyer, D. A. (1978). A rating scale for mania: Reliability, validity, and sensitivity. British Journal of Psychiatry, 133, 429–435.
Youngstrom, E. A. (2009). Definitional issues in bipolar disorder across the life cycle. Clinical Psychology: Science & Practice, 16, 140–160. doi:10.1111/j.1468-2850.2009.01154.x
Youngstrom, E. A., Egerton, G. A., Genzlinger, J., Freeman, L. K., Rizvi, S. H., & Van Meter, A. (2018). Improving the global identification of bipolar spectrum disorders: Meta-analysis of the diagnostic accuracy of checklists. Psychological Bulletin, 144, 315–342. doi:10.1037/bul0000137
Youngstrom, E. A., Findling, R. L., Danielson, C. K., & Calabrese, J. R. (2001). Discriminative validity of parent report of hypomanic and depressive symptoms on the General Behavior Inventory. Psychological Assessment, 13, 267–276.
Youngstrom, E. A., Frazier, T. W., Findling, R. L., & Calabrese, J. R. (2004, May). A ten item brief screen for manic-depression in youths age 5 to 17 years. Paper presented at the Annual meeting of the American Psychiatric Association, New York, NY.
Youngstrom, E. A., Frazier, T. W., Findling, R. L., & Calabrese, J. R. (2008). Developing a ten item short form of the Parent General Behavior Inventory to assess for juvenile mania and hypomania. Journal of Clinical Psychiatry, 69, 831–839. doi:10.4088/JCP.v69n0517
Youngstrom, E. A., Genzlinger, J. E., Egerton, G. A., & Van Meter, A. R. (2015). Multivariate meta-analysis of the discriminative validity of caregiver, youth, and teacher rating scales for pediatric bipolar disorder: Mother knows best about mania. Archives of Scientific Psychology, 3, 112–137. doi:10.1037/arc0000024
Youngstrom, E. A., Halverson, T. F., Youngstrom, J. K., Lindhiem, O., & Findling, R. L. (2017). Evidence-Based Assessment from simple clinical judgments to statistical learning: Evaluating a range of options using pediatric bipolar disorder as a diagnostic challenge. Clinical Psychological Science. doi:10.1177/2167702617741845
Youngstrom, E. A., Meyers, O. I., Demeter, C., Kogos Youngstrom, J., Morello, L., Piiparinen, R., … Calabrese, J. R. (2005). Comparing diagnostic checklists for pediatric bipolar disorder in academic and community mental health settings. Bipolar Disorders, 7, 507–517. doi:10.1111/j.1399-5618.2005.00269.x
Youngstrom, E. A., Van Meter, A., Frazier, T. W., Hunsley, J., Prinstein, M. J., Ong, M.-L., & Youngstrom, J. K. (2017). Evidence-Based Assessment as an integrative model for applying psychological science to guide the voyage of treatment. Clinical Psychology: Science and Practice. doi:10.1111/cpsp.12207
Youngstrom, E. A., Zhao, J., Mankoski, R., Forbes, R. A., Marcus, R. M., Carson, W., … Findling, R. L. (2013). Clinical significance of treatment effects with aripiprazole versus placebo in a study of manic or mixed episodes associated with pediatric bipolar I disorder. Journal of Child & Adolescent Psychopharmacology, 23, 72–79. doi:10.1089/cap.2012.0024
