Reliability and Validity of the ARMIDILO-S in Sex Offenders with Intellectual Disabilities
To cite this article: Claudia Pouls & Inge Jeandarme (2023) Reliability and Validity of the
ARMIDILO-S in Sex Offenders with Intellectual Disabilities, Journal of Mental Health Research in
Intellectual Disabilities, 16:1, 37-53, DOI: 10.1080/19315864.2022.2148790
ABSTRACT
Background: The ARMIDILO-S is advocated as a promising tool
for assessing dynamic risk factors in sex offenders with intellectual
disabilities (SOIDs). However, research remains scarce. The
present study aimed to further validate this instrument in SOIDs.
Method: The study prospectively followed 38 SOIDs for up to
one year to test the accuracy of the ARMIDILO-S in predicting
violent and sexual incidents.
Results: Overall predictive accuracy was moderate to high. The
ARMIDILO-S further showed excellent prospective qualities in
identifying both high-risk and low-risk offenders for violence.
Regarding sexual offending, it was only good at prospectively
detecting low-risk individuals.
Conclusions: This study provided further evidence of the good
predictive validity of the ARMIDILO-S in predicting future sexual
and violent incidents in SOIDs. More research, preferably in larger
samples, as well as field validity studies are recommended.
KEYWORDS
ARMIDILO-S; risk assessment; intellectual disability; sex offenders
The empirical assessment of the risk of further sexual offending in sex offenders can
be based on static and/or dynamic variables. Static items, such as victim
gender, are not amenable to change through treatment and thus provide no
useful information about treatment targets. However, static risk factors give an
indication of “baseline risk” or, in other words, the risk of recidivism without
any intervention or treatment. This baseline can be used to determine treatment intensity
and the degree of supervision. Following the Risk Need Responsivity (RNR)
model (Bonta & Andrews, 2007), increasing levels of treatment intensity and
supervision are recommended with increasing risk scores. Dynamic risk factors,
in turn, can be used as a guide to determine treatment goals and
to effectively manage and monitor changes in a client’s risk. Instruments relying on
static factors were developed in mainstream sex offender samples and have not
been extensively tested in offender populations with intellectual disabilities.
Furthermore, the limited research available shows mixed results. At this
point, only the Static–99 (and revised versions; Harris et al., 2003; Phenix
et al., 2008; Phenix, Fernandez et al., 2016; Phenix, Helmus et al., 2016) and
the Rapid Risk Assessment for Sex Offense Recidivism (RRASOR; Hanson,
CONTACT Claudia Pouls claudia.pouls@opzcrekem.be Knowledge Centre Forensic Psychiatric Care, Public
Psychiatric Care Centre , Daalbroekstraat 106, Rekem 3621, Belgium
© 2022 OPZC Rekem
38 C. POULS AND I. JEANDARME
1997) are recommended for the prediction of sexual offenses in sex offenders
with an intellectual disability (SOIDs; Hanson et al., 2013; Hounsome et al.,
2018; Pouls & Jeandarme, 2015). Dynamic factors, on the other hand, can guide
treatment plans with the goal of reducing risk. For that purpose, mainstream
instruments such as the Historical Clinical Risk Management-20 (HCR-20;
Webster et al., 1997), the Short-Term Assessment of Risk and Treatability
(START; Webster et al., 2004) or the Sexual Violence Risk–20 (SVR-20; Boer
et al., 1997) have been validated in SOIDs, and ID-specific instruments have
been developed as well, e.g., the Dynamic Risk Assessment and Management System
(DRAMS; Lindsay & Beail, 2004) and the Assessment of Risk and Manageability
for Individuals who Offend Sexually (ARMIDILO-S; Boer et al., 2004, 2013).
In the Netherlands, the Dynamic Risk Outcome Scales (DROS; Drieschner &
Hesper, 2008) was developed to assess treatment progress of patients with mild
intellectual disability or borderline intellectual functioning and severe behavioral
and/or psychiatric problems. In theory, ID-specific tools have some advantages
over mainstream tools because they address ID-specific criminogenic needs.
However, evidence for the validity of these tools is even more preliminary
(Hounsome et al., 2018; Pouls & Jeandarme, 2015).
To date, only a few studies have been conducted with the ARMIDILO-S.
Blacker et al. (2011) evaluated the RRASOR, RM2000/V, SVR-20 and the
client subscales of a previous version of the ARMIDILO in matched samples
of 44 SOIDs (including borderline intellectual functioning) and 44 non-ID sex
offenders. Predictive accuracy of the ARMIDILO in the SOID group generally
exceeded that of the non-ID group, although not significantly. For the stable
client subscale, an AUC of .61 was found for sexual reconviction and an AUC
of .56 for official and unofficial sexual offense-related behavior. The acute
client subscale was the best predictor for sexual recidivism with an AUC of .73
(sexual reconviction) and .76 (unofficial sexual recidivism and reconviction
data; p < .001). When only considering a small subsample of SOIDs with an IQ
below 75 (n = 10), the stable client subscale produced a significant predictive
effect for sexual reconviction (AUC = .86); whereas the AUC of the acute client
subscale was high but non-significant (AUC = .75), possibly due to the very
small sample size. The study further showed that other instruments (RRASOR,
RM2000/V and the SVR-20) performed little better than chance level in
distinguishing sexual recidivists from non-recidivists (AUC = .37–.55). In
terms of violent recidivism, AUCs were high for both the acute (AUC = .76)
and stable (AUC = .83) client subscale of the ARMIDILO-S. In a second study,
Lofthouse et al. (2013) prospectively analyzed the ARMIDILO-S in 64 male
SOIDs (IQ < 75) from a community service in Scotland. Inter-rater reliability
was high for both subscales (r = .98 for the environmental subscale and r = .96
for the client subscale) and the total score (r = .98). For the prediction of sexual
incidents, large significant effect sizes were reached for total (subscale) scores:
total environment AUC = .81, total client AUC = .90 and total ARMIDILO-S
JOURNAL OF MENTAL HEALTH RESEARCH IN INTELLECTUAL DISABILITIES 39
RQ3. What is the predictive validity of the client subscale and the environmental
subscale of the ARMIDILO-S in predicting sexual incidents?
RQ4. What is the predictive validity of the risk and protective ratings of the
ARMIDILO-S in predicting sexual incidents?
The study was conducted in six specialized forensic psychiatric and prison
units for SOIDs (including borderline intellectual functioning) and
included 50 male SOIDs. The Static-99R could not be scored for
seven patients due to missing information (n = 5) or because there was no
category A offense (n = 2). Because of the limited sample size, the follow-up
criterion was kept lenient: only patients with a follow-up of less than six
months were excluded (n = 5). After both exclusions, the total study sample was 38.
The mean IQ score was 63 (SD = 11.13, range = 45–85). Three patients had an
IQ between 35 and 50, 21 between 50 and 70 and 12 between 70 and 85. Based on
a clinical DSM-IV diagnosis, 18 patients were classified as having a mild intellectual
disability, six as having a moderate intellectual disability and two as having
borderline intellectual functioning. One third (n = 12) did not receive an official
diagnosis of intellectual disability but were nevertheless admitted to a unit for
SOIDs and had an IQ at or below 85 (range = 53–85). A paraphilic disorder was
present in 27 patients, a developmental disorder in eight patients, personality
disorder in seven patients and a substance abuse disorder in seven patients.
Almost half of the participants (n = 18) had more than one psychiatric diagnosis.
All patients had committed a sexual offense, either as an index offense (n = 35) or as
a prior offense (n = 18). Sex offenses concerned hands-on sex offenses in 35 cases
and hands-off offenses (e.g., possession of child pornography, indecent exposure)
in 16 cases. Hands-on offenses were committed against children (n = 29),
adults (n = 2), or against both an adult and a child (n = 4). The mean age at the
time of the assessment was 44 years (SD = 12.14, range = 24–74). The mean
length of treatment or imprisonment was close to three years (33 months,
SD = 34.82, range = 2.3–126.2).
Measures
Static-99R
The Static-99R (Helmus et al., 2012; Phenix, Fernandez et al., 2016) was scored
as part of the ARMIDILO-S scoring process (cf. infra). The Static-99R is an
actuarial instrument designed to assess the likelihood of sexual and violent
recidivism in sex offenders. It consists of 10 static items relating to demographic,
offense and victim information. The total score is the sum of the item
scores and varies from –3 to 12, further divided into five nominal risk
categories: Level I – very low risk (scores of –3 to –2), Level II – below average
risk (scores of –1 to 0), Level III – average risk (scores of 1 to 3), Level IVa –
above average risk (scores of 4 to 5) and Level IVb – well above average risk
(scores of 6 or above). In the current study, the Dutch translation of the Static-
99R (Smid et al., 2014) was used.
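The score bands described above can be written as a small lookup. The following is a minimal sketch, not part of the Static-99R materials: the function and dictionary names are hypothetical, and the three-level recoding (Low = Level I + II, Moderate = Level III, High = Level IVa + IVb) is the one given in the note to Table 2.

```python
def static99r_level(score: int) -> str:
    """Map a Static-99R total score to the five risk levels listed in the text."""
    if not -3 <= score <= 12:
        raise ValueError("Static-99R total scores range from -3 to 12")
    if score <= -2:
        return "I"      # very low risk
    if score <= 0:
        return "II"     # below average risk
    if score <= 3:
        return "III"    # average risk
    if score <= 5:
        return "IVa"    # above average risk
    return "IVb"        # well above average risk


# Three-level recoding taken from the note to Table 2 (names are illustrative).
RECODE = {"I": "Low", "II": "Low", "III": "Moderate", "IVa": "High", "IVb": "High"}


def static99r_three_level(score: int) -> str:
    """Collapse the five levels into Low / Moderate / High."""
    return RECODE[static99r_level(score)]


if __name__ == "__main__":
    for s in (-3, 0, 3, 4, 6):
        print(s, static99r_level(s), static99r_three_level(s))
```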
ARMIDILO-S
The ARMIDILO-S (Boer et al., 2013) is a structured professional judgment (SPJ)
risk assessment tool for the assessment and management of risk for sexually
inappropriate behavior in SOIDs. The original authors strongly encourage
basing the scoring on file information and one or two interviews with staff members.
A client interview is recommended, but not necessary. The instrument
contains 27 stable (i.e. slowly changing) and acute (i.e. rapidly changing)
dynamic items divided into a “client” and “environment” subscale (see
Table 1). In the first ARMIDILO-S evaluation the stable items need to be
scored based on the preceding one to two years (or up to five years when
the client resided in a highly structured setting); the acute items over the
previous two to three months. Thereafter, an annual scoring of the stable
items is advised while the acute items can be reassessed more frequently
to monitor ongoing risk. Each item is evaluated as both a risk and
a protective factor and rated on a 3-point scale where N indicates that
the item is absent, S that the item is somewhat present, and Y that the
item is present. For example, when a client did not act impulsively in the
past period and clearly showed problem solving skills, the risk factor for
Procedure
Outcome Measures
The predictive accuracy was assessed using two outcome measures: sexual and
violent incidents. A violent incident referred to physical non-sexual violence
against another person: uttering threats, grabbing by the throat, kicking,
hitting, biting, or throwing objects against a person with the purpose of
inflicting pain. A sexual incident was defined as illegal sexual behavior such
as sexual assault, sexual touching, gross indecency, or indecent exposure.
Statistical Analyses
All analyses were conducted in SPSS 22© (IBM Corp, 2013) and MedCalc
(Garber, 1998). Inter-rater reliability (IRR) was evaluated through a two-way
random intraclass correlation coefficient (ICC2,1 absolute agreement). Fleiss’s
(1986) critical values for single measures were used: ICC ≥ .75 = excellent, ICC
≥ .60 = good, ICC ≥ .40 = moderate, and ICC < .40 = poor. Predictive validity
was analyzed using both discrimination and calibration indicators.
Discrimination refers to how well an instrument can separate those who
went on to be violent from those who did not. Calibration refers to how well
the prediction of risk (expected recidivism) agrees with the actual observed
risk (Singh, 2013). A global effect size was calculated through the ROC
analysis. The corresponding AUC values were evaluated according to the
classification of Rice and Harris (2005) whereby AUC ≥ .56 = little effect,
AUC ≥ .64 = moderate effect and AUC ≥ .71 = large effect. Using the
information of a 2 × 2 contingency table, sensitivity (percentage of recidivists
who were judged to be at high risk), specificity (percentage of non-recidivists
who were judged to be at low risk), positive predictive value (PPV, percentage
of participants judged to be at high risk who did reoffend), negative predictive
value (NPV, percentage of low-risk individuals who did not reoffend), number
needed to detain (NND, number of participants judged to be at high risk who
need to be detained to prevent a single incident or offense), and number safely
discharged (NSD, number of participants judged to be at low risk who could
be discharged prior to a single incident or offense) were calculated. These
performance indicators provide information about how accurately a tool identifies
high-risk (“rule in”; PPV and NND) and low-risk (“rule out”; NPV and
NSD) individuals. Calculating these measures requires a single cutoff threshold.
Participants classified as being at moderate or high risk were therefore compared
with participants classified as low risk.
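The indicators defined above all follow from the four cells of the 2 × 2 table. The sketch below assumes the commonly used definitions NND = 1/PPV and NSD = 1/(1 − NPV) − 1, which the text does not spell out. The cell counts are not given by the authors: they are back-calculated from the reported base rates and percentages and serve only as an illustration. The function name is hypothetical.

```python
from typing import Optional


def performance_indicators(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute rule-in and rule-out indicators from a 2 x 2 contingency table.

    tp: judged moderate/high risk, incident occurred
    fp: judged moderate/high risk, no incident
    fn: judged low risk, incident occurred
    tn: judged low risk, no incident
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Assumed definitions: NND = 1 / PPV, NSD = 1 / (1 - NPV) - 1 = NPV / (1 - NPV).
    nnd: Optional[float] = 1 / ppv if ppv > 0 else None
    # NSD is undefined when NPV = 1 (division by zero).
    nsd: Optional[float] = npv / (1 - npv) if npv < 1 else None
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "nnd": nnd,
        "nsd": nsd,
    }


# Illustrative reconstruction, NOT counts stated in the article.
sexual = performance_indicators(tp=4, fp=6, fn=0, tn=28)
violent = performance_indicators(tp=10, fp=1, fn=6, tn=21)

if __name__ == "__main__":
    print("sexual incidents:", sexual)
    print("violent incidents:", violent)
```

With a single cutoff, a tool that misses no one (NPV = 100%) leaves NSD undefined, which is why the function returns `None` in that case instead of a number.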
Results
Descriptive Statistics
The base rate was 10.5% (n = 4) for sexual incidents and 42.1% (n = 16) for
non-sexual violent incidents.
Total scores ranged from –22 to +5 (possible score range –46 to +46), total
client scores ranged from –14 to +8 (possible score range –30 to +30) and total
environment scores ranged from –11 to 0 (possible score range –16 to +16). In
Table 2, descriptive statistics for the structured professional judgment are
presented. According to the Static-99R rating, only four patients were deemed
Table 2. Descriptive statistics for the structured professional judgment relating to the Static-99R
risk rating and the risk, protective, and convergent rating of the ARMIDILO-S.

            Static-99R     ARMIDILO-S      ARMIDILO-S           ARMIDILO-S
            rating¹        risk rating     protective rating    convergent rating
Low         4 (10.5%)      11 (28.9%)      0                    28 (73.7%)
Moderate    13 (34.2%)     18 (47.4%)      5 (13.2%)            9 (23.7%)
High        21 (55.3%)     9 (23.7%)       33 (86.8%)           1 (2.6%)

Note. The convergent rating is derived from taking into account the numerical score of the Static-99R, the SPJ risk
rating of the ARMIDILO-S items and the SPJ protective rating of these same ARMIDILO-S items.
¹ The Static-99R was scored using the revised risk categories (Phenix, Fernandez et al., 2016). These were recoded as
follows: Low = Level I + II; Moderate = Level III; High = Level IVa + Level IVb.
Inter-rater Reliability
Inter-rater reliability (ICC2,1) of the ARMIDILO-S convergent SPJ rating was
poor with regard to violent incidents (.28) and moderate for sexual incidents
(.55).
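For readers who want to see how a two-way random, absolute-agreement, single-measures ICC is obtained, the sketch below computes ICC(2,1) from the mean squares of a subjects × raters table and labels the result with the cutoffs quoted in the Statistical Analyses section. The ratings are illustrative (the widely reproduced six-target, four-judge example of Shrout and Fleiss, 1979), not data from this study, and the function names are hypothetical.

```python
def icc_2_1(ratings: list) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings: one row per rated subject, one column per rater.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def fleiss_label(icc: float) -> str:
    """Label an ICC using the cutoffs quoted in the text."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"
    return "poor"


# Illustrative ratings only (6 subjects x 4 raters); not study data.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

if __name__ == "__main__":
    value = icc_2_1(example)
    print(round(value, 2), fleiss_label(value))
```

Because the absolute-agreement form penalizes systematic differences between raters (the `msc` term), a rater who consistently scores higher or lower than a colleague lowers the coefficient even when both rank the clients in the same order.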
Predictive Validity
Convergent SPJ Rating
The ARMIDILO-S convergent SPJ rating was able to significantly predict
sexual and violent incidents with high accuracy (AUC = .77–.90) according
to the criteria of both Rice and Harris (2005) and Sjöstedt and Grann (2002).
The accuracy of the numerical score was also high (AUC = .74) for the
prediction of violent incidents, but moderate and non-significant
(AUC = .70, p = .20) regarding sexual incidents. No statistically significant
differences were found between the AUCs of the numerical scores and the
convergent SPJ ratings regarding the prediction of sexual incidents (p = .18) or
violent incidents (p = .58). An overview of all performance indicators is shown
in Table 3. These numbers can be read as follows: Of those individuals who
were involved in sexual incidents, 100% had been classified as being at
moderate or high risk of future sexual offending (sensitivity). Of those individuals
who were not involved in sexual incidents, 82% had been judged to be at
low risk (specificity). Of those judged to be at moderate or high risk, 40% was
Table 3. Performance indicators of the ARMIDILO-S for the prediction of different outcome
measures based on the convergent SPJ rating and numerical score.

                                       Sexual incidents (95% CI)    Violent incidents (95% CI)
ARMIDILO-S numerical AUC (95% CI)      .70 (.44–.96)                .74* (.58–.91)
SPJ convergent rating AUC (95% CI)     .90* (.80–1.00)              .79** (.63–.95)
Sensitivity (95% CI)                   100% (39.76–100)             62.5% (35.43–84.80)
Specificity (95% CI)                   82.4% (65.47–93.24)          95.5% (77.16–99.88)
PPV (95% CI)                           39.9% (24.33–57.88)          90.9% (58.67–98.60)
NPV (95% CI)                           100%                         77.8% (64.88–86.90)
NND                                    3                            1
NSD                                    Not applicableᵃ              4

Note. AUC = area under the curve; CI = confidence interval; PPV = positive predictive value; NPV = negative predictive
value; NND = number needed to detain; NSD = number safely discharged.
* p < .05; ** p < .01.
ᵃ This value was undefined because the calculation entailed division by zero.
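The AUC values in Table 3 can be read as the probability that a randomly chosen participant with an incident received a higher rating than a randomly chosen participant without one, with ties counted as one half. The sketch below computes that quantity for a three-level rating and labels it with the Rice and Harris (2005) cutoffs quoted in the Statistical Analyses section. Individual ratings are not reported in the article; the allocation used here is a hypothetical one that merely respects the convergent-rating totals in Table 2 and the four sexual incidents, and the function names are illustrative.

```python
def auc(scores_with_incident: list, scores_without_incident: list) -> float:
    """Rank-based AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in scores_with_incident:
        for q in scores_without_incident:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_with_incident) * len(scores_without_incident))


def rice_harris_label(value: float) -> str:
    """Label an AUC using the cutoffs quoted in the text."""
    if value >= 0.71:
        return "large"
    if value >= 0.64:
        return "moderate"
    if value >= 0.56:
        return "little"
    return "below the smallest cutoff"


# Ratings coded 0 = low, 1 = moderate, 2 = high.
# HYPOTHETICAL allocation, assumed for illustration only:
# all 4 participants with a sexual incident rated moderate;
# 34 without an incident rated low (28), moderate (5) or high (1).
with_incident = [1] * 4
without_incident = [0] * 28 + [1] * 5 + [2] * 1

if __name__ == "__main__":
    value = auc(with_incident, without_incident)
    print(round(value, 3), rice_harris_label(value))
```

With only three rating levels, many pairs are tied, so the AUC of an SPJ rating depends heavily on how the few incident cases are spread over the categories.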
Subscales
Component analyses showed that none of the subscales were significantly
predictive of sexual incidents, although the numerical score of the environmental
subscale showed a trend toward significance (p = .06) and the confidence
interval did not include .50 (see Table 4).
Discussion
Lindsay and Beail (2004) already addressed the lack of studies that
validate the use of existing risk assessment instruments in ID populations and
the urgent need to develop objective and valid risk assessment tools tailored to
the needs of (S)OIDs. However, little progress has been made to date
(Hounsome et al., 2018; Lofthouse et al., 2017; Pouls & Jeandarme, 2015).
The aim of the current study was therefore to assess the predictive validity of
the ARMIDILO-S in sex offenders with ID or borderline intellectual
functioning.
The AUC analyses showed moderate to high accuracy of the convergent SPJ
rating in predicting sexual and violent incidents, even when more strict criteria
are used for the interpretation (Sjöstedt & Grann, 2002). The high AUC of .90
for the prediction of sexual incidents is in line with the AUC of .92 for the
numerical total score in the study of Lofthouse et al. (2013). An important
difference from the other ARMIDILO(-S) studies, however, is that the convergent
SPJ rating was used alongside a numerical total score. Although the AUC of the
SPJ judgment was higher than its numerical counterpart, differences were non-
significant and might be attributed to the limited sample size. Furthermore,
a high AUC value was found for the environmental subscales, in line with the
results of the study conducted by Lofthouse et al. (2013). Although non-
significant, there was a trend towards significance and the confidence interval
did not include .50. Potentially with a larger sample, this might have been
significant. It may also be due to the limited variance of the environmental
scores (0 to –11). Mainly because of the secure setting and limited (or absent)
liberties, all the participants scored “no risk” on the environmental items.
Furthermore, performance indicators of the ARMIDILO-S results were mainly
higher compared to the Static-99R in the same population (Pouls & Jeandarme,
2022), although significance testing was not conducted. The same trend could be
seen in the study of Blacker et al. (2011), where predictive validity of the acute
and stable client subscales of the previous version of the ARMIDILO-S was
generally higher than that of the RRASOR and RM2000/V. This indicates the
need to include dynamic risk factors during treatment/for monitoring purposes.
Another explanation could be that the predictive validity of SOID-specific risk
assessment instruments is better than that of instruments developed in mainstream
offender populations.
In addition to the generally reported AUC value, other – more clinically
relevant – performance indicators were analyzed. Concerning the prediction
of sexual incidents, the ARMIDILO-S was better at prospectively detecting
low-risk individuals. The instrument further showed excellent prospective
qualities in predicting who was going to act violently (detection of high-risk
offenders), although the detection of low-risk violent offenders was relatively
high too. This finding is surprising, given that (limited) findings in non-ID
offender samples show that risk assessment instruments are generally better in
identifying low-risk offenders compared to high-risk offenders (Declue &
Campbell, 2013; Fazel et al., 2012; Singh et al., 2011). The suggestion of
Fazel et al. (2012) to use risk assessment tools to screen out low-risk cases,
rather than to detect high-risk individuals, may therefore not extend to SOIDs.
However, there are no clear cutoff standards for interpreting PPV/NPV or
NND/NSD values, making this a rather moral judgment. Furthermore, comparison
with other studies is hampered by methodological differences (e.g.,
PPV/NPV and NND/NSD are base rate-dependent) and a lack of indicators
for predictive accuracy other than AUC. The results with regard to the
prediction of sexual incidents were comparable to one non-peer reviewed
study in SOIDs (Sindall, 2012), despite the difference in the base rate
(10.5% in the current study; 31.2% in the Sindall study). In the study of Sindall
(2012), NPV and sensitivity were 100%, as was the case in the current study.
PPV and specificity were 55% and 64%, respectively, in the study of Sindall
(2012), versus 40% and 82% in the current study.
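The base-rate dependence of PPV and NPV mentioned above follows directly from Bayes' rule. The sketch below holds sensitivity and specificity fixed at the values reported for sexual incidents in the current study (100% and 82.4%) and varies only the base rate. Applying those values to the 31.2% base rate is a thought experiment, not a result from either study, and the function names are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    sens, spec = 1.0, 0.824  # values reported for sexual incidents
    for rate in (0.105, 0.312):
        print(
            f"base rate {rate:.1%}: "
            f"PPV = {ppv(sens, spec, rate):.2f}, NPV = {npv(sens, spec, rate):.2f}"
        )
```

Under these assumptions the same instrument would show a PPV of about .40 at a 10.5% base rate and about .72 at a 31.2% base rate, which illustrates why PPV values from samples with different base rates cannot be compared directly.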
From a clinical point of view, the ARMIDILO-S was experienced as a time-
consuming instrument, both for the rater and the staff being interviewed. On
the other hand, staff found it useful that they were being pushed to reflect on
risk-relevant matters. For the rater, it was particularly hard to collect the information
necessary to score the environmental subscales. Focusing on staff attitudes,
team communication (or miscommunication) or supervision problems
may cause staff to become defensive. This problem can possibly be overcome
when a team member scores the ARMIDILO-S, instead of an external rater.
Furthermore, the scoring of the instrument was not an easy task. This is
reflected in the poor inter-rater reliability results (ICC2,1 = .28–.55).
Potentially a lack of experience or prior risk assessment training by
the second raters, i.e. students, could explain these poor results. This was
confirmed by exploratory analyses of the predictive accuracy using the ratings
of the primary rater versus the ratings of the second rater.2 Training therefore
seems necessary to guarantee accurate scoring, which is in line with the user
requirements defined in the manual. The influence of individual rater characteristics
was also demonstrated in field studies of the Psychopathy
Checklist–Revised (PCL-R: Hare, 2003; e.g., Boccaccini et al., 2014, 2008),
although the question why this results in scoring differences remains largely
unanswered. Individual studies pointed to personality traits of the evaluator
(Miller et al., 2011), level of experience (Rufino et al., 2012), and the experience
in scoring the instrument (Murrie et al., 2012) as a potential explanation.
Jeandarme et al. (2016) hypothesized that level of background training and
education could also be related to scoring differences. Furthermore, the item
instructions in the manual are rather limited and sometimes vague which
could have created more room for interpretation and consequently more
subjectivity, compromising rater agreement. Nevertheless, the very low IRR
achieved in this study is problematic. Furthermore, this is not consistent with
the high inter-rater reliability found by Lofthouse et al. (2013).
Limitations
Although the results are promising, they must be interpreted with caution. The
reliability of the findings is limited by the small sample size, although all
people with ID or borderline intellectual functioning and sexual offense
histories in OID-specific projects in Flanders were included. This might
have affected the ROC analysis, because sample sizes below 200 result in
² Data available on request.
Future Research
Conclusion
This study has provided further evidence of the good predictive validity of the
ARMIDILO-S in predicting future sexual and violent incidents in SOIDs.
Although the ARMIDILO-S was able to prospectively detect individuals who
are at high risk for future violent behavior, more caution is needed regarding
the detection of high-risk individuals for future sexual incidents. Furthermore,
the environmental subscales might be of added value. However, more research
in preferably larger samples is necessary to confirm these results. Despite the
limited amount of empirical research, the ARMIDILO-S is currently the most
validated dynamic risk assessment tool available for SOIDs, even when mainstream
instruments are considered.
Acknowledgments
Special thanks to the participating clients and institutions: A.B.A.G.G. (’t Zwart Goor), Amanis
(’t Zwart Goor), Itinera (Sint-Idesbald), Limes (Sint-Ferdinand), Ontgrendeld (OBRA), KFP
(APZ Sint-Lucia), and Forensische Zorg 4 (OPZC Rekem). We also want to thank the Federal
Government of Justice.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
Blacker, J., Beech, A. R., Wilcox, D. T., & Boer, D. P. (2011). The assessment of dynamic risk
and recidivism in a sample of special needs sexual offenders. Psychology, Crime & Law, 17(1),
75–92. https://doi.org/10.1080/10683160903392376
Boccaccini, M. T., Murrie, D. C., Rufino, K. A., & Gardner, B. O. (2014). Evaluator differences
in Psychopathy Checklist-Revised factor and facet scores. Law and Human Behavior, 38(4),
337–345. https://doi.org/10.1037/lhb0000069
Boccaccini, M. T., Turner, D. B., & Murrie, D. C. (2008). Do some evaluators report consistently
higher or lower PCL-R scores than others? Findings from a statewide sample of
sexually violent predator evaluations. Psychology, Public Policy, and Law, 14(4), 262–283.
https://doi.org/10.1037/a0014523
Boer, D. P. (2013). Some essential environmental ingredients for sex offender reintegration.
International Journal of Behavioral Consultation and Therapy, 8(3–4), 8–11.
https://doi.org/10.1037/h0100976
Boer, D. P., Haaven, J., Lambrick, F., Lindsay, W. R., McVilly, K. R., Sakdalan, J., &
Frize, M. C. J. (2013). ARMIDILO-S manual. http://www.armidilo.net/
Boer, D. P., Hart, S. D., Kropp, P. R., & Webster, C. D. (1997). Manual for the sexual violence
risk-20: Professional guidelines for assessing risk of sexual violence. The Mental Health, Law, &
Policy Institute.
Boer, D. P., McVilly, K. R., & Lambrick, F. (2007). Contextualizing risk in the assessment of
intellectually disabled individuals. Sexual Offender Treatment, 2(2), 1–4.
http://www.sexual-offender-treatment.org/59.html
Boer, D. P., Tough, S., & Haaven, J. (2004). Assessment of risk manageability of intellectually
disabled sex offenders. Journal of Applied Research in Intellectual Disabilities, 17(4), 275–283.
https://doi.org/10.1111/j.1468-3148.2004.00214.x
Bonta, J., & Andrews, D. A. (2007). Risk-Need-Responsivity Model for offender assessment and
rehabilitation. Her Majesty the Queen in Right of Canada.
Declue, G., & Campbell, T. (2013). Calibration performance indicators of the static-99R: 2013
update. Open Access Journal of Forensic Psychology, 5, 82–88.
https://www.oajfp.com/_files/ugd/166e3f_549efdb235474b6eaae789bc6433f8fc.pdf
Drieschner, K. H., & Hesper, B. L. (2008). Dynamic risk outcome scales. Trajectum.
Edens, J. F., & Boccaccini, M. T. (2017). Taking forensic mental health assessment “out of the
lab” and into “the real world”: Introduction to the special issue on the field utility of forensic
assessment instruments and procedures. Psychological Assessment, 29(6), 599–610.
https://doi.org/10.1037/pas0000475
Fazel, S., Singh, J. P., Doll, H., & Grann, M. (2012). Use of risk assessment instruments to
predict violence and antisocial behaviour in 73 samples involving 24 827 people: Systematic
review and meta-analysis. BMJ, 345. https://doi.org/10.1136/bmj.e4692
Garber, C. (1998). MedCalc Software for Statistics in Medicine. Clinical Chemistry, 44(6), 1370.
https://doi.org/10.1093/clinchem/44.6.1370
Hanczar, B., Hua, J., Sima, C., Weinstein, J., Bittner, M., & Dougherty, E. R. (2010). Small-
sample precision of ROC-related estimates. Bioinformatics, 26(6), 822–830.
https://doi.org/10.1093/bioinformatics/btq037
Hanson, R. K. (1997). The development of a brief actuarial risk scale for sexual offense
recidivism. Department of the Solicitor General of Canada.
Hanson, R. K., Sheahan, C. L., & VanZuylen, H. (2013). STATIC-99 and RRASOR predict
recidivism among developmentally delayed sexual offenders: A cumulative meta-analysis.
Sexual Offender Treatment, 8(1), 1–14. http://www.sexual-offender-treatment.org/119.html
Hare, R. D. (2003). Manual for the Revised Psychopathy Checklist (2nd ed.). Multi-Health
Systems.
Harris, A., Phenix, A., Hanson, R. K., & Thornton, D. (2003). Static-99 coding rules revised -
2003. Department of the Solicitor General of Canada.
Helmus, L. M., Hanson, R. K., Murrie, D. C., & Zabarauckas, C. L. (2021). Field validity of
static-99R and STABLE-2007 with 4,433 men serving sentences for sexual offences in British
Columbia: New findings and meta-analysis. Psychological Assessment, 33(7), 581–595.
https://doi.org/10.1037/pas0001010
Helmus, L. M., Thornton, D., Hanson, R. K., & Babchishin, K. M. (2012). Improving the
predictive accuracy of Static-99 and Static-2002 with older sex offenders: Revised age
weights. Sexual Abuse: A Journal of Research and Treatment, 24(1), 64–101.
https://doi.org/10.1177/1079063211409951
Hounsome, J., Whittington, R., Brown, A., Greenhill, B., & McGuire, J. (2018). The structured
assessment of violence risk in adults with intellectual disability: A systematic review. Journal
of Applied Research in Intellectual Disabilities, 31(1), e1–e17.
https://doi.org/10.1111/jar.12295
IBM Corp. (2013). IBM SPSS statistics for windows, version 22.0.
Jeandarme, I., Pouls, C., De Laender, J., Oei, T. I., & Bogaerts, S. (2016). Field validity of the
HCR-20 in forensic medium security units in Flanders. Psychology, Crime & Law, 23(4),
305–322. https://doi.org/10.1080/1068316X.2016.1258467
Lindsay, W. R., & Beail, N. (2004). Risk assessment: Actuarial prediction and clinical judgement
of offending incidents and behaviour for intellectual disability services. Journal of
Applied Research in Intellectual Disabilities, 17(4), 229–234.
https://doi.org/10.1111/j.1468-3148.2004.00212.x
Lofthouse, R. E., Golding, L., Totsika, V., Hastings, R., & Lindsay, W. (2017). How effective are
risk assessments/measures for predicting future aggressive behaviour in adults with intellectual
disabilities (ID): A systematic review and meta-analysis. Clinical Psychology Review,
58, 76–85. https://doi.org/10.1016/j.cpr.2017.10.001
Lofthouse, R. E., Lindsay, W. R., Totsika, V., Hastings, R. P., Boer, D. P., & Haaven, J. L. (2013).
Prospective dynamic assessment of risk of sexual reoffending in individuals with an intellectual
disability and a history of sexual offending behaviour. Journal of Applied Research in
Intellectual Disabilities, 26(5), 394–403. https://doi.org/10.1111/jar.12029
Miller, A. K., Rufino, K. A., Boccaccini, M. T., Jackson, R. L., & Murrie, D. C. (2011). On
individual differences in person perception: Raters’ personality traits relate to their
Psychopathy Checklist-Revised scoring tendencies. Assessment, 18(2), 253–260.
https://doi.org/10.1177/1073191111402460
Murrie, D. C., Boccaccini, M. T., Caperton, J., & Rufino, K. (2012). Field validity of the
Psychopathy Checklist–Revised in sex offender risk assessment. Psychological Assessment,
24(2), 524–529. https://doi.org/10.1037/a0026015
Neal, T. M. S., Miller, S. L., & Shealy, R. C. (2015). A field study of a comprehensive violence
risk assessment battery. Criminal Justice and Behavior, 42(9), 952–968.
https://doi.org/10.1177/0093854815572252
Nijman, H. L. I., Muris, P., Merckelbach, H. L. G. J., Palmstierna, T., Wistedt, B., Vos, A. M.,
van Rixtel, A., & Allertz, W. (1999). The staff observation aggression scale–revised (SOAS-R).
Aggressive Behavior, 25(3), 197–209.
https://doi.org/10.1002/(SICI)1098-2337(1999)25:3<197::AID-AB4>3.0.CO;2-C
Pedersen, L., Rasmussen, K., & Elsass, P. (2012). HCR-20 violence risk assessments as a guide for
treating and managing violence risk in a forensic psychiatric setting. Psychology, Crime &
Law, 18(8), 733–743. https://doi.org/10.1080/1068316X.2010.548814
Phenix, A., Doren, D., Helmus, L., Hanson, R. K., & Thornton, D. (2008). Coding rules for
Static-2002. http://www.static99.org/pdfdocs/static2002codingrules.pdf
Phenix, A., Fernandez, Y., Harris, A. J. R., Helmus, M., Hanson, R. K., & Thornton, D. (2016).
Static-99R coding rules revised – 2016.
http://www.static99.org/pdfdocs/Coding_manual_2016_v2.pdf
Phenix, A., Helmus, L. M., & Hanson, R. K. (2016). Static-99R & Static-2002R evaluators’
workbook. http://www.static99.org/pdfdocs/Evaluators_Workbook_2016-10-19.pdf
Pouls, C., & Jeandarme, I. (2015). Risk assessment and risk management in offenders with
intellectual disabilities: Are we there yet? Journal of Mental Health Research in Intellectual
Disabilities, 8(3–4), 213–236. https://doi.org/10.1080/19315864.2015.1070221
Pouls, C., & Jeandarme, I. (2022). Reliability and validity of the Static-99R in sex offenders with
intellectual disabilities. Journal of Intellectual Disabilities and Offending Behaviour, 13(1),
20–31. https://doi.org/10.1108/JIDOB-08-2021-0013
Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC area,
Cohen’s d, and r. Law and Human Behavior, 29(5), 615–620.
https://doi.org/10.1007/s10979-005-6832-7
Rufino, K. A., Boccaccini, M. T., Hawes, S. W., & Murrie, D. C. (2012). When experts disagreed,
who was correct? A comparison of PCL-R scores from independent raters and opposing
forensic experts. Law and Human Behavior, 36(6), 527–537.
https://doi.org/10.1037/h0093988
Sindall, O. (2012). An exploratory validation study of a risk assessment tool for male sex
offenders with an intellectual disability [Doctoral dissertation, Canterbury Christ Church
University].
https://repository.canterbury.ac.uk/item/86992/an-exploratory-validation-study-of-a-risk-assessment-tool-for-male-sex-offenders-with-an-intellectual-disability
Singh, J. P. (2013). Predictive validity performance indicators in violence risk assessment:
A methodological primer. Behavioral Sciences & the Law, 31(1), 8–22.
https://doi.org/10.1002/bsl.2052
Singh, J. P., Grann, M., & Fazel, S. (2011). A comparative study of violence risk assessment
tools: A systematic review and metaregression analysis of 68 studies involving 25,980
participants. Clinical Psychology Review, 31(3), 499–513.
https://doi.org/10.1016/j.cpr.2010.11.009
Sjöstedt, G., & Grann, M. (2002). Risk assessment: What is being predicted by actuarial
prediction instruments? The International Journal of Forensic Mental Health, 1(2),
179–183. https://doi.org/10.1080/14999013.2002.10471172
Smid, W., Koch, M., & van den Berg, J. W. (2014). STATIC-99R scorehandleiding [Static-99R
scoring manual]. De Forensische Zorgspecialisten [The Forensic Care Specialists].
van Alphen, P. (2016). ARMIDILO: Nederlandse vertaling. Open Universiteit.
Vojt, G., Thomson, L. D. G., & Marshall, L. A. (2013). The predictive validity of the HCR-20
following clinical implementation: Does it work in practice? The Journal of Forensic
Psychiatry & Psychology, 24(3), 371–385. https://doi.org/10.1080/14789949.2013.800894
Webster, C. D., Douglas, K. S., Eaves, D., & Hart, S. D. (1997). HCR-20: Assessing risk for
violence (Version 2). Simon Fraser University and Forensic Psychiatric
Services Commission of British Columbia.
Webster, C. D., Martin, M. L., Brink, J., Nicholls, T. L., & Middleton, C. (2004). Manual for the
Short-Term Assessment of Risk and Treatability (START). Forensic Psychiatric Services
Commission and St. Joseph’s Healthcare.