
Received: 28 September 2018 Revised: 15 November 2018 Accepted: 20 November 2018

DOI: 10.1002/bsl.2392

SPECIAL ISSUE ARTICLE

Machine learning in suicide science: Applications and ethics

Kathryn P. Linthicum | Katherine Musacchio Schafer | Jessica D. Ribeiro

Department of Psychology, Florida State University, Tallahassee, FL 32306‐4301, USA

Correspondence
Jessica D. Ribeiro, Department of Psychology, Florida State University, 1107 W. Call St., Tallahassee, FL 32306‐4031, USA.
Email: ribeiro@psy.fsu.edu

For decades, our ability to predict suicide has remained at near‐chance levels. Machine learning has recently emerged as a promising tool for advancing suicide science, particularly in the domain of suicide prediction. The present review provides an introduction to machine learning and its potential application to open questions in suicide research. Although only a few studies have implemented machine learning for suicide prediction, results to date indicate considerable improvement in accuracy and positive predictive value. Potential barriers to algorithm integration into clinical practice are discussed, as well as attendant ethical issues. Overall, machine learning approaches hold promise for accurate, scalable, and effective suicide risk detection; however, many critical questions and issues remain unexplored.

1 | INTRODUCTION

Though you might not realize it, machine learning is all around you. Chances are, it already exerts considerable influ-
ence over your everyday life: email inboxes sorted into “spam”, “important”, and “social”; optimized navigation routes;
personalized Netflix recommendations; targeted online ads; facial recognition integrated into social media platforms.
Machine learning has also been applied to numerous branches of research (e.g., biology, physics, ecology) and led to
remarkable scientific advances (see, e.g., Carrasquilla & Melko, 2017; Jones et al., 2009; Neumann et al., 2010). We
are now beginning to see its application to the field of suicide science. What does this mean for the future of suicide
prediction? What are the ethical implications of machine learning algorithms in clinical practice? The purpose of this
review is to provide an introductory treatment of these questions. Specifically, we aim to (1) introduce machine learning
and its basic concepts, (2) discuss how machine learning can be used to advance knowledge in suicide science, (3)
review the existing literature using machine learning to advance suicide science, and (4) discuss implications for prac-
tice and attendant ethical issues.


2 | OVERVIEW OF MACHINE LEARNING

2.1 | What is machine learning?

As a subfield of artificial intelligence, machine learning originated in the latter half of the twentieth century and has
strong ties to the fields of computer science and statistics. The first machine learning programs were designed in the
early 1950s to play abstract strategy games (e.g., chess). Samuel (1959) described a checkers‐playing program capable
of learning from its mistakes, becoming increasingly skilled over repeated experiences. In the intervening decades,
research into the field of machine learning has drastically expanded its capabilities and applications.
Broadly defined, machine learning is the study and application of algorithms and systems that can improve
knowledge or performance with experience. A fundamental premise of machine learning is the assumption that
machines can learn from data, recognize patterns in data, and make sense of data with minimal human intervention.
Prior to the advent of machine learning, we relied on humans to identify and specify important relationships within
data. By instead allowing a machine to make sense of the data at the outset, researchers can detect highly complex and potentially meaningful patterns that would be difficult—or even impossible, in some cases—for humans to derive. Machine learning algorithms are also able to change and improve with exposure to new data. This approach to pattern detection holds numerous potential advantages over manually derived approaches to model specification, including efficiency, complexity, and flexibility.

2.2 | Types of machine learning algorithms

Machine learning can be used for many purposes. Generally, we can divide machine learning algorithms into three
broad categories, which vary based on their objectives. These are supervised learning, unsupervised learning, and
reinforcement learning. We briefly discuss each in turn.

2.2.1 | Supervised learning

Supervised learning involves exposing an algorithm to input (i.e., predictors; independent variables) and output (i.e.,
target; dependent variables) data. The output or target is labeled at the outset. The algorithm is then exposed to
example or “training” data. With exposure to training data, the algorithm will define a function that optimally maps
the input variables to the labeled output or target. That is, the algorithm will derive the most accurate and parsimo-
nious relationship between the input variables and the target of interest (i.e., output). Supervision in supervised learn-
ing does not refer to human involvement; rather, supervision refers to the fact that the target values provide a way
for the model to determine when it has achieved the desired task.
Depending on the nature of the output variable, supervised machine learning can be further distinguished as
classification or regression problems. Classification problems refer to cases in which the output variable is a discrete
outcome. Within classification problems, the task for the algorithm involves determining to which outcome category,
or class, an example belongs. A class can have two or more levels, which may or may not be ordinal. Developing
algorithms to determine whether an individual will default on a loan, develop cancer, or attempt suicide would be
examples of classification problems. Supervised learning models can also be used to predict continuous outcomes
(e.g., numeric values). These are called regression problems. Student loan debt, laboratory values, and the number
of days until someone attempts suicide would each be examples of numeric outcomes for regression problems.
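As a concrete illustration of the classification/regression distinction, the sketch below fits a classifier to a discrete outcome and a regressor to a numeric outcome. It is written in Python with scikit-learn on simulated data; the variables, models, and settings are hypothetical and are not drawn from any study reviewed here.

```python
# Minimal sketch: classification vs. regression on simulated data.
# All inputs and targets are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))              # 20 hypothetical input variables

# Classification: discrete target (e.g., whether an event occurs).
y_class = (X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(size=1000)) > 1.5
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y_class)
print(clf.predict_proba(X[:5])[:, 1])        # predicted class probabilities

# Regression: continuous target (e.g., number of days until an event).
y_reg = 30 + 5 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=1000)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y_reg)
print(reg.predict(X[:5]))                    # predicted numeric values
```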

2.2.2 | Unsupervised learning

Unsupervised learning involves exploring the underlying structure of data. In contrast to supervised learning models,
unsupervised learning involves unlabeled data—that is, data for which the underlying organizational structure is
unknown or not assumed. Unsupervised learning can be used to extract potentially meaningful information about
data without the direction of a target outcome variable.
Clustering and dimension reduction represent two common uses of unsupervised learning algorithms. Clustering
is an exploratory technique that is designed to identify potentially meaningful subgroups in the data without prior
knowledge of existing groups. For example, a suicide researcher may be interested in whether clusters of “patient
types” emerge from a general pool of suicidal individuals. Another popular use of unsupervised learning is for
compressing data by reducing the number of variables under consideration to a highly important subset. This tech-
nique is particularly useful when working with high‐dimensional data, which can be computationally costly. Com-
monly, this technique is also applied when preparing the data for analysis. Specifically, it can be used to reduce or
remove statistical noise in the data and/or compress data to reduce computational cost while retaining relevant
information.
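The sketch below illustrates the two unsupervised tasks just described, again in Python on simulated, unlabeled data: k-means clustering to search for candidate subgroups and principal component analysis to compress a larger variable set. The algorithms chosen, the number of clusters, and the data are all illustrative assumptions.

```python
# Minimal sketch: clustering and dimension reduction on unlabeled data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))               # 500 cases, 50 unlabeled variables

# Clustering: assign each case to one of k candidate subgroups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(np.bincount(kmeans.labels_))           # cases per cluster

# Dimension reduction: compress 50 variables into 10 components that retain
# most of the variance, reducing noise and computational cost.
pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```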

2.2.3 | Reinforcement learning

A third type of machine learning is reinforcement learning. The objective of reinforcement learning algorithms is to
develop systems that improve performance based on feedback from the environment. Steps that move an algorithm
closer to its target are selected and propagated forward to the next iteration. Through repeated iterations, the algorithm is reinforced and improves until it reaches an optimal level of performance or a prespecified stopping point (e.g., a maximum number of iterations). An element of randomness is still retained within most algorithms, since unex-
pected or untested moves can result in large gains later and the efficacy of steps can vary with context. Reinforce-
ment learning techniques are popular for problems involving movement towards a specified outcome—goals such
as “win a game of Go” or “translate this phrase correctly” are well suited to the methods.
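A minimal sketch of the reinforcement-learning loop appears below, assuming a very simple simulated environment (a three-action bandit with hypothetical reward probabilities). The agent retains an element of randomness (epsilon) while repeatedly acting, receiving feedback, and updating its value estimates.

```python
# Minimal sketch: an epsilon-greedy agent learning from environmental feedback.
# The reward probabilities are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
true_reward_prob = np.array([0.2, 0.5, 0.8])   # hidden payoff of each action
estimates = np.zeros(3)                        # learned value of each action
counts = np.zeros(3)
epsilon = 0.1                                  # retain some randomness

for step in range(5000):
    # Occasionally explore an untested move; otherwise exploit the best estimate.
    action = rng.integers(3) if rng.random() < epsilon else int(np.argmax(estimates))
    reward = float(rng.random() < true_reward_prob[action])   # feedback from environment
    counts[action] += 1
    # Incremental update: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # converges toward the true reward probabilities
```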

3 | SUPERVISED MACHINE LEARNING: APPLICATIONS FOR SUICIDE SCIENCE

Across the three types of machine learning described above, a multitude of algorithms have been developed. Each
possesses unique strengths and weaknesses (cf. Kotsiantis, Zaharakis, & Pintelas, 2007, for a review). Selecting which
type of machine learning to implement depends heavily on the nature of the question and data at hand. For major
standing questions in the suicide field, supervised machine learning methods will often be most applicable. Accord-
ingly, for the remainder of this paper, we focus on supervised machine learning.
Supervised machine learning methods can be applied to address a range of open questions in suicidology. Chief
among these questions is how we can accurately predict suicidal thoughts and behaviors. Advancing suicide predic-
tion has been the focus of suicide researchers for over five decades. Despite considerable research effort, recent
meta‐analyses have revealed that prediction has been weak (i.e., AUCs = 0.56–0.58) and virtually unchanged since
the inception of longitudinal suicide research (Franklin et al., 2017). No single risk factor or risk assessment approach
has demonstrated clear superiority, with nearly all producing estimates only marginally better than chance. Even risk
factors commonly cited as particularly “strong”—for instance, prior suicidal behavior, depression, hopelessness, or
male sex—are weak predictors (Huang, Ribeiro, Musacchio, & Franklin, 2017; Ribeiro et al., 2016a; Ribeiro, Huang,
Fox, & Franklin, 2018). This is also the case for screeners, risk assessments, and clinical judgement, which often com-
bine multiple predictors in fairly rudimentary ways; although these approaches integrate several risk factors, they
remain weak predictors with poor positive predictive value (Carter et al., 2017; Franklin et al., 2017).
One possible explanation for why prediction has been so poor using conventional methods may lie in how suicide
risk has traditionally been conceptualized and modeled. In reviewing the prediction literature, Franklin et al. (2017)
noted that most studies examined risk factors in isolation (e.g., univariate associations) or within circumscribed sets
of risk factors combined in fairly intuitive and rudimentary ways (e.g., sum scores; two‐ or three‐way interactions).
The implicit assumption of this kind of approach is that the nature of suicide risk is simple and, to some degree, easy
for humans to intuit or conceptually understand. Specifically, the dominant approach assumes that the presence of a
circumscribed set of necessary risk factors combined in a specific way is sufficient to accurately predict suicidal
thoughts and behaviors. This approach may also be an artifact of the fact that, for decades, humans were tasked with
making sense of data. Accordingly, our efforts have largely centered on evaluating cognitively manageable, and often
fairly intuitive, hypotheses. Yet, based on the evidence to date, this approach does not appear to be sufficient to pro-
duce accurate prediction. Instead, it is possible that a more complex conceptualization of risk may be necessary
(Ribeiro et al., 2016b). Conventional statistical approaches used in psychology and psychiatry are not well suited to
model complexity; by contrast, machine learning is optimally suited to do so.
Within suicide prediction, machine learning techniques are increasingly demonstrating an edge in accuracy and scalability relative to our conventional statistical approaches. Machine learning holds four distinct advantages over traditional approaches across these domains. First, machine learning methods alone determine the most accurate
and parsimonious algorithm that maps a target outcome to factors of interest. Although parameters can be adjusted
and set forth by the experimenter, the optimal path through the data is largely determined by the machine. Tradi-
tional approaches, by contrast, require the researcher to determine an algorithm a priori and test its exact specifica-
tions. As a result, these algorithms have been fairly simple, often using a small set of predictors combined in a fairly
rudimentary way. As noted above, given the potential complexity of suicidal thoughts and behaviors, this has repeat-
edly failed to yield accurate prediction (Franklin et al., 2017).
Second, machine learning algorithms can accommodate a large number of factors and simultaneously consider
highly complex combinations among those factors. Given recent advances in computing power, simultaneous consid-
eration of thousands of different factors is now possible within a single machine learning model. Algorithms are also
able to model complex relationships among variables. For reasons discussed above, such complex combinations are
unlikely to result from traditional statistical approaches.
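To make this contrast concrete, the sketch below compares a hand-specified model that uses a few preselected predictors and main effects only with a machine-learned model given hundreds of candidate factors, whose combinations it determines itself. The data are simulated, the outcome-generating process is hypothetical, and neither model represents any published suicide risk algorithm.

```python
# Minimal sketch: a priori-specified model vs. a model that learns complex
# combinations across many candidate factors. Data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 300))                     # hundreds of candidate factors
# Outcome driven by a nonlinear combination of a few of them.
logit = X[:, 0] * X[:, 1] - np.abs(X[:, 2]) + 0.5 * X[:, 3]
y = rng.random(2000) < 1 / (1 + np.exp(-logit))

# "Traditional" approach: researcher pre-selects three predictors, main effects only.
simple = LogisticRegression(max_iter=1000)
print(cross_val_score(simple, X[:, :3], y, scoring="roc_auc", cv=5).mean())

# Machine-learning approach: all factors supplied; combinations learned from the data.
boosted = GradientBoostingClassifier(random_state=0)
print(cross_val_score(boosted, X, y, scoring="roc_auc", cv=5).mean())
```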
Third, and relatedly, machine learning algorithms are well equipped to process high‐dimensional datasets that
include a large number of variables as potential predictors. Applying conventional statistical approaches to such data
is highly vulnerable to overfitting, which occurs when a model capitalizes on the idiosyncrasies (“noise”) of a dataset.
Left unchecked, this can result in overly optimistic estimates of performance. An overfit model will demonstrate
strong performance within the dataset it was developed on but perform poorly once applied to novel datasets.
Guarding against overfitting is therefore a central element of rigorous machine learning work, and most efforts implement several safeguards against it, thereby increasing the likelihood of generalizability.
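The sketch below shows how overfitting manifests and how held-out evaluation guards against it: a flexible model fit to high-dimensional pure noise looks near-perfect on its training data but near-chance on new data. Everything here is simulated and illustrative.

```python
# Minimal sketch: overfitting to noise and its detection via held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1000))             # many more variables than cases
y = rng.integers(0, 2, size=300)             # outcome unrelated to the predictors

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# In-sample performance looks excellent because the model memorized noise ...
print(roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
# ... while performance on held-out data falls back to roughly chance (~0.5).
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```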
Fourth, machine learning algorithms are designed to maximize clinical significance and generalizability. Suicidal
thoughts and behaviors are extremely rare. Because of this, statistical significance does not necessarily equate to clin-
ical significance. Although statistical significance can be achieved using conventional statistical approaches, the accu-
racy, robustness, and clinical utility of the resulting models are severely limited. For instance, in a recent meta‐analysis
of the effects of depression and hopelessness on future suicidal thoughts and behaviors, hopelessness was found to
have a statistically significant weighted odds ratio of 1.98 in the prediction of suicide death (Ribeiro et al., 2018). In
other words, based on the aggregate of prospective research to date, hopelessness essentially doubles the odds of
death by suicide. Although this effect indicates increased risk among individuals experiencing hopelessness, it is
important to consider how meaningful this finding is in the context of the prevalence of suicide death. In the United
States, the prevalence of dying by suicide is approximately 0.00013; hopelessness would double this risk to 0.00026
—a figure that is still essentially 0 (Ribeiro et al., 2018). This illustrates that a statistically significant effect does not
necessarily ensure clinical significance. In machine learning, model performance is evaluated with respect to predic-
tive accuracy rather than just statistical significance, using metrics such as area under the receiver operating curve,
sensitivity, and positive predictive value.
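The sketch below works through both points: first, the base-rate arithmetic from the hopelessness example (an odds ratio of 1.98 applied to a prevalence of 0.00013), and second, the accuracy-oriented metrics named above computed on simulated predictions for a rare outcome. The risk scores and decision threshold are hypothetical.

```python
# Minimal sketch: statistical vs. clinical significance, and accuracy metrics.
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score, precision_score

# (1) An odds ratio of 1.98 barely moves the absolute risk of a very rare outcome.
base_rate = 0.00013
odds = base_rate / (1 - base_rate)
risk_with_hopelessness = (odds * 1.98) / (1 + odds * 1.98)
print(round(risk_with_hopelessness, 6))       # ~0.00026: still essentially zero

# (2) Accuracy-focused metrics on simulated risk scores for a rare outcome.
rng = np.random.default_rng(0)
y_true = rng.random(100_000) < 0.001                   # rare event
scores = 0.3 * y_true + rng.random(100_000) * 0.7      # hypothetical risk scores
y_pred = scores > 0.6                                  # hypothetical decision threshold
print(roc_auc_score(y_true, scores))                   # area under the ROC curve
print(recall_score(y_true, y_pred))                    # sensitivity
print(precision_score(y_true, y_pred))                 # positive predictive value
```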
Beyond predictive accuracy, machine learning, when applied to widely available systems of data, is also more
readily scalable than our conventional approaches to risk detection. We currently rely on assessments that are time,
personnel, and expertise intensive, typically conducted on a one‐to‐one level through interviews or questionnaires.
Accordingly, these approaches are not readily scalable. In the United States, over 5,000 areas are designated as hav-
ing shortages in mental health professionals (Health Resources and Services Administration, 2018). Even if all mental
health providers were adequately trained in suicide risk assessment, there still would not be enough personnel to
meet the growing need for risk detection and monitoring. Compounding this issue is the distribution of suicides
across the United States—many of the areas with the most suicides per year are also rural, with the lowest population densities and the fewest healthcare providers (Nelson, Pomerantz, Howard, & Bushy, 2007). Moreover, many individuals may never interact with a mental health professional at all, as individuals who die by suicide are more likely to have been seen by primary care providers than by mental health professionals prior to their death (Luoma, Martin, & Pearson,
2002). Suicide risk assessment training among primary care providers may be limited; moreover, given the demands
of a primary care visit, time may also be constrained to administer a comprehensive suicide risk assessment. Machine
learning algorithms, by contrast, can be applied to large systems of data that are widely available across healthcare
settings (e.g., electronic health records). These algorithms can then be translated into more accurate and less time‐
intensive tools that can be more readily disseminated on a large scale.

4 | OVERVIEW OF THE EXISTING LITERATURE

The potential utility of computer‐generated algorithms for suicide risk prediction has been discussed for decades,
with the first study discussing the topic published in 1974 by Greist and colleagues. Since then, relatively few machine learning suicide prediction studies have been published, and most of them have appeared in the past decade. Relative to prediction studies, applications of machine learning to cross‐
sectional data are more common (e.g., Fernandes et al., 2018; Hettige et al., 2017). Although this pattern echoes the
broader suicide literature (Franklin et al., 2017), longitudinal study designs are critical to inform risk (Kraemer et al.,
1997). Despite the limited literature base, initial results using machine learning are promising. Below, we provide
an overview of the most rigorous and influential longitudinal machine learning studies focused on suicide prediction.
As noted above, although still a nascent area of research, initial results are promising. Results, particularly from
recent efforts, have provided compelling evidence for the utility of machine learning in advancing our ability to predict
suicidal behaviors. Results from these studies indicate that machine learning can considerably improve prediction accu-
racy, producing accuracy estimates ranging from the high 0.80s to the low 0.90s (Barak‐Corren et al., 2017; Kessler et al., 2015; Walsh,
Ribeiro, & Franklin, 2017, 2018). This is a substantial advance compared with the predictive strength of our conven-
tional approaches to prediction, which on average produce accuracy estimates in the 0.50s (Franklin et al., 2017).
Several studies have also applied machine learning to advance our understanding of how our ability to predict
may change over time. Although some variability exists, in general these efforts suggest that our ability to predict sui-
cidal behaviors may improve as these behaviors become more imminent (Tran, Phung, Luo, & Venkatesh, 2015;
Walsh et al., 2017, 2018). Generally, however, machine learning efforts can produce accurate prediction even years
prior to the event.
Notably, across these efforts, a variety of machine learning algorithms have been used, including regularized
regression, random forest, support vector machines, and neural nets. Across these efforts, comparable accuracy esti-
mates have been achieved using a range of different algorithms. This highlights a critical point: although it is reason-
able to assume that researchers are in search of a singular optimized algorithm, this is not the case. That is, there is no
one “golden ticket” algorithm; instead, what results demonstrate is that a variety of machine learning risk algorithms
can be developed with the ability to accurately predict suicidal thoughts and behaviors. Care should be taken to
develop algorithms that demonstrate robust performance within a target population or given a particular data source,
rather than search for an “ultimate” risk algorithm.
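As an illustration of this point, the sketch below estimates cross-validated AUCs for the four algorithm families named above on the same simulated prediction task. The data, model settings, and any resulting values are illustrative only and do not reproduce any of the cited studies.

```python
# Minimal sketch: several algorithm families applied to one simulated task.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))
logit = X[:, 0] - 0.8 * X[:, 1] + 0.6 * X[:, 2] - 0.4 * X[:, 3]
y = rng.random(2000) < 1 / (1 + np.exp(-logit))

models = {
    "regularized regression": LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "support vector machine": SVC(probability=True, random_state=0),
    "neural net": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, scoring="roc_auc", cv=5).mean()
    print(f"{name}: cross-validated AUC = {auc:.2f}")
```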
The most promising machine learning algorithms for suicide risk detection will have two central features: high
accuracy and high scalability. Within research efforts to date, algorithms that have demonstrated strong performance
in widely available data sources (e.g., electronic health records) are the strongest candidates for large‐scale implemen-
tation in the near future (e.g., Kessler et al., 2015; Walsh et al., 2017, 2018). However, many different algorithms
could be equally effective and scalable, and all require external validation. Further research will be necessary to dis-
seminate this promising approach to suicide risk detection to a variety of settings.
Taken together, these efforts suggest that machine learning is among the most powerful tools we have at our
disposal for advancing suicide prediction. Nevertheless, these efforts represent only the first steps in this direction.
Many critical questions remain unexplored. Limitations of the prior literature can also temper some enthusiasm for
these approaches. Below, we discuss four notable limitations and issues for future study.
First, there is tremendous homogeneity in data sources. Over half of the longitudinal studies to date focus on the
analysis of electronic health record data, with about two‐thirds utilizing data from visits to hospitals or medical cen-
ters. Although the use of electronic records is sensible, given the potential for scalability, there are some drawbacks.
For instance, electronic health record data generally include many variables that are highly nonspecific
to suicide. As it stands, it is unclear whether the complexity observed in machine learning models reflects true com-
plexity in suicide risk or is simply a methodological artifact of the nonspecific nature of the input variables. Machine
learning efforts that include variables specific to suicide can be useful for disambiguating this issue. This in turn would
have important theoretical implications.
Second, there is tremendous homogeneity with respect to population types. Most studies have focused largely
on adult populations, only recently expanding to include adolescent participants (Hill, Oosterhoff, & Kaplow, 2017;
Walsh et al., 2018). Initial results are promising, with prediction accuracies achieved by machine learning models
for adolescents on par with those achieved for adult populations (Hill et al., 2017; Walsh et al., 2017, 2018). No
efforts, to our knowledge, have extended models to older adults, despite this age group being particularly vulnerable
to death by suicide (Curtin, Warner, & Hedegaard, 2016). Given that most older adults die by suicide on first attempt
and many machine learning models include prior suicidal behavior among the predictor variables, it is possible that
these models will not generalize well to this population. In general, sample homogeneity limits our ability to examine
the capability of machine learning to predict over a broader variety of settings and populations.
Third, most studies have examined suicidal behavior – typically nonfatal suicide attempts or suicide death. Only a
single longitudinal study to date has focused on the prediction of suicidal ideation (Hill et al., 2017). Again, this pat-
tern parallels the broader literature on suicide risk (Franklin et al., 2017). Machine learning algorithms may prove use-
ful for examining a broader array of suicidal phenomena. For example, it has been suggested that prison inmates
display increased rates of malingering and exaggerate suicidal symptoms in order to gain access to healthcare or
manipulate others (Correia, 2000). Accurate identification of these individuals may be of high importance to the
healthcare, legal, and forensic systems. However, to our knowledge researchers have yet to apply machine learning
to these related research questions. Future research on algorithms for identifying and predicting a broader range
of suicide‐relevant phenomena would be useful.
Fourth, external validation is critical, but to date has been rare. External validation refers to the process of eval-
uating model performance when applied to novel sources of data. Although most models are built to buffer against
overfitting and increase chances of generalization, many models will still fail or require modification when applied to
new data. The objective of much of this research is to produce tangible tools for providers in order to more accu-
rately detect risk. Model robustness is critical prior to large‐scale implementation. Moving forward, it is critical that
the existing models that show promise be externally validated.
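The sketch below illustrates the external-validation workflow: a model is developed on one simulated data source and then evaluated, without refitting, on a second simulated source with a different case mix. Both "sites" and any performance gap between them are hypothetical.

```python
# Minimal sketch: developing a model at one site and validating it externally.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(n, shift):
    """Simulate a site whose predictor distribution is shifted relative to the original."""
    X = rng.normal(loc=shift, size=(n, 20))
    p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
    return X, rng.random(n) < p

X_dev, y_dev = simulate(5000, shift=0.0)      # development site
X_ext, y_ext = simulate(5000, shift=0.8)      # external site with a different case mix

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
# Internal performance (same source the model was developed on).
print(roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]))
# External performance (novel source, model applied without refitting).
print(roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```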

5 | PRACTICAL AND ETHICAL CONSIDERATIONS

Machine learning holds promise for designing more accurate and effective risk detection and intervention strategies.
Translating risk algorithms into effective clinical tools that can be delivered at point of care is an important next step.
Prior to doing so, however, several important practical and ethical issues must be considered.
Although initial research speaks to the promise of machine learning risk algorithms with respect to accuracy and
scale, a successful risk detection strategy must extend beyond these issues. Specifically, a successful strategy must
also be optimized to the setting in which it will be delivered and the users who will be interfacing with the tool. Sev-
eral barriers exist that can undermine the implementation of any risk assessment approach. Negative beliefs or
expectations from providers about the tool's usefulness may significantly hinder its use and implementation. Providers
require effective and efficient training on how to use the tool. Disruption of existing clinical workflows can also inter-
fere with implementation. Without providing adequate demonstrations of the utility of a machine learning algorithm
and its smooth integration into existing procedures and resources, even the most accurate and scalable tool will fail
(Kilsdonk, Peute, & Jaspers, 2017).
Accurate risk detection is necessary for prevention; however, it is likely insufficient. Beyond detecting individuals
at risk, we must pair accurate and scalable risk detection strategies with effective and scalable interventions. Under-
standing how to link risk scores derived by a machine learning risk algorithm with appropriate treatments requires
additional study. To our knowledge, researchers have yet to test the appropriateness of various risk management
strategies given algorithm‐indicated suicide risk levels. Consideration of what interventions are appropriate or available in a given setting is also critical. For instance, recommendations made in primary care, given limits on time and expertise, may differ in scope from recommendations made in mental health settings. In primary care, attendant recommendations for providers may involve securing a consultation from a mental health professional, whereas
in mental health settings recommendations may be made to assess imminent risk or implement a more specific inter-
vention (e.g., safety planning; means restriction). Consideration should also be given to which interventions will be most effective at given suicide risk levels. Currently, there are no studies examining the effectiveness of
interventions across algorithm‐indicated suicide risk levels. Future research is needed to determine what interven-
tions are most appropriately matched to a given risk prediction level.
The integration of machine learning algorithms into practice, especially in the critical domain of risk detection,
raises important ethical issues. First, the influence of bias on algorithm construction must be assessed and guarded
against. Algorithms reflect the nature of the data collected, which can result in amplifications of societal inequalities.
For example, an algorithm may be designed for use within a hospital system and rely on electronic health records for
input. It might automatically select a specific feature, such as the number of prior medical visits, as a useful predictor of whether or not someone will attempt suicide. However, the number of visits a person has in their record
may be related to many other factors—for example, if an individual is an undocumented immigrant, they may be less
likely to have documented contact with the health system. These types of systematic variance in the input data can
then lead to over‐ or under‐performance of algorithms in different patient populations, potentially perpetuating dis-
crimination in healthcare treatment across populations. Similar issues are already being considered with algorithms
designed and used to predict recidivism, classify child abuse, and determine disability status and funding (e.g., Danks
& London, 2017). Machine learning risk algorithms considered for use in life‐or‐death situations, such as determining
suicide risk, must be rigorously tested and shown to be robust against sources of potential bias.
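One concrete safeguard, sketched below, is to audit a fitted model's performance separately within patient subgroups rather than only in aggregate. The subgroup indicator, the visit-count mechanism, and any performance gap in this example are simulated assumptions, not findings.

```python
# Minimal sketch: auditing model performance within subgroups on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, size=n)                       # hypothetical subgroup indicator
visits = rng.poisson(lam=np.where(group == 0, 6, 2))     # group 1 has fewer recorded visits
other = rng.normal(size=(n, 5))
risk = 1 / (1 + np.exp(-(0.3 * visits + other[:, 0] - 3)))
y = rng.random(n) < risk
X = np.column_stack([visits, other])

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

# Report discrimination and positive predictive value separately by subgroup.
for g in (0, 1):
    mask = g_te == g
    auc = roc_auc_score(y_te[mask], scores[mask])
    ppv = precision_score(y_te[mask], scores[mask] > 0.3)
    print(f"group {g}: AUC = {auc:.2f}, PPV = {ppv:.2f}")
```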
Second, at present, machine learning risk algorithms can speak to who will attempt or die by suicide but not when
someone at risk might act. This gap in knowledge is particularly salient when considering the likelihood of imminent
risk. When suicide risk is determined to be sufficiently elevated to threaten an individual's imminent safety, clinicians
are ethically bound to take actions to intervene. In many cases, this may involve involuntary hospitalization. Arguably,
such determinations are among the most difficult predictions to make. Yet, as it stands, clinicians are still responsible
for determining that level of risk given the constraints of existing algorithms.
Third, although machine learning algorithms have improved the accuracy of suicide risk detection, their perfor-
mance is not perfect. For a number of reasons, perfect prediction using machine learning is unlikely. Accordingly, risk
algorithms will always suffer classification errors—that is, false positives and false negatives. In the case of false pos-
itives, individuals who are not at risk will be classified as being at risk. This misclassification will affect the distribution
of available resources. It will also potentially have negative effects on the individual. More concerning, however, is
the case of false negatives. In such instances, individuals who are at risk will be missed. Although several efforts have
reported much stronger positive predictive values than conventional risk assessment approaches (e.g., Walsh et al.,
2017), the presence of false negatives may persist. In such cases, ethical and legal questions regarding where the
responsibility lies should be considered—if a life is lost to suicide because of a classification error, at least in part,
is it the responsibility of the risk algorithm, its developer, the healthcare setting that adopted the risk algorithm, or
the provider? Similarly, if an algorithm indicates suicide risk that the individual explicitly denies or the clinician does
not agree with, how should the situation be resolved? To what extent a doctor, law enforcement official, or court
should rely on a machine learning risk algorithm to inform the serious decision to involuntarily confine an individual,
and at what risk point, remains an unanswered ethical question.

6 | CONCLUSION

In sum, machine learning stands to be a promising tool for advancing our knowledge in critical domains of suicide sci-
ence. Machine learning techniques have already demonstrated considerable promise in suicide risk prediction, sub-
stantially enhancing accuracy from near‐chance levels (Franklin et al., 2017) to near‐perfect levels (e.g., Kessler
et al., 2015, 2017; Walsh et al., 2017, 2018). Despite its promise, the application of machine learning to suicide sci-
ence is in its infancy. Many questions remain unexplored, and many steps are still necessary before integration into
healthcare settings. As we continue to pursue machine learning applications for clinical practice, we must also remain
attentive to critical ethical questions. Addressing these issues and leveraging advances in machine learning to
improve suicide science will require multidisciplinary teams passionate about reducing the global burden of suicide.

RE FE R ENC ES
Barak‐Corren, Y., Castro, V. M., Javitt, S., Hoffnagle, A. G., Dai, Y., Perlis, R. H., … Reis, B. Y. (2017). Predicting suicidal behav-
ior from longitudinal electronic health records. American Journal of Psychiatry, 174(2), 154–162. https://doi.org/10.1176/
appi.ajp.2016.16010077

Carrasquilla, J., & Melko, R. G. (2017). Machine learning phases of matter. Nature Physics, 13(5), 431–434. https://doi.org/
10.1038/nphys4035

Carter, G., Milner, A., McGill, K., Pirkis, J., Kapur, N., & Spittal, M. J. (2017). Predicting suicidal behaviours using clinical instru-
ments: Systematic review and meta‐analysis of positive predictive values for risk scales. British Journal of Psychiatry,
210(6), 387–395. https://doi.org/10.1192/bjp.bp.116.182717

Correia, K. M. (2000). Suicide assessment in a prison environment: A proposed protocol. Criminal Justice and Behavior, 27(5),
581–599. https://doi.org/10.1177/0093854800027005003

Curtin, S. C., Warner, M., & Hedegaard, H. (2016). Increase in suicide in the United States, 1999–2014: NCHS data brief, no
241. Hyattsville, MD: National Center for Health Statistics.

Danks, D., & London, A. J. (2017, August). Algorithmic bias in autonomous systems. In Proceedings of the Twenty‐Sixth Inter-
national Joint Conference on Artificial Intelligence, pp. 4691–4697.

Fernandes, A. C., Dutta, R., Velupillai, S., Sanyal, J., Stewart, R., & Chandran, D. (2018). Identifying suicide ideation and sui-
cidal attempts in a psychiatric clinical research database using natural language processing. Scientific Reports, 8(1), 7426.
https://doi.org/10.1038/s41598‐018‐25773‐2

Franklin, J. C., Ribeiro, J. D., Fox, K. R., Bentley, K. H., Kleiman, E. M., Huang, X., … Nock, M. K. (2017). Risk factors for suicidal
thoughts and behaviors: A meta‐analysis of 50 years of research. Psychological Bulletin, 143(2), 187–232. https://doi.org/10.1037/bul0000084

Greist, J. H., Gustafson, D. H., Stauss, F. F., Rowse, G. L., Laughren, T. P., & Chiles, J. A. (1974). Suicide risk prediction: A new
approach. Suicide and Life‐threatening Behavior, 4(4), 212–223.

Health Resources and Services Administration (2018). Designated health professional shortage area statistics. Retrieved
from https://ersrs.hrsa.gov/ReportServer?/HGDW_Reports/BCD_HPSA/BCD_HPSA_SCR50_Qtr_Smry_HTML&rc:
Toolbar=false

Hettige, N. C., Nguyen, T. B., Yuan, C., Rajakulendran, T., Baddour, J., Bhagwat, N., … De Luca, V. (2017). Classification of
suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach. General Hos-
pital Psychiatry, 47, 20–28. https://doi.org/10.1016/j.genhosppsych.2017.03.001
Hill, R. M., Oosterhoff, B., & Kaplow, J. B. (2017). Prospective identification of adolescent suicide ideation using classification
tree analysis: Models for community‐based screening. Journal of Consulting and Clinical Psychology, 85(7), 702–711.
https://doi.org/10.1037/ccp0000218
Huang, X., Ribeiro, J. D., Musacchio, K. M., & Franklin, J. C. (2017). Demographics as predictors of suicidal thoughts and
behaviors: A meta‐analysis. PLoS ONE, 12(7), e0180793. https://doi.org/10.1371/journal.pone.0180793
Jones, T. R., Carpenter, A. E., Lamprecht, M. R., Moffat, J., Silver, S. J., Grenier, J. K., … Sabatini, D. M. (2009). Scoring diverse
cellular morphologies in image‐based screens with iterative feedback and machine learning. Proceedings of the National
Academy of Sciences of the United States of America, 106(6), 1826–1831. https://doi.org/10.1073/pnas.0808843106
Kessler, R. C., Stein, M. B., Petukhova, M. V., Bliese, P., Bossarte, R. M., Bromet, E. J., … Bell, A. M. (2017). Predicting suicides
after outpatient mental health visits in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS).
Molecular Psychiatry, 22(4), 544–551. https://doi.org/10.1038/mp.2016.110
Kessler, R. C., Warner, C. H., Ivany, C., Petukhova, M. V., Rose, S., Bromet, E. J., … Fullerton, C. S. (2015). Predicting suicides
after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers
(Army STARRS). JAMA Psychiatry, 72(1), 49–57. https://doi.org/10.1001/jamapsychiatry.2014.1754
Kilsdonk, E., Peute, L. W., & Jaspers, M. W. (2017). Factors influencing implementation success of guideline‐based clinical
decision support systems: A systematic review and gaps analysis. International Journal of Medical Informatics, 98,
56–64. https://doi.org/10.1016/j.ijmedinf.2016.12.001
Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerg-
ing Artificial Intelligence Applications in Computer Engineering, 160, 3–24.
Kraemer, H. C., Kazdin, A. E., Offord, D. R., Kessler, R. C., Jensen, P. S., & Kupfer, D. J. (1997). Coming to terms with the terms
of risk. Archives of General Psychiatry, 54(4), 337–343. https://doi.org/10.1001/archpsyc.1997.01830160065009
Luoma, J. B., Martin, C. E., & Pearson, J. L. (2002). Contact with mental health and primary care providers before suicide: A
review of the evidence. American Journal of Psychiatry, 159(6), 909–916. https://doi.org/10.1176/appi.ajp.159.6.909
Nelson, W., Pomerantz, A., Howard, K., & Bushy, A. (2007). A proposed rural healthcare ethics agenda. Journal of Medical
Ethics, 33(3), 136–139. https://doi.org/10.1136/jme.2006.015966
Neumann, B., Walter, T., Hériché, J. K., Bulkescher, J., Erfle, H., Conrad, C., … Cetin, C. (2010). Phenotypic profiling of the
human genome by time‐lapse microscopy reveals cell division genes. Nature, 464(7289), 721–727. https://doi.org/
10.1038/nature08869
Ribeiro, J. D., Franklin, J. C., Fox, K. R., Bentley, K. H., Kleiman, E. M., Chang, B. P., & Nock, M. K. (2016a). Self‐injurious
thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: a meta‐analysis of longitudinal
studies. Psychological Medicine, 46(2), 225–236. https://doi.org/10.1017/S0033291715001804
Ribeiro, J. D., Franklin, J. C., Fox, K. R., Bentley, K. H., Kleiman, E. M., Chang, B. P., & Nock, M. K. (2016b). Letter to the Editor:
Suicide as a complex classification problem: machine learning and related techniques can advance suicide prediction—a
reply to Roaldset (2016). Psychological Medicine, 46(9), 2009–2010. https://doi.org/10.1017/S0033291716000611
Ribeiro, J. D., Huang, X., Fox, K. R., & Franklin, J. C. (2018). Depression and hopelessness as risk factors for suicide ideation,
attempts and death: Meta‐analysis of longitudinal studies. British Journal of Psychiatry, 212(5), 279–286. https://doi.org/
10.1192/bjp.2018.27
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development,
3(3), 210–229. https://doi.org/10.1147/rd.33.0210
Tran, T., Phung, D., Luo, W., & Venkatesh, S. (2015). Stabilized sparse ordinal regression for medical risk stratification. Knowl-
edge and Information Systems, 43(3), 555–582. https://doi.org/10.1007/s10115‐014‐0740‐4
Walsh, C. G., Ribeiro, J. D., & Franklin, J. C. (2017). Predicting risk of suicide attempts over time through machine learning.
Clinical Psychological Science, 5(3), 457–469. https://doi.org/10.1177/2167702617691560
Walsh, C. G., Ribeiro, J. D., & Franklin, J. C. (2018). Predicting suicide attempts in adolescents with longitudinal clinical data
and machine learning. Journal of Child Psychology and Psychiatry, 59, 1261–1270. https://doi.org/10.1111/jcpp.12916

How to cite this article: Linthicum KP, Schafer KM, Ribeiro JD. Machine learning in suicide science: Applica-
tions and ethics. Behav Sci Law. 2019;1–9. https://doi.org/10.1002/bsl.2392
