
Available online at www.sciencedirect.com

ScienceDirect
Behavior Therapy 53 (2022) 334–347

www.elsevier.com/locate/bt

Chatbot-Delivered Psychotherapy for Adults With Depressive and Anxiety Symptoms: A Systematic Review and Meta-Regression
Shi Min Lim
National University Hospital, National University Health System
Chyi Wey Claudine Shiau
Tan Tock Seng Hospital, National Healthcare Group
Ling Jie Cheng
Saw Swee Hock School of Public Health, National University of Singapore

Ying Lau
Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine,
National University of Singapore

Although psychotherapy is a well-established treatment for depression and anxiety, chatbot-delivered psychotherapy is an emerging field that has yet to be explored in depth. This review aims to (a) examine the effectiveness of chatbot-delivered psychotherapy in improving depressive symptoms among adults with depression or anxiety, and (b) evaluate the preferred features for the design of chatbot-delivered psychotherapy. Eight electronic databases were searched for relevant randomized controlled trials. Meta-analysis and random effects meta-regression were conducted using Comprehensive Meta-Analysis 3.0 software. Overall effect was measured using Hedges’s g and determined using z statistics at significance level p < .05. Assessment of heterogeneity was done using χ² and I² tests. A meta-analysis of 11 trials revealed that chatbot-delivered psychotherapy significantly improved depressive symptoms (g = 0.54, 95% confidence interval [-0.66, -0.42], p < .001). Although no significant subgroup differences were detected, results revealed larger effect sizes for samples of clinically diagnosed anxiety or depression, chatbots with an embodiment, a combination of types of input and output formats, fewer than 10 sessions, problem-solving therapy, off-line platforms, and trials conducted in the United States, compared with their counterparts. Meta-regression did not identify significant covariates that had an impact on depressive symptoms. Chatbot-delivered psychotherapy can be adopted in health care institutions as an alternative treatment for depression and anxiety. More high-quality trials are warranted to confirm the effectiveness of chatbot-delivered psychotherapy on depressive symptoms.

Keywords: artificial intelligence; psychotherapy; depression; meta-analysis; systematic review

We would like to acknowledge authors for sharing their study in our meta-analysis.
PROSPERO registration number: CRD42020153332.
We received no funding from an external source.
SML and CWCS conducted the systematic literature search with help of the senior librarian. SML and CWCS performed the title and abstract screening, data extraction, thematic data analysis, and data synthesis. SML, CWCS, LJC, and YL participated in the conception and the design of the study, the development of the methodology, data management, and reference editing. SML, YL, and LJC wrote and formatted the paper and SML, CWCS, LJC, and YL read and approved the final version of the paper.
Address correspondence to Ying Lau, Ph.D., Level 2, Clinical Research Centre, Block MD11, 10 Medical Drive, Singapore 117597. e-mail: nurly@nus.edu.sg.
0005-7894/© 2021 Association for Behavioral and Cognitive Therapies. Published by Elsevier Ltd.

DEPRESSIVE DISORDERS ARE DISTINGUISHED by pessimism, diminished interest levels, experience of guilt, and depleted self-worth (World Health Organization, 2017), whereas anxiety disorders are distinguished by feelings of excessive worry and fear (World Health Organization, 2017). It
was estimated that adults with depressive disorders surged by 18.4% from 2005 to 2015 (Vos et al., 2016) and adults diagnosed with anxiety disorders increased by 14.9% from 2005 to 2015 (Vos et al., 2016). Comorbidity of depression and anxiety is frequently observed, where 85.0% of depressive patients experienced anxiety symptoms and 90.0% of anxious patients also experienced depressive symptoms (Tiller, 2013). Both disorders are associated with poor social functioning of the individual and the patients are at increased risk of suicide attempts (Heeren et al., 2018). Substantial health care burden and expenditure are placed on national health systems due to the relapsing and recurrent nature of depression and anxiety (Kuyken et al., 2015), which also contribute to losses in worldwide work participation and productivity (Chisholm et al., 2016).

Psychotherapies, including cognitive-behavioral therapy (CBT) and problem-solving therapy (PST), are well supported for the treatment of depression and anxiety (Weitz et al., 2018). CBT posits that dysfunctional cognitions bring about the perpetuation of distressed emotions and maladaptive behaviors (Beck, 1970). CBT techniques target change in dysfunctional cognitions, which subsequently improves emotional distress and maladaptive behaviors (Hofmann et al., 2012). According to D’Zurilla and Goldfried (1971), PST aims to generate individual appraisal of the problem and prompts the individual to identify effective solutions to cope with the problem. PST reduces the perceived severity of the problem by inducing the individual to develop different strategies to counter a problem and enhance his or her internal locus of control (Warmerdam et al., 2010).

Although effectiveness of traditional face-to-face psychotherapy has been established, only a minority of depressed and anxious individuals receive treatment in a timely and adequate manner (Patel et al., 2010) due to treatment barriers related to accessibility, convenience, worries about stigmatization, and costs (Lindhiem et al., 2015). Chatbots are innovative subsets of technological interventions that may be utilized to deliver psychotherapy while overcoming barriers of traditional psychotherapy. Chatbots are automated software that hold a text-based or speech-based dialogue with humans through an interactive interface (Abdul-Kader & Woods, 2015). Chatbots are conducted on platforms independent of time and place, making them widely accessible to individuals (Lin et al., 2017). Anonymity of the user is preserved, allaying worries about stigmatization with regard to the sensitive topics discussed (Schnyder et al., 2017).

The mechanism by which chatbots work can be explained by the persuasive systems design (PSD) model (Scholten et al., 2017). According to Kegel and Wieringa (2014), the PSD model consists of three elements: the intent, the event, and the strategy. First, chatbots identify the event, which is the problem experienced by the user, through analyzing the context of user response. Next, the strategy, consisting of messages that utilize persuasive techniques, is communicated to the user. Technology-mediated persuasive techniques are structured to change the cognitive stance of the individual via the provision of support in four dimensions: “primary task support” that assists users in task accomplishment, “dialogue support” that guides and offers constructive feedback to the user, “credibility support” that confirms the trustworthiness of the feedback generated, and “social support” that encourages the user throughout the behavior change process (Oinas-Kukkonen & Harjumaa, 2009). The intent refers to the overall intended behavior or attitude change that the chatbot aims to achieve. The process of analyzing user response and provision of persuasive messages continues until the intent is achieved. Chatbot-delivered psychotherapy combines the delivery of psychotherapeutic content with a range of persuasive techniques, simulating an interaction similar to a therapeutic conversation (Bendig et al., 2019). Chatbots can be classified according to the presence of embodiment (presence of human features), variation in input and output formats (variation in ways of responding to and receiving responses from), and can be based on online or off-line (without an Internet connection) platforms (Abd-alrazaq et al., 2019).

Given the emerging nature of chatbots in the field of psychotherapy, similar reviews (Abd-alrazaq et al., 2019; Montenegro et al., 2019; Vaidyam et al., 2019) focused on conceptualizing the scope of chatbots and their uses in clinical psychology, instead of evaluating their effects on mental health outcomes. Existing meta-analytic reviews that reported effects on depressive symptoms (Bennett et al., 2019; Moman et al., 2019; Wahle et al., 2017) were restricted to mixed interventions as they included a broad scope of technology-delivered mental health interventions, leading to high statistical heterogeneity in their results. Only one meta-analysis specifically analyzed chatbot-delivered psychotherapy (Twomey et al., 2017), but the included trials involved varying degrees of therapist involvement that induced variability in the efficacy of the intervention. Given the potential that chatbot-delivered psychotherapy has on improving depressive symptoms, further
research must be performed to evaluate their effectiveness. It was hypothesized that chatbot-delivered psychotherapy would improve depressive symptoms in adults with depression or anxiety. The review objectives are to (a) synthesize available evidence that examines the effectiveness of chatbot-delivered psychotherapy in improving depressive symptoms among adults with depression or anxiety, and (b) identify the preferred features in designing chatbot-delivered psychotherapy.

Method

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) was employed in the conduct of this systematic review and meta-analysis (Moher et al., 2009). This review is registered in the PROSPERO database at the Centre for Reviews and Dissemination in the United Kingdom (CRD42020153332).

inclusion and exclusion criteria

We included all types of randomized controlled trials (RCTs) that had adults with subclinical or clinically diagnosed depression, anxiety, or both. Trials were included if they incorporated autonomous chatbots that involved synchronous two-way exchanges with the user, delivered psychotherapeutic content with the intent to treat depression or anxiety, and were conducted via mobile or computer devices. Due to the emerging nature and the associated heterogeneous terminology used to describe chatbot technology, the criteria by McTear et al. (2016a) were used in this review to identify an autonomous chatbot. First, an application interface must be present for users to input their responses (spoken language or text). Second, software present in the chatbot must independently interpret the meaning of words by the user. Third, the chatbot must formulate a response to the user based on the input given (spoken language or text). Comparators included wait-list control, treatment as usual (TAU), or information-only control, all of which lacked chatbot technology. Primary outcome includes depressive symptoms measured at the immediate postintervention period. No limits were in place for the publication year, and only trials reported in the English language were included. Details of the eligibility criteria are found in the supplementary material information.

search strategy

A three-step extensive search was performed in line with the guidelines proposed by the Cochrane Handbook for Systematic Reviews of Interventions (Higgins & Green, 2011). First, we searched the electronic databases PubMed, Embase, Cochrane Library, CINAHL, PsycINFO, Scopus, and IEEE Xplore from study inception until December 15, 2019, and updated on March 18, 2020, to identify eligible trials. A detailed search strategy for eight databases is found in the supplementary material. Second, a search was conducted for unpublished trials from a variety of clinical trial registries. Finally, reference lists of similar systematic reviews and included studies, gray literature, and target journals were hand searched to maximize potential trials. Original authors of relevant trials were reached via e-mail correspondence to request missing information or to seek clarifications. Studies with titles or abstracts that met the selection criteria were selected for full-text evaluation by two independent reviewers. Reasons for exclusion were documented. Interrater agreement using the Cohen kappa test statistic was calculated (McHugh, 2012). A third author was involved for the resolution of any disagreements.

data extraction

Data items extracted for characteristics of included trials consist of author, design, year, country, mean age, nature of sample, intervention, control, sample size, attrition rate, missing data management, intention-to-treat (ITT) analysis, trial registration, published protocol, and grant support. Data items extracted relevant to chatbot-delivered psychotherapy included platform, user input format, chatbot output format, psychotherapy, presence of embodiment, number of sessions, length of session, session frequency, intervention duration, follow-up, and outcomes.

quality assessment

Two authors appraised the quality of the eligible trials independently by utilizing the Cochrane risk-of-bias tool (Higgins & Green, 2011). Six assessment domains were categorized as “high risk,” “low risk,” or “unclear risk,” namely (a) generation of random sequence, (b) concealment of allocation sequence, (c) blinding of participants and personnel, (d) blinding of outcome assessment, (e) incomplete outcome data, and (f) selective reporting. Interrater agreement using the Cohen kappa test statistic was calculated (McHugh, 2012). Any disagreements were resolved by LJC. The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) framework was utilized to evaluate the overall evidence quality across trials (Guyatt et al., 2011). The quality of evidence was appraised based on five evidence factors: (a) methodological limitations, (b) inconsistency, (c)
indirectness, (d) imprecision, and (e) publication bias. Each factor was rated “not serious,” “serious,” or “very serious.”

statistical analysis

Comprehensive Meta-Analysis Software (Version 3; Borenstein et al., 2013) was used to conduct the meta-analyses by pooling data of the same outcomes under a random effects model. A random effects model was adopted as it accounts for the statistical assumption of variation in the estimation of the mean across all trials (Borenstein et al., 2010). Z statistics at a significance level of p < .05 were used to analyze the overall effect. Pooled mean effect sizes of continuous outcome scores were estimated using Hedges’s g, as it provides a more accurate estimation of the overall effect size when the included trials have small sample sizes (Rosenthal et al., 1994). Effect size was interpreted as small (0.2), medium (0.5), large (0.8), and very large (1.2; Hedges & Olkin, 2014). The inverse-variance statistical method was utilized to analyze continuous outcomes (Higgins & Green, 2011). Statistical heterogeneity was evaluated based on Cochran’s Q test, where p < .10 indicates the presence of significant statistical heterogeneity (Higgins & Green, 2011). The I² statistic was used to quantify the degree of heterogeneity, where I² indicates the variation percentage between the sample estimates of trials that is due to heterogeneity instead of sampling error (Sedgwick, 2013). The I² was interpreted as 0–40% (might not be important), 30–60% (moderate), 50–90% (substantial), and 75–100% (considerable; Higgins & Green, 2011).

Sensitivity analyses were applied to identify heterogeneous trials that were removed to maintain overall homogeneity (Higgins & Green, 2011). Subgroup analyses were conducted to reduce overall heterogeneity across studies (I² > 40%; Sedgwick, 2013) and to compare the treatment effects among features of intervention (Borenstein & Higgins, 2013). Predefined subgroups included the nature of sample, type of comparator, type of psychotherapy, input format, output format, chatbot embodiment, intervention duration, and platform. Publication bias was examined using visual inspection of asymmetry of funnel plot and Egger’s test (Egger et al., 1997) when the review included 10 or more trials (Sterne et al., 2011). Meta-regression was conducted to explain whether between-trial heterogeneity was attributed to covariates (Borenstein et al., 2011). A random effects multivariate meta-regression model was conducted to assess whether year of publication, mean age of sample, duration of intervention, and sample size influenced the effect size of chatbot-delivered psychotherapy. We adopted a p < .05 significance level for random effects meta-regression analysis.

Results

study selection

The PRISMA flow diagram depicts the process of trial selection as shown in Figure 1. A total of 11 RCTs were selected for meta-analysis. Interrater agreement of .79 was calculated using Cohen’s kappa for trial selection, which indicates considerable agreement between two authors regarding trial inclusion (McHugh, 2012). Included trials are presented in Table 1 and are marked with an asterisk (*) in the reference list. The reasons for considering each intervention to be chatbot technology in this review are included in the supplementary material information.

study characteristics

Table 1 summarizes the trial characteristics among 1,099 participants from 2009 (Meyer et al., 2009) to 2017 (Berger et al., 2017; Fitzpatrick et al., 2017; Zwerenz et al., 2017) in Switzerland (Berger et al., 2011, 2017), the United Kingdom (Burton et al., 2016), the United States (Cartreine et al., 2012; Fitzpatrick et al., 2017; Sandoval et al., 2017), and Germany (Meyer et al., 2009, 2015; Moritz et al., 2012; Schröder et al., 2014; Zwerenz et al., 2017). Sample sizes ranged from 14 (Cartreine et al., 2012) to 396 (Meyer et al., 2009). Seven trials included adults with a clinical diagnosis of depression or anxiety, while four trials included adults with a subclinical disorder of depression, anxiety, or both. Three trials had chatbots with embodiment. Two trials utilized CBT only, while two trials utilized PST only. Three trials had chatbots that were delivered via off-line platforms. Three trials had chatbots with response options and written text as the input format. Three trials had chatbots that used a combination of written text, speech, and gestures as the output format. The intervention duration was between 2 weeks (Fitzpatrick et al., 2017) and 12 weeks (Meyer et al., 2015; Zwerenz et al., 2017).

Seven trials had an overall unclear risk of bias attributed mainly to insufficient details on the domain of performance bias, while four trials had an overall high risk of bias attributed mainly to the presence of attrition bias. Interrater agreement of .81 was calculated using Cohen’s kappa for risk of bias assessment of individual trials, which indicates near perfect agreement (McHugh, 2012).
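Interrater agreement values such as the .79 and .81 reported above are computed from two raters' paired decisions. The following is a minimal sketch of Cohen's kappa, assuming hypothetical include/exclude decisions; the ratings and the function name `cohen_kappa` are invented for illustration and are not the review's screening data or software.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical screening decisions for 10 records (illustrative only).
reviewer_1 = ["in", "in", "out", "out", "out", "in", "out", "out", "in", "out"]
reviewer_2 = ["in", "in", "out", "out", "in", "in", "out", "out", "out", "out"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 records, but kappa is lower than 0.8 because part of that agreement would be expected by chance alone.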

Identification
  5,180 records identified from database searching (till 31st December 2019):
    PubMed (N = 879), EMBASE (N = 1072), CINAHL (N = 347), Cochrane (N = 1440), PsycINFO (N = 523), Scopus (N = 655), ProQuest Dissertations and Theses (N = 86), IEEE Xplore (N = 178)
  Additional records identified through reference list searching (N = 2)
  2,889 records were identified as duplicates and removed using EndNote software

Screening
  Records screened (N = 2293)
  Reasons for exclusion of 2,220 records:
    Irrelevant title (N = 1825)
    Irrelevant abstract (N = 395)

Eligibility
  Full-text articles assessed for eligibility (N = 73)
  Reasons for exclusion of 62 full-text articles:
    No chatbot (N = 27)
    Human support involved (N = 13)
    Population without depressive or anxiety symptoms (N = 11)
    Non-RCT (N = 3)
    Study protocol (N = 3)
    Abstract only (N = 3)

Included
  Trials included in meta-analysis (N = 11)

FIGURE 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram.
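The counts in Figure 1 can be checked with simple arithmetic. The sketch below only re-adds the numbers as they appear in this text; the variable names are illustrative. Note that the itemized full-text exclusion reasons, as transcribed here, sum to 60 rather than the stated 62, which may reflect a reason lost from the figure in this copy.

```python
# Counts transcribed from Figure 1 (PRISMA flow diagram).
DATABASE_HITS = {
    "PubMed": 879, "EMBASE": 1072, "CINAHL": 347, "Cochrane": 1440,
    "PsycINFO": 523, "Scopus": 655,
    "ProQuest Dissertations and Theses": 86, "IEEE Xplore": 178,
}
ADDITIONAL_RECORDS = 2       # from reference list searching
DUPLICATES_REMOVED = 2889
SCREENING_EXCLUSIONS = {"irrelevant title": 1825, "irrelevant abstract": 395}
FULL_TEXT_EXCLUDED = 62      # total stated in the figure
FULL_TEXT_REASONS = {
    "no chatbot": 27, "human support involved": 13,
    "population without depressive or anxiety symptoms": 11,
    "non-RCT": 3, "study protocol": 3, "abstract only": 3,
}

identified = sum(DATABASE_HITS.values())
screened = identified + ADDITIONAL_RECORDS - DUPLICATES_REMOVED
full_text = screened - sum(SCREENING_EXCLUSIONS.values())
included = full_text - FULL_TEXT_EXCLUDED
itemized_reasons = sum(FULL_TEXT_REASONS.values())

print(identified, screened, full_text, included, itemized_reasons)
```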

depressive symptoms

A meta-analysis on depressive symptoms involving 1,099 participants was conducted in the selected 11 trials as shown in Figure 2. Compared to control conditions, results of the meta-analysis showed a significant improvement in depressive symptom scores (z = 8.64, p < .001), with a medium effect size (g = 0.54), favoring chatbot-delivered psychotherapy using a random effects model. Sensitivity analysis was conducted to exclude trials with a high risk of bias (Lundh & Gøtzsche, 2008). Results after the sensitivity analysis indicated minimal change in depressive symptom scores (z = 7.16, p < .001) and effect size (g = 0.58), favoring chatbot-delivered psychotherapy.

subgroup analyses

Subgroup analyses discovered larger effect sizes for chatbots with embodiment (g = 0.88), using off-
Table 1
Characteristics of Included Randomized Controlled Trials

Author (year)/country | Design | Nature of sample/mean age ± SD | Sample size | Intervention (name) | Control | Outcomes (measures) | Attrition rate (%) | ITT/missing data management | Protocol/registration/grant support
Berger et al. (2011)/Switzerland | 3-arm RCT | Adults diagnosed with MDD or dysthymia/38.80 ± 14.0 | T: 51; I: 25; C: 26 | Unguided Internet-based program (Deprexis) | Wait-list control | Depressive symptoms (BDI-II) | 13.7 | Yes/yes | No/no/yes
Berger et al. (2017)/Switzerland | 2-arm RCT | Adults diagnosed with SAD, PDA, or GAD/41.95 ± NR | T: 139; I: 70; C: 69 | Unguided Internet intervention (Velibra) | TAU | Depressive symptoms (BDI-II) | 13.7 | No/no | No/yes/yes
Burton et al. (2016)/United Kingdom | 2-arm RCT | Adults diagnosed with MDD/38.65 ± NR | T: 28; I: 14; C: 14 | Embodied virtual agent-based system (Help4Mood) | TAU | Depressive symptoms (BDI-II) | 25.0 | No/no | No/yes/yes
Cartreine et al. (2012)/United States | 2-arm RCT | Adults diagnosed with minor depression/50.40 ± NR | T: 14; I: 7; C: 7 | Electronic problem-solving treatment (ePST) | Wait-list control | Depressive symptoms (HDI) | 21.4 | Yes/yes | No/yes/yes
Fitzpatrick et al. (2017)/United States | 2-arm RCT | Adults with subclinical anxiety and depressive symptoms/22.21 ± NR | T: 70; I: 34; C: 36 | Fully automated conversational agent (Woebot) | Information only | Depressive symptoms (PHQ-9) | 20.0 | Yes/yes | No/no/yes
Meyer et al. (2009)/Germany | 2-arm RCT | Adults with subclinical depression/34.76 ± 11.60 | T: 396; I: 320; C: 76 | Integrative online treatment (Deprexis) | Wait-list control | Depressive symptoms (BDI) | 45.5 | No/yes | No/yes/yes
Meyer et al. (2015)/Germany | 2-arm RCT | Adults diagnosed with MDD or dysthymia/42.00 ± 11.39 | T: 163; I: 78; C: 85 | Internet-based treatment (Deprexis) | TAU | Depressive symptoms (PHQ-9) | 17.8 | Yes/yes | No/yes/yes
Moritz et al. (2012)/Germany | 2-arm RCT | Adults with subclinical depression/38.60 ± NR | T: 210; I: 105; C: 105 | Online self-help program for depression (Deprexis) | Wait-list control | Depressive symptoms (BDI) | 19.0 | No/no | No/yes/no
Sandoval et al. (2017)/United States | 2-arm RCT | Adults diagnosed with MDD or dysthymic disorder/28.60 ± NR | T: 45; I: 25; C: 20 | Interactive media-based problem-solving therapy (imbPST) | TAU | Depressive symptoms (HSCL) | 0 | No/no | No/no/yes
Schröder et al. (2014)/Germany | 2-arm RCT | Adults with subclinical depression and epilepsy/37.53 ± NR | T: 78; I: 38; C: 40 | Psychological online intervention for depression (Deprexis) | Wait-list control | Depressive symptoms (BDI-I) | 26.9 | No/no | No/yes/no
Zwerenz et al. (2017)/Germany | 2-arm RCT | Adults diagnosed with depression/47.98 ± 9.79 | T: 229; I: 115; C: 114 | Online self-help (Deprexis) | Information only | Depressive symptoms (BDI-II) | 6.11 | Yes/yes | Yes/yes/yes

Note. SD = standard deviation; ITT = intent to treat; RCT = randomized controlled trials; MDD = major depressive disorder; T = total; I = intervention; C = control; BDI = Beck Depression Inventory; SAD = social anxiety disorder; PDA = panic disorder with or without agoraphobia; GAD = generalized anxiety disorder; NR = not reported; HDI = Hamilton Depression Inventory; PHQ-9 = Patient Health Questionnaire–9; TAU = treatment as usual; HSCL = Hopkins Symptom Checklist.

Study name | Hedges's g | Z-value | p-value | Sample size: chatbot psychotherapy | Sample size: control | Year of publication | Age | Country
Berger et al (2011) | -0.65 | -2.31 | 0.02 | 25 | 26 | 2011 | 38.80 | Europe
Berger et al (2017) | -0.56 | -3.05 | 0.00 | 57 | 63 | 2017 | 41.95 | Europe
Burton et al (2016) | -0.47 | -1.09 | 0.28 | 12 | 9 | 2016 | 38.65 | Europe
Cartreine et al (2012) | -1.24 | -2.25 | 0.02 | 7 | 7 | 2012 | 50.40 | US
Fitzpatrick et al (2017) | -0.62 | -2.30 | 0.02 | 31 | 25 | 2017 | 22.21 | US
Meyer et al (2009) | -0.64 | -4.06 | 0.00 | 159 | 57 | 2009 | 34.76 | Europe
Meyer et al (2015) | -0.57 | -3.22 | 0.00 | 61 | 73 | 2015 | 42.00 | Europe
Moritz et al (2012) | -0.43 | -2.78 | 0.01 | 80 | 90 | 2012 | 38.60 | Europe
Sandoval et al (2016) | -0.98 | -3.14 | 0.00 | 25 | 20 | 2016 | 28.60 | US
Schroder et al (2014) | -0.22 | -0.82 | 0.41 | 25 | 32 | 2014 | 37.53 | Europe
Zwerenz et al (2017) | -0.44 | -3.20 | 0.00 | 108 | 107 | 2017 | 47.98 | Europe
Pooled | -0.54 | -8.64 | 0.00

(Forest plot axis: Hedges's g and 95% CI, from -2.00 to 2.00; axis labels: Chatbot Psychotherapy, Control.)

FIGURE 2 Forest plot of effect size (Hedges’s g) in depressive symptoms scores for Chatbot psychotherapy and control group.
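The pooled row of Figure 2 can be approximately re-derived from the per-study rows. The sketch below is an illustration under stated assumptions, not the authors' procedure (they used Comprehensive Meta-Analysis software): each standard error is recovered as g / z from the rounded printed values, and the between-trial variance uses the DerSimonian-Laird estimator, which is an assumption here. The names `FOREST_ROWS` and `pool_random_effects` are hypothetical.

```python
import math

# (study, Hedges's g, z) as printed in Figure 2. The printed values are
# rounded, so standard errors recovered as g / z are approximations.
FOREST_ROWS = [
    ("Berger et al (2011)", -0.65, -2.31),
    ("Berger et al (2017)", -0.56, -3.05),
    ("Burton et al (2016)", -0.47, -1.09),
    ("Cartreine et al (2012)", -1.24, -2.25),
    ("Fitzpatrick et al (2017)", -0.62, -2.30),
    ("Meyer et al (2009)", -0.64, -4.06),
    ("Meyer et al (2015)", -0.57, -3.22),
    ("Moritz et al (2012)", -0.43, -2.78),
    ("Sandoval et al (2016)", -0.98, -3.14),
    ("Schroder et al (2014)", -0.22, -0.82),
    ("Zwerenz et al (2017)", -0.44, -3.20),
]


def pool_random_effects(rows):
    """Inverse-variance pooling with a DerSimonian-Laird tau^2 (assumed estimator)."""
    effects = [g for _, g, _ in rows]
    variances = [(g / z) ** 2 for _, g, z in rows]  # SE = g / z, variance = SE^2
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * g for w, g in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (g - fixed) ** 2 for w, g in zip(weights, effects))
    df = len(rows) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird between-trial variance, floored at 0.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * g for w, g in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {
        "g": pooled, "se": se, "z": pooled / se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q, "i2": i2, "tau2": tau2,
    }


result = pool_random_effects(FOREST_ROWS)
print({k: (round(v, 3) if isinstance(v, float) else v) for k, v in result.items()})
```

With these rounded inputs the recomputed pooled g and z land close to the -0.54 and -8.64 printed in the pooled row. Q falls below its degrees of freedom, so tau² and I² are floored at zero and the random effects and fixed effect estimates coincide in this sketch.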

line platforms (g = 0.88), delivering PST (g = 1.05), having combined forms of input format (g = 0.84), fewer than 10 sessions (g = 0.75), and combined forms of output format (g = 0.88). Results are presented in Table 2. In addition, effect sizes were larger among samples with clinically diagnosed anxiety (g = 0.56) or depression (g = 0.55) and among trials conducted in the United States (g = 0.83). All of the analyses revealed statistically significant effects for chatbot-delivered psychotherapy. No significant subgroup differences were detected.

meta-regression

We conducted a random effect meta-regression to evaluate the impact of covariates on the effect size of depressive symptoms. We included the following covariates: year of publication, mean age of sample, intervention duration, and sample size, as shown in Table 3. Our findings suggested that the year of publication (b = .01, p = .67), mean age of sample (b = .01, p = .38), intervention duration (b = .02, p = .47), and sample size (b < .001, p = .46) had no impact on depressive symptoms.

overall evidence assessment

The overall quality of evidence for depressive symptoms is rated as low for the RCTs as per the GRADE criteria. The risk-of-bias factor was downgraded due to a high risk for performance bias. The imprecision factor was downgraded due to small sample sizes present in more than half of the included trials. The GRADE Summary of Findings table is found in the supplementary material information. Publication bias was not detected for trials that reported depressive symptoms as observed by the symmetrical distribution of included trials on the funnel plot (Egger’s test, p = .07) as presented in Supplementary material 9.

Discussion

Our review showed significant improvements on depressive symptoms of medium effect size. Heterogeneity was low and no publication bias was detected. Random effect meta-regression did not identify any impact of the covariates on effect size.

quality assessment

The risk of bias domain was downgraded owing to the presence of performance bias in the majority of included trials. Performance bias arose from the lack of blinding of participants in all included trials. Due to the lack of blinding of participants, the placebo effect cannot be ruled out (Charlesworth et al., 2017). The imprecision domain was downgraded owing to the presence of small sample sizes in the majority of included trials, which diminishes the probability of discovering true effects (Button et al., 2013). ITT analyses were utilized in only 5 out of 11 trials. According to the Consolidated Standards of Reporting Trials (CONSORT) statement, all RCTs should adopt ITT analyses and evaluate results according to participants’ original group assignments. Failure to perform ITT analyses can lead to the disruption of prognostic balance among treatment allocation groups and the overestimation of treatment effects (Abraha et al., 2015).

effectiveness on depressive symptoms

Our meta-analysis found that chatbot-delivered psychotherapy significantly reduced depressive symptoms among adults with depression or anxiety. Our review included only trials that contained chatbots as part of the intervention, which can be considered a step forward in literature. Chatbot-delivered psychotherapy is a form of guided computerized self-help in the absence of human inter-
Table 2
Subgroup Analyses of Chatbot-Delivered Psychotherapy for Depressive Symptoms

Design | Subgroups | Number of trials (ref) | Sample size | I² (%) | g | Overall effect (p value) | Subgroup differences (p value)
Population | Clinical diagnosis | 7 (a–d, g, i, k) | 600 | 0 | 0.57 | z = 6.86 (p < .001) | Q = 0.27 (p = .601)
Population | Subclinical disorder | 4 (e, f, h, j) | 499 | 0 | 0.50 | z = 5.28 (p < .001) |
Nature of sample | Anxiety only | 1 (b) | 120 | 0 | 0.56 | z = 3.05 (p < .001) | Q = 0.43 (p = .81)
Nature of sample | Depression with comorbidities | 2 (e, j) | 113 | 13.78 | 0.42 | z = 2.04 (p = .04) |
Nature of sample | Depression only | 8 (a, c, d, f, g, h, i, k) | 866 | 0 | 0.55 | z = 7.82 (p < .001) |
Region | Europe | 8 (a, b, c, f, g, h, j, k) | 984 | 0 | 0.51 | z = 7.65 (p < .001) | Q = 2.62 (p = .11)
Region | United States | 3 (d, e, i) | 115 | 0 | 0.83 | z = 4.34 (p < .001) |
Embodiment | With embodiment | 3 (c, d, i) | 80 | 0 | 0.88 | z = 3.83 (p < .001) | Q = 2.37 (p = .124)
Embodiment | Without embodiment | 8 (a, b, e–h, j, k) | 1,019 | 0 | 0.51 | z = 7.90 (p < .001) |
Platform | Online | 8 (a, b, e–h, j, k) | 1,019 | 0 | 0.51 | z = 7.90 (p < .001) | Q = 2.37 (p = .124)
Platform | Offline | 3 (c, d, i) | 80 | 0 | 0.88 | z = 3.83 (p < .001) |
Psychotherapeutic content | CBT only | 2 (c, e) | 77 | 0 | 0.58 | z = 2.53 (p = .01) | Q = 3.74 (p = .154)
Psychotherapeutic content | PST only | 2 (d, i) | 59 | 0 | 1.05 | z = 3.84 (p < .001) |
Psychotherapeutic content | Mixed psychotherapy | 7 (a, b, f–h, j, k) | 963 | 0 | 0.51 | z = 7.57 (p < .001) |
Input format | Response options only | 8 (a–c, f–h, j, k) | 984 | 0 | 0.51 | z = 7.65 (p < .001) | Q = 2.62 (p = .105)
Input format | Response options + written | 3 (d, e, i) | 115 | 0 | 0.84 | z = 4.34 (p < .001) |
Output format | Written only | 8 (a, b, e, f–h, j, k) | 1,019 | 0 | 0.51 | z = 7.90 (p < .001) | Q = 2.37 (p = .124)
Output format | Written + spoken + gestures | 3 (c, d, i) | 80 | 0 | 0.88 | z = 3.83 (p < .001) |
Number of sessions | <10 | 3 (b, d, i) | 179 | 13.20 | 0.75 | z = 4.23 (p < .001) | Q = 1.73 (p = .42)
Number of sessions | ≥10 | 6 (a, f, g, h, j, k) | 843 | 0 | 0.50 | z = 6.94 (p < .001) |
Number of sessions | Not reported | 2 (c, e) | 77 | 0 | 0.58 | z = 2.53 (p = .012) |

Note. I² = I² statistic; g = Hedges’ g; Q = Cochran’s Q statistic; z = z statistics; CBT = cognitive-behavioral therapy; PST = problem-solving therapy.
a Berger et al. (2011).
b Berger et al. (2017).
c Burton et al. (2016).
d Cartreine et al. (2012).
e Fitzpatrick et al. (2017).
f Meyer et al. (2009).
g Meyer et al. (2015).
h Moritz et al. (2012).
i Sandoval et al. (2017).
j Schröder et al. (2014).
k Zwerenz et al. (2017).
Table 3
Random Effects Meta-Regression Models of Chatbot-Delivered Psychotherapy by Various Covariates

Covariate | b | Standard error | 95% lower | 95% upper | z | p value
Year of publication | 0.01 | 0.02 | -0.03 | 0.05 | 0.43 | .67
Mean age of sample | 0.01 | 0.01 | -0.01 | 0.03 | 0.87 | .38
Duration of intervention | 0.02 | 0.03 | -0.03 | 0.07 | 0.72 | .47
Sample size | <0.001 | <0.001 | <0.001 | <0.001 | 0.72 | .46

Note. b = regression coefficient; z = z statistics.
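To illustrate what a meta-regression coefficient in Table 3 represents, the sketch below regresses the Figure 2 effect sizes on year of publication using inverse-variance weighted least squares. Several simplifications are assumed: a single covariate rather than the authors' model, fixed-effect weights (consistent with the I² of 0 in Table 2), standard errors recovered as g / z from rounded values, and years taken from the Figure 2 column. The name `meta_regress` is hypothetical.

```python
import math

# (year of publication, Hedges's g, z) as printed in Figure 2 (rounded values).
ROWS = [
    (2011, -0.65, -2.31), (2017, -0.56, -3.05), (2016, -0.47, -1.09),
    (2012, -1.24, -2.25), (2017, -0.62, -2.30), (2009, -0.64, -4.06),
    (2015, -0.57, -3.22), (2012, -0.43, -2.78), (2016, -0.98, -3.14),
    (2014, -0.22, -0.82), (2017, -0.44, -3.20),
]


def meta_regress(rows):
    """Weighted least squares slope of effect size on one covariate."""
    xs = [float(x) for x, _, _ in rows]
    ys = [g for _, g, _ in rows]
    ws = [(z / g) ** 2 for _, g, z in rows]  # inverse variance, SE = g / z
    sum_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    # With known within-trial variances, the slope's variance is 1 / Sxx.
    se = math.sqrt(1.0 / sxx)
    z_stat = slope / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))  # two-sided normal p
    return slope, se, z_stat, p_value


slope, se, z_stat, p_value = meta_regress(ROWS)
print(f"b = {slope:.3f}, SE = {se:.3f}, z = {z_stat:.2f}, p = {p_value:.2f}")
```

Under these assumptions the slope and standard error come out near the b = 0.01 and SE = 0.02 shown for year of publication, with a nonsignificant p value, in line with the conclusion that publication year did not explain variation in effect size.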

vention (McTear et al., 2016b). Previous reviews found comparable effects between traditional face-to-face psychotherapy and computerized psychotherapy (Carlbring et al., 2018). In addition, previous reviews have shown that guided psychotherapy is more effective than unguided psychotherapy, which is attributed to external pressure fostering individual accountability and positive reinforcements from the source of guidance (i.e., therapist; Alfonsson et al., 2017).

Chatbots offer a solution to the shortage of mental health workers, as they are able to function autonomously while providing guided psychotherapy via automated script-based dialogue (Oh et al., 2017), and are suggested to form therapeutic alliances with individuals similar to those formed with human therapists (Kiluk et al., 2014). Previous meta-analyses (Twomey et al., 2017; Wahle et al., 2017) that included chatbots as part of the intervention reported effect sizes similar to those in our review, which demonstrates the consistency of effect sizes when chatbots are adopted as part of the intervention. The meta-analyses of Moman et al. (2019) and Bennett et al. (2019) that included various technological interventions (discussion forums, online workbooks, video games) but did not involve chatbots reported lower effect sizes.

One possible reason for the lower effect sizes may be the superiority of chatbot technology over other forms of technological interventions, with the absence of chatbot intervention reflected in the lower effect sizes. Another possible reason for the lower effect sizes may be the large heterogeneity among the trial interventions. Clinically heterogeneous trials contribute to the variability of effect estimates (Kriston, 2013). Another reason for the improvement in depressive symptoms may be the incorporation of the well-established theoretical frameworks of CBT and PST for the treatment of depression and anxiety (Cuijpers et al., 2018; Newby et al., 2016) into scripts programmed within the chatbot software that allow for conversations that mimic therapeutic discussions (D’Alfonso et al., 2017). These scripts include techniques of cognitive restructuring and behavioral activation that improve negative cognitions (Fernández & Mairal, 2017), while techniques of problem appraisal and problem orientation reduce perceived problem severity (Bell & D’Zurilla, 2009). Both reduction in dysfunctional attitudes and increased feelings of control played a mediational role within CBT and PST that in turn improved depressive symptoms (Warmerdam et al., 2010).

Preferred Features of Chatbot-Delivered Psychotherapy

Our subgroup analyses highlighted that embodied chatbots achieve more favorable effect sizes. Multiple studies (Araujo, 2018; Go & Sundar, 2019; Schroeder et al., 2013) also supported the notion that users preferred to interact with an embodied agent instead of a text-only interface. Araujo (2018) suggested that embodiment increases the perception of human likeness and aids in the simulation of human-to-human interaction. One possible reason for the effectiveness of this feature is that the presence of embodiment increases the credibility of the chatbot and induces a feeling of social partnership in the user (Mayer & DaPra, 2012).

Our subgroup analyses also suggested that users preferred to interact with chatbots that utilize a combination of input and output formats. In line with the review by Montenegro et al. (2019), the combination of having response options and a written format as a form of input introduces dialogue variability into the interaction between user and chatbot. The combination of different types of input format avoids perceived repetitiveness and subsequent question fatigue in the users (Henrichsen & Allwood, 2013). In addition, the use of combined types of output formats (written, spoken, and gestures) aids in the establishment of a level of realism, especially when new technologies are involved (Montenegro et al., 2019). Having a combination of types of output format allows for greater expressivity of the chatbot to deliver information, where the inclusion of nonverbal behavior (such as facial expression, gaze, gestures, and postures) powerfully influences feelings of rapport between chatbot and user (Burgoon et al., 2016).
Congruent with our subgroup analyses favoring off-line platforms, the study by Heer et al. (2011) recognizes the security challenges faced by utilizing the Internet as a platform for information exchange, such as the leaking of confidential information or instances of identity theft. Off-line platforms offer greater user privacy by virtue of being separate from the Internet, a global network in which anyone can participate and access the personal information disclosed (Vitak, 2012). Future trials should seek to validate this finding given the increased utilization of computer technologies within mental health interventions.

Our subgroup analyses observed that PST achieved a greater intervention effect compared to CBT. Our findings are consistent with the review by Zhang et al. (2018) that demonstrated higher effect sizes of PST compared to CBT for the treatment of primary care depression and anxiety using technology-delivered psychotherapies. In PST, the individual adopts a more directive role in the therapeutic process from problem identification to solution implementation (Cuijpers et al., 2018), while the process of traditional CBT usually entails the individual following the instructions of the therapist (Lee et al., 2013). Unlike in face-to-face psychotherapy where human therapists are able to alter therapeutic content to facilitate client elaboration, chatbot responses are limited to the scope of their programmed scripts, which restricts the adaptiveness of chatbot-to-user responses (Rahman et al., 2017). Higher user involvement in exploration and examination of their experiences during the therapeutic process might be a possible explanation for the higher effect size of chatbot-delivered PST. These findings should be further confirmed with future high-quality trials given the small number of trials included in the subgroup analyses due to the emerging nature surrounding the types of chatbot-delivered psychotherapy.

Although no statistically significant subgroup differences were detected, this review intended to examine the relative effect sizes of subgroup features as a guide for future studies on the emerging topic of chatbot-delivered psychotherapy.

Strengths and Limitations

We identified several strengths in our review. First, our review adopted a comprehensive search strategy that included both computing and medical databases to identify more potential trials. Second, low statistical heterogeneity was also achieved, as evidenced by the low I² values in the meta-analysis. Third, we performed subgroup analyses to identify preferred features in the design of chatbots, a step beyond previous reviews that only identified design features of general technology-delivered interventions (Morrison et al., 2012; Whitton et al., 2015). Fourth, publication bias was not detected among our included trials. Last, our review utilized the GRADE system to assess overall evidence quality across trials.

Limitations identified include that all trials were carried out in Western countries and are hence restricted to the English language, thereby restricting the generalization of effects across all countries. Next, small sample sizes detected in half of the included trials might induce small study effects. In addition, the low quality of overall evidence might reduce the internal validity of findings. Next, the small number of trials included in the meta-analysis might have caused small study effects (Button et al., 2013). Last, the majority of the subgroups identified have an uneven covariate distribution (Richardson et al., 2019) that could explain the absence of statistical difference within subgroups; hence, the results should be interpreted with caution.

Future Research

First, considering the low quality of the overall evidence assessed by GRADE, well-designed RCTs that follow the recommendations of the CONSORT statement should be conducted in future trials, especially efforts to minimize performance bias by blinding participants and personnel so as to facilitate proper research implementation and reporting. Larger trials are needed to strengthen the evidence. Next, this review also addressed the need for large sample sizes when conducting trials, especially in non-Western countries. Last, given that only limited forms of chatbot-delivered psychotherapy were evaluated in this review, future trials should explore the effectiveness of other forms of psychotherapy, such as mindfulness-based therapy and acceptance and commitment therapy delivered via chatbots, both of which have been delivered using other technological modalities in previous RCTs (Lin et al., 2015; Mak et al., 2015).

Clinical Implications

Chatbot-delivered psychotherapy provides an alternative method for the delivery of psychotherapy to individuals (Clement et al., 2015), and expands the availability of psychotherapy to individuals who are unable to access mental health services due to limitations of time or location (Ebert et al., 2018). Chatbot-delivered psychotherapy also promotes collaboration between the fields of health sciences and computing, improves the quality of health care services delivered by encouraging the integration of technology into traditional health care practices (Kuo, 2011), and allows health care institutions to upscale their mental health services with ease since serverless chatbot systems do not require dedicated physical infrastructure to function (Yan et al., 2016).

Conclusions

Chatbot-delivered psychotherapy was found to produce significant improvements in depressive symptoms among adults with depression or anxiety. Subgroup analyses found that features including samples clinically diagnosed with either anxiety or depression, chatbots with an embodiment, combination of types of input and output formats, fewer than 10 sessions, PST, off-line platforms, and in the United States are more favorable in the design of chatbots. Meta-regression analyses did not find an impact of year of publication, mean age, country, nature of population, embodiment, duration, platform, psychotherapy, input formats, and output formats as covariates on depressive symptoms. Future trials should adhere to the recommendations of the CONSORT statement (Schulz et al., 2010), as more high-quality trials are needed to fully determine the effectiveness of chatbot-delivered psychotherapy on depressive symptoms.

Conflict of Interest Statement

The authors declare that there are no conflicts of interest.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.beth.2021.09.007.

References

Abd-alrazaq, A. A., Alajlani, M., Alalwan, A. A., Bewick, B. M., Gardner, P., & Househ, M. (2019). An overview of the features of chatbots in mental health: A scoping review. International Journal of Medical Informatics, 132, 103978. https://doi.org/10.1016/j.ijmedinf.2019.103978.
Abdul-Kader, S. A., & Woods, J. (2015). Survey on chatbot design techniques in speech conversation systems. International Journal of Advanced Computer Science and Applications, 6(7). https://doi.org/10.14569/IJACSA.2015.060712.
Abraha, I., Cherubini, A., Cozzolino, F., De Florio, R., Luchetta, M. L., Rimland, J. M., Folletti, I., Marchesi, M., Germani, A., & Orso, M. (2015). Deviation from intention to treat analysis in randomised trials and treatment effect estimates: Meta-epidemiological study. BMJ, 350, h2445. https://doi.org/10.1136/bmj.h2445.
Alfonsson, S., Johansson, K., Uddling, J., & Hursti, T. (2017). Differences in motivation and adherence to a prescribed assignment after face-to-face and online psychoeducation: An experimental study. BMC Psychology, 5(1), 3. https://doi.org/10.1186/s40359-017-0172-5.
Araujo, T. (2018). Living up to the chatbot hype: The influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Computers in Human Behavior, 85, 183–189. https://doi.org/10.1016/j.chb.2018.03.051.
Beck, A. T. (1970). Cognitive therapy: Nature and relation to behavior therapy. Behavior Therapy, 1(2), 184–200. https://doi.org/10.1016/S0005-7894(70)80030-2.
Bell, A. C., & D’Zurilla, T. J. (2009). Problem-solving therapy for depression: A meta-analysis. Clinical Psychology Review, 29(4), 348–353. https://doi.org/10.1016/j.cpr.2009.02.003.
Bendig, E., Erb, B., Schulze-Thuesing, L., & Baumeister, H. (2019). The next generation: Chatbots in clinical psychology and psychotherapy to foster mental health—A scoping review. Verhaltenstherapie, 1–13. https://doi.org/10.1159/000501812.
Bennett, S. D., Cuijpers, P., Ebert, D. D., McKenzie Smith, M., Coughtrey, A. E., Heyman, I., Manzotti, G., & Shafran, R. (2019). Practitioner review: Unguided and guided self-help interventions for common mental health disorders in children and adolescents: A systematic review and meta-analysis. Journal of Child Psychology and Psychiatry. https://doi.org/10.1111/jcpp.13010.
*Berger, T., Hammerli, K., Gubser, N., Andersson, G., & Caspar, F. (2011). Internet-based treatment of depression: A randomized controlled trial comparing guided with unguided self-help. Cognitive Behaviour Therapy, 40(4), 251–266. https://doi.org/10.1080/16506073.2011.616531.
*Berger, T., Urech, A., Krieger, T., Stolz, T., Schulz, A., Vincent, A., Moser, C. T., Moritz, S., & Meyer, B. (2017). Effects of a transdiagnostic unguided Internet intervention (“velibra”) for anxiety disorders in primary care: Results of a randomized controlled trial. Psychological Medicine, 47(1), 67–80. https://doi.org/10.1017/s0033291716002270.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2), 97–111. https://doi.org/10.1002/jrsm.12.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2011). Introduction to meta-analysis. Wiley.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2013). Comprehensive meta-analysis (Version 3). Biostat.
Borenstein, M., & Higgins, J. (2013). Meta-analysis and subgroups. Prevention Science, 14(2), 134–143. https://doi.org/10.1007/s11121-013-0377-7.
Burgoon, J. K., Guerrero, L. K., & Floyd, K. (2016). Nonverbal communication. Routledge. https://doi.org/10.4324/9781315663425.
*Burton, C., Szentagotai Tatar, A., McKinstry, B., Matheson, C., Matu, S., Moldovan, R., Macnab, M., Farrow, E., David, D., Pagliari, C., Serrano Blanco, A., & Wolters, M. (2016). Pilot randomised controlled trial of Help4Mood, an embodied virtual agent-based system to support treatment of depression. Journal of Telemedicine and Telecare, 22(6), 348–355. https://doi.org/10.1177/1357633x15609793.
Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365. https://doi.org/10.1038/nrn3475.
Carlbring, P., Andersson, G., Cuijpers, P., Riper, H., & Hedman-Lagerlöf, E. (2018). Internet-based vs. face-to-face cognitive behavior therapy for psychiatric and somatic disorders: An updated systematic review and meta-analysis. Cognitive Behaviour Therapy, 47(1), 1–18. https://doi.org/10.1080/16506073.2017.1401115.
*Cartreine, J. A., Locke, S. E., Buckey, J. C., Sandoval, L., & Hegel, M. T. (2012). Electronic problem-solving treatment: Description and pilot study of an interactive media treatment for depression. JMIR Research Protocols, 1(2), e11. https://doi.org/10.2196/resprot.1925.
Charlesworth, J. E., Petkovic, G., Kelley, J. M., Hunter, M., Onakpoya, I., Roberts, N., Miller, F. G., & Howick, J. (2017). Effects of placebos without deception compared with no treatment: A systematic review and meta-analysis. Journal of Evidence-Based Medicine, 10(2), 97–107. https://doi.org/10.1111/jebm.12251.
Chisholm, D., Sweeny, K., Sheehan, P., Rasmussen, B., Smit, F., Cuijpers, P., & Saxena, S. (2016). Scaling-up treatment of depression and anxiety: A global return on investment analysis. The Lancet Psychiatry, 3(5), 415–424. https://doi.org/10.1016/S2215-0366(16)30024-4.
Clement, S., Schauman, O., Graham, T., Maggioni, F., Evans-Lacko, S., Bezborodovs, N., Morgan, C., Rüsch, N., Brown, J., & Thornicroft, G. (2015). What is the impact of mental health-related stigma on help-seeking? A systematic review of quantitative and qualitative studies. Psychological Medicine, 45(1), 11–27. https://doi.org/10.1017/S0033291714000129.
Cuijpers, P., de Wit, L., Kleiboer, A., Karyotaki, E., & Ebert, D. D. (2018). Problem-solving therapy for adult depression: An updated meta-analysis. European Psychiatry, 48(1), 27–37. https://doi.org/10.1016/j.eurpsy.2017.11.006.
D’Alfonso, S., Santesteban-Echarri, O., Rice, S., Wadley, G., Lederman, R., Miles, C., Gleeson, J., & Alvarez-Jimenez, M. (2017). Artificial intelligence-assisted online social therapy for youth mental health. Frontiers in Psychology, 8, 796. https://doi.org/10.3389/fpsyg.2017.00796.
D’Zurilla, T. J., & Goldfried, M. R. (1971). Problem solving and behavior modification. Journal of Abnormal Psychology, 78(1), 107–126. https://doi.org/10.1037/h0031360.
Ebert, D. D., Van Daele, T., Nordgreen, T., Karekla, M., Compare, A., Zarbo, C., Brugnera, A., Øverland, S., Trebbi, G., & Jensen, K. L. (2018). Internet- and mobile-based psychological interventions: Applications, efficacy, and potential for improving mental health. European Psychologist, 23(2), 167–187. https://doi.org/10.1027/1016-9040/a000318.
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629.
Fernández, E. N., & Mairal, J. B. (2017). Behavioral activation versus cognitive restructuring to reduce automatic negative thoughts in anxiety generating situations. Psicothema, 29(2), 172–177.
*Fitzpatrick, K. K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4(2), e19. https://doi.org/10.2196/mental.7785.
Go, E., & Sundar, S. S. (2019). Humanizing chatbots: The effects of visual, identity and conversational cues on humanness perceptions. Computers in Human Behavior, 97, 304–316. https://doi.org/10.1016/j.chb.2019.01.020.
Guyatt, G., Oxman, A. D., Akl, E. A., Kunz, R., Vist, G., Brozek, J., Norris, S., Falck-Ytter, Y., Glasziou, P., & Debeer, H. (2011). GRADE guidelines: 1. Introduction—GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology, 64(4), 383–394. https://doi.org/10.1016/j.jclinepi.2010.04.026.
Hedges, L. V., & Olkin, I. (2014). Statistical methods for meta-analysis. Academic Press.
Heer, T., Garcia-Morchon, O., Hummen, R., Keoh, S. L., Kumar, S. S., & Wehrle, K. (2011). Security challenges in the IP-based Internet of things. Wireless Personal Communications, 61(3), 527–542. https://doi.org/10.1007/s11277-011-0385-5.
Heeren, A., Jones, P. J., & McNally, R. J. (2018). Mapping network connectivity among symptoms of social anxiety and comorbid depression in people with social anxiety disorder. Journal of Affective Disorders, 228, 75–82. https://doi.org/10.1016/j.jad.2017.12.003.
Henrichsen, P. J., & Allwood, J. (2013). Predicting the attitude flow in dialogue based on multi-modal speech cues. NEALT proceedings: Northern European Association for Language and Technology [Symposium]. 4th Nordic Symposium on Multimodal Communication, November 15–16, Gothenburg, Sweden.
Higgins, J. P., & Green, S. (2011). Cochrane handbook for systematic reviews of interventions (Vol. 4). Wiley.
Hofmann, S. G., Asnaani, A., Vonk, I. J., Sawyer, A. T., & Fang, A. (2012). The efficacy of cognitive behavioral therapy: A review of meta-analyses. Cognitive Therapy and Research, 36(5), 427–440. https://doi.org/10.1007/s10608-012-9476-1.
Kegel, R. H., & Wieringa, R. J. (2014). Persuasive technologies: A systematic literature review and application to pisa [Technical Report No. TR-CTIT-14–07]. Centre for Telematics and Information Technology, University of Twente, Enschede.
Kiluk, B. D., Serafini, K., Frankforter, T., Nich, C., & Carroll, K. M. (2014). Only connect: The working alliance in computer-based cognitive behavioral therapy. Behaviour Research and Therapy, 63, 139–146. https://doi.org/10.1016/j.brat.2014.10.003.
Kriston, L. (2013). Dealing with clinical heterogeneity in meta-analysis: Assumptions, methods, interpretation. International Journal of Methods in Psychiatric Research, 22(1), 1–15. https://doi.org/10.1002/mpr.1377.
Kuo, M.-H. (2011). Opportunities and challenges of cloud computing to improve health care services. Journal of Medical Internet Research, 13(3), e67. https://doi.org/10.2196/jmir.1867.
Kuyken, W., Hayes, R., Barrett, B., Byng, R., Dalgleish, T., Kessler, D., Lewis, G., Watkins, E., Brejcha, C., & Cardy, J. (2015). Effectiveness and cost-effectiveness of mindfulness-based cognitive therapy compared with maintenance antidepressant treatment in the prevention of depressive relapse or recurrence (PREVENT): A randomised controlled trial. The Lancet, 386(9988), 63–73. https://doi.org/10.1016/S0140-6736(14)62222-4.
Lee, J. A., Neimeyer, G. J., & Rice, K. G. (2013). The relationship between therapist epistemology, therapy style, working alliance, and interventions use. American Journal of Psychotherapy, 67(4), 323–345. https://doi.org/10.1176/appi.psychotherapy.2013.67.4.323.
Lin, J., Lüking, M., Ebert, D. D., Buhrman, M., Andersson, G., & Baumeister, H. (2015). Effectiveness and cost-effectiveness of a guided and unguided Internet-based acceptance and commitment therapy for chronic pain: Study protocol for a three-armed randomised controlled trial. Internet Interventions, 2(1), 7–16. https://doi.org/10.1016/j.invent.2014.11.005.
Lin, J., Paganini, S., Sander, L., Lüking, M., Ebert, D. D., Buhrman, M., Andersson, G., & Baumeister, H. (2017). An Internet-based intervention for chronic pain: A three-arm randomized controlled study of the effectiveness of guided and unguided acceptance and commitment therapy. Deutsches Ärzteblatt International, 114(41), 681.
Lindhiem, O., Bennett, C. B., Rosen, D., & Silk, J. (2015). Mobile technology boosts the effectiveness of psychotherapy and behavioral interventions: A meta-analysis. Behavior Modification, 39(6), 785–804. https://doi.org/10.1177/0145445515595198.
Lundh, A., & Gøtzsche, P. C. (2008). Recommendations by Cochrane review groups for assessment of the risk of bias in studies. BMC Medical Research Methodology, 8(1), 22. https://doi.org/10.1186/1471-2288-8-22.
Mak, W. W., Chan, A. T., Cheung, E. Y., Lin, C. L., & Ngai, K. C. (2015). Enhancing web-based mindfulness training for mental health promotion with the health action process approach: Randomized controlled trial. Journal of Medical Internet Research, 17(1), e8. https://doi.org/10.2196/jmir.3746.
Mayer, R. E., & DaPra, C. S. (2012). An embodiment effect in computer-based learning with animated pedagogical agents. Journal of Experimental Psychology: Applied, 18(3), 239. https://doi.org/10.1037/a0028616.
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/BM.2012.031.
McTear, M., Callejas, Z., & Griol, D. (2016a). The conversational interface (Vol. 6). Springer.
McTear, M., Callejas, Z., & Griol, D. (2016b). Creating a conversational interface using chatbot technology. In The conversational interface: Talking to smart devices (pp. 125–159). Springer. https://doi.org/10.1007/978-3-319-32967-3.
*Meyer, B., Berger, T., Caspar, F., Beevers, C., Andersson, G., & Weiss, M. (2009). Effectiveness of a novel integrative online treatment for depression (Deprexis): Randomized controlled trial. Journal of Medical Internet Research, 11(2), e15. https://doi.org/10.2196/jmir.1151.
*Meyer, B., Bierbrodt, J., Schröder, J., Berger, T., Beevers, C. G., Weiss, M., Jacob, G., Späth, C., Andersson, G., Lutz, W., Hautzinger, M., Löwe, B., Rose, M., Hohagen, F., Caspar, F., Greiner, W., Moritz, S., & Klein, J. P. (2015). Effects of an Internet intervention (Deprexis) on severe depression symptoms: Randomized controlled trial. Internet Interventions, 2(1), 48–59. https://doi.org/10.1016/j.invent.2014.12.003.
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Annals of Internal Medicine, 151(4), 264–269. https://doi.org/10.7326/0003-4819-151-4-200908180-00135.
Moman, R. N., Dvorkin, J., Pollard, E. M., Wanderman, R., Murad, M. H., Warner, D. O., & Hooten, W. M. (2019). A systematic review and meta-analysis of unguided electronic and mobile health technologies for chronic pain—is it time to start prescribing electronic health applications? Pain Medicine, 20(11), 2238–2255. https://doi.org/10.1093/pm/pnz164.
Montenegro, J. L. Z., da Costa, C. A., & da Rosa Righi, R. (2019). Survey of conversational agents in health. Expert Systems With Applications, 129, 56–67. https://doi.org/10.1016/j.eswa.2019.03.054.
*Moritz, S., Schilling, L., Hauschildt, M., Schroder, J., & Treszl, A. (2012). A randomized controlled trial of Internet-based therapy in depression. Behaviour Research and Therapy, 50(7–8), 513–521. https://doi.org/10.1016/j.brat.2012.04.006.
Morrison, L. G., Yardley, L., Powell, J., & Michie, S. (2012). What design features are used in effective e-health interventions? A review using techniques from critical interpretive synthesis. Telemedicine and e-Health, 18(2), 137–144. https://doi.org/10.1089/tmj.2011.0062.
Newby, J. M., Twomey, C., Li, S. S. Y., & Andrews, G. (2016). Transdiagnostic computerised cognitive behavioural therapy for depression and anxiety: A systematic review and meta-analysis. Journal of Affective Disorders, 199, 30–41. https://doi.org/10.1016/j.jad.2016.03.018.
Oh, K.-J., Lee, D., Ko, B., & Choi, H.-J. (2017). A chatbot for psychiatric counseling in mental healthcare service based on emotional dialogue analysis and sentence generation [Paper presentation]. 2017 18th IEEE International Conference on Mobile Data Management (MDM).
Oinas-Kukkonen, H., & Harjumaa, M. (2009). Persuasive systems design: Key issues, process model, and system features. Communications of the Association for Information Systems, 24(1), 28. https://doi.org/10.17705/1CAIS.02428.
Patel, V., Maj, M., Flisher, A. J., De Silva, M. J., Koschorke, M., Prince, M., Zonal, W., Representatives, M. S., Tempier, R., Riba, M., & Sanchez, M. (2010). Reducing the treatment gap for mental disorders: A WPA survey. World Psychiatry, 9(3), 169–176. https://doi.org/10.1002/j.2051-5545.2010.tb00305.x.
Rahman, A., Al Mamun, A., & Islam, A. (2017). Programming challenges of chatbot: Current and future prospective [Paper presentation]. 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC).
Richardson, M., Garner, P., & Donegan, S. (2019). Interpretation of subgroup analyses in systematic reviews: A tutorial. Clinical Epidemiology and Global Health, 7(2), 192–198. https://doi.org/10.1016/j.cegh.2018.05.005.
Rosenthal, R., Cooper, H., & Hedges, L. (1994). Parametric measures of effect size. Handbook of Research Synthesis, 621(2), 231–244.
*Sandoval, L. R., Buckey, J. C., Ainslie, R., Tombari, M., Stone, W., & Hegel, M. T. (2017). Randomized controlled trial of a computerized interactive media-based problem solving treatment for depression. Behavior Therapy, 48(3), 413–425. https://doi.org/10.1016/j.beth.2016.04.001.
Schnyder, N., Panczak, R., Groth, N., & Schultze-Lutter, F. (2017). Association between mental health-related stigma and active help-seeking: Systematic review and meta-analysis. British Journal of Psychiatry, 210(4), 261–268. https://doi.org/10.1192/bjp.bp.116.189464.
Scholten, M. R., Kelders, S. M., & Van Gemert-Pijnen, J. E. (2017). Self-guided web-based interventions: Scoping review on user needs and the potential of embodied conversational agents to address them. Journal of Medical Internet Research, 19(11), e383. https://doi.org/10.2196/jmir.7351.
*Schröder, J., Brückner, K., Fischer, A., Lindenau, M., Köther, U., Vettorazzi, E., & Moritz, S. (2014). Efficacy of a psychological online intervention for depression in people with epilepsy: A randomized controlled trial. Epilepsia, 55(12), 2069–2076. https://doi.org/10.1111/epi.12833.
Schroeder, N. L., Adesope, O. O., & Gilbert, R. B. (2013). How effective are pedagogical agents for learning? A meta-analytic review. Journal of Educational Computing Research, 49(1), 1–39. https://doi.org/10.2190/EC.49.1.a.
Schulz, K. F., Altman, D. G., & Moher, D. (2010). CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. BMC Medicine, 8(1), 18. https://doi.org/10.1186/1741-7015-8-18.
Sedgwick, P. (2013). Meta-analyses: Heterogeneity and subgroup analysis. BMJ, 346, f4040. https://doi.org/10.1136/bmj.f4040.
Sterne, J. A., Sutton, A. J., Ioannidis, J. P., Terrin, N., Jones, D. R., Lau, J., Carpenter, J., Rücker, G., Harbord, R. M., & Schmid, C. H. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002. https://doi.org/10.1136/bmj.d4002.
Tiller, J. W. (2013). Depression and anxiety. Medical Journal of Australia, 199(6), S28–S31.
Twomey, C., O’Reilly, G., & Meyer, B. (2017). Effectiveness of an individually-tailored computerised CBT programme (Deprexis) for depression: A meta-analysis. Psychiatry Research, 256, 371–377. https://doi.org/10.1016/j.psychres.2017.06.081.
Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Kashavan, M. S., & Torous, J. B. (2019). Chatbots and conversational agents in mental health: A review of the psychiatric landscape. Canadian Journal of Psychiatry, 64(7), 456–464. https://doi.org/10.1177/0706743719828977.
Vitak, J. (2012). The impact of context collapse and privacy on social network site disclosures. Journal of Broadcasting and Electronic Media, 56(4), 451–470. https://doi.org/10.1080/08838151.2012.732140.
Vos, T., Allen, C., Arora, M., Barber, R. M., Bhutta, Z. A., Brown, A., Carter, A., Casey, D. C., Charlson, F. J., & Chen, A. Z. (2016). Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: A systematic analysis for the Global Burden of Disease Study 2015. The Lancet, 388(10053), 1545–1602. https://doi.org/10.1016/S0140-6736(16)31678-6.
Wahle, F., Bollhalder, L., Kowatsch, T., & Fleisch, E. (2017). Toward the design of evidence-based mental health information systems for people with depression: A systematic literature review and meta-analysis. Journal of Medical Internet Research, 19(5), e191. https://doi.org/10.2196/jmir.7381.
Warmerdam, L., van Straten, A., Jongsma, J., Twisk, J., & Cuijpers, P. (2010). Online cognitive behavioral therapy and problem-solving therapy for depressive symptoms: Exploring mechanisms of change. Journal of Behavior Therapy and Experimental Psychiatry, 41(1), 64–70. https://doi.org/10.1016/j.jbtep.2009.10.003.
Weitz, E., Kleiboer, A., Van Straten, A., & Cuijpers, P. (2018). The effects of psychotherapy for depression on anxiety symptoms: A meta-analysis. Psychological Medicine, 48(13), 2140–2152. https://doi.org/10.1017/S0033291717003622.
Whitton, A. E., Proudfoot, J., Clarke, J., Birch, M.-R., Parker, G., Manicavasagar, V., & Hadzi-Pavlovic, D. (2015). Breaking open the black box: Isolating the most potent features of a web and mobile phone-based intervention for depression, anxiety, and stress. JMIR Mental Health, 2(1), e3. https://doi.org/10.2196/mental.3573.
World Health Organization (2017). Depression and other common mental disorders: Global health estimates. Author.
Yan, M., Castro, P., Cheng, P., & Ishakian, V. (2016). Building a chatbot with serverless computing [Paper presentation]. Proceedings of the 1st International Workshop on Mashups of Things and APIs, Trento, Italy. https://dl.acm.org/doi/10.1145/3007203.3007217.
Zhang, A., Franklin, C., Jing, S., Bornheimer, L. A., Hai, A. H., Himle, J. A., Kong, D., & Ji, Q. (2018). The effectiveness of four empirically supported psychotherapies for primary care depression and anxiety: A systematic review and meta-analysis. Journal of Affective Disorders, 245, 1168–1186. https://doi.org/10.1016/j.jad.2018.12.008.
*Zwerenz, R., Becker, J., Knickenberg, R. J., Siepmann, M., Hagen, K., & Beutel, M. E. (2017). Online self-help as an add-on to inpatient psychotherapy: Efficacy of a new blended treatment approach. Psychotherapy and Psychosomatics, 86(6), 341–350. https://doi.org/10.1159/000481177.

RECEIVED: April 10, 2021
ACCEPTED: September 21, 2021
AVAILABLE ONLINE: 12 OCTOBER 2021
