
PCN Psychiatry and Clinical Neurosciences


REGULAR ARTICLE

Clinical validity and intrarater and test–retest reliability of the Structured Clinical Interview for DSM-5 – Clinician Version (SCID-5-CV)
Flávia L. Osório, PhD,1,2* Sonia Regina Loureiro, PhD,1,2 Jaime Eduardo C. Hallak, MD, PhD,1,2
João Paulo Machado-de-Sousa, PhD,1,2 Juliana M. Ushirohira, MD,1,3 Cristiane V. W. Baes, MD, PhD,3
Thiago D. Apolinario, MD,3 Mariana F. Donadon, MSc,1 Livia M. Bolsoni, MSc,1 Thiago Guimarães, MD,1
Victor S. Fracon, MD,3 Ana Paula Casagrande Silva-Rodrigues, MSc,1,3 Fernanda Aguiar Pizeta, PhD,1
Roberto Mascarenhas Souza, MD,1 Rafael Faria Sanches, PhD,3 Rafael G. dos Santos, PhD,1,2
Rocio Martin-Santos, MD, PhD,2,4,5 and José Alexandre S. Crippa, MD, PhD,1,2

Aim: The Structured Clinical Interview for the DSM is one of the most used diagnostic instruments in clinical research worldwide. The current Clinician Version of the instrument (SCID-5-CV) has not yet been assessed with respect to its psychometric qualities. We aimed to assess the clinical validity and different reliability indicators (interrater test–retest, joint interview, face-to-face vs telephone application) of the SCID-5-CV in a large sample of 180 non-prototypical and psychiatric patients based on interviews conducted by raters with different levels of clinical experience.

Methods: The SCID-5-CV was administered face-to-face and by telephone by 12 psychiatrists/psychologists who took turns as raters and observers. Clinical diagnoses were established according to DSM-5 criteria and the longitudinal, expert, all data (LEAD) procedure. We calculated the percentage of agreement, diagnostic sensitivity and specificity, and the level of agreement (kappa) for diagnostic categories and specific diagnoses.

Results: The percentage of positive agreement between the interview and clinical diagnoses ranged from 73% to 97%, and the diagnostic sensitivity/specificity were >0.70. In the joint interview, the levels of positive agreement were high (>75%) and kappa levels were >0.70 for most diagnoses. The values were lower, but still adequate, for interrater test–retest interviews.

Conclusion: The SCID-5-CV presented excellent reliability and high specificity as assessed with different methods. The clinical validity of the instrument was also confirmed, which supports its use in daily clinical practice. We highlight the adequacy of the instrument for use via telephone and the need for careful use by professionals with little experience in psychiatric clinical practice.

Keywords: clinical validity, joint interview, reliability, Structured Clinical Interview for the DSM-5 – Clinician Version, test–retest.

http://onlinelibrary.wiley.com/doi/10.1111/pcn.12931/full

In order to enhance the reliability of psychiatric diagnosis, the American Psychiatric Association released the Structured Clinical Interview for DSM-III-R1 in 1990, and the interview has accompanied the evolution of the DSM through several revisions and expansions.2 Since then, its use has spread widely and it has become the main instrument for selecting and describing research samples all over the world.2–4 The latest version of the instrument is called ‘SCID-5’ and it has five different versions, the main one being the SCID-5 – Clinician Version (SCID-5-CV), a shorter and reformatted version of the SCID-5-Research Version that covers the most common diagnoses seen in clinical settings.5
A number of psychometric studies have attested the adequacy of previous versions of the SCID in terms of reliability6–10 and validity.11,12 However, these studies have critical points that deserve attention, including: the use of a single strategy/method to assess reliability (test–retest or joint interview); the paucity of comparisons between clinical diagnoses made with the SCID, other diagnostic instruments, and/or clinical diagnoses based on DSM criteria; the enrollment of small samples and prototypical patients; and the involvement of raters highly trained in the use of the SCID. To our knowledge, no studies to date have assessed the psychometric properties of the SCID-5.2 Therefore, our aim was to assess the clinical validity and different reliability indicators (interrater test–retest, joint interview, face-to-face vs telephone interview) of the SCID-5-CV in a large sample of non-prototypical psychiatric patients based on assessments by raters with different levels of clinical experience and familiarity with the SCID. Our hypothesis was that the psychometric indicators of this version of the instrument would be in accordance

1 Medical School of Ribeirão Preto, São Paulo University, Ribeirão Preto, Brazil
2 National Institute for Science and Technology (INCT-TM, CNPq), Brasília, Brazil
3 Clinical Hospital of Ribeirão Preto, São Paulo University, Ribeirão Preto, Brazil
4 Hospital Clínic, Instituto de Investigaciones Biomedicas August Pi i Sunyer, Centro de Investigacion Biomedica en Red de Salud Mental, Barcelona, Spain
5 University of Barcelona, Barcelona, Spain
* Correspondence: Email: flaliosorio@gmail.com

© 2019 The Authors
Psychiatry and Clinical Neurosciences © 2019 Japanese Society of Psychiatry and Neurology
Psychiatry and Clinical Neurosciences 73: 754–760, 2019


14401819, 2019, 12, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/pcn.12931 by Cochrane Romania, Wiley Online Library on [17/06/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
PCN Psychiatry and Clinical Neurosciences
Clinical validity and reliability: SCID-5-CV

with the quality parameters of the health status instruments proposed by Terwee et al.13 (rates of clinical/criterion validity and reliability/agreement: >0.70).

Methods

Study design
This was an observational study of psychometric characteristics. Four different investigations were made to assess the validity and reliability of the SCID-5-CV: (i) clinical validity (clinical DSM-5 diagnoses vs SCID diagnoses); (ii) joint interview (intrarater reliability); (iii) short-interval interrater test–retest reliability; and (iv) face-to-face versus telephone interview.

Instrument
We used a Brazilian version of the SCID-5-CV.5 The transcultural adaptation of the instrument was carried out by members of our research group (F. L. O., J. A. S. C., J. P. M. S.) in partnership with the Brazilian publisher (Artmed), in accordance with methodological criteria proposed in the literature14 and with the guidelines and authorization of the American Psychiatric Association.

Participants
The study involved a convenience sample of 180 participants: 124 psychiatric patients followed up at a high-complexity general university hospital and at outpatient psychiatric services of medium and low complexity; 29 psychiatric inpatients and seven psychiatric consultation-liaison inpatients from the same university hospital; and 20 subjects with no history of psychiatric and/or psychological treatment.

Clinical diagnosis
The criteria of the DSM-5 and the longitudinal, expert, all data (LEAD) procedure (longitudinal evaluation performed by an expert using all available data)15 were used as the gold standard for clinical diagnoses, in line with previous studies on the validity of the SCID.12 Diagnoses were established by the professionals in charge of the clinical follow-up of participants and were registered in their medical records.

Raters
Twelve raters took part in the study: seven psychiatrists and five clinical psychologists, with a mean of 10 years since graduation (SD = 4.3 years; range: 5–21 years) and 5.17 years of clinical experience (SD = 4.4 years; range: 1–18 years), either in private practice only (30%) or in private practice combined with experience in psychiatric institutions of education, research, and care (70%). All raters had completed specialization and/or residency training in psychiatry/mental health and had previous research experience. Raters self-rated their prior clinical expertise and knowledge of psychopathology on a scale ranging from 0 (no experience) to 10 (vast experience), with means of 7.6 (SD = 0.9) and 7.9 (SD = 1.0), respectively. With respect to the SCID, 70% of the raters had used the instrument before, although with different levels of experience (much experience: 34%; moderate experience: 42%; little/no experience: 24%).
All raters completed a training period of around 20 h supervised by the first author (F. L. O.), which involved reading and discussing critical points/doubts in the SCID-5 user’s guide,16 live observation of the interview, role-playing exercises, and rating/discussion of five video-recorded interviews conducted by an experienced interviewer (F. L. O.).

Ethical aspects
All procedures that are part of this work complied with the ethical standards of the relevant national and institutional committees on human experimentation and with the Declaration of Helsinki of 1975 and its revisions. All procedures involving human subjects were approved by the ethics committee of the Ribeirão Preto Medical School University Hospital (process number: HCRP 17241/2015), and written informed consent was obtained from all participants.

Procedures
First, the main researcher (F. L. O.) selected eligible participants by convenience through the analysis of medical records focused on current and past psychiatric diagnoses (primary and secondary). We tried to cover all the diagnoses assessed by the SCID-5-CV as equitably as possible.
Participants were then invited to take part in the study and completed at least one of the interview sessions described below, carried out randomly by the different raters, who were always blind to the participants’ diagnoses. The raters took turns in the different roles and pair formations.

Joint interview
All participants were assessed face-to-face by one of the raters, called ‘Rater 1’ (RT1), who also scored the instrument and established the diagnoses based on the standard procedures for the SCID-5-CV. This interview was observed by another rater, called ‘Observer 1’ (OB1), who scored the interview simultaneously without access to the scores of RT1. After the end of the interview, OB1 could return to, or explore in more depth, specific questions and/or modules not assessed by RT1 if he or she found it necessary and when there was disagreement between raters. Once RT1 had finished his or her interview, the diagnosis established could not be changed. This procedure was used to assess interrater reliability (RT1 vs OB1).

Short-interval test–retest interrater interview
Around 70% of the sample (n = 139) was randomly selected to take part in the short-interval test–retest interrater interview, which occurred between 10 and 30 days after the previous interview. This interview was conducted by a second rater (RT2), who was blind to the patients’ original diagnoses and to the diagnoses made by RT1 and OB1. After random selection by the main investigator, participants were interviewed either face-to-face (n = 53) or by telephone (n = 86). RT2 conducted the interview according to the standard procedure and established a diagnosis at the end. This procedure was used to assess the interrater test–retest reliability (RT1 vs RT2) and the face-to-face versus telephone reliability of the SCID-5-CV (RT1 vs RT2 face-to-face interviews and RT1 vs RT2 telephone interviews).
To assess item B22 (related to catatonic behavior) during the telephone interview, raters were instructed to ask participants directly about the occurrence of the symptoms, according to the definitions presented in the second column of the SCID-5-CV. To assess the agreement between clinical DSM-5 and SCID diagnoses, the diagnoses established by RT1 were compared with the clinical diagnoses of each participant (primary and/or secondary).
The interviews were performed in a laboratory environment and lasted 60–90 min on average. Each rater completed around 15 interviews as RT1, 15 as OB1, and 11 as RT2.

Data analysis
Data were analyzed using SPSS (IBM, Armonk, NY, USA). The sociodemographic characteristics of the sample were analyzed in terms of simple frequencies, means, standard deviations, and percentages. Each of the specific diagnostic categories assessed by the SCID-5-CV was coded as 0 (absent) or 1 (present).
We calculated the percentage of positive agreement (the number of participants whose diagnosis of interest was considered present by both raters/observers divided by the number of participants with the diagnosis of interest considered present by RT1) and of negative agreement (the number of participants whose diagnosis of interest was considered absent by both raters/observers divided by the number of participants with the diagnosis of interest considered absent by RT1). The kappa coefficient was used to estimate the rates
of reliability/agreement and was calculated for major diagnostic categories (e.g., mood disorders) and specific diagnoses (e.g., recurrent major depressive disorder), using the diagnosis established by RT1 as the reference. Because the kappa coefficient may be unstable when disorders have low base rates, it was calculated only when a minimum of five subjects per category was reached, following procedures adopted elsewhere.10,17,18
As the raters’ clinical background and training could interfere with the recognition and conceptualization of psychopathology when performing clinical diagnostic evaluations, we further compared the reliability coefficients of raters with little clinical experience with those of experienced raters.

Results
The study sample consisted of 180 participants, 54.4% (n = 98) of whom were female. The mean age of the sample was 38.3 years (SD = 14.3 years; range: 16–84 years) and mean education was 13.7 years (SD = 4.8 years; range: 0–23 years). Around half (52.8%) of the participants were professionally active. Table 1 presents the clinical diagnoses of the sample (non-exclusive categories) according to data extracted from medical reports.

Clinical validity
Taking the diagnosis made by RT1 in the face-to-face interview as the reference (Table 2), the percentage of positive agreement with the clinical diagnosis ranged from 73% (any anxiety disorder) to 97% (any psychotic disorder) for the major diagnostic categories. For specific diagnoses, the lowest percentage of positive agreement was found for agoraphobia (25%) and the highest (100%) for bipolar II, schizoaffective, delusional, and adjustment disorders. For two-thirds of the specific diagnoses, the percentage of positive agreement was above 68% (median = 79.5%). The percentage of negative agreement was considerably higher, both for the diagnostic categories and for specific diagnoses, with the lowest value of 96% (any anxiety disorder and recurrent major depressive disorder).
The sensitivity and specificity for each diagnosis were also calculated. Except for persistent depressive disorder and schizophreniform disorder, sensitivity values remained above 0.70. No specificity value was below 0.80. With respect to kappa coefficients, excellent values (>0.75) were found for all diagnoses except persistent depressive disorder, past major depressive disorder, schizophreniform disorder, and agoraphobia, for which the values were unsatisfactory (0.45–0.75; Table 2).

Table 1. Baseline clinical diagnoses of the sample (N = 180)
Diagnosis: DSM-5/LEAD procedure | N (%)
Bipolar I disorder | 18 (10.0)
Bipolar II disorder | 10 (5.6)
Substance-induced bipolar I/II disorder | 4 (2.2)
Current major depressive disorder | 7 (3.9)
Past major depressive disorder | 18 (10.0)
Recurrent major depressive disorder | 31 (17.2)
Other specified depressive disorder | 1 (0.6)
Persistent depressive disorder | 10 (5.6)
Any mood disorder | 95 (52.8)
Schizophrenia | 15 (8.3)
Schizophreniform disorder | 2 (1.1)
Schizoaffective disorder | 10 (5.6)
Delusional disorder | 6 (3.3)
Brief psychotic disorder | 2 (1.1)
Substance-induced psychotic disorder | 3 (1.7)
Any psychotic disorder | 38 (21.1)
Alcohol use disorder | 17 (9.4)
Non-alcohol substance use disorder | 18 (10.0)
Any substance use disorder | 28 (15.6)
Panic disorder (current/past) | 16 (8.9)
Agoraphobia | 4 (2.2)
Social anxiety disorder | 13 (7.2)
Generalized anxiety disorder | 18 (10.0)
Any anxiety disorder | 39 (21.7)
Obsessive–compulsive disorder | 12 (6.7)
Post-traumatic stress disorder (current/past) | 13 (7.2)
Attention deficit hyperactivity disorder | 10 (5.6)
Adjustment disorder | 1 (0.6)

Table 2. Clinical validity indicators of the SCID-5-CV
Diagnosis: DSM-5/LEAD procedure | SCID-5-CV versus clinical diagnosis (N = 180)
Bipolar I disorder | 20/95/99/0.94/0.98/0.88
Bipolar II disorder | 09/100/99/0.90/1.00/0.94
Any bipolar disorder | 36/89/100/1.00/0.97/0.93
Persistent depressive disorder | 12/50/98/0.60/0.96/0.52
Current major depressive disorder | 09/78/100/1.00/0.99/0.87
Past major depressive disorder | 22/64/97/0.78/0.95/0.66
Recurrent major depressive disorder | 34/76/96/0.74/0.95/0.76
Any major depressive disorder | 72/75/98/0.96/0.85/0.76
Any mood disorder | 108/84/94/0.96/0.80/0.76
Schizophrenia | 16/81/99/0.87/0.98/0.83
Schizophreniform disorder | 02/50/99/0.50/0.99/—
Schizoaffective disorder | 07/100/98/0.70/1.00/—
Delusional disorder | 05/100/99/0.83/1.00/—
Brief psychotic disorder | 02/100/100/1.00/1.00/—
Substance-induced psychotic disorder | 04/75/100/1.00/0.99/—
Any psychotic disorder | 36/97/99/0.92/0.99/0.93
Alcohol use disorder | 18/83/99/0.88/0.98/0.84
Non-alcohol substance use disorder | 20/90/100/1.00/0.99/0.94
Any substance use disorder | 30/90/99/0.90/0.99/0.92
Panic disorder (current/past) | 21/62/99/0.88/0.96/0.73
Agoraphobia | 12/25/99/0.75/0.95/0.35
Social anxiety disorder | 17/65/99/0.85/0.96/0.71
Generalized anxiety disorder | 22/59/97/0.72/0.94/0.61
Any anxiety disorder | 47/73/96/0.72/0.96/0.73
Obsessive–compulsive disorder | 12/83/99/0.83/0.99/0.82
Post-traumatic stress disorder (current/past) | 19/68/100/1.00/0.96/0.80
Attention deficit hyperactivity disorder | 11/91/100/1.00/0.99/0.95
Adjustment disorder | 01/100/100/1.00/1.00/—
Data are presented as: base rate/percentage of positive agreement/percentage of negative agreement/sensitivity/specificity/kappa coefficient.
SCID-5-CV, Structured Clinical Interview for DSM-5 – Clinician Version.
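The agreement indices defined under Data analysis can be made concrete with a short Python sketch. It is an illustration only, not the authors' SPSS procedure. It codes each diagnosis as 0/1, cross-tabulates RT1 against a comparison rating, and computes positive and negative agreement (relative to RT1), sensitivity and specificity (relative to the clinical DSM-5/LEAD diagnosis), and Cohen's kappa. The cell counts (3 present for both, 9 for RT1 only, 1 for the clinical diagnosis only, 167 absent for both) are our hypothetical reconstruction for agoraphobia, chosen to be consistent with the clinical base rate in Table 1 (4) and the agoraphobia row of Table 2; the paper does not report the underlying 2×2 counts. The median step assumes that the ‘specific diagnoses’ are the 22 Table 2 rows that are not ‘Any …’ categories. All function and variable names are illustrative.

```python
from statistics import median


def cross_tab(rt1, other):
    """Return (a, b, c, d) for two equal-length 0/1 rating lists.

    a: present for RT1 and the comparison; b: present for RT1 only;
    c: present for the comparison only; d: absent for both.
    """
    pairs = list(zip(rt1, other))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def agreement_indices(a, b, c, d):
    """Indices as defined in Data analysis; assumes no margin is zero."""
    n = a + b + c + d
    ppa = a / (a + b)          # both present / present by RT1
    npa = d / (c + d)          # both absent / absent by RT1
    sensitivity = a / (a + c)  # RT1 present among clinically present
    specificity = d / (b + d)  # RT1 absent among clinically absent
    p_obs = (a + d) / n
    # Chance agreement implied by the two raters' marginal proportions.
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_obs - p_chance) / (1 - p_chance)
    return ppa, npa, sensitivity, specificity, kappa


# Hypothetical agoraphobia ratings for N = 180 participants.
rt1 = [1] * 12 + [0] * 168
clinical = [1] * 3 + [0] * 9 + [1] * 1 + [0] * 167

a, b, c, d = cross_tab(rt1, clinical)
ppa, npa, sens, spec, kappa = agreement_indices(a, b, c, d)
row = f"{a + b:02d}/{round(100 * ppa)}/{round(100 * npa)}/{sens:.2f}/{spec:.2f}/{kappa:.2f}"
print(row)  # 12/25/99/0.75/0.95/0.35

# Positive agreement (%) for the 22 specific diagnoses listed in Table 2.
specific_ppa = [95, 100, 50, 78, 64, 76, 81, 50, 100, 100, 100,
                75, 83, 90, 62, 25, 65, 59, 83, 68, 91, 100]
print(median(specific_ppa))  # 79.5
```

Under these assumed counts the sketch reproduces the Table 2 agoraphobia row and the reported median, which shows how the same 2×2 table yields a low positive agreement (25%) alongside a high sensitivity (0.75): the two indices divide by different margins.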

Table 3. Reliability indicators of the SCID-5-CV for the different diagnostic categories of the DSM-5

Diagnosis: DSM-5/LEAD procedure | Joint interview (N = 180) | Test–retest interrater interview (N = 139) | Test–retest interrater interview: face-to-face (N = 53) | Test–retest interrater interview: telephone (N = 86)
Bipolar I disorder | 20/100/100/1.00 | 14/100/99/0.96 | 05/100/98/0.90 | 09/100/100/1.00
Bipolar II disorder | 09/89/100/0.94 | 07/100/100/1.00 | 03/100/100/1.00 | 04/100/100/1.00
Any bipolar disorder | 36/100/100/1.00 | 25/100/99/0.98 | 09/100/98/0.94 | 16/100/100/1.00
Persistent depressive disorder | 12/75/98/0.70 | 11/45/97/0.46 | 04/25/100/0.38 | 07/57/95/0.49
Current major depressive disorder | 09/78/99/0.82 | 08/62/98/0.65 | 02/50/98/0.48 | 06/67/97/0.71
Past major depressive disorder | 22/91/98/0.87 | 18/67/97/0.69 | 05/100/98/0.90 | 13/54/97/0.59
Recurrent major depressive disorder | 34/100/100/1.00 | 27/92/99/0.93 | 05/100/100/1.00 | 22/91/98/0.91
Any major depressive disorder | 72/98/99/0.98 | 60/82/96/0.79 | 15/80/94/0.76 | 45/82/97/0.79
Any mood disorder | 108/99/99/0.98 | 85/88/94/0.81 | 24/92/93/0.85 | 61/87/96/0.77
Schizophrenia | 16/100/99/0.97 | 14/93/99/0.92 | 09/89/100/0.93 | 05/100/99/0.90
Schizophreniform disorder | 02/50/100/— | 02/50/100/— | 02/50/100/— | 0
Schizoaffective disorder | 07/100/100/1.00 | 04/100/100/— | 03/100/100/— | 02/100/100/—
Delusional disorder | 05/80/99/0.79 | 04/100/99/— | 02/100/99/— | 02/100/100/—
Brief psychotic disorder | 02/100/100/— | 02/100/99/— | 02/100/99/— | 0
Substance-induced psychotic disorder | 04/100/100/— | 04/75/100/— | 04/75/100/— | 0
Any psychotic disorder | 36/100/100/1.00 | 30/97/99/0.96 | 22/95/100/0.96 | 09/100/99/0.94
Alcohol use disorder | 18/89/99/0.91 | 12/75/93/0.55 | 05/100/94/0.70 | 07/62/92/0.47
Non-alcohol substance use disorder | 20/95/100/0.97 | 14/93/99/0.92 | 07/100/98/0.92 | 07/85/100/0.92
Any substance use disorder | 30/97/100/0.98 | 20/90/92/0.72 | 09/100/93/0.82 | 11/82/92/0.64
Panic disorder (current/past) | 21/100/99/0.97 | 14/50/96/0.52 | 05/67/98/0.65 | 09/45/96/0.47
Agoraphobia | 12/75/100/0.85 | 10/22/98/0.28 | 05/34/98/0.37 | 05/17/99/0.22
Social anxiety disorder | 17/100/99/0.97 | 12/58/99/0.68 | 05/75/98/0.73 | 07/50/100/0.65
Generalized anxiety disorder | 22/90/99/0.92 | 18/61/97/0.65 | 05/80/98/0.78 | 13/54/97/0.59
Any anxiety disorder | 47/97/100/0.97 | 33/73/95/0.71 | 11/73/95/0.70 | 22/73/95/0.71
Obsessive–compulsive disorder | 12/92/100/0.95 | 11/90/99/0.89 | 05/100/98/0.85 | 06/86/100/0.92
Post-traumatic stress disorder (current/past) | 19/100/99/0.97 | 13/92/98/0.84 | 01/100/98/— | 12/92/97/0.86
Attention deficit hyperactivity disorder | 11/100/100/1.00 | 10/80/99/0.83 | 01/100/98/— | 09/78/100/0.86
Adjustment disorder | 01/100/100/— | 01/100/100/— | 0 | 01/100/100/—

Data are presented as: base rate/percentage of positive agreement/percentage of negative agreement/kappa coefficient.
SCID-5-CV, Structured Clinical Interview for DSM-5 – Clinician Version.
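Table 3 shows a dash in place of kappa where too few cases were available, following the five-subject minimum described in Data analysis. The Python sketch below illustrates why such a rule is used; it is not the authors' code. Two hypothetical 2×2 tables with the same observed agreement (90 of 100 participants) give very different kappa values once the base rate drops, and with only two reference cases a single disagreement moves kappa far from 1. The helper `reported_kappa` and its `min_cases` parameter are hypothetical, and we assume the minimum applies to the number of participants rated ‘present’ by the reference rater, which the paper does not state explicitly.

```python
from typing import Optional


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table of present/absent ratings.

    a: both raters 'present'; b: reference rater only;
    c: comparison rater only; d: both 'absent'.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement implied by each rater's marginal proportions.
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_chance) / (1 - p_chance)


def reported_kappa(a: int, b: int, c: int, d: int,
                   min_cases: int = 5) -> Optional[float]:
    """Hypothetical reporting rule: None (printed as a dash) below min_cases."""
    if a + b < min_cases:
        return None
    return cohen_kappa(a, b, c, d)


# Same observed agreement (90/100) at two different base rates.
balanced = cohen_kappa(45, 5, 5, 45)  # 50% rated present by the reference
rare = cohen_kappa(5, 5, 5, 85)       # 10% rated present by the reference
print(f"{balanced:.2f} {rare:.2f}")   # 0.80 0.44

# Two reference cases: one disagreement versus none.
print(f"{cohen_kappa(1, 1, 1, 50):.2f} {cohen_kappa(2, 0, 0, 51):.2f}")  # 0.48 1.00
print(reported_kappa(1, 1, 1, 50))    # None
```

The design choice is to suppress the coefficient rather than report a value that a single participant could swing by half its range.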

Reliability
In the joint interview, the levels of positive agreement were high (>75%) for all diagnoses except schizophreniform disorder (50%). The rates of negative agreement were above 98%, and kappa coefficient values were excellent, that is, above 0.75 (with the exception of persistent depressive disorder, for which kappa was 0.70; Table 3).
With respect to the interrater reliability of the test–retest interview, values were lower but still adequate. The rates of negative agreement were excellent (>92%), regardless of diagnosis. The rates of positive agreement were lower for some mood disorders (persistent depressive disorder, current and past major depressive disorder) and for the specific diagnoses of different anxiety disorders (panic, social anxiety, and generalized anxiety disorder). With respect to the kappa coefficients for diagnostic categories, the values found were either excellent (any bipolar disorder, major depressive disorder, mood disorder, and psychotic disorder) or satisfactory (any anxiety disorder: >0.70; any alcohol or non-alcohol substance use disorder: 0.64).
For specific diagnoses, excellent kappa values were found for quality, with distinct variance patterns (e.g., rater variance in the joint
schizophrenia and bipolar affective disorder I and II, recurrent major interview and information variance in a test–retest method).
depressive disorder, non-alcohol substance use disorder, obsessive– According to Flemenbaum and Zimmermann20 and Grove et al.,21 no
compulsive disorder, post-traumatic stress disorder, and attention defi- methodology is better a priori and the combined use of different
cit hyperactivity disorder. Satisfactory levels were found for persistent methodologies is desirable in order to determine which types of
depressive disorder, current and past major disorder, alcohol use dis- changes in assessment procedures are more efficient for the reduction
order, panic disorder, social anxiety disorder, and generalized anxiety of errors. However, this was not the case in most of the previous
disorder, regardless of the form of application of the interview (face- investigations with the SCID, which prioritized the use of the joint
to-face or telephone). It is important to mention that kappa values interview and usually with the assessment of video-recorded inter-
tended to be lower in the phone interviews, although remaining within views. This method tends to generate higher reliability values due to
the same category of classification (except for persistent depressive the absence of observation biases.10
disorder, the coefficient for which was higher in the phone interview). In our study, the recommendation of using combined methods
Kappa values were unsatisfactory (0.22–0.37) only for agoraphobia, was followed and the reliability values found through the different
regardless of the form of application of the interview. methodologies were adequate for most diagnoses. Indicators found in
the joint interview were always better than those found with the test–
retest method, in line with the findings of Zanarini et al.22 and Sharifi
Discussion et al.,7 which does not disqualify the adequacy of the latter method,
This was the first investigation on the psychometric qualities of the regarded by many as having more resemblance to the clinical setting.2
SCID-5-CV. The study included different methodologies to assess the Here, we have also assessed the reliability between face-to-face
reliability and clinical validity of the instrument in a sample of non- and telephone interviews, given the practicality and lower costs and
prototypical patients and involved raters with different training back- time associated with the latter form of administration. The agreement
grounds and clinical experience. The results showed that the interview indicators found were within acceptable levels, although somewhat
has adequate psychometric properties, both in respect to major diag- inferior to those obtained from face-to-face interviews. These findings
nostic categories and to specific diagnoses. contrast with those described by Cacciola et al.,23 but are in line with
The clinical accuracy of diagnostic instruments is a matter of a previous study from our group that dealt specifically with the diag-
debate, especially because this issue is rarely investigated, which is in nosis of social anxiety24 and supported the use of the SCID for
part associated with the difficulty of establishing a gold standard for research purposes and clinical screening.
comparison.11 In a previous investigation, Steiner et al.11 found unsat- In respect to diagnostic categories, we found higher agreement
isfactory agreement between diagnoses made with the Structured rates for major categories relative to specific diagnoses, as expected,
Clinical Interview for DSM-IV-Patient Edition (SCID-I/P) for DSM- indicating that raters tend to agree about whether a subject presents or
III-R and those made by an experienced clinician for almost all diag- not the general features of a given disorder, but disagree either in
noses (kappa < 0.55). According to the authors, this could be attribut- respect to the precise number of diagnostic criteria fulfilled or to the
able to limitations of the instrument itself (such as the diagnostic form of symptom presentation.
algorithm used and the restricted access to relevant information by In what concerns specific diagnoses, the kappa values found in
the clinician), to diagnostic biases (related to cultural issues), and to our study for bipolar and major depressive disorders were higher than
the training, expertise, and skills of the rater in eliciting, understand- those reported in many previous studies.3,6,7,9,10,22,23,25 However,
ing, and rating clinical material. when the time factor (current, past, or recurrent disorder) was
Studies with the SCID-I/P for DSM-IV that had a better control assessed separately, we found lower reliability for the diagnosis of
over variables associated with clinical diagnosis, such as the use of a past major depressive disorder with the test–retest method, which sug-
‘diagnostic checklist based on DSM-IV criteria’7 or the LEAD gests the possibility of increased information bias as the assessment
procedure,12,19 found better kappa, sensitivity, and specificity values. of this condition relies extensively on memory. Also, long-term cogni-
This improvement in the levels of reliability of the interview could tive alterations seem to occur in depressive disorders,26 which may
also be the result of a better operationalization and reliability of diag- contribute to this type of bias.
nostic criteria in the fourth version of the DSM.6 For persistent depressive disorder, the kappa coefficient for the
In our study, clinical diagnoses were based on the LEAD test–retest method (0.46) was intermediate among those described in
procedure,15 used as a gold standard of validity, and referred to the literature. The specific analysis of this diagnostic category showed
patients with different symptoms and severity levels coming from dif- that 80% of disagreements occurred when raters had different back-
ferent treatment units. We took care not to assign patients to raters from the same treatment unit, as this tends to inflate reliability misleadingly.6
In general, the kappa values found in our study were excellent, which has seldom been the case in studies of the SCID-I/P for DSM-IV. For instance, Sharifi et al.7 and Torrens et al.12 reported no kappa values above 0.70, with most around 0.50. It should be noted that Torrens et al.12 compared the performance of the SCID-I/P-DSM-IV with that of the Psychiatric Research Interview for Substance and Mental Disorders (PRISM) and concluded that the latter was superior in assessing comorbid conditions in patients with substance abuse disorders in the Spanish context. However, if we compare our results with those obtained with the PRISM in that study, this conclusion would not hold, as the kappa values found here were similar or higher.
The diagnostic sensitivity and specificity of the SCID-5-CV were also above 0.70 and 0.80, respectively, in line with previous findings by Sharifi et al.7 for the SCID-I/P-DSM-IV. As in that study, our results support the potential of the instrument to minimize rates of false-positive and false-negative diagnoses.
With respect to reliability, previous studies have described the influence of methodological issues in the assessment of this psychometric
grounds (psychiatrists vs psychologists).
As for psychotic disorders, despite the effort to include a substantial number of cases with the different clinical conditions within this spectrum, we were unable to assess the diagnoses of schizophreniform, brief psychotic, and substance-induced psychotic disorders separately. It was also not possible to conduct separate test–retest reliability analyses for schizoaffective and delusional disorders, which is a limitation of this study. In general, however, the reliability indicators for the diagnoses assessed were excellent, consistent with previous evidence.3,6,7,9 Notably, the telephone interview had highly satisfactory indicators (kappa = 0.94), even though the interview contains items that require observation (for instance, catatonic behavior and reduced emotional expression).
With respect to non-alcohol substance abuse disorders, the reliability values found with the different methods were adequate and consistent with those reported for previous versions of the SCID,3,6,7,9,10,25,27 which was not true for alcohol use disorders with regard to the reliability of face-to-face versus telephone interviews (kappa = 0.47). For this diagnostic category, discrepancies were again associated with the raters' backgrounds, with psychologists tending to overestimate these diagnoses in the telephone interview.
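The indicators discussed in this section (Cohen's kappa for agreement, and sensitivity and specificity against the clinical diagnosis) can be illustrated with a minimal Python sketch. It assumes a single binary diagnosis rated by two sources and summarized in a 2 × 2 table, with the clinical diagnosis treated as the reference standard. The class and function names are hypothetical, and the counts in the example are invented for illustration; they are not this study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one binary diagnosis from two sources (hypothetical layout).

    Rows: index rating (e.g. SCID-5-CV); columns: reference rating
    (e.g. clinical diagnosis).
    """
    both_pos: int    # index +, reference +
    index_only: int  # index +, reference - (false positive if reference is truth)
    ref_only: int    # index -, reference + (false negative if reference is truth)
    both_neg: int    # index -, reference -

    @property
    def n(self) -> int:
        return self.both_pos + self.index_only + self.ref_only + self.both_neg


def cohens_kappa(t: TwoByTwo) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters, binary outcome."""
    n = t.n
    p_o = (t.both_pos + t.both_neg) / n  # observed proportion of agreement
    index_pos = (t.both_pos + t.index_only) / n
    ref_pos = (t.both_pos + t.ref_only) / n
    # Agreement expected by chance from the two raters' marginal rates.
    p_e = index_pos * ref_pos + (1 - index_pos) * (1 - ref_pos)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of reference-positive cases that the index rating detects."""
    return t.both_pos / (t.both_pos + t.ref_only)


def specificity(t: TwoByTwo) -> float:
    """Proportion of reference-negative cases that the index rating rules out."""
    return t.both_neg / (t.both_neg + t.index_only)


if __name__ == "__main__":
    # Hypothetical counts for 180 patients; NOT data from this study.
    table = TwoByTwo(both_pos=40, index_only=5, ref_only=5, both_neg=130)
    print(f"kappa       = {cohens_kappa(table):.3f}")  # kappa       = 0.852
    print(f"sensitivity = {sensitivity(table):.3f}")   # sensitivity = 0.889
    print(f"specificity = {specificity(table):.3f}")   # specificity = 0.963
```

With these made-up counts, raw agreement is about 94%, yet kappa is about 0.85, because kappa discounts the 62.5% agreement expected by chance from the raters' marginal rates. This chance correction is what distinguishes kappa from a simple percentage of agreement.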

The values found for anxiety disorders were relatively close to those described in the literature, with a slight advantage for indicators related to generalized anxiety and social anxiety disorder in face-to-face interviews. In the telephone interviews, these indicators decreased slightly. Previous data for comparison exist only for social anxiety: the kappa coefficient found in the study by Cacciola et al.23 with the SCID-I/P-DSM-IV was inadequate (0.29), whereas another study from our group with the same version of the interview found a value in the excellent range (0.84).24 Compared with the value reported by Crippa et al.,24 the intermediate indicator found in the present study may be explained by the number of subjects with this diagnosis in the sample assessed here, and also by the fact that this was not a study designed to assess the reliability of a single diagnostic category, a design that seems to have a positive influence on such indicators.
In regard to agoraphobia, agreement levels were excellent in the joint interview, overcoming the previous disappointing findings in the test–retest assessment. The low test–retest reliability for this diagnosis has been described previously in a study with the SCID-I/P-IV9 and may be explained by the low prevalence of agoraphobia and its low validity as a diagnostic category. In this study, however, limited clinical experience and rater background also seem to have contributed to discrepancies, as 71% of the disagreements occurred between psychologists and psychiatrists, with the former underestimating agoraphobia diagnoses.
Finally, excellent psychometric indicators were found for the diagnoses of obsessive–compulsive disorder and post-traumatic stress disorder (kappa > 0.83), regardless of the method used. For attention deficit hyperactivity disorder, which was not assessed in previous versions of the SCID, indicators were also appropriate (>0.83). Unfortunately, reliability indicators for adjustment disorder could not be assessed because the sample contained only a single case.
Contrary to expectations, the analyses of specific diagnostic categories showed an influence of raters' training backgrounds and clinical experience on the use of the SCID. Such differences may lead to the underestimation of some diagnoses, especially less common ones such as persistent depressive disorder and agoraphobia, and to the overestimation of others, such as alcohol abuse, possibly because of the difficulty in weighing the relevance of reported symptoms in terms of their intensity and impact.
With respect to the psychometric indicators of the SCID-5-CV, we should also mention the possible positive impact of changes made in the DSM-5, especially in the sections on schizophrenia spectrum disorders and post-traumatic stress disorder, which were aimed at improving the robustness and scientific basis of clinical diagnosis.28
Despite the positive aspects of this study (including the sample size and diversity, the inclusion of raters with different backgrounds and experience levels, and the use of different psychometric methodologies), some limitations should be mentioned. Notwithstanding our efforts, we were unable to include enough patients for the specific assessment of less prevalent diagnostic categories. Primary care patients were not included in the sample, which may have affected the reliability and validity indicators, given the lower symptom severity in these cases. Accordingly, the data of this study cannot be generalized to the general population. The number of patients followed up by psychiatric consultation-liaison services was also small, and research with this particular group should be encouraged, especially because of the difficulties inherent in assessing primary and secondary psychiatric symptoms.
We conclude that the excellent reliability and high specificity of the clinical version of the SCID-5 have been confirmed through different methodologies, supporting the status of the instrument as the main choice for diagnostic assessment in psychiatry, especially in the context of clinical research. The clinical validity of the latest version of the SCID has also been demonstrated, which may encourage its use in clinical settings. Two findings of this study deserve to be highlighted: (i) the adequacy of the instrument for telephone use, which may have an important positive impact on the selection of participants for clinical and epidemiological studies in distant or remote areas; and (ii) the need for caution in the use of the instrument by professionals with little experience in clinical psychiatry, which seems to be fundamental in the rating of psychopathology. For these professionals, extensive training in the use of the SCID and even accumulated expertise in its use do not seem sufficient for establishing correct diagnoses.
Acknowledgments
The authors wish to thank Elder M. Fagundes and Solange Cisneiros for help with typing the data and André Moreno for assistance in the data-collection stage. This study was supported financially by the São Paulo Research Foundation (FAPESP, São Paulo, Brazil; process no. 17/18000–8) and the Brazilian National Council for Scientific and Technological Development (CNPq, Brasília, Brazil; process 301321/2016–7).
Disclosure statement
The supporters had no role in the design, analysis, interpretation, or publication of this study. The authors declare that they have no conflicts of interest.
Author contributions
Conception and design of the study: F.L.O., S.R.L., J.E.C.H., and J.A.S.C. Acquisition and analysis of data: F.L.O., J.P.M.S., J.M.U., C.V.W.B., T.D.A., M.F.D., L.M.B., T.G., V.S.F., A.P.C.S.R., F.A.P., R.M.S., R.F.S., R.G.S., and R.M.S. Drafting the manuscript or figures: F.L.O., J.P.M.S., J.A.S.C., S.R.L., and J.E.C.H.
References
1. Spitzer RL, Williams JB, Gibbon M, First MB. The Structured Clinical Interview for DSM-III-R (SCID). I: History, rationale, and description. Arch. Gen. Psychiatry 1992; 49: 624–629.
2. Segal DL, Hersen M, Van Hasselt VB. Reliability of the Structured Clinical Interview for DSM-III-R: An evaluative review. Compr. Psychiatry 1994; 35: 316–327.
3. Williams JB, Gibbon M, First MB et al. The Structured Clinical Interview for DSM-III-R (SCID): II. Multisite test-retest reliability. Arch. Gen. Psychiatry 1992; 49: 630–636.
4. Weertman A, Arntz A, Dreessen L, van Velzen C, Vertommen S. Short-interval test-retest interrater reliability of the Dutch version of the Structured Clinical Interview for DSM-IV personality disorders (SCID-II). J. Pers. Disord. 2003; 17: 562–567.
5. First MB, Williams JBW, Karg RS, Spitzer RL. Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV). Artmed, Porto Alegre, 2017.
6. Skre I, Onstad S, Torgersen S, Kringlen E. High interrater reliability for the Structured Clinical Interview for DSM-III-R Axis I (SCID-I). Acta Psychiatr. Scand. 1991; 84: 167–173.
7. Sharifi V, Assadi SM, Mohammadi MR et al. A Persian translation of the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition: Psychometric properties. Compr. Psychiatry 2009; 50: 86–91.
8. Del-Ben CM, Rodrigues CRC, Zuardi AW. Reliability of the Portuguese version of the Structured Clinical Interview for DSM-III-R (SCID) in a Brazilian sample of psychiatric outpatients. Braz. J. Med. Biol. Res. 1996; 29: 1675–1682.
9. Del-Ben CM, Vilela JAA, Crippa JAS, Hallak JEC, Labate CM, Zuardi AW. Reliability of the Structured Clinical Interview for DSM-IV: Clinical version translated into Portuguese. Rev. Bras. Psiquiatr. 2001; 23: 156–159.
10. Zanarini MC, Skodol AE, Bender D et al. The Collaborative Longitudinal Personality Disorders Study: Reliability of axis I and II diagnoses. J. Pers. Disord. 2000; 14: 291–299.
11. Steiner JL, Tebes JK, Sledge WH, Walker ML. A comparison of the Structured Clinical Interview for DSM-III-R and clinical diagnoses. J. Nerv. Ment. Dis. 1995; 183: 365–369.
12. Torrens M, Serrano D, Astals M, Perez-Dominguez G, Martin-Santos R. Diagnosing comorbid psychiatric disorders in substance abusers: Validity of the Spanish versions of the Psychiatric Research Interview for Substance and Mental Disorders and the Structured Clinical Interview for DSM-IV. Am. J. Psychiatry 2004; 161: 1231–1237.
13. Terwee CB, Bot SD, de Boer MR et al. Quality criteria were proposed for measurement properties of health status questionnaires. J. Clin. Epidemiol. 2007; 60: 34–42.

14. Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine (Phila Pa 1976) 2000; 25: 3186–3191.
15. Spitzer RL. Psychiatric diagnosis: Are clinicians still necessary? Compr. Psychiatry 1983; 24: 399–411.
16. First MB, Williams JBW, Karg RS, Spitzer RL. User's Guide for the Structured Clinical Interview for DSM-5 Disorders, Clinician Version (SCID-5-CV). Arlington, VA, American Psychiatric Publishing, 2016.
17. Fleiss JL. The measurement of interrater agreement. In: Statistical Methods for Rates and Proportions, 2nd edn. John Wiley & Sons, New York, 1981; 212–236.
18. Everitt BS. Measurement in medicine. In: Statistical Methods for Medical Investigations. Oxford University Press, New York, 1989; 16–27.
19. Miller PR. Inpatient diagnostic assessments: 2. Interrater reliability and outcomes of structured vs. unstructured interviews. Psychiatry Res. 2001; 105: 265–271.
20. Flemenbaum A, Zimmermann RL. Inter and intra-rater reliability of the Brief Psychiatric Rating Scale. Psychol. Rep. 1973; 36: 783–792.
21. Grove WM, Andreasen NC, McDonald-Scott P, Keller MB, Shapiro RW. Reliability studies of psychiatric diagnosis. Theory and practice. Arch. Gen. Psychiatry 1981; 38: 408–413.
22. Zanarini MC, Frankenburg FR. Attainment and maintenance of reliability of axis I and II disorders over the course of a longitudinal study. Compr. Psychiatry 2001; 42: 369–374.
23. Cacciola JS, Alterman AI, Rutherford MJ, McKay JR, May DJ. Comparability of telephone and in-person Structured Clinical Interview for DSM-III-R (SCID) diagnoses. Assessment 1999; 6: 235–242.
24. Crippa JA, Osório FD, Del-Ben CM, Filho AS, Freitas MC, Loureiro SR. Comparability between telephone and face-to-face Structured Clinical Interview for DSM-IV in assessing social anxiety disorder. Perspect. Psychiatr. Care 2008; 44: 241–247.
25. Lobbestael J, Leurgans M, Arntz A. Inter-rater reliability of the Structured Clinical Interview for DSM-IV Axis I disorders (SCID I) and Axis II disorders (SCID II). Clin. Psychol. Psychother. 2011; 18: 75–79.
26. Gonda X, Pompili M, Serafini G, Carvalho AF, Rihmer Z, Dome P. The role of cognitive dysfunction in the symptoms and remission from depression. Ann. Gen. Psychiatry 2015; 14: 27.
27. Martin CS, Pollock NK, Bukstein OG, Lynch KG. Inter-rater reliability of the SCID alcohol and substance use disorders section among adolescents. Drug Alcohol Depend. 2000; 59: 173–176.
28. Araújo AC, Lotufo-Neto F. The new North American classification of mental disorders – DSM-5. Rev. Bras. Ter. Comport Cogn. 2014; 16: 67–82.

