
Prepared by: Laserna, Mikee Rose T.
Date of Submission: Dec 14, 2018

RELIABILITY AND VALIDITY OF INTERVIEWS

The reliability of an interview is typically evaluated in terms of the level of agreement between at least two raters who evaluated the same patient or client.

Interrater reliability (or interjudge reliability): the level of agreement between at least two raters who have evaluated the same patient independently. Agreement can refer to consensus on the symptoms assigned, the diagnoses assigned, and so on.

Test–retest reliability: an index of the consistency of interview scores across some period of time.

Kappa coefficient: a statistical index of interrater reliability computed to determine how reliably raters judge the presence or absence of a feature or diagnosis.

Standardized (structured) interviews with clear scoring instructions will be more reliable than unstructured interviews. The reason is that structured interviews reduce both information variance and criterion variance.

Information variance refers to the variation in the questions that clinicians ask, the observations that are made during the interview, and the method of integrating the information that is obtained.

Criterion variance refers to the variation in scoring thresholds among clinicians.

Diagnostic Agreement Between Two Raters (Table 6-5)

                          Rater 2
                    Present     Absent
Rater 1   Present   a = 30      b = 5
          Absent    c = 5       d = 60
                                          N = 100

Overall Agreement = (a + d) / N = (30 + 60) / 100 = .90

Kappa = [(a + d)/N - ((a + b)(a + c) + (c + d)(b + d))/N^2]
        / [1 - ((a + b)(a + c) + (c + d)(b + d))/N^2]

      = (ad - bc) / [ad - bc + N(b + c)/2]

      = 1775 / 2275 = .78

Reliability (kappa) = .78
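
As a worked illustration, the kappa value can be checked by hand from the cell counts above, writing p_o for observed agreement and p_e for the agreement expected by chance (the row and column totals here are 35 "present" and 65 "absent"):

\[
\begin{aligned}
p_o &= \frac{a + d}{N} = \frac{30 + 60}{100} = .90 \\
p_e &= \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2} = \frac{(35)(35) + (65)(65)}{100^2} = \frac{5450}{10000} = .545 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} = \frac{.90 - .545}{1 - .545} = \frac{.355}{.455} \approx .78
\end{aligned}
\]

The shortcut form in the table gives the same value: (ad - bc)/(ad - bc + N(b + c)/2) = 1775/2275 ≈ .78.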

This table presents a hypothetical data set from a study assessing the reliability of alcoholism diagnoses derived from a structured interview. The example assesses interrater reliability (the level of agreement between two raters), but the calculations would be the same if one wanted to assess test–retest reliability. In that case, the data for Rater 2 would be replaced by data for Testing 2 (the retest). As can be seen, the two raters evaluated the same 100 patients for the presence/absence of an alcoholism diagnosis, using a structured interview. These two raters agreed in 90% of the cases [(30 + 60)/100]. Agreement here refers to coming to the same conclusion: not just agreeing that the diagnosis is present, but also agreeing that the diagnosis is absent.

The table also presents the calculation for kappa, a chance-corrected index of agreement that is typically lower than overall agreement. The reason for this lower value is that raters will agree on the basis of chance alone in situations where the prevalence rate for a diagnosis is relatively high or relatively low. In the example shown in Table 6-5, the diagnosis of alcoholism is relatively infrequent. Therefore, a rater who always judged the disorder to be absent would be correct (and likely to agree with another rater) in many cases. The kappa coefficient takes into account such instances of agreement based on chance alone and adjusts the agreement index downward accordingly. In general, a kappa value between .75 and 1.00 is considered to reflect excellent interrater agreement beyond chance.
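
The calculation is also easy to script. The following is a minimal Python sketch using the cell counts from the table above (the variable names are chosen here only for illustration):

# Cohen's kappa for the 2 x 2 diagnostic agreement table above.
# Cell counts: a = both raters say "present", b = Rater 1 present / Rater 2 absent,
# c = Rater 1 absent / Rater 2 present, d = both raters say "absent".
a, b, c, d = 30, 5, 5, 60
N = a + b + c + d                          # 100 patients in total

overall_agreement = (a + d) / N            # .90: same conclusion in 90% of cases

# Agreement expected by chance alone: both raters say "present" by chance
# plus both say "absent" by chance, computed from the marginal totals.
p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / N ** 2   # .545

kappa = (overall_agreement - p_e) / (1 - p_e)            # about .78

# Algebraically equivalent shortcut form shown in the table above.
kappa_shortcut = (a * d - b * c) / (a * d - b * c + N * (b + c) / 2)

print(f"Overall agreement = {overall_agreement:.2f}")    # 0.90
print(f"Kappa             = {kappa:.2f}")                # 0.78
print(f"Kappa (shortcut)  = {kappa_shortcut:.2f}")       # 0.78

# For test-retest reliability, the same calculation applies with the Rater 2
# counts replaced by counts from a second administration (retest).

Because kappa removes the agreement expected by chance (about .55 in this example, given that "absent" is the more common judgment), it comes out lower than the raw 90% agreement.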

Validity

The validity of an interview concerns how well the interview measures what it intends to measure. The validity of any type of psychological measure can take many forms.

Content validity refers to the measure's comprehensiveness in assessing the variable of interest. For example, if an interview is designed to measure depression, then we would expect it to contain multiple questions assessing various emotional, cognitive, and physiological aspects of depression.

Criterion-related validity refers to the ability of a measure to predict (correlate with) scores on other relevant measures. For example, an interview assessing conduct disorder in childhood may be said to have criterion-related validity to the extent that its scores correlate with measures of peer rejection and aggressive behavior.

Predictive validity is a form of criterion-related validity: the extent to which interview scores correlate with scores on other relevant measures administered at some point in the future.

Discriminant validity refers to the interview's ability not to correlate with measures that are not theoretically related to the construct being measured. For example, there is no theoretical reason a specific phobia (e.g., of heights) should be correlated with level of intelligence. Therefore, a demonstration that the two measures are not significantly correlated would indicate the specific phobia interview's discriminant validity.

Construct validity is used to refer to all of these aspects of validity: the extent to which interview scores are correlated with other measures or behaviors in a logical and theoretically consistent way. Demonstrating construct validity involves a demonstration of both convergent and discriminant validity.
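
To make the convergent/discriminant distinction concrete, here is a brief Python sketch with made-up data (the scores, sample size, and measures are hypothetical, used only for illustration): an interview-based conduct disorder score should correlate substantially with a related measure such as peer rejection, but not with an unrelated measure such as intelligence.

# Illustrative check of convergent vs. discriminant validity using
# made-up scores (hypothetical data, for demonstration only).
import numpy as np

rng = np.random.default_rng(0)
n = 50                                              # hypothetical sample size

# Interview-based conduct disorder scores for n children (hypothetical).
conduct_interview = rng.normal(50, 10, size=n)

# A theoretically related criterion (peer rejection): constructed here to
# track the interview scores, as criterion-related/convergent validity expects.
peer_rejection = 0.8 * conduct_interview + rng.normal(0, 6, size=n)

# A theoretically unrelated measure (intelligence): generated independently,
# so its correlation with the interview should be near zero (discriminant validity).
intelligence = rng.normal(100, 15, size=n)

r_convergent = np.corrcoef(conduct_interview, peer_rejection)[0, 1]
r_discriminant = np.corrcoef(conduct_interview, intelligence)[0, 1]

print(f"Interview vs. peer rejection (expected: substantial) r = {r_convergent:.2f}")
print(f"Interview vs. intelligence   (expected: near zero)   r = {r_discriminant:.2f}")

In a real validity study, the correlations would of course come from actual participants' scores rather than simulated data.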

Suggestions for Improving Reliability and Validity

1. Whenever possible, use a structured interview. A wide variety of structured interviews exist for conducting intake-admission, case-history, mental status examination, crisis, and diagnostic interviews.

2. If a structured interview does not exist for your purpose, consider developing one. Generate a standard set of questions to be used, develop a set of guidelines to score respondents' answers, administer the interview to a representative sample of subjects, and use the feedback from subjects and interviewers to modify the interview. If nothing else, completing this process will help you better understand what it is that you are attempting to assess and will help you become a better interviewer.

3. Whether you are using a structured interview or not, certain interviewing skills are essential: establishing rapport, being an effective communicator, being a good listener, knowing when and how to ask additional questions, and being a good observer of nonverbal behavior.

4. Be aware of the patient's motives and expectancies with regard to the interview. For example, how strong are his or her needs for approval or social desirability?

5. Be aware of your own expectations, biases, and cultural values. Periodically, have someone else assess the reliability of the interviews you administer and score.
shatter some timeworn illusions, and
THE ART AND SCIENCE OF INTERVIEWING

Becoming a skilled interviewer requires practice. Without the opportunity to conduct real interviews, to make mistakes, or to discuss techniques and strategies with more experienced interviewers, a simple awareness of scientific investigations of interviewing will not confer great skill.

Still, research on interviewing serves several purposes. A major one is to make clinicians more humble regarding their "intuitive skills." Research suggests, for example, that prior expectancies can color the interviewer's observations, that implicit theories of personality and psychopathology can influence the focus of an interview, and that the match or mismatch of interviewer and interviewee in terms of race, age, and gender may influence the course and outcome of the interview. Thus, a number of influences on the interview process have been identified.

If we never assess the validity of our diagnoses, if we never check our reliability against someone else, or if we never measure the efficacy of a specific interview technique, then we can easily develop an ill-placed confidence that will ultimately be hard on our patients. It may be true, as some cynics argue, that 10 studies, all purporting to show that "mm-hmm" is no more effective than a nod of the head in expressing interviewer interest, still fail to disprove that in one specific or unique clinical interaction there may indeed be a difference. Although no single interview study will offer an unambiguous solution to an interview problem, these studies have a cumulative effect. Research can offer suggestions about improving the validity of our observations and techniques, shatter some timeworn illusions, and splinter a few clichés. By the sheer cumulative weight of its controlled, scientific approach, research can make interviewers more sensitive and effective. A clinician steeped in both the art and the science of interviewing will be more effective (though hardly more comfortable) than one who is conscious of only one of these dual aspects of interviewing.
